Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
agross
Regular Visitor

tag duplicates

Dear Community,

 

Thank you for adding. I am working for a flight consolidator in Germany and have been using PBI recently. I am very excited about the possibilities. 

 

Now I am a little lost when it comes to the identification of duplicates. I know that there have already been several posts. I have already tried many suggestions of them. However, so far no solution fits to my problem.

 

I have a daily error report that shows faulty booking attempts of flights. Some agencies try to book the same flight again and again when it comes to an error. In my visualizations I want to make use of a slicer to distinguish between first and all booking attempts. So far I solved it in Excel with two additional columns. One was to concantenate all crucial cells to generate an unique ID. And the second one was to identify if there is a repeated ID within the same day. So here I used MATCH and COUNTIF formulas:

 

dupes in excel.jpg

 

As this worked very well I now try to add these calculated columnes also to PBI but I fail to do so. Ok, first one (F) is an easy one to concantenate columns. But second (G) I do not know.

 

What formula I have to use to tag the first and and also the repeated attempts/IDs within 1 day? Is there an additional order necessary or does the 'booking date' take over the role here?

 

I'm very grateful for any feedback.

 

Thank you very much in advance.
Alex

 

1 ACCEPTED SOLUTION
v-shex-msft
Community Support
Community Support

Hi @agross,

 

In power bi data model, it not contain column index and row index, so dax formula won't recognize first record and will mark all similar records as duplicate.

 

In my opinion, you can add index column in query editor, then you can use below formula to loop on concatenated text and mark duplicate records.

Duplicate = 
VAR uniqueCount =
    CALCULATE (
        COUNT ( 'Sample'[CONCATENATE] ),
        FILTER (
            ALL ( 'Sample' ),
            [Index] <= EARLIER ( [Index] )
                && [CONCATENATE] = EARLIER ( 'Sample'[CONCATENATE] )
                && [Booking Date] = EARLIER ( [Booking Date] )
        )
    )
RETURN
    IF ( uniqueCount >= 2, "Yes", "No" )

15.PNG

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

View solution in original post

5 REPLIES 5
v-shex-msft
Community Support
Community Support

Hi @agross,

 

In power bi data model, it not contain column index and row index, so dax formula won't recognize first record and will mark all similar records as duplicate.

 

In my opinion, you can add index column in query editor, then you can use below formula to loop on concatenated text and mark duplicate records.

Duplicate = 
VAR uniqueCount =
    CALCULATE (
        COUNT ( 'Sample'[CONCATENATE] ),
        FILTER (
            ALL ( 'Sample' ),
            [Index] <= EARLIER ( [Index] )
                && [CONCATENATE] = EARLIER ( 'Sample'[CONCATENATE] )
                && [Booking Date] = EARLIER ( [Booking Date] )
        )
    )
RETURN
    IF ( uniqueCount >= 2, "Yes", "No" )

15.PNG

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

Hi Xiaoxin Sheng (@v-shex-msft),

 

Great! Thank you very much for your assistance. Your proposal seems to work. This is how I get to know the correct use of the formulas.

 

The problem now I am facing is probably the amount of data because PBI interrupts the calculation saying (in German) "There is not enough memory to perform this operation. Please try again later if more memory may be available." So at the moment I have 230.000 lines. Any suggestions how this could be solved? Or are there other ways to achieve the above goal?

 

Next to that I wonder what happens with the index column if I add new data into the master data XLSX. Does it stay and update itself in PBI?

 

Thanks again.

Best regards

Alex

HI @agross,

 

I'd like to suggest you to use 64 bit desktop to instead 32 bit power bi, it has better performance to handle on huge amount records. 

Differences between 32 & 64 bit versions of Power BI Desktop

 

BTW, I'd like to suggest close unused applications to increase idle memory for calculation.

 

>> Does it stay and update itself in PBI?

Yes, custom index function will re-generate new index after update records.

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

Hi @v-shex-msft,

 

Thanks a lot for your help again.

 

I already use the 64 bit desktop version and also closed other applications. My computer has 8 GB RAM. Power BI always stops mentioned calculation for my 250,000 lines.

 

Is there maybe a smarter way to have a slicer that distinguishs between unique and all values for my given data?

 

Thanks for any idea on the above.

 

Best regards

Alex

Hi @agross,

 

I think the issue may related to my formula, it will calcutle through full table so cost huge amount of memory(due to records amount). 


Maybe you can try to use below measure to instead, drag it to table visual to display tag.(please add filters on visual to confirm formula not calculate on all records)

Duplicate =
VAR currIndex =
    MAX ( 'Sample'[Index] )
VAR currID =
    SELECTEDVALUE ( 'Sample'[CONCATENATE] )
VAR currDate =
    MAX ( 'Sample'[Booking Date] )
VAR uniqueCount =
    CALCULATE (
        COUNT ( 'Sample'[CONCATENATE] ),
        FILTER (
            ALL ( 'Sample' ),
            [Index] <= currIndex
                && [CONCATENATE] = currID
                && [Booking Date] = currDate
        )
    )
RETURN
    IF ( uniqueCount >= 2, "Yes", "No" )

Regards,
Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.