hansei
Helper V

Distinct based on 2nd column

I'm having trouble working out how to group my data to meet the following requirement.

 

Source  FK  Note  Created
a       1   abc   14-Mar
a       1   abc   14-Mar
b       1   abc   14-Mar
b       1   xyz   14-Mar
c       1   abc   14-Mar
c       1   klm   14-Mar

 

I have data coming from various sources (.csv files), and records are not necessarily unique. In the table above, the 1st and 2nd records are genuinely separate records even though they contain the same data, because both were retrieved from the same source, "a". The 3rd and 4th rows are from last week's source, "b", which may contain the same data as this week's source and may also contain data that has since been deleted.

 

Above, the 3rd row looks identical to either the 1st or the 2nd row, so it is a duplicate. The 4th row is missing from the new source, so it is a deletion. The 5th row is also a duplicate of either the 1st or the 2nd row, while the 6th is, again, a unique record.

 

I have no reason to keep duplicate data, but I do want to retain what appear to be duplicates from the same source, as well as any new data. So, how would I keep the 1st, 2nd, 4th and 6th rows?

3 REPLIES
hansei
Helper V

Well, with that dearth of replies, I've decided to do the following:

  • sort by source to keep most recent at top (and buffer)
  • remove duplicates
  • remove most recent source
  • combine with most recent source
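
The four steps above can be sketched outside Power Query to check the logic against the sample table. This is only a Python rendering of the idea, not the actual query: the `Row` type, the function name and the `latest_source` parameter are hypothetical, and it assumes "remove duplicates" compares every column except Source.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    source: str
    fk: str
    note: str
    created: str


def keep_latest_plus_new(rows: List[Row], latest_source: str) -> List[Row]:
    """Keep every row of the latest source, plus older rows not seen in it."""
    # Step 1: stable sort so the latest source comes first.
    ordered = sorted(rows, key=lambda r: r.source != latest_source)

    # Step 2: remove duplicates on all columns except source (assumption),
    # keeping the first occurrence, which is the latest source's copy if any.
    seen = set()
    deduped = []
    for r in ordered:
        key = (r.fk, r.note, r.created)
        if key not in seen:
            seen.add(key)
            deduped.append(r)

    # Step 3: drop the latest source from the de-duplicated set.
    older_unique = [r for r in deduped if r.source != latest_source]

    # Step 4: combine with the untouched latest source (its repeats intact).
    latest_rows = [r for r in rows if r.source == latest_source]
    return latest_rows + older_unique


SAMPLE = [
    Row("a", "1", "abc", "14-Mar"),
    Row("a", "1", "abc", "14-Mar"),
    Row("b", "1", "abc", "14-Mar"),
    Row("b", "1", "xyz", "14-Mar"),
    Row("c", "1", "abc", "14-Mar"),
    Row("c", "1", "klm", "14-Mar"),
]

if __name__ == "__main__":
    for row in keep_latest_plus_new(SAMPLE, latest_source="a"):
        print(row)
```

Because the latest source is passed in rather than hard-coded, the same steps apply however many older sources there are. How the latest source is identified in practice (file date, load order, and so on) is left open here.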

Hi @hansei,

 

First, create an index column; it is used later to identify the latest source:

[Screenshot: Untitled picture.png]

Next, create a column named filter. It returns 1 when the row's Note value already appears in an earlier row (one with a lower Index) and the row's Source differs from the latest source; otherwise, it returns 0. Finally, create a calculated table that keeps only the rows where filter equals 0:

 

 

 

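Table 2 =
VAR f =
    ADDCOLUMNS (
        'Table',
        "filter",
        // Index of the current row
        VAR a = 'Table'[Index]
        // Distinct Note values found in all earlier rows
        VAR b =
            CALCULATETABLE (
                DISTINCT ( 'Table'[Note] ),
                FILTER ( 'Table', 'Table'[Index] < a )
            )
        // Source of the row with Index 0, treated as the latest source
        VAR c =
            CALCULATE ( MAX ( 'Table'[Source] ), FILTER ( 'Table', 'Table'[Index] = 0 ) )
        RETURN
            IF ( 'Table'[Note] IN b && 'Table'[Source] <> c, 1, 0 )
    )
RETURN
    // Keep only the rows that were not flagged
    FILTER ( f, [filter] = 0 )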
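
To see what the flag does row by row, the same rule can be mirrored in a few lines of Python. This is a hedged sketch rather than a replacement for the DAX: the function names and the list-of-dicts layout are hypothetical, and it assumes, as the DAX does, that the row with Index 0 belongs to the latest source and that only Note is compared.

```python
from typing import Dict, List


def flag_rows(rows: List[Dict]) -> List[Dict]:
    """Add a 'filter' key: 1 for rows to drop, 0 for rows to keep."""
    # Latest source is read from the row with Index 0, never hard-coded.
    latest_source = next(r["Source"] for r in rows if r["Index"] == 0)

    flagged = []
    for r in rows:
        # Note values that appear in any earlier row (lower Index).
        earlier_notes = {o["Note"] for o in rows if o["Index"] < r["Index"]}
        is_repeat = r["Note"] in earlier_notes
        is_older_source = r["Source"] != latest_source
        flagged.append({**r, "filter": 1 if is_repeat and is_older_source else 0})
    return flagged


def keep_unflagged(rows: List[Dict]) -> List[Dict]:
    """Equivalent of filtering the flagged table on filter = 0."""
    return [r for r in flag_rows(rows) if r["filter"] == 0]


SAMPLE_ROWS = [
    {"Index": 0, "Source": "a", "Note": "abc"},
    {"Index": 1, "Source": "a", "Note": "abc"},
    {"Index": 2, "Source": "b", "Note": "abc"},
    {"Index": 3, "Source": "b", "Note": "xyz"},
    {"Index": 4, "Source": "c", "Note": "abc"},
    {"Index": 5, "Source": "c", "Note": "klm"},
]

if __name__ == "__main__":
    for row in keep_unflagged(SAMPLE_ROWS):
        print(row)
```

Nothing in the rule names a specific source: it depends only on the row order given by Index. One caveat of comparing Note alone is that a Note repeated within an older source would also be flagged.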

 

 

[Screenshot: Untitled picture1.png]

 

Please refer to the pbix file: https://qiuyunus-my.sharepoint.com/:u:/g/personal/pbipro_qiuyunus_onmicrosoft_com/ESq2wtkC4XFMhZcUOn...

 

If this post helps, please consider accepting it as the solution to help other members find it more quickly.

 

Best Regards,

Dedmon Dai

I cannot have a static solution based on a,b,c. There may be hundreds of sources.
