IssyB
Frequent Visitor

Remove Duplicates Per Group (keep first) measure help

Hi All

 

Is it possible to create a measure in PBI that removes duplicate values per group and keeps the first occurrence of each value? Perhaps not a measure but a new table; I'm unsure how to work around this. For context, the database I'm working with treats some data in a funky way that serves a function on the back end but doesn't make sense when trying to visualise user behaviour.

 

Currently, the data looks like:

USER   ENTRY ID  ENTRY    STATUS  TIME SUBMITTED (HH:mm:ss)
userA  1         unicorn  fail    10:23:10
userA  2         forest   pass    10:30:49
userA  1         unicorn  fail    10:30:49
userB  1         unicorn  fail    13:40:22
userB  1         fairy    pass    13:43:59

 

I want to clean it so it looks like:

USER   ENTRY ID  ENTRY    STATUS  TIME SUBMITTED (HH:mm:ss)
userA  1         unicorn  fail    10:23:10
userA  2         forest   pass    10:30:49
userB  1         unicorn  fail    13:40:22
userB  1         fairy    pass    13:43:59
...

 

Note the row I want to remove has

  1. duplicated ENTRY from the first instance for userA and
  2. duplicated TIME SUBMITTED from the first instance for userA

Also, the ENTRY ID cannot be used.

Any pointers would be greatly appreciated 🙂 

2 ACCEPTED SOLUTIONS
Pragati11
Super User

Hi @IssyB ,

 

Check if this existing thread helps:

https://community.powerbi.com/t5/Power-Query/Remove-duplicates-keeping-the-most-recent-row/m-p/75783...

 

Thanks,

Pragati

Anonymous
Not applicable

Here's the M code that does what you want:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WKi1OLXJU0lEyBOLSvMzk/KI8ICstMTMHJGhgZWRsZWigFKuDUGkEks8vSi0uATIKEouLIQqNDaxMLFEU4jISVaUTTpXGViZA+40wVALliyqR7AaqM7YyBZoYCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, EntryID = _t, Entry = _t, Status = _t, Time = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"EntryID", Int64.Type}, {"Entry", type text}, {"Status", type text}, {"Time", type time}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "FirstTimeEntry", 
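        // True only when this row's Time is the earliest Time among all rows
        // that share its User and Entry, i.e. the first occurrence.
        each List.Min( 
            Table.SelectRows(
                #"Changed Type",
                // r is a candidate row; [User] and [Entry] refer to the current row
                (r) => r[User] = [User] and r[Entry] = [Entry]
            )[Time] 
        ) = [Time]),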
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([FirstTimeEntry] = true)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"FirstTimeEntry"})
in
    #"Removed Columns"

 

Best

D
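
As a possible alternative to the row-by-row lookup above, the same "keep the earliest row per User and Entry" idea can be sketched with Table.Group. This is only a sketch under assumptions: the inline #table stands in for the real source, the step names (RowType, Grouped, Result) are made up for illustration, and the column names follow the answer above.

```powerquery
let
    // Assumed column layout, matching the names used in the answer above.
    RowType = type table [User = text, EntryID = Int64.Type, Entry = text, Status = text, Time = time],
    // Sample rows copied from the question.
    Source = #table(
        RowType,
        {
            {"userA", 1, "unicorn", "fail", #time(10, 23, 10)},
            {"userA", 2, "forest",  "pass", #time(10, 30, 49)},
            {"userA", 1, "unicorn", "fail", #time(10, 30, 49)},
            {"userB", 1, "unicorn", "fail", #time(13, 40, 22)},
            {"userB", 1, "fairy",   "pass", #time(13, 43, 59)}
        }
    ),
    // One group per User/Entry pair; keep the record with the smallest Time.
    Grouped = Table.Group(
        Source,
        {"User", "Entry"},
        {{"First", each Table.Min(_, "Time"), type record}}
    ),
    // Turn the kept records back into a table with the original columns.
    Result = Table.FromRecords(Grouped[First], RowType)
in
    Result
```

One difference worth knowing: if two rows in the same group share the earliest Time, this sketch keeps a single row, whereas the flag-and-filter version above keeps every row whose Time equals the minimum.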


5 REPLIES

Thanks @Anonymous, had to have a little play around and add in r[entry id] = [entry id] but it's working well now 👍 cheers again
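
For readers wondering what that tweak might look like, here is a hedged sketch of the flag step with the extra entry ID comparison added. The exact placement is an assumption, the column is called EntryID here to match the accepted answer (the real column name may differ), and the inline #table and step names are illustrative only.

```powerquery
let
    // Sample rows copied from the question; column names are assumptions.
    Source = #table(
        type table [User = text, EntryID = Int64.Type, Entry = text, Status = text, Time = time],
        {
            {"userA", 1, "unicorn", "fail", #time(10, 23, 10)},
            {"userA", 2, "forest",  "pass", #time(10, 30, 49)},
            {"userA", 1, "unicorn", "fail", #time(10, 30, 49)},
            {"userB", 1, "unicorn", "fail", #time(13, 40, 22)},
            {"userB", 1, "fairy",   "pass", #time(13, 43, 59)}
        }
    ),
    // Flag rows whose Time is the earliest for their User, EntryID and Entry.
    Flagged = Table.AddColumn(
        Source,
        "FirstTimeEntry",
        each List.Min(
            Table.SelectRows(
                Source,
                (r) => r[User] = [User] and r[EntryID] = [EntryID] and r[Entry] = [Entry]
            )[Time]
        ) = [Time],
        type logical
    ),
    // Keep only the flagged rows, then drop the helper column.
    Kept = Table.SelectRows(Flagged, each [FirstTimeEntry]),
    Result = Table.RemoveColumns(Kept, {"FirstTimeEntry"})
in
    Result
```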

Anonymous
Not applicable

Yeah... That's a perfect job for Power Query. You can do it in DAX as well as a calculated table but it really should be performed in PQ as this is the data-munging tool. I can create some sample M code for you so that you can see how such cleaning is done...

Best
D

Thanks @Pragati11 , the buffer got me halfway there! Now it's just removing the wrong duplicate, hopefully the other reply will resolve this 🙂 
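
The link above is truncated, so the following is a guess at the pattern being referred to: sort, buffer, then remove duplicates. If the linked thread sorts newest-first to keep the most recent row, flipping the sort to ascending is one way the first row might be kept instead. The inline #table, step names and column names are assumptions for illustration.

```powerquery
let
    // Sample rows copied from the question; column names are assumptions.
    Source = #table(
        type table [User = text, EntryID = Int64.Type, Entry = text, Status = text, Time = time],
        {
            {"userA", 1, "unicorn", "fail", #time(10, 23, 10)},
            {"userA", 2, "forest",  "pass", #time(10, 30, 49)},
            {"userA", 1, "unicorn", "fail", #time(10, 30, 49)},
            {"userB", 1, "unicorn", "fail", #time(13, 40, 22)},
            {"userB", 1, "fairy",   "pass", #time(13, 43, 59)}
        }
    ),
    // Earliest rows first, so the first occurrence of each pair comes first.
    Sorted = Table.Sort(Source, {{"Time", Order.Ascending}}),
    // Buffer so the sorted order is fixed before duplicates are removed.
    Buffered = Table.Buffer(Sorted),
    // Remove rows that repeat an earlier User/Entry combination.
    Result = Table.Distinct(Buffered, {"User", "Entry"})
in
    Result
```

Which duplicate survives depends on the sort direction, so an Order.Descending sort here would be expected to keep the latest row rather than the first.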
