Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
AliRade
Helper I
Helper I

Power Query Editor - Remove Duplicates (selective column)

Hi Community,

 

I have a dataset where I want to remove 'selective duplicates'. I've tried it by sorting the column order and then removing duplicates (via a Buffer table) but that removes all Duplicates (even though i want to keep some).

 

Example dataset:

 

ITEMIDCustDateLevel
1A200A116/08/20212
1A200A116/07/20211
2A300B115/09/20212
2A300B215/09/20212
2A300B113/07/20211
3A400A118/01/20211
4A500A114/07/20213
4A500B114/07/20213
4A500B214/07/20213
4A500B213/06/20212
4A500B112/06/20212
4A500C114/07/20213
4A500C112/06/20212
4A500A101/04/20211

 

The Output I'm trying to get from the above Dataset is that for EACH "ID", output (or keep) the rows which has the latest "Level" number (even if the latest Level number for a particular "ID" is appearing more than once). So wanting the output table to look like below.

 

ITEMIDCustDateLevel
1A200A116/08/20212
2A300B115/09/20212
2A300B215/09/20212
3A400A118/01/20211
4A500A114/07/20213
4A500B114/07/20213
4A500B214/07/20213
4A500C114/07/20213

 

 

What I've done is a sort (descending) on Colum "Level" and then remove Duplicate Rows Column "ID". But this way removes all duplicates in "ID" even though I would like to keep some (for example for ID=A500, i want to keep all rows that have the latest Level). 

 

Any ideas on how to go about this? I've tried looking into 'grouping' but not getting the output i want.

 

Thanks in advance for any suggestions.

 

1 ACCEPTED SOLUTION

6 REPLIES 6
jennratten
Super User
Super User

Here is how you can accomplish this.  I have added comments in the script below that explains what's going on. 

 

  • Add a column with the date formatted to m/d/yyyy.
  • Group rows by Item, Id, Cust with a new column for the max date and all rows as nested tables.
  • Add a new column that selects the value of the Level column in the nested table, for records whose formatted date equals the max date.

jennratten_0-1631994240244.png

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jdBJCsAgDADAv3gWjHGpPVafIf7/GzWBQqylenLJkK1WZZVWFwLQQXcbDSSDgPRA1fQ3OR5imSDFHJPMv8HAOWaRBNeEs7ipkCPiRS/JgB2Jp1gQxIss7k3yBsFN0tuN40RTIfwjZd1LWWfhoftSwIu9tBs=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ITEM = _t, ID = _t, Cust = _t, Date = _t, Level = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ITEM", Int64.Type}, {"ID", type text}, {"Cust", type text}, {"Date", type text}, {"Level", Int64.Type}}),
    // Add a column with the date formatted to m/d/yyyy.
    // You could transform the existing date column instead if desired, 
    // but adding as a new column allows you to retain the existing format in the final table.
    FormattedDate = Table.AddColumn(#"Changed Type", "Formatted Date", each Date.From ( Text.Combine({Text.Middle([Date], 4, 2), Text.Start([Date], 2), Text.Middle([Date], 5)}) ), type date),
    Grouped = Table.Group(FormattedDate, {"ITEM", "ID", "Cust"}, {{"Max Date", each List.Max([Formatted Date]), type date}, {"All", each _, type table [ITEM=nullable number, ID=nullable text, Cust=nullable text, Date=nullable text, Level=nullable number, Formatted Date=date]}}),
    AddLevel = Table.AddColumn ( Grouped, "Level", each 
        // Create variables
        let 
            // max date of each row in the Grouped table
            varMaxDate = [Max Date],
            // Nested table filtered 
            FilterGroupedTables = Table.SelectRows ( [All], each varMaxDate = _[Formatted Date] )
        in 
        // Return the value of the Level column in the nested table.
            FilterGroupedTables[Level]{0}, Int64.Type
    ),
    RemoveHelperColumns = Table.RemoveColumns(AddLevel,{"Max Date", "All"})
in
    RemoveHelperColumns

   

Thanks for the detailed solution @jennratten . This worked perfectly.

 

However, I'm trying to avoid using 'Date' as some of the dates are null, and some of the latest 'Level' have an older date (sorting by max date would then not be an accurate indicator).

 

Ideally, I want to only group by Cust and Level. This will then output all IDs that have the latest 'Level' regardless of taking 'Date' into account. 

 

Example, added Cust "D1"

 

ITEMIDCustDateLevel
1A200A116/08/20212
2A300B115/09/20212
2A300B215/09/20212
3A400A118/01/20211
4A500A114/07/20213
4A500B114/07/20213
4A500B214/07/20213
4A500C114/07/20213
4A600D113/06/20213

wdx223_Daniel_0-1632105017939.png

 

Thanks, worked as desired, and without using the 'date' data. 👍

Jakinta
Solution Sage
Solution Sage

You can try this.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jdBJCsAgDADAv3gWjHGpPVafIf7/GzWBQqylenLJkK1WZZVWFwLQQXcbDSSDgPRA1fQ3OR5imSDFHJPMv8HAOWaRBNeEs7ipkCPiRS/JgB2Jp1gQxIss7k3yBsFN0tuN40RTIfwjZd1LWWfhoftSwIu9tBs=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ITEM = _t, ID = _t, Cust = _t, Date = _t, Level = _t]),
    ChangedTypeLocale = Table.TransformColumnTypes(Source, {{"Date", type date}}, "hr-HR"),
    ChangedType = Table.TransformColumnTypes(ChangedTypeLocale,{{"Level", type number}}),
    Grouped = Table.Group(ChangedType, {"ID", "Cust"}, {{"Gr", each Table.FirstN(Table.Sort(_, {{"Date", Order.Descending}, {"Level", Order.Descending}}), 1), type table [ITEM=nullable text, ID=nullable text, Cust=nullable text, Date=nullable date, Level=nullable number]}}),
    Expanded = Table.ExpandTableColumn(Grouped, "Gr", {"Date", "Level"})
in
    Expanded

Thanks Jakinta 🙂

This gives me the desired output.

 

However I'm wanting to avoid using 'Date' as one of the sorting fields (because some of the dates are null or older dates for latest level).

 

Will try to modify this so that only grouping by Cust and Level

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors