cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Concat Member
Member

Selectively remove duplicate rows?

In the query editor, there is a function which allows a developer to remove duplicate rows based on the selected columns.

 

I want to add an additional condition to this.  That is, I want to choose which row to remove based on the value in an unrelated columns.

 

For example, in the below table, assume I want to remove the duplicates according to Columns "ID" and "Date".  However, I want to prioritize the removal of rows where ID = B.  I don't want the removal to be arbitrary.

 

IDTypeDate
1B1/1/2018
1A1/1/2018
2A1/1/2018
3A1/1/2018
4A1/1/2018
4A1/1/2018
5A1/1/2018
5B1/1/2018

 

M Function "Table.Distinct" - 

Table.Distinct(table as table, optional equationCriteria as any)


It seems like I should be able to work it into the "optional equation criteria", but I'm just sort of guessing and coming up short.  Some sort of if statement...?

1 ACCEPTED SOLUTION

Accepted Solutions
jthomson New Contributor
New Contributor

Re: Selectively remove duplicate rows?

Power Query, if I'm reading my own pbix right where I have a similar situation (have appended a new report to an old one, and if there's a matching case number I want to keep the data from the new report), removes duplicates from the bottom up - if you make a custom column that gives anything that's type A the value 1, and type B the value 2, then sort ascending on the new column, this should force it to remove the rows you want

View solution in original post

8 REPLIES 8
jthomson New Contributor
New Contributor

Re: Selectively remove duplicate rows?

Power Query, if I'm reading my own pbix right where I have a similar situation (have appended a new report to an old one, and if there's a matching case number I want to keep the data from the new report), removes duplicates from the bottom up - if you make a custom column that gives anything that's type A the value 1, and type B the value 2, then sort ascending on the new column, this should force it to remove the rows you want

View solution in original post

Concat Member
Member

Re: Selectively remove duplicate rows?

Interesting workaround.  That should work...

Another question - is there a way to view the rows that are removed?  My data set is almost a million rows.  Applying something like this without directly viewing what was changed is a pain.

v-huizhn-msft Super Contributor
Super Contributor

Re: Selectively remove duplicate rows?

Hi @Concat,

After research and test, there is a way to view the rows moved using Query Statement.

You create a copy table of your resource table, remove the duplicate rows in your original table. Then merge the original table( have removed the duplicate rows) to the copy table using inner join. Then you can filtered the merged table to get all removed rows.

For example, my Sample table is named 'Table1', I create a copy table 'Copy_Table1'. Remove the duplicate rows in Table1 based on [Area], then merge it to Copy_Table1. Filter Copy_Table1 based on [Copy_Table1.Record Number]=[Table1.Record Number], eventually delete the unnecessay columns, you will get the expected result. The following is my statement.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jdO7EoJADIXhV2G2piBZdoFSe8dCO4bCC63MqBS+vZBjg01OkSKTr/snfR8klOE0P+6XT1kcb+/pOj4LSWWhlTTLabeMaBjKPigho8lIyNpkTchkMhEym8yEbEyu62H6k3krW5MtITuTnS+1MikVQQV0rXSex9fWNluLTKKMRSiJjEUqcVrtV4pW4sQyilji1DKKWrZ7FLmkJSh6SefTiGDqBDOKYCoE/b2VEhS5NBIUtbR2yppFLk2MRS/NjF2CDV8=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Record Number" = _t, #"Starting Date" = _t, Area = _t, Count = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Record Number", Int64.Type}, {"Starting Date", type date}, {"Area", type text}, {"Count", Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Area"},Table1,{"Area"},"Table1",JoinKind.Inner),
    #"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "Table1", {"Record Number", "Starting Date", "Area", "Count"}, {"Table1.Record Number", "Table1.Starting Date", "Table1.Area", "Table1.Count"}),
    #"Added Custom" = Table.AddColumn(#"Expanded Table1", "Custom", each if [Record Number]=[Table1.Record Number] then"Yes" else "No"),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Custom] = "No"),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Table1.Record Number", "Table1.Starting Date", "Table1.Area", "Table1.Count", "Custom"})
in
    #"Removed Columns"


You can download the .pbix file for more details.

Best Regards,
Angelia

Highlighted
MarcelBeug Super Contributor
Super Contributor

Re: Selectively remove duplicate rows?

Please note 1 important phenomenon (or bug if you like) when first sorting a table and then removing duplicates: you need to buffer the table after the sort, before removing duplicates, otherwise records may be removed without taking into account the sort order.

 

Sorted = Table.Sort(sort code)

Buffered = Table.Buffer(Sorted)

RemovedDuplicates = Table.Distinct(Buffered, equationCriteria)

 

or:

 

Sorted = Table.Buffer(Table.Sort(sort code))

RemovedDuplicates = Table.Distinct(Sorted, equationCriteria)

 

Parameter equationCritera may include field names and/or comparer functions.

With comparer functions you can ignore the case and/or use a culture code: e.g. "æ" and "ae" are considered equal in some cultures.

The parameter can take on various formats as illustrated in the examples below.

 

let
    Source = #table(type table[Text1 = text, Text2 = text],{{"a", "Encyclopaedia"},{"A","Encyclopædia"}}),

// ****   Format equationCriteria: comparer function
    // Result: both records ("a" and "A" are not equal):
    RemovedDuplicates1 = Table.Distinct(Source, Comparer.FromCulture("en-US")),

    // Result: 1 record (æ and ae are considered equal in "en-US"; true = ignore case))
    RemovedDuplicates2 = Table.Distinct(Source, Comparer.FromCulture("en-US", true)),

    // Result: 2 records (æ and ae are not considered equal in "da-DK")
    RemovedDuplicates3 = Table.Distinct(Source, Comparer.FromCulture("da-DK", true)),

    Indexed = Table.AddIndexColumn(RemovedDuplicates2, "Index", 0, 1),

// ****   Format equationCriteria: (list of) field name(s) (as generated from the user interface if columns were selected):
    RemovedDuplicates4 = Table.Distinct(Indexed, "Text1"),
    RemovedDuplicates5 = Table.Distinct(Indexed, {"Text1"}),
    RemovedDuplicates6 = Table.Distinct(Indexed, {"Text1","Text2"}),

// ****   Format equationCriteria: list of field name with comparer function:
    RemovedDuplicates7 = Table.Distinct(Indexed, {"Text1", Comparer.OrdinalIgnoreCase}),

// ****   Format equationCriteria: list of lists with field name and comparer function:
    RemovedDuplicates8 = Table.Distinct(Indexed, {{"Text1", Comparer.OrdinalIgnoreCase},{"Text2",Comparer.FromCulture("en-US")}})

in
    RemovedDuplicates8

 

Specializing in Power Query Formula Language (M)
Concat Member
Member

Re: Selectively remove duplicate rows?

Thank you @v-huizhn-msft

 

I believe I found a slightly simpler solution.

I took the original table (with sorting applied) and duplicated as you suggested.  In the new table I removed the duplicates.  Then I merged using Anti Join instead of Inner Join.

 

The Anti Join returns all rows from one table not found in the other table.  Saved a few steps with this method.

Concat Member
Member

Re: Selectively remove duplicate rows?

Thank you @MarcelBeug for the knowledge sharing.  I'll include the step to buffer the table prior to removing duplicates.

 

Kudos.

 

edhans New Contributor
New Contributor

Re: Selectively remove duplicate rows?

IMHO the anti-join is the way to go here. I never like sorting and relying on some undocumented logic to always work or not change in the future.

siljamardla Regular Visitor
Regular Visitor

Re: Selectively remove duplicate rows?

The Table.Buffer was a lifesaver. I thought of sorting and then removing duplicates, but for a long time it seemed like it didn't work.

Helpful resources

Announcements
Can You Solve These Challenge

Challenge: Can You Solve These?

Find out how to participate in the first Power BI 'Can You Solve These?' challenge.

Community News & Announcements

Community News & Announcements

Get your latest community news and announcements.

Virtual Launch Event

Microsoft Business Applications October Virtual Launch Event

Join us for an in-depth look at the new innovations across Dynamics 365 and the Microsoft Power Platform.

Community Kudopalooza

Win Power BI Swag with Community Kudopalooza!

Each week, complete activities and be qualified in the drawing for cool Power BI Swag.

Users Online
Currently online: 429 members 4,059 guests
Please welcome our newest community members: