cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
JonasWesth
Frequent Visitor

Identify rows with duplicate values

I am new with Power BI and DAX, and hope the community can help with this specific issue. 

 

The purpose is to identify potential duplicate invoices, which are registrered under a different invoice number (eg. Inv123 scanned again as Inv 123 or Inv.123).

 

As this is part of a larger analysis, I would prefer it could be as a Column, rather than using Power Query. The "Vendor Number", "Posting Date" and "Amount", would all be the same. Does anyone have an idea on how to mark these rows with a "Yes", and all others with a "No", or something similar? 

 

Regards

Jonas

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Look:

 

Power Query - marking dups in a table.PNG

Here's the M code:

let
    Source = Invoices,

#"Merged Queries" = Table.NestedJoin(Source, {"Vendor", "Posting Date", "Amount"}, #"Invoices - Duplicate Invoice Marker", {"Vendor", "Posting Date", "Amount"}, "Invoices - Duplicate Invoice Marker", JoinKind.LeftOuter), #"Expanded Invoices - Duplicate Invoice Marker" = Table.ExpandTableColumn(#"Merged Queries", "Invoices - Duplicate Invoice Marker", {"Inv No"}, {"Inv No.1"}), #"Removed Columns" = Table.RemoveColumns(#"Expanded Invoices - Duplicate Invoice Marker",{"Inv No.1"}), #"Grouped Rows" = Table.Group(#"Removed Columns", {"Inv No", "Vendor", "Posting Date", "Amount"}, {{"Count", each Table.RowCount(_), type number}}), #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Has Dups", each if [Count] > 1 then "Yes" else "No"), #"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Count"}) in #"Removed Columns2"

Here are the table dependencies:

Power Query - table dependencies.PNG

 

Best

Darek

View solution in original post

7 REPLIES 7
Anonymous
Not applicable

Why don't you do this in Power Query where it really should be done? Such calculations belong to Power Query, not DAX, even though, of course, it's possible to do it in DAX.

 

Best

Darek

Thanks for taking your time reviewing my problem!

 

I don't mind using Power Query, but having the specific rows marked, while still retaining all the data, would be greatly preferred. I have a Data Model with many dimension tables and measures, so if at all possible, it would be benificial to just add a new column with "yes" and "no", where "yes" marks the rows which are suspected as duplicates.

 

I tried an If statement in a custom column in power query, but all results are positive:

if [Vendornumber] = [Vendornumber] and
[Amount] = [Amount] and
[Date] = [Date]
then "Yes"
else "No"

 

How would you mark only the rows where alle three criteria are true?

Regards
Jonas

Anonymous
Not applicable

Well, mate, if you think carefully about what you've done in the IF statement, it'll be no surprise that you got "YES" only. IS 1 = 1? IS a = a? Is oaesuthtteasouhe = oaesuthtteasouhe?

 

Whatever you do - THINK.

 

To do what you want in PQ, requires thinking in M, not in DAX.

 

Best

Darek

My core issue is how to mark identical rows based on three criteria? 

 

I understand why the If statement returns all positives, but thought it would help clarify my conundrum with the best example I could conjure up.

 

After hours of searching, I didn't come across any posted problems/solutions. I decided to describe my problem here as best possible, even though it is simple stuff for advanced users.

 

I get what you are saying mate, but when you hit that wall of knowledge, and google doesn’t do the trick, you need help, maybe even to formulate your problem..

Anonymous
Not applicable

Look:

 

Power Query - marking dups in a table.PNG

Here's the M code:

let
    Source = Invoices,

#"Merged Queries" = Table.NestedJoin(Source, {"Vendor", "Posting Date", "Amount"}, #"Invoices - Duplicate Invoice Marker", {"Vendor", "Posting Date", "Amount"}, "Invoices - Duplicate Invoice Marker", JoinKind.LeftOuter), #"Expanded Invoices - Duplicate Invoice Marker" = Table.ExpandTableColumn(#"Merged Queries", "Invoices - Duplicate Invoice Marker", {"Inv No"}, {"Inv No.1"}), #"Removed Columns" = Table.RemoveColumns(#"Expanded Invoices - Duplicate Invoice Marker",{"Inv No.1"}), #"Grouped Rows" = Table.Group(#"Removed Columns", {"Inv No", "Vendor", "Posting Date", "Amount"}, {{"Count", each Table.RowCount(_), type number}}), #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Has Dups", each if [Count] > 1 then "Yes" else "No"), #"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Count"}) in #"Removed Columns2"

Here are the table dependencies:

Power Query - table dependencies.PNG

 

Best

Darek

View solution in original post

Hi Darek

Your solutions is elegant and brilliant. The final table clearly shows which have duplicates and are candidates as same invoices scanned twice!

Thanks for the detailed description, including conjuring up your own example dataset! It was easy to follow, and allowed me a huge leap in knowledge of fundamentals of PQ and M!

 

Much appreciated

Jonas

Anonymous
Not applicable

Well, in PQ you've got something like Merge Queries. You could therefore merge the table with a copy of itself (it must be a copy, a reference will not work) and the merge would be LEFT OUTER JOIN on the three columns you want. For all rows in the left table you'd get matches (in the form of a Table stored in the cell - you have to expand it). Then what happens is that some rows in the left table will be duplicated at least 2 times and some will be left alone (where there's only one match - the row itself). Having this you can remove the column(s) from the right table and do a groupby on the rows that were left. The groupby should use the Count(*) function. If you do that each row will have a count of duplicates. Now you can add a column and say: if count > 1 then "yes" else "no".

 

Easy? Can you follow yourself or should I create a dummy file with all the steps inside and post a link to it? But remember that I'm at work, so it might be faster if you try to implement this yourself 🙂

 

Best

Darek

Helpful resources

Announcements
MBAS on Demand

2021 Release Wave 2 Plan

Power Platform release plan for the 2021 release wave 2 describes all new features releasing from October 2021 through March 2022.

July 2021 Update 768x460.png

Check it out!

Click here to read more about the July 2021 Updates

Power Query PA Forum 768x460.png

Check it out!

Did you know that you can visit the Power Query Forum in Power BI and now Power Apps

Top Solution Authors