cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
JonasWesth Frequent Visitor
Frequent Visitor

Identify rows with duplicate values

I am new with Power BI and DAX, and hope the community can help with this specific issue. 

 

The purpose is to identify potential duplicate invoices, which are registrered under a different invoice number (eg. Inv123 scanned again as Inv 123 or Inv.123).

 

As this is part of a larger analysis, I would prefer it could be as a Column, rather than using Power Query. The "Vendor Number", "Posting Date" and "Amount", would all be the same. Does anyone have an idea on how to mark these rows with a "Yes", and all others with a "No", or something similar? 

 

Regards

Jonas

1 ACCEPTED SOLUTION

Accepted Solutions
Super User
Super User

Re: Identify rows with duplicate values

Look:

 

Power Query - marking dups in a table.PNG

Here's the M code:

let
    Source = Invoices,

#"Merged Queries" = Table.NestedJoin(Source, {"Vendor", "Posting Date", "Amount"}, #"Invoices - Duplicate Invoice Marker", {"Vendor", "Posting Date", "Amount"}, "Invoices - Duplicate Invoice Marker", JoinKind.LeftOuter), #"Expanded Invoices - Duplicate Invoice Marker" = Table.ExpandTableColumn(#"Merged Queries", "Invoices - Duplicate Invoice Marker", {"Inv No"}, {"Inv No.1"}), #"Removed Columns" = Table.RemoveColumns(#"Expanded Invoices - Duplicate Invoice Marker",{"Inv No.1"}), #"Grouped Rows" = Table.Group(#"Removed Columns", {"Inv No", "Vendor", "Posting Date", "Amount"}, {{"Count", each Table.RowCount(_), type number}}), #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Has Dups", each if [Count] > 1 then "Yes" else "No"), #"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Count"}) in #"Removed Columns2"

Here are the table dependencies:

Power Query - table dependencies.PNG

 

Best

Darek

7 REPLIES 7
Super User
Super User

Re: Identify rows with duplicate values

Why don't you do this in Power Query where it really should be done? Such calculations belong to Power Query, not DAX, even though, of course, it's possible to do it in DAX.

 

Best

Darek

JonasWesth Frequent Visitor
Frequent Visitor

Re: Identify rows with duplicate values

Thanks for taking your time reviewing my problem!

 

I don't mind using Power Query, but having the specific rows marked, while still retaining all the data, would be greatly preferred. I have a Data Model with many dimension tables and measures, so if at all possible, it would be benificial to just add a new column with "yes" and "no", where "yes" marks the rows which are suspected as duplicates.

 

I tried an If statement in a custom column in power query, but all results are positive:

if [Vendornumber] = [Vendornumber] and
[Amount] = [Amount] and
[Date] = [Date]
then "Yes"
else "No"

 

How would you mark only the rows where alle three criteria are true?

Regards
Jonas

Super User
Super User

Re: Identify rows with duplicate values

Well, mate, if you think carefully about what you've done in the IF statement, it'll be no surprise that you got "YES" only. IS 1 = 1? IS a = a? Is oaesuthtteasouhe = oaesuthtteasouhe?

 

Whatever you do - THINK.

 

To do what you want in PQ, requires thinking in M, not in DAX.

 

Best

Darek

JonasWesth Frequent Visitor
Frequent Visitor

Re: Identify rows with duplicate values

My core issue is how to mark identical rows based on three criteria? 

 

I understand why the If statement returns all positives, but thought it would help clarify my conundrum with the best example I could conjure up.

 

After hours of searching, I didn't come across any posted problems/solutions. I decided to describe my problem here as best possible, even though it is simple stuff for advanced users.

 

I get what you are saying mate, but when you hit that wall of knowledge, and google doesn’t do the trick, you need help, maybe even to formulate your problem..

Super User
Super User

Re: Identify rows with duplicate values

Well, in PQ you've got something like Merge Queries. You could therefore merge the table with a copy of itself (it must be a copy, a reference will not work) and the merge would be LEFT OUTER JOIN on the three columns you want. For all rows in the left table you'd get matches (in the form of a Table stored in the cell - you have to expand it). Then what happens is that some rows in the left table will be duplicated at least 2 times and some will be left alone (where there's only one match - the row itself). Having this you can remove the column(s) from the right table and do a groupby on the rows that were left. The groupby should use the Count(*) function. If you do that each row will have a count of duplicates. Now you can add a column and say: if count > 1 then "yes" else "no".

 

Easy? Can you follow yourself or should I create a dummy file with all the steps inside and post a link to it? But remember that I'm at work, so it might be faster if you try to implement this yourself Smiley Happy

 

Best

Darek

Super User
Super User

Re: Identify rows with duplicate values

Look:

 

Power Query - marking dups in a table.PNG

Here's the M code:

let
    Source = Invoices,

#"Merged Queries" = Table.NestedJoin(Source, {"Vendor", "Posting Date", "Amount"}, #"Invoices - Duplicate Invoice Marker", {"Vendor", "Posting Date", "Amount"}, "Invoices - Duplicate Invoice Marker", JoinKind.LeftOuter), #"Expanded Invoices - Duplicate Invoice Marker" = Table.ExpandTableColumn(#"Merged Queries", "Invoices - Duplicate Invoice Marker", {"Inv No"}, {"Inv No.1"}), #"Removed Columns" = Table.RemoveColumns(#"Expanded Invoices - Duplicate Invoice Marker",{"Inv No.1"}), #"Grouped Rows" = Table.Group(#"Removed Columns", {"Inv No", "Vendor", "Posting Date", "Amount"}, {{"Count", each Table.RowCount(_), type number}}), #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Has Dups", each if [Count] > 1 then "Yes" else "No"), #"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Count"}) in #"Removed Columns2"

Here are the table dependencies:

Power Query - table dependencies.PNG

 

Best

Darek

JonasWesth Frequent Visitor
Frequent Visitor

Re: Identify rows with duplicate values

Hi Darek

Your solutions is elegant and brilliant. The final table clearly shows which have duplicates and are candidates as same invoices scanned twice!

Thanks for the detailed description, including conjuring up your own example dataset! It was easy to follow, and allowed me a huge leap in knowledge of fundamentals of PQ and M!

 

Much appreciated

Jonas

Helpful resources

Announcements
Virtual Launch Event

Microsoft Business Applications October Virtual Launch Event

Join us for an in-depth look at the new innovations across Dynamics 365 and the Microsoft Power Platform.

Power BI Helps Homeless and Trouble Youth

Power BI Helps Homeless and Trouble Youth

We spoke with Power BI Super User, Greg Deckler, about his charity work

MBAS Gallery

Watch Sessions On Demand!

Continue your learning in our online communities.

Users Online
Currently online: 117 members 1,917 guests
Please welcome our newest community members: