Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Quiny_Harl
Resolver I
Resolver I

How to identify duplicate column records in Power Query

Hello,

Is there a straightforward way to identify/flag duplicate column records in Power Query?
When I say straightforward, I mean something that doesn't involve aggregations like this method for example, because I don't want to aggregate my table or add too many steps.
I'm looking for the Power Query equivalent of this DAX code.

2 ACCEPTED SOLUTIONS

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMgKTxkikiVJsLAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [PK = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Duplicate Flag", (k)=> List.Count(List.Select(Source[PK], each _ = k[PK] )))
in
    #"Added Custom"

 

You can throw a List.Buffer into the mix if you need better performance.

 

View solution in original post

dufoq3
Super User
Super User

Hi @Quiny_Harl, don't be affraid of using Group By - it is one of my favourite functions.

 

You can achieve what you need in single step.

dufoq3_0-1714067856112.png

 

This code will preserve all your exicting columns and add new [Duplicate Flag] at the end.

Add this code as new step, but replace Source with your previous step reference and "PK" with your column name where you want to check for duplicates

dufoq3_2-1714068041779.png

 

Table.Combine(Table.Group(Source, {"PK"}, {{"All", each
        Table.AddColumn(_, "Duplicate Flag", (x)=> Table.RowCount(_), Int64.Type), type table}})[All])

 

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

View solution in original post

12 REPLIES 12
dufoq3
Super User
Super User

Hi @Quiny_Harl, don't be affraid of using Group By - it is one of my favourite functions.

 

You can achieve what you need in single step.

dufoq3_0-1714067856112.png

 

This code will preserve all your exicting columns and add new [Duplicate Flag] at the end.

Add this code as new step, but replace Source with your previous step reference and "PK" with your column name where you want to check for duplicates

dufoq3_2-1714068041779.png

 

Table.Combine(Table.Group(Source, {"PK"}, {{"All", each
        Table.AddColumn(_, "Duplicate Flag", (x)=> Table.RowCount(_), Int64.Type), type table}})[All])

 

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

watkinnc
Super User
Super User

Not so, you can right-click on a single column or multiple columns, and select "Remove duplicates".  It will remove duplicates in the field(s) you selected, regardless of what else is in the row.

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!
lbendlin
Super User
Super User

please define "duplicate"  and "identify".  Does that include the first occurrence? If yes then you can do a table self join.

watkinnc
Super User
Super User

No way to do it without aggregating, if you are trying to keep the duplicates and mark them somehow. Just group, add a count column, add the All Rows column, then expand the table column. All of your counts above 1 are duplicates.

 

Not sure what else you might have envisioned.

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

@watkinnc, what I've envisioned is something similar to the DAX code in the link I provided. I would like to create a custom column that is going to flag each records of a Column1 that apears more than once. Then, I'm going to filter out the duplicates.

You can use implicit measures for that (Count and Count(Distinct)).  Please define what you mean by "filter out the duplicates".  

 

Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).

Do not include sensitive information or anything not related to the issue or question.

If you are unsure how to upload data please refer to https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...

Please show the expected outcome based on the sample data you provided.

Want faster answers? https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...

I cannot use Count and Count(Distinct) in Power Query.
I'm not sure how to further clarify what I need as I think it is pretty obvious.

PKDuplicate Flag
11
21
32
32
41

 I need to add the Duplicate Flag column in Power Query. Each record that has a count more than once is a duplicate. I'm going to apply a filter on the  Duplicate Flag to remove these records.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMgKTxkikiVJsLAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [PK = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Duplicate Flag", (k)=> List.Count(List.Select(Source[PK], each _ = k[PK] )))
in
    #"Added Custom"

 

You can throw a List.Buffer into the mix if you need better performance.

 

Thank you for trying to help me but I don't understand how to use this code.

How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done". Once you examined the code, replace the Source step with your own source.

If you are going to filter out the duplicates anyway, why not just click "Remove Duplicates"?

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

@watkinnc, because this works only when the whole table row, including all columns, is a duplicate.

Helpful resources

Announcements
PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

Top Solution Authors
Top Kudoed Authors