Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
andrew_sxs
Regular Visitor

Power Query not removing all duplicates

Hi, first time using this so lets see how it goes...

 

I am working ith a file of nearly 1mil rows and 14 columns.

 

I am trying to remove duplicates within a column (LOCAL_ITEM_DESCRIPTOR) column.

 

I have therfore perfomed 'remove duplicates' function on the column.

andrew_sxs_0-1664374466326.png

 

However, when the data set updates, there are still 9 duplicates (see below in table).

andrew_sxs_1-1664374543162.png

andrew_sxs_0-1664374909686.png

 

I don't understand why these duplicates are not being removed. How can I get them removed through the remove duplicate function or with a secondary remove duplicates operation?

Can someone please help?

1 ACCEPTED SOLUTION
Daryl-Lynch-Bzy
Resident Rockstar
Resident Rockstar

Hi @andrew_sxs  - I would like to suggest that you Filter you population in Power Query before the remove Duplicates step.  If possible include an example where there are:

  • no duplicates to remove
  • duplicate has been removed successfully
  • duplicate is not removed

This will help you visible see what is happening.  It is possible that you need Trim or change case on the column with the duplicates because the following are not the same:  "ABC" <> "AbC" or "ABC ".

View solution in original post

6 REPLIES 6
Daryl-Lynch-Bzy
Resident Rockstar
Resident Rockstar

Yes, this correct.  The function will remove the duplicate after retaining the row associated with the first unique instance from the column.  Here is an example:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRyVtJRMjRQitVB40RERgE5RlBOJYhjjKzMFM5RAPLMTMG80rzMwtJUsCFA6VgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Column For Distinct" = _t, #"Another Column" = _t]),
    #"Removed Duplicates" = Table.Distinct(Source, {"Column For Distinct"}),
    #"Added Custom" = Table.AddColumn(#"Removed Duplicates", "Custom", each Text.Length([Column For Distinct]))
in
    #"Added Custom"

 This example contains Data Qaulity exceptions so I may not have the desired result in Power Query.  I wanted XYZ to be unique, but XYZ and XyZ in Power Query terms are two distinct items.

DarylLynchBzy_0-1664388841849.png

 

When you get to DAX it will treat XyZ and XYZ has equals because it is not Case Sensitive like Power Query.  I think you need to Trim and Capitalise before using Distinct.  Like this:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRyVtJRMjRQitVB40RERgE5RlBOJYhjjKzMFM5RAPLMTMG80rzMwtJUsCFA6VgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Column For Distinct" = _t, #"Another Column" = _t]),
    #"Trimmed Text" = Table.TransformColumns(Source,{{"Column For Distinct", Text.Trim, type text}}),
    #"Uppercased Text" = Table.TransformColumns(#"Trimmed Text",{{"Column For Distinct", Text.Upper, type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Uppercased Text", {"Column For Distinct"}),
    #"Added Custom" = Table.AddColumn(#"Removed Duplicates", "Custom", each Text.Length([Column For Distinct]))
in
    #"Added Custom"

DarylLynchBzy_1-1664389033366.png

 

Daryl-Lynch-Bzy
Resident Rockstar
Resident Rockstar

Hi @andrew_sxs  - I would like to suggest that you Filter you population in Power Query before the remove Duplicates step.  If possible include an example where there are:

  • no duplicates to remove
  • duplicate has been removed successfully
  • duplicate is not removed

This will help you visible see what is happening.  It is possible that you need Trim or change case on the column with the duplicates because the following are not the same:  "ABC" <> "AbC" or "ABC ".

trim function before remove duplicate worked... thanks 🙂

AlexisOlson
Super User
Super User

Did you remove duplicates on the whole table or only on a single column? The reason I ask is that it looks like the rows are probably unique even if there are duplicates in the first column.

 

What does the M code for the remove duplicates step look like?

M code is as follows - So I presume it is meant to be removing all duplicates within this column, despite the fact that other columns have differing information which would make the whole row distinct but not the column?

andrew_sxs_0-1664383966213.png

 

Remaining duplicate values (9 in total) in this column are as follows:

andrew_sxs_1-1664384049277.png

 

why did you distinct the table depending on the column of description, and then count the distinct values of item code?

maybe there some item codes have two descriptions.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors