Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Noyer
Helper I
Helper I

How to Remove Duplicates in Calculated Table?

Hi All

 

I'm trying to remove duplicate product codes from a calculated table without success. 

 

I've loaded in three separate lists of product codes and products names (plus department etc) from three different areas of the business.  They're not in the same format as each other and I've had to use calculated columns to create the product codes in some instances by concatenating other columns.  In each of the tables this resulting product code is unique. 

 

I'm trying to create a single list of products and product names with no duplicates that's business-wide to produce a business-wide sales by product report.  I've joined the three product tables using UNION.  However, some of the product codes appear in multiple tables (not unexpected, occasionally products will cross into different areas of the business).  This gives me duplicates in the product code column in the calculated table.  I've tried to remove these by wrapping the UNION in DISTINCT which removes most of the duplicates but not all.  The ones that are left are where the product code has been given a slightly different name (additional space, abbreviations etc) in each of the original tables that it appears in, so DISTINCT is leaving me with unique rows, not unique product codes.

 

What I'm trying to do is remove any row that has a duplicate product code.  I'm not bothered which of the product descriptions I'm left with..  First/last/shortest/longest.. It doesn't matter.  It's akin to 'remove duplicates' in Excel, and the only column I want to test for duplicates is the product code.  I've tried SUMMARIZE with MAX in the expression but this seems to return the same product name for all products.

 

Any idea how I can achieve this?

1 ACCEPTED SOLUTION

Hi @Noyer 

When i append two tables in Query editor, tehn remove duplicates rows for the "Product code" column, it returns distinct  "Product code", "description" and "department".

Capture18.JPGCapture19.JPG

Would you like this result?
If not, please let me know.
 
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

5 REPLIES 5
v-juanli-msft
Community Support
Community Support

Hi @Noyer 

Do you have tables as below:

Capture6.JPG

Finally, you want a table with distinct product code as below, right?

Capture7.JPG

 

If so, please open edit queries, add a custom column in three tables

Capture8.JPG

Then create a new blank query

Capture9.JPG

let
    Source1 = Table.SelectColumns(Table1,{"new product code","product name"}),
    Source2 = Table.SelectColumns(Table2,{"new product code","product name"}),
    Source3 = Table.SelectColumns(Table3,{"new product code","product name"}),
    #"new table"=Table.Combine({Source1,Source2,Source3}),
    #"Removed Duplicates" = Table.Distinct(#"new table", {"new product code"})
in
    #"Removed Duplicates"

Finally, close&&apply.

 

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi Maggie

 

Thank you for the reply.  It's not a unique product code that I need but a unique description.  Example I have is:

 

Table 1   Table 2  
       
Product CodeDescriptionDepartment Product CodeDescriptionDepartment
1RedExternal 1RedExtn'l
2GreenExternal 2Gr'nExternal
3BlueInternal 3BlueInternal

 

If I join the tables, even if I wrap them in UNIQUE, I get two entries for product 2, one called 'Green' and one called 'Gr'n' so i can't lookup to the new table.

 

The product is the same, it's just called two slightly different things in two different environments.  I understand why wrapping UNION in UNIQUE repeats product 2 (becasue it's looking for unique rows) but I need unique to look at the product code only and return either/any of the descriptions as long as it's consistent.

 

I've tried summarize, but I can't find a way to 'summarize' a text filed to return first/last/any single entry. 

 

Thanks

 

N

 

 

Hi @Noyer 

When i append two tables in Query editor, tehn remove duplicates rows for the "Product code" column, it returns distinct  "Product code", "description" and "department".

Capture18.JPGCapture19.JPG

Would you like this result?
If not, please let me know.
 
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks Maggie!  I don't know why I'd not considered appending the actual queries!  Works perfectly 🙂

Noyer
Helper I
Helper I

A really ugly way to get what I need would be to just to pull the product codes in a distinct list (and only the product codes) and then lookup to each of the original product tables in turn in a calculated column (if blank lookup to the next table) to return the product name etc, but I've got 200,000+ products and it seems excessive..  Which simpler way am I missing?

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.