Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi All
I'm trying to remove duplicate product codes from a calculated table without success.
I've loaded in three separate lists of product codes and products names (plus department etc) from three different areas of the business. They're not in the same format as each other and I've had to use calculated columns to create the product codes in some instances by concatenating other columns. In each of the tables this resulting product code is unique.
I'm trying to create a single list of products and product names with no duplicates that's business-wide to produce a business-wide sales by product report. I've joined the three product tables using UNION. However, some of the product codes appear in multiple tables (not unexpected, occasionally products will cross into different areas of the business). This gives me duplicates in the product code column in the calculated table. I've tried to remove these by wrapping the UNION in DISTINCT which removes most of the duplicates but not all. The ones that are left are where the product code has been given a slightly different name (additional space, abbreviations etc) in each of the original tables that it appears in, so DISTINCT is leaving me with unique rows, not unique product codes.
What I'm trying to do is remove any row that has a duplicate product code. I'm not bothered which of the product descriptions I'm left with.. First/last/shortest/longest.. It doesn't matter. It's akin to 'remove duplicates' in Excel, and the only column I want to test for duplicates is the product code. I've tried SUMMARIZE with MAX in the expression but this seems to return the same product name for all products.
Any idea how I can achieve this?
Solved! Go to Solution.
Hi @Noyer
When i append two tables in Query editor, tehn remove duplicates rows for the "Product code" column, it returns distinct "Product code", "description" and "department".
Hi @Noyer
Do you have tables as below:
Finally, you want a table with distinct product code as below, right?
If so, please open edit queries, add a custom column in three tables
Then create a new blank query
let Source1 = Table.SelectColumns(Table1,{"new product code","product name"}), Source2 = Table.SelectColumns(Table2,{"new product code","product name"}), Source3 = Table.SelectColumns(Table3,{"new product code","product name"}), #"new table"=Table.Combine({Source1,Source2,Source3}), #"Removed Duplicates" = Table.Distinct(#"new table", {"new product code"}) in #"Removed Duplicates"
Finally, close&&apply.
Hi Maggie
Thank you for the reply. It's not a unique product code that I need but a unique description. Example I have is:
Table 1 | Table 2 | |||||
Product Code | Description | Department | Product Code | Description | Department | |
1 | Red | External | 1 | Red | Extn'l | |
2 | Green | External | 2 | Gr'n | External | |
3 | Blue | Internal | 3 | Blue | Internal |
If I join the tables, even if I wrap them in UNIQUE, I get two entries for product 2, one called 'Green' and one called 'Gr'n' so i can't lookup to the new table.
The product is the same, it's just called two slightly different things in two different environments. I understand why wrapping UNION in UNIQUE repeats product 2 (becasue it's looking for unique rows) but I need unique to look at the product code only and return either/any of the descriptions as long as it's consistent.
I've tried summarize, but I can't find a way to 'summarize' a text filed to return first/last/any single entry.
Thanks
N
Hi @Noyer
When i append two tables in Query editor, tehn remove duplicates rows for the "Product code" column, it returns distinct "Product code", "description" and "department".
Thanks Maggie! I don't know why I'd not considered appending the actual queries! Works perfectly 🙂
A really ugly way to get what I need would be to just to pull the product codes in a distinct list (and only the product codes) and then lookup to each of the original product tables in turn in a calculated column (if blank lookup to the next table) to return the product name etc, but I've got 200,000+ products and it seems excessive.. Which simpler way am I missing?
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
110 | |
94 | |
82 | |
66 | |
58 |
User | Count |
---|---|
151 | |
121 | |
104 | |
87 | |
67 |