Solved: Re: Remove Duplicates

mraka9 · ‎08-13-2022

Greeting,
I have a table with two columns.
Would like to remove duplicates.
Product1 is related to Product2.
The problem is that there are duplicates, but they have been changed in the columns.

Product1	Product2
AAA111	AAA222
BBB000	CCC111
AAA222	AAA111
CCC111	BBB000

It doesn't matter to me which product is in the column Product1 and which is in Product2. It only matters to me that the related articles have one line.

I would like only one column to remain:
AAA111-AAA222, and to remove the column
AAA222-AAA111 or vice versa. It is important that only one row of related articles remains.
I believe there is a way in Power Query, but I have no idea.
Thanks in advance!

ddpl · ‎08-13-2022

@mraka9

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnR0NDQ0VNIBMYyMjJRidaKVnJycDAwMgGLOzs4gSZAYVFoHpgEkBpXWgWmIjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product1 = _t, Product2 = _t]),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),
#"Removed Duplicates" = Table.Distinct(#"Unpivoted Columns", {"Value"}),
#"Added Index" = Table.AddIndexColumn(#"Removed Duplicates", "Index", 1, 1, Int64.Type),
#"Pivoted Column" = Table.Pivot(#"Added Index", List.Distinct(#"Added Index"[Attribute]), "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"}),
#"Filled Down" = Table.FillDown(#"Removed Columns",{"Product1"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Product2] <> null))
in
#"Filtered Rows"

Please check and accpet as solution if its worked.

View solution in original post

speedramps · ‎08-13-2022

In that case please try this

Click here to download the solution

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnR0NDQ0VNIBMYyMjJRidaKVnJycDAwMgGLOzs4gSZAYVFoHpgEkBpXWgWmIjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product1 = _t, Product2 = _t]),
#"Added Conditional Column" = Table.AddColumn(Source, "Products", each if [Product1] <= [Product2] then [Product1] & "-" & [Product2] else [Product2] & "-" & [Product1]),
#"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Products", Order.Ascending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"Products"})
in
#"Removed Duplicates"

How it works ...

Conditional formating creates a new column, which you can then deduplicate

Please now click the thumbs up and also accept the solution, thank you

I put in a lot of effort to help you, now please quickly help me by giving kudos.

Remeber we are unpaid volunteers and here to coach you with Power BI and DAX skills and techniques, not do the users job for them. So please click the thumbs up and accept as solution button.

If you give someone a fish then you only give them one meal, but if you teach them how to fish then they can feed themselves and teach others for a lifetime. I prefer to teach members on this forum techniques rather give full solutions and do their job. You can then adapt the technique for your solution, learn some DAX skills for next time and soon become a Power BI Super User like me.

One question per ticket please. If you need to extend your request then please raise a new ticket.

You will get a quicker response and each volunteer solver will get the kudos they deserve. Thank you !

mraka9 · ‎08-13-2022

Thank you for your help.
ddpl found a solution to my problem.

speedramps · ‎08-13-2022

Try this ...

Click here to download the solution

Look at the Power Query =--

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnR0NDQ0VNIBMYyMjJRidaKVnJycDAwMgGLOzs4gSZAYVFoHpgEkBpXWgWmIjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product1 = _t, Product2 = _t]),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Product"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns",{{"Product", Order.Ascending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows")
in
#"Removed Duplicates"

How it works ...

First unpivot the columns
Remive the atriburte column
Sort the column
Remove the duplicates

Thanks for reaching out for help.

I put in a lot of effort to help you, now please quickly help me by giving kudos.

Remeber we are unpaid volunteers and here to coach you with Power BI and DAX skills and techniques, not do the users job for them. So please click the thumbs up and accept as solution button.

If you give someone a fish then you only give them one meal, but if you teach them how to fish then they can feed themselves and teach others for a lifetime. I prefer to teach members on this forum techniques rather give full solutions and do their job. You can then adapt the technique for your solution, learn some DAX skills for next time and soon become a Power BI Super User like me.

One question per ticket please. If you need to extend your request then please raise a new ticket.

You will get a quicker response and each volunteer solver will get the kudos they deserve. Thank you !

mraka9 · ‎08-13-2022

Thanks speedramps for the quick reply!

Maybe I failed to explain what I needed.
I need to have both Product1 and Product2 columns. They are related products.
AAA111-AAA222
AAA222-AAA111

I would like to have only one line of related products left.
It doesn't matter if it remains AAA111-AAA222 or AAA222-AAA111.

ddpl · ‎08-13-2022

@mraka9

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnR0NDQ0VNIBMYyMjJRidaKVnJycDAwMgGLOzs4gSZAYVFoHpgEkBpXWgWmIjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product1 = _t, Product2 = _t]),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),
#"Removed Duplicates" = Table.Distinct(#"Unpivoted Columns", {"Value"}),
#"Added Index" = Table.AddIndexColumn(#"Removed Duplicates", "Index", 1, 1, Int64.Type),
#"Pivoted Column" = Table.Pivot(#"Added Index", List.Distinct(#"Added Index"[Attribute]), "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"}),
#"Filled Down" = Table.FillDown(#"Removed Columns",{"Product1"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Product2] <> null))
in
#"Filtered Rows"

Please check and accpet as solution if its worked.

mraka9 · ‎08-13-2022

Thank you very much!
This could be the solution when I apply to a large amount of data!

Remove Duplicates

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly