cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
mraka9
Helper III
Helper III

Remove Duplicates

Greeting,
I have a table with two columns.
Would like to remove duplicates.
Product1 is related to Product2.
The problem is that there are duplicates, but they have been changed in the columns.

 

Product1Product2
AAA111AAA222
BBB000CCC111
AAA222AAA111
CCC111BBB000

 

It doesn't matter to me which product is in the column Product1 and which is in Product2. It only matters to me that the related articles have one line.

I would like only one column to remain:
AAA111-AAA222, and to remove the column
AAA222-AAA111 or vice versa. It is important that only one row of related articles remains.
I believe there is a way in Power Query, but I have no idea.
Thanks in advance!

1 ACCEPTED SOLUTION
ddpl
Solution Sage
Solution Sage

@mraka9 

 

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnR0NDQ0VNIBMYyMjJRidaKVnJycDAwMgGLOzs4gSZAYVFoHpgEkBpXWgWmIjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product1 = _t, Product2 = _t]),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),
#"Removed Duplicates" = Table.Distinct(#"Unpivoted Columns", {"Value"}),
#"Added Index" = Table.AddIndexColumn(#"Removed Duplicates", "Index", 1, 1, Int64.Type),
#"Pivoted Column" = Table.Pivot(#"Added Index", List.Distinct(#"Added Index"[Attribute]), "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"}),
#"Filled Down" = Table.FillDown(#"Removed Columns",{"Product1"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Product2] <> null))
in
#"Filtered Rows"

 

ddpl_0-1660388949060.png

 

Please check and accpet as solution if its worked.

 

View solution in original post

6 REPLIES 6
speedramps
Super User
Super User

In that case please try this

 

Click here to download the solution 

 

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnR0NDQ0VNIBMYyMjJRidaKVnJycDAwMgGLOzs4gSZAYVFoHpgEkBpXWgWmIjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product1 = _t, Product2 = _t]),
#"Added Conditional Column" = Table.AddColumn(Source, "Products", each if [Product1] <= [Product2] then [Product1] & "-" & [Product2] else [Product2] & "-" & [Product1]),
#"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Products", Order.Ascending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"Products"})
in
#"Removed Duplicates"

 

 

How it works ...

 

Conditional formating creates a new column, which you can then deduplicate  

 

speedramps_0-1660388471577.png

 

Please now click the thumbs up and also accept the solution, thank you 

 

I put in a lot of effort to help you, now please quickly help me by giving kudos.

Remeber we are unpaid volunteers and here to coach you with Power BI and DAX skills and techniques, not do the users job for them. So please click the thumbs up and accept as solution button. 

If you give someone a fish then you only give them one meal, but if you teach them how to fish then they can feed themselves and teach others for a lifetime.  I prefer to teach members on this forum techniques rather give full solutions and do their job. You can then adapt the technique for your solution, learn some DAX skills for next time and soon become a Power BI Super User like me. 

One question per ticket please. If you need to extend your request then please raise a new ticket.

You will get a quicker response and each volunteer solver will get the kudos they deserve. Thank you ! 

 

Thank you for your help.
ddpl found a solution to my problem.

speedramps
Super User
Super User

Try this  ...

Click here to download the solution 

 

Look at the Power Query =--

 

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnR0NDQ0VNIBMYyMjJRidaKVnJycDAwMgGLOzs4gSZAYVFoHpgEkBpXWgWmIjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product1 = _t, Product2 = _t]),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Product"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns",{{"Product", Order.Ascending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows")
in
#"Removed Duplicates"

 

 

How it works ...

 

  • First unpivot the columns
  • Remive the atriburte column
  • Sort the column
  • Remove the duplicates

 

Thanks for reaching out for help.

I put in a lot of effort to help you, now please quickly help me by giving kudos.

Remeber we are unpaid volunteers and here to coach you with Power BI and DAX skills and techniques, not do the users job for them. So please click the thumbs up and accept as solution button. 

If you give someone a fish then you only give them one meal, but if you teach them how to fish then they can feed themselves and teach others for a lifetime.  I prefer to teach members on this forum techniques rather give full solutions and do their job. You can then adapt the technique for your solution, learn some DAX skills for next time and soon become a Power BI Super User like me. 

 

One question per ticket please. If you need to extend your request then please raise a new ticket.

You will get a quicker response and each volunteer solver will get the kudos they deserve. Thank you ! 

 

Thanks speedramps for the quick reply!

Maybe I failed to explain what I needed.
I need to have both Product1 and Product2 columns. They are related products.
AAA111-AAA222
AAA222-AAA111

I would like to have only one line of related products left.
It doesn't matter if it remains AAA111-AAA222 or AAA222-AAA111.

ddpl
Solution Sage
Solution Sage

@mraka9 

 

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnR0NDQ0VNIBMYyMjJRidaKVnJycDAwMgGLOzs4gSZAYVFoHpgEkBpXWgWmIjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product1 = _t, Product2 = _t]),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),
#"Removed Duplicates" = Table.Distinct(#"Unpivoted Columns", {"Value"}),
#"Added Index" = Table.AddIndexColumn(#"Removed Duplicates", "Index", 1, 1, Int64.Type),
#"Pivoted Column" = Table.Pivot(#"Added Index", List.Distinct(#"Added Index"[Attribute]), "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"}),
#"Filled Down" = Table.FillDown(#"Removed Columns",{"Product1"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Product2] <> null))
in
#"Filtered Rows"

 

ddpl_0-1660388949060.png

 

Please check and accpet as solution if its worked.

 

Thank you very much!
This could be the solution when I apply to a large amount of data!

Helpful resources

Announcements
September Update

Check it Out!

Click here to learn more about the September 2022 updates!

Power BI Dev Camp Session 26

Check it Out!

Mark your calendars and join us on Thursday, September 29 at 11a PDT for a great session with Ted Pattison!

Top Solution Authors