cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
oliverL
Frequent Visitor

Removing duplicates from table based on certain columns only

Hello,

 

I need help removing "duplicates" from a table. Let me ilustrate my problem with one example:

NameCountryCompanyAgeCity
John DoeUSMicrosoft30Palo Alto
John DoeUSAzure30Madrid
John DoeUSGoogle30Ireland
Jane DoeUKGoogle54London
Dan KentMXMicrosoft38Tokyo

 

In this example we can see that the Name 'John Doe' appears multiple times with the same Country and Age. I would like to remove one duplicate but giving priority to Company 'Microsoft' for instance. I don't care about the City and rest of the fields if there're more.  Thus, resulting in the following table:

NameCountryCompanyAgeCity
John DoeUSMicrosoft30Palo Alto
Jane DoeUKGoogle54London
Dan KentMXMicrosoft38Tokyo

 

I have been trying to count how many times the combination of fields Name, Country and Age are duplicated in the same table so I can mark the rows whose number is higher than 1 and Company is different than 'Microsoft' and then filter them out. However, I don't manage to do it. I don't really mind doing it in Power Query or DAX as long as it works.

 

I really appreaciate the help. Thank you very much.

1 ACCEPTED SOLUTION
CNENFRNL
Super User III
Super User III

@oliverL , Power Query is competent in such tasks. You might want to try,

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Country = _t, Company = _t, Age = _t, City = _t]),
    #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]} otherwise _{0}}}),
    #"Expanded ar" = let cols = List.RemoveItems(Table.ColumnNames(Source),{"Name"}) in Table.ExpandRecordColumn(#"Grouped Rows", "ar", cols, cols)
in
    #"Expanded ar"

Screenshot 2021-03-05 000514.png

View solution in original post

4 REPLIES 4
CNENFRNL
Super User III
Super User III

@oliverL , Power Query is competent in such tasks. You might want to try,

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Country = _t, Company = _t, Age = _t, City = _t]),
    #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]} otherwise _{0}}}),
    #"Expanded ar" = let cols = List.RemoveItems(Table.ColumnNames(Source),{"Name"}) in Table.ExpandRecordColumn(#"Grouped Rows", "ar", cols, cols)
in
    #"Expanded ar"

Screenshot 2021-03-05 000514.png

View solution in original post

oliverL
Frequent Visitor

Thanks @CNENFRNL,


I have tried it and I think it works. I need to do some more testing since my real scanario is a bit more complex but I am sure I will be able to get to the solution with your help. I will mark it as solution. However, there are a couple of questions that I would like to ask you to see if you could be so glad to help me:

- In this section of the code:

#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]}

Do you know why is not possible to replace the "=" in Company... with "<>"? In my real scenario I know what company I don't want. My bad for not making that clear in the original post.

 

-Also:

Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64)

 

How do you get the binary representation of a JSON file like that? When I load a JSON file I get the path to the container folder.


Thank you once again 😄

@oliverL , further with your questions,

 

1. #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]}

table{[colName1=value1, colName2=value2, ...]} is a fixed pattern for filtering a table down to a unique matching result (one and only one result). It's can be considered as syntactic sugar for Table.SelectRows(), I think. Only equality ("=") applies here.

 

2. when a dataset is pasted or manually entered into Power Query, it's automatically converted by PQ into a Json representation.

Untitled.png

oliverL
Frequent Visitor

Thank you @CNENFRNL 

 

You've really helped me a lot!

 

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group!

MBAS Attendee Badge

Claim Your Badge & Digital Swag!

Check out how to claim yours today!

secondImage

Are You Ready?

Test your skills now with the Cloud Skills Challenge.

Top Solution Authors