Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
oliverL
Frequent Visitor

Removing duplicates from table based on certain columns only

Hello,

 

I need help removing "duplicates" from a table. Let me ilustrate my problem with one example:

NameCountryCompanyAgeCity
John DoeUSMicrosoft30Palo Alto
John DoeUSAzure30Madrid
John DoeUSGoogle30Ireland
Jane DoeUKGoogle54London
Dan KentMXMicrosoft38Tokyo

 

In this example we can see that the Name 'John Doe' appears multiple times with the same Country and Age. I would like to remove one duplicate but giving priority to Company 'Microsoft' for instance. I don't care about the City and rest of the fields if there're more.  Thus, resulting in the following table:

NameCountryCompanyAgeCity
John DoeUSMicrosoft30Palo Alto
Jane DoeUKGoogle54London
Dan KentMXMicrosoft38Tokyo

 

I have been trying to count how many times the combination of fields Name, Country and Age are duplicated in the same table so I can mark the rows whose number is higher than 1 and Company is different than 'Microsoft' and then filter them out. However, I don't manage to do it. I don't really mind doing it in Power Query or DAX as long as it works.

 

I really appreaciate the help. Thank you very much.

1 ACCEPTED SOLUTION
CNENFRNL
Community Champion
Community Champion

@oliverL , Power Query is competent in such tasks. You might want to try,

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Country = _t, Company = _t, Age = _t, City = _t]),
    #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]} otherwise _{0}}}),
    #"Expanded ar" = let cols = List.RemoveItems(Table.ColumnNames(Source),{"Name"}) in Table.ExpandRecordColumn(#"Grouped Rows", "ar", cols, cols)
in
    #"Expanded ar"

Screenshot 2021-03-05 000514.png


Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!

View solution in original post

4 REPLIES 4
CNENFRNL
Community Champion
Community Champion

@oliverL , Power Query is competent in such tasks. You might want to try,

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Country = _t, Company = _t, Age = _t, City = _t]),
    #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]} otherwise _{0}}}),
    #"Expanded ar" = let cols = List.RemoveItems(Table.ColumnNames(Source),{"Name"}) in Table.ExpandRecordColumn(#"Grouped Rows", "ar", cols, cols)
in
    #"Expanded ar"

Screenshot 2021-03-05 000514.png


Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!

Thanks @CNENFRNL,


I have tried it and I think it works. I need to do some more testing since my real scanario is a bit more complex but I am sure I will be able to get to the solution with your help. I will mark it as solution. However, there are a couple of questions that I would like to ask you to see if you could be so glad to help me:

- In this section of the code:

#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]}

Do you know why is not possible to replace the "=" in Company... with "<>"? In my real scenario I know what company I don't want. My bad for not making that clear in the original post.

 

-Also:

Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64)

 

How do you get the binary representation of a JSON file like that? When I load a JSON file I get the path to the container folder.


Thank you once again 😄

CNENFRNL
Community Champion
Community Champion

@oliverL , further with your questions,

 

1. #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]}

table{[colName1=value1, colName2=value2, ...]} is a fixed pattern for filtering a table down to a unique matching result (one and only one result). It's can be considered as syntactic sugar for Table.SelectRows(), I think. Only equality ("=") applies here.

 

2. when a dataset is pasted or manually entered into Power Query, it's automatically converted by PQ into a Json representation.

Untitled.png


Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!

Thank you @CNENFRNL 

 

You've really helped me a lot!

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors