Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hello,
I need help removing "duplicates" from a table. Let me ilustrate my problem with one example:
Name | Country | Company | Age | City |
John Doe | US | Microsoft | 30 | Palo Alto |
John Doe | US | Azure | 30 | Madrid |
John Doe | US | 30 | Ireland | |
Jane Doe | UK | 54 | London | |
Dan Kent | MX | Microsoft | 38 | Tokyo |
In this example we can see that the Name 'John Doe' appears multiple times with the same Country and Age. I would like to remove one duplicate but giving priority to Company 'Microsoft' for instance. I don't care about the City and rest of the fields if there're more. Thus, resulting in the following table:
Name | Country | Company | Age | City |
John Doe | US | Microsoft | 30 | Palo Alto |
Jane Doe | UK | 54 | London | |
Dan Kent | MX | Microsoft | 38 | Tokyo |
I have been trying to count how many times the combination of fields Name, Country and Age are duplicated in the same table so I can mark the rows whose number is higher than 1 and Company is different than 'Microsoft' and then filter them out. However, I don't manage to do it. I don't really mind doing it in Power Query or DAX as long as it works.
I really appreaciate the help. Thank you very much.
Solved! Go to Solution.
@oliverL , Power Query is competent in such tasks. You might want to try,
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Country = _t, Company = _t, Age = _t, City = _t]),
#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]} otherwise _{0}}}),
#"Expanded ar" = let cols = List.RemoveItems(Table.ColumnNames(Source),{"Name"}) in Table.ExpandRecordColumn(#"Grouped Rows", "ar", cols, cols)
in
#"Expanded ar"
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
@oliverL , Power Query is competent in such tasks. You might want to try,
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Country = _t, Company = _t, Age = _t, City = _t]),
#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]} otherwise _{0}}}),
#"Expanded ar" = let cols = List.RemoveItems(Table.ColumnNames(Source),{"Name"}) in Table.ExpandRecordColumn(#"Grouped Rows", "ar", cols, cols)
in
#"Expanded ar"
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
Thanks @CNENFRNL,
I have tried it and I think it works. I need to do some more testing since my real scanario is a bit more complex but I am sure I will be able to get to the solution with your help. I will mark it as solution. However, there are a couple of questions that I would like to ask you to see if you could be so glad to help me:
- In this section of the code:
#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]}
Do you know why is not possible to replace the "=" in Company... with "<>"? In my real scenario I know what company I don't want. My bad for not making that clear in the original post.
-Also:
Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64)
How do you get the binary representation of a JSON file like that? When I load a JSON file I get the path to the container folder.
Thank you once again 😄
@oliverL , further with your questions,
1. #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]}
table{[colName1=value1, colName2=value2, ...]} is a fixed pattern for filtering a table down to a unique matching result (one and only one result). It's can be considered as syntactic sugar for Table.SelectRows(), I think. Only equality ("=") applies here.
2. when a dataset is pasted or manually entered into Power Query, it's automatically converted by PQ into a Json representation.
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
47 | |
26 | |
19 | |
14 | |
10 |
User | Count |
---|---|
58 | |
50 | |
44 | |
19 | |
18 |