Posted this by mistake in the Ideas section. Reposting here.
Hi,
I am trying to create a table containing only distinct rows from the union of two existing Power Query tables in Excel.
My code (anonymised) is:
let
    SelectColumnsT1 = Table.SelectColumns(T1Data, {"Field1", "Field2", "Field3"}),
    SelectColumnsT2 = Table.SelectColumns(T2Data, {"Field1", "Field2", "Field3"}),
    CombineBoth = Table.Combine({SelectColumnsT1, SelectColumnsT2}),
    GetDistinct = Table.Distinct(CombineBoth, {"Field1", "Field2", "Field3"})
in
    GetDistinct
Field1 is an integer and the other fields are strings.
This returns duplicates in the resulting table. I have checked the individual duplicated rows and there are no leading/trailing blanks. When I check within Excel that the fields in the duplicated rows are equal, the result is TRUE.
Am I misunderstanding the use of Table.Distinct?
Have I got the syntax wrong?
Is there a bug in this function?
Any other possible things I should look into to try to get to the bottom of this?
I would be grateful if anyone can give me any help on this.
Regards,
Mark
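One thing worth ruling out for a question like this: Excel's = comparison ignores case, while Power Query compares text case-sensitively by default, so two rows can test as TRUE in Excel and still count as distinct in Table.Distinct. A minimal sketch of normalising the text columns before de-duplicating is below. The sample rows are made up for illustration, and lower-casing the values is an assumption that only makes sense if case does not matter in the real data.

```powerquery
let
    // Hypothetical sample data standing in for T1Data and T2Data
    T1Data = #table(
        type table [Field1 = Int64.Type, Field2 = text, Field3 = text],
        {{2015, "A", "Red"}, {2015, "B", "Blue "}}
    ),
    T2Data = #table(
        type table [Field1 = Int64.Type, Field2 = text, Field3 = text],
        {{2015, "A", "red"}, {2015, "B", "Blue"}}
    ),
    CombineBoth = Table.Combine({T1Data, T2Data}),
    // Set the column types explicitly so every row is compared on the same type
    Typed = Table.TransformColumnTypes(
        CombineBoth,
        {{"Field1", Int64.Type}, {"Field2", type text}, {"Field3", type text}}
    ),
    // Remove control characters, trim whitespace, and lower-case the text columns
    Normalise = each Text.Lower(Text.Trim(Text.Clean(_))),
    Normalised = Table.TransformColumns(
        Typed,
        {{"Field2", Normalise, type text}, {"Field3", Normalise, type text}}
    ),
    GetDistinct = Table.Distinct(Normalised, {"Field1", "Field2", "Field3"})
in
    GetDistinct
```

If the duplicates disappear after this step, the rows differed in case or in invisible characters rather than being true duplicates. If they remain, the cause lies elsewhere.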
Hi,
Share some data and show the expected result.
Hi Ashish,
Thanks very much for replying.
The actual data is proprietary, so I can't share it. However, here is some made-up data to illustrate the point:
Table 1
Year Attr1 Attr2
2015 A Red
2016 A Red
2015 B Red
2015 B Blue
Table 2
Year Attr1 Attr2
2015 A Red
2016 A Red
2015 B Green
2015 B Blue
First step is a union.
Combined table
Year Attr1 Attr2
2015 A Red <- duplicate 1
2016 A Red <- duplicate 2
2015 B Red
2015 B Blue <- duplicate 3
2015 A Red <- duplicate 1
2016 A Red <- duplicate 2
2015 B Green
2015 B Blue <- duplicate 3
Next step is to get distinct rows. I've indicated above the duplicates introduced by the union query. Getting the distinct rows should remove all but one instance of each duplicate:
Year Attr1 Attr2
2015 A Red
2016 A Red
2015 B Red
2015 B Blue
2015 B Green
The two data tables I'm working with are 291k and 69k records long. The Table.Distinct query returns 5,284 rows. However, de-duping these 5,284 rows reduces the row count to 5,250, so there are 34 duplicated rows (each appears exactly twice; no triplicates etc.). Hence it is *almost* successful in producing distinct rows, just not quite there.
Regards,
Mark
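To pin down which rows survive as duplicates, one option is to group the output of Table.Distinct on the same three columns and keep the groups with more than one row. The sketch below uses a small made-up table in place of the real GetDistinct step, and the step names Counted and StillDuplicated are illustrative only.

```powerquery
let
    // Hypothetical stand-in for the output of the GetDistinct step
    GetDistinct = #table(
        type table [Year = Int64.Type, Attr1 = text, Attr2 = text],
        {{2015, "A", "Red"}, {2016, "A", "Red"}, {2015, "A", "Red"}}
    ),
    // Count how many rows share each Year/Attr1/Attr2 combination
    Counted = Table.Group(
        GetDistinct,
        {"Year", "Attr1", "Attr2"},
        {{"Count", each Table.RowCount(_), Int64.Type}}
    ),
    // Keep only the combinations that still occur more than once
    StillDuplicated = Table.SelectRows(Counted, each [Count] > 1)
in
    StillDuplicated
```

If the suspect rows show up here with a count of 2, Power Query itself regards them as equal, which would point at Table.Distinct. If this result is empty, Power Query sees the rows as different, and the text values probably differ in case or in a character that is not visible in the grid.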
Hi,
This works
Hi Ashish.
Your code is more or less the same as mine.
I have experimented with an up-to-date version of Excel (my office uses Excel 2013; my personal laptop has the latest Excel 365).
The problem disappears on my version of Excel, so I think maybe I've uncovered a bug in the old version, which I guess I can't get around.
Thanks for spending the time to help me out.
Regards,
Mark
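For anyone stuck on the older version, a possible workaround is to avoid Table.Distinct altogether and de-duplicate by grouping on all three columns, buffering the combined table first. This is only a sketch using the made-up data from earlier in the thread. Whether it sidesteps the behaviour seen in Excel 2013 is an assumption and has not been tested there.

```powerquery
let
    // Made-up tables from the example earlier in the thread
    Table1 = #table(
        type table [Year = Int64.Type, Attr1 = text, Attr2 = text],
        {{2015, "A", "Red"}, {2016, "A", "Red"}, {2015, "B", "Red"}, {2015, "B", "Blue"}}
    ),
    Table2 = #table(
        type table [Year = Int64.Type, Attr1 = text, Attr2 = text],
        {{2015, "A", "Red"}, {2016, "A", "Red"}, {2015, "B", "Green"}, {2015, "B", "Blue"}}
    ),
    // Load the union into memory once so later steps work on a fixed copy
    Combined = Table.Buffer(Table.Combine({Table1, Table2})),
    // One output row per distinct Year/Attr1/Attr2 combination
    Grouped = Table.Group(
        Combined,
        {"Year", "Attr1", "Attr2"},
        {{"Count", each Table.RowCount(_), Int64.Type}}
    ),
    // Drop the helper count so only the original columns remain
    DistinctRows = Table.RemoveColumns(Grouped, {"Count"})
in
    DistinctRows
```

On the sample data this should leave the five distinct rows listed in the expected result above.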
You are welcome.