Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more.
Get startedGrow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.
In Power Query, I'm trying to sort a table and remove duplicates. The table sort is to ensure that the record I want to keep is first in the table and all subsequent records will come later and be removed. I've found that Power Query isn't always taking the sort into consideration when removing duplicates and I need to buffer the table after the sort in order to properly remove the duplicates.
See: https://community.powerbi.com/t5/Desktop/Selectively-remove-duplicate-rows/td-p/334489 (The table buffer reference is a few responses down the thread...Message 5, to be exact)
Unfortunately, I'm not entirely sure what the heck is going on in the very detailed & complicated example provided in the thread. I'm knowledgeable enough to manipulate the Advanced Editor and even fake my way through adding some of my own lines if I have something to reference, just not entirely sure what I should be adding in this case. Here's the current code from my Advanced Editor for the row sort and duplicate removal. Hoping someone will be able to help guide me as to what to add to it:
#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Application Date", Order.Descending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"DSI"}),
Ideas?
Solved! Go to Solution.
I haven't tested it, but it looks like you just need to add Table.Buffer to force PowerQuery to actually sort the table before duplicate removal. Otherwise, PowerQuery has a tendency to try and be too efficent, in this case, removing a bunch of rows and sorting the smaller dataset, which is great from a performance standpoint.
You should be able to update your advanced query with the items in blue:
#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Application Date", Order.Descending}}), #"Buffer Sorted Table" = Table.Buffer(#"Sorted Rows"), #"Removed Duplicates" = Table.Distinct(#"Buffer Sorted Table", {"DSI"}),
If this doesn't make sense, or doesn't actually work, you can try the solution that was talked about in message #6 of the thread you linked. That method is going through and finding the distinct rows yourself instead of letting Table.Distinct do it for you (and potentially get it wrong)
I haven't tested it, but it looks like you just need to add Table.Buffer to force PowerQuery to actually sort the table before duplicate removal. Otherwise, PowerQuery has a tendency to try and be too efficent, in this case, removing a bunch of rows and sorting the smaller dataset, which is great from a performance standpoint.
You should be able to update your advanced query with the items in blue:
#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Application Date", Order.Descending}}), #"Buffer Sorted Table" = Table.Buffer(#"Sorted Rows"), #"Removed Duplicates" = Table.Distinct(#"Buffer Sorted Table", {"DSI"}),
If this doesn't make sense, or doesn't actually work, you can try the solution that was talked about in message #6 of the thread you linked. That method is going through and finding the distinct rows yourself instead of letting Table.Distinct do it for you (and potentially get it wrong)
Thanks @Cmcmahan , this is exactly what I needed! Worked perfectly! I figured it'd be simple, but didn't want to mess up the syntax.
Appreciate the assist!
User | Count |
---|---|
85 | |
74 | |
71 | |
68 | |
56 |
User | Count |
---|---|
96 | |
94 | |
92 | |
78 | |
71 |