Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
jhartranft60
Advocate IV
Advocate IV

How do I buffer my table in Power Query?

In Power Query, I'm trying to sort a table and remove duplicates.  The table sort is to ensure that the record I want to keep is first in the table and all subsequent records will come later and be removed.  I've found that Power Query isn't always taking the sort into consideration when removing duplicates and I need to buffer the table after the sort in order to properly remove the duplicates.

See: https://community.powerbi.com/t5/Desktop/Selectively-remove-duplicate-rows/td-p/334489 (The table buffer reference is a few responses down the thread...Message 5, to be exact)


Unfortunately, I'm not entirely sure what the heck is going on in the very detailed & complicated example provided in the thread.  I'm knowledgeable enough to manipulate the Advanced Editor and even fake my way through adding some of my own lines if I have something to reference, just not entirely sure what I should be adding in this case.  Here's the current code from my Advanced Editor for the row sort and duplicate removal.  Hoping someone will be able to help guide me as to what to add to it:

 

#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Application Date", Order.Descending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"DSI"}),

 

Ideas?

 

 

1 ACCEPTED SOLUTION
Cmcmahan
Resident Rockstar
Resident Rockstar

I haven't tested it, but it looks like you just need to add Table.Buffer to force PowerQuery to actually sort the table before duplicate removal.  Otherwise, PowerQuery has a tendency to try and be too efficent, in this case, removing a bunch of rows and sorting the smaller dataset, which is great from a performance standpoint.

 

You should be able to update your advanced query with the items in blue:

#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Application Date", Order.Descending}}),
#"Buffer Sorted Table" = Table.Buffer(#"Sorted Rows"), 
#"Removed Duplicates" = Table.Distinct(#"Buffer Sorted Table", {"DSI"}),

If this doesn't make sense, or doesn't actually work, you can try the solution that was talked about in message #6 of the thread you linked.  That method is going through and finding the distinct rows yourself instead of letting Table.Distinct do it for you (and potentially get it wrong)

View solution in original post

2 REPLIES 2
Cmcmahan
Resident Rockstar
Resident Rockstar

I haven't tested it, but it looks like you just need to add Table.Buffer to force PowerQuery to actually sort the table before duplicate removal.  Otherwise, PowerQuery has a tendency to try and be too efficent, in this case, removing a bunch of rows and sorting the smaller dataset, which is great from a performance standpoint.

 

You should be able to update your advanced query with the items in blue:

#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Application Date", Order.Descending}}),
#"Buffer Sorted Table" = Table.Buffer(#"Sorted Rows"), 
#"Removed Duplicates" = Table.Distinct(#"Buffer Sorted Table", {"DSI"}),

If this doesn't make sense, or doesn't actually work, you can try the solution that was talked about in message #6 of the thread you linked.  That method is going through and finding the distinct rows yourself instead of letting Table.Distinct do it for you (and potentially get it wrong)

Thanks @Cmcmahan , this is exactly what I needed! Worked perfectly! I figured it'd be simple, but didn't want to mess up the syntax.  

 

Appreciate the assist!

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.