cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Anonymous
Not applicable

Remove duplicates based on sort in Power Query

I have a table that has ID and date columns. I want to remove duplicates and only keep the most recent record for each ID. I thought I'd by able to simply by sorting either ascending or descending then removing duplicates, but it doesn't keep the first or last record in your sort. It only keeps the first record in the source dataset regardless of sort. Is there a way around this?

2 ACCEPTED SOLUTIONS
AlexisOlson
Super User
Super User

I've run into this before. I think it's a result of query folding not operating in the order that you'd expect.

 

The workaround that worked for me was to add an index column after sorting before removing duplicates. This breaks query folding and forces it to operate in the order you just defined by your sort.

View solution in original post

@AlexisOlson  @Anonymous

 

Do check this video from curbal.com. It beautifully explains the reason why changing Sort Order in Power query doesn't work for removing duplicates and using Table.Buffer to keep new Sorting Order in memory.

 

Curbal.com

 

Sharing this as it might be helpful in some other scenarios too..

View solution in original post

8 REPLIES 8
AlexisOlson
Super User
Super User

I've run into this before. I think it's a result of query folding not operating in the order that you'd expect.

 

The workaround that worked for me was to add an index column after sorting before removing duplicates. This breaks query folding and forces it to operate in the order you just defined by your sort.

View solution in original post

I need to remove duplicates based on keeping the second instance rather than the first, if I add an index and sort descending then remove duplicates should that work?

If there are always exactly two instances, this should work. Are there always at least two and no more than two?

Anonymous
Not applicable

Thanks Alexis! Worked perfectly.

@AlexisOlson  @Anonymous

 

Do check this video from curbal.com. It beautifully explains the reason why changing Sort Order in Power query doesn't work for removing duplicates and using Table.Buffer to keep new Sorting Order in memory.

 

Curbal.com

 

Sharing this as it might be helpful in some other scenarios too..

View solution in original post

@AnkitBI , are there advantages to using Table.Buffer versus using Index like @AlexisOlson suggested? Index worked for me, but I wanted to know if there are cases where it won't.

 

Thanks!

The underlying idea in both cases is to prevent the query from using the original cached table in memory (since it's not the order we want). It's conceivable that in some cases the internal engine might fold in the index step without reloading the table, which would sort of defeat the purpose of adding that step. Table.Buffer explicitly reloads the table to memory so I'd guess you have fewer edge cases to worry about.

 

(The above is somewhat speculative, so if anyone knows more definitively, please let me know.)

I tried using Table.Buffer today before removing duplicates, and I ended up freezing up my file. But adding Index seemed to perform better.

Helpful resources

Announcements
Power BI December 2021 Update_carousel 768x460.jpg

Check it Out!

Click here to read more about the December 2021 Updates!

Jan 2022 Dev Camp 768x460 copy.png

Power BI Dev Camp- January 27th, 2022

Mark your calendars and join us for our next Power BI Dev Camp!

UG GA Amplification 768x460.png

Launching new user group features

Learn how to create your own user groups today!

Top Solution Authors
Top Kudoed Authors