Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Remove duplicates based on sort in Power Query

I have a table that has ID and date columns. I want to remove duplicates and only keep the most recent record for each ID. I thought I'd by able to simply by sorting either ascending or descending then removing duplicates, but it doesn't keep the first or last record in your sort. It only keeps the first record in the source dataset regardless of sort. Is there a way around this?

2 ACCEPTED SOLUTIONS
AlexisOlson
Super User
Super User

I've run into this before. I think it's a result of query folding not operating in the order that you'd expect.

 

The workaround that worked for me was to add an index column after sorting before removing duplicates. This breaks query folding and forces it to operate in the order you just defined by your sort.

View solution in original post

@AlexisOlson  @Anonymous

 

Do check this video from curbal.com. It beautifully explains the reason why changing Sort Order in Power query doesn't work for removing duplicates and using Table.Buffer to keep new Sorting Order in memory.

 

Curbal.com

 

Sharing this as it might be helpful in some other scenarios too..

View solution in original post

11 REPLIES 11
Nilselmano
Frequent Visitor

Maybe it is possible to use the function that was added in 2022, Table.StopFolding( SortedTable ), to break folding and keep the sorting.

AlexisOlson
Super User
Super User

I've run into this before. I think it's a result of query folding not operating in the order that you'd expect.

 

The workaround that worked for me was to add an index column after sorting before removing duplicates. This breaks query folding and forces it to operate in the order you just defined by your sort.

Worked perfectly!

I need to remove duplicates based on keeping the second instance rather than the first, if I add an index and sort descending then remove duplicates should that work?

If there are always exactly two instances, this should work. Are there always at least two and no more than two?

Anonymous
Not applicable

Thanks Alexis! Worked perfectly.

@AlexisOlson  @Anonymous

 

Do check this video from curbal.com. It beautifully explains the reason why changing Sort Order in Power query doesn't work for removing duplicates and using Table.Buffer to keep new Sorting Order in memory.

 

Curbal.com

 

Sharing this as it might be helpful in some other scenarios too..

@AnkitBI , are there advantages to using Table.Buffer versus using Index like @AlexisOlson suggested? Index worked for me, but I wanted to know if there are cases where it won't.

 

Thanks!

The underlying idea in both cases is to prevent the query from using the original cached table in memory (since it's not the order we want). It's conceivable that in some cases the internal engine might fold in the index step without reloading the table, which would sort of defeat the purpose of adding that step. Table.Buffer explicitly reloads the table to memory so I'd guess you have fewer edge cases to worry about.

 

(The above is somewhat speculative, so if anyone knows more definitively, please let me know.)

I tried using Table.Buffer today before removing duplicates, and I ended up freezing up my file. But adding Index seemed to perform better.

But how does add index helps in solving the issue can you elaborate

 

Thanks in advance 😊

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.