cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
heimk008 Frequent Visitor
Frequent Visitor

Remove duplicate rows based on max value of a different column

Hello!

 

I have a set of data that I am trying to remove duplicates from based on the number of filled cells per row. The data is form submissions that have a person's email address and then a number of different fields that could be filled, ranging from 1 - 16 filled cells. I was able to get make a column that tells me how many cells are filled for each of my rows which basically gives me a max. I have removed the duplicates that have the same email address and the same amount of filled cells but now I need to remove any duplicates (based on the email address) that have less than the max amount of filled cells for each email address. I found another a solution in the forums to do this with DAX but I need to be able to do this within the Query Editor so that once I have removed the duplicates I can append this data with other ones. 

 

Here is a snap shot of what the query currently looks like, and I have marked the ones that I would want removed: 

query.jpg

 

Thanks! 

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Super User
Super User

Re: Remove duplicate rows based on max value of a different column

Yes:

 

Specializing in Power Query Formula Language (M)
5 REPLIES 5
Super User
Super User

Re: Remove duplicate rows based on max value of a different column

I do something similar in one of my reports that should work here (given the lack of incremental load), where I've merged two queries, a bunch of existing data which I give a number 1 in a custom field, and new data, which I give the number 2 in the same custom field. If I sort it on that column in descending order, it gets rid of any old data that's been updated, so the same principle should work for your problem - sort it descending by your #filled value then remove duplicates on your email column?

heimk008 Frequent Visitor
Frequent Visitor

Re: Remove duplicate rows based on max value of a different column

Thanks for the response , however that doesn't seem to work. Even when I sort by decending and then remove duplicate emails it still takes away some of the emails that have the highest number of filled cells. 

 

For example:

before-

 before.jpg

after -

after.jpg

 

you can see that I now lost the entry with 16 filled cells. 

Super User
Super User

Re: Remove duplicate rows based on max value of a different column

A known issue with sorting and removing duplicates is that you need to buffer the table in between.

 

You might wrap your sort step like: Table.Buffer(your sortcode).

Specializing in Power Query Formula Language (M)
heimk008 Frequent Visitor
Frequent Visitor

Re: Remove duplicate rows based on max value of a different column

@MarcelBeug,

still relatively new to Power BI, what does "wrap your sort step like: Table.Buffer(your sortcode)" actually mean? are you able to show me by chance? 

Highlighted
Super User
Super User

Re: Remove duplicate rows based on max value of a different column

Yes:

 

Specializing in Power Query Formula Language (M)