cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Frequent Visitor

Remove duplicate rows based on max value of a different column

Hello!

 

I have a set of data that I am trying to remove duplicates from based on the number of filled cells per row. The data is form submissions that have a person's email address and then a number of different fields that could be filled, ranging from 1 - 16 filled cells. I was able to get make a column that tells me how many cells are filled for each of my rows which basically gives me a max. I have removed the duplicates that have the same email address and the same amount of filled cells but now I need to remove any duplicates (based on the email address) that have less than the max amount of filled cells for each email address. I found another a solution in the forums to do this with DAX but I need to be able to do this within the Query Editor so that once I have removed the duplicates I can append this data with other ones. 

 

Here is a snap shot of what the query currently looks like, and I have marked the ones that I would want removed: 

query.jpg

 

Thanks! 

1 ACCEPTED SOLUTION

Yes:

 

Specializing in Power Query Formula Language (M)

View solution in original post

5 REPLIES 5
Solution Sage
Solution Sage

I do something similar in one of my reports that should work here (given the lack of incremental load), where I've merged two queries, a bunch of existing data which I give a number 1 in a custom field, and new data, which I give the number 2 in the same custom field. If I sort it on that column in descending order, it gets rid of any old data that's been updated, so the same principle should work for your problem - sort it descending by your #filled value then remove duplicates on your email column?

Thanks for the response , however that doesn't seem to work. Even when I sort by decending and then remove duplicate emails it still takes away some of the emails that have the highest number of filled cells. 

 

For example:

before-

 before.jpg

after -

after.jpg

 

you can see that I now lost the entry with 16 filled cells. 

A known issue with sorting and removing duplicates is that you need to buffer the table in between.

 

You might wrap your sort step like: Table.Buffer(your sortcode).

Specializing in Power Query Formula Language (M)

@MarcelBeug,

still relatively new to Power BI, what does "wrap your sort step like: Table.Buffer(your sortcode)" actually mean? are you able to show me by chance? 

Yes:

 

Specializing in Power Query Formula Language (M)

View solution in original post

Helpful resources

Announcements
secondImage

Happy New Year from Power BI

This is a must watch for a message from Power BI!

December Update

Check it Out!

Click here to read more about the December 2020 Updates!

Community Blog

Check it Out!

Click here to read the latest blog and learn more about contributing to the Power BI blog!

Get Ready for Power BI Dev Camp

Get Ready for Power BI Dev Camp

Mark your calendars and join us for our next Power BI Dev Camp!.

Top Solution Authors
Top Kudoed Authors