Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
heimk008
Frequent Visitor

Remove duplicate rows based on max value of a different column

Hello!

 

I have a set of data that I am trying to remove duplicates from based on the number of filled cells per row. The data is form submissions that have a person's email address and then a number of different fields that could be filled, ranging from 1 - 16 filled cells. I was able to get make a column that tells me how many cells are filled for each of my rows which basically gives me a max. I have removed the duplicates that have the same email address and the same amount of filled cells but now I need to remove any duplicates (based on the email address) that have less than the max amount of filled cells for each email address. I found another a solution in the forums to do this with DAX but I need to be able to do this within the Query Editor so that once I have removed the duplicates I can append this data with other ones. 

 

Here is a snap shot of what the query currently looks like, and I have marked the ones that I would want removed: 

query.jpg

 

Thanks! 

1 ACCEPTED SOLUTION

Yes:

 

Specializing in Power Query Formula Language (M)

View solution in original post

5 REPLIES 5
jthomson
Solution Sage
Solution Sage

I do something similar in one of my reports that should work here (given the lack of incremental load), where I've merged two queries, a bunch of existing data which I give a number 1 in a custom field, and new data, which I give the number 2 in the same custom field. If I sort it on that column in descending order, it gets rid of any old data that's been updated, so the same principle should work for your problem - sort it descending by your #filled value then remove duplicates on your email column?

Thanks for the response , however that doesn't seem to work. Even when I sort by decending and then remove duplicate emails it still takes away some of the emails that have the highest number of filled cells. 

 

For example:

before-

 before.jpg

after -

after.jpg

 

you can see that I now lost the entry with 16 filled cells. 

A known issue with sorting and removing duplicates is that you need to buffer the table in between.

 

You might wrap your sort step like: Table.Buffer(your sortcode).

Specializing in Power Query Formula Language (M)

@MarcelBeug,

still relatively new to Power BI, what does "wrap your sort step like: Table.Buffer(your sortcode)" actually mean? are you able to show me by chance? 

Yes:

 

Specializing in Power Query Formula Language (M)

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.