Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Deleting duplicate rows but choosing which row to delete

Hi All

 

In my dataset I have data that is duplicated in several rows but the only difference is that in one column there is a value for one of the rows and in the other row there is no value. 

I want to be able to remove the duplicate row that has the blank value in the one column or alternatively, add the same resource number to the blank field as then I can just remove duplicates and it won't matter which one is deleted. 

There are many of these I need to do, not just one or two. 

Below is an example of what I mean.

 

resource issue.PNG

 

 

 

I am doing this via Power Query in Power BI. 

If I choose all columns except the Resource column, then  delete duplicates, the row with the resource number in it, gets deleted. I need this one to stay. 

 

Any assistance anyone can provide me with will be appreciated. 

Thank you. 

 

Karen

1 ACCEPTED SOLUTION
BA_Pete
Super User
Super User

Hi @Anonymous ,

 

Select all columns except [Resource], do Group By, then add an aggregate column that is the MAX of [Resource].

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

9 REPLIES 9
serpiva64
Super User
Super User

Hi,

you can do a fill down like the one in the exemple:

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjRQ0lEyVIrVgTKNwEwjhKgRmiiYZYyQN0bIg5nGCCaCZaoUGwsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [SN = _t, Step = _t]),
#"Replaced Value" = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Step"}),
#"Grouped Rows" = Table.Group(#"Replaced Value", {"SN"}, {{"AllRows", each _, type table [SN=nullable text, Step=nullable text]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.FillDown([AllRows],{"Step"})),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"SN", "Step"}, {"SN.1", "Step"})
in
#"Expanded Custom"

 

then, as you thought, you can remove duplicates.

 

If this post is useful to help you to solve your issue consider giving the post a thumbs up 

 and accepting it as a solution !

BA_Pete
Super User
Super User

Hi @Anonymous ,

 

Select all columns except [Resource], do Group By, then add an aggregate column that is the MAX of [Resource].

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Anonymous
Not applicable

Hi BA_Pete,

 

It looks like this may have worked. Can I just confirm with you what this does? From my understanding, your solution is saying:

 

If all columns that I have highlighted are the same, group them under the one resource that is listed. If there is no resource, listed for dulplicated data in the columns, it will just return the one row with a blank cell in the resource column.  

 

Is that correct?

 

Karen

Hi @Anonymous ,

 

I don't think that's quite rght, but might just be that I've misunderstood your explanation.

I would describe it as the following, but also may not be a perfect explanation 🙂 :

 

For all the columns you selected to group, if they each have the same values as other rows in the same column, Power Query will turn them into just one row (as they are all the same value in each respective column anyway). Where you have whole rows that otherwise match, but the [Resource] column is different, we choose one value from the [Resource] column (the MAX value) so it can also be turned into one row with the rest. Given the choice between null/blank and any text or numerical value, the non-null/non-blank will always be evaluated as the 'MAX' value. If there is only one row but it has a null/blank [Resource] value, then the MAX of the choices PQ has is still null/blank, so you keep this row.

For other purposes, you could also choose to turn your columns that don't match with the rest of the row into SUM, MIN, AVG etc. so it can be resolved to a single row with the rest of the columns.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Anonymous
Not applicable

Great. Thanks Pete. Appreciate your time explaining that a bit more in depth. 

Karen

PhilipTreacy
Super User
Super User

Hi @Anonymous 

If the blank cells are actually empty

f1.png

 

Then filter out blanks

 

f6.png

f5.png

 

Regards

Phil



Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.


Blog :: YouTube Channel :: Connect on Linkedin


Proud to be a Super User!


Anonymous
Not applicable

Thanks Phil. I thought this sounded like an easy fix but unfortunately I do have other blank cells in that column that I need to keep as they aren't duplicates.  

Hi @Anonymous 

This is why it's important to provide a representative sample of your data so we can actually work with the data you are.

So if you need to keep some blanks and remove others, how do we distinguish between these blanks?

Please provide a good size sample of your data that includes all instances of the data you are working with.

regards

Phil



Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.


Blog :: YouTube Channel :: Connect on Linkedin


Proud to be a Super User!


Anonymous
Not applicable

Thanks Phil.

I will provide that for future queries I may have.

BA_Pete may have found me a solution. It appears to be doing what I need it to do so I will use that solution in this instance. Thank you for your prompt reply to my query though. It is appreciated.

My apologies for not providing you with more data to begin with. 

 

Karen

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors