Re: How to remove rows based on a condition

sbadiger3 · ‎02-07-2024

Hi everyone, I have a table in power query. There is column in the table by name 'order_line_id'. This column has duplicates which ideally shouldn't. I need to remove the rows with duplicate order_line_id. However, if I remove the duplicates, power query is removing the second line from the duplicate rows. I want the first row of duplicate to be removed rather than the second one. To make it easier to understand, consider the following column below as an example.

In the above column, the duplicate order_line_id has been highlighted. Now, if I try to remove duplicates using a option in power query, it removes the last row in this case. But I want the last row to be retained and first row to be removed.

Can someone please help to get this one right?
Thanks in advance!

smpa01 · ‎02-09-2024

@sbadiger3

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyNjE1U9JRSlSK1QFxgTxzIDcJygXKGgG5yQguSHGKUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, validation = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Column1"}, {{"ad", each Table.FirstN(Table.Sort(Table.AddIndexColumn(_, "Index", 1, 1),{{"Index", Order.Descending}}),1)}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Grouped Rows",{"ad"}),
    #"Expanded ad" = Table.ExpandTableColumn(#"Removed Other Columns", "ad", {"Column1", "validation"}, {"Column1", "validation"})
in
    #"Expanded ad"

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

My custom visualization projects

Plotting Live Sound: Viz1

Beautiful News:Viz1, Viz2, Viz3

Visual Capitalist: Working Hrs

Others:Easing Graph, Animated Calendar

Vijay_A_Verma · ‎02-08-2024

When you are using Table.Distinct to remove after adding Index column and sorting on descending of Index column, you will need to buffer the table. Example is below where #"Sorted Rows" was my previous step.

= Table.Distinct(Table.Buffer(#"Sorted Rows"), {"order_line_id"})

sbadiger3 · ‎02-09-2024

@Vijay_A_Verma What exactly does buffer do?

Vijay_A_Verma · ‎02-09-2024

This is explained here very well - https://www.daytodatastuff.co.uk/blog/m-deepdive-buffer

Gayatri_D05 · ‎02-08-2024

Hi @sbadiger3 ,
A workaround to get the deseried output could be:
Making an index column based on the id and then deleting the index id 1 where the occurenec would be more than one. I think using the logic you would be able to work through the data.

If your requirement is solved, please make THIS ANSWER as SOLUTION and help other users find the solution quickly. Please hit the LIKE button if this comment helps you. 😊

sbadiger3 · ‎02-08-2024

@Gayatri_D05 could you please help me understand how to create an index on that specific column?

Gayatri_D05 · ‎02-08-2024

Hi @sbadiger3 ,
I tried it out on the dummy data.

According to our scenairo we want to keep the highlighted values i.e 15,25.
AFter loading the data into Powerbi i just created a index column from the add column tab starting from 1. And then sorted my index column in a descending order

After that going into the advanced query I simply added this step :
#"Buffer table" = Table.Buffer (#"Sorted Rows")

And then removed the duolicates from the id column which gave me the desired output:

Sharing the Advanced editor query as well :


let
Source = Excel.Workbook(File.Contents("C:\Desktop\PowerBI Tasks\Data.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"id", Int64.Type}, {"values", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Sorted Rows" = Table.Sort(#"Added Index",{{"Index", Order.Descending}}),
#"Buffer table" = Table.Buffer (#"Sorted Rows"),
#"Removed Duplicates" = Table.Distinct(#"Buffer table", {"id"})
in
#"Removed Duplicates"

Please try out the above steps as it worked on the dummy data.

If your requirement is solved, please make THIS ANSWER as SOLUTION and help other users find the solution quickly. Please hit the LIKE button if this comment helps you. 😊

zenisekd · ‎02-08-2024

I guess you could add an index, to the table, then sort it descending and then apply remove the duplicates?

Kudos and mark as solution appreciated.

sbadiger3 · ‎02-08-2024

This method isn't working! I tried adding an index column and then sorting it in decending order and then remove duplicates from order_line_id. No matter I sort it or leave it. It's removing the line that I want to retain!!😵

How to remove rows based on a condition

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024