Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
sbadiger3
Helper I
Helper I

How to remove rows based on a condition

Hi everyone, I have a table in power query. There is column in the table by name 'order_line_id'. This column has duplicates which ideally shouldn't. I need to remove the rows with duplicate order_line_id. However, if I remove the duplicates, power query is removing the second line from the duplicate rows. I want the first row of duplicate to be removed rather than the second one. To make it easier to understand, consider the following column below as an example. 

 

sbadiger3_0-1707378446780.png

 

In the above column, the duplicate order_line_id has been highlighted. Now, if I try to remove duplicates using a option in power query, it removes the last row in this case. But I want the last row to be retained and first row to be removed. 

Can someone please help to get this one right? 
Thanks in advance! 

9 REPLIES 9
smpa01
Super User
Super User

@sbadiger3 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyNjE1U9JRSlSK1QFxgTxzIDcJygXKGgG5yQguSHGKUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, validation = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Column1"}, {{"ad", each Table.FirstN(Table.Sort(Table.AddIndexColumn(_, "Index", 1, 1),{{"Index", Order.Descending}}),1)}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Grouped Rows",{"ad"}),
    #"Expanded ad" = Table.ExpandTableColumn(#"Removed Other Columns", "ad", {"Column1", "validation"}, {"Column1", "validation"})
in
    #"Expanded ad"
Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
Vijay_A_Verma
Super User
Super User

When you are using Table.Distinct to remove after adding Index column and sorting on descending of Index column, you will need to buffer the table. Example is below where #"Sorted Rows" was my previous step.

 

= Table.Distinct(Table.Buffer(#"Sorted Rows"), {"order_line_id"})

 

 

@Vijay_A_Verma What exactly does buffer do? 

This is explained here very well - https://www.daytodatastuff.co.uk/blog/m-deepdive-buffer

Gayatri_D05
Resolver II
Resolver II

Hi @sbadiger3 ,
A workaround to get the deseried output could be: 
Making an index column based on the id and then deleting the index id 1 where the occurenec would be more than one. I think using the logic you would be able to work through the data.


If your requirement is solved, please make THIS ANSWER as SOLUTION and help other users find the solution quickly. Please hit the LIKE button if this comment helps you. 😊


@Gayatri_D05 could you please help me understand how to create an index on that specific column?

Hi @sbadiger3 ,
I tried it out on the dummy data.

Gayatri_D05_0-1707388173023.png

According to our scenairo we want to keep the highlighted values i.e 15,25.
AFter loading the data into Powerbi i just created a index column from the add column tab starting from 1. And then sorted my index column in a descending order

Gayatri_D05_2-1707388282257.png

 After that going into the advanced query I simply added this step :
#"Buffer table" = Table.Buffer (#"Sorted Rows")

And then removed the duolicates from the id column which gave me the desired output:
Gayatri_D05_4-1707388420848.png
Sharing the Advanced editor query as well :


let
Source = Excel.Workbook(File.Contents("C:\Desktop\PowerBI Tasks\Data.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"id", Int64.Type}, {"values", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Sorted Rows" = Table.Sort(#"Added Index",{{"Index", Order.Descending}}),
#"Buffer table" = Table.Buffer (#"Sorted Rows"),
#"Removed Duplicates" = Table.Distinct(#"Buffer table", {"id"})
in
#"Removed Duplicates"

 

Please try out the above steps as it worked on the dummy data.

If your requirement is solved, please make THIS ANSWER as SOLUTION and help other users find the solution quickly. Please hit the LIKE button if this comment helps you. 😊





zenisekd
Super User
Super User

I guess you could add an index, to the table, then sort it descending and then apply remove the duplicates?

Kudos and mark as solution appreciated. 

This method isn't working! I tried adding an index column and then sorting it in decending order and then remove duplicates from order_line_id. No matter I sort it or leave it. It's removing the line that I want to retain!!😵

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors