Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi everyone, I have a table in power query. There is column in the table by name 'order_line_id'. This column has duplicates which ideally shouldn't. I need to remove the rows with duplicate order_line_id. However, if I remove the duplicates, power query is removing the second line from the duplicate rows. I want the first row of duplicate to be removed rather than the second one. To make it easier to understand, consider the following column below as an example.
In the above column, the duplicate order_line_id has been highlighted. Now, if I try to remove duplicates using a option in power query, it removes the last row in this case. But I want the last row to be retained and first row to be removed.
Can someone please help to get this one right?
Thanks in advance!
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyNjE1U9JRSlSK1QFxgTxzIDcJygXKGgG5yQguSHGKUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, validation = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Column1"}, {{"ad", each Table.FirstN(Table.Sort(Table.AddIndexColumn(_, "Index", 1, 1),{{"Index", Order.Descending}}),1)}}),
#"Removed Other Columns" = Table.SelectColumns(#"Grouped Rows",{"ad"}),
#"Expanded ad" = Table.ExpandTableColumn(#"Removed Other Columns", "ad", {"Column1", "validation"}, {"Column1", "validation"})
in
#"Expanded ad"
When you are using Table.Distinct to remove after adding Index column and sorting on descending of Index column, you will need to buffer the table. Example is below where #"Sorted Rows" was my previous step.
= Table.Distinct(Table.Buffer(#"Sorted Rows"), {"order_line_id"})
This is explained here very well - https://www.daytodatastuff.co.uk/blog/m-deepdive-buffer
Hi @sbadiger3 ,
A workaround to get the deseried output could be:
Making an index column based on the id and then deleting the index id 1 where the occurenec would be more than one. I think using the logic you would be able to work through the data.
If your requirement is solved, please make THIS ANSWER as SOLUTION and help other users find the solution quickly. Please hit the LIKE button if this comment helps you. 😊
@Gayatri_D05 could you please help me understand how to create an index on that specific column?
Hi @sbadiger3 ,
I tried it out on the dummy data.
According to our scenairo we want to keep the highlighted values i.e 15,25.
AFter loading the data into Powerbi i just created a index column from the add column tab starting from 1. And then sorted my index column in a descending order
After that going into the advanced query I simply added this step :
#"Buffer table" = Table.Buffer (#"Sorted Rows")
And then removed the duolicates from the id column which gave me the desired output:
Sharing the Advanced editor query as well :
let
Source = Excel.Workbook(File.Contents("C:\Desktop\PowerBI Tasks\Data.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"id", Int64.Type}, {"values", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Sorted Rows" = Table.Sort(#"Added Index",{{"Index", Order.Descending}}),
#"Buffer table" = Table.Buffer (#"Sorted Rows"),
#"Removed Duplicates" = Table.Distinct(#"Buffer table", {"id"})
in
#"Removed Duplicates"
Please try out the above steps as it worked on the dummy data.
If your requirement is solved, please make THIS ANSWER as SOLUTION and help other users find the solution quickly. Please hit the LIKE button if this comment helps you. 😊
I guess you could add an index, to the table, then sort it descending and then apply remove the duplicates?
Kudos and mark as solution appreciated.
This method isn't working! I tried adding an index column and then sorting it in decending order and then remove duplicates from order_line_id. No matter I sort it or leave it. It's removing the line that I want to retain!!😵
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.