Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
Mark1970
New Member

How to remove duplicates based on multiple criteria

I have a table with the following data from which I want to keep the green records and remove the yellow ones. Should I do this in powerquery or stick to Dax in power BI. I am using the table in Power BI for a dashboard.

Mark1970_0-1715852637528.png

 



1 ACCEPTED SOLUTION
BA_Pete
Super User
Super User

Hi @Mark1970 ,

 

As this is a data structure operation, I'd recommend doing it in Power Query.

I'm assuming the filter logic is "filter out any row with no [WorkCenterCode], [Year], and [Week] value, unless it's the only [Order-Opr] value". If this is correct, then try this example code:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjIwNDc10DU0NFDSUYKiWB2EuBFI3NwARBoZGJlAKUMjTEVYNBuDxC3MTNE0GxOjyARTEcyGWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Order-Opr" = _t, WorkCenterCode = _t, Year = _t, Week = _t]),
    repBlankNull = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"WorkCenterCode", "Year", "Week"}),

// Relevant steps from here =====>
    groupOrderOpr = Table.Group(repBlankNull, {"Order-Opr"}, {{"count", each Table.RowCount(_), Int64.Type}, {"data", each _, type table [#"Order-Opr"=nullable text, WorkCenterCode=nullable text, Year=nullable text, Week=nullable text]}}),
    expandData = Table.ExpandTableColumn(groupOrderOpr, "data", {"WorkCenterCode", "Year", "Week"}, {"WorkCenterCode", "Year", "Week"}),
    filterRows = Table.SelectRows(expandData, each [count] = 1 or ([count] > 1 and not ([WorkCenterCode] = null and [Year] = null and [Week] = null)))
    
in
    filterRows

 

Summary:

groupOrderOpr = Group the table on the [Ordedr-Opr] column, adding a count column and an All Rows column as aggregates.

expandData = Expand your original columns back out of the nested tables.

filterRows = Apply a logical filter, using the new count column we just created to ensure single-row values are kept even if target fields are null.

 

The example code turns this:

BA_Pete_0-1715864096721.png

 

...into this:

BA_Pete_1-1715864129344.png

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

2 REPLIES 2
BA_Pete
Super User
Super User

Hi @Mark1970 ,

 

As this is a data structure operation, I'd recommend doing it in Power Query.

I'm assuming the filter logic is "filter out any row with no [WorkCenterCode], [Year], and [Week] value, unless it's the only [Order-Opr] value". If this is correct, then try this example code:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjIwNDc10DU0NFDSUYKiWB2EuBFI3NwARBoZGJlAKUMjTEVYNBuDxC3MTNE0GxOjyARTEcyGWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Order-Opr" = _t, WorkCenterCode = _t, Year = _t, Week = _t]),
    repBlankNull = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"WorkCenterCode", "Year", "Week"}),

// Relevant steps from here =====>
    groupOrderOpr = Table.Group(repBlankNull, {"Order-Opr"}, {{"count", each Table.RowCount(_), Int64.Type}, {"data", each _, type table [#"Order-Opr"=nullable text, WorkCenterCode=nullable text, Year=nullable text, Week=nullable text]}}),
    expandData = Table.ExpandTableColumn(groupOrderOpr, "data", {"WorkCenterCode", "Year", "Week"}, {"WorkCenterCode", "Year", "Week"}),
    filterRows = Table.SelectRows(expandData, each [count] = 1 or ([count] > 1 and not ([WorkCenterCode] = null and [Year] = null and [Week] = null)))
    
in
    filterRows

 

Summary:

groupOrderOpr = Group the table on the [Ordedr-Opr] column, adding a count column and an All Rows column as aggregates.

expandData = Expand your original columns back out of the nested tables.

filterRows = Apply a logical filter, using the new count column we just created to ensure single-row values are kept even if target fields are null.

 

The example code turns this:

BA_Pete_0-1715864096721.png

 

...into this:

BA_Pete_1-1715864129344.png

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Thanks!

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

Power BI Carousel June 2024

Power BI Monthly Update - June 2024

Check out the June 2024 Power BI update to learn about new features.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

Top Solution Authors
Top Kudoed Authors