Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
DSwezey
Helper III
Helper III

Remove duplicates based on Multiple Columns

Hi there,

 

I have a table with 3 columns in it. Staff, Staff Position, Client.

Some staff members can have multiple positions with the same client. Some clients can have multiple staff members assigned to it.

See example below:

DSwezey_0-1641839435221.png

I want to filter out Manager positions where there are duplicate Staff & Client. Example above would be Bob. He is both a manager and partner for Google so I want to keep just 1 record of this and delete the manager row. 

 

Not sure how to go about this.

DSwezey_1-1641839514589.png

 

1 ACCEPTED SOLUTION
v-jingzhang
Community Support
Community Support

Hi @DSwezey 

 

You could try this method

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJR8k3MS0xPLQKy3PPz03NSlWJ1YFIBiUUleehSzhlFmcUo+hxzE6vy88CSjjmpFShyPvnlqcVgqaDKxDwUM6FSsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Staff = _t, #"Staff Position" = _t, Client = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Staff", type text}, {"Staff Position", type text}, {"Client", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Staff", "Client"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"AllData", each _, type table [Staff=nullable text, Staff Position=nullable text, Client=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each if [Count] > 1 then Table.SelectRows([AllData], each [Staff Position] <> "Manager") else [AllData]),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"Staff", "Staff Position", "Client"}, {"Staff", "Staff Position", "Client"})
in
    #"Expanded Custom"

vjingzhang_0-1642068371606.png

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

View solution in original post

4 REPLIES 4
v-jingzhang
Community Support
Community Support

Hi @DSwezey 

 

You could try this method

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJR8k3MS0xPLQKy3PPz03NSlWJ1YFIBiUUleehSzhlFmcUo+hxzE6vy88CSjjmpFShyPvnlqcVgqaDKxDwUM6FSsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Staff = _t, #"Staff Position" = _t, Client = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Staff", type text}, {"Staff Position", type text}, {"Client", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Staff", "Client"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"AllData", each _, type table [Staff=nullable text, Staff Position=nullable text, Client=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each if [Count] > 1 then Table.SelectRows([AllData], each [Staff Position] <> "Manager") else [AllData]),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"Staff", "Staff Position", "Client"}, {"Staff", "Staff Position", "Client"})
in
    #"Expanded Custom"

vjingzhang_0-1642068371606.png

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

ronrsnfld
Super User
Super User

To remove the duplicates, just select the columns you want to consider, and remove rows=>remove duplicates.  This method will retain the first in the list.

 

If you want control over which gets deleted, first sort your list appropriately so that the priority is at the top.

 

In your simplified example it would be a Z=>A sort of the staff position column, but I suspect your real data may be more complex.

The issue with that is There can be multiple clients for different Staff members. For example, Lowes is assigned to Alex and Ryan. I do not want to filter out any of that data even though Lowes is a duplicate. I want to filter out duplicate clients for Managers when the Staff is also a duplicate (i.e. Bob)

Did you actually try what I suggested?

It works fine on the data you provided and produces the output you show.

 

Please communicate exactly what you did, so I can assess the problem.

 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors