Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
basic
Frequent Visitor

Remove duplicates selectively based on priority order

Finally gave up looking for examples to learn from, and decided to post my first question to the community.

 

Hello, i have a table similar to this:

 

Row#     Col A     Col B

1            A1          B1

2            A1          B2

3            A1          B3

4            A2          B2

5            A2          B3

6            A3          B1

7            A3          B3

 

I would like to remove duplicates on Col A such that -- whenever Col B = B1, I want to keep that row.  So between rows 1, 2 and 3, I would like to keep row 1.  Between rows 6 and 7, I will keep row 6.

 

When Col A has duplicated and Col B doesn't have any B1 (a specific value), I don't care which row I keep as long as I keep only one.  So between rows 4 and 5, I could keep either, doesn't matter.  

 

My resulting table should look like this:

 

Row#     Col A     Col B

1            A1          B1

4            A2          B2

6            A3          B1

 

OR

 

Row#     Col A     Col B

1            A1          B1

5            A2          B3

6            A3          B1

 

(as i said above, rows 4 and 5 mean the same to me)

 

Can anyone help out please?

1 ACCEPTED SOLUTION

Thanks Zubair.  Sorry I didn't specify that B1 in my data could appear on any row -- not necessarily on the first row all the time.  That makes the remove duplicates function not suitable for my requirement.  

I got help from a colleague and ended up learning a bit of M to get the job done.  My code looks something like this:

 

let
     Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
     #"B1 rows" = Table.Distinct(Table.SelectRows(Source, each ([Col B] = "B1")), {"Col A"}),
     #"A1 list" = Table.Column(#"B1 rows", "Col A"),
     #"Non B1 Rows" = Table.Distinct(Table.SelectRows(#"Table 1", each not List.Contains(#"A1 list", [DNS])), {"Col A"}),
     #"Appended Query" = Table.Combine({#"B1 rows", #"Non B1 Rows"})
in
     #"Appended Query"

 

The exception join (not List.Contains) must not be efficient because my query runs 10 minutes or longer.  But it gets the job done, and that's what i care about.

View solution in original post

3 REPLIES 3
Zubair_Muhammad
Community Champion
Community Champion

@basic 

 

If your data is sorted  (as in your sample data) i.e. B1 is the first row for each unique ColA item

 

you simple have to select Col A>>right click>>remove duplicates

 

removeduplicates.png


Regards
Zubair

Please try my custom visuals

Thanks Zubair.  Sorry I didn't specify that B1 in my data could appear on any row -- not necessarily on the first row all the time.  That makes the remove duplicates function not suitable for my requirement.  

I got help from a colleague and ended up learning a bit of M to get the job done.  My code looks something like this:

 

let
     Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
     #"B1 rows" = Table.Distinct(Table.SelectRows(Source, each ([Col B] = "B1")), {"Col A"}),
     #"A1 list" = Table.Column(#"B1 rows", "Col A"),
     #"Non B1 Rows" = Table.Distinct(Table.SelectRows(#"Table 1", each not List.Contains(#"A1 list", [DNS])), {"Col A"}),
     #"Appended Query" = Table.Combine({#"B1 rows", #"Non B1 Rows"})
in
     #"Appended Query"

 

The exception join (not List.Contains) must not be efficient because my query runs 10 minutes or longer.  But it gets the job done, and that's what i care about.

Thanks Zubair.  Sorry I didn't specify that B1 in my data could appear on any row -- not necessarily on the first row all the time.  That makes the remove duplicates function is not suitable for my requirement.  

I got help from a colleague and ended up learning a bit of M to get the job done.  My code looks something like this:

 

let
   Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
   #"B1 rows" = Table.Distinct(Table.SelectRows(Source, each ([Col B] = "B1")), {"Col A"}),
   #"A1 list" = Table.Column(#"B1 rows", "Col A"),
   #"Non B1 Rows" = Table.Distinct(Table.SelectRows(#"Table 1", each not List.Contains(#"A1 list", [DNS])), {"Col A"}),
   #"Appended Query" = Table.Combine({#"B1 rows", #"Non B1 Rows"})
in
   #"Appended Query"

 

The exception join (not List.Contains) must not be efficient because my query runs 10 minutes or longer.  But it gets the job done, and that's what i care about.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors