Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

A complex remove duplicates. Please help.

Example data:

Row #New Hire NameStart Date
1.John_Doe

1/1/2020

2.John_Doe5/1/2020
3.John_Doenull
4.Jane_Doenull

 

I need to remove line 3, but if I select both columns New Hire Name and Start Date and hit remove duplicates it will keep all of them. If I only use the New Hire name column. It will delete lines 2 and 3. I am pretty new to DAX and power query, but is there a way to do this?

1 ACCEPTED SOLUTION
v-alq-msft
Community Support
Community Support

Hi, @Anonymous 

 

Based on your description, I created data to reproduce your scenario. The pbix file is attached in the end.

Table:

b1.png

 

You may add a new step like below.

= Table.SelectRows(#"Changed Type",each 
let x = [New Hire Name],
y=Table.RowCount(
    Table.SelectRows(#"Changed Type",each [New Hire Name]=x)
)
in
(y>1 and [Start Date]<>null) or y=1
)

 

Result:

b2.png

 

Best Regards

Allan

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

8 REPLIES 8
v-alq-msft
Community Support
Community Support

Hi, @Anonymous 

 

Based on your description, I created data to reproduce your scenario. The pbix file is attached in the end.

Table:

b1.png

 

You may add a new step like below.

= Table.SelectRows(#"Changed Type",each 
let x = [New Hire Name],
y=Table.RowCount(
    Table.SelectRows(#"Changed Type",each [New Hire Name]=x)
)
in
(y>1 and [Start Date]<>null) or y=1
)

 

Result:

b2.png

 

Best Regards

Allan

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

watkinnc
Super User
Super User

You could just use the Fill Down function in the GUI on the Date column, and then your next step would be Table.Distinct(PriorStep)


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!
edhans
Super User
Super User

Hi @Anonymous 

without more specifics hard to say, but take a look at this. It turns this:

edhans_0-1604609003814.png

into this:

edhans_1-1604609033357.png

I used this code:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyIt3yU9V0lEy1DfUNzIwMlCK1UERN8UhnleakwMRS8xLRRGLBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"New Hire Name" = _t, #"Start Date" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Start Date", type date}}),
    #"Grouped Rows" = 
        Table.Group(
            #"Changed Type", 
            {"New Hire Name"}, 
            {
                {"AllRows", 
                each if Table.RowCount(_) = 1 then _ else
                    Table.Distinct(
                        Table.SelectRows(_, each [Start Date] <> null)
                    ), 
                type table [New Hire Name=nullable text, Start Date=nullable date]
                }
            }
        ),
    #"Expanded AllRows" = Table.ExpandTableColumn(#"Grouped Rows", "AllRows", {"Start Date"}, {"Start Date"})
in
    #"Expanded AllRows"

 

The logic I used was this:

  • Group all data by the New Hire Name
  • Count the rows in the nested tables. If there is just one record keep it.
  • If there are 2+ records, remove any rows with null, then do a Table.Distinct() over that result to remove any duplicate records. So if John Doe had two rows that were 5/1/2020, my logic would keep only one of those.
  • Expanded the Start Date column again.

 

If that isn't what you want, please be more specific on how the logic should work.

 

How to use M code provided in a blank query:
1) In Power Query, select New Source, then Blank Query
2) On the Home ribbon, select "Advanced Editor" button
3) Remove everything you see, then paste the M code I've given you in that box.
4) Press Done
5) See this article if you need help using this M code in your model.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
Fowmy
Super User
Super User

@Anonymous 

What is your logic to remove the record?

Do want to remove the lines for new hire names that have at least one record with start date?

________________________

If my answer was helpful, please consider Accept it as the solution to help the other members find it

Click on the Thumbs-Up icon if you like this reply 🙂

YouTube  LinkedIn

Did I answer your question? Mark my post as a solution! and hit thumbs up


Subscribe and learn Power BI from these videos

Website LinkedIn PBI User Group

Anonymous
Not applicable

I need to show if a new hire has been hired again, but also need to show if their name wasn't included in the new hire list provided to me. I am basically given two new hire list and one has a bunch of info with it including start dates and the other is pretty much just their name. If they are in the second list that has no info, but not in the first list, I need to keep it so I can get the rest of their info, but if they are in the first list multiple times, I need to keep all instances of their name in that list. 

this might be a nestedjoin problem. you can provide some sample data of these two tables.

 

camargos88
Community Champion
Community Champion

@Anonymous ,

 

It doesn't like you need to remove the duplicated, actually you don't have any.

Does it work if you remove when there is more than 1 row for the New Hire column and delete the null row ?



Did I answer your question? Mark my post as a solution!

Proud to be a Super User!



Anonymous
Not applicable

I am not sure I understand your response.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors