Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Keep only the latest date for duplicate entries

So I have a dataset with [Member ID, [Member Date of Birth] and [Updated Date]

Updated Date is when the member profile was last edited.

 

Some poor Members have more than one date of birth in my dataset. I want to keep the date of birth from the last time their profile was edited. So the MAX of [Updated Date] based on [Member ID]. 

 

So I would have a single instance of each [Member ID] and  single instance of [Member Date of birth]. At the moment, I have some duplicate Member ID.

 

Sample.PNG

 

Attached example - as you can see Member 1 and Member 5 have different dates of birth. I would want to keep the 02/01/1969 dob for User 1, and 12/12/1972 dob for User 5. 

1 ACCEPTED SOLUTION

Capture.PNG

paste following code in Power Query Advanced Editor and you should get desired result as long as your input data called Table1 (otherwise you need to change it accordingly). if you do it Power BI, GroupedRows step should reference your input table rather than Table1

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    GroupedRows = Table.Group(Source, {"HH Member ID"}, {{"Count", each _, type table}}),
    ExtractMostUpdated = (TT as table) =>
  Table.SelectRows(TT, each [HH User Updated Dt] = List.Max(TT[HH User Updated Dt])),
    #"Added Custom" = Table.AddColumn(GroupedRows, "Custom", each ExtractMostUpdated([Count])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"HH Member ID", "Count"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"HH Member ID", "HH User Updated Dt", "HH Member DoB"}, {"HH Member ID", "HH User Updated Dt", "HH Member DoB"})
in
    #"Expanded Custom"

 

View solution in original post

11 REPLIES 11
bogomda
Helper II
Helper II

@Anonymous  you can use DAX or also use Power QUery to prep the data and remove duplicate entries. Here is DAX you can use, add user and new measure in the table to see the result

 

Latest DOB = 
CALCULATE(
MAX( Table2[DOB] ),
FILTER(
ALLEXCEPT(Table2,Table2[User]),
Table2[UpdateDate] = MAX( Table2[UpdateDate] )
)
)


Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

Anonymous
Not applicable

Hi @parry2k 

 

How would I use Power Query to prep the data? I would rather actually bring through only unique entries before Loading the data...

 

I've tried grouping but it hasn't worked so far. 

 

Thanks,

Jemma

here is the link to Power Query solution i posted earlier

https://1drv.ms/x/s!AhUWZ84uo7UAgl48kvkwm1THdmsj

Anonymous
Not applicable

@bogomda i'm sorry the link is blocked by my employer... any way you can copy paste the answer here?

Capture.PNG

paste following code in Power Query Advanced Editor and you should get desired result as long as your input data called Table1 (otherwise you need to change it accordingly). if you do it Power BI, GroupedRows step should reference your input table rather than Table1

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    GroupedRows = Table.Group(Source, {"HH Member ID"}, {{"Count", each _, type table}}),
    ExtractMostUpdated = (TT as table) =>
  Table.SelectRows(TT, each [HH User Updated Dt] = List.Max(TT[HH User Updated Dt])),
    #"Added Custom" = Table.AddColumn(GroupedRows, "Custom", each ExtractMostUpdated([Count])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"HH Member ID", "Count"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"HH Member ID", "HH User Updated Dt", "HH Member DoB"}, {"HH Member ID", "HH User Updated Dt", "HH Member DoB"})
in
    #"Expanded Custom"

 

Thank you @bogomda it's a great solution!

took me couple of hours to get it working on my data now it's fun 😄 

Anonymous
Not applicable

I'd like to do something similar, but with a combination of multiple columns. 

For example, if a user uploads two sets of identical data in columns 'product_name', 'product_attribute_1', 'product_attribute_2','title', 'value_type' & only changes the 'value' column on different days, I need only the most recent entry to be considered. Essentially, a user "fixed" their original entry.

I have an upload_date column which would be easiest to key off of.

I was thinking of doing a calculated column where the entry is "Keep" if the record is the most recent, unique combination of attributes & "Delete" if the record has been overridden by another more recent record.

Can someone please assist ? 🙂

Anonymous
Not applicable

@bogomda  wow. This worked, you're a genius, thanks so much! 

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], GroupedRows = Table.Group(Source, {"HH Member ID"}, {{"Count", each _, type table}}), ExtractMostUpdated = (TT as table) => Table.SelectRows(TT, each [HH User Updated Dt] = List.Max(TT[HH User Updated Dt])), #"Added Custom" = Table.AddColumn(GroupedRows, "Custom", each ExtractMostUpdated([Count])), #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"HH Member ID", "Count"}), #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"HH Member ID", "HH User Updated Dt", "HH Member DoB"}, {"HH Member ID", "HH User Updated Dt", "HH Member DoB"}) in #"Expanded Custom"

parry2k
Super User
Super User

@Anonymous  when you said you want to keep dob from latest update, do you want to add it is as  seperate column or what?



Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.