cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
heytherejem
Post Prodigy
Post Prodigy

Keep only the latest date for duplicate entries

So I have a dataset with [Member ID, [Member Date of Birth] and [Updated Date]

Updated Date is when the member profile was last edited.

 

Some poor Members have more than one date of birth in my dataset. I want to keep the date of birth from the last time their profile was edited. So the MAX of [Updated Date] based on [Member ID]. 

 

So I would have a single instance of each [Member ID] and  single instance of [Member Date of birth]. At the moment, I have some duplicate Member ID.

 

Sample.PNG

 

Attached example - as you can see Member 1 and Member 5 have different dates of birth. I would want to keep the 02/01/1969 dob for User 1, and 12/12/1972 dob for User 5. 

1 ACCEPTED SOLUTION

Capture.PNG

paste following code in Power Query Advanced Editor and you should get desired result as long as your input data called Table1 (otherwise you need to change it accordingly). if you do it Power BI, GroupedRows step should reference your input table rather than Table1

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    GroupedRows = Table.Group(Source, {"HH Member ID"}, {{"Count", each _, type table}}),
    ExtractMostUpdated = (TT as table) =>
  Table.SelectRows(TT, each [HH User Updated Dt] = List.Max(TT[HH User Updated Dt])),
    #"Added Custom" = Table.AddColumn(GroupedRows, "Custom", each ExtractMostUpdated([Count])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"HH Member ID", "Count"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"HH Member ID", "HH User Updated Dt", "HH Member DoB"}, {"HH Member ID", "HH User Updated Dt", "HH Member DoB"})
in
    #"Expanded Custom"

 

View solution in original post

10 REPLIES 10
bogomda
Helper II
Helper II

@heytherejem  you can use DAX or also use Power QUery to prep the data and remove duplicate entries. Here is DAX you can use, add user and new measure in the table to see the result

 

Latest DOB = 
CALCULATE(
MAX( Table2[DOB] ),
FILTER(
ALLEXCEPT(Table2,Table2[User]),
Table2[UpdateDate] = MAX( Table2[UpdateDate] )
)
)





Did I answer your question? Mark my post as a solution.

Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.





Hi @parry2k 

 

How would I use Power Query to prep the data? I would rather actually bring through only unique entries before Loading the data...

 

I've tried grouping but it hasn't worked so far. 

 

Thanks,

Jemma

here is the link to Power Query solution i posted earlier

https://1drv.ms/x/s!AhUWZ84uo7UAgl48kvkwm1THdmsj

@bogomda i'm sorry the link is blocked by my employer... any way you can copy paste the answer here?

Capture.PNG

paste following code in Power Query Advanced Editor and you should get desired result as long as your input data called Table1 (otherwise you need to change it accordingly). if you do it Power BI, GroupedRows step should reference your input table rather than Table1

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    GroupedRows = Table.Group(Source, {"HH Member ID"}, {{"Count", each _, type table}}),
    ExtractMostUpdated = (TT as table) =>
  Table.SelectRows(TT, each [HH User Updated Dt] = List.Max(TT[HH User Updated Dt])),
    #"Added Custom" = Table.AddColumn(GroupedRows, "Custom", each ExtractMostUpdated([Count])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"HH Member ID", "Count"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"HH Member ID", "HH User Updated Dt", "HH Member DoB"}, {"HH Member ID", "HH User Updated Dt", "HH Member DoB"})
in
    #"Expanded Custom"

 

View solution in original post

I'd like to do something similar, but with a combination of multiple columns. 

For example, if a user uploads two sets of identical data in columns 'product_name', 'product_attribute_1', 'product_attribute_2','title', 'value_type' & only changes the 'value' column on different days, I need only the most recent entry to be considered. Essentially, a user "fixed" their original entry.

I have an upload_date column which would be easiest to key off of.

I was thinking of doing a calculated column where the entry is "Keep" if the record is the most recent, unique combination of attributes & "Delete" if the record has been overridden by another more recent record.

Can someone please assist ? 🙂

@bogomda  wow. This worked, you're a genius, thanks so much! 

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], GroupedRows = Table.Group(Source, {"HH Member ID"}, {{"Count", each _, type table}}), ExtractMostUpdated = (TT as table) => Table.SelectRows(TT, each [HH User Updated Dt] = List.Max(TT[HH User Updated Dt])), #"Added Custom" = Table.AddColumn(GroupedRows, "Custom", each ExtractMostUpdated([Count])), #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"HH Member ID", "Count"}), #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"HH Member ID", "HH User Updated Dt", "HH Member DoB"}, {"HH Member ID", "HH User Updated Dt", "HH Member DoB"}) in #"Expanded Custom"

parry2k
Super User III
Super User III

@heytherejem  when you said you want to keep dob from latest update, do you want to add it is as  seperate column or what?






Did I answer your question? Mark my post as a solution.

Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.





Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group!

MBAS on Demand

Microsoft Business Applications Summit sessions

On-demand access to all the great content presented by the product teams and community members! #MSBizAppsSummit #CommunityRocks

MBAS Attendee Badge

Claim Your Badge & Digital Swag!

Check out how to claim yours today!

Top Solution Authors
Top Kudoed Authors