cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Helper III
Helper III

Power Query : removing duplicates and merging duplicate rows selectively into one

Hi all!

 

What would be the smart way to remove duplicates here?

 

I  need the first value from the column b and last value from column c. There can be even more than two rows for the same ID.

 

a.PNG

Thanks in advance for any ideas,

Jenni

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Solution Sage
Solution Sage

Re: Power Query : removing duplicates and merging duplicate rows selectively into one

For the sake of easy understanding of the code, I have written the power query step by step. But you can combine several steps and make the query much denser.

 

let Source = Table.FromRecords(
        {
            [id = 1, date = #date(2018,2,3), b = 513, c = 423],
            [id = 2, date = #date(2018,5,1), b = 534, c = 432],
            [id = 2, date = #date(2019,1,18), b = 345, c = 5434],
            [id = 3, date = #date(2019,2,2), b = 354, c = 4534],
            [id = 4, date = #date(2019,4,6), b = 543, c = 544],
            [id = 5, date = #date(2019,5,5), b = 654, c = 432]
        }
    ),
    
    ChangeColumnTypes = Table.TransformColumnTypes(Source,
        {
            {"date", type date}, 
            {"id", Int64.Type}, 
            {"b", Int64.Type}, 
            {"c", Int64.Type}
        }
    ),

TempTable = ChangeColumnTypes,
ResultTable = ChangeColumnTypes, 

MinDate = (x as table, y as number ) as date => List.Min(
    Table.Column(
        Table.SelectRows(x,each Record.Field(_,"id")=y),
    "date")
),

MaxDate = (x as table, y as number ) as date => List.Max(
    Table.Column(
        Table.SelectRows(x,each Record.Field(_,"id")=y),
    "date")
),

AddIsFirstDateColumn = Table.AddColumn(ResultTable,
        "IsFirstDate",
        each Record.Field(_,"date") = MinDate(TempTable,Record.Field(_,"id"))
),

AddIsLastDateColumn =Table.AddColumn(AddIsFirstDateColumn,
        "IsLastDate",
        each Record.Field(_,"date") = MaxDate(TempTable,Record.Field(_,"id"))
),

getFirstValue = (x as table,y as number) as number => List.Min(
    Table.Column(Table.SelectRows(
        x, each Record.Field(_,"id")=y and Record.Field(_,"IsFirstDate") = true
    ),"b")
),

FirstValue = Table.AddColumn(
    AddIsLastDateColumn,"FirstValue",
    each getFirstValue(AddIsLastDateColumn,Record.Field(_,"id"))
),

getLastValue = (x as table,y as number) as number => List.Min(
    Table.Column(Table.SelectRows(
        x, each Record.Field(_,"id")=y and Record.Field(_,"IsLastDate") = true
    ),"c")
),

LastValue = Table.AddColumn(
    FirstValue,"LastValue",
    each getLastValue(AddIsLastDateColumn,Record.Field(_,"id"))
),

RemoveDuplicates = Table.SelectRows(LastValue,each [IsLastDate] = true),
RemoveColumns = Table.RemoveColumns(RemoveDuplicates,{"IsLastDate","IsFirstDate","b","c"})
in
    RemoveColumns

View solution in original post

8 REPLIES 8
Highlighted
Solution Sage
Solution Sage

Re: Power Query : removing duplicates and merging duplicate rows selectively into one

Two inputs required.

 

1) If there are two entries with the same date and same id, what should be the result?

2) If there is more than one entry for an id with different dates, after removing the duplicate rows, what is the date you want to show in the date field of the result table? First date or last date?

Highlighted
Helper III
Helper III

Re: Power Query : removing duplicates and merging duplicate rows selectively into one

1) If there are two entries with the same date and same id, what should be the result?

-> duplicate (the latter one on the list) can be removed. This is very rare possibility and thus it really doesn't make a huge difference whichever it is.

 

2) If there is more than one entry for an id with different dates, after removing the duplicate rows, what is the date you want to show in the date field of the result table? First date or last date?

-> Last date. Date has not got too much value after this step.

Highlighted
Solution Sage
Solution Sage

Re: Power Query : removing duplicates and merging duplicate rows selectively into one

For the sake of easy understanding of the code, I have written the power query step by step. But you can combine several steps and make the query much denser.

 

let Source = Table.FromRecords(
        {
            [id = 1, date = #date(2018,2,3), b = 513, c = 423],
            [id = 2, date = #date(2018,5,1), b = 534, c = 432],
            [id = 2, date = #date(2019,1,18), b = 345, c = 5434],
            [id = 3, date = #date(2019,2,2), b = 354, c = 4534],
            [id = 4, date = #date(2019,4,6), b = 543, c = 544],
            [id = 5, date = #date(2019,5,5), b = 654, c = 432]
        }
    ),
    
    ChangeColumnTypes = Table.TransformColumnTypes(Source,
        {
            {"date", type date}, 
            {"id", Int64.Type}, 
            {"b", Int64.Type}, 
            {"c", Int64.Type}
        }
    ),

TempTable = ChangeColumnTypes,
ResultTable = ChangeColumnTypes, 

MinDate = (x as table, y as number ) as date => List.Min(
    Table.Column(
        Table.SelectRows(x,each Record.Field(_,"id")=y),
    "date")
),

MaxDate = (x as table, y as number ) as date => List.Max(
    Table.Column(
        Table.SelectRows(x,each Record.Field(_,"id")=y),
    "date")
),

AddIsFirstDateColumn = Table.AddColumn(ResultTable,
        "IsFirstDate",
        each Record.Field(_,"date") = MinDate(TempTable,Record.Field(_,"id"))
),

AddIsLastDateColumn =Table.AddColumn(AddIsFirstDateColumn,
        "IsLastDate",
        each Record.Field(_,"date") = MaxDate(TempTable,Record.Field(_,"id"))
),

getFirstValue = (x as table,y as number) as number => List.Min(
    Table.Column(Table.SelectRows(
        x, each Record.Field(_,"id")=y and Record.Field(_,"IsFirstDate") = true
    ),"b")
),

FirstValue = Table.AddColumn(
    AddIsLastDateColumn,"FirstValue",
    each getFirstValue(AddIsLastDateColumn,Record.Field(_,"id"))
),

getLastValue = (x as table,y as number) as number => List.Min(
    Table.Column(Table.SelectRows(
        x, each Record.Field(_,"id")=y and Record.Field(_,"IsLastDate") = true
    ),"c")
),

LastValue = Table.AddColumn(
    FirstValue,"LastValue",
    each getLastValue(AddIsLastDateColumn,Record.Field(_,"id"))
),

RemoveDuplicates = Table.SelectRows(LastValue,each [IsLastDate] = true),
RemoveColumns = Table.RemoveColumns(RemoveDuplicates,{"IsLastDate","IsFirstDate","b","c"})
in
    RemoveColumns

View solution in original post

Highlighted
Helper III
Helper III

Re: Power Query : removing duplicates and merging duplicate rows selectively into one

Thank you so much  @sreenathv , this is working well!

 

Two questions 

- If there are duplicate dates for certain ID, duplicate is not yet removed. Should I just remove duplicates in the end? As told before, there is no way to see what is the last change in these cases as there are no time stamps.

- After all my real ID:s contain a hyphen (-) and thus they are not number values. Do I just create an ID without the hyphen or is there a way to make this code work in a situation where ID if a text field?

Jenni 

Highlighted
Solution Sage
Solution Sage

Re: Power Query : removing duplicates and merging duplicate rows selectively into one

1) There is an ambiguity if id-date combinations repeat. There is no way a system can determine which one to keep and which one to remove. Maybe you can use some functions like List.First() etc.. to randomly pick the first record and keep it.

 

2) IF id is text, change the data type of the following line as text. 

 

{"id", Int64.Type}, 

 

I think it should work as we are not using the id field for any arithmetic calculations. We are using it only for matching. So it really does not matter if id is number or text.

 

 

Highlighted
Helper III
Helper III

Re: Power Query : removing duplicates and merging duplicate rows selectively into one

1) Yes this is the way to go, I agree.

 

2) I already did tchange the type you described this but keep receiving error in the "RemoveDuplicates" -step. 

Expression.Error: We cannot convert the value "9-432235" to type Number.
Details:
    Value= 9-432235
    Type=Type

 

Jenni

Highlighted
Solution Sage
Solution Sage

Re: Power Query : removing duplicates and merging duplicate rows selectively into one

Refer to the code below...

 

I have modified certain codes and identified as //MODIFIED CODE

 

let Source = Table.FromRecords(
        {
            [id = 1, date = #date(2018,2,3), b = 513, c = 423],
            [id = 2, date = #date(2018,5,1), b = 534, c = 432],
            [id = 2, date = #date(2019,1,18), b = 345, c = 5434],
            [id = 3, date = #date(2019,2,2), b = 354, c = 4534],
            [id = 4, date = #date(2019,4,6), b = 543, c = 544],
            [id = 5, date = #date(2019,5,5), b = 654, c = 432]
        }
    ),
    
    ChangeColumnTypes = Table.TransformColumnTypes(Source,
        {
            {"date", type date}, 
            // {"id", Int64.Type}, 
            {"id", type text},  //MODIFIED CODE
            {"b", Int64.Type}, 
            {"c", Int64.Type}
        }
    ),

TempTable = ChangeColumnTypes,
ResultTable = ChangeColumnTypes, 

//MinDate = (x as table, y as number ) as date => List.Min(
MinDate = (x as table, y as text ) as date => List.Min( // MODIFIED CODE
    Table.Column(
        Table.SelectRows(x,each Record.Field(_,"id")=y),
    "date")
),

//MaxDate = (x as table, y as number ) as date => List.Max(
MaxDate = (x as table, y as text ) as date => List.Max( //MODIFIED CODE
    Table.Column(
        Table.SelectRows(x,each Record.Field(_,"id")=y),
    "date")
),

AddIsFirstDateColumn = Table.AddColumn(ResultTable,
        "IsFirstDate",
        each Record.Field(_,"date") = MinDate(TempTable,Record.Field(_,"id"))
),

AddIsLastDateColumn =Table.AddColumn(AddIsFirstDateColumn,
        "IsLastDate",
        each Record.Field(_,"date") = MaxDate(TempTable,Record.Field(_,"id"))
),

//getFirstValue = (x as table,y as number) as number => List.Min(
getFirstValue = (x as table,y as text) as number => List.Min( //MODIFIED CODE
    Table.Column(Table.SelectRows(
        x, each Record.Field(_,"id")=y and Record.Field(_,"IsFirstDate") = true
    ),"b")
),

FirstValue = Table.AddColumn(
    AddIsLastDateColumn,"FirstValue",
    each getFirstValue(AddIsLastDateColumn,Record.Field(_,"id"))
),

//getLastValue = (x as table,y as number) as number => List.Min(
getLastValue = (x as table,y as text) as number => List.Min( //MODIFIED CODE
    Table.Column(Table.SelectRows(
        x, each Record.Field(_,"id")=y and Record.Field(_,"IsLastDate") = true
    ),"c")
),

LastValue = Table.AddColumn(
    FirstValue,"LastValue",
    each getLastValue(AddIsLastDateColumn,Record.Field(_,"id"))
),

RemoveDuplicates = Table.SelectRows(LastValue,each [IsLastDate] = true),
RemoveColumns = Table.RemoveColumns(RemoveDuplicates,{"IsLastDate","IsFirstDate","b","c"})
in
    RemoveColumns
Highlighted
Helper III
Helper III

Re: Power Query : removing duplicates and merging duplicate rows selectively into one

Great, thanks @sreenathv !

 

Jenni

Helpful resources

Announcements
August 2020 Community Challenge: Can You Solve These?

August 2020 Community Challenge: Can You Solve These?

We're excited to announce our first cross-community 'Can You Solve These?' challenge!

July 2020 Community Highlights

July 2020 Community Highlights

Learn about the exciting things that happened in July.

Upcoming Events

Upcoming Events

Wondering what events you could join or have an event to promote yourself? Check out our Upcoming Events.

Get Ready for Power BI Dev Camp

Get Ready for Power BI Dev Camp

We are thrilled to announce we will begin running a monthly webinar series named Power BI Dev Camp.

Top Solution Authors
Top Kudoed Authors