Solved: Re: remove duplicates based on a filter

IF · ‎12-07-2022

Hi,

In the power query, I want to remove the dublicates if the ID number is the same. If the ID is repeated, it should keep the Classification = B. For example, the original Table is:

IDNameDateStartTypeDaysQtyClassificationDateClose

4340058	P1	11.15.2022	M2	23	1	C	12.07.2022
4340052	P1	11.14.2022	M1	12	1	C	11.29.2022
4340051	P1	11.14.2022	M2	1	1	A
4340050	P1	10.13.2022	M4	31	1	A
4340038	P2	11.15.2022	M3	11	1	C	11.26.2022
4340022	P2	11.14.2022	M7	2	1	C	11.16.2022
4340021	P2	10.14.2022	M2	45	1	B
4340030	P2	11.13.2022	M4	31	1	C	12.01.2022
4340021	P2	10.14.2022	M2	45	1	A
4340050	P1	10.13.2022	M4	31	1	B

But the red rows since the ID is repeated. I want to keep the classification B if the ID is repeated. In this case, I want to change it to:

IDNameDateStartTypeDaysQtyClassificationDateClose

4340058	P1	11.15.2022	M2	23	1	C	12.07.2022
4340052	P1	11.14.2022	M1	12	1	C	11.29.2022
4340051	P1	11.14.2022	M2	1	1	A
4340038	P2	11.15.2022	M3	11	1	C	11.26.2022
4340022	P2	11.14.2022	M7	2	1	C	11.16.2022
4340021	P2	10.14.2022	M2	45	1	B
4340030	P2	11.13.2022	M4	31	1	C	12.01.2022
4340050	P1	10.13.2022	M4	31	1	B

Can you help me?

Best

tackytechtom · ‎12-12-2022

Hi @IF ,

I apologise! There was a typo in the M code...

I updated the solution above. Feel free to try it once more 🙂

/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/

Did I answer your question❓➡️ Please, mark my post as a solution ✔️

Also happily accepting Kudos 🙂

Feel free to connect with me on LinkedIn!

#proudtobeasuperuser

View solution in original post

tackytechtom · ‎12-07-2022

Hi @IF ,

How about this:

Before:

After:

Here the code in Power Query M that you can paste into the advanced editor (if you do not know, how to exactly do this, please check out this quick walkthrough)

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZExDsIwDEWvgjJXlr+dUBiBuVL3qve/BnFoTEjp0sGO7OTJ/zvLEqJG5nQLQ5iRE0BIJCySi8mSqLVzvOwU4vFzvQ4VlhaODpeOtDBI7j2M/3DlLB45Lg3CjjBBHYk56QGjxaDsDBZv6DReO43lqcNfjaNt55fFjoWz3PuLaYOfnVhu5x0YrL+BMwPPbHQTub4B", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Name = _t, DateStart = _t, Type = _t, Days = _t, Qty = _t, Classification = _t, DateClose = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Name", type text}, {"DateStart", type text}, {"Type", type text}, {"Days", Int64.Type}, {"Qty", Int64.Type}, {"Classification", type text}, {"DateClose", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"AllRows", each _, type table [ID=nullable number, Name=nullable text, DateStart=nullable text, Type=nullable text, Days=nullable number, Qty=nullable number, Classification=nullable text, DateClose=nullable text]}, {"CountRows", each Table.RowCount(_), Int64.Type}}),
    #"Filtered Rows 1" = Table.SelectRows(#"Grouped Rows", each ([CountRows] > 1)),
    #"Expanded AllRows 1" = Table.ExpandTableColumn(#"Filtered Rows 1", "AllRows", {"Name", "DateStart", "Type", "Days", "Qty", "Classification", "DateClose"}, {"AllRows.Name", "AllRows.DateStart", "AllRows.Type", "AllRows.Days", "AllRows.Qty", "AllRows.Classification", "AllRows.DateClose"}),
    #"Filtered Rows 2" = Table.SelectRows(#"Expanded AllRows 1", each ([AllRows.Classification] = "B")),
    #"Filtered Rows 3" = Table.SelectRows(#"Grouped Rows", each ([CountRows] = 1)),
    #"Expanded AllRows 2" = Table.ExpandTableColumn(#"Filtered Rows 3", "AllRows", {"Name", "DateStart", "Type", "Days", "Qty", "Classification", "DateClose"}, {"AllRows.Name", "AllRows.DateStart", "AllRows.Type", "AllRows.Days", "AllRows.Qty", "AllRows.Classification", "AllRows.DateClose"}),
    #"Appended Query" = Table.Combine({#"Expanded AllRows 2", #"Filtered Rows 2"}),
    #"Renamed Columns" = Table.RenameColumns(#"Appended Query",{{"AllRows.Name", "Name"}, {"AllRows.DateStart", "DateStart"}, {"AllRows.Type", "Type"}, {"AllRows.Days", "Days"}, {"AllRows.Qty", "Qty"}, {"AllRows.Classification", "Classification"}, {"AllRows.DateClose", "DateClose"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"CountRows"})
in
    #"Removed Columns"

The idea is to do a group by and filter for the IDs that have more than one row. For these rows, we filter for classification = B. Then we append that resultset with all rows that just have one row in the group by. In the end it's a bit of cleaning like renaming columns and removing the count (support) column.

I'd suggest to paste in the M code in a new query and check out the steps on the right hand side.

Let me know if you have questions 🙂

/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/

Did I answer your question❓➡️ Please, mark my post as a solution ✔️

Also happily accepting Kudos 🙂

Feel free to connect with me on LinkedIn!

#proudtobeasuperuser

IF · ‎12-12-2022

Hello,

Thanks for the help. I have a question about the start date. I didn't understand why the result is missing the start date data in some cases.

tackytechtom · ‎12-12-2022

Hi @IF ,

I apologise! There was a typo in the M code...

I updated the solution above. Feel free to try it once more 🙂

/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/

Did I answer your question❓➡️ Please, mark my post as a solution ✔️

Also happily accepting Kudos 🙂

Feel free to connect with me on LinkedIn!

#proudtobeasuperuser

IF · ‎12-23-2022

Perfect!

remove duplicates based on a filter

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024