Solved: Re: How to get rid of rows with a duplicate, but l...

Applicable88 · ‎09-21-2021

Hello,

I have machine data here which sometimes writes errors and forget to write the name of the creator of an order and only sometimes rewrite it correctly with the right content:

OrdeID	Starttime	CreatedBy
1	2021-01-01	Worker1
2	2021-01-02	Worker2
3	2021-01-03	Worker3
4	2021-01-04	Worker4
1	2021-01-01
2	2021-01-02
3	2021-01-03
4	2021-01-04
11	2021-01-05
12	2021-01-06
13	2021-01-07
14	2021-01-08

In that case it creates a duplicate, but I only want to keep the ones which has "created by" data. The problem is I also need to consider other cells which are empty in "created by" and has no correct duplicates with data. these I also need to keep. In my sample table I only want to get rid of the ones with red orderID.

Hope someone has an idea how to get rid of the ones without deleting the other data.

Thank you very much in advance.

Best.

sevenhills · ‎09-21-2021

The way I understood the problem as you want to
keep all rows "OrdeID Starttime CreatedBy",

with group by "OrdeID Starttime" and keep the first created by without blanks as first preference. If this is true, you can do this.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTIyMDLUNQAhICc8vyg7tchQKVYnWskIWdIILmkEljRGljSGSxqDJU2QJU3gkiZgSQw7FbDbpoDdHgXsNkCEDVEMN0WIo5huhhBHMd4cIY5ivgVEPBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [OrdeID = _t, Starttime = _t, CreatedBy = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"OrdeID", Int64.Type}, {"Starttime", type date}, {"CreatedBy", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"OrdeID", Order.Ascending}, {"Starttime", Order.Ascending}, {"CreatedBy", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"OrdeID", "Starttime"}, {{"Alldatarows", each _, type table [OrdeID=nullable number, Starttime=nullable date, CreatedBy=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([Alldatarows], "Sub Index", 1, 1)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"OrdeID", "Starttime", "CreatedBy", "Sub Index"}, {"Custom.OrdeID", "Custom.Starttime", "Custom.CreatedBy", "Custom.Sub Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each [Custom.Sub Index] = 1)
in
    #"Filtered Rows"

Basically, I did sorting of data, tried creating subgroup index and removed subgroup index > 1

If this is what you need, rename the columns, and remove the sub index column.

Hope this helps!

View solution in original post

sevenhills · ‎09-21-2021

The way I understood the problem as you want to
keep all rows "OrdeID Starttime CreatedBy",

with group by "OrdeID Starttime" and keep the first created by without blanks as first preference. If this is true, you can do this.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTIyMDLUNQAhICc8vyg7tchQKVYnWskIWdIILmkEljRGljSGSxqDJU2QJU3gkiZgSQw7FbDbpoDdHgXsNkCEDVEMN0WIo5huhhBHMd4cIY5ivgVEPBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [OrdeID = _t, Starttime = _t, CreatedBy = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"OrdeID", Int64.Type}, {"Starttime", type date}, {"CreatedBy", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"OrdeID", Order.Ascending}, {"Starttime", Order.Ascending}, {"CreatedBy", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"OrdeID", "Starttime"}, {{"Alldatarows", each _, type table [OrdeID=nullable number, Starttime=nullable date, CreatedBy=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([Alldatarows], "Sub Index", 1, 1)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"OrdeID", "Starttime", "CreatedBy", "Sub Index"}, {"Custom.OrdeID", "Custom.Starttime", "Custom.CreatedBy", "Custom.Sub Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each [Custom.Sub Index] = 1)
in
    #"Filtered Rows"

Basically, I did sorting of data, tried creating subgroup index and removed subgroup index > 1

If this is what you need, rename the columns, and remove the sub index column.

Hope this helps!

Watsky · ‎09-21-2021

Hey @Applicable88 ,

Per chance you want to keep all the records but be able to create a filter to achieve your result you can try this:

Column 2 = 
VAR a =
    CALCULATE (
        COUNT ( 'Table'[OrdeID] ),
        FILTER ( 'Table', 'Table'[OrdeID] = EARLIER ( 'Table'[OrdeID] ) )
    )
VAR b =
    CALCULATE (
        COUNT ( 'Table'[OrdeID] ),
        ALLEXCEPT ( 'Table', 'Table'[CreatedBy] )
    )
RETURN
    IF ( a = 1, "YES", IF ( b = 1, "YES", "NO" ) )

You'll end up with a column where you'll want to filter on YES.

Did my answer(s) help you? Give it a kudos by clicking the Thumbs Up! ?
Did my post answer your question(s)? Mark my post as a solution. This will help others find the solution.

Did my answer(s) help you? Give it a kudos by clicking the Thumbs Up!
Did my post answer your question(s)? Mark my post as a solution. This will help others find the solution.

Proud to be a Super User!

Anonymous · ‎09-21-2021

Open Power Query. Select only the OrdeID column and go to Remove Rows and select Remove Duplicates. That should do it. I replicated your case, see result below:

Applicable88 · ‎09-21-2021

Hello @Anonymous,

thanks for your effort, but I saw you use a step called "remove bottom rows" which implies it only works if the one needed to be removed always comes second, which is unfortunately not in my case. It can be mixed. Sometimes comes first sometimes second etc. Any possiblites to overcome thes irregularities?

Best.

Anonymous · ‎09-22-2021

That step was just because I created the table manually and the last row was full of nulls. This would not happen on your data source 🙂

How to get rid of rows with a duplicate, but leave the ones which are not empty

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024