Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Applicable88
Impactful Individual
Impactful Individual

How to get rid of rows with a duplicate, but leave the ones which are not empty

Hello,

I have machine data here which sometimes writes errors and forget to write the name of the creator of an order and only sometimes rewrite it correctly with the right content:

 

OrdeID Starttime CreatedBy
1 2021-01-01 Worker1
2 2021-01-02 Worker2
3 2021-01-03 Worker3
4 2021-01-04 Worker4
1 2021-01-01  
2 2021-01-02  
3 2021-01-03  
4 2021-01-04  
11 2021-01-05  
12 2021-01-06  
13 2021-01-07  
14 2021-01-08  
     

 

In that case it creates a duplicate, but I only want to keep the ones which has "created by" data. The problem is I also need to consider other cells which are empty in "created by" and has no correct duplicates with data. these I also need to keep. In my sample table I only want to get rid of the ones with red orderID. 

Hope someone has an idea how to get rid of the ones without deleting the other data.

Thank you very much in advance. 

Best. 

1 ACCEPTED SOLUTION
sevenhills
Super User
Super User

The way I understood the problem as you want to
keep all rows "OrdeID Starttime CreatedBy",

with group by "OrdeID Starttime" and keep the first created by without blanks as first preference. If this is true, you can do this.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTIyMDLUNQAhICc8vyg7tchQKVYnWskIWdIILmkEljRGljSGSxqDJU2QJU3gkiZgSQw7FbDbpoDdHgXsNkCEDVEMN0WIo5huhhBHMd4cIY5ivgVEPBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [OrdeID = _t, Starttime = _t, CreatedBy = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"OrdeID", Int64.Type}, {"Starttime", type date}, {"CreatedBy", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"OrdeID", Order.Ascending}, {"Starttime", Order.Ascending}, {"CreatedBy", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"OrdeID", "Starttime"}, {{"Alldatarows", each _, type table [OrdeID=nullable number, Starttime=nullable date, CreatedBy=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([Alldatarows], "Sub Index", 1, 1)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"OrdeID", "Starttime", "CreatedBy", "Sub Index"}, {"Custom.OrdeID", "Custom.Starttime", "Custom.CreatedBy", "Custom.Sub Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each [Custom.Sub Index] = 1)
in
    #"Filtered Rows"

 

 

Basically, I did sorting of data, tried creating subgroup index and removed subgroup index > 1

 

sevenhills_0-1632292833847.png

 

sevenhills_1-1632292857337.png

 

If this is what you need, rename the columns, and remove the sub index column.

 

Hope this helps! 

View solution in original post

5 REPLIES 5
sevenhills
Super User
Super User

The way I understood the problem as you want to
keep all rows "OrdeID Starttime CreatedBy",

with group by "OrdeID Starttime" and keep the first created by without blanks as first preference. If this is true, you can do this.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTIyMDLUNQAhICc8vyg7tchQKVYnWskIWdIILmkEljRGljSGSxqDJU2QJU3gkiZgSQw7FbDbpoDdHgXsNkCEDVEMN0WIo5huhhBHMd4cIY5ivgVEPBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [OrdeID = _t, Starttime = _t, CreatedBy = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"OrdeID", Int64.Type}, {"Starttime", type date}, {"CreatedBy", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"OrdeID", Order.Ascending}, {"Starttime", Order.Ascending}, {"CreatedBy", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"OrdeID", "Starttime"}, {{"Alldatarows", each _, type table [OrdeID=nullable number, Starttime=nullable date, CreatedBy=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([Alldatarows], "Sub Index", 1, 1)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"OrdeID", "Starttime", "CreatedBy", "Sub Index"}, {"Custom.OrdeID", "Custom.Starttime", "Custom.CreatedBy", "Custom.Sub Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each [Custom.Sub Index] = 1)
in
    #"Filtered Rows"

 

 

Basically, I did sorting of data, tried creating subgroup index and removed subgroup index > 1

 

sevenhills_0-1632292833847.png

 

sevenhills_1-1632292857337.png

 

If this is what you need, rename the columns, and remove the sub index column.

 

Hope this helps! 

Watsky
Solution Sage
Solution Sage

Hey @Applicable88 ,

 

Per chance you want to keep all the records but be able to create a filter to achieve your result you can try this:

 

Column 2 = 
VAR a =
    CALCULATE (
        COUNT ( 'Table'[OrdeID] ),
        FILTER ( 'Table', 'Table'[OrdeID] = EARLIER ( 'Table'[OrdeID] ) )
    )
VAR b =
    CALCULATE (
        COUNT ( 'Table'[OrdeID] ),
        ALLEXCEPT ( 'Table', 'Table'[CreatedBy] )
    )
RETURN
    IF ( a = 1, "YES", IF ( b = 1, "YES", "NO" ) )

 You'll end up with a column where you'll want to filter on YES.

 

Watsky_0-1632248409547.png


Did my answer(s) help you? Give it a kudos by clicking the Thumbs Up! ?
Did my post answer your question(s)? Mark my post as a solution. This will help others find the solution.


Did my answer(s) help you? Give it a kudos by clicking the Thumbs Up!
Did my post answer your question(s)? Mark my post as a solution. This will help others find the solution.

Proud to be a Super User!

Anonymous
Not applicable

Open Power Query. Select only the OrdeID column and go to Remove Rows and select Remove Duplicates. That should do it. I replicated your case, see result below:

Raymundo2910_0-1632243195859.png

 

Hello @Anonymous,

 

thanks for your effort, but I saw you use a step called "remove bottom rows" which implies it only works if the one needed to be removed always comes second, which is unfortunately not in my case. It can be mixed. Sometimes comes first sometimes second etc. Any possiblites to overcome thes irregularities? 

Best. 

Anonymous
Not applicable

That step was just because I created the table manually and the last row was full of nulls. This would not happen on your data source 🙂 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors