Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
SuperSayan
Resolver I
Resolver I

Manage duplicates with different values

Hello everyone,

 

I'm struggling with a new issue.
I thought it would be simple to deal with but can't figure it out.

 

I have a type of intervention that is a string.

I have a hundred of reference but there is one specificity I have 3 references for very similar intervention and a lot of duplicates.

Below is a sample of the dataset where this reference is in the "service" column.

Basically I have:

- TM04

- TM4A

- TM4E

 

What I want to do, for every couple of "Fin réelle" and "Code implantation" I want to keep a single line of all the above value.

Even more challenging:

- if there is a TM04 and a TM4A I want to keep one row TM4A

- same if I have TM04 and TM4E, I want to keep one row TM4E.

- I can't have TM4A and TM4E at the same time.

- The only instance where I want to keep TM04 is when I have only this one.

 

For info, I'm in the process of removing the possibility to have TM04 in the future so I can just perform "regular" duplicates removal, unfortunately I can't clean the past :(.

 

Intervention data.JPG

1 ACCEPTED SOLUTION
SuperSayan
Resolver I
Resolver I

I actually managed to perform the above using @MarcelBeug proposed code on another topic! (See here)

 

The logic behind that is that I remove all duplicates based on "Service", "Fin réelle" and "Code Implantation" therefore I can only have 0, 1 or 2 Services TM04/TM4A/TM4E by "Code Implantation"/"Fin réelle"

Then I group by "Code Implantation"/"Fin réelle" and count the 3 different types of services and finally keep only the ones I want with a filter.

 

I guess I asked too quickly but maybe this can help someone else (or someone can give me a better/quicker solution to this one).

 

let
    Source = Excel.Workbook(File.Contents("C:\Work\Réorganisation STMC\PowerBI\Gestion doublons.xlsx"), null, true),
    Feuil1_Sheet = Source{[Item="Feuil1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Feuil1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Code intervention", type text}, {"Code implantation", Int64.Type}, {"Service", type text}, {"Clients", type text}, {"Code postal", Int64.Type}, {"Ville", type text}, {"Catégorie (Matériel)", type text}, {"Statut", type text}, {"Raison de statut", type text}, {"Code canton", Int64.Type}, {"Echéance max.", type datetime}, {"Référence matériel", type text}, {"Ressources", type text}, {"Fin réelle", type date}, {"Durée réelle", Int64.Type}, {"Description", type text}, {"Date création intervention", type datetime}, {"Division", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"Service", "Code Implantation", "Fin réelle"}),
    #"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Code Implantation", "Fin réelle"},
                        {{"New Column", each List.Count(List.Select([Service], each Text.StartsWith(_,"TM4") or Text.StartsWith(_,"TM04") )), Int64.Type},
                        {"AllData", each _, Value.Type(#"Removed Duplicates")}}),
     #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Service"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded AllData", each [New Column] <> 2 or [Service] <> "TM04")
in
    #"Filtered Rows"

 

 

 

 

 

View solution in original post

1 REPLY 1
SuperSayan
Resolver I
Resolver I

I actually managed to perform the above using @MarcelBeug proposed code on another topic! (See here)

 

The logic behind that is that I remove all duplicates based on "Service", "Fin réelle" and "Code Implantation" therefore I can only have 0, 1 or 2 Services TM04/TM4A/TM4E by "Code Implantation"/"Fin réelle"

Then I group by "Code Implantation"/"Fin réelle" and count the 3 different types of services and finally keep only the ones I want with a filter.

 

I guess I asked too quickly but maybe this can help someone else (or someone can give me a better/quicker solution to this one).

 

let
    Source = Excel.Workbook(File.Contents("C:\Work\Réorganisation STMC\PowerBI\Gestion doublons.xlsx"), null, true),
    Feuil1_Sheet = Source{[Item="Feuil1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Feuil1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Code intervention", type text}, {"Code implantation", Int64.Type}, {"Service", type text}, {"Clients", type text}, {"Code postal", Int64.Type}, {"Ville", type text}, {"Catégorie (Matériel)", type text}, {"Statut", type text}, {"Raison de statut", type text}, {"Code canton", Int64.Type}, {"Echéance max.", type datetime}, {"Référence matériel", type text}, {"Ressources", type text}, {"Fin réelle", type date}, {"Durée réelle", Int64.Type}, {"Description", type text}, {"Date création intervention", type datetime}, {"Division", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"Service", "Code Implantation", "Fin réelle"}),
    #"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Code Implantation", "Fin réelle"},
                        {{"New Column", each List.Count(List.Select([Service], each Text.StartsWith(_,"TM4") or Text.StartsWith(_,"TM04") )), Int64.Type},
                        {"AllData", each _, Value.Type(#"Removed Duplicates")}}),
     #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Service"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded AllData", each [New Column] <> 2 or [Service] <> "TM04")
in
    #"Filtered Rows"

 

 

 

 

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.