Anonymous
Not applicable

Automatically Rename Only Duplicate Values in a Table

I have a table that is built from user inputs. My problem comes from multiple entries by the same user (one user submits several times). I need to retain all entries for later filtering and calculations, but repeated names prevent a one-to-many relationship and get in the way of other filtering.

 

For example, if I receive this raw table:

 

User             Email              Value
John Smith       john@gmail.com     45
Mickeal Jackson  michael@gmail.com  56
Andrew Dari      andrew@gmail.com   56
John Smith       john@gmail.com     32
John Smith       john@gmail.com     19
Mickeal Jackson  michael@gmail.com  78
Helen Ravadi     helen@gmail.com    10
Ann Hong         ann@gmail.com      00
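
To make the problem concrete, here is a minimal sketch (plain Python, not Power BI itself) that checks which names in the raw table occur more than once. Those are the keys that stop the User column from being usable on the "one" side of a relationship. The function name `duplicated_keys` and the tuple layout are assumptions made for illustration, and the value "00" is written as 0.

```python
from collections import Counter

# Sample rows copied from the raw table above: (user, email, value).
rows = [
    ("John Smith", "john@gmail.com", 45),
    ("Mickeal Jackson", "michael@gmail.com", 56),
    ("Andrew Dari", "andrew@gmail.com", 56),
    ("John Smith", "john@gmail.com", 32),
    ("John Smith", "john@gmail.com", 19),
    ("Mickeal Jackson", "michael@gmail.com", 78),
    ("Helen Ravadi", "helen@gmail.com", 10),
    ("Ann Hong", "ann@gmail.com", 0),
]


def duplicated_keys(rows, key_index=0):
    """Return {key: count} for every key that occurs more than once."""
    counts = Counter(row[key_index] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


repeated = duplicated_keys(rows)
print(repeated)  # {'John Smith': 3, 'Mickeal Jackson': 2}
```

Any non-empty result means the column is not unique, so it cannot serve as the key of a one-to-many relationship as it stands.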

 

One solution is to change it to the following table to solve the problem of repeated usernames:

 

User               Email              Value
John Smith 1       john@gmail.com     45
Mickeal Jackson 1  michael@gmail.com  56
Andrew Dari        andrew@gmail.com   56
John Smith 2       john@gmail.com     32
John Smith 3       john@gmail.com     19
Mickeal Jackson 2  michael@gmail.com  78
Helen Ravadi       helen@gmail.com    10
Ann Hong           ann@gmail.com      00

 

Can someone help me automate this process? There are many users in the list.

 

By the way, if you have a better suggestion for dealing with this problem, please let me know. Thanks.

1 ACCEPTED SOLUTION

Hi,

This M code works

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"Email", type text}, {"Value", Int64.Type}}),
    // Number the rows within each user: 1, 2, 3, ...
    Partition = Table.Group(#"Changed Type", {"User"}, {{"Partition", each Table.AddIndexColumn(_, "Index",1,1), type table}}),
    #"Expanded Partition" = Table.ExpandTableColumn(Partition, "Partition", {"Email", "Value", "Index"}, {"Email", "Value", "Index"}),
    // Regroup by user to count how many rows each user has
    #"Grouped Rows" = Table.Group(#"Expanded Partition", {"User"}, {{"GroupTables", each _, type table [User=text, Email=text, Value=number, Index=number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "CountRows", each Table.RowCount([GroupTables])),
    #"Expanded GroupTables" = Table.ExpandTableColumn(#"Added Custom", "GroupTables", {"Email", "Value", "Index"}, {"Email", "Value", "Index"}),
    // Append a space and the index only when the user appears more than once
    #"Added Custom1" = Table.AddColumn(#"Expanded GroupTables", "Custom", each if [CountRows]=1 then [User] else [User] & " " & Number.ToText([Index])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"User", "Index", "CountRows"})
in
    #"Removed Columns"

Hope this helps.
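
For readers who want to follow the logic outside Power Query, the same idea can be sketched in plain Python: count how often each name occurs (the role of CountRows in the query), keep a running counter per name (the role of Index), and append the counter only when the name is repeated. This is an illustrative sketch rather than a translation of the query; the function name `rename_duplicates` is made up here, and unlike Table.Group, which as far as I can tell returns the rows grouped by user, this version keeps the original row order.

```python
from collections import Counter, defaultdict


def rename_duplicates(names):
    """Append ' 1', ' 2', ... to repeated names; leave unique names unchanged."""
    totals = Counter(names)   # how many times each name occurs overall
    seen = defaultdict(int)   # running counter per repeated name
    renamed = []
    for name in names:
        if totals[name] == 1:
            renamed.append(name)
        else:
            seen[name] += 1
            renamed.append(f"{name} {seen[name]}")
    return renamed


users = [
    "John Smith", "Mickeal Jackson", "Andrew Dari", "John Smith",
    "John Smith", "Mickeal Jackson", "Helen Ravadi", "Ann Hong",
]
for new_name in rename_duplicates(users):
    print(new_name)
```

On the sample data this prints the User column of the target table from the question, with Andrew Dari, Helen Ravadi and Ann Hong left untouched.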

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/


8 REPLIES
Ashish_Mathur
Super User

Hi,

This M code works

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"Email", type text}, {"Value", Int64.Type}}),
    Partition = Table.Group(#"Changed Type", {"User"}, {{"Partition", each Table.AddIndexColumn(_, "Index",1,1), type table}}),
    #"Expanded Partition" = Table.ExpandTableColumn(Partition, "Partition", {"Email", "Value", "Index"}, {"Email", "Value", "Index"}),
    #"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Expanded Partition", {{"Index", type text}}, "en-IN"),{"User", "Index"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged")
in
    #"Merged Columns"

Hope this helps.

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
Anonymous
Not applicable

@Ashish_Mathur 

Thanks.

Can you make your solution work only on duplicate values? Non-repeated usernames should not change.

Anonymous
Not applicable

Thanks for all your replies.

I developed this Python code to modify only the duplicate values. It is simpler 🙂

 

dup = dataset.duplicated("FullName", keep=False)  # True for every row whose name occurs more than once
dataset.loc[dup, "FullName"] = dataset.loc[dup, "FullName"] + " " + (dataset[dup].groupby("FullName").cumcount() + 1).astype(str)

 

 

However, Python has problems running in app.powerbi online! Really frustrating!

MFelix
Super User

Hi @Anonymous ,

 

I believe the best option is to build a star schema: create a table containing only user and e-mail and use it on the one side of the relationship. That way the relationships will work properly.

 

I think this is easier than changing the data. If you rename the entries, you lose the fact that John Smith has 3 entries and get John Smith 1, 2 and 3 instead.

 


Regards

Miguel Félix





Anonymous
Not applicable

I have this user table with all the names, emails and other values (shown above), and I am using it as a dimension table. But with repeated users the relationship has to be many-to-many, which makes visualization difficult. In particular, not all of a repeated user's entries are shown in a table visualization.

 

If I build the table with only names and emails, what should I do with the value columns? I also have another fact table that this user table is connected to.

 

I also tried a snowflake model. That is, I built a table with only unique names and put it on top of the user info table. It did not solve the problem.

Making another table that references your example table in the query editor is the way to go.  Below is an example query that shows the steps to do that with your example data.  Once you load this table too, you can make relationships to your original table and the other table(s) in your model from the email column (or name).  To see how it works, just create a blank query, go to Advanced Editor, and replace the text there with the M code below.

 

let
    Source = OriginalTable,
    #"Removed Other Columns" = Table.SelectColumns(Source,{"User", "Email"}),
    Custom1 = Table.Distinct(#"Removed Other Columns")
in
    Custom1
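
As a rough illustration of what this query produces and how it relates back to the original table, here is a sketch in plain Python. It is not Power BI code; the names `distinct_users` and `values_by_email` are hypothetical, and email is assumed to be the relationship key, as suggested above.

```python
# Sample rows copied from the example table: (user, email, value).
rows = [
    ("John Smith", "john@gmail.com", 45),
    ("Mickeal Jackson", "michael@gmail.com", 56),
    ("Andrew Dari", "andrew@gmail.com", 56),
    ("John Smith", "john@gmail.com", 32),
    ("John Smith", "john@gmail.com", 19),
    ("Mickeal Jackson", "michael@gmail.com", 78),
    ("Helen Ravadi", "helen@gmail.com", 10),
    ("Ann Hong", "ann@gmail.com", 0),
]


def distinct_users(rows):
    """Return unique (user, email) pairs in first-seen order."""
    seen = set()
    result = []
    for user, email, _value in rows:
        key = (user, email)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# "One" side: one row per user.
dim_users = distinct_users(rows)

# "Many" side: every submission is still present, reachable through the email key.
values_by_email = {}
for _user, email, value in rows:
    values_by_email.setdefault(email, []).append(value)

print(len(dim_users))                     # 5
print(values_by_email["john@gmail.com"])  # [45, 32, 19]
```

The point of the sketch is that the distinct table only supplies the unique key; the values stay in the original table, so no submission is lost.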

 

If this works for you, please mark it as the solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat







@mahoneypa HoosierBI on YouTube


Anonymous
Not applicable

Thanks for your suggestion. As I mentioned above, I already built such a table. See figure.

But it still does not fulfil my purpose. There is no problem counting users (applicants), but when I want to show a repeated user's information in a 'table' visualization, it shows only the first entry, not all three repetitions, for example.

 

What should I do with that?

 

Relationships.jpg

 

By the way, I also have some calculations that should be done per application (not per user). If several applications share the same name/email, these calculations will lump them all together! I also won't be able to use the user as a filter in a slicer, because selecting a user will cover all of their applications. If I have John Smith 1, John Smith 2, etc., then selection and calculations will be much easier everywhere!

 

Any idea how to rename automatically?
