
Automatically Rename Only Duplicate Values in a Table

I have a table built from user input. My problem comes from multiple entries by the same user (one user submits several times). I need to retain all entries for later filtering and calculations, but repeated names prevent a one-to-many relationship and interfere with other filtering.

 

For example, if I receive this raw table:

 

User             | Email              | Value
John Smith       | john@gmail.com     | 45
Mickeal Jackson  | michael@gmail.com  | 56
Andrew Dari      | andrew@gmail.com   | 56
John Smith       | john@gmail.com     | 32
John Smith       | john@gmail.com     | 19
Mickeal Jackson  | michael@gmail.com  | 78
Helen Ravadi     | helen@gmail.com    | 10
Ann Hong         | ann@gmail.com      | 00

 

One solution is to change it to the following table to solve the problem of repeated usernames:

 

User               | Email              | Value
John Smith 1       | john@gmail.com     | 45
Mickeal Jackson 1  | michael@gmail.com  | 56
Andrew Dari        | andrew@gmail.com   | 56
John Smith 2       | john@gmail.com     | 32
John Smith 3       | john@gmail.com     | 19
Mickeal Jackson 2  | michael@gmail.com  | 78
Helen Ravadi       | helen@gmail.com    | 10
Ann Hong           | ann@gmail.com      | 00

 

Can someone help me automate this process? There are many users in the list.

 

By the way, if you have a better suggestion for dealing with this problem, please let me know. Thanks.

1 ACCEPTED SOLUTION


Re: Automatically Rename Only Duplicate Values in a Table

Hi,

This M code works

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"Email", type text}, {"Value", Int64.Type}}),
    Partition = Table.Group(#"Changed Type", {"User"}, {{"Partition", each Table.AddIndexColumn(_, "Index",1,1), type table}}),
    #"Expanded Partition" = Table.ExpandTableColumn(Partition, "Partition", {"Email", "Value", "Index"}, {"Email", "Value", "Index"}),
    #"Grouped Rows" = Table.Group(#"Expanded Partition", {"User"}, {{"GroupTables", each _, type table [User=text, Email=text, Value=number, Index=number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "CountRows", each Table.RowCount([GroupTables])),
    #"Expanded GroupTables" = Table.ExpandTableColumn(#"Added Custom", "GroupTables", {"Email", "Value", "Index"}, {"Email", "Value", "Index"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded GroupTables", "Custom", each if [CountRows]=1 then [User] else [User] & " " & Number.ToText([Index])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"User", "Index", "CountRows"})
in
    #"Removed Columns"
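For readers more comfortable with pandas, the same steps (number the rows within each user, count the rows per user, append the number only when the count is greater than one) can be sketched as below. This is an illustration, not a translation of the M query: the DataFrame is typed in by hand from the example table in the question, and the variable names `index` and `count_rows` are simply borrowed from the M step names. A space is placed before the number, matching the desired table in the question.

```python
import pandas as pd

# Sample rows typed in from the question's example table ("00" entered as 0).
data = pd.DataFrame({
    "User": ["John Smith", "Mickeal Jackson", "Andrew Dari", "John Smith",
             "John Smith", "Mickeal Jackson", "Helen Ravadi", "Ann Hong"],
    "Email": ["john@gmail.com", "michael@gmail.com", "andrew@gmail.com", "john@gmail.com",
              "john@gmail.com", "michael@gmail.com", "helen@gmail.com", "ann@gmail.com"],
    "Value": [45, 56, 56, 32, 19, 78, 10, 0],
})

by_user = data.groupby("User")["User"]
index = by_user.cumcount() + 1          # running number within each user (the "Index" step)
count_rows = by_user.transform("size")  # number of rows per user (the "CountRows" step)

# Keep the name as-is for users that occur once; otherwise append the number.
data["Custom"] = data["User"].where(count_rows == 1, data["User"] + " " + index.astype(str))

print(data[["Custom", "Email", "Value"]])
```

In this sketch the rows stay in their original order, and users that occur only once (Andrew Dari, Helen Ravadi, Ann Hong) are left unchanged.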

Hope this helps.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/


8 REPLIES 8

Re: Automatically Rename Only Duplicate Values in a Table

Hi @mah65 ,

 

I believe the best option is to create a star schema: add a table containing only user and e-mail to use on the one side of the relationship, so that the relationships work properly.

 

I think that is easier than changing the data. If you rename, you lose the fact that John Smith has 3 entries and get John Smith 1, 2, 3 instead.

 


Regards

Miguel Félix



Re: Automatically Rename Only Duplicate Values in a Table

I have this user table that holds all the names, emails and other values (shown above), and I am using it as a dimension table. But with repeated users the relationship becomes many-to-many, which makes visualization difficult. In particular, not all repeated users are shown in a table visualization.

 

If I build the table with only names and emails, what should I do with the value columns? I also have another fact table that this user table is connected to.

 

I also tried a snowflake model: I built a table with only unique names and put it on top of the user info table. It did not solve the problem.


Re: Automatically Rename Only Duplicate Values in a Table

Making another table that references your example table in the query editor is the way to go.  Below is an example query that shows the steps to do that with your example data.  Once you load this table too, you can make relationships to your original table and the other table(s) in your model from the email column (or name).  To see how it works, just create a blank query, go to Advanced Editor, and replace the text there with the M code below.

 

let
    Source = OriginalTable,
    #"Removed Other Columns" = Table.SelectColumns(Source,{"User", "Email"}),
    Custom1 = Table.Distinct(#"Removed Other Columns")
in
    Custom1
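The same idea in pandas, as a rough sketch: keep only the user columns, drop repeated rows, and confirm that each email now appears once, which is what the one side of a one-to-many relationship needs. The `original` DataFrame is hand-typed from the example table, and `users` is a hypothetical name for the new lookup table.

```python
import pandas as pd

# Sample rows typed in from the question's example table ("00" entered as 0).
original = pd.DataFrame({
    "User": ["John Smith", "Mickeal Jackson", "Andrew Dari", "John Smith",
             "John Smith", "Mickeal Jackson", "Helen Ravadi", "Ann Hong"],
    "Email": ["john@gmail.com", "michael@gmail.com", "andrew@gmail.com", "john@gmail.com",
              "john@gmail.com", "michael@gmail.com", "helen@gmail.com", "ann@gmail.com"],
    "Value": [45, 56, 56, 32, 19, 78, 10, 0],
})

# One row per distinct (User, Email) pair, similar to Table.SelectColumns + Table.Distinct.
users = original[["User", "Email"]].drop_duplicates().reset_index(drop=True)

print(users)
print(users["Email"].is_unique)  # unique keys, so this table can sit on the "one" side

# Joining back keeps every original row; validate raises if `users` had repeated keys.
joined = original.merge(users, on=["User", "Email"], how="left", validate="many_to_one")
print(len(joined) == len(original))
```

Note that this keeps all eight entries in the original table and only deduplicates the lookup side; it does not give each submission its own label.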

 

If this works for you, please mark it as the solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat


Re: Automatically Rename Only Duplicate Values in a Table

Thanks for your suggestion. As I mentioned above, I already built such a table. See figure.

But it still does not fulfil my purpose. There is no problem in counting users (applicants), but when I want to show a repeated user's information in a 'table' visualization, it shows only the first entry, not all three repetitions, for example.

 

What should I do with that?

 

Relationships.jpg

 

By the way, I also have some calculations that should be done per application (not per user). So, if there are entries with the same name/email, these calculations will cover all of them! I also won't be able to use the user as a slicer filter, because it will cover all of that user's applications. If I have John Smith 1, John Smith 2, etc., then selection and calculations will be much easier in every field!

 

Any idea how to rename automatically?


Re: Automatically Rename Only Duplicate Values in a Table

Hi,

This M code works

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"Email", type text}, {"Value", Int64.Type}}),
    Partition = Table.Group(#"Changed Type", {"User"}, {{"Partition", each Table.AddIndexColumn(_, "Index",1,1), type table}}),
    #"Expanded Partition" = Table.ExpandTableColumn(Partition, "Partition", {"Email", "Value", "Index"}, {"Email", "Value", "Index"}),
    #"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Expanded Partition", {{"Index", type text}}, "en-IN"),{"User", "Index"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged")
in
    #"Merged Columns"

Hope this helps.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Re: Automatically Rename Only Duplicate Values in a Table

Thanks for all your replies.

I developed this Python code to modify only the duplicate values. It is simpler 🙂

 

# Flag every row whose name occurs more than once (keep=False marks all copies).
dup = dataset.duplicated('FullName', keep=False)
# Append a per-name running number (1, 2, 3, ...) to the flagged rows only.
dataset.loc[dup, 'FullName'] = dataset.loc[dup, 'FullName'] + " " + (dataset[dup].groupby('FullName').cumcount() + 1).astype(str)
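To try the snippet outside Power BI, something like the following should work. In the post, `dataset` is assumed to already exist; here it is built by hand from the example table, and the wrapper `rename_duplicates` is a hypothetical helper, not part of the original post. The last line checks that the renamed column has no repeats, so it could serve as a key.

```python
import pandas as pd

def rename_duplicates(frame, column):
    """Return a copy where repeated values in `column` get ' 1', ' 2', ... appended."""
    out = frame.copy()
    dup = out.duplicated(column, keep=False)            # True for every row of a repeated name
    counter = out[dup].groupby(column).cumcount() + 1   # 1, 2, 3, ... within each repeated name
    out.loc[dup, column] = out.loc[dup, column] + " " + counter.astype(str)
    return out

# Hand-typed stand-in for the table Power BI would pass in.
dataset = pd.DataFrame({
    "FullName": ["John Smith", "Mickeal Jackson", "Andrew Dari", "John Smith",
                 "John Smith", "Mickeal Jackson", "Helen Ravadi", "Ann Hong"],
    "Value": [45, 56, 56, 32, 19, 78, 10, 0],
})

renamed = rename_duplicates(dataset, "FullName")
print(renamed["FullName"].tolist())
print(renamed["FullName"].is_unique)
```

One caveat with any scheme of this kind: if the data already contains a literal name such as "John Smith 1", the result is no longer guaranteed to be unique.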

 

 

However, Python has problems working in app.powerbi online! Really frustrating!


Re: Automatically Rename Only Duplicate Values in a Table

@Ashish_Mathur 

Thanks.

Can you make your solution work only on duplicate values? Non-repeated usernames should not change.
