I have a table built from user inputs. My problem comes from multiple entries by the same user (one user submits several times). I need to retain all entries for later filtering and calculations, but repeated names prevent a one-to-many relationship and break other filtering.
For example, if I receive this raw table:
User | Email | Value |
John Smith | john@gmail.com | 45 |
Mickeal Jackson | michael@gmail.com | 56 |
Andrew Dari | andrew@gmail.com | 56 |
John Smith | john@gmail.com | 32 |
John Smith | john@gmail.com | 19 |
Mickeal Jackson | michael@gmail.com | 78 |
Helen Ravadi | helen@gmail.com | 10 |
Ann Hong | ann@gmail.com | 00 |
One solution is to change it to the following table to solve the problem of repeated usernames:
User | Email | Value |
John Smith 1 | john@gmail.com | 45 |
Mickeal Jackson 1 | michael@gmail.com | 56 |
Andrew Dari | andrew@gmail.com | 56 |
John Smith 2 | john@gmail.com | 32 |
John Smith 3 | john@gmail.com | 19 |
Mickeal Jackson 2 | michael@gmail.com | 78 |
Helen Ravadi | helen@gmail.com | 10 |
Ann Hong | ann@gmail.com | 00 |
Can someone help me automate this process? There are many users in the list.
By the way, if you have a better suggestion for dealing with this problem, please let me know. Thanks.
Hi,
This M code works
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"Email", type text}, {"Value", Int64.Type}}),
Partition = Table.Group(#"Changed Type", {"User"}, {{"Partition", each Table.AddIndexColumn(_, "Index",1,1), type table}}),
#"Expanded Partition" = Table.ExpandTableColumn(Partition, "Partition", {"Email", "Value", "Index"}, {"Email", "Value", "Index"}),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Expanded Partition", {{"Index", type text}}, "en-IN"),{"User", "Index"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged")
in
#"Merged Columns"
Hope this helps.
Thanks.
Can you make your solution work only on duplicate values? Non-repeated usernames should not change.
Hi,
This M code works
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"Email", type text}, {"Value", Int64.Type}}),
Partition = Table.Group(#"Changed Type", {"User"}, {{"Partition", each Table.AddIndexColumn(_, "Index",1,1), type table}}),
#"Expanded Partition" = Table.ExpandTableColumn(Partition, "Partition", {"Email", "Value", "Index"}, {"Email", "Value", "Index"}),
#"Grouped Rows" = Table.Group(#"Expanded Partition", {"User"}, {{"GroupTables", each _, type table [User=text, Email=text, Value=number, Index=number]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "CountRows", each Table.RowCount([GroupTables])),
#"Expanded GroupTables" = Table.ExpandTableColumn(#"Added Custom", "GroupTables", {"Email", "Value", "Index"}, {"Email", "Value", "Index"}),
#"Added Custom1" = Table.AddColumn(#"Expanded GroupTables", "Custom", each if [CountRows]=1 then [User] else [User] & " " & Number.ToText([Index])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"User", "Index", "CountRows"})
in
#"Removed Columns"
Hope this helps.
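For readers who find the M steps hard to follow, the same idea (count the rows per user, keep a running index per user, and append the index only when the user appears more than once) can be sketched in plain Python. This is only an illustration of the logic, not a replacement for the query: the function name `number_duplicates` and the list-of-dicts layout are assumptions made for the example, and unlike Table.Group it keeps the rows in their original order.

```python
from collections import Counter, defaultdict


def number_duplicates(rows, key="User"):
    """Return copies of rows; repeated key values get ' 1', ' 2', ... appended."""
    totals = Counter(row[key] for row in rows)  # how many rows each user has
    seen = defaultdict(int)                     # running index per user
    result = []
    for row in rows:
        name = row[key]
        seen[name] += 1
        new_row = dict(row)                     # do not mutate the input
        if totals[name] > 1:                    # leave unique users unchanged
            new_row[key] = f"{name} {seen[name]}"
        result.append(new_row)
    return result


# Sample data copied from the question.
rows = [
    {"User": "John Smith", "Email": "john@gmail.com", "Value": 45},
    {"User": "Mickeal Jackson", "Email": "michael@gmail.com", "Value": 56},
    {"User": "Andrew Dari", "Email": "andrew@gmail.com", "Value": 56},
    {"User": "John Smith", "Email": "john@gmail.com", "Value": 32},
    {"User": "John Smith", "Email": "john@gmail.com", "Value": 19},
    {"User": "Mickeal Jackson", "Email": "michael@gmail.com", "Value": 78},
    {"User": "Helen Ravadi", "Email": "helen@gmail.com", "Value": 10},
    {"User": "Ann Hong", "Email": "ann@gmail.com", "Value": 0},
]

renamed = number_duplicates(rows)
for row in renamed:
    print(row["User"], row["Email"], row["Value"], sep=" | ")
```

The Email and Value columns are untouched, so every submission is still available for later calculations.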
Thanks for all your replies.
I developed this Python code to modify only the duplicate values. It is simpler 🙂
dup = dataset.duplicated("FullName", keep=False)  # True for every row whose name repeats
dataset.loc[dup, "FullName"] = dataset.loc[dup, "FullName"] + " " + (dataset[dup].groupby("FullName").cumcount() + 1).astype(str)
However, Python scripts have problems running in the online Power BI service (app.powerbi.com)! Really frustrating!
Hi @Anonymous ,
I believe the best option is to create a star schema model: build a table containing only user and e-mail to use on the one side of the relationship, so that the relationships work properly.
I think that's easier than changing the data. If you rename, you lose the fact that John Smith has 3 entries, and you get John Smith 1, 2, 3 instead.
Regards
Miguel Félix
I have this user table that has all the names, emails, and all other values (shown above). I am using it as a dimension table. But with repeated users, the relationship has to be many-to-many, which makes visualizing difficult. In particular, not all repeated users are shown in a table visualization.
If I build the table with only names and emails, then what should I do with the value columns? I also have another fact table that this user table is connected to.
I also tried a snowflake model. That is, I built a table with only unique names and put it on top of the user info table. It did not solve the problem.
Making another table that references your example table in the query editor is the way to go. Below is an example query that shows the steps to do that with your example data. Once you load this table too, you can make relationships to your original table and the other table(s) in your model from the email column (or name). To see how it works, just create a blank query, go to Advanced Editor, and replace the text there with the M code below.
let
Source = OriginalTable,
#"Removed Other Columns" = Table.SelectColumns(Source,{"User", "Email"}),
Custom1 = Table.Distinct(#"Removed Other Columns")
in
Custom1
If this works for you, please mark it as the solution. Kudos are appreciated too. Please let me know if not.
Regards,
Pat
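To make the suggestion above concrete, here is a small plain-Python sketch of what the distinct User/Email table gives you: one row per user on the "one" side, with the original table staying untouched on the "many" side. The helper names `build_user_dimension` and `rows_for_user` are made up for this illustration, and matching on Email is an assumption about which column is used for the relationship.

```python
def build_user_dimension(rows):
    """Distinct (User, Email) pairs in first-seen order."""
    seen = set()
    dimension = []
    for row in rows:
        pair = (row["User"], row["Email"])
        if pair not in seen:
            seen.add(pair)
            dimension.append({"User": row["User"], "Email": row["Email"]})
    return dimension


def rows_for_user(rows, email):
    """All original rows that relate to one dimension row (the 'many' side)."""
    return [row for row in rows if row["Email"] == email]


# First five rows of the sample data from the question.
rows = [
    {"User": "John Smith", "Email": "john@gmail.com", "Value": 45},
    {"User": "Mickeal Jackson", "Email": "michael@gmail.com", "Value": 56},
    {"User": "Andrew Dari", "Email": "andrew@gmail.com", "Value": 56},
    {"User": "John Smith", "Email": "john@gmail.com", "Value": 32},
    {"User": "John Smith", "Email": "john@gmail.com", "Value": 19},
]

dimension = build_user_dimension(rows)
john_rows = rows_for_user(rows, "john@gmail.com")
print(len(dimension), "distinct users")
print([row["Value"] for row in john_rows])
```

The value columns stay in the original table; the small table exists only so that each user appears exactly once on one side of the relationship.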
Thanks for your suggestion. As I mentioned above, I already built such a table. See figure.
But it still does not fulfill my purpose. There is no problem in counting users (applicants), but when I want to show a repeated user's information in a 'table' visualization, it shows only the first entry, not all three repetitions, for example.
What should I do with that?
By the way, I also have some calculations that must be done per application (not per user). If several rows share the same name/email, these calculations will cover all of them! I also won't be able to pick a single application in a user slicer, because selecting the user covers all of their applications. With John Smith 1, John Smith 2, etc., selection and calculations would be much easier in every field!
Any idea how to rename automatically?
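One possible alternative to renaming, offered only as a sketch: give every submission its own key and leave the names alone. As far as I know, a table visual collapses rows whose displayed fields are identical, so a unique per-row column may also be what makes all of a user's rows show up. The column name `ApplicationID` and the function `add_application_id` are hypothetical; in Power Query the equivalent step would be an index column added with Table.AddIndexColumn, which the queries earlier in this thread already use.

```python
def add_application_id(rows, start=1):
    """Return copies of rows with a unique, sequential ApplicationID column."""
    return [
        {"ApplicationID": app_id, **row}
        for app_id, row in enumerate(rows, start=start)
    ]


# First five rows of the sample data from the question.
rows = [
    {"User": "John Smith", "Email": "john@gmail.com", "Value": 45},
    {"User": "Mickeal Jackson", "Email": "michael@gmail.com", "Value": 56},
    {"User": "Andrew Dari", "Email": "andrew@gmail.com", "Value": 56},
    {"User": "John Smith", "Email": "john@gmail.com", "Value": 32},
    {"User": "John Smith", "Email": "john@gmail.com", "Value": 19},
]

applications = add_application_id(rows)
john_ids = [a["ApplicationID"] for a in applications if a["User"] == "John Smith"]
print(john_ids)
```

With a key like this, per-application calculations and slicers can use ApplicationID, while per-user counts can still use the unchanged name or email.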