Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
This might end up being a duplicate post, I previously hit submit but it the system threw me out (apoloiges if it is)
So in a nutshell I have data from a Drupal webform dump which I am having to split into different tables and then pivot so that I can format the columns properly (age as a number, address as text etc). The (simplified) raw data looks something like this:
User ID | Field | Value | Page ID |
1 | Address | 123 ABC St | User_info |
1 | Phone | 555-5555555 | User_info |
1 | Age | 55 | User_info |
2 | Address | 456 DEF St | User_info |
2 | Phone | 123-4567890 | User_info |
2 | Age | 33 | User_info |
I need it to look like this:
User ID | Address | Phone | Age | Page ID |
1 | 123 ABC St | 555-5555555 | 55 | User_info |
2 | 456 DEF St | 123-4567890 | 33 | User_info |
I have read quite a few solutions that need to add indexes, group, add sub-indexes etc. I can follow a fair amount and the closest I get (with no cigar) is something like this:
User ID | Address | Phone | Age | Page ID |
1 | 123 ABC St | |||
555-5555555 | ||||
55 | ||||
User_info | ||||
2 | you get it! |
I fundamentally don't quite understand what I need to do, can somebody point me in the right direction!
Thanks in advance
Steve
Solved! Go to Solution.
The problem is repeated fields in your data. For instance, for sid 6 there are several repeated values for name (see below). That causes the error when pivoting. In the data you've shared, it seems like all those repetitions have the same value. For instance, below for sid 6, all rows with name = contact_registered_before have value = Yes
all rows with name = contact_registered_before_with have value = Submitted to the OFHA
If that is always the case, simply eliminating duplicates before the pivoting will solve it. Otherwise, we will first need to determine how to treat those cases. For example, if for sid 6 you had two rows with
name = contact_registered_before but one with value = Yes and one with value = No, which one will you choose. The M code for the previous example adding the step to remove the duplicates:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),
#"Removed Duplicates" = Table.Distinct(#"Changed Type"),
#"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Field]), "Field", "Value"),
#"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"User ID", "Address", "Phone", "Age", "Page ID"})
in
#"Reordered Columns"
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
sid | name | value |
6 | contact_registered_before | Yes |
6 | contact_registered_before | Yes |
6 | contact_registered_before | Yes |
6 | contact_registered_before_with | Submitted to the OFHA |
6 | contact_registered_before_with | Submitted to the OFHA |
6 | contact_registered_before_with | Submitted to the OFHA |
6 | contact_registered_other_groups | Yes |
6 | contact_registered_other_groups | Yes |
6 | contact_registered_other_groups | Yes |
6 | country | Canada |
6 | provide_details | Fugiat voluptatem ve |
6 | provide_details | Fugiat voluptatem ve |
6 | provide_details_other_groups | Deserunt qui nisi no |
6 | provide_details_other_groups | Deserunt qui nisi no |
6 | province_canada | Northwest Territories |
6 | submitting_behalf | Yes |
let
Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
#"Modificato tipo" = Table.TransformColumnTypes(Origine,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),
#"Raggruppate righe" = Table.Group(#"Modificato tipo", {"User ID", "Page ID"}, {{"all", each Table.PromoteHeaders(Table.Transpose(_[[Field], [Value]]))}}),
#"Tabella all espansa" = Table.ExpandTableColumn(#"Raggruppate righe", "all", {"Address", "Phone", "Age"}, {"Address", "Phone", "Age"})
in
#"Tabella all espansa"
it is not clear to me if you have already had a complete solution or not ... in any case try this
let
Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
#"Modificato tipo" = Table.TransformColumnTypes(Origine,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),
#"Colonna trasformata tramite Pivot" = Table.Pivot(#"Modificato tipo", List.Distinct(#"Modificato tipo"[Field]), "Field", "Value", List.First)
in #"Colonna trasformata tramite Pivot"
The problem is repeated fields in your data. For instance, for sid 6 there are several repeated values for name (see below). That causes the error when pivoting. In the data you've shared, it seems like all those repetitions have the same value. For instance, below for sid 6, all rows with name = contact_registered_before have value = Yes
all rows with name = contact_registered_before_with have value = Submitted to the OFHA
If that is always the case, simply eliminating duplicates before the pivoting will solve it. Otherwise, we will first need to determine how to treat those cases. For example, if for sid 6 you had two rows with
name = contact_registered_before but one with value = Yes and one with value = No, which one will you choose. The M code for the previous example adding the step to remove the duplicates:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),
#"Removed Duplicates" = Table.Distinct(#"Changed Type"),
#"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Field]), "Field", "Value"),
#"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"User ID", "Address", "Phone", "Age", "Page ID"})
in
#"Reordered Columns"
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
sid | name | value |
6 | contact_registered_before | Yes |
6 | contact_registered_before | Yes |
6 | contact_registered_before | Yes |
6 | contact_registered_before_with | Submitted to the OFHA |
6 | contact_registered_before_with | Submitted to the OFHA |
6 | contact_registered_before_with | Submitted to the OFHA |
6 | contact_registered_other_groups | Yes |
6 | contact_registered_other_groups | Yes |
6 | contact_registered_other_groups | Yes |
6 | country | Canada |
6 | provide_details | Fugiat voluptatem ve |
6 | provide_details | Fugiat voluptatem ve |
6 | provide_details_other_groups | Deserunt qui nisi no |
6 | provide_details_other_groups | Deserunt qui nisi no |
6 | province_canada | Northwest Territories |
6 | submitting_behalf | Yes |
Ok so that is fantastic and I think I have good and bad news. I think those duplicates came from a previous merge query (I know where I went wrong) so that can be fixed
There are other that will be duplicated. For example there are questions that might have multiple responses, e.g. in "self_status_issues" the respondent would be able to check up to 10 (or so) checkboxes - the dump only tags them to "self_status_issues" and not to the ID of the checkbox.
I am going to have to see if we can add additional fields in the database extract, although there maybe ways it can be handled in Power BI.
I am going to accept as solution as now I know exactly what the issue is (THANKS!!), this would have taken me a LONG time to work out!!
It can most likely be done in the query editor too. M is quite powerful and versatile. But it might be more efficient to do it directly at the source through SQL. I would need some example data to look into it.
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
Post it here. Please post the data you're getting the error on, plus the M code of the transformation
Best if you can share the pbix
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
Hi @Steveyd78
Place the following M code in a blank query to see the steps
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),
#"Pivoted Column" = Table.Pivot(#"Changed Type", List.Distinct(#"Changed Type"[Field]), "Field", "Value"),
#"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"User ID", "Address", "Phone", "Age", "Page ID"})
in
#"Reordered Columns"
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
Let me slightly modify my last comment. I just re-did it and I DO get the pivot operation working BUT for some values I see and error and when I click on that I get:
"Expression.Error: There were too many elements in the enumeration to complete the operation.
Details:
[List]"
This is now slightly outside the scope of this post (the pivoting seems to work), you think I should post this as another thread?
Thanks again!!
Ok so that seems to be straightforward, when I followed the steps it looked like just like the standard pivot operations out of the user guide
However when I do it on my dataset (and I select don't aggregate) I get this error "There were too many elements in the enumeration to complete the operation". Now my dataset has quite a few more rows but apart from the columns are named differently that is about it! - was there anything that you did in your M that was different from just doing a standard pivot operation?
Thanks, I will work that through and see how well I understand the steps. Really apprecaited!
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.