Solved: Re: Pivot columns that have repeated row values - ...

Steveyd78 · ‎01-25-2021

This might end up being a duplicate post, I previously hit submit but it the system threw me out (apoloiges if it is)

So in a nutshell I have data from a Drupal webform dump which I am having to split into different tables and then pivot so that I can format the columns properly (age as a number, address as text etc). The (simplified) raw data looks something like this:

User ID	Field	Value	Page ID
1	Address	123 ABC St	User_info
1	Phone	555-5555555	User_info
1	Age	55	User_info
2	Address	456 DEF St	User_info
2	Phone	123-4567890	User_info
2	Age	33	User_info

I need it to look like this:

User ID	Address	Phone	Age	Page ID
1	123 ABC St	555-5555555	55	User_info
2	456 DEF St	123-4567890	33	User_info

I have read quite a few solutions that need to add indexes, group, add sub-indexes etc. I can follow a fair amount and the closest I get (with no cigar) is something like this:

User ID	Address	Phone	Age	Page ID
1	123 ABC St
		555-5555555
			55
				User_info
2	you get it!

I fundamentally don't quite understand what I need to do, can somebody point me in the right direction!

Thanks in advance

Steve

AlB · ‎01-25-2021

@Steveyd78

The problem is repeated fields in your data. For instance, for sid 6 there are several repeated values for name (see below). That causes the error when pivoting. In the data you've shared, it seems like all those repetitions have the same value. For instance, below for sid 6, all rows with name = contact_registered_before have value = Yes

all rows with name = contact_registered_before_with have value = Submitted to the OFHA

If that is always the case, simply eliminating duplicates before the pivoting will solve it. Otherwise, we will first need to determine how to treat those cases. For example, if for sid 6 you had two rows with

name = contact_registered_before but one with value = Yes and one with value = No, which one will you choose. The M code for the previous example adding the step to remove the duplicates:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),

    #"Removed Duplicates" = Table.Distinct(#"Changed Type"),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Field]), "Field", "Value"),
    #"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"User ID", "Address", "Phone", "Age", "Page ID"})
in
    #"Reordered Columns"

Please mark the question solved when done and consider giving a thumbs up if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers

sid	name	value
6	contact_registered_before	Yes
6	contact_registered_before	Yes
6	contact_registered_before	Yes
6	contact_registered_before_with	Submitted to the OFHA
6	contact_registered_before_with	Submitted to the OFHA
6	contact_registered_before_with	Submitted to the OFHA
6	contact_registered_other_groups	Yes
6	contact_registered_other_groups	Yes
6	contact_registered_other_groups	Yes
6	country	Canada
6	provide_details	Fugiat voluptatem ve
6	provide_details	Fugiat voluptatem ve
6	provide_details_other_groups	Deserunt qui nisi no
6	provide_details_other_groups	Deserunt qui nisi no
6	province_canada	Northwest Territories
6	submitting_behalf	Yes

View solution in original post

Anonymous · ‎01-25-2021

let
    Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
    #"Modificato tipo" = Table.TransformColumnTypes(Origine,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),
    #"Raggruppate righe" = Table.Group(#"Modificato tipo", {"User ID", "Page ID"}, {{"all", each Table.PromoteHeaders(Table.Transpose(_[[Field], [Value]]))}}),
    #"Tabella all espansa" = Table.ExpandTableColumn(#"Raggruppate righe", "all", {"Address", "Phone", "Age"}, {"Address", "Phone", "Age"})
in
    #"Tabella all espansa"

Anonymous · ‎01-25-2021

it is not clear to me if you have already had a complete solution or not ... in any case try this

let
    Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
    #"Modificato tipo" = Table.TransformColumnTypes(Origine,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),
    #"Colonna trasformata tramite Pivot" = Table.Pivot(#"Modificato tipo", List.Distinct(#"Modificato tipo"[Field]), "Field", "Value", List.First)
    in #"Colonna trasformata tramite Pivot"

AlB · ‎01-25-2021

@Steveyd78

The problem is repeated fields in your data. For instance, for sid 6 there are several repeated values for name (see below). That causes the error when pivoting. In the data you've shared, it seems like all those repetitions have the same value. For instance, below for sid 6, all rows with name = contact_registered_before have value = Yes

all rows with name = contact_registered_before_with have value = Submitted to the OFHA

If that is always the case, simply eliminating duplicates before the pivoting will solve it. Otherwise, we will first need to determine how to treat those cases. For example, if for sid 6 you had two rows with

name = contact_registered_before but one with value = Yes and one with value = No, which one will you choose. The M code for the previous example adding the step to remove the duplicates:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),

    #"Removed Duplicates" = Table.Distinct(#"Changed Type"),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Field]), "Field", "Value"),
    #"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"User ID", "Address", "Phone", "Age", "Page ID"})
in
    #"Reordered Columns"

Please mark the question solved when done and consider giving a thumbs up if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers

sid	name	value
6	contact_registered_before	Yes
6	contact_registered_before	Yes
6	contact_registered_before	Yes
6	contact_registered_before_with	Submitted to the OFHA
6	contact_registered_before_with	Submitted to the OFHA
6	contact_registered_before_with	Submitted to the OFHA
6	contact_registered_other_groups	Yes
6	contact_registered_other_groups	Yes
6	contact_registered_other_groups	Yes
6	country	Canada
6	provide_details	Fugiat voluptatem ve
6	provide_details	Fugiat voluptatem ve
6	provide_details_other_groups	Deserunt qui nisi no
6	provide_details_other_groups	Deserunt qui nisi no
6	province_canada	Northwest Territories
6	submitting_behalf	Yes

Steveyd78 · ‎01-25-2021

Ok so that is fantastic and I think I have good and bad news. I think those duplicates came from a previous merge query (I know where I went wrong) so that can be fixed

There are other that will be duplicated. For example there are questions that might have multiple responses, e.g. in "self_status_issues" the respondent would be able to check up to 10 (or so) checkboxes - the dump only tags them to "self_status_issues" and not to the ID of the checkbox.

I am going to have to see if we can add additional fields in the database extract, although there maybe ways it can be handled in Power BI.

I am going to accept as solution as now I know exactly what the issue is (THANKS!!), this would have taken me a LONG time to work out!!

AlB · ‎01-25-2021

@Steveyd78

It can most likely be done in the query editor too. M is quite powerful and versatile. But it might be more efficient to do it directly at the source through SQL. I would need some example data to look into it.

Please mark the question solved when done and consider giving a thumbs up if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers

AlB · ‎01-25-2021

@Steveyd78

Post it here. Please post the data you're getting the error on, plus the M code of the transformation

Best if you can share the pbix

Please mark the question solved when done and consider giving a thumbs up if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers

AlB · ‎01-25-2021

Hi @Steveyd78

Place the following M code in a blank query to see the steps

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJMSSlKLS4GsgyNjBUcnZwVgkuAnNDi1KL4zLy0fKVYHYjCgIz8vFQgbWpqqmsKAVjVOaZDVGFIGqHYZmJqpuDi6obNNiMk24CO0gUqNbewNMBuINg2Y2NUyVgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"User ID" = _t, Field = _t, Value = _t, #"Page ID" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User ID", Int64.Type}, {"Field", type text}, {"Value", type text}, {"Page ID", type text}}),

    #"Pivoted Column" = Table.Pivot(#"Changed Type", List.Distinct(#"Changed Type"[Field]), "Field", "Value"),
    #"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"User ID", "Address", "Phone", "Age", "Page ID"})
in
    #"Reordered Columns"

Please mark the question solved when done and consider giving a thumbs up if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers

Steveyd78 · ‎01-25-2021

Let me slightly modify my last comment. I just re-did it and I DO get the pivot operation working BUT for some values I see and error and when I click on that I get:

"Expression.Error: There were too many elements in the enumeration to complete the operation.
Details:
[List]"

This is now slightly outside the scope of this post (the pivoting seems to work), you think I should post this as another thread?

Thanks again!!

Steveyd78 · ‎01-25-2021

Ok so that seems to be straightforward, when I followed the steps it looked like just like the standard pivot operations out of the user guide

However when I do it on my dataset (and I select don't aggregate) I get this error "There were too many elements in the enumeration to complete the operation". Now my dataset has quite a few more rows but apart from the columns are named differently that is about it! - was there anything that you did in your M that was different from just doing a standard pivot operation?

Steveyd78 · ‎01-25-2021

Thanks, I will work that through and see how well I understand the steps. Really apprecaited!

Pivot columns that have repeated row values - other solutions don't seem to work with my data

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024