Solved: Unpivoting multiple columns

ElliotP · ‎06-07-2017

Morning,

Updated 9/6

I currently have a large number of rows which are the result of un-nesting from a json file. I would like to be unpivot two groups of rows as so instead of there simply being one long row per TransactionID there are multiple rows with the same TransactionID but different ItemName category groups.

At the moment, there are seven ItemName category groups (ItemName1, ItemName2) which have the same columns names, structures and types but are only differentiated by their respective ItemName, for example ItemName1, ItemName2, ItemName3.

I would like there to be a maximum of 7 rows for each permanent grouping, which have the same permanent column groups and only different values from the different ItemName clusters. After we've completed this in powerquery, I would then like to remove the rows which contain a "null" ItemName*ItemID value. So there could be up to seven rows where the permanent grouping of columns (will list below) remain the same, but the values in the variable columns (will list below) are filled by the values from the seven different ItemName clusters.

Permanent Columns (same for each row) in the unpivot:

TransactionID

list_merchant_id

list_creator_id

DateTime

list_tender_type

list_tender_name

list_tender_id

list_tender_total_money_amount

list_tender_tendered_money_amount

list_tender_change_back_money_amount

list_tender_refunded_money_amount

list_refunds_type

list_refunds_created_at

list_refunds_processed_at

list_refunds_reason

list_refunds_refunded_money_amount

list_refunds_payment_id

list_refunds_merchant_id

list_refunds_refunded_processing_fee_money_amount

list_refunds_refunded_additive_tax_money_amount

list_refunds_refunded_inclusive_tax_money_amount

list_refunds_refunded_tip_money_amount

list_refunds_refunded_discount_money_amount

These are the first 23 columns in the excel file.

Variable Columns (same column names with subtle difference of number for grouping, but the values which are different for each unpivot'ed rows):

ItemName1Notes

ItemName1Quantity

ItemName1

ItemName1Category

ItemName1Sku

ItemName1ItemID

ItemName1VariationID

Item Name1DiscountAmount

Sales1

ItemName1DiscountAmount

ItemName1Modifier1Name

ItemName1Modifier1Amount

ItemName1Modifier1ID

list_itemizations_itemization_type

ItemName1Modifier2Name

ItemName1Modifier2Amount

ItemName1Modifier2ID

ItemName1Modifier3Name

ItemName1Modifier3Amount

ItemName1Modifier3ID

ItemName1Modifier4Name

ItemName1Modifier4Amount

ItemName1Modifier4ID

ItemName1Modifier5Name

ItemName1Modifier5Amount

ItemName1Modifier5ID

These are the next 26 columns in the excel file.

I have attached the pbix and the excel file.

In the excel file, Sheet One contains the table structure and the data examples. Sheet Two contains an example of how I would like the data to be pivoted to.

Pbix and Excel File Link: https://1drv.ms/f/s!At8Q-ZbRnAj8hl4QQ9aqeygF46VF

@MarcelBeug has posted a lot about this and I feel his solutions are the right idea and direction. I'll post links and ideas later tonight.

MarcelBeug · ‎06-09-2017

You can easily combine the different queries into 1:

let
    FixedParts = List.Transform({1..7}, each "Modifier"&Text.From(_))&{
        "ItemName",
        "Item Name",
        "Notes",
        "Quantity",
        "Category",
        "Sku",
        "ItemID",
        "VariationID",
        "Sales",
        "DiscountAmount",
        "Discount",
        "Name",
        "Amount",
        "ID",
        "list_itemizations_itemization_type"},

    Source = Excel.Workbook(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\Elliot 20170609\unpivotdatastructure.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
    #"Replaced Value" = Table.ReplaceValue(#"Promoted Headers","NULL",null,Replacer.ReplaceValue,Table.ColumnNames(#"Promoted Headers")),

    First23columnNames = List.FirstN(Table.ColumnNames(#"Replaced Value"),23),
    Unpivot = Table.UnpivotOtherColumns(#"Replaced Value", First23columnNames, "Attribute", "Value"),

    VariableParts = Table.AddColumn(Unpivot, "VariableParts", each List.Select(Splitter.SplitTextByAnyDelimiter(FixedParts)([Attribute]), each _ <> "")),
    #"Added Custom" = Table.AddColumn(VariableParts, "FixedPart", each Text.Combine(List.Select(Splitter.SplitTextByEachDelimiter([VariableParts])([Attribute]), each _ <> ""))),
    #"Trimmed Text" = Table.TransformColumns(#"Added Custom",{{"VariableParts", Text.Combine, type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Trimmed Text",{"Attribute"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[FixedPart]), "FixedPart", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each [VariableParts] >= "1" and [VariableParts] <= "7")
in
    #"Filtered Rows"

Once your source data is correct, you probably won't need the last step anymore.

Specializing in Power Query Formula Language (M)

View solution in original post

MarkS · ‎06-09-2017

Hi,

The value of 25 in those formulas is the number of columns that were unpivoted. Change as data requires. Of course if there are variable columns as suggested this approach probably won't work (all depends how your raw data is organized).

MarcelBeug · ‎06-09-2017

Well, it may be a heavy topic, but it is still fascinating and I made some progress.

There are some points of attention left, especially: duplicated column names in the input.

First, I replaced all NULL's with null in the Sheet1 query which I named query Sheet1_NULL_To_null. I hope this is OK as it cleans up a bit. The code (with my file path) is now:

let
    Source = Excel.Workbook(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\Elliot 20170609\unpivotdatastructure.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
    #"Replaced Value" = Table.ReplaceValue(#"Promoted Headers","NULL",null,Replacer.ReplaceValue,Table.ColumnNames(#"Promoted Headers"))
in
    #"Replaced Value"

I also removed the #"Changed Type" step as this will give errors if columns will change (in fact it already gave an error).

Second, I unpivoted all except the first 23 columns in query Unpivot:

let
    Source = Sheet1_NULL_To_null,
    First23columnNames = List.FirstN(Table.ColumnNames(Source),23),
    Unpivot = Table.UnpivotOtherColumns(Source, First23columnNames, "Attribute", "Value")
in
    Unpivot

Third, I created a query FixedParts that returns all fixed parts in the column names:

List.Transform({1..7}, each "Modifier"&Text.From(_))&{
        "ItemName",
        "Item Name",
        "Notes",
        "Quantity",
        "Category",
        "Sku",
        "ItemID",
        "VariationID",
        "Sales",
        "DiscountAmount",
        "Discount",
        "Name",
        "Amount",
        "ID",
        "list_itemizations_itemization_type"}

Next, I created a query Pivot:, that splits the "Attribute" column in a fixed part and a variable part.

This is quite an interesting query bcause of the way Splitter functions are used to split the fixed parts and the variable parts of the column names.

Example column name "ItemName1Quantity" is split on any delimiter (which are the fixed parts), which includes "ItemName" and "Quantity", so this leaves "1" (as the delimiters are not part of the result after splitting). Next, the column name is split again on each delimiter, with the variable parts (the "1" in this example) as delimiter, so this will take out the "1", leaving "ItemName" and "Quantity" which are then combined to "ItemNameQuantity" as fixed column name..

And the query is also interesting because it produces the end result.

let
    Source = Unpivot,
    VariableParts = Table.AddColumn(Source, "VariableParts", each List.Select(Splitter.SplitTextByAnyDelimiter(FixedParts)([Attribute]), each _ <> "")),
    #"Added Custom" = Table.AddColumn(VariableParts, "FixedPart", each Text.Combine(List.Select(Splitter.SplitTextByEachDelimiter([VariableParts])([Attribute]), each _ <> ""))),
    #"Trimmed Text" = Table.TransformColumns(#"Added Custom",{{"VariableParts", Text.Combine, type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Trimmed Text",{"Attribute"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[FixedPart]), "FixedPart", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each [VariableParts] >= "1" and [VariableParts] <= "7")
in
    #"Filtered Rows"

Actually, the variable part should only contain the numbers 1-7, but the source data has duplicate column names:

Some fields for Item1 are duplicated, some of them in this picture (but there are more duplicates):

These fields are automatically removed during the unpivot step, because of the NULL values that I replaced with null.

But field list_itemizations_itemization_type appears 7 times. I think these should have had group numbers 1-7:

Consequently, these fields were given suffixes ("_1", "_2" etcetera) during the #"Promoted Headers" step in the very first query.

This is how it looks like, just before the last step in the last query (query Pivot😞

Next steps for you to try and copy the queries above to your environment, so you can follow all the steps, then clean up your input data, see if it all works and provide feedback.

Specializing in Power Query Formula Language (M)

ElliotP · ‎06-09-2017

@MarcelBeugI will later tonight after work, thank you so much. The columns with duplicate names are a result of my mistake in not renamining them, they should have 1,2,3,etc following the names. The modifier columns should also follow the naming convention, sorry.

If I'm reading your post correctly, there are multiple queries which reference each other, as opposed to just one large query?

MarcelBeug · ‎06-09-2017

You can easily combine the different queries into 1:

let
    FixedParts = List.Transform({1..7}, each "Modifier"&Text.From(_))&{
        "ItemName",
        "Item Name",
        "Notes",
        "Quantity",
        "Category",
        "Sku",
        "ItemID",
        "VariationID",
        "Sales",
        "DiscountAmount",
        "Discount",
        "Name",
        "Amount",
        "ID",
        "list_itemizations_itemization_type"},

    Source = Excel.Workbook(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\Elliot 20170609\unpivotdatastructure.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
    #"Replaced Value" = Table.ReplaceValue(#"Promoted Headers","NULL",null,Replacer.ReplaceValue,Table.ColumnNames(#"Promoted Headers")),

    First23columnNames = List.FirstN(Table.ColumnNames(#"Replaced Value"),23),
    Unpivot = Table.UnpivotOtherColumns(#"Replaced Value", First23columnNames, "Attribute", "Value"),

    VariableParts = Table.AddColumn(Unpivot, "VariableParts", each List.Select(Splitter.SplitTextByAnyDelimiter(FixedParts)([Attribute]), each _ <> "")),
    #"Added Custom" = Table.AddColumn(VariableParts, "FixedPart", each Text.Combine(List.Select(Splitter.SplitTextByEachDelimiter([VariableParts])([Attribute]), each _ <> ""))),
    #"Trimmed Text" = Table.TransformColumns(#"Added Custom",{{"VariableParts", Text.Combine, type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Trimmed Text",{"Attribute"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[FixedPart]), "FixedPart", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each [VariableParts] >= "1" and [VariableParts] <= "7")
in
    #"Filtered Rows"

Once your source data is correct, you probably won't need the last step anymore.

Specializing in Power Query Formula Language (M)

ElliotP · ‎06-13-2017

Evening,

Sorry for not getting back faster. I've spent the past few days working with the solution; it's amazing. Thank you so much, I greatly appreciate it. It works phenominally and has solved a major headache for me. @MarcelBeug you again have come up with an incredible solution, thank you so much, you have no idea how it means to me.

MarkS · ‎06-08-2017

Hello,

One way to accomplish this:

Select the first 23 columns and Unpivot Other Columns
Add an index Column
Add a custom Column with the formula =Number.Mod([Index],25)
Add another custom Column with the formula = Number.IntegerDivide([Index],25)+1
Remove the Index and Attribute columns
select the column created in step 3 and Pivot Column - Select the Value column for the values and in the addvanced section select don't aggregate

You will now need to edit the headers that are showing as the numbers 0 through 24 to the correct headers. (you could also make a seperate table with the numbers 0 through 24 and the headers and join that table on the column created in 3 and use this new column in the pivot step)

MarkS · ‎06-08-2017

Actually,

thinking about it the formula in step 4 should be =Number.Mod(Number.IntegerDivide([Index],25),7)+1 so that it cycles from 1 to 7 for each record

MarcelBeug · ‎06-08-2017

@MarkS Unfortunatley this case is far more complicated as there are variable numbers of columns per group.

@ElliotP The key to the solution here is to split the column names in a fixed part (which will be the new column names) and the variable part (which will become the row identifiers).

You can of course work with a fixed list of the fixed parts, but I don't think that would be a good idea:

there are some inaccuracies already ("Item Name1DiscountAmount" with a space, if "Sales1" is the fixed part then you'll never get a match for "Sales2", I guess "Sales1" should be "Sales")

and moreover I suspect the list won't be fixed over time.

The other approach would be to specify the exact rules how the actual column names can be split into a fixed part and a variable part. My first idea was to take out the digits 0-9, but unfortunately there are also fixed parts that include digits.

These specifications must be very precise, otherwise they can't be automated, so you need to eloborate on your phrase "same column names with subtle difference of number for grouping".

In my view this case is now beyond the kind of support you can expect from a free forum.

Specializing in Power Query Formula Language (M)

ElliotP · ‎06-09-2017

@MarkSThanks for the response, I really appreciate it. Is the value (25) in step four and five dependent upon the number of columns under consideration? As in, can I slide it up if there are slightly more?

@MarcelBeugThanks for the response. The list is fixed and has a fixed number of parts (though some row values may be null which can be easily removed with power query).

Sales1, Sales2, Sales3, etc would become Sales for example. The same would go for each column to accomodate the 7 possible row variations.

The list is fixed and will not increase or decrease over time. It will remain groups of seven identical packages if you will.

I don't mind if we can change it from 'Sales1' to 'SalesA' if you think that's accomodative, that was purely an easy organisational thing.

Sorry for not being clear in regards to "same column names with subtle difference of number of grouping". What I mean, is that there are for each transaction 7 identical groupings of the ItemName about the Item and the 5 Modifier groups of 3. In essence,

ItemNameNotes

ItemNameQuantity

ItemName

ItemNameCategory

ItemNameSku

ItemNameItemID

ItemNameVariationID

Item Name1DiscountAmount

Sales1

ItemNameDiscountAmount

&

ItemNameModifier1Name

ItemNameModifier1Amount

ItemNameModifier1ID

list_itemizations_itemization_type

ItemNameModifier2Name

ItemNameModifier2Amount

ItemNameModifier2ID

ItemNameModifier3Name

ItemNameModifier3Amount

ItemNameModifier3ID

ItemNameModifier4Name

ItemNameModifier4Amount

ItemNameModifier4ID

ItemNameModifier5Name

ItemNameModifier5Amount

ItemNameModifier5ID

Will be the same, but there will be 7 combinations of both.

I agree, this is a complicated solution and one which is well beyond that of a free forum and a community that gives advice/solutions for free. I greatly appreciate the community & especially @MarcelBeug for your incredible solutions in the past and the beyond amazing solution to the UTC time zone swapping issue. If we can find a solution that would be ideal, but I understand and appreciate everything.

ElliotP · ‎06-08-2017

Updated to include pbix and excel file.

MarcelBeug · ‎06-08-2017

I know nothing about API, data factory, Azure Blobs, Azure tables or modifiers.

A phrase like "a transaction with 4 items and 2,0,0 and 1 modifier applied to the items" is complete abacadabra to me.

What I would expect from your description, is 2 fields - TransactionID and DateTime - plus a mulitple of 7 fields, each with the same name except for a number that identifies each group of 7 fields.

Instead, in your example there are not 2 but 109 (!) fields without digits in the field name.

So I won't be able to help you.

Maybe if you come up with a far better explanation and realistic examples of the required transformations.

For instance, how should the result look like, for all fields with number 1 in the column name:

Specializing in Power Query Formula Language (M)

ElliotP · ‎06-08-2017

@MarcelBeugUpdated Original Post to be a bit more clear. I've included what I would like to achieve, an example, an excel file demonstrating as well as a pbix. I cleaned up the headings and removed the unneccessary ones so there aren't as many columns.

Unpivoting multiple columns

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly