Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
ElliotP
Post Prodigy
Post Prodigy

Unpivoting multiple columns

Morning,

 

Updated 9/6

 

 

I currently have a large number of rows which are the result of un-nesting from a json file. I would like to be unpivot two groups of rows as so instead of there simply being one long row per TransactionID there are multiple rows with the same TransactionID but different ItemName category groups.

 

At the moment, there are seven ItemName category groups (ItemName1, ItemName2) which have the same columns names, structures and types but are only differentiated by their respective ItemName, for example ItemName1, ItemName2, ItemName3.

 

I would like there to be a maximum of 7 rows for each permanent grouping, which have the same permanent column groups and only different values from the different ItemName clusters. After we've completed this in powerquery, I would then like to remove the rows which contain a "null" ItemName*ItemID value. So there could be up to seven rows where the permanent grouping of columns (will list below) remain the same, but the values in the variable columns (will list below) are filled by the values from the seven different ItemName clusters.

 

Permanent Columns (same for each row) in the unpivot:

TransactionIDlist_merchant_idlist_creator_idDateTimelist_tender_typelist_tender_namelist_tender_idlist_tender_total_money_amountlist_tender_tendered_money_amountlist_tender_change_back_money_amountlist_tender_refunded_money_amountlist_refunds_typelist_refunds_created_atlist_refunds_processed_atlist_refunds_reasonlist_refunds_refunded_money_amountlist_refunds_payment_idlist_refunds_merchant_idlist_refunds_refunded_processing_fee_money_amountlist_refunds_refunded_additive_tax_money_amountlist_refunds_refunded_inclusive_tax_money_amountlist_refunds_refunded_tip_money_amountlist_refunds_refunded_discount_money_amount

These are the first 23 columns in the excel file.

 

Variable Columns (same column names with subtle difference of number for grouping, but the values which are different for each unpivot'ed rows):

ItemName1NotesItemName1QuantityItemName1ItemName1CategoryItemName1SkuItemName1ItemIDItemName1VariationIDItem Name1DiscountAmountSales1ItemName1DiscountAmountItemName1Modifier1NameItemName1Modifier1AmountItemName1Modifier1IDlist_itemizations_itemization_typeItemName1Modifier2NameItemName1Modifier2AmountItemName1Modifier2IDItemName1Modifier3NameItemName1Modifier3AmountItemName1Modifier3IDItemName1Modifier4NameItemName1Modifier4AmountItemName1Modifier4IDItemName1Modifier5NameItemName1Modifier5Amount

ItemName1Modifier5ID

 

 

These are the next 26 columns in the excel file.

 

I have attached the pbix and the excel file.

 

In the excel file, Sheet One contains the table structure and the data examples. Sheet Two contains an example of how I would like the data to be pivoted to.

 

Pbix and Excel File Link: https://1drv.ms/f/s!At8Q-ZbRnAj8hl4QQ9aqeygF46VF

 

@MarcelBeug has posted a lot about this and I feel his solutions are the right idea and direction. I'll post links and ideas later tonight.

 

1 ACCEPTED SOLUTION

You can easily combine the different queries into 1:

 

let
    FixedParts = List.Transform({1..7}, each "Modifier"&Text.From(_))&{
        "ItemName",
        "Item Name",
        "Notes",
        "Quantity",
        "Category",
        "Sku",
        "ItemID",
        "VariationID",
        "Sales",
        "DiscountAmount",
        "Discount",
        "Name",
        "Amount",
        "ID",
        "list_itemizations_itemization_type"},

    Source = Excel.Workbook(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\Elliot 20170609\unpivotdatastructure.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
    #"Replaced Value" = Table.ReplaceValue(#"Promoted Headers","NULL",null,Replacer.ReplaceValue,Table.ColumnNames(#"Promoted Headers")),

    First23columnNames = List.FirstN(Table.ColumnNames(#"Replaced Value"),23),
    Unpivot = Table.UnpivotOtherColumns(#"Replaced Value", First23columnNames, "Attribute", "Value"),

    VariableParts = Table.AddColumn(Unpivot, "VariableParts", each List.Select(Splitter.SplitTextByAnyDelimiter(FixedParts)([Attribute]), each _ <> "")),
    #"Added Custom" = Table.AddColumn(VariableParts, "FixedPart", each Text.Combine(List.Select(Splitter.SplitTextByEachDelimiter([VariableParts])([Attribute]), each _ <> ""))),
    #"Trimmed Text" = Table.TransformColumns(#"Added Custom",{{"VariableParts", Text.Combine, type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Trimmed Text",{"Attribute"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[FixedPart]), "FixedPart", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each [VariableParts] >= "1" and [VariableParts] <= "7")
in
    #"Filtered Rows"

 

Once your source data is correct, you probably won't need the last step anymore.

Specializing in Power Query Formula Language (M)

View solution in original post

12 REPLIES 12
MarkS
Resolver IV
Resolver IV

Hi,

The value of 25 in those formulas is the number of columns that were unpivoted.  Change as data requires.  Of course if there are variable columns as suggested this approach probably won't work (all depends how your raw data is organized).

MarcelBeug
Community Champion
Community Champion

Well, it may be a heavy topic, but it is still fascinating and I made some progress.

There are some points of attention left, especially: duplicated column names in the input.

 

First, I replaced all NULL's with null in the Sheet1 query which I named query Sheet1_NULL_To_null. I hope this is OK as it cleans up a bit. The code (with my file path) is now:

 

let
    Source = Excel.Workbook(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\Elliot 20170609\unpivotdatastructure.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
    #"Replaced Value" = Table.ReplaceValue(#"Promoted Headers","NULL",null,Replacer.ReplaceValue,Table.ColumnNames(#"Promoted Headers"))
in
    #"Replaced Value"

I also removed the #"Changed Type" step as this will give errors if columns will change (in fact it already gave an error).

 

 

Second, I unpivoted all except the first 23 columns in query Unpivot:

let
    Source = Sheet1_NULL_To_null,
    First23columnNames = List.FirstN(Table.ColumnNames(Source),23),
    Unpivot = Table.UnpivotOtherColumns(Source, First23columnNames, "Attribute", "Value")
in
    Unpivot

 

Third, I created a query FixedParts that returns all fixed parts in the column names:

 

List.Transform({1..7}, each "Modifier"&Text.From(_))&{
        "ItemName",
        "Item Name",
        "Notes",
        "Quantity",
        "Category",
        "Sku",
        "ItemID",
        "VariationID",
        "Sales",
        "DiscountAmount",
        "Discount",
        "Name",
        "Amount",
        "ID",
        "list_itemizations_itemization_type"}

 

Next, I created a query Pivot:, that splits the "Attribute" column in a fixed part and a variable part.

This is quite an interesting query bcause of the way Splitter functions are used to split the fixed parts and the variable parts of the column names.

Example column name "ItemName1Quantity" is split on any delimiter (which are the fixed parts), which includes "ItemName" and "Quantity", so this leaves "1" (as the delimiters are not part of the result after splitting). Next, the column name is split again on each delimiter, with the variable parts (the "1" in this example) as delimiter, so this will take out the "1", leaving "ItemName" and "Quantity" which are then combined to "ItemNameQuantity" as fixed column name..

 

And the query is also interesting because it produces the end result. 

 

let
    Source = Unpivot,
    VariableParts = Table.AddColumn(Source, "VariableParts", each List.Select(Splitter.SplitTextByAnyDelimiter(FixedParts)([Attribute]), each _ <> "")),
    #"Added Custom" = Table.AddColumn(VariableParts, "FixedPart", each Text.Combine(List.Select(Splitter.SplitTextByEachDelimiter([VariableParts])([Attribute]), each _ <> ""))),
    #"Trimmed Text" = Table.TransformColumns(#"Added Custom",{{"VariableParts", Text.Combine, type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Trimmed Text",{"Attribute"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[FixedPart]), "FixedPart", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each [VariableParts] >= "1" and [VariableParts] <= "7")
in
    #"Filtered Rows"

Actually, the variable part should only contain the numbers 1-7, but the source data has duplicate column names:

 

 

Some fields for Item1 are duplicated, some of them in this picture (but there are more duplicates):

Duplicate columns 1.png

These fields are automatically removed during the unpivot step, because of the NULL values that I replaced with null.

 

But field list_itemizations_itemization_type appears 7 times. I think these should have had group numbers 1-7:

Duplicate columns 2.png

 

Consequently, these fields were given suffixes ("_1", "_2" etcetera) during the #"Promoted Headers" step in the very first query.

This is how it looks like, just before the last step in the last query (query Pivot😞

 

field without group number.png

 

Next steps for you to try and copy the queries above to your environment, so you can follow all the steps, then clean up your input data, see if it all works and provide feedback.

Specializing in Power Query Formula Language (M)

@MarcelBeugI will later tonight after work, thank you so much. The columns with duplicate names are a result of my mistake in not renamining them, they should have 1,2,3,etc following the names. The modifier columns should also follow the naming convention, sorry.

 

If I'm reading your post correctly, there are multiple queries which reference each other, as opposed to just one large query?

You can easily combine the different queries into 1:

 

let
    FixedParts = List.Transform({1..7}, each "Modifier"&Text.From(_))&{
        "ItemName",
        "Item Name",
        "Notes",
        "Quantity",
        "Category",
        "Sku",
        "ItemID",
        "VariationID",
        "Sales",
        "DiscountAmount",
        "Discount",
        "Name",
        "Amount",
        "ID",
        "list_itemizations_itemization_type"},

    Source = Excel.Workbook(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\Elliot 20170609\unpivotdatastructure.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
    #"Replaced Value" = Table.ReplaceValue(#"Promoted Headers","NULL",null,Replacer.ReplaceValue,Table.ColumnNames(#"Promoted Headers")),

    First23columnNames = List.FirstN(Table.ColumnNames(#"Replaced Value"),23),
    Unpivot = Table.UnpivotOtherColumns(#"Replaced Value", First23columnNames, "Attribute", "Value"),

    VariableParts = Table.AddColumn(Unpivot, "VariableParts", each List.Select(Splitter.SplitTextByAnyDelimiter(FixedParts)([Attribute]), each _ <> "")),
    #"Added Custom" = Table.AddColumn(VariableParts, "FixedPart", each Text.Combine(List.Select(Splitter.SplitTextByEachDelimiter([VariableParts])([Attribute]), each _ <> ""))),
    #"Trimmed Text" = Table.TransformColumns(#"Added Custom",{{"VariableParts", Text.Combine, type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Trimmed Text",{"Attribute"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[FixedPart]), "FixedPart", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each [VariableParts] >= "1" and [VariableParts] <= "7")
in
    #"Filtered Rows"

 

Once your source data is correct, you probably won't need the last step anymore.

Specializing in Power Query Formula Language (M)

Evening,

 

Sorry for not getting back faster. I've spent the past few days working with the solution; it's amazing. Thank you so much, I greatly appreciate it. It works phenominally and has solved a major headache for me. @MarcelBeug you again have come up with an incredible solution, thank you so much, you have no idea how it means to me.

MarkS
Resolver IV
Resolver IV

Hello,

One way to accomplish this:

  1. Select the first 23 columns and Unpivot Other Columns
  2. Add an index Column
  3. Add a custom Column with the formula =Number.Mod([Index],25)  
  4. Add another custom Column with the formula = Number.IntegerDivide([Index],25)+1
  5. Remove the Index and Attribute columns
  6. select the column created in step 3 and Pivot Column - Select the Value column for the values and in the addvanced section select don't aggregate

You will now need to edit the headers that are showing as the numbers 0 through 24 to the correct headers.  (you could also make a seperate table with the numbers 0 through 24 and the headers and join that table on the column created in 3 and use this new column in the pivot step)

Actually,

thinking about it the formula in step 4 should be =Number.Mod(Number.IntegerDivide([Index],25),7)+1 so that it cycles from 1 to 7 for each record

MarcelBeug
Community Champion
Community Champion

@MarkS Unfortunatley this case is far more complicated as there are variable numbers of columns per group.

 

@ElliotP The key to the solution here is to split the column names in a fixed part (which will be the new column names) and the variable part (which will become the row identifiers).

 

You can of course work with a fixed list of the fixed parts, but I don't think that would be a good idea:

there are some inaccuracies already ("Item Name1DiscountAmount" with a space, if "Sales1" is the fixed part then you'll never get a match for "Sales2", I guess "Sales1" should be "Sales")

and moreover I suspect the list won't be fixed over time.

 

The other approach would be to specify the exact rules how the actual column names can be split into a fixed part and a variable part. My first idea was to take out the digits 0-9, but unfortunately there are also fixed parts that include digits.

These specifications must be very precise, otherwise they can't be automated, so you need to eloborate on your phrase "same column names with subtle difference of number for grouping".

 

In my view this case is now beyond the kind of support you can expect from a free forum.

 

Specializing in Power Query Formula Language (M)

@MarkSThanks for the response, I really appreciate it. Is the value (25) in step four and five dependent upon the number of columns under consideration? As in, can I slide it up if there are slightly more?

 

@MarcelBeugThanks for the response. The list is fixed and has a fixed number of parts (though some row values may be null which can be easily removed with power query).

 

Sales1, Sales2, Sales3, etc would become Sales for example. The same would go for each column to accomodate the 7 possible row variations.

 

The list is fixed and will not increase or decrease over time. It will remain groups of seven identical packages if you will.

 

I don't mind if we can change it from 'Sales1' to 'SalesA' if you think that's accomodative, that was purely an easy organisational thing.

 

Sorry for not being clear in regards to "same column names with subtle difference of number of grouping". What I mean, is that there are for each transaction 7 identical groupings of the ItemName about the Item and the 5 Modifier groups of 3. In essence,

ItemNameNotesItemNameQuantityItemNameItemNameCategoryItemNameSkuItemNameItemIDItemNameVariationIDItem Name1DiscountAmountSales1ItemNameDiscountAmount

&

ItemNameModifier1NameItemNameModifier1AmountItemNameModifier1IDlist_itemizations_itemization_typeItemNameModifier2NameItemNameModifier2AmountItemNameModifier2IDItemNameModifier3NameItemNameModifier3AmountItemNameModifier3IDItemNameModifier4NameItemNameModifier4AmountItemNameModifier4IDItemNameModifier5NameItemNameModifier5AmountItemNameModifier5ID

Will be the same, but there will be 7 combinations of both.

 

I agree, this is a complicated solution and one which is well beyond that of a free forum and a community that gives advice/solutions for free. I greatly appreciate the community & especially @MarcelBeug for your incredible solutions in the past and the beyond amazing solution to the UTC time zone swapping issue. If we can find a solution that would be ideal, but I understand and appreciate everything.

ElliotP
Post Prodigy
Post Prodigy

Updated to include pbix and excel file.

I know nothing about API, data factory, Azure Blobs, Azure tables or modifiers.

A phrase like "a transaction with 4 items and 2,0,0 and 1 modifier applied to the items" is complete abacadabra to me.

 

What I would expect from your description, is 2 fields - TransactionID and DateTime - plus a mulitple of 7 fields, each with the same name except for a number that identifies each group of 7 fields.

 

Instead, in your example there are not 2 but 109 (!) fields without digits in the field name.

 

So I won't be able to help you.

Maybe if you come up with a far better explanation and realistic examples of the required transformations.

 

For instance, how should the result look like, for all fields with number 1 in the column name:

Fields with number 1.png

Specializing in Power Query Formula Language (M)

@MarcelBeugUpdated Original Post to be a bit more clear. I've included what I would like to achieve, an example, an excel file demonstrating as well as a pbix. I cleaned up the headings and removed the unneccessary ones so there aren't as many columns.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.