
jPinhao
Helper II

Flattening multiple related rows in Power Query

We are loading some JSON data, which holds some arrays with varying property-value pairs. Due to the JSON structure, once you start expanding the fields you end up with something along these lines:

 

id | key1 | key2 | key3 | key4
0  | a    | null | null | null 
0  | null | b    | null | null
0  | null | null | c    | null
1  | a    | null | null | null
1  | null | b    | null | null
1  | null | null | c    | null
1  | null | null | null | d

We'd like to flatten these rows by ID to end up with something like:

id | key1 | key2 | key3 | key4
0  | a    | b    | c    | null 
1  | a    | b    | c    | d

The simplest way I found to solve this is by unpivoting the key columns and then pivoting them back. However, in some cases we have quite a few more columns and thousands of rows in a quite sparse table, and from previous trials I found that unpivot/pivot can be a big performance hit.
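For reference, the unpivot/pivot route looks roughly like this (a minimal sketch, assuming the previous step is named Source and "id" is the key column):

let
    Unpivoted = Table.UnpivotOtherColumns(Source, {"id"}, "Attribute", "Value"),
    // unpivoting drops the nulls, so each id keeps only its real values;
    // pivoting back then lands them on a single row per id
    Pivoted = Table.Pivot(Unpivoted, List.Distinct(Unpivoted[Attribute]), "Attribute", "Value")
in
    Pivoted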

 

Is there a better way to solve this issue? Either in Power Query or with an R script?

 

 

EDIT: I believe I found a way to do this with Table.Group:

Table.Group(sourceTable, {"id"}, {{"key1", each List.RemoveNulls([key1]){0},  type none}, {"key2", ...},  ...})

I'd like to do this across all columns (except ID), or from a List of column names - how would I go about making Table.Group work from a list of values dynamically?
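One way to build that aggregation list dynamically (a sketch of what the accepted solution below arrives at, assuming every column except id should be collapsed and that a group with no value should stay null):

Table.Group(sourceTable, {"id"},
    List.Transform(
        List.Skip(Table.ColumnNames(sourceTable), 1),
        (col) => {col, (t) => List.First(List.RemoveNulls(Table.Column(t, col)), null), type any}))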


16 REPLIES
ImkeF
Super User

A possible solution with dynamic column headers is this:

 

let
    Source = YourTable,
    #"Grouped Rows" = Table.Group(Source, {"id"}, {{"FillUp", each Table.FirstN(Table.FillUp(_,Table.ColumnNames(_)), 1), type table}}),
    #"Expanded FillUp" = Table.ExpandTableColumn(#"Grouped Rows", "FillUp", List.Skip(Table.ColumnNames(Source),1), List.Skip(Table.ColumnNames(Source),1))
in
    #"Expanded FillUp"

 

It pushes all items to the first row and then just keeps that.
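A minimal sketch of that FillUp step on the id-0 rows of the sample, to make the mechanics visible:

let
    Sample = Table.FromRecords({
        [key1 = "a",  key2 = null, key3 = null],
        [key1 = null, key2 = "b",  key3 = null],
        [key1 = null, key2 = null, key3 = "c"]
    }),
    // FillUp replaces each null with the next non-null value below it,
    // so the first row becomes [key1 = "a", key2 = "b", key3 = "c"]
    FilledUp = Table.FillUp(Sample, Table.ColumnNames(Sample))
in
    Table.FirstN(FilledUp, 1)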

Imke Feldmann (The BIccountant)


I'm late to the party here, but I was wondering if you could help me. I'm trying to use the functions OP posted above. Unfortunately, I cannot figure out how to call them in my query.

 

I have a list containing my "groupBy" columns

I have a list containing my "columns" I'd like summed

 

I just can't figure out the "aggregateFunction" argument.

 

Do you know how to actually use these functions in a query to List.Sum the dynamic columns? If I can implement these functions, it would save me a lot of steps and an expensive unpivot.

Sounds to me like what you, @jfclark27, are asking for is a bit different. Does this code do the job?

 

let
    GroupColumns = {"Group"},
    SumColumn = {"Col1", "Col2"},
    AggregationFunctions = List.Transform(SumColumn, each {_, (x)=> List.Sum(Table.Column(x,_)), type number}),
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUTIEYmOlWB0IzwiITcA8JyDLHIhN4TwLIDZTio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Group = _t, Col1 = _t, Col2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Group", type text}, {"Col1", Int64.Type}, {"Col2", Int64.Type}}),
    DynamicAggregation = Table.Group(#"Changed Type", GroupColumns, AggregationFunctions)
in
    DynamicAggregation

The tricky part is how to create the AggregationFunctions step:

 

List.Transform(SumColumn, each {_, (x)=> List.Sum(Table.Column(x,_)), type number}) 

 

There you have to work with different environments: the _ represents each element of your list of columns to be aggregated ("SumColumn") and is evaluated in the AggregationFunctions step, while the x represents the table that will be passed into the function once it is called in the DynamicAggregation step.
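The same line can be written without each, which makes the two environments explicit (same behavior, just with hypothetical parameter names instead of _ and x):

AggregationFunctions = List.Transform(SumColumn,
    (columnName) =>                 // evaluated now, once per column name in SumColumn
        {columnName,
         (groupTable) =>            // evaluated later, once per group, by Table.Group
            List.Sum(Table.Column(groupTable, columnName)),
         type number})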

 

If you are interested to learn more about Power Query's environment-concept, I recommend this article-series: http://ssbi-blog.de/technical-topics-english/the-environment-concept-in-m-for-power-query-and-power-...

Imke Feldmann (The BIccountant)


You are so awesome. I adapted your code to my query, and it works perfectly!!!

 

I do have a couple of columns that will need a MAX aggregation, but I'm pretty sure I can do another List.Transform, then List.Combine, then run the new list through Table.Group.
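A sketch of that idea, assuming hypothetical lists SumColumns and MaxColumns holding the respective column names:

SumAggs = List.Transform(SumColumns, each {_, (x) => List.Sum(Table.Column(x, _)), type number}),
MaxAggs = List.Transform(MaxColumns, each {_, (x) => List.Max(Table.Column(x, _)), type number}),
// one combined spec list, one Table.Group pass
Grouped = Table.Group(#"Changed Type", GroupColumns, List.Combine({SumAggs, MaxAggs}))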

 

I look forward to a new pot of coffee and the rabbit hole you have shown me. Time to see how deep it goes 😃

Looks like you've got it 🙂

 

Does this solution actually run significantly faster than the unpivot-and-pivot-back alternative?

Imke Feldmann (The BIccountant)


Yes, it does run faster. I'm using Power Query from Excel 365. The Excel workbook is a template that I will eventually distribute to my estimators, who will use it to prepare bid proposals for commercial construction projects. The original template was made over 20 years ago; it is not dynamic and has been abused over the years by estimators trying to add functionality to it. So I've tasked myself with upgrading it to the 21st century. (I still believe I shouldn't be using Excel at all.)

 

Speed is very much an issue. On bid day we will run multiple scenarios: adjusting labor costs and crew size, playing with profit and overhead, etc. Pivoting, merging, and expanding do slow down refreshes. The fact table is 50-60 columns and averages around 500 rows, plus 3 dimension tables. Not a big fact table, but I need refreshes to be as close to a tenth of a second as possible, or estimators will refuse to use it.

 

The nice thing about your solution is that the larger the fact table, the more benefit I get from grouping first. In the tests I've run so far, whether I have 700 rows or 50, I'm able to group down to around 25-30 rows before I have to start shaping.

 

The main structure of my new template is complete. Now, I'm performance tuning. Unfortunately, I'm new to PQ, so this has been a very slow process.

Thanks for confirming @jfclark27!

 

Performance tuning can be a pain. I've collected some tips for it here: https://www.thebiccountant.com/speedperformance-aspects/

 

Imke Feldmann (The BIccountant)


Thanks for the replies! I actually managed to solve this on my own in the end 🙂

 

@ImkeF - your solution is interesting. I'm guessing FillUp finds the bottom-most non-null value and fills any rows above with it?

 

My solution: I wrote a function that finds and returns the first non-null value in a list (or a default if all are null) and uses that as the aggregator. I then build, from the original list of column names, a list that will run the operation on each named column:

 

//FirstNotNull
let
    Source = (sourceList as list) =>
        let
            firstNotNull = List.First(List.RemoveNulls(sourceList), "Not Applicable")
        in
            firstNotNull
in
    Source

//DynamicTableGroupColumns
let
    Source = (sourceTable as table, columns as list, aggregateFunction as function) =>
        let
            result = List.Transform(columns, each 
                                            // build lists with {columnName, aggregateFunction}
                                            let
                                                //save current _ (column name) to use in next each statement
                                                columnName = _,
                                                columnToFunctionList =  {columnName, each 
                                                                            //_ will be the grouping table as it's called by Table.Group
                                                                            aggregateFunction(Table.Column(_, columnName))}
                                            in
                                                columnToFunctionList)
        in
            result
in
    Source

//DynamicTableGroup
let
    Source = (sourceTable as table, groupBy as list, columns as list, aggregateFunction as function) =>
        let 
            result = Table.Group(sourceTable , groupBy, DynamicTableGroupColumns(sourceTable, columns, aggregateFunction))
        in
            result
in
    Source
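Hypothetical usage, assuming the three functions above are saved as queries named FirstNotNull, DynamicTableGroupColumns, and DynamicTableGroup, and that every column after id should be flattened:

Flattened = DynamicTableGroup(
    Source,
    {"id"},
    List.Skip(Table.ColumnNames(Source), 1),
    FirstNotNull)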

 

Any comments on one method being better than the other? Will your method of filling up into a single column and then expanding the relevant fields be more performant than preparing a list of lists to feed to Table.Group?

 

EDIT: @ImkeF just timed the 2 queries: filling up into one column and then expanding took 2min20s, while my approach took 58s! Yesterday I also timed doing an unpivot/pivot over all columns, and that took about 1min45s. I'm not sure how the unpivot/pivot scales with more columns and rows, but I'd assume our 2 methods would scale similarly.

 

Feel free to use the set of functions I put up in case you find use for them to speed up any queries! Or let me know if you don't see similar results 🙂

Hi @jPinhao, that's pretty cool!

Wasn't aware that FillUp is even slower than pivoting 🙂

 

You can further play around with List.Buffer or Table.Buffer to see if this speeds it up even more.
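For example, a sketch of where a buffer could go, reusing the functions above (whether it actually helps is workload-dependent):

let
    // pin the input in memory so downstream steps read from the buffer
    // instead of re-evaluating the source
    Buffered = Table.Buffer(Source),
    Grouped = Table.Group(Buffered, {"id"},
        DynamicTableGroupColumns(Buffered, List.Skip(Table.ColumnNames(Buffered), 1), FirstNotNull))
in
    Grouped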

 

Imke Feldmann (The BIccountant)


Yea, I thought it was curious too 🙂 But again, I'm not sure that approach would scale better than pivoting; it might be a case of each approach being better in particular scenarios.

 

I do intend to play with buffering at some point to see where it can help improve performance. Do you know of any general rules for when using Buffer will help?

No, most often it is trial & error.

Only with List.Generate will I always use it, on the input tables or lists to the function.

Imke Feldmann (The BIccountant)


Maybe it's just too early in the morning, but I can't yet visualize how to flatten this data.

 

What I start with is as follows:

 

Column10   | Column3 | Column14
18/05/2017 | As75    | 489.044189453125
18/05/2017 | As75    | 529.010314941406
18/05/2017 | As75    | 40914.140625
18/05/2017 | As75    | 3145.38793945313
18/05/2017 | As75    | 43844.9296875
18/05/2017 | Ca44    | 2365.14038085938
18/05/2017 | Ca44    | 20024.193359375
18/05/2017 | Ca44    | 6499.56396484375
18/05/2017 | Ca44    | 49550.2265625
18/05/2017 | Ca44    | 125394.3984375
18/05/2017 | Cu65    | 818.863037109375
18/05/2017 | Cu65    | 120588.8828125
18/05/2017 | Cu65    | 5401.640625
18/05/2017 | Cu65    | 1566.38427734375
18/05/2017 | Cu65    | 119667.9921875

 

What I would like to do is turn it into 

 

Date       | As75             | Ca44             | Cu65
18/05/2017 | 489.044189453125 | 2365.14038085938 | 818.863037109375
18/05/2017 | 529.010314941406 | 20024.193359375  | 120588.8828125
18/05/2017 | 40914.140625     | 6499.56396484375 | 5401.640625
18/05/2017 | 3145.38793945313 | 49550.2265625    | 1566.38427734375
18/05/2017 | 43844.9296875    | 125394.3984375   | 119667.9921875

 

Any ideas?

That's actually not trivial if you want to make it dynamic. Check out this code:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fZE7DsNQCATv4hopfBYelFGOYbnIGaLcP9hJOj9XNKNhYdd1kbyx35RlLLTcX8N7IIsYkCy4ifqy0Tno2qCwCQoCjikILgHtyIWtNU6Wo+zYa3ObJUClFTlOdI8n0EMtfF9pyellOQeZtcOVWWMXwkAVeVgFElcgyp1JNfz03B/Vj+1DyWome8d+bEpShrEN4Um8LyjKns2m5mlnP8zBQjFp4q/yiG4COoZdhhOpiEFVKkcV2wc=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column10 = _t, Column3 = _t, Column14 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column10", type date}, {"Column3", type text}, {"Column14", type number}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    #"Added to Column" = Table.TransformColumns(#"Added Index", {{"Index", each Number.Mod(_, List.Count(#"Added Index"[Column3])/List.Count(List.Distinct(#"Added Index"[Column3]))), type number}}),
    #"Pivoted Column" = Table.Pivot(#"Added to Column", List.Distinct(#"Added to Column"[Column3]), "Column3", "Column14"),
    #"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in
    #"Removed Columns"

 

If you don't know how to apply this code, check out this video: https://www.youtube.com/watch?v=S9xlq5KUZ60

Imke Feldmann (The BIccountant)


There is a more elegant version for it, which should also perform faster:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fZExDsMwCEXv4hlFgAHDWPUYUYaeoer96ypul0Anhv/gf2DfG/mGujHSaNBuz6GziAegtANyWXnKVMqCQQJU93cShR71gO4iEP0K3B8is3A3TQ2+OiLPBInDAkwiQK3UJVQRuE5ArD0EBBPiZZ8lnBw8czhlYlSfhJeEChJYtuSaoGbQ636iMBsQ6wrxJwN7yawUaKwJs45xJhEe48qsn/7SMLfjeAM=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column10 = _t, Column3 = _t, Column14 = _t]),
    // collect each (date, key) group's values as a list
    Group = Table.Group(Source, {"Column10", "Column3"}, {{"All", each _[Column14], type list}}),
    // per date, lay the per-key lists side by side as columns named after the keys
    ToColumns = Table.Group(Group, {"Column10"}, {{"All", each Table.FromColumns(_[All], _[Column3]), type table}}),
    Expand = Table.ExpandTableColumn(ToColumns, "All", List.Distinct(Group[Column3]))
in
    Expand

Imke Feldmann (The BIccountant)



v-haibl-msft
Employee

@jPinhao

 

Please try the following Power Query in the Advanced Editor.

 

let
    Source = Excel.Workbook(File.Contents("C:\11032016\Flattening multiple related rows in Power Query.xlsx"), null, true),
    Table1_Table = Source{[Item="Table1",Kind="Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Table1_Table,{{"id", Int64.Type}, {"key1", type text}, {"key2", type text}, {"key3", type text}, {"key4", type text}}),
    #"Combine" = Combiner.CombineTextByDelimiter(""),
    #"GroupRows" = Table.Group( #"Changed Type", {"id"}, {{"key1", each Combine([key1]), type text}, {"key2", each Combine([key2]), type text}, {"key3", each Combine([key3]), type text}, {"key4", each Combine([key4]), type text}} )
in
    #"GroupRows"


 

Best Regards,

Herbert
