
jPinhao
Helper II

Flattening multiple related rows in Power Query

We are loading some JSON data, which holds some arrays with varying property-value pairs. Due to the JSON structure, once you start expanding the fields you end up with something along these lines:

 

id | key1 | key2 | key3 | key4
0  | a    | null | null | null 
0  | null | b    | null | null
0  | null | null | c    | null
1  | a    | null | null | null
1  | null | b    | null | null
1  | null | null | c    | null
1  | null | null | null | d

We'd like to flatten these rows by ID to end up with something like:

id | key1 | key2 | key3 | key4
0  | a    | b    | c    | null 
1  | a    | b    | c    | d

The simplest way I found to solve this is by unpivoting the key columns and then pivoting them back. However, in some cases we have quite a few more columns and thousands of rows in a quite sparse table, and from previous trials I found that unpivot/pivot can be a big performance hit.
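For reference, the unpivot/pivot route looks roughly like this (a minimal sketch, assuming the previous step is named Source and "id" is the key column):

let
    Unpivoted = Table.UnpivotOtherColumns(Source, {"id"}, "Attribute", "Value"),
    // unpivoting drops the nulls, so each id keeps only its real values;
    // pivoting back then lands them on a single row per id
    Pivoted = Table.Pivot(Unpivoted, List.Distinct(Unpivoted[Attribute]), "Attribute", "Value")
in
    Pivoted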

 

Is there a better way to solve this issue? Either in Power Query or with an R script?

 

 

EDIT: I believe I found a way to do this with Table.Group:

Table.Group(sourceTable, {"id"}, {{"key1", each List.RemoveNulls([key1]){0},  type none}, {"key2", ...},  ...})

I'd like to do this across all columns (except ID), or from a List of column names - how would I go about making Table.Group work from a list of values dynamically?
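One way to build that aggregation list dynamically (a sketch of what the accepted solution below arrives at, assuming every column except id should be collapsed and that a group with no value should stay null):

Table.Group(sourceTable, {"id"},
    List.Transform(
        List.Skip(Table.ColumnNames(sourceTable), 1),
        (col) => {col, (t) => List.First(List.RemoveNulls(Table.Column(t, col)), null), type any}))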


16 REPLIES
ImkeF
Super User

A possible solution with dynamic column headers is this:

 

let
    Source = YourTable,
    #"Grouped Rows" = Table.Group(Source, {"id"}, {{"FillUp", each Table.FirstN(Table.FillUp(_,Table.ColumnNames(_)), 1), type table}}),
    #"Expanded FillUp" = Table.ExpandTableColumn(#"Grouped Rows", "FillUp", List.Skip(Table.ColumnNames(Source),1), List.Skip(Table.ColumnNames(Source),1))
in
    #"Expanded FillUp"

 

It pushes all items to the first row and then just keeps that.
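A minimal sketch of that FillUp step on the id-0 rows of the sample, to make the mechanics visible:

let
    Sample = Table.FromRecords({
        [key1 = "a",  key2 = null, key3 = null],
        [key1 = null, key2 = "b",  key3 = null],
        [key1 = null, key2 = null, key3 = "c"]
    }),
    // FillUp replaces each null with the next non-null value below it,
    // so the first row becomes [key1 = "a", key2 = "b", key3 = "c"]
    FilledUp = Table.FillUp(Sample, Table.ColumnNames(Sample))
in
    Table.FirstN(FilledUp, 1)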

Imke Feldmann (The BIccountant)


I'm late to the party here, but I was wondering if you could help me. I'm trying to use the functions OP posted above. Unfortunately, I cannot figure out how to call them in my query.

 

I have a list containing my "groupBy" columns

I have a list containing my "columns" I'd like summed

 

I just can't figure out the "aggregateFunction" argument.

 

Do you know how to actually use these functions in a query to List.Sum the dynamic columns? If I can implement these functions, it would save me a lot of steps and an expensive unpivot.

Sounds to me like what you, @jfclark27, are asking for is a bit different. Does this code do the job?

 

let
    GroupColumns = {"Group"},
    SumColumn = {"Col1", "Col2"},
    AggregationFunctions = List.Transform(SumColumn, each {_, (x)=> List.Sum(Table.Column(x,_)), type number}),
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUTIEYmOlWB0IzwiITcA8JyDLHIhN4TwLIDZTio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Group = _t, Col1 = _t, Col2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Group", type text}, {"Col1", Int64.Type}, {"Col2", Int64.Type}}),
    DynamicAggregation = Table.Group(#"Changed Type", GroupColumns, AggregationFunctions)
in
    DynamicAggregation

The tricky part is how to create the AggregationFunctions step:

 

List.Transform(SumColumn, each {_, (x)=> List.Sum(Table.Column(x,_)), type number}) 

 

There you have to work with different environments: the _ represents each element of your list of columns to be aggregated ("SumColumn") and is evaluated in the AggregationFunctions step, while the x represents the table that will be passed into the function once it is called in the DynamicAggregation step.
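The same line can be written without each, which makes the two environments explicit (same behavior, just with hypothetical parameter names instead of _ and x):

AggregationFunctions = List.Transform(SumColumn,
    (columnName) =>                 // evaluated now, once per column name in SumColumn
        {columnName,
         (groupTable) =>            // evaluated later, once per group, by Table.Group
            List.Sum(Table.Column(groupTable, columnName)),
         type number})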

 

If you are interested to learn more about Power Query's environment-concept, I recommend this article-series: http://ssbi-blog.de/technical-topics-english/the-environment-concept-in-m-for-power-query-and-power-...

Imke Feldmann (The BIccountant)


You are so awesome. I adapted your code to my query, and it works perfectly!!!

 

I do have a couple of columns that will need a MAX aggregation, but I'm pretty sure I can do another List.Transform, then List.Combine, then run the new list through Table.Group.
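A sketch of that idea, assuming hypothetical lists SumColumns and MaxColumns holding the respective column names:

SumAggs = List.Transform(SumColumns, each {_, (x) => List.Sum(Table.Column(x, _)), type number}),
MaxAggs = List.Transform(MaxColumns, each {_, (x) => List.Max(Table.Column(x, _)), type number}),
// one combined spec list, one Table.Group pass
Grouped = Table.Group(#"Changed Type", GroupColumns, List.Combine({SumAggs, MaxAggs}))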

 

I look forward to a new pot of coffee and the rabbit hole you have shown me. Time to see how deep it goes 😃

Looks like you've got it 🙂

 

Does this solution actually run significantly faster than the unpivot-and-pivot-back alternative?

Imke Feldmann (The BIccountant)


Yes, it does run faster. I'm using Power Query from Excel 365. The Excel workbook is a template that I will eventually distribute to my estimators, who will use it to prepare bid proposals for commercial construction projects. The original template was made over 20 years ago; it is not dynamic and has been abused over the years by estimators trying to add functionality to it. So I've tasked myself with upgrading it to the 21st century. (I still believe I shouldn't be using Excel at all.)

 

Speed is very much an issue. On bid day we will run multiple scenarios: adjusting labor costs and crew size, playing with profit and overhead, etc. Pivoting, merging, and expanding do slow down refreshes. The fact table is 50-60 columns and averages around 500 rows, plus 3 dimension tables. Not a big fact table, but I need refreshes to be as close to a tenth of a second as possible, or estimators will refuse to use it.

 

The nice thing about your solution is that the larger the fact table, the more benefit I get from grouping first. In the tests I've run so far, whether I have 700 rows or 50, I'm able to group down to around 25-30 rows before I have to start shaping.

 

The main structure of my new template is complete. Now, I'm performance tuning. Unfortunately, I'm new to PQ, so this has been a very slow process.

Thanks for confirming @jfclark27!

 

Performance tuning can be a pain. I've collected some tips for it here: https://www.thebiccountant.com/speedperformance-aspects/

 

Imke Feldmann (The BIccountant)


Thanks for the replies! I actually managed to solve this on my own in the end 🙂

 

@ImkeF - your solution is interesting. I'm guessing FillUp finds the bottom-most non-null value and fills any rows above with it?

 

My solution: I wrote a function that finds and returns the first non-null value in a list (or a default if all are null) and uses that as the aggregator. I then build, from the original list of column names, a list that will run the operation on each named column:

 

//FirstNotNull
let
    Source = (sourceList as list) =>
        let
            firstNotNull = List.First(List.RemoveNulls(sourceList), "Not Applicable")
        in
            firstNotNull
in
    Source

//DynamicTableGroupColumns
let
    Source = (sourceTable as table, columns as list, aggregateFunction as function) =>
        let
            result = List.Transform(columns, each 
                                            // build lists with {columnName, aggregateFunction}
                                            let
                                                //save current _ (column name) to use in next each statement
                                                columnName = _,
                                                columnToFunctionList =  {columnName, each 
                                                                            //_ will be the grouping table as it's called by Table.Group
                                                                            aggregateFunction(Table.Column(_, columnName))}
                                            in
                                                columnToFunctionList)
        in
            result
in
    Source

//DynamicTableGroup
let
    Source = (sourceTable as table, groupBy as list, columns as list, aggregateFunction as function) =>
        let 
            result = Table.Group(sourceTable , groupBy, DynamicTableGroupColumns(sourceTable, columns, aggregateFunction))
        in
            result
in
    Source
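Hypothetical usage, assuming the three functions above are saved as queries named FirstNotNull, DynamicTableGroupColumns, and DynamicTableGroup, and that every column after id should be flattened:

Flattened = DynamicTableGroup(
    Source,
    {"id"},
    List.Skip(Table.ColumnNames(Source), 1),
    FirstNotNull)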

 

Any comments on one method being better than the other? Will your method of filling up into a single column and then expanding the relevant fields be more performant than preparing a list of lists to feed to Table.Group?

 

EDIT: @ImkeF just timed the 2 queries: filling up into one column and then expanding took 2min20s, while my approach took 58s! Yesterday I also timed doing an unpivot/pivot over all columns, and that took about 1min45s. I'm not sure how the unpivot/pivot scales with more columns and rows, but I'd assume our 2 methods would scale similarly.

 

Feel free to use the set of functions I put up in case you find use for them to speed up any queries! Or let me know if you don't see similar results 🙂

Hi @jPinhao, that's pretty cool!

Wasn't aware that FillUp is even slower than pivoting 🙂

 

You can further play around with List.Buffer or Table.Buffer to see if this speeds it up even more.
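For example, a sketch of where a buffer could go, reusing the functions above (whether it actually helps is workload-dependent):

let
    // pin the input in memory so downstream steps read from the buffer
    // instead of re-evaluating the source
    Buffered = Table.Buffer(Source),
    Grouped = Table.Group(Buffered, {"id"},
        DynamicTableGroupColumns(Buffered, List.Skip(Table.ColumnNames(Buffered), 1), FirstNotNull))
in
    Grouped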

 

Imke Feldmann (The BIccountant)


Yea, I thought it was curious too 🙂 But again, I'm not sure that approach would scale better than pivoting; it might be a case of each approach being better in particular scenarios.

 

I do intend to play with buffering at some point to see where it can help improve performance. Do you know of any general rules for when using Buffer will help?

No, most often it is trial & error.

Only with List.Generate will I always use it, on the input tables or lists to the function.

Imke Feldmann (The BIccountant)


Maybe it's just too early in the morning, but I can't yet visualize how to flatten this data.

 

What I start with is as follows:

 

Column10   | Column3 | Column14
18/05/2017 | As75    | 489.044189453125
18/05/2017 | As75    | 529.010314941406
18/05/2017 | As75    | 40914.140625
18/05/2017 | As75    | 3145.38793945313
18/05/2017 | As75    | 43844.9296875
18/05/2017 | Ca44    | 2365.14038085938
18/05/2017 | Ca44    | 20024.193359375
18/05/2017 | Ca44    | 6499.56396484375
18/05/2017 | Ca44    | 49550.2265625
18/05/2017 | Ca44    | 125394.3984375
18/05/2017 | Cu65    | 818.863037109375
18/05/2017 | Cu65    | 120588.8828125
18/05/2017 | Cu65    | 5401.640625
18/05/2017 | Cu65    | 1566.38427734375
18/05/2017 | Cu65    | 119667.9921875

 

What I would like to do is turn it into 

 

Date       | As75             | Ca44             | Cu65
18/05/2017 | 489.044189453125 | 2365.14038085938 | 818.863037109375
18/05/2017 | 529.010314941406 | 20024.193359375  | 120588.8828125
18/05/2017 | 40914.140625     | 6499.56396484375 | 5401.640625
18/05/2017 | 3145.38793945313 | 49550.2265625    | 1566.38427734375
18/05/2017 | 43844.9296875    | 125394.3984375   | 119667.9921875

 

Any ideas?

That's actually not trivial if you want to make it dynamic. Check out this code:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fZE7DsNQCATv4hopfBYelFGOYbnIGaLcP9hJOj9XNKNhYdd1kbyx35RlLLTcX8N7IIsYkCy4ifqy0Tno2qCwCQoCjikILgHtyIWtNU6Wo+zYa3ObJUClFTlOdI8n0EMtfF9pyellOQeZtcOVWWMXwkAVeVgFElcgyp1JNfz03B/Vj+1DyWome8d+bEpShrEN4Um8LyjKns2m5mlnP8zBQjFp4q/yiG4COoZdhhOpiEFVKkcV2wc=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column10 = _t, Column3 = _t, Column14 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column10", type date}, {"Column3", type text}, {"Column14", type number}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    #"Added to Column" = Table.TransformColumns(#"Added Index", {{"Index", each Number.Mod(_, List.Count(#"Added Index"[Column3])/List.Count(List.Distinct(#"Added Index"[Column3]))), type number}}),
    #"Pivoted Column" = Table.Pivot(#"Added to Column", List.Distinct(#"Added to Column"[Column3]), "Column3", "Column14"),
    #"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in
    #"Removed Columns"

 

If you don't know how to apply this code, check out this video: https://www.youtube.com/watch?v=S9xlq5KUZ60

Imke Feldmann (The BIccountant)


There is a more elegant version for it, which should also perform faster:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fZExDsMwCEXv4hlFgAHDWPUYUYaeoer96ypul0Anhv/gf2DfG/mGujHSaNBuz6GziAegtANyWXnKVMqCQQJU93cShR71gO4iEP0K3B8is3A3TQ2+OiLPBInDAkwiQK3UJVQRuE5ArD0EBBPiZZ8lnBw8czhlYlSfhJeEChJYtuSaoGbQ636iMBsQ6wrxJwN7yawUaKwJs45xJhEe48qsn/7SMLfjeAM=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column10 = _t, Column3 = _t, Column14 = _t]),
    // collect each (date, key) group's values as a list
    Group = Table.Group(Source, {"Column10", "Column3"}, {{"All", each _[Column14], type list}}),
    // per date, lay the per-key lists side by side as columns named after the keys
    ToColumns = Table.Group(Group, {"Column10"}, {{"All", each Table.FromColumns(_[All], _[Column3]), type table}}),
    Expand = Table.ExpandTableColumn(ToColumns, "All", List.Distinct(Group[Column3]))
in
    Expand

Imke Feldmann (The BIccountant)



v-haibl-msft
Employee

@jPinhao

 

Please try the following Power Query in the Advanced Editor.

 

let
    Source = Excel.Workbook(File.Contents("C:\11032016\Flattening multiple related rows in Power Query.xlsx"), null, true),
    Table1_Table = Source{[Item="Table1",Kind="Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Table1_Table,{{"id", Int64.Type}, {"key1", type text}, {"key2", type text}, {"key3", type text}, {"key4", type text}}),
    #"Combine" = Combiner.CombineTextByDelimiter(""),
    #"GroupRows" = Table.Group( #"Changed Type", {"id"}, {{"key1", each Combine([key1]), type text}, {"key2", each Combine([key2]), type text}, {"key3", each Combine([key3]), type text}, {"key4", each Combine([key4]), type text}} )
in
    #"GroupRows"


 

Best Regards,

Herbert
