Solved: Flattening multiple related rows in Power Query - Page 2

jPinhao · ‎11-02-2016

We are loading some JSON data, which holds some arrays with varying property-value pairs. Due to the JSON structure, once you start expanding the fields you end up with something along these lines:

id | key1 | key2 | key3 | key4
0  | a    | null | null | null 
0  | null | b    | null | null
0  | null | null | c    | null
1  | a    | null | null | null
1  | null | b    | null | null
1  | null | null | c    | null
1  | null | null | null | d

We'd like to flatten these rows by ID to end up with something like:

id | key1 | key2 | key3 | key4
0  | a    | b    | c    | null 
1  | a    | b    | c    | d

The simplest way I found to solve this is by unpivotting the key columns, and then pivotting it back. However in some cases, we actually have quite a few more columns, and thousands of rows, in a quite sparse table. From previous trials I found that unpivot/pivot can be a big performance hit.

Is there a better way to solve this issue? Either in Power Query or with an R script?

EDIT: I beleive I found a way to do this with Table.Group:

Table.Group(sourceTable, {"id"}, {{"key1", each List.RemoveNulls([key1]){0},  type none}, {"key2", ...},  ...})

I'd like to do this across all columns (except ID), or from a List of column names - how would I go about making Table.Group work from a list of values dynamically?

jPinhao · ‎11-03-2016

Thanks for the replies! I actually managed to solve this on my own in the end 🙂

@ImkeF - your solution is interesting, I'm guessing FillUp will find the bottom-most non-null value and fill any rows above with it?

My solution - wrote a function that will find and return the first non-null value in a list (or a default if all null), and use that as the aggregator. I then build a list from the original list of column names that will run the operation on each named column:

//FirstNotNull
let
    Source = (sourceList as list) =>
        let
            firstNotNull = List.First(List.RemoveNulls(sourceList), "Not Applicable")
        in
            firstNotNull
in
    Source

//DynamicTableGroupColumns
let
    Source = (sourceTable as table, columns as list, aggregateFunction as function) =>
        let
            result = List.Transform(columns, each 
                                            // build lists with {columnName, aggregateFunction}
                                            let
                                                //save current _ (column name) to use in next each statement
                                                columnName = _,
                                                columnToFunctionList =  {columnName, each 
                                                                            //_ will be the grouping table as it's called by Table.Group
                                                                            aggregateFunction(Table.Column(_, columnName))}
                                            in
                                                columnToFunctionList)
        in
            result
in
    Source

//DynamicTableGroup
let
    Source = (sourceTable as table, groupBy as list, columns as list, aggregateFunction as function) =>
        let 
            result = Table.Group(sourceTable , groupBy, DynamicTableGroupColumns(sourceTable, columns, aggregateFunction))
        in
            result
in
    Source

Any comments on one method being better than the other? Will your method of FillUp into a single column and then expanding the relevant fields be more performant that preparing a list of lists to feed to Table.Group?

EDIT: @ImkeF just timed the 2 queries, and filling up into one column and then expanding seemed to take 2min20s, while my approach took 58s! Yesterday I had also timed doing an unpivot/pivot over all columns, and that was taking about 1min45s. I'm not sure how the unpivot/pivot scales with more columns and rows, but I'd assume our 2 methods would scale similarly.

Feel free to use the set of functions I put up in case you find use for them to speed up any queries! Or let me know if don't see similar results 🙂

View solution in original post

Flattening multiple related rows in Power Query

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly