Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
jPinhao
Helper II
Helper II

Flattening multiple related rows in Power Query

We are loading some JSON data, which holds some arrays with varying property-value pairs. Due to the JSON structure, once you start expanding the fields you end up with something along these lines:

 

id | key1 | key2 | key3 | key4
0  | a    | null | null | null 
0  | null | b    | null | null
0  | null | null | c    | null
1  | a    | null | null | null
1  | null | b    | null | null
1  | null | null | c    | null
1  | null | null | null | d

We'd like to flatten these rows by ID to end up with something like:

id | key1 | key2 | key3 | key4
0  | a    | b    | c    | null 
1  | a    | b    | c    | d

The simplest way I found to solve this is by unpivotting the key columns, and then pivotting it back. However in some cases, we actually have quite a few more columns, and thousands of rows, in a quite sparse table. From previous trials I found that unpivot/pivot can be a big performance hit.

 

Is there a better way to solve this issue? Either in Power Query or with an R script?

 

 

EDIT: I beleive I found a way to do this with Table.Group:

Table.Group(sourceTable, {"id"}, {{"key1", each List.RemoveNulls([key1]){0},  type none}, {"key2", ...},  ...})

I'd like to do this across all columns (except ID), or from a List of column names - how would I go about making Table.Group work from a list of values dynamically?

1 ACCEPTED SOLUTION

Thanks for the replies! I actually managed to solve this on my own in the end 🙂

 

@ImkeF - your solution is interesting, I'm guessing FillUp will find the bottom-most non-null value and fill any rows above with it? 

 

My solution - wrote a function that will find and return the first non-null value in a list (or a default if all null), and use that as the aggregator. I then build a list from the original list of column names that will run the operation on each named column:

 

//FirstNotNull
let
    Source = (sourceList as list) =>
        let
            firstNotNull = List.First(List.RemoveNulls(sourceList), "Not Applicable")
        in
            firstNotNull
in
    Source

//DynamicTableGroupColumns
let
    Source = (sourceTable as table, columns as list, aggregateFunction as function) =>
        let
            result = List.Transform(columns, each 
                                            // build lists with {columnName, aggregateFunction}
                                            let
                                                //save current _ (column name) to use in next each statement
                                                columnName = _,
                                                columnToFunctionList =  {columnName, each 
                                                                            //_ will be the grouping table as it's called by Table.Group
                                                                            aggregateFunction(Table.Column(_, columnName))}
                                            in
                                                columnToFunctionList)
        in
            result
in
    Source

//DynamicTableGroup
let
    Source = (sourceTable as table, groupBy as list, columns as list, aggregateFunction as function) =>
        let 
            result = Table.Group(sourceTable , groupBy, DynamicTableGroupColumns(sourceTable, columns, aggregateFunction))
        in
            result
in
    Source

 

Any comments on one method being better than the other? Will your method of FillUp into a single column and then expanding the relevant fields be more performant that preparing a list of lists to feed to Table.Group?

 

EDIT: @ImkeF just timed the 2 queries, and filling up into one column and then expanding seemed to take 2min20s, while my approach took 58s! Yesterday I had also timed doing an unpivot/pivot over all columns, and that was taking about 1min45s. I'm not sure how the unpivot/pivot scales with more columns and rows, but I'd assume our 2 methods would scale similarly. 

 

Feel free to use the set of functions I put up in case you find use for them to speed up any queries! Or let me know if don't see similar results 🙂

View solution in original post

16 REPLIES 16

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.