We are loading some JSON data, which holds some arrays with varying property-value pairs. Due to the JSON structure, once you start expanding the fields you end up with something along these lines:
| id | key1 | key2 | key3 | key4 |
|----|------|------|------|------|
| 0  | a    | null | null | null |
| 0  | null | b    | null | null |
| 0  | null | null | c    | null |
| 1  | a    | null | null | null |
| 1  | null | b    | null | null |
| 1  | null | null | c    | null |
| 1  | null | null | null | d    |
We'd like to flatten these rows by ID to end up with something like:
| id | key1 | key2 | key3 | key4 |
|----|------|------|------|------|
| 0  | a    | b    | c    | null |
| 1  | a    | b    | c    | d    |
The simplest way I found to solve this is by unpivoting the key columns and then pivoting them back. However, in some cases we actually have quite a few more columns, and thousands of rows, in a quite sparse table. From previous trials I found that unpivot/pivot can be a big performance hit.
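For reference, the unpivot/pivot round trip I'm describing looks roughly like this (a sketch only; `Source` stands in for the expanded JSON table from your own query):

```
let
    // unpivot every column except the id, turning the sparse grid into
    // (id, Attribute, Value) triples; rows with null values are dropped
    Unpivoted = Table.UnpivotOtherColumns(Source, {"id"}, "Attribute", "Value"),
    // pivot back: one column per key name, one row per id
    Pivoted = Table.Pivot(Unpivoted, List.Distinct(Unpivoted[Attribute]),
        "Attribute", "Value")
in
    Pivoted
```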
Is there a better way to solve this issue? Either in Power Query or with an R script?
EDIT: I believe I found a way to do this with Table.Group:
```
Table.Group(sourceTable, {"id"},
    {{"key1", each List.RemoveNulls([key1]){0}, type none},
     {"key2", ...}, ...})
```
I'd like to do this across all columns (except ID), or from a List of column names - how would I go about making Table.Group work from a list of values dynamically?
Thanks for the replies! I actually managed to solve this on my own in the end 🙂
@ImkeF - your solution is interesting, I'm guessing FillUp will find the bottom-most non-null value and fill any rows above with it?
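If I've understood it right, the FillUp variant would look something like this (just my guess at the approach; `Source` and the key column names are placeholders from the example above):

```
let
    // collect the sparse rows for each id into a nested table
    Grouped = Table.Group(Source, {"id"}, {{"rows", each _, type table}}),
    // within each group, fill non-null values upward so the first row
    // ends up holding the first non-null value of every key column
    Filled = Table.TransformColumns(Grouped,
        {"rows", each Table.FirstN(
            Table.FillUp(_, {"key1", "key2", "key3", "key4"}), 1)}),
    // expand the key columns back out of the nested tables
    Expanded = Table.ExpandTableColumn(Filled, "rows",
        {"key1", "key2", "key3", "key4"})
in
    Expanded
```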
My solution: I wrote a function that finds and returns the first non-null value in a list (or a default if all are null), and used that as the aggregator. I then build a list from the original list of column names that will run the operation on each named column:
```
//FirstNotNull
let
    Source = (sourceList as list) =>
        let
            firstNotNull = List.First(List.RemoveNulls(sourceList), "Not Applicable")
        in
            firstNotNull
in
    Source
```

```
//DynamicTableGroupColumns
let
    Source = (sourceTable as table, columns as list, aggregateFunction as function) =>
        let
            result = List.Transform(columns, each
                // build lists with {columnName, aggregateFunction}
                let
                    // save current _ (column name) to use in next each statement
                    columnName = _,
                    columnToFunctionList = {columnName, each
                        // _ will be the grouping table as it's called by Table.Group
                        aggregateFunction(Table.Column(_, columnName))}
                in
                    columnToFunctionList)
        in
            result
in
    Source
```

```
//DynamicTableGroup
let
    Source = (sourceTable as table, groupBy as list, columns as list, aggregateFunction as function) =>
        let
            result = Table.Group(sourceTable, groupBy,
                DynamicTableGroupColumns(sourceTable, columns, aggregateFunction))
        in
            result
in
    Source
```
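To answer my own earlier question about driving this from a list of column names: a call would look something like this (assuming your source step is named `Source` and the grouping column is `id`):

```
let
    // aggregate every column except the grouping column
    keyColumns = List.RemoveItems(Table.ColumnNames(Source), {"id"}),
    Flattened = DynamicTableGroup(Source, {"id"}, keyColumns, FirstNotNull)
in
    Flattened
```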
Any comments on one method being better than the other? Will your method of FillUp into a single column and then expanding the relevant fields be more performant than preparing a list of lists to feed to Table.Group?
EDIT: @ImkeF just timed the 2 queries, and filling up into one column and then expanding seemed to take 2min20s, while my approach took 58s! Yesterday I had also timed doing an unpivot/pivot over all columns, and that was taking about 1min45s. I'm not sure how the unpivot/pivot scales with more columns and rows, but I'd assume our 2 methods would scale similarly.
Feel free to use the set of functions I put up in case you find use for them to speed up any queries! Or let me know if you don't see similar results 🙂