I have a bulky (55 MB) Excel sheet of repeating attribute readings. There are 800k rows and about 25 columns in the repeating format ID / Datetime / Value. For example:
| ID | Datetime | Value | (blank) | ID_1 | Datetime_1 | Value_1 | (blank) ....
| 47 | 17/3 | 17.45 | (blank) | 455 | 18/4 | 2 | (blank) ....
| 47 | 21/3 | 12 | (blank) | 455 | 12/4 | 21 | (blank) ....
I would like to combine these into a single Power Query table with 3 columns and possibly a couple of million rows:
| ID | Datetime | Value |
See whether these links help:
https://www.powerquery.training/grouping-or-summarizing-data/
https://www.poweredsolutions.co/2019/07/30/grouping-rows-with-power-bi-power-query/
Hi @pistachio
the following steps should do the work:
1) Add an Index column to the table
2) Select the ID column and the new Index column -> Transform (or right-click) -> Unpivot other columns
3) Split the "Attribute" column by underscore "_"
4) Pivot "back" on the first split column ("Attribute.1") with "Value.1" as the values column, and select "Don't aggregate" in the advanced options
Paste the following code into the advanced editor and you can follow the steps:
let
    // Embedded sample data (compressed); note that it contains only one ID column
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjFX0lEyNNc3MAbTOiamQBrEtNA3AVJGSrE6UEVGhvpgNUZQBUb6RmBRpdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Datetime = _t, Value = _t, Column1 = _t, Datetime_1 = _t, Value_1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Datetime", type date}, {"Value", Int64.Type}, {"Column1", type text}, {"Datetime_1", type date}, {"Value_1", Int64.Type}}),
    // Step 1: the index keeps track of the original row
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    // Step 2: every column except ID and Index becomes an Attribute / Value.1 pair
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"ID", "Index"}, "Attribute", "Value.1"),
    // Step 3: split e.g. "Datetime_1" into base name ("Datetime") and repetition number (1)
    #"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByDelimiter("_", QuoteStyle.Csv), {"Attribute.1", "Attribute.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Attribute.1", type text}, {"Attribute.2", Int64.Type}}),
    // Step 4: pivot the base names back into columns without aggregating
    #"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Changed Type1", {{"Attribute.1", type text}}, "en-GB"), List.Distinct(Table.TransformColumnTypes(#"Changed Type1", {{"Attribute.1", type text}}, "en-GB")[Attribute.1]), "Attribute.1", "Value.1"),
    // Keep only the three target columns
    #"Removed Other Columns" = Table.SelectColumns(#"Pivoted Column",{"ID", "Value", "Datetime"})
in
    #"Removed Other Columns"
Next time you paste sample data, please use an HTML table as described here: https://community.powerbi.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-Forum/ba-...
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hi Imke,
Thanks very much for your detailed response. I get confused at the unpivoting stage, but I think this is the right track. When I pasted your code into Power Query, the steps show the base data as having only one ID column. I've taken your advice and pasted more example data as HTML.
This is straight from Excel. I assume Power Query adds the _2, _3, etc. suffixes when I use the first row as headers. In reality there are many more column groups.
| ID | Datetime | Value | (blank) | ID | Datetime | Value | (blank) | ID | Datetime | Value | (blank) | ID | Datetime | Value |
| 407 | 28/01/2020 14:36:27 | 0 | (blank) | 7 | 28/01/2020 14:36:40 | 2.83 | (blank) | 68 | 28/01/2020 14:36:46 | 0.26 | (blank) | 57 | 28/01/2020 14:36:38 | 0.72 |
| 407 | 28/01/2020 14:37:27 | 0 | (blank) | 7 | 28/01/2020 14:37:40 | 2.82 | (blank) | 68 | 28/01/2020 14:37:46 | 0.25 | (blank) | 57 | 28/01/2020 14:37:38 | 0.72 |
| 407 | 28/01/2020 14:38:27 | 0 | (blank) | 7 | 28/01/2020 14:38:40 | 2.83 | (blank) | 68 | 28/01/2020 14:38:46 | 0.26 | (blank) | 57 | 28/01/2020 14:37:56 | 0.66 |
| 407 | 28/01/2020 14:39:27 | 0 | (blank) | 7 | 28/01/2020 14:39:40 | 2.77 | (blank) | 68 | 28/01/2020 14:39:46 | 0.25 | (blank) | 57 | 28/01/2020 14:38:58 | 0.68 |
| 407 | 28/01/2020 14:40:27 | 0 | (blank) | 7 | 28/01/2020 14:40:18 | 2.81 | (blank) | 68 | 28/01/2020 14:40:46 | 0.26 | (blank) | 57 | 28/01/2020 14:39:58 | 0.7 |
Can I convert this to a single 3 column table?
So, there are probably easier ways to do this and @ImkeF probably has the solution. But, worst case, create a query from the Excel file and remove all but the first 3 columns. Create a second query, choose the next 3 columns (ID_1, Datetime_1, Value_1) and remove all the other columns. Rename the columns to ID, Datetime, Value. Rinse and repeat the same steps for the second query for a third, fourth, fifth query, etc. Then, use an Append query to append them all together.
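The select, rename and append idea described above can also be sketched inside a single query, so that every column group is taken from the same Source step. This is only a minimal sketch: the inline sample table and the step names Group0, Group1 and Combined are hypothetical stand-ins for the real workbook, and whether this reduces reload time on the real file is untested.

```powerquery
let
    // Hypothetical inline sample standing in for the Excel sheet (two column groups only)
    Source = Table.FromRows(
        {
            {407, "28/01/2020 14:36:27", 0, 7, "28/01/2020 14:36:40", 2.83},
            {407, "28/01/2020 14:37:27", 0, 7, "28/01/2020 14:37:40", 2.82}
        },
        {"ID", "Datetime", "Value", "ID_1", "Datetime_1", "Value_1"}
    ),
    // The first group already carries the target column names
    Group0 = Table.SelectColumns(Source, {"ID", "Datetime", "Value"}),
    // The second group: keep its three columns, then rename them to the target names
    Group1 = Table.RenameColumns(
        Table.SelectColumns(Source, {"ID_1", "Datetime_1", "Value_1"}),
        {{"ID_1", "ID"}, {"Datetime_1", "Datetime"}, {"Value_1", "Value"}}
    ),
    // Append the groups into one three-column table
    Combined = Table.Combine({Group0, Group1})
in
    Combined
```

Each further column group would need its own select-and-rename step, which is why this gets tedious with many groups.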
Yep, cheers, I've done this, but each query seems to reload the 55 MB workbook, so it takes a long time to bring everything together. I was wondering if there is a better way.
Hi @pistachio
Sorry, I only just read your second comment. You can try the following technique; it requires less "intelligent" action:
let
    // Sample data is read from the HTML table in the forum post; replace Source and Data0 with your own Excel source
    Source = Web.Page(Web.Contents("https://community.powerbi.com/t5/Desktop/Unpivot-Append-Repeating-column-formats/m-p/964070/highlight/false#M462025")),
    Data0 = Source{0}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Data0, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", Int64.Type}, {"Datetime", type datetime}, {"Value", Int64.Type}, {"", type text}, {"ID_1", Int64.Type}, {"Datetime_2", type datetime}, {"Value_3", type number}, {"_4", type text}, {"ID_5", Int64.Type}, {"Datetime_6", type datetime}, {"Value_7", type number}, {"_8", type text}, {"ID_9", Int64.Type}, {"Datetime_10", type datetime}, {"Value_11", type number}}),
    // Turn the table into a list of columns (one list per column) and put that list into a one-column table
    Custom1 = Table.ToColumns(#"Changed Type"),
    #"Converted to Table" = Table.FromList(Custom1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    // Number the columns, then integer-divide by 4 so that each block of 4 columns (ID, Datetime, Value, blank) shares one index
    #"Added Index" = Table.AddIndexColumn(#"Converted to Table", "Index", 0, 1),
    #"Integer-Divided Column" = Table.TransformColumns(#"Added Index", {{"Index", each Number.IntegerDivide(_, 4), Int64.Type}}),
    // Rebuild one small table per block of consecutive columns
    #"Grouped Rows" = Table.Group(#"Integer-Divided Column", {"Index"}, {{"Partition", each Table.FromColumns(_[Column1]), type table [Column1=list, Index=number]}}, GroupKind.Local),
    // Stack the per-block tables on top of each other
    Custom2 = Table.Combine(#"Grouped Rows"[Partition])
in
    Custom2
For performance it is crucial to use "GroupKind.Local" in the "Grouped Rows" step.
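To illustrate what GroupKind.Local does, here is a small sketch on a made-up table (the columns K and V and the step names are hypothetical). With GroupKind.Local a group is formed from each run of consecutive rows that share a key, whereas GroupKind.Global collects all rows with the same key regardless of position. In the solution above the index values are already consecutive, so both kinds give the same groups, and the local kind presumably avoids the extra work of matching keys across the whole table.

```powerquery
let
    // Made-up sample: key 0 appears in two separate runs
    Source = Table.FromRecords({
        [K = 0, V = "a"],
        [K = 0, V = "b"],
        [K = 1, V = "c"],
        [K = 0, V = "d"]
    }),
    // Local: three groups, K=0 {a, b}, K=1 {c}, K=0 {d}
    LocalGroups = Table.Group(Source, {"K"}, {{"Vs", each _[V], type list}}, GroupKind.Local),
    // Global: two groups, K=0 {a, b, d}, K=1 {c}
    GlobalGroups = Table.Group(Source, {"K"}, {{"Vs", each _[V], type list}}, GroupKind.Global)
in
    [Local = LocalGroups, Global = GlobalGroups]
```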
Please let me know about the performance difference to the first Pivot-solution, thanks.
Please note that for this solution it is crucial that you always have the same number of columns per repetition!
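The dependence on a fixed number of columns per repetition can be made explicit with a variation of the same idea. The sketch below is not the code from the post: it swaps the index and grouping steps for List.Split, and the inline sample, the GroupSize and Names values and the column names "Blank" and "Blank_1" are all assumptions for illustration.

```powerquery
let
    // Hypothetical inline sample: two groups of ID / Datetime / Value, each followed by a blank column
    Source = Table.FromRows(
        {
            {407, "28/01/2020 14:36:27", 0, null, 7, "28/01/2020 14:36:40", 2.83, null},
            {407, "28/01/2020 14:37:27", 0, null, 7, "28/01/2020 14:37:40", 2.82, null}
        },
        {"ID", "Datetime", "Value", "Blank", "ID_1", "Datetime_1", "Value_1", "Blank_1"}
    ),
    // Columns per repetition, including the blank separator (assumption)
    GroupSize = 4,
    // Target names for the columns kept from each repetition
    Names = {"ID", "Datetime", "Value"},
    // Cut the list of columns into consecutive chunks of GroupSize columns
    Chunks = List.Split(Table.ToColumns(Source), GroupSize),
    // Keep the first three columns of each chunk and give them the target names
    Tables = List.Transform(Chunks, each Table.FromColumns(List.FirstN(_, List.Count(Names)), Names)),
    // Stack the per-chunk tables into one three-column table
    Combined = Table.Combine(Tables)
in
    Combined
```

If a repetition had a different width, the chunks would drift out of alignment and values would land in the wrong columns, which is the failure the note above warns about.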
Imke Feldmann (The BIccountant)
That works great! I struggle to understand the steps, but I was able to replicate it across all of my data. Thank you.
The performance was about twice as fast as the original method (an individual query for each column group, appended into one query). Interestingly, the amount of data pulled in is still much larger than the Excel sheet: the sheet is 55 MB, but when refreshing the .pbix file the progress bar shows upwards of 200 MB loaded from that query.
Still, very pleased, thank you.
This is a really great solution!