Hi,
I need to compare the Count values by Sector at every Time interval (e.g. the row at Index 2 with the row at Index 7) and output the result in the Count.1 column.
- These rows may be at different Index distances on every run (depending on the sector data received).
- The Sector values themselves may vary.
- After the first 1-2 rows (which always have a Count of 0), the Time interval is constant at 10 s.
I cannot rely on custom Index columns given the requirements above, so I came up with the solution below (the Count.1 values in the table below are the correct output):
#"Added Count.1" = Table.AddColumn(
    #"Previous step",
    "Count.1",
    (a) =>
        let
            // row(s) of the same Sector, 10 seconds later
            b = Table.SelectRows(#"Previous step", (x) => x[Sector] = a[Sector] and x[Time] = a[Time] + 10)
        in
            b[Count]{0}
)
But this is very slow, and I'm only using test data (300 rows); once the report is live, the number of rows will grow considerably. Any ideas on how to make my code more efficient?
Sample data:
Index | Time | Sector | Count | Count.1 |
0 | 1593018439 | 200 | 0 | 0 |
1 | 1593018440 | 200 | 0 | 0 |
2 | 1593018450 | 4 | 198 | 208 |
3 | 1593018450 | 5 | 176 | 214 |
4 | 1593018450 | 6 | 176 | 180 |
5 | 1593018450 | 3 | 176 | 180 |
6 | 1593018450 | 1 | 178 | 216 |
7 | 1593018460 | 4 | 208 | 208 |
8 | 1593018460 | 5 | 214 | 220 |
9 | 1593018460 | 6 | 180 | 180 |
10 | 1593018460 | 3 | 180 | 180 |
11 | 1593018460 | 1 | 216 | 220 |
12 | 1593018470 | 1 | 220 | 220 |
13 | 1593018470 | 6 | 180 | 180 |
14 | 1593018470 | 3 | 180 | 180 |
15 | 1593018470 | 4 | 208 | 208 |
16 | 1593018470 | 5 | 220 | 220 |
17 | 1593018480 | 4 | 208 | 208 |
18 | 1593018480 | 5 | 220 | 220 |
19 | 1593018480 | 6 | 180 | 180 |
20 | 1593018480 | 3 | 180 | 180 |
21 | 1593018480 | 1 | 220 | 220 |
Thanks
Try making a partition of the source table with the Table.Group function on the Sector field.
Then apply a function to each sub-table that does the calculation you need.
PS:
How many rows does your table have?
How many sectors do you have?
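A minimal sketch of that idea, assuming the previous step is called #"Previous step" and has the Time, Sector, and Count columns from the sample data (step names are illustrative). The +10s lookup then only scans the rows of one sector at a time, and rows with no later interval fall back to their own Count, as in the last rows of the sample:

#"Grouped" = Table.Group(
    #"Previous step",
    {"Sector"},
    // for each Sector partition t, do the lookup inside t only
    {{"Data", (t) =>
        Table.AddColumn(t, "Count.1", (a) =>
            let
                b = Table.SelectRows(t, (x) => x[Time] = a[Time] + 10)
            in
                if Table.IsEmpty(b) then a[Count] else b[Count]{0}
        ), type table}}
),
// flatten the partitions back into one table
#"Expanded" = Table.ExpandTableColumn(#"Grouped", "Data", {"Index", "Time", "Count", "Count.1"})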
@Anonymous Power Query is not optimized for this type of analysis, as you've discovered. @ImkeF has written an excellent article on this here, which essentially says to first group the data in some way, so that maybe 10,000 records become 500 groups. Then, within each group, do your lookup code; each lookup will now process 200 rows at a time, not 10,000.
But even that will eventually get too slow as the data grows.
This is one of those times where a calculated column in DAX may be more efficient. A measure is even better (MUCH better) if you can do that. Or do this at the source system, in SQL Server for example, if possible. The Tabular model of Power BI, whether through columns or measures, is designed to rapidly scan and filter millions of rows for these operations.
In general, try to avoid calculated columns. There are times to use them, but they are rare. Getting the data out of the source system, creating columns in Power Query, or using DAX measures are usually preferable to calculated columns. See these references:
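As a rough illustration of the measure route (the table name Data and the measure name are assumptions, not from the thread, and the logic mirrors the sample data's 10-second intervals), something along these lines fetches the next interval's Count for the sector in the current filter context:

Next Count =
VAR CurrentTime = SELECTEDVALUE ( Data[Time] )
VAR CurrentSector = SELECTEDVALUE ( Data[Sector] )
RETURN
    CALCULATE (
        SUM ( Data[Count] ),
        ALL ( Data ),
        Data[Sector] = CurrentSector,
        Data[Time] = CurrentTime + 10
    )

This is only a sketch; the exact shape depends on the model, but the engine can evaluate this kind of filter over millions of rows far faster than a row-by-row Power Query lookup.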
Calculated Columns vs Measures in DAX
Calculated Columns and Measures in DAX
Storage differences between calculated columns and calculated tables
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI Reporting
Thanks @edhans, I will explore Table.Group as mentioned in the article. Otherwise I will look into some DAX options as well.
Great @Anonymous - post back in the DAX area if you decide to go with a DAX solution and need help.
@ImkeF or @edhans or another Power Query guru can probably assist.
Thanks @Anonymous, I will explore the Table.Group option. The number of rows will keep growing with every run, and the number of sectors and the Sector values received may vary as well.