Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi Experts,
I have a table with datetime column. Each row of the table is equal to one output.
I wish to calculate the output per hour with moving time window. The expected result is as table below.
DateTime | DateTime + 1 Hour | Count |
01/01/2021 9:00:01 AM | 01/01/2021 10:00:01 AM | 4 |
01/01/2021 9:01:01 AM | 01/01/2021 10:01:01 AM | 3 |
01/01/2021 9:50:00 AM | 01/01/2021 10:50:00 AM | 4 |
01/01/2021 10:00:01 AM | 01/01/2021 11:00:01 AM | 3 |
01/01/2021 10:10:01 AM | 01/01/2021 11:10:01 AM | 2 |
01/01/2021 10:20:00 AM | 01/01/2021 11:20:00 AM | 1 |
02/01/2021 8:00:00 AM | 02/01/2021 9:00:00 AM | 1 |
I managed to get the expected result using calculated columns, however it takes like "forever" to load each time I refresh the data. Currently the table contains 200,000++ rows and it's going to increase as time goes.
It's been said that performing the calculation in Power Query could help to improve the performance issue. Could please advise how this can be done in Power Qeury? Or how can I solve this performance issue?
Thanks in advance for the help.
Apologies for late reply.
A Power Query custom column would look like this (I hope):
let sd = [CreatedDate], hl = [HourLater] in
Table.RowCount(Table.SelectRows(#"Changed Type", each ([CreatedDate] >= sd) and [CreatedDate] <= hl))
I haven't really tested it so see how you get on.
Performance-wise, I don't know how it's going to do.
The overall problem is that to calculate the figure for each row all 200,000 rows have to be read. And then it happens again for the next row.
-----
You could experiment with Table.Buffer in the Power Query solution to read the table into memory.
---
What about Incremental Refresh if you go with the Power Query solution? It will depend on the source of the data implementing query folding but you do have a date field in the data. If it works, the first run will be slow and refreshes should be a lot faster.
I have tried out the formula at Power Query custom column. Over 2 hours, it was still loading at the Power Query.
For Table.Buffer and incremental refresh, I would need time to figure it out as these are new to me.
Thanks for the advice.
Hi, @lktan88
Would it be helpful to use measure instead of calculated column?
Calculated columns will consume a lot of memory when loading and refreshing, you could try to use measures.
If what you need is a column instead of a measure, then as you said you could try to get it in the Power query.
It would be great if there is a sample file without any sesentive information here.
Best Regards,
Community Support Team _ Zeon Zheng
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @v-angzheng-msft ,
Sorry for the late response.
I would like to try out calculated measure. In fact, I'm a newbie to Power BI and tried hard to get the measure, the result turned out was not what I expected. Appreciate much if you can guide this.
The file originally looks like below table. I will need to add 1 hour for each row to the Create Date. Then, I used the formula as mentioned in previous post to create the calculated column.
Create Date | Unit Code |
02/08/2021 5:56:30 PM | C06026071A146 |
02/08/2021 5:00:26 PM | C06028071B026 |
02/08/2021 4:59:20 PM | C06028071B027 |
02/08/2021 4:58:12 PM | C06028071B019 |
02/08/2021 4:57:08 PM | C10014071B072 |
02/08/2021 4:56:15 PM | C02121071B106 |
02/08/2021 4:55:48 PM | C06028071B031 |
02/08/2021 4:55:16 PM | C06028071B039 |
02/08/2021 4:54:48 PM | C02121071B088 |
02/08/2021 4:54:27 PM | C01021071A046 |
Hi, @lktan88
Try to create a measure like this to get DateTime+1 Hour
DateTime + 1 =
SELECTEDVALUE('Table'[DateTime])+TIME(1,0,0)
Result:
The Count measure could be like:
Count =
VAR _Start =
MAX ( 'Shipment Throughput'[Create Date] )
VAR _End =
MAX ( 'Shipment Throughput'[Create Date] ) + TIME ( 1, 0, 0 )
//VAR _End =_Start + TIME ( 1, 0, 0 )
//VAR _End =[DateTime + 1]
VAR _Count =
COUNTROWS (
FILTER (
ALL ( 'Shipment Throughput' ),
'Shipment Throughput'[Create Date] >= _Start
&& 'Shipment Throughput'[Create Date] <= _End
)
)
RETURN
IF ( ISBLANK ( _Count ), 0, _Count )
Best Regards,
Community Support Team _ Zeon Zheng
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
and how long does it take on the full dataset?
Below is the calculated column DAX:
Can you post that as text please?
Can you explain the logic behind the calculation please?
And can you post your calculated column DAX.
Hi @HotChilli ,
Thanks for your reply.
The table shows that the 1st output is at 9:00:01AM, 2nd output is at 9:01:01AM, 3rd is at 9:50:00AM and so on. I actually want to know that for the 1st row 9:00:01AM, how many are there the outputs within one hour, which is from 9:00:01AM to 10:00:01AM. Then 2nd row 9:01:01AM, how many are there the outputs from 9:01:01AM to 10:01:01AM and same for the rest of rows.
My calculated column is as below:
Appreciate much for your help.
User | Count |
---|---|
125 | |
106 | |
99 | |
63 | |
60 |
User | Count |
---|---|
135 | |
116 | |
101 | |
71 | |
61 |