I need to calculate the number of concurrent events by category for each time period, and then the peak (max) concurrent events for each category. I was able to do this, however:
The sample file's base data table is 100 rows from an Excel file; my real data is closer to 100,000,000 rows and comes from a REST API. When I load my data it takes forever, every calculation is slow, and I can't even save the file after loading. I can't pre-aggregate because there are connected tables I need to filter using slicers.
So, in short, I need a more efficient way of making these calculations, or advice on the next best steps (e.g. get better hardware).
My data looks like this prior to anything:
EventID ApplicationId StartTime EndTime
"A1" "TV" 2020-09-10 13:00:12.000 2020-09-11 07:29:17.000
"A2" "Food" 2020-09-09 01:00:00.000 2020-09-09 16:11:25.000
"A3" "Food" 2020-09-10 04:00:00.000 2020-09-10 06:44:17.000
"A4" "TV" 2020-09-10 10:00:00.000 2020-09-10 18:36:59.000
But for my calculations the only solution I’ve found expands my table to this with M:
EventID ApplicationId Time
"A1" "TV" 2020-09-10 01:00:00.000
"A1" "TV" 2020-09-10 01:01:00.000
"A1" "TV" 2020-09-10 01:02:00.000
"A1" "TV" 2020-09-10 01:03:00.000
"A1" "TV" 2020-09-10 01:04:00.000
OR this with DAX:
EventID ApplicationId StartTime EndTime Time
"A1" "TV" 2020-09-10 13:00:12.000 2020-09-11 07:29:17.000 2020-09-10 13:04:00.000
"A1" "TV" 2020-09-10 13:00:12.000 2020-09-11 07:29:17.000 2020-09-10 13:03:00.000
"A1" "TV" 2020-09-10 13:00:12.000 2020-09-11 07:29:17.000 2020-09-10 13:02:00.000
"A1" "TV" 2020-09-10 13:00:12.000 2020-09-11 07:29:17.000 2020-09-10 13:01:00.000
"A1" "TV" 2020-09-10 13:00:12.000 2020-09-11 07:29:17.000 2020-09-10 13:00:00.000
I then use a combination of two measures to calculate (1) the number of concurrent events at any given time and (2) the maximum number of concurrent events over the interval determined by a slicer.
[_#category_sessions] =
VAR _categories = DISTINCTCOUNT ( myTable_m_expand[EventID] )
RETURN
    SUMX ( DISTINCT ( myTable_m_expand[Time] ), _categories )

[_#maxConcurrent] =
MAXX (
    KEEPFILTERS ( VALUES ( 'myTable_m_expand'[Time] ) ),
    CALCULATE ( [_#category_sessions] )
)
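For readers with the same problem: the usual way to avoid expanding the table at all is the "events in progress" pattern. Keep the original four-column table at its natural grain and, for each candidate time, count the rows whose [StartTime, EndTime] interval covers it. Below is a hedged sketch, assuming a disconnected table 'Periods'[Time] holding the candidate timestamps (that table name is an assumption, not from the thread):

```dax
-- Hypothetical measure: count of events whose interval covers the selected time.
[_#concurrent] =
VAR _t = SELECTEDVALUE ( 'Periods'[Time] )
RETURN
    CALCULATE (
        COUNTROWS ( myTable ),
        myTable[StartTime] <= _t,
        myTable[EndTime] >= _t
    )

-- Hypothetical peak over all times visible to the slicer.
[_#maxConcurrent_alt] =
MAXX ( VALUES ( 'Periods'[Time] ), [_#concurrent] )
```

Because the fact table stays at one row per event, this tends to scale far better than the per-minute expanded version.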
CURRENT PC SPECS:
i5-8500
16 GB RAM
Hi, @lance_6
Sorry, I know very little about REST APIs, but I would like to know your connection mode.
Are you connecting in Import mode? In that case, if the data source you are connecting to is too big, it will overload your computer. Why not try changing the connection mode to DirectQuery or a live connection? The computer may run more smoothly that way.
Best Regards,
Community Support Team _ Zeon Zheng
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
Hi, the first suggestion I'd make is to try changing this line of code:
VAR _categories = DISTINCTCOUNT ( myTable_m_expand[EventID] )
to:
VAR _categories = SUMX ( DISTINCT ( myTable_m_expand[EventID] ), 1 )
DISTINCTCOUNT is memory-intensive when the column has high cardinality.
If you have experience with DAX Studio, it is very useful for checking measure performance within your model.
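For example, here is the kind of query you could run in DAX Studio (with Server Timings enabled) to benchmark the measures in this thread. This is a sketch reusing the measure and column names shown above:

```dax
-- Hypothetical DAX Studio benchmark query: enable Server Timings,
-- run once per variant of the measure, and compare durations.
EVALUATE
SUMMARIZECOLUMNS (
    myTable_m_expand[ApplicationId],
    "MaxConcurrent", [_#maxConcurrent]
)
```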
Hope this helps.
So I tried this with my actual data model and it crashed due to high RAM consumption, whereas DISTINCTCOUNT is able to run.
Thank you, I'll give this a try and see what it does to performance.