I have a data model with a single table 'Sessions' (1.5M rows) with columns for UserId, StartTime, and EndTime. A user can have multiple sessions throughout the day, and sometimes they are concurrent, meaning a 2nd session starts after the first one starts but before it ends, and the 2nd session can end before or after the first session ends. Each session is typically under 30 seconds.
I am trying to create a calculated column with the count of other sessions by the same UserId that overlap any part of the current session's duration. This is what I came up with; it takes >45 minutes to run and I am not sure the results are even correct, but in general is there anything I can do to improve performance?
isConcurrent =
VAR currentUID = 'Sessions'[UID]
VAR currentStart = 'Sessions'[StartTime]
VAR currentEnd = 'Sessions'[EndTime]
VAR t =
    CALCULATETABLE (
        'Sessions',
        FILTER (
            'Sessions', // Should there be an ALL ( 'Sessions' ) here?
            'Sessions'[UID] = currentUID
                && (
                    ( 'Sessions'[StartTime] <= currentStart
                        && 'Sessions'[EndTime] > currentStart
                        && 'Sessions'[EndTime] <= currentEnd ) // O2
                    || ( 'Sessions'[StartTime] <= currentStart
                        && 'Sessions'[EndTime] >= currentEnd ) // O1
                    || ( 'Sessions'[StartTime] >= currentStart
                        && 'Sessions'[StartTime] <= currentEnd
                        && 'Sessions'[EndTime] >= currentEnd ) // O3
                    || ( 'Sessions'[StartTime] >= currentStart
                        && 'Sessions'[StartTime] <= currentEnd
                        && 'Sessions'[EndTime] <= currentEnd ) // O4
                )
        )
    )
RETURN
    COUNTROWS ( t )
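Editor's note for readers: the four overlap branches (O1-O4) collapse to the single standard interval-overlap test, `StartA <= EndB && EndA >= StartB`, which gives the engine far fewer comparisons per row. A minimal sketch, assuming the same 'Sessions'[UID]/[StartTime]/[EndTime] columns; the `- 1` excludes the current session from its own count:

```dax
ConcurrentCount =
VAR currentUID = 'Sessions'[UID]
VAR currentStart = 'Sessions'[StartTime]
VAR currentEnd = 'Sessions'[EndTime]
RETURN
    COUNTROWS (
        FILTER (
            'Sessions',
            'Sessions'[UID] = currentUID
                // two intervals overlap iff each starts before the other ends
                && 'Sessions'[StartTime] <= currentEnd
                && 'Sessions'[EndTime] >= currentStart
        )
    ) - 1 // subtract the current session itself
```

Whether the boundary comparisons should be strict (`<`) or inclusive (`<=`) depends on whether back-to-back sessions that touch at a single instant should count as concurrent.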
@hymieho3
Obviously it has been ages since you posted this, but did you find a good solution?
I have a similar issue that I've been struggling with for months.
Thank you!
Hi @hymieho3
How about this measure?
Measure =
CALCULATE (
    COUNT ( Sessions[UID] ),
    FILTER (
        ALLEXCEPT ( Sessions, Sessions[UID] ),
        NOT (
            Sessions[EndTime] < MAX ( Sessions[StartTime] )
                || Sessions[StartTime] > MAX ( Sessions[EndTime] )
        )
    )
)
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
The measure is not very performant: it actually makes my machine with 24GB RAM run out of resources when added to a table with the full dataset. Also, I am not sure that the measure captures all of the overlap scenarios.
I am more looking for data modeling strategies. For example, if I were just looking at the day grain then I could use a date table to filter. But the StartTime and EndTime are down to milliseconds, so I am at a loss. Or maybe I can use a date table to at least narrow the scope of the events to compare (events by the same user on the same day), and then do the comparison against the events within scope. I tried this approach using a table variable, but I cannot access the columns of the table variable for the 2nd part, so I am stuck.
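Editor's note: the "narrow the scope first" idea can be sketched in DAX. This assumes a hypothetical 'Sessions'[DateKey] column (e.g. INT ( 'Sessions'[StartTime] ), added as a calculated column or in Power Query). Column references do work on a table variable here, because filtering a table preserves its data lineage, so 'Sessions'[StartTime] still resolves inside the second FILTER:

```dax
ConcurrentCount =
VAR currentUID = 'Sessions'[UID]
VAR currentDate = 'Sessions'[DateKey] // hypothetical day-grain column
VAR currentStart = 'Sessions'[StartTime]
VAR currentEnd = 'Sessions'[EndTime]
// Step 1: narrow the scope to the same user on the same day
VAR sameDay =
    FILTER (
        'Sessions',
        'Sessions'[UID] = currentUID
            && 'Sessions'[DateKey] = currentDate
    )
RETURN
    // Step 2: apply the overlap test only to the in-scope rows;
    // lineage lets us keep using the original column references
    COUNTROWS (
        FILTER (
            sameDay,
            'Sessions'[StartTime] <= currentEnd
                && 'Sessions'[EndTime] >= currentStart
        )
    ) - 1 // exclude the current session itself
```

Caveat: a session that spans midnight would be bucketed on its start day only, so overlaps across the day boundary could be missed; with sessions typically under 30 seconds this should be rare, but it is worth checking.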
Were you ever able to figure out a solution to this? I have been stuck on this for over a year and the best I can do is come up with an approximate answer through a measure that never seems to work just right.
| User | Count |
| --- | --- |
| 140 | |
| 113 | |
| 104 | |
| 77 | |
| 65 | |
| User | Count |
| --- | --- |
| 136 | |
| 118 | |
| 101 | |
| 71 | |
| 61 | |