Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi all,
I'm struggling to reduce the calculation time of a calculated column that is trying to find the distinct count of another table's column.
I currently have two tables:
The first describes when each person was being seen by a particular department, which looks like this:
Department Visits
ClientID | Department | Entry Datetime | Exit Datetime | Urgency |
1 | X | 1/1/2019 00:00 | 9/1/2019 00:00 | 0 |
1 | A | 1/1/2019 14:00 | 2/1/2019 00:00 | 1 |
1 | B | 2/1/2019 00:00 | 2/1/2019 12:00 | 2 |
1 | C | 2/1/2019 01:00 | 5/1/2019 00:00 | 3 |
2 | A | 3/1/2019 00:00 | 4/1/2019 00:00 | 1 |
2 | B | 4/1/2019 00:00 | 5/1/2019 12:00 | 2 |
2 | B | 7/1/2019 00:00 | 8/1/2019 00:00 | 2 |
etc.
This table is ~275,000 rows, but has 37 total columns (I've only shown the 5 columns that are relevant.)
Note: Urgency to Department is 1:* relationship
(multiple departments can have the same urgency, but the same department doesn't have two urgencies)
I then have a table I've generated through Powerquery (for the first two columns) then the rest are calculated columns after I've loaded the query.
It looks like this:
Client Location
Client ID | Date | Highest Urgency at start of day | Highest Urgency Department at start of day | Number of departments seen on day |
1 | 1/1/2019 | 0 | X | 1 |
1 | 2/1/2019 | 1 | A | 3 |
1 | 3/1/2019 | 3 | C | 1 |
1 | 4/1/2019 | 3 | C | 1 |
1 | 5/1/2019 | 3 | C | 1 |
1 | 6/1/2019 | 0 | X | 0 |
1 | ... until end of dataset |
This table has 26.7 million rows (which I've reduced as much as I can I think)
Client ID & Date (from query, using M): summarizes client ID and generates list for each, starting from first day for that client. Includes all days they are "active" with a department
Highest Urgency at start of day: ignores department visits starting at midnight
Highest Urgency Department at start of day: ignores department visits starting at midnight
Number of Departments seen on day: Department that overlap at all with that date, and have an urgency greater than 0
The formulas for each row look like this, and calculate fine (<5 mins):
Highest Urgency at start of day =
var clientID = 'Client Location'[Client ID]
var currentdate = 'Client Location'[Date]
var maxurgency =
CALCULATE(
MAX('Department Visits'[Urgency]),
filter(
all('Department Visits'),
'Department Visits'[Client ID] = clientID
&&
'Department Visits'[Entry Date] < currentdate
&&
'Department Visits'[Exit Date] >= currentdate
)
)
return maxurgency
Highest Urgency Department =
var currentdate = 'Client Location'[Date]
//Checks at midnight what department client was in. If client entered department on that day at midnight, they are not counted
return
if (
isblank('Client Location'[Highest Urgency at start of day]),
blank(),
CALCULATE(
SELECTEDVALUE('Department Visits'[Department], "multiple departments"),
filter(
SUMMARIZE(
'Department Visits',
'Department Visits'[Client ID],
'Department Visits'[Department],
'Department Visits'[Entry Date],
'Department Visits'[Exit Date],
'Department Visits'[Urgency]
),
'Department Visits'[ClientID] = 'Client Location'[Client ID]
&&
'Department Visits'[Entry Date] < currentdate
&&
'Department Visits'[Exit Date] >= currentdate
&&
'Department Visits'[Urgency] = 'Client Location'[Highest Urgency at start of day]
)
)
)
My calculation issue comes in with my "Number of Departments seen on day" column (I've included above what my expectation of how it should look is). My formula is below but never seems to calculate (even after leaving it hours):
Number of departments seen on day =
var currentdate = 'Client Location'[Date]
return
CALCULATE(
DISTINCTCOUNT('Department Visits'[Department]),
filter(
SUMMARIZE(
'Department Visits',
'Department Visits'[Client ID],
'Department Visits'[Entry Date].[Date],
'Department Visits'[Exit Date].[Date],
'Department Visits'[Department],
'Department Visits'[Urgency]
),
'Department Visits'[Urgency] > 0
&&
'Department Visits'[Client ID] = 'Client Location'[Client ID]
&&
'Department Visits'[Entry Date].[Date] <= currentdate
&&
'Department Visits'[Exit Date].[Date] >= currentdate
)
)
)
Apologies in advance for any inconsistent formula errors, I've had to anonymise the data manually.
Thanks very much!!
Solved! Go to Solution.
Hi @Anonymous,
Just to make the information clear can you answer me this questions:
Believe that you don't need tom had this 26 Million rows on your model to get the calculation if you have a calendar table with the corrections calculations you can have this made with the base of you model that as a lot less columns and will not take that long to calculate.
Regards
Miguel Félix
Proud to be a Super User!
Check out my blog: Power BI em PortuguêsHi @Anonymous,
Just to make the information clear can you answer me this questions:
Believe that you don't need tom had this 26 Million rows on your model to get the calculation if you have a calendar table with the corrections calculations you can have this made with the base of you model that as a lot less columns and will not take that long to calculate.
Regards
Miguel Félix
Proud to be a Super User!
Check out my blog: Power BI em PortuguêsHi MFelix, I've answered your questions below:
Hi @Anonymous ,
As refered believe that your issue is related with the fact that you create a table with 26M rows to track the movements of the client, I believe you need to have a calendar table and then based on that create the flow.
Please check the pbix file with an example based on your data, this may needs some adjustments but believe is better for your calculations than a table with 26M rows.
Just giving some examples that the unfold of a model into a bigger table can be made using measures or dimension tables.
Regards
Miguel Félix
Proud to be a Super User!
Check out my blog: Power BI em PortuguêsAhh this makes sense.
I've been using this table approach so I can think through the logic to determine each next step, but didn't think to include as measures to speed it up!
I did manage to find another solution in the meantime, that works on the table approach with a very short calculation time for anyone else who's interested:
Thanks for your help!
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
109 | |
98 | |
80 | |
64 | |
57 |
User | Count |
---|---|
145 | |
111 | |
92 | |
84 | |
66 |