Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Distinct Count Very Slow - am I using it inefficiently?

Hi all,

I'm struggling to reduce the calculation time of a calculated column that is trying to find the distinct count of another table's column.

 

I currently have two tables:

 

The first describes when each person was being seen by a particular department, which looks like this:

Department Visits

ClientIDDepartmentEntry DatetimeExit DatetimeUrgency
1X1/1/2019 00:009/1/2019 00:000
1A1/1/2019 14:002/1/2019 00:001
1B2/1/2019 00:002/1/2019 12:002
1C2/1/2019 01:005/1/2019 00:003
2A3/1/2019 00:004/1/2019 00:001
2B4/1/2019 00:00

5/1/2019 12:00

2

2

B7/1/2019 00:008/1/2019 00:002

etc.

This table is ~275,000 rows, but has 37 total columns (I've only shown the 5 columns that are relevant.)

Note: Urgency to Department is 1:* relationship

(multiple departments can have the same urgency, but the same department doesn't have two urgencies)

 

I then have a table I've generated through Powerquery (for the first two columns) then the rest are calculated columns after I've loaded the query.

 

It looks like this:

Client Location

Client IDDateHighest Urgency at start of dayHighest Urgency Department at start of dayNumber of departments seen on day
11/1/20190X1
12/1/20191A3
13/1/20193C1
14/1/20193C1
15/1/20193C1
16/1/20190X0
1... until end of dataset   

 

This table has 26.7 million rows (which I've reduced as much as I can I think)

 

Client ID & Date (from query, using M): summarizes client ID and generates list for each, starting from first day for that client. Includes all days they are "active" with a department

Highest Urgency at start of day: ignores department visits starting at midnight

Highest Urgency Department at start of day: ignores department visits starting at midnight 

Number of Departments seen on day: Department that overlap at all with that date, and have an urgency greater than 0

 

The formulas for each row look like this, and calculate fine (<5 mins):

 

 

 

 

Highest Urgency at start of day = 

var clientID = 'Client Location'[Client ID]
var currentdate = 'Client Location'[Date]

var maxurgency =
CALCULATE(
    MAX('Department Visits'[Urgency]),
    filter(
        all('Department Visits'),
        'Department Visits'[Client ID] = clientID
        &&
        'Department Visits'[Entry Date] < currentdate
        &&
        'Department Visits'[Exit Date] >= currentdate
    )
)

return maxurgency

 

 

 

 

 

 

 

Highest Urgency Department = 

var currentdate = 'Client Location'[Date]

//Checks at midnight what department client was in. If client entered department on that day at midnight, they are not counted
return
if (
    isblank('Client Location'[Highest Urgency at start of day]),
    blank(),
    CALCULATE(
        SELECTEDVALUE('Department Visits'[Department], "multiple departments"),
        filter(
            SUMMARIZE(
                'Department Visits',
                'Department Visits'[Client ID],
                'Department Visits'[Department],
                'Department Visits'[Entry Date],
                'Department Visits'[Exit Date],
                'Department Visits'[Urgency]
            ),
            'Department Visits'[ClientID] = 'Client Location'[Client ID]
            &&
            'Department Visits'[Entry Date] < currentdate
            &&
            'Department Visits'[Exit Date] >= currentdate
            &&
            'Department Visits'[Urgency] = 'Client Location'[Highest Urgency at start of day]
        )
    )
)

 

 

 

 

My calculation issue comes in with my "Number of Departments seen on day" column (I've included above what my expectation of how it should look is). My formula is below but never seems to calculate (even after leaving it hours):

 

 

 

Number of departments seen on day =

var currentdate = 'Client Location'[Date]

return
CALCULATE(
    DISTINCTCOUNT('Department Visits'[Department]),
    filter(
        SUMMARIZE(
            'Department Visits',
            'Department Visits'[Client ID],
            'Department Visits'[Entry Date].[Date],
            'Department Visits'[Exit Date].[Date],
            'Department Visits'[Department],
            'Department Visits'[Urgency]
        ),
        'Department Visits'[Urgency] > 0
        &&
        'Department Visits'[Client ID] = 'Client Location'[Client ID]
        &&
        'Department Visits'[Entry Date].[Date] <= currentdate
        &&
        'Department Visits'[Exit Date].[Date] >= currentdate
    )
)
)

 

 

 

 

Apologies in advance for any inconsistent formula errors, I've had to anonymise the data manually.

 

Thanks very much!!

1 ACCEPTED SOLUTION
MFelix
Super User
Super User

Hi  @Anonymous,

 

Just to make the information clear can you answer me this questions:

  • You have one table that is Department Visits that is the basis of your model (275K rows)?
  • Add another table (ClientLocation) that is calculated that creates a row for each date of the client visits with 3 other columns that makes a model with 26,7 million rows?
  • Is the ClientLocation a detail of the department visit by day / client?
  • What do you want to calculate with the ClientLocation table?

Believe that you don't need tom had this 26 Million rows on your model to get the calculation if you have a calendar table with the corrections calculations you can have this made with the base of you model that as a lot less columns and will not take that long to calculate.


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



View solution in original post

4 REPLIES 4
MFelix
Super User
Super User

Hi  @Anonymous,

 

Just to make the information clear can you answer me this questions:

  • You have one table that is Department Visits that is the basis of your model (275K rows)?
  • Add another table (ClientLocation) that is calculated that creates a row for each date of the client visits with 3 other columns that makes a model with 26,7 million rows?
  • Is the ClientLocation a detail of the department visit by day / client?
  • What do you want to calculate with the ClientLocation table?

Believe that you don't need tom had this 26 Million rows on your model to get the calculation if you have a calendar table with the corrections calculations you can have this made with the base of you model that as a lot less columns and will not take that long to calculate.


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



Anonymous
Not applicable

Hi MFelix, I've answered your questions below:

 

  • You have one table that is Department Visits that is the basis of your model (275K rows)? Yes, this is uncalculated, source data
  • Add another table (ClientLocation) that is calculated that creates a row for each date of the client visits with 3 other columns that makes a model with 26,7 million rows? Yes. If a client visits from the 1/1/2018 - 10/1/2018, this shows a row for every one of those dates.
  • Is the ClientLocation a detail of the department visit by day / client? Yes. The logic is that the most urgent department that they are being seen by is the one they are technically "in".
  • What do you want to calculate with the ClientLocation table? My final aim is to create a set of flows that shows client moves between departments, however it's complicated by the logic that they may be "open" to a specific department, but then transfer to another department (a more urgent one) without closing the record. Therefore you have to take into account urgency to see where each client actually is. This is even further complicated by the fact that they will visit multiple departments in the same day!

Hi @Anonymous ,

 

As refered believe that your issue is related with the fact that you create a table with 26M rows to track the movements of the client, I believe you need to have a calendar table and then based on that create the flow.

 

Please check the pbix file with an example based on your data, this may needs some adjustments but believe is better for your calculations than a table with 26M rows.

 

Just giving some examples that the unfold of a model into a bigger table can be made using measures or dimension tables.


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



Anonymous
Not applicable

Ahh this makes sense.

 

I've been using this table approach so I can think through the logic to determine each next step, but didn't think to include as measures to speed it up!

 

I did manage to find another solution in the meantime, that works on the table approach with a very short calculation time for anyone else who's interested:

 

Number of Departments in day =
var currentdate = 'Client Location'[Date]
 
return
COUNTROWS(
SELECTCOLUMNS(
FILTER(
'Department visits',
'Department visits'[Urgency] > 0
&&
'Department visits'[Client ID] = 'Client Location'[ClientID]
&&
'Department visits'[Entry Date].[Date] <= currentdate
&&
'Department visits'[Exit Date].[Date] >= currentdate
),
"Department",DISTINCT('Department visits'[Department])
)

 

 

Thanks for your help!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.