Optimizing Calculated Column for approx 2million R...

gra_wang · ‎03-21-2018

I have the following columns:

refCallDetail[ANI_DIALNUM],
refCallDetail[contact_name],

refCallDetail[Contact Type],
refCallDetail[start_datetime],
refCallDetail[end_datetime]

each with pretty high amounts of cardinality except for [Contact Type], which only has 4 distinct values. I have also have nearly 2 million rows of data in this table. I realize that you generally want to reduce cardinality as much as possible and you want to split datetime columns, but I felt it was necessary to leave the columns as they were (you'll see why in the formula below).

I attempted this calculated column:

XFerCheck=

VAR
FutureANI=refCallDetail[contact_name]

VAR
startdt=refCallDetail[start_datetime]

VAR
enddt=refCallDetail[end_datetime]

RETURN

COUNTX(
    FILTER(
        FILTER(refCallDetail,
            (refCallDetail[ANI_DIALNUM]=FutureANI)&&
            (refCallDetail[Contact Type]="Inbound Call")
        ),
        (refCallDetail[start_datetime]>startdt)&&
        (refCallDetail[start_datetime]<enddt)
    ),
    refCallDetail[contact_name])

This results in a sort of endless wait for this to finish calculating (I've never been able to finish so I'm stuck here), but it also consumes a lot of my computer's memory. Is there a way to optimize this calculation? Better yet, is there a way to translate this into a measure?
I had intended to create this calculated field so that I could use the counts in a simple measure.

Phil_Seamark · ‎03-22-2018

and just out of curiority, does this fare any better? I get it to return in about 25 seconds using simulated data over 2,000,000 rows

				VAR
					FutureANI=refCallDetail[contact_name]

				VAR
					startdt=refCallDetail[start_datetime]
		
				VAR
					enddt=refCallDetail[end_datetime]

				VAR t = 
       				 FILTER(
       				 	refCallDetail,
            				        refCallDetail[ANI_DIALNUM]=FutureANI &&
		           		 	refCallDetail[Contact Type]="B" && 
		       				refCallDetail[start_datetime]>startdt &&
		        			refCallDetail[start_datetime]<enddt
    						)

				RETURN COUNTX(T,[Contact Type])))

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

gra_wang · ‎03-22-2018

I attempted the modified formula you suggested. Unfortunately, I was still stuck on 'Working On It'. I let it run for at least 10 minutes before just force closing PowerBI. However, for this iteration of the formula, I noticed that my memory usage did not spike like it did with my initial attempt.

As for my intent, here is a snippet of my data table as it may better explain what I'm trying to accomlish (I originally ommitted the [skill_name] column):

Here is what I would like to accomlish with this calculated field:

For each [skill_name] = "Tech Support ACB", give me a count of (imagine each point as an 'AND' statement):
- [ANI_DIALNUM] matches the [contact_name] of each skill_name 'Tech Support ACB'
- [start_datetime] is between the [start_datetime] and [end_datetime] of each skill_name 'Tech Support ACB'

- [Contact Type]="Inbound Call"

Once I have this calculated field, I could then create a simple measure to calculate the percentage of Tech Support ACB skills that were transferred (i.e. have a count >1). It is not possible for another Tech Support ACB to be counted within the count of each ACB which is why I have everything in one table.

I am not opposed to other ideas. If you have a measure in mind that would allow me to accomplish my intent without having to create a calculated field, I'm all ears.

Phil_Seamark · ‎03-22-2018

HI @gra_wang

If you can obfuscate your data and share it privately, I can do a better job at trying to optimise over the actual data for you.

I'd still recommend a calculated column over a calculated measure for this. You only want the calculation to run once, rather than everytime a user does something on your report.

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

gra_wang · ‎03-22-2018

Sent you a PM

Phil_Seamark · ‎03-21-2018

HI @gra_wang

It looks pretty well optimised, but can you please clarify what you are trying to achieve?

It looks like, for every row in the table, it wants to count the number of other Contacts that have a row from the same table between the start/end dates

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Optimizing Calculated Column for approx 2million Rows of Data

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly