cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
gra_wang Regular Visitor
Regular Visitor

Optimizing Calculated Column for approx 2million Rows of Data

I have the following columns:

refCallDetail[ANI_DIALNUM],
refCallDetail[contact_name],

refCallDetail[Contact Type],
refCallDetail[start_datetime],
refCallDetail[end_datetime]

 

each with pretty high amounts of cardinality except for [Contact Type], which only has 4 distinct values. I have also have nearly 2 million rows of data in this table. I realize that you generally want to reduce cardinality as much as possible and you want to split datetime columns, but I felt it was necessary to leave the columns as they were (you'll see why in the formula below).

I attempted this calculated column:

 
XFerCheck=

VAR
FutureANI=refCallDetail[contact_name]

VAR
startdt=refCallDetail[start_datetime]

VAR
enddt=refCallDetail[end_datetime]


RETURN


COUNTX(
    FILTER(
        FILTER(refCallDetail,
            (refCallDetail[ANI_DIALNUM]=FutureANI)&&
            (refCallDetail[Contact Type]="Inbound Call")
        ),
        (refCallDetail[start_datetime]>startdt)&&
        (refCallDetail[start_datetime]<enddt)
    ),
    refCallDetail[contact_name])

This results in a sort of endless wait for this to finish calculating (I've never been able to finish so I'm stuck here), but it also consumes a lot of my computer's memory. Is there a way to optimize this calculation? Better yet, is there a way to translate this into a measure?
I had intended to create this calculated field so that I could use the counts in a simple measure.

5 REPLIES 5
Phil_Seamark Super Contributor
Super Contributor

Re: Optimizing Calculated Column for approx 2million Rows of Data

HI @gra_wang

 

 

It looks pretty well optimised, but can  you please clarify what you are trying to achieve?

 

It looks like, for every row in the table, it wants to count the number of other Contacts that have a row from the same table between the start/end dates


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Phil_Seamark Super Contributor
Super Contributor

Re: Optimizing Calculated Column for approx 2million Rows of Data

and just out of curiority, does this fare any better?  I get it to return in about 25 seconds using simulated data over 2,000,000 rows

 

 

				VAR
					FutureANI=refCallDetail[contact_name]

				VAR
					startdt=refCallDetail[start_datetime]
		
				VAR
					enddt=refCallDetail[end_datetime]

				VAR t = 
       				 FILTER(
       				 	refCallDetail,
            				        refCallDetail[ANI_DIALNUM]=FutureANI &&
		           		 	refCallDetail[Contact Type]="B" && 
		       				refCallDetail[start_datetime]>startdt &&
		        			refCallDetail[start_datetime]<enddt
    						)

				RETURN COUNTX(T,[Contact Type])))

 


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

gra_wang Regular Visitor
Regular Visitor

Re: Optimizing Calculated Column for approx 2million Rows of Data

I attempted the modified formula you suggested. Unfortunately, I was still stuck on 'Working On It'. I let it run for at least 10 minutes before just force closing PowerBI. However, for this iteration of the formula, I noticed that my memory usage did not spike like it did with my initial attempt.

As for my intent, here is a snippet of my data table as it may better explain what I'm trying to accomlish (I originally ommitted the [skill_name] column):
Capture.JPG

 

Here is what I would like to accomlish with this calculated field:


For each [skill_name] = "Tech Support ACB", give me a count of (imagine each point as an 'AND' statement):
- [ANI_DIALNUM] matches the [contact_name] of each skill_name 'Tech Support ACB'
- [start_datetime] is between the [start_datetime] and [end_datetime] of each skill_name 'Tech Support ACB'

- [Contact Type]="Inbound Call"

 

Once I have this calculated field, I could then create a simple measure to calculate the percentage of Tech Support ACB skills that were transferred (i.e. have a count >1). It is not possible for another Tech Support ACB to be counted within the count of each ACB which is why I have everything in one table.

I am not opposed to other ideas. If you have a measure in mind that would allow me to accomplish my intent without having to create a calculated field, I'm all ears.

Phil_Seamark Super Contributor
Super Contributor

Re: Optimizing Calculated Column for approx 2million Rows of Data

HI @gra_wang

 

If you can obfuscate your data and share it privately, I can do a better job at trying to optimise over the actual data for you.

 

I'd still recommend a calculated column over a calculated measure for this.  You only want the calculation to run once, rather than everytime a user does something on your report. 


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

gra_wang Regular Visitor
Regular Visitor

Re: Optimizing Calculated Column for approx 2million Rows of Data

Sent you a PM