cancel
Showing results for
Did you mean:
Highlighted
Regular Visitor

## Optimizing Calculated Column for approx 2million Rows of Data

I have the following columns:

refCallDetail[ANI_DIALNUM],
refCallDetail[contact_name],

refCallDetail[Contact Type],
refCallDetail[start_datetime],
refCallDetail[end_datetime]

each with pretty high amounts of cardinality except for [Contact Type], which only has 4 distinct values. I have also have nearly 2 million rows of data in this table. I realize that you generally want to reduce cardinality as much as possible and you want to split datetime columns, but I felt it was necessary to leave the columns as they were (you'll see why in the formula below).

I attempted this calculated column:

XFerCheck=

VAR
FutureANI=refCallDetail[contact_name]

VAR
startdt=refCallDetail[start_datetime]

VAR
enddt=refCallDetail[end_datetime]

RETURN

COUNTX(
FILTER(
FILTER(refCallDetail,
(refCallDetail[ANI_DIALNUM]=FutureANI)&&
(refCallDetail[Contact Type]="Inbound Call")
),
(refCallDetail[start_datetime]>startdt)&&
(refCallDetail[start_datetime]<enddt)
),
refCallDetail[contact_name])

This results in a sort of endless wait for this to finish calculating (I've never been able to finish so I'm stuck here), but it also consumes a lot of my computer's memory. Is there a way to optimize this calculation? Better yet, is there a way to translate this into a measure?
I had intended to create this calculated field so that I could use the counts in a simple measure.

5 REPLIES 5
Super Contributor

## Re: Optimizing Calculated Column for approx 2million Rows of Data

It looks pretty well optimised, but can  you please clarify what you are trying to achieve?

It looks like, for every row in the table, it wants to count the number of other Contacts that have a row from the same table between the start/end dates

Proud to be a Datanaut!

Super Contributor

## Re: Optimizing Calculated Column for approx 2million Rows of Data

and just out of curiority, does this fare any better?  I get it to return in about 25 seconds using simulated data over 2,000,000 rows

```				VAR
FutureANI=refCallDetail[contact_name]

VAR
startdt=refCallDetail[start_datetime]

VAR
enddt=refCallDetail[end_datetime]

VAR t =
FILTER(
refCallDetail,
refCallDetail[ANI_DIALNUM]=FutureANI &&
refCallDetail[Contact Type]="B" &&
refCallDetail[start_datetime]>startdt &&
refCallDetail[start_datetime]<enddt
)

RETURN COUNTX(T,[Contact Type])))```

Proud to be a Datanaut!

Regular Visitor

## Re: Optimizing Calculated Column for approx 2million Rows of Data

I attempted the modified formula you suggested. Unfortunately, I was still stuck on 'Working On It'. I let it run for at least 10 minutes before just force closing PowerBI. However, for this iteration of the formula, I noticed that my memory usage did not spike like it did with my initial attempt.

As for my intent, here is a snippet of my data table as it may better explain what I'm trying to accomlish (I originally ommitted the [skill_name] column):

Here is what I would like to accomlish with this calculated field:

For each [skill_name] = "Tech Support ACB", give me a count of (imagine each point as an 'AND' statement):
- [ANI_DIALNUM] matches the [contact_name] of each skill_name 'Tech Support ACB'
- [start_datetime] is between the [start_datetime] and [end_datetime] of each skill_name 'Tech Support ACB'

- [Contact Type]="Inbound Call"

Once I have this calculated field, I could then create a simple measure to calculate the percentage of Tech Support ACB skills that were transferred (i.e. have a count >1). It is not possible for another Tech Support ACB to be counted within the count of each ACB which is why I have everything in one table.

I am not opposed to other ideas. If you have a measure in mind that would allow me to accomplish my intent without having to create a calculated field, I'm all ears.

Super Contributor

## Re: Optimizing Calculated Column for approx 2million Rows of Data

If you can obfuscate your data and share it privately, I can do a better job at trying to optimise over the actual data for you.

I'd still recommend a calculated column over a calculated measure for this.  You only want the calculation to run once, rather than everytime a user does something on your report.