Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
MatheusLPS
Helper I
Helper I

Calculation average slow with quartiles

Hello, may someone help me with a task? I have a table data for this example with just two columns: Key anda data.

 

I want to calculate the average value by key.

My data have some outliners values that I want to remove from the average calculation.

 

Task: Remove outliners:

1-Calculate first quartile

2-Calculate third quartile

3-Calculate outer limits. 

4-Calculate average value without outliners.

 

I am using a table visual to agregate the key value and a measure to calculate all of this. But it is slow.

 

 

AverageCalc = 

var k1 = 0.25
var k2 = 0.75

//This part is to calculate the boundaries of quartiles. Took the code from here:
https://community.powerbi.com/t5/Desktop/Percentile-exc-Error/td-p/637337

var NumValues = COUNT ( Planilha1[Data] )
var LowerBound = 1 / ( NumValues + 1 )
var UpperBound = NumValues / ( NumValues + 1 )

//If my k is inside the boundaries, calculate quartile 1 and quartile 3
var q1 = 
    IF (
        AND ( k1 >= LowerBound, k1 <= UpperBound ),
        PERCENTILE.EXC ( Planilha1[Data], k1)
    )

var q3 = 
    IF (
        AND ( k2 >= LowerBound, k2 <= UpperBound ),
        PERCENTILE.EXC ( Planilha1[Data], k2)
    )

//Calculate interquartile
var iqr = q3 - q1

//calculate inner boudaries from my data
var lowValue = q1 - (1.5 * iqr)
var highValue = q3 + (1.5 * iqr)

//calculate outer boudaries from my data
var ultraLowValue = q1 - (3.0 * iqr)
var ultraHighValue = q3 + (3.0 * iqr)

//calculate my average without outliers
var media = 
    CALCULATE(
        AVERAGE(Planilha1[Data]),
        Planilha1[Data] <= ultraHighValue
    )

return
media

 

 

My data: http://www.sharecsv.com/s/5e5b0aad68beec3c8764bcf737ef2bf5/Test_Data.csv

1 REPLY 1
lbendlin
Super User
Super User

That is some interesting code.  The conditionals are missing the alternate value so you end up doing a lot of math on BLANK().  A few of your variables are unused in the final result but I think DAX is smart enough not to calculate them.

 

In any case - load your query into DAX Studio and investigate the query plan and server timings there.  I used your sample data and saw this 

 

Annotation 2020-07-14 203102.png

 

and this

 

Annotation 2020-07-14 203145.png

(this goes on for 3000 more lines)

 

Definitely has room for improvement.  First question would be - do you really have to compute that across all keys ?

 

I modified the code slightly but the real issue is the average calculation at the end. Not sure how to speed that up.

 

 

AverageCalc = 
var k=SELECTEDVALUE(Planilha1[key])
var k1 = 0.25
var k2 = 0.75
var NumValues = COUNT ( Planilha1[Data] )
var LowerBound = 1 / ( NumValues + 1 )
var UpperBound = NumValues / ( NumValues + 1 )
var q1 = IF ( k1 >= LowerBound && k1 <= UpperBound, PERCENTILE.EXC ( Planilha1[Data], k1),0)
var q3 = IF ( k2 >= LowerBound && k2 <= UpperBound, PERCENTILE.EXC ( Planilha1[Data], k2),0)
var ultraHighValue = 4*q3 - 3*q1
var media = CALCULATE(
        AVERAGE(Planilha1[Data]),
        Planilha1[key]=k,
        Planilha1[Data] <= ultraHighValue
    )
return media

 

Here is a version that you may want to consider. It still has a horrible query plan but seems to execute quite a bit faster.

 

AverageCalc = 
var k1 = 0.25
var k2 = 0.75
var NumValues = COUNT ( Planilha1[Data] )
var LowerBound = 1 / ( NumValues + 1 )
var UpperBound = NumValues / ( NumValues + 1 )
var q1 = IF ( k1 >= LowerBound && k1 <= UpperBound, PERCENTILE.EXC ( Planilha1[Data], k1),0)
var q3 = IF ( k2 >= LowerBound && k2 <= UpperBound, PERCENTILE.EXC ( Planilha1[Data], k2),0)
var ultraHighValue = 4*q3 - 3*q1
var media = AVERAGEX(filter(Planilha1,Planilha1[Data] <= ultraHighValue),Planilha1[Data])
return media

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.