Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hello, may someone help me with a task? I have a table data for this example with just two columns: Key anda data.
I want to calculate the average value by key.
My data have some outliners values that I want to remove from the average calculation.
Task: Remove outliners:
1-Calculate first quartile
2-Calculate third quartile
3-Calculate outer limits.
4-Calculate average value without outliners.
I am using a table visual to agregate the key value and a measure to calculate all of this. But it is slow.
AverageCalc =
var k1 = 0.25
var k2 = 0.75
//This part is to calculate the boundaries of quartiles. Took the code from here:
https://community.powerbi.com/t5/Desktop/Percentile-exc-Error/td-p/637337
var NumValues = COUNT ( Planilha1[Data] )
var LowerBound = 1 / ( NumValues + 1 )
var UpperBound = NumValues / ( NumValues + 1 )
//If my k is inside the boundaries, calculate quartile 1 and quartile 3
var q1 =
IF (
AND ( k1 >= LowerBound, k1 <= UpperBound ),
PERCENTILE.EXC ( Planilha1[Data], k1)
)
var q3 =
IF (
AND ( k2 >= LowerBound, k2 <= UpperBound ),
PERCENTILE.EXC ( Planilha1[Data], k2)
)
//Calculate interquartile
var iqr = q3 - q1
//calculate inner boudaries from my data
var lowValue = q1 - (1.5 * iqr)
var highValue = q3 + (1.5 * iqr)
//calculate outer boudaries from my data
var ultraLowValue = q1 - (3.0 * iqr)
var ultraHighValue = q3 + (3.0 * iqr)
//calculate my average without outliers
var media =
CALCULATE(
AVERAGE(Planilha1[Data]),
Planilha1[Data] <= ultraHighValue
)
return
media
My data: http://www.sharecsv.com/s/5e5b0aad68beec3c8764bcf737ef2bf5/Test_Data.csv
That is some interesting code. The conditionals are missing the alternate value so you end up doing a lot of math on BLANK(). A few of your variables are unused in the final result but I think DAX is smart enough not to calculate them.
In any case - load your query into DAX Studio and investigate the query plan and server timings there. I used your sample data and saw this
and this
(this goes on for 3000 more lines)
Definitely has room for improvement. First question would be - do you really have to compute that across all keys ?
I modified the code slightly but the real issue is the average calculation at the end. Not sure how to speed that up.
AverageCalc =
var k=SELECTEDVALUE(Planilha1[key])
var k1 = 0.25
var k2 = 0.75
var NumValues = COUNT ( Planilha1[Data] )
var LowerBound = 1 / ( NumValues + 1 )
var UpperBound = NumValues / ( NumValues + 1 )
var q1 = IF ( k1 >= LowerBound && k1 <= UpperBound, PERCENTILE.EXC ( Planilha1[Data], k1),0)
var q3 = IF ( k2 >= LowerBound && k2 <= UpperBound, PERCENTILE.EXC ( Planilha1[Data], k2),0)
var ultraHighValue = 4*q3 - 3*q1
var media = CALCULATE(
AVERAGE(Planilha1[Data]),
Planilha1[key]=k,
Planilha1[Data] <= ultraHighValue
)
return media
Here is a version that you may want to consider. It still has a horrible query plan but seems to execute quite a bit faster.
AverageCalc =
var k1 = 0.25
var k2 = 0.75
var NumValues = COUNT ( Planilha1[Data] )
var LowerBound = 1 / ( NumValues + 1 )
var UpperBound = NumValues / ( NumValues + 1 )
var q1 = IF ( k1 >= LowerBound && k1 <= UpperBound, PERCENTILE.EXC ( Planilha1[Data], k1),0)
var q3 = IF ( k2 >= LowerBound && k2 <= UpperBound, PERCENTILE.EXC ( Planilha1[Data], k2),0)
var ultraHighValue = 4*q3 - 3*q1
var media = AVERAGEX(filter(Planilha1,Planilha1[Data] <= ultraHighValue),Planilha1[Data])
return media
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
112 | |
100 | |
80 | |
64 | |
57 |
User | Count |
---|---|
146 | |
110 | |
93 | |
84 | |
67 |