Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
I have a custom measure [% Change] which is calculating the month-over-month percent change of value over time. This measure is included in a matrix to display [% Change] for multiple accounts.
I have another custom measure [Average Return] which is calculating the average of [% Change] for each date period as defined by the rows of the matrix.
everything is working, HOWEVER... I now want to EXCLUDE outliers from the [Average Return] calculation.
currently Average Return = AVERAGEX(values(VeoBalHistory[Date]),[% Change])
any ideas on how to calculate the above but exclude outliers (for example the -47.62% displayed below)?
Solved! Go to Solution.
I would recommend a measure like this:
Average Return Excluding Outliers = // Use a method of your choosing to define thresholds for outliers
// This example uses:
// Lower Quartile - 1.5 * Interquartile range &
// Upper Quartile + 1.5 * Interquartile range VAR LowerQuartile = PERCENTILEX.INC ( VALUES ( VeoBalHistory[Date] ), [% Change], .25 ) VAR UpperQuartile = PERCENTILEX.INC ( VALUES ( VeoBalHistory[Date] ), [% Change], .75 ) VAR InterQuartileRange = UpperQuartile - LowerQuartile // Anything < OutlierThresholdLower is an outlier VAR OutlierThresholdLower = LowerQuartile - InterQuartileRange * 1.5 // Anything > than OutlierThresholdUpper is an outlier VAR OutlierThresholdUpper = UpperQuartile + InterQuartileRange * 1.5
// Calculate average over [% Change] only for dates when [% Change] is not an outlier
// For dates where [% Change] is an outlier, the IF function returns BLANK()
// which is not included in the average calculated by AVERAGEX RETURN AVERAGEX ( VALUES ( VeoBalHistory[Date] ), VAR ChangePct = [% Change] RETURN IF ( AND ( ChangePct >= OutlierThresholdLower, ChangePct <= OutlierThresholdUpper ), ChangePct ) )
Can you wrap your AVERAGEX in an CALCULATE and filter out outliers that way?
and keep in mind, my goal is to exclude the outlier values from the average calculation.
I would recommend a measure like this:
Average Return Excluding Outliers = // Use a method of your choosing to define thresholds for outliers
// This example uses:
// Lower Quartile - 1.5 * Interquartile range &
// Upper Quartile + 1.5 * Interquartile range VAR LowerQuartile = PERCENTILEX.INC ( VALUES ( VeoBalHistory[Date] ), [% Change], .25 ) VAR UpperQuartile = PERCENTILEX.INC ( VALUES ( VeoBalHistory[Date] ), [% Change], .75 ) VAR InterQuartileRange = UpperQuartile - LowerQuartile // Anything < OutlierThresholdLower is an outlier VAR OutlierThresholdLower = LowerQuartile - InterQuartileRange * 1.5 // Anything > than OutlierThresholdUpper is an outlier VAR OutlierThresholdUpper = UpperQuartile + InterQuartileRange * 1.5
// Calculate average over [% Change] only for dates when [% Change] is not an outlier
// For dates where [% Change] is an outlier, the IF function returns BLANK()
// which is not included in the average calculated by AVERAGEX RETURN AVERAGEX ( VALUES ( VeoBalHistory[Date] ), VAR ChangePct = [% Change] RETURN IF ( AND ( ChangePct >= OutlierThresholdLower, ChangePct <= OutlierThresholdUpper ), ChangePct ) )
Would I be able to use the same measure if I wanted to just trim out the top and bottom 20% of values?
Yes, the structure of the measure would be similar. You would just redefine OutlierThresholdLower and OutlierThresholdUpper to be the 20th and 80th percentiles instead, with no need for the interquartile range.
Please post back if you need more assistance with writing the measure in your particular model.
Regards,
Owen
I'm using this measure below. Is this correct? I'm getting a slighlty different number when using the TRIMMEAN function in excel
Average Return Excluding Outliers = VAR LowerQuartile = PERCENTILEX.INC ( VALUES ( Account[Acct. Name] ), [Difference], .20 ) VAR UpperQuartile = PERCENTILEX.INC ( VALUES ( Account[Acct. Name] ),[Difference], .80 ) // Anything < OutlierThresholdLower is an outlier VAR OutlierThresholdLower = LowerQuartile // Anything > than OutlierThresholdUpper is an outlier VAR OutlierThresholdUpper = UpperQuartile // Calculate average over [% Change] only for dates when [% Change] is not an outlier // For dates where [% Change] is an outlier, the IF function returns BLANK() // which is not included in the average calculated by AVERAGEX RETURN AVERAGEX( VALUES ( Account[Acct. Name] ), VAR ChangePct = [Difference] RETURN IF ( AND ( ChangePct >= OutlierThresholdLower, ChangePct <= OutlierThresholdUpper ), ChangePct ) )
To exactly replicate the behaviour of TRIMMEAN requires a slightly more careful measure.
TRIMMEAN in Excel trims an equal number of top & bottom items, and breaks ties arbitrarily, so that it is possible for items with the same value to be partially trimmed (which my earlier measure didn't do). Also, the percentage provided is multiplied by the number of items, then rounded down to the nearest even number.
Here is a measure that should replicate TRIMMEAN's behaviour. In this example, TrimPercent is set to 0.4, so 40% of items are trimmed, i.e. 20% at the top + 20% at the bottom.
I won't vouch for performance - this can probably be optimised! 🙂
Average Return Excluding Outliers v2 = VAR TrimPercent = 0.4 -- Same as the second argument of TRIMMEAN VAR Items = VALUES ( Account[Acct. Name] ) VAR ItemCount = COUNTROWS ( Items ) VAR TrimCount = FLOOR ( TrimPercent * ItemCount, 2 ) / 2 -- Count of items to be trimmed at top (& bottom)
// ItemsToTrim is the union of the top & bottom items.
// If two items have the same [Difference value], then ties are broken
// using the ordering of Acct. Name VAR ItemsToTrim = UNION ( TOPN ( TrimCount, Items, RANKX ( Items, [Difference] + DIVIDE ( RANKX ( Items, Account[Acct. Name] ), ItemCount + 1 ) ), ASC ), TOPN ( TrimCount, Items, RANKX ( Items, [Difference] + DIVIDE ( RANKX ( Items, Account[Acct. Name] ), ItemCount + 1 ) ), DESC ) ) VAR ItemsToKeep = EXCEPT ( Items, ItemsToTrim ) RETURN AVERAGEX ( ItemsToKeep, [Difference] )
Regards,
Owen
Thank you so much for this!
Great solution .
I don't understand why there isn't a trimmean() in Excel, this function would be very usefull .
new to DAX, so not sure how to do that.
User | Count |
---|---|
139 | |
113 | |
104 | |
77 | |
63 |
User | Count |
---|---|
135 | |
126 | |
110 | |
70 | |
61 |