Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
ehahn
Frequent Visitor

AVERAGEX -- EXCLUDING OUTLIERS?

I have a custom measure [% Change] which is calculating the month-over-month percent change of value over time. This measure is included in a matrix to display [% Change] for multiple accounts.

 

I have another custom measure [Average Return] which is calculating the average of [% Change] for each date period as defined by the rows of the matrix. 

 

everything is working, HOWEVER... I now want to EXCLUDE outliers from the [Average Return] calculation. 

 

currently Average Return = AVERAGEX(values(VeoBalHistory[Date]),[% Change])

 

any ideas on how to calculate the above but exclude outliers (for example the -47.62% displayed below)?

2017-10-04_12-30-58.png

1 ACCEPTED SOLUTION

@ehahn

I would recommend a measure like this:

 

Average Return Excluding Outliers =
// Use a method of your choosing to define thresholds for outliers
// This example uses:

// Lower Quartile - 1.5 * Interquartile range &
// Upper Quartile + 1.5 * Interquartile range VAR LowerQuartile = PERCENTILEX.INC ( VALUES ( VeoBalHistory[Date] ), [% Change], .25 ) VAR UpperQuartile = PERCENTILEX.INC ( VALUES ( VeoBalHistory[Date] ), [% Change], .75 ) VAR InterQuartileRange = UpperQuartile - LowerQuartile // Anything < OutlierThresholdLower is an outlier VAR OutlierThresholdLower = LowerQuartile - InterQuartileRange * 1.5 // Anything > than OutlierThresholdUpper is an outlier VAR OutlierThresholdUpper = UpperQuartile + InterQuartileRange * 1.5
// Calculate average over [% Change] only for dates when [% Change] is not an outlier
// For dates where [% Change] is an outlier, the IF function returns BLANK()

// which is not included in the average calculated by AVERAGEX RETURN AVERAGEX ( VALUES ( VeoBalHistory[Date] ), VAR ChangePct = [% Change] RETURN IF ( AND ( ChangePct >= OutlierThresholdLower, ChangePct <= OutlierThresholdUpper ), ChangePct ) )

 


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

View solution in original post

11 REPLIES 11
Greg_Deckler
Super User
Super User

Can you wrap your AVERAGEX in an CALCULATE and filter out outliers that way?


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

and keep in mind, my goal is to exclude the outlier values from the average calculation.

@ehahn

I would recommend a measure like this:

 

Average Return Excluding Outliers =
// Use a method of your choosing to define thresholds for outliers
// This example uses:

// Lower Quartile - 1.5 * Interquartile range &
// Upper Quartile + 1.5 * Interquartile range VAR LowerQuartile = PERCENTILEX.INC ( VALUES ( VeoBalHistory[Date] ), [% Change], .25 ) VAR UpperQuartile = PERCENTILEX.INC ( VALUES ( VeoBalHistory[Date] ), [% Change], .75 ) VAR InterQuartileRange = UpperQuartile - LowerQuartile // Anything < OutlierThresholdLower is an outlier VAR OutlierThresholdLower = LowerQuartile - InterQuartileRange * 1.5 // Anything > than OutlierThresholdUpper is an outlier VAR OutlierThresholdUpper = UpperQuartile + InterQuartileRange * 1.5
// Calculate average over [% Change] only for dates when [% Change] is not an outlier
// For dates where [% Change] is an outlier, the IF function returns BLANK()

// which is not included in the average calculated by AVERAGEX RETURN AVERAGEX ( VALUES ( VeoBalHistory[Date] ), VAR ChangePct = [% Change] RETURN IF ( AND ( ChangePct >= OutlierThresholdLower, ChangePct <= OutlierThresholdUpper ), ChangePct ) )

 


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

@OwenAuger

 

Would I be able to use the same measure if I wanted to just trim out the top and bottom 20% of values? 

@powerBIpeon

Yes, the structure of the measure would be similar. You would just redefine OutlierThresholdLower and OutlierThresholdUpper to be the 20th and 80th percentiles instead, with no need for the interquartile range.

 

Please post back if you need more assistance with writing the measure in your particular model.

 

Regards,

Owen


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

@OwenAuger

 

I'm using this measure below. Is this correct? I'm getting a slighlty different number when using the TRIMMEAN function in excel

 

Average Return Excluding Outliers = 

VAR LowerQuartile =
    PERCENTILEX.INC ( VALUES ( Account[Acct. Name] ), [Difference], .20 )
VAR UpperQuartile =
    PERCENTILEX.INC ( VALUES ( Account[Acct. Name] ),[Difference], .80 )

// Anything < OutlierThresholdLower is an outlier
VAR OutlierThresholdLower = LowerQuartile
 
// Anything > than OutlierThresholdUpper is an outlier
VAR OutlierThresholdUpper = UpperQuartile
    
// Calculate average over [% Change] only for dates when [% Change] is not an outlier
// For dates where [% Change] is an outlier, the IF function returns BLANK()
// which is not included in the average calculated by AVERAGEX
RETURN
    AVERAGEX( 
        VALUES ( Account[Acct. Name] ),
        VAR ChangePct = [Difference]
        RETURN
            IF (
                AND ( ChangePct >= OutlierThresholdLower, ChangePct <= OutlierThresholdUpper ),
                ChangePct
            )
    )
    

@powerBIpeon

To exactly replicate the behaviour of TRIMMEAN requires a slightly more careful measure.

 

TRIMMEAN in Excel trims an equal number of top & bottom items, and breaks ties arbitrarily, so that it is possible for items with the same value to be partially trimmed (which my earlier measure didn't do). Also, the percentage provided is multiplied by the number of items, then rounded down to the nearest even number.

 

Here is a measure that should replicate TRIMMEAN's behaviour. In this example, TrimPercent is set to 0.4, so 40% of items are trimmed, i.e. 20% at the top + 20% at the bottom.

I won't vouch for performance - this can probably be optimised! 🙂

Average Return Excluding Outliers v2 = 
VAR TrimPercent = 0.4 -- Same as the second argument of TRIMMEAN
VAR Items =
    VALUES ( Account[Acct. Name] )
VAR ItemCount =
    COUNTROWS ( Items )
VAR TrimCount =
    FLOOR ( TrimPercent * ItemCount, 2 ) / 2 -- Count of items to be trimmed at top (& bottom)
// ItemsToTrim is the union of the top & bottom items.
// If two items have the same [Difference value], then ties are broken
// using the ordering of Acct. Name VAR ItemsToTrim = UNION ( TOPN ( TrimCount, Items, RANKX ( Items, [Difference] + DIVIDE ( RANKX ( Items, Account[Acct. Name] ), ItemCount + 1 ) ), ASC ), TOPN ( TrimCount, Items, RANKX ( Items, [Difference] + DIVIDE ( RANKX ( Items, Account[Acct. Name] ), ItemCount + 1 ) ), DESC ) ) VAR ItemsToKeep = EXCEPT ( Items, ItemsToTrim ) RETURN AVERAGEX ( ItemsToKeep, [Difference] )

Regards,

Owen


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

Thank you so much for this!

Thank you for all the help! @OwenAuger

Great solution . 

I don't understand why there isn't a trimmean() in Excel, this function would be very usefull . 

new to DAX, so not sure how to do that.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.