Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
cpiercey
Frequent Visitor

Calculate Outlier For Multiple Data Sets In Table

Hello, I am fairly new to Power BI and need some help determining the best method to calculate outliers using IQR for many unique items in one transaction table ('Unfiltered Forecast') with transaction data. The idea is to be able to forecast inventory needs while excluding any outliers which would skew the data. I would like to create another transaction table while filtering out the outliers so that they are not included in the item forecast demand. For this step of the process, I need to filter out the outliers. There are roughly 900,000 rows for 1,000 unique items in the transaction table, here is an example of what it contains:

 

GroupSKU    Total Quantity
Item155
Item154
Item158
Item1500
Item168
Item295
Item298
Item297
Item27000
Item273
Item35
Item34
Item35
Item36
Item375
Item33

 

I would expect the output to look like this:

 

GroupSKU    Total Quantity    isOutlier
Item155

no

Item154no
Item158no
Item1500yes
Item168no
Item295no
Item298no
Item297no
Item27000yes
Item273no
Item35no
Item34no
Item35no
Item36no
Item375yes
Item33no

 

Since there are about 1,000 unique items, it does not make sense to me to generate a table for each item. The main issue I have come across is being able to specify that I want the outlier calculations to apply only to each item in the table, rather than perform the calculation on the entire data set. Here is the calculated column I am currently working with:

 

 

IsOutlier =
VAR LowerQuartile = PERCENTILEX.INC ( filter('Unfiltered Forecast', 'Unfiltered Forecast'[GroupSKU] = [GroupSKU]), 'Unfiltered Forecast'[Total Quantity], .25)
VAR UpperQuartile = PERCENTILEX.INC ( filter('Unfiltered Forecast', 'Unfiltered Forecast'[GroupSKU] = [GroupSKU]), 'Unfiltered Forecast'[Total Quantity], .75)
VAR InterQuartileRange = UpperQuartile - LowerQuartile
VAR OutlierThresholdLower = LowerQuartile - InterQuartileRange * 1.5
VAR OutlierThresholdUpper = UpperQuartile + InterQuartileRange * 1.5

RETURN if('Unfiltered Forecast'[Total Quantity] >= OutlierThresholdUpper,"yes", "no")
 
What is the best method to do this? Is there some way I can filter the data or create a temp table for each individual calculation without causing unbearable slowdown due to inefficiency? Thanks for taking a look.
1 ACCEPTED SOLUTION
mahoneypat
Employee
Employee

Please try this variation on your expression as a calculated column in your original table.  You can then use that column as a filter in your other measures for analysis.

 

IsOutlier =
VAR thisSKU = 'Unfiltered Forecast'[GroupSKU]
VAR SKUValues =
    FILTER ( 'Unfiltered Forecast', 'Unfiltered Forecast'[GroupSKU] = thisSKU )
VAR LowerQuartile =
    PERCENTILEX.INC ( SKUValues, 'Unfiltered Forecast'[Total Quantity], .25 )
VAR UpperQuartile =
    PERCENTILEX.INC ( SKUValues, 'Unfiltered Forecast'[Total Quantity], .75 )
VAR InterQuartileRange = UpperQuartile - LowerQuartile
VAR OutlierThresholdLower = LowerQuartile - InterQuartileRange * 1.5
VAR OutlierThresholdUpper = UpperQuartile + InterQuartileRange * 1.5
RETURN
    IF (
        'Unfiltered Forecast'[Total Quantity] >= OutlierThresholdUpper,
        "yes",
        "no"
    )

 

If this works for you, please mark it as the solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat

 

 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


View solution in original post

2 REPLIES 2
mahoneypat
Employee
Employee

Please try this variation on your expression as a calculated column in your original table.  You can then use that column as a filter in your other measures for analysis.

 

IsOutlier =
VAR thisSKU = 'Unfiltered Forecast'[GroupSKU]
VAR SKUValues =
    FILTER ( 'Unfiltered Forecast', 'Unfiltered Forecast'[GroupSKU] = thisSKU )
VAR LowerQuartile =
    PERCENTILEX.INC ( SKUValues, 'Unfiltered Forecast'[Total Quantity], .25 )
VAR UpperQuartile =
    PERCENTILEX.INC ( SKUValues, 'Unfiltered Forecast'[Total Quantity], .75 )
VAR InterQuartileRange = UpperQuartile - LowerQuartile
VAR OutlierThresholdLower = LowerQuartile - InterQuartileRange * 1.5
VAR OutlierThresholdUpper = UpperQuartile + InterQuartileRange * 1.5
RETURN
    IF (
        'Unfiltered Forecast'[Total Quantity] >= OutlierThresholdUpper,
        "yes",
        "no"
    )

 

If this works for you, please mark it as the solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat

 

 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Pat, thanks so much! This worked perfectly. Great idea to put those into their own variables.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.