Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Greg_Deckler
Super User
Super User

StdDev from Mean by Category

OK, been working with school data and can't seem to wrap my head around one particular issue so hopefully someone can help out.

 

Basically, what I want to do is to calculate how many standard deviations each school district is away from the average score of all school districts but have it so that the average score and the standard deviation is influenced by what category the school district falls into.

 

So, for example, let's say that school districts have an average score of 2 and the standard deviation is .5. For all rural schools only, the average score is 2.75 and the standard deviation is .25. If I therefore have a school that ranks a 3, the resulting standard deviations from mean would be 2 if looking at all schools ((3 - 2)/.5) and would be .5 for rural schools only ((3 - 2.75)/.25). Hopefully that makes sense. The idea here is that this metric can be used to find the outliers, those schools that do better than their peers and thus they can be studied to see what they do differently than the other schools in order to help all students do better in those other schools.

 

The issue is that I can't seem to get it to work the way I want. I can calculate the mean and stddev easy enough and I can create my measure easy enough but it doesn't do what I want because the visualization filters it to each school individually, which isn't helpful in this case. So, I tried measures like:

 

Overall Score StdDevs from Mean = (MAX([OVERALL SCORE]) - CALCULATE([Overall Score District Mean],ALL(DISTRICT))) / CALCULATE([Overall Score District StdDev],ALL(DISTRICT))

The problem with the above measure is that this score is not impacted by the school category when I select one in a slicer. The one below:

 

Overall Score StdDevs from Mean = (MAX([OVERALL SCORE]) - CALCULATE([Overall Score District Mean],ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))) / CALCULATE([Overall Score District StdDev],ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))

This returns NaN.

 

And this version also remains constant regardless.

 

Overall Score StdDevs from Mean = (MAX(DISTRICT[OVERALL SCORE]) - CALCULATE([Overall Score District Mean],ALL(DISTRICT),ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))) / CALCULATE([Overall Score District StdDev],ALL(DISTRICT),ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))

 

It seems to me that there has to be an easy solution to this that I am just missing but I can't wrap my head around how to get the result I want. Perhaps I am going down the wrong path here and there is a different method to achieve what I want?

 

 

So, here is some sample data:

 

DISTRICT

 

DISTRICT IRNDISTRICT NAMEOVERALL SCORE
442Manchester Local2.75
43489Akron City1.31
43497Alliance City1.63
43505Ashland City2.63
43513Ashtabula Area City1.94
43521Athens City2.69
43539Barberton City1.69
43547Bay Village City4.19
43554Beachwood City3.63
43562Bedford City1.69
43570Bellaire Local1.69
43588Bellefontaine City3.25
43596Bellevue City2.94
43604Belpre City2.19
43612Berea City1.94
43620Bexley City2.88
43638Bowling Green City3.25
43646Brecksville-Broadview Heights City4.19
43653Brooklyn City2.69
43661Brunswick City3.5
43679Bryan City2.94
43687Bucyrus City1.81
43695Cambridge City2.75
43703Campbell City1.75
43711Canton City1.56

 

TYPOLOGY BY DISTRICT

 

 

 

 

IRNDistrict Name2013 Typology
442Manchester Local (Adams)1
43489Akron City8
43497Alliance City7
43505Ashland City4
43513Ashtabula Area City4
43521Athens City5
43539Barberton City7
43547Bay Village City6
43554Beachwood City6
43562Bedford City7
43570Bellaire Local4
43588Bellefontaine City4
43596Bellevue City1
43604Belpre City4
43612Berea City7
43620Bexley City6
43638Bowling Green City5
43646Brecksville-Broadview Heights City6
43653Brooklyn City7
43661Brunswick City5
43679Bryan City4
43687Bucyrus City4
43695Cambridge City4
43703Campbell City7
43711Canton City8

 

 

Topologies

 

 2013 Typology Code Major Grouping
0Island
1Rural
2Rural
3Small Town
4Small Town
5Suburban
6Suburban
7Urban
8Urban

 

I built these measures (in DISTRICT):

Overall Score District Mean = AVERAGE(DISTRICT[OVERALL SCORE])

Overall Score District StdDev = STDEV.P(DISTRICT[OVERALL SCORE])

 

 

Here are the relationships

 

DISTRICTS.png


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
1 ACCEPTED SOLUTION

Thanks for the suggestion @v-haibl-msft. I actually solved this over the weekend a different way but I'll see if I can find the time to go back and try that method. What I ended up doing was merging my three tables into a single table and then wrapping the calculation of the average in a CALCULATE statement with an ALLEXCEPT clause that included the two categories that I wanted to factor into the calculation dynamically. Worked as expected once the data was all in a single table. You can actually see the end result here in the Data Stories Gallery:

 

http://community.powerbi.com/t5/Data-Stories-Gallery/Ohio-Primary-and-Secondary-Education-Performanc...

 

I just published it last night. Thanks again @KGrice for taking a look at this also.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

6 REPLIES 6
Toots_N
Frequent Visitor

Hi @KGrice,

I have almost the same problem as Greg but I've had to calculate my StdDev via the below using Summarize and Calculatetable. My data is a tranactional table, standard deviation is required at the monthly level with filtering to two seperate hierarchies built on location. I need the standard deviation (based on month) to dynmaically reflect the filters.

 

VAR _table = SUMMARIZE(CALCULATETABLE(Fct_Incident,Tbl_Calendar[CurMonthOffset]>=-26,Tbl_Calendar[CurMonthOffset] < -1), Fct_Incident[Start of Month],"_CountID",distinctcount(Fct_Incident[Incident ID]))
VAR _STDEV = STDEVX.P(_table,[_CountID])
 
My problem is the same as Greg's in that I want to Chart the results as a Statistical Process Chart (where I can control the "baseline" - this is problem I'm having with the availble add in apps) but I can't get the StdDev to display on each (month) row. Unfortunately the same solution that you provided for Greg doesn't work for me because I'm using a calculated/summarized table.
 
Do you have any thoughts on how to do this? Or should I be using a different way to group by Fct_Incident[Start of Month].
 
Would appreciate any assistance.
 
EDIT: Solved my own issue
STDEVX.P(CALCULATETABLE(SUMMARIZE(Fct_Incident,Fct_Incident[Start of Month],"Count_ID",DISTINCTCOUNT(Fct_Incident[Incident ID])), ALL(Fct_Incident[Start of Month])),[Count_ID])
KGrice
Memorable Member
Memorable Member

Hi @Greg_Deckler. I think this works, but I'm not sure if you're wanting to show your rural and not rural scores in the same measure depending on where the shool is, or if they should be separate. Here's what I have now based on the sample data:

 

StdDevs.PNG

 

And here are the measures in the order they appear:

 

Overall Score District Mean = AVERAGE(DISTRICT[OVERALL SCORE])
Overall Score District StdDev = STDEV.P(DISTRICT[OVERALL SCORE])
Mean All Districts = CALCULATE(
[Overall Score District Mean],
ALL(District[DISTRICT NAME])
)
StdDev All Districts = CALCULATE(
[Overall Score District StdDev],
ALL(District[DISTRICT NAME])
)
Overall Score StdDevs from Mean =
([Overall Score District Mean] - [Mean All Districts]) / [StdDev All Districts]
Mean All Districts Rural = CALCULATE(
[Mean All Districts],
Topologies[ Major Grouping]="Rural"
)
StdDev All Districts Rural = CALCULATE(
[StdDev All Districts],
Topologies[ Major Grouping]="Rural"
)
Overall Score StdDevs from Mean Rural = IF(
HASONEVALUE(Topologies[ Major Grouping]),
IF(
VALUES(Topologies[ Major Grouping])="Rural",
([Overall Score District Mean] - [Mean All Districts Rural]) / [StdDev All Districts Rural],
BLANK()
),
([Overall Score District Mean Rural] - [Mean All Districts Rural]) / [StdDev All Districts Rural]
)

 

 

Hi @KGrice,

I have almost the same problem as Greg but I've had to calculate my StdDev via the below using Summarize and Calculatetable. My data is a tranactional table, standard deviation is required at the monthly level with filtering to two seperate hierarchies built on location. I need the standard deviation (based on month) to dynmaically reflect the filters.

 

VAR _table = SUMMARIZE(CALCULATETABLE(Fct_Incident,Tbl_Calendar[CurMonthOffset]>=-26,Tbl_Calendar[CurMonthOffset] < -1), Fct_Incident[Start of Month],"_CountID",distinctcount(Fct_Incident[Incident ID]))
VAR _STDEV = STDEVX.P(_table,[_CountID])
 
My problem is the same as Greg's in that I want to Chart the results as a Statistical Process Chart (where I can control the "baseline" - this is problem I'm having with the availble add in apps) but I can't get the StdDev to display on each (month) row. Unfortunately the same solution that you provided for Greg doesn't work for me because I'm using a calculated/summarized table.
 
Do you have any thoughts on how to do this? Or should I be using a different way to group by Fct_Incident[Start of Month].
 
Would appreciate any assistance.

Thanks @KGrice. This definitely works and essentially what I asked for, but not what I want! 🙂

 

I was *hoping* that I could make this measure dynamic such that I could put a slicer on the page and when the user clicked the slicer, it calculated things correctly.

 

The end go is that I will probably have two categories, "rural", "urban", etc. and then a category on level of poverty "<= 10%", "11%-20%".

 

Sorry, I feel like I'm whining... 🙂


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler

 

Maybe you can try to use the ALLSELECTED function to make the measure dynamic according to your slicer.

 

Best Regards,

Herbert

Thanks for the suggestion @v-haibl-msft. I actually solved this over the weekend a different way but I'll see if I can find the time to go back and try that method. What I ended up doing was merging my three tables into a single table and then wrapping the calculation of the average in a CALCULATE statement with an ALLEXCEPT clause that included the two categories that I wanted to factor into the calculation dynamically. Worked as expected once the data was all in a single table. You can actually see the end result here in the Data Stories Gallery:

 

http://community.powerbi.com/t5/Data-Stories-Gallery/Ohio-Primary-and-Secondary-Education-Performanc...

 

I just published it last night. Thanks again @KGrice for taking a look at this also.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.