cancel
Showing results for
Did you mean:
Super User

## StdDev from Mean by Category

OK, been working with school data and can't seem to wrap my head around one particular issue so hopefully someone can help out.

Basically, what I want to do is to calculate how many standard deviations each school district is away from the average score of all school districts but have it so that the average score and the standard deviation is influenced by what category the school district falls into.

So, for example, let's say that school districts have an average score of 2 and the standard deviation is .5. For all rural schools only, the average score is 2.75 and the standard deviation is .25. If I therefore have a school that ranks a 3, the resulting standard deviations from mean would be 2 if looking at all schools ((3 - 2)/.5) and would be .5 for rural schools only ((3 - 2.75)/.25). Hopefully that makes sense. The idea here is that this metric can be used to find the outliers, those schools that do better than their peers and thus they can be studied to see what they do differently than the other schools in order to help all students do better in those other schools.

The issue is that I can't seem to get it to work the way I want. I can calculate the mean and stddev easy enough and I can create my measure easy enough but it doesn't do what I want because the visualization filters it to each school individually, which isn't helpful in this case. So, I tried measures like:

`Overall Score StdDevs from Mean = (MAX([OVERALL SCORE]) - CALCULATE([Overall Score District Mean],ALL(DISTRICT))) / CALCULATE([Overall Score District StdDev],ALL(DISTRICT))`

The problem with the above measure is that this score is not impacted by the school category when I select one in a slicer. The one below:

`Overall Score StdDevs from Mean = (MAX([OVERALL SCORE]) - CALCULATE([Overall Score District Mean],ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))) / CALCULATE([Overall Score District StdDev],ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))`

This returns NaN.

And this version also remains constant regardless.

`Overall Score StdDevs from Mean = (MAX(DISTRICT[OVERALL SCORE]) - CALCULATE([Overall Score District Mean],ALL(DISTRICT),ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))) / CALCULATE([Overall Score District StdDev],ALL(DISTRICT),ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))`

It seems to me that there has to be an easy solution to this that I am just missing but I can't wrap my head around how to get the result I want. Perhaps I am going down the wrong path here and there is a different method to achieve what I want?

So, here is some sample data:

DISTRICT

 DISTRICT IRN DISTRICT NAME OVERALL SCORE 442 Manchester Local 2.75 43489 Akron City 1.31 43497 Alliance City 1.63 43505 Ashland City 2.63 43513 Ashtabula Area City 1.94 43521 Athens City 2.69 43539 Barberton City 1.69 43547 Bay Village City 4.19 43554 Beachwood City 3.63 43562 Bedford City 1.69 43570 Bellaire Local 1.69 43588 Bellefontaine City 3.25 43596 Bellevue City 2.94 43604 Belpre City 2.19 43612 Berea City 1.94 43620 Bexley City 2.88 43638 Bowling Green City 3.25 43646 Brecksville-Broadview Heights City 4.19 43653 Brooklyn City 2.69 43661 Brunswick City 3.5 43679 Bryan City 2.94 43687 Bucyrus City 1.81 43695 Cambridge City 2.75 43703 Campbell City 1.75 43711 Canton City 1.56

TYPOLOGY BY DISTRICT

 IRN District Name 2013 Typology 442 Manchester Local (Adams) 1 43489 Akron City 8 43497 Alliance City 7 43505 Ashland City 4 43513 Ashtabula Area City 4 43521 Athens City 5 43539 Barberton City 7 43547 Bay Village City 6 43554 Beachwood City 6 43562 Bedford City 7 43570 Bellaire Local 4 43588 Bellefontaine City 4 43596 Bellevue City 1 43604 Belpre City 4 43612 Berea City 7 43620 Bexley City 6 43638 Bowling Green City 5 43646 Brecksville-Broadview Heights City 6 43653 Brooklyn City 7 43661 Brunswick City 5 43679 Bryan City 4 43687 Bucyrus City 4 43695 Cambridge City 4 43703 Campbell City 7 43711 Canton City 8

Topologies

 2013 Typology Code Major Grouping 0 Island 1 Rural 2 Rural 3 Small Town 4 Small Town 5 Suburban 6 Suburban 7 Urban 8 Urban

I built these measures (in DISTRICT):

```Overall Score District Mean = AVERAGE(DISTRICT[OVERALL SCORE])

Overall Score District StdDev = STDEV.P(DISTRICT[OVERALL SCORE])```

Here are the relationships

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...
1 ACCEPTED SOLUTION
Super User

Thanks for the suggestion @v-haibl-msft. I actually solved this over the weekend a different way but I'll see if I can find the time to go back and try that method. What I ended up doing was merging my three tables into a single table and then wrapping the calculation of the average in a CALCULATE statement with an ALLEXCEPT clause that included the two categories that I wanted to factor into the calculation dynamically. Worked as expected once the data was all in a single table. You can actually see the end result here in the Data Stories Gallery:

http://community.powerbi.com/t5/Data-Stories-Gallery/Ohio-Primary-and-Secondary-Education-Performanc...

I just published it last night. Thanks again @KGrice for taking a look at this also.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...
6 REPLIES 6
Frequent Visitor

Hi @KGrice,

I have almost the same problem as Greg but I've had to calculate my StdDev via the below using Summarize and Calculatetable. My data is a tranactional table, standard deviation is required at the monthly level with filtering to two seperate hierarchies built on location. I need the standard deviation (based on month) to dynmaically reflect the filters.

VAR _table = SUMMARIZE(CALCULATETABLE(Fct_Incident,Tbl_Calendar[CurMonthOffset]>=-26,Tbl_Calendar[CurMonthOffset] < -1), Fct_Incident[Start of Month],"_CountID",distinctcount(Fct_Incident[Incident ID]))
VAR _STDEV = STDEVX.P(_table,[_CountID])

My problem is the same as Greg's in that I want to Chart the results as a Statistical Process Chart (where I can control the "baseline" - this is problem I'm having with the availble add in apps) but I can't get the StdDev to display on each (month) row. Unfortunately the same solution that you provided for Greg doesn't work for me because I'm using a calculated/summarized table.

Do you have any thoughts on how to do this? Or should I be using a different way to group by Fct_Incident[Start of Month].

Would appreciate any assistance.

EDIT: Solved my own issue
STDEVX.P(CALCULATETABLE(SUMMARIZE(Fct_Incident,Fct_Incident[Start of Month],"Count_ID",DISTINCTCOUNT(Fct_Incident[Incident ID])), ALL(Fct_Incident[Start of Month])),[Count_ID])
Solution Sage

Hi @Greg_Deckler. I think this works, but I'm not sure if you're wanting to show your rural and not rural scores in the same measure depending on where the shool is, or if they should be separate. Here's what I have now based on the sample data:

And here are the measures in the order they appear:

```Overall Score District Mean = AVERAGE(DISTRICT[OVERALL SCORE])

Overall Score District StdDev = STDEV.P(DISTRICT[OVERALL SCORE])

Mean All Districts = CALCULATE(	[Overall Score District Mean], 	ALL(District[DISTRICT NAME])	)

StdDev All Districts = CALCULATE(	[Overall Score District StdDev], 	ALL(District[DISTRICT NAME])	)

Overall Score StdDevs from Mean = 	([Overall Score District Mean] - [Mean All Districts]) / [StdDev All Districts]

Mean All Districts Rural = CALCULATE(	[Mean All Districts], 	Topologies[ Major Grouping]="Rural"	)

StdDev All Districts Rural = CALCULATE(	[StdDev All Districts], 	Topologies[ Major Grouping]="Rural"	)

Overall Score StdDevs from Mean Rural = IF(	HASONEVALUE(Topologies[ Major Grouping]), 	IF(		VALUES(Topologies[ Major Grouping])="Rural", 		([Overall Score District Mean] - [Mean All Districts Rural]) / [StdDev All Districts Rural], 		BLANK()		), 	([Overall Score District Mean Rural] - [Mean All Districts Rural]) / [StdDev All Districts Rural]	)```

Frequent Visitor

Hi @KGrice,

I have almost the same problem as Greg but I've had to calculate my StdDev via the below using Summarize and Calculatetable. My data is a tranactional table, standard deviation is required at the monthly level with filtering to two seperate hierarchies built on location. I need the standard deviation (based on month) to dynmaically reflect the filters.

VAR _table = SUMMARIZE(CALCULATETABLE(Fct_Incident,Tbl_Calendar[CurMonthOffset]>=-26,Tbl_Calendar[CurMonthOffset] < -1), Fct_Incident[Start of Month],"_CountID",distinctcount(Fct_Incident[Incident ID]))
VAR _STDEV = STDEVX.P(_table,[_CountID])

My problem is the same as Greg's in that I want to Chart the results as a Statistical Process Chart (where I can control the "baseline" - this is problem I'm having with the availble add in apps) but I can't get the StdDev to display on each (month) row. Unfortunately the same solution that you provided for Greg doesn't work for me because I'm using a calculated/summarized table.

Do you have any thoughts on how to do this? Or should I be using a different way to group by Fct_Incident[Start of Month].

Would appreciate any assistance.
Super User

Thanks @KGrice. This definitely works and essentially what I asked for, but not what I want! 🙂

I was *hoping* that I could make this measure dynamic such that I could put a slicer on the page and when the user clicked the slicer, it calculated things correctly.

The end go is that I will probably have two categories, "rural", "urban", etc. and then a category on level of poverty "<= 10%", "11%-20%".

Sorry, I feel like I'm whining... 🙂

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...
Microsoft

@Greg_Deckler

Maybe you can try to use the ALLSELECTED function to make the measure dynamic according to your slicer.

Best Regards,

Herbert

Super User

Thanks for the suggestion @v-haibl-msft. I actually solved this over the weekend a different way but I'll see if I can find the time to go back and try that method. What I ended up doing was merging my three tables into a single table and then wrapping the calculation of the average in a CALCULATE statement with an ALLEXCEPT clause that included the two categories that I wanted to factor into the calculation dynamically. Worked as expected once the data was all in a single table. You can actually see the end result here in the Data Stories Gallery:

http://community.powerbi.com/t5/Data-Stories-Gallery/Ohio-Primary-and-Secondary-Education-Performanc...

I just published it last night. Thanks again @KGrice for taking a look at this also.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

Announcements

#### 2023 Release Wave 1 Plans

Power BI release plans for 2023 release wave 1 describes all new features releasing from April 2023 through September 2023.

#### Global Power BI Training

Make sure you register today for the Power BI Summit 2023. Don't miss all of the great sessions and speakers!

#### 2022 Monthly Feature Releases

We had a great 2022 with a ton of feature releases to help you drive a data culture.

Top Solution Authors
Top Kudoed Authors