Solved: StdDev from Mean by Category

Greg_Deckler · ‎09-28-2016

OK, been working with school data and can't seem to wrap my head around one particular issue so hopefully someone can help out.

Basically, what I want to do is to calculate how many standard deviations each school district is away from the average score of all school districts but have it so that the average score and the standard deviation is influenced by what category the school district falls into.

So, for example, let's say that school districts have an average score of 2 and the standard deviation is .5. For all rural schools only, the average score is 2.75 and the standard deviation is .25. If I therefore have a school that ranks a 3, the resulting standard deviations from mean would be 2 if looking at all schools ((3 - 2)/.5) and would be .5 for rural schools only ((3 - 2.75)/.25). Hopefully that makes sense. The idea here is that this metric can be used to find the outliers, those schools that do better than their peers and thus they can be studied to see what they do differently than the other schools in order to help all students do better in those other schools.

The issue is that I can't seem to get it to work the way I want. I can calculate the mean and stddev easy enough and I can create my measure easy enough but it doesn't do what I want because the visualization filters it to each school individually, which isn't helpful in this case. So, I tried measures like:

Overall Score StdDevs from Mean = (MAX([OVERALL SCORE]) - CALCULATE([Overall Score District Mean],ALL(DISTRICT))) / CALCULATE([Overall Score District StdDev],ALL(DISTRICT))

The problem with the above measure is that this score is not impacted by the school category when I select one in a slicer. The one below:

Overall Score StdDevs from Mean = (MAX([OVERALL SCORE]) - CALCULATE([Overall Score District Mean],ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))) / CALCULATE([Overall Score District StdDev],ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))

This returns NaN.

And this version also remains constant regardless.

Overall Score StdDevs from Mean = (MAX(DISTRICT[OVERALL SCORE]) - CALCULATE([Overall Score District Mean],ALL(DISTRICT),ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))) / CALCULATE([Overall Score District StdDev],ALL(DISTRICT),ALLEXCEPT(TYPOLOGIES,TYPOLOGIES[ Major Grouping]))

It seems to me that there has to be an easy solution to this that I am just missing but I can't wrap my head around how to get the result I want. Perhaps I am going down the wrong path here and there is a different method to achieve what I want?

So, here is some sample data:

DISTRICT

DISTRICT IRN	DISTRICT NAME	OVERALL SCORE
442	Manchester Local	2.75
43489	Akron City	1.31
43497	Alliance City	1.63
43505	Ashland City	2.63
43513	Ashtabula Area City	1.94
43521	Athens City	2.69
43539	Barberton City	1.69
43547	Bay Village City	4.19
43554	Beachwood City	3.63
43562	Bedford City	1.69
43570	Bellaire Local	1.69
43588	Bellefontaine City	3.25
43596	Bellevue City	2.94
43604	Belpre City	2.19
43612	Berea City	1.94
43620	Bexley City	2.88
43638	Bowling Green City	3.25
43646	Brecksville-Broadview Heights City	4.19
43653	Brooklyn City	2.69
43661	Brunswick City	3.5
43679	Bryan City	2.94
43687	Bucyrus City	1.81
43695	Cambridge City	2.75
43703	Campbell City	1.75
43711	Canton City	1.56

TYPOLOGY BY DISTRICT

IRN	District Name	2013 Typology
442	Manchester Local (Adams)	1
43489	Akron City	8
43497	Alliance City	7
43505	Ashland City	4
43513	Ashtabula Area City	4
43521	Athens City	5
43539	Barberton City	7
43547	Bay Village City	6
43554	Beachwood City	6
43562	Bedford City	7
43570	Bellaire Local	4
43588	Bellefontaine City	4
43596	Bellevue City	1
43604	Belpre City	4
43612	Berea City	7
43620	Bexley City	6
43638	Bowling Green City	5
43646	Brecksville-Broadview Heights City	6
43653	Brooklyn City	7
43661	Brunswick City	5
43679	Bryan City	4
43687	Bucyrus City	4
43695	Cambridge City	4
43703	Campbell City	7
43711	Canton City	8

Topologies

2013 Typology Code	Major Grouping
0	Island
1	Rural
2	Rural
3	Small Town
4	Small Town
5	Suburban
6	Suburban
7	Urban
8	Urban

I built these measures (in DISTRICT):

Overall Score District Mean = AVERAGE(DISTRICT[OVERALL SCORE])

Overall Score District StdDev = STDEV.P(DISTRICT[OVERALL SCORE])

Here are the relationships

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Greg_Deckler · ‎10-03-2016

Thanks for the suggestion @v-haibl-msft. I actually solved this over the weekend a different way but I'll see if I can find the time to go back and try that method. What I ended up doing was merging my three tables into a single table and then wrapping the calculation of the average in a CALCULATE statement with an ALLEXCEPT clause that included the two categories that I wanted to factor into the calculation dynamically. Worked as expected once the data was all in a single table. You can actually see the end result here in the Data Stories Gallery:

http://community.powerbi.com/t5/Data-Stories-Gallery/Ohio-Primary-and-Secondary-Education-Performanc...

I just published it last night. Thanks again @KGrice for taking a look at this also.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

Toots_N · ‎03-15-2022

Hi @KGrice,

I have almost the same problem as Greg but I've had to calculate my StdDev via the below using Summarize and Calculatetable. My data is a tranactional table, standard deviation is required at the monthly level with filtering to two seperate hierarchies built on location. I need the standard deviation (based on month) to dynmaically reflect the filters.

VAR _table = SUMMARIZE(CALCULATETABLE(Fct_Incident,Tbl_Calendar[CurMonthOffset]>=-26,Tbl_Calendar[CurMonthOffset] < -1), Fct_Incident[Start of Month],"_CountID",distinctcount(Fct_Incident[Incident ID]))

VAR _STDEV = STDEVX.P(_table,[_CountID])

My problem is the same as Greg's in that I want to Chart the results as a Statistical Process Chart (where I can control the "baseline" - this is problem I'm having with the availble add in apps) but I can't get the StdDev to display on each (month) row. Unfortunately the same solution that you provided for Greg doesn't work for me because I'm using a calculated/summarized table.

Do you have any thoughts on how to do this? Or should I be using a different way to group by Fct_Incident[Start of Month].

Would appreciate any assistance.

EDIT: Solved my own issue

STDEVX.P(CALCULATETABLE(SUMMARIZE(Fct_Incident,Fct_Incident[Start of Month],"Count_ID",DISTINCTCOUNT(Fct_Incident[Incident ID])), ALL(Fct_Incident[Start of Month])),[Count_ID])

KGrice · ‎09-28-2016

Hi @Greg_Deckler. I think this works, but I'm not sure if you're wanting to show your rural and not rural scores in the same measure depending on where the shool is, or if they should be separate. Here's what I have now based on the sample data:

And here are the measures in the order they appear:

Overall Score District Mean = AVERAGE(DISTRICT[OVERALL SCORE])


Overall Score District StdDev = STDEV.P(DISTRICT[OVERALL SCORE])


Mean All Districts = CALCULATE(
	[Overall Score District Mean], 
	ALL(District[DISTRICT NAME])
	)


StdDev All Districts = CALCULATE(
	[Overall Score District StdDev], 
	ALL(District[DISTRICT NAME])
	)


Overall Score StdDevs from Mean = 
	([Overall Score District Mean] - [Mean All Districts]) / [StdDev All Districts]


Mean All Districts Rural = CALCULATE(
	[Mean All Districts], 
	Topologies[ Major Grouping]="Rural"
	)


StdDev All Districts Rural = CALCULATE(
	[StdDev All Districts], 
	Topologies[ Major Grouping]="Rural"
	)


Overall Score StdDevs from Mean Rural = IF(
	HASONEVALUE(Topologies[ Major Grouping]), 
	IF(
		VALUES(Topologies[ Major Grouping])="Rural", 
		([Overall Score District Mean] - [Mean All Districts Rural]) / [StdDev All Districts Rural], 
		BLANK()
		), 
	([Overall Score District Mean Rural] - [Mean All Districts Rural]) / [StdDev All Districts Rural]
	)

Toots_N · ‎03-14-2022

Hi @KGrice,

I have almost the same problem as Greg but I've had to calculate my StdDev via the below using Summarize and Calculatetable. My data is a tranactional table, standard deviation is required at the monthly level with filtering to two seperate hierarchies built on location. I need the standard deviation (based on month) to dynmaically reflect the filters.

VAR _table = SUMMARIZE(CALCULATETABLE(Fct_Incident,Tbl_Calendar[CurMonthOffset]>=-26,Tbl_Calendar[CurMonthOffset] < -1), Fct_Incident[Start of Month],"_CountID",distinctcount(Fct_Incident[Incident ID]))

VAR _STDEV = STDEVX.P(_table,[_CountID])

My problem is the same as Greg's in that I want to Chart the results as a Statistical Process Chart (where I can control the "baseline" - this is problem I'm having with the availble add in apps) but I can't get the StdDev to display on each (month) row. Unfortunately the same solution that you provided for Greg doesn't work for me because I'm using a calculated/summarized table.

Do you have any thoughts on how to do this? Or should I be using a different way to group by Fct_Incident[Start of Month].

Would appreciate any assistance.

Greg_Deckler · ‎09-28-2016

Thanks @KGrice. This definitely works and essentially what I asked for, but not what I want! 🙂

I was *hoping* that I could make this measure dynamic such that I could put a slicer on the page and when the user clicked the slicer, it calculated things correctly.

The end go is that I will probably have two categories, "rural", "urban", etc. and then a category on level of poverty "<= 10%", "11%-20%".

Sorry, I feel like I'm whining... 🙂

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

v-haibl-msft · ‎10-03-2016

@Greg_Deckler

Maybe you can try to use the ALLSELECTED function to make the measure dynamic according to your slicer.

Best Regards,

Herbert

Greg_Deckler · ‎10-03-2016

Thanks for the suggestion @v-haibl-msft. I actually solved this over the weekend a different way but I'll see if I can find the time to go back and try that method. What I ended up doing was merging my three tables into a single table and then wrapping the calculation of the average in a CALCULATE statement with an ALLEXCEPT clause that included the two categories that I wanted to factor into the calculation dynamically. Worked as expected once the data was all in a single table. You can actually see the end result here in the Data Stories Gallery:

http://community.powerbi.com/t5/Data-Stories-Gallery/Ohio-Primary-and-Secondary-Education-Performanc...

I just published it last night. Thanks again @KGrice for taking a look at this also.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

StdDev from Mean by Category

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly