Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Anonymous
Not applicable

Case weighting in power BI

Hello, 

 

I have been running into a bit of a wall lately, and seems others are two, when using survey data in powerBI. My problem is that I need to weight my data by case (respondent). I have a variable in my table which is the weight each respondent carries (their sum adds up to the total number of responsdents). I need to use this to up or down weight each respondent when I plot questions so our data matches targets (such as nat rep samples). I realise this isnt very straightforward in BI and most of my responses are not numerical (ie. are yes, no, male, female etc.). 

 

Does anyone have any suggestions for the best way to go about weighting my variables in BI? Currently I am trying to follow the solution in https://community.powerbi.com/t5/Desktop/case-weighting-in-data/td-p/51599, but having very little luck getting it to work on a simple column chart showing percentages of those male and female. Using this DAX code:

 

GenderWeightedMeasure = SUMX('Demographics and Single Punch Questions', 'Demographics and Single Punch Questions'[Weight.Weight] * 'Demographics and Single Punch Questions'[DUM_GENDER_ALT]) / SUM('Demographics and Single Punch Questions'[Weight.Weight])

 

This is something thats very common in market research and very easy to implement in other packages such as SPSS and Q, so seems strange theres no option to weight by case here. 

 

Any help or advice would be very much appreciated!

1 ACCEPTED SOLUTION
stevencape
Frequent Visitor

I use the GROUPBY function to weight consumer research data in PBI, and the CALCULATE function to be able to apply filters so that  I can use the break columns to dynamically calculate the base size - usually I create a numerator and a denominator to make the DAX easier to read and edit later if needs be: 

 

Base size denom = CALCULATE(SUMX(GROUPBY([name of table], [respondent ID], "GroupName", AVERAGEX(CURRENTGROUP(), [Weight column])), GroupName), ALLEXCEPT([name of table], [whatever columns you want to use as breaks in slicers, plus the column with the question in]))

 

Numerator = SUMX([name of table], [Weight column]

 

%age of respondents = [Numerator] / [Base size denom]

 

If you then have the %age of respondents measure in the values of your chart, and the response codes in the values, this will calculate the weighted total of respondents for each question / break combo that you have in slicers (or however else you choose to filter the survey). 

 

View solution in original post

12 REPLIES 12
stevencape
Frequent Visitor

I use the GROUPBY function to weight consumer research data in PBI, and the CALCULATE function to be able to apply filters so that  I can use the break columns to dynamically calculate the base size - usually I create a numerator and a denominator to make the DAX easier to read and edit later if needs be: 

 

Base size denom = CALCULATE(SUMX(GROUPBY([name of table], [respondent ID], "GroupName", AVERAGEX(CURRENTGROUP(), [Weight column])), GroupName), ALLEXCEPT([name of table], [whatever columns you want to use as breaks in slicers, plus the column with the question in]))

 

Numerator = SUMX([name of table], [Weight column]

 

%age of respondents = [Numerator] / [Base size denom]

 

If you then have the %age of respondents measure in the values of your chart, and the response codes in the values, this will calculate the weighted total of respondents for each question / break combo that you have in slicers (or however else you choose to filter the survey). 

 

Hello @stevencape! I have just come across this solution as I am trying to visualise survey responses in Power BI but I am very new to Power BI and DAX. I have used your code for "Base size denom" but I'm getting the following error message: "MdxScript(Model) (3,138) Calculation error in measure 'Calculations'[Base_size_denom]: The function SUMX cannot work with values of type String". 

 

Most of my data is text as it is consumer research data, could you help me understand what's going wrong please?

Anonymous
Not applicable


@stevencape wrote:

I use the GROUPBY function to weight consumer research data in PBI, and the CALCULATE function to be able to apply filters so that  I can use the break columns to dynamically calculate the base size - usually I create a numerator and a denominator to make the DAX easier to read and edit later if needs be: 

 

Base size denom = CALCULATE(SUMX(GROUPBY([name of table], [respondent ID], "GroupName", AVERAGEX(CURRENTGROUP(), [Weight column])), GroupName), ALLEXCEPT([name of table], [whatever columns you want to use as breaks in slicers, plus the column with the question in]))

 

Numerator = SUMX([name of table], [Weight column]

 

%age of respondents = [Numerator] / [Base size denom]

 

If you then have the %age of respondents measure in the values of your chart, and the response codes in the values, this will calculate the weighted total of respondents for each question / break combo that you have in slicers (or however else you choose to filter the survey). 

 


Also, (sorry for asking more questions!) but what would you use as your "group name" variable? Would that be age? And then I understand putting the measure in the values of your chart, but would the response codes go in the legend? 

 

Thanks again!

Apologies - I meant "put the response codes in the axis" rather than the values. Though it would work in the legend too - as long as you're filtering in some way with the response codes.

 

And the "Group name" is any name that you specify - it's a reference created by the GROUPBY function that then becomes the expression used by the SUMX function.

 

 

Anonymous
Not applicable

Hi @stevencape ,

 

Thanks for the response, thats great and I'll try it out! Do you also have any work arounds for when you use non-numeric data, such as male/female? Or do you change these to numeric values when you weight them?

 

Thanks again for your response!

You're welcome! The only numeric data I'd expect to have from consumer research data would be the weighting, so if if I was looking at the ratio of male to female respondents in a survey I would be summing up the total weights for male and dividing that by the sum of all the weights, and then comparing that to the sum of the total weights for female and dividing that by the total weights.  

 

Here's an example of the DAX I posted earlier being used with respondent level data from a few surveys to show the weighted responses - this data has columns for the question, the response (both text), the weights (decimal) and one for each of the breaks in the slicers (text). The value field is from the DAX.

 

https://www.ofcom.org.uk/research-and-data/multi-sector-research/accessibility-research/access-and-i...

Anonymous
Not applicable

Hi @stevencape ,

 

Thanks so much for sharing that, very useful and reassuring to see it in action! I was wondering, if there are no filters applied does that mean the proportion of respondents shown in the visual is unweighted?

 

Thanks again!

Nope - if there are no filters applied, the proportion of respondents shown for each response and for each question will be the weighted total, as the all the calcs are done on the column that contains the weight value in.

 

If you wanted the unweighted total, you'd need to use counts to summarise the data.

Anonymous
Not applicable


@stevencape wrote:

Nope - if there are no filters applied, the proportion of respondents shown for each response and for each question will be the weighted total, as the all the calcs are done on the column that contains the weight value in.

 

If you wanted the unweighted total, you'd need to use counts to summarise the data.


Okay thats great thank you! I have been setting up the weighting measures in PowerBI as you described and want to check if im along the right lines if thats okay?

 

I made the base denominator 

 

Base size denom = CALCULATE(SUMX(GROUPBY('Demographics and Single Punch Questions', [uuid: Respondent identifier], "Weighted base" , AVERAGEX(CURRENTGROUP(), [Weight.Weight])), [Weighted base]), ALLEXCEPT('Demographics and Single Punch Questions', 'Demographics and Single Punch Questions'[DUM_AGEBREAK_ALT], 'Demographics and Single Punch Questions'[DUM_GENDER_ALT]))
 
then the numerator :
 
Numerator = SUMX('Demographics and Single Punch Questions', 'Demographics and Single Punch Questions'[Weight.Weight])
 
and the weighted measure :
 
Weighted respondents = [Numerator]/ [Base size denom]
 
So I can then just plug this final measure into the values for all of my visualisations? Currently I have tried making a bar graph of gender, with the final Weighted respondents measure in the values, and gender as the axis, but its giving me values of 100% for each gender. Do you know why this is the case/ if I've managed to missunderstand something?
 
Thanks again for the help, its really useful and hopefully I can stop pestering soon!
 
 
Anonymous
Not applicable

Hi @stevencape ,

 

Hopefully the final pester. I think I figured out the above question so you can just ignore it. I was just wondering, for the multi response questions such as the one I've attached.

ofcom table.PNG

 

Currently when I use the above method for this type of question (which I have in its own table, pivoted so theres a column for pid, question and response, with a row for each of the iterations per person) all my bars in the graph are 100%. Did you have a slightly different work around for this type of question?

 

Thanks again for your help, has been a life saver!

I think the 100% results might be caused by which fields you've got in the ALLEXCEPT filter. If you have the same field in the filter as in the axis, then it is applying the same calculation to both the numerator and the denominator. 

 

The CALCULATE, SUMX, GROUPBY, ALLEXCEPT combo was used to specifically deal with multicode questions, as it calculates a deduplicated base size that flexes for each break used (the columns specified in the ALLEXCEPT function) - it sounds like your data is set up right for this, though. 

Anonymous
Not applicable

Hi @stevencape ,

 

Yes your right, the penny dropped this morning and I think I've got it figured out now! Thanks so much for the help!

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Fabric Monthly Update - May 2024

Check out the May 2024 Fabric update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors