Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I'm working with weighted data and I would like to use slicers with the weights so I'm creating custom measures for weighted values. I already got the weighted average:
weighted_mean = SUMX ( 'data', [value] * [weight] ) / CALCULATE ( SUM ( 'data'[weight_age_group] ), FILTER ( 'data', NOT ( ISBLANK ( [value] ) ) ) )
Now I'm having difficulties calculating a weighted median. I think I have a clue, but I have not made any progress in hours.
This is what I have came up this far:
weighted_median =
VAR weight_sum_half = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', NOT ( ISBLANK ( [value] ) ) ) ) / 2
VAR values = 'data'[values]
VAR ascending = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', 'data'[value] <= values ) )
VAR descending = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', 'data'[value] >= values ) )
RETURN CALCULATE ( AVERAGE ( [arvo] ), FILTER ( 'data', ascending >= weight_sum_half && descending >= weight_sum_half ) )
The idea is to calculate cumulative sum of weights in ascending and descending order by values and then to take the average of the values that meet up in the middle.
I first tried this idea by creating columns for each of the variables. This gave the right observation (the right row) the unweighted median of the values. So it found the right observations for the weighted median but gave the wrong value. When I tries this as written above as a measurement, Power Bi complains "A single value for column 'value' in table 'data' cannot be determined".
Some example data:
value weight
1 | 0.2 |
2 | 0.5 |
3 | 1.5 |
5 | 0.3 |
6 | 1.5 |
7 | 1 |
null | 0.2 |
2 | 0.8 |
4 | 2 |
null | 2 |
The real median is 5.5 and the weighted median is 4. There are some missing values just like in the real data.
I have been using the following DAX code for weighted medians. It works but is a bit slow calculating. If anyone has a more efficient formula, I would love to know it.
Weighted Median = (
MINX(
FILTER(
VALUES([value]),
CALCULATE(
SUM([weight]),
[value]
<= EARLIER([value])
)
> SUM([weight]) /2),
[value]
)
+ MINX(
FILTER(
VALUES([value]),
CALCULATE(
SUM([weight]),
[value]
<= EARLIER([value])
)
> (SUM([weight]) - 1) /2),
[value]
)) /2
I came up with this formula, it seems to work but I'm a DAX newbie so I'm not at all confident that it will always work. I'd love to get feedback on the expression and if it works for others. This gives the "upper weighted median" and doesn't attempt to average upper and lower when the boundary is exactly in the middle.
wgtmedian:= MINX( FILTER( table, SUMX( filter( table, table[value]<= earlier( table[value])) , table[weight]) >=sum([weight])/2), table[value])
I'm working with weighted data and I would like to use slicers with the weights so I'm creating custom measurements for weighted values. I already got the weighted average done:
weighted_mean = SUMX ( 'data', [value] * [weight] ) / CALCULATE ( SUM ( 'data'[weight_age_group] ), FILTER ( 'data', NOT ( ISBLANK ( [value] ) ) ) )
Now I'm having difficulties with the weighted median. I think I have a clue, but I have not made any progress in hours.
This is what I have came up this far:
weighted_median =
VAR weight_sum_half = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', NOT ( ISBLANK ( [value] ) ) ) ) / 2
VAR values = 'data'[values]
VAR ascending = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', 'data'[value] <= values ) )
VAR descending = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', 'data'[value] >= values ) )
RETURN CALCULATE ( AVERAGE ( [arvo] ), FILTER ( 'data', ascending >= weight_sum_half && descending >= weight_sum_half ) )
The idea is to calculate cumulative sum of weights in ascending and descending order by values and then to take the average of the values that meet up in the middle.
I first tried this idea by creating columns for each of the variables. This gave the right observation the unweighted median of the values. When I tries this as written above as a measurement Power Bi complains "A single value for column 'value' in table 'data' cannot be determined.
Some example data:
value weight
1 | 0.2 |
2 | 0.5 |
3 | 1.5 |
5 | 0.3 |
6 | 1.5 |
7 | 1 |
null | 0.2 |
2 | 0.8 |
4 | 2 |
null | 2 |
The real median is 5.5 and the weighted median is 4. There are some missing values just like in the real data.
Unfortunately this problem needs a bit more complicated solution, because the weights represent observations. So a row with value 5 and weight 3 is equal to 3 observations of value 5 and weight 1. The weights are not all integers so simply adding the row times its weight to the dataset will not work (this would also make the dataset huge).
Taking the median of value * weight doesn't take this account and gives a wrong answer. This can be seen clearly from an extreme example:
value | weight | value * weight |
1 | 10000 | 10000 |
2 | 1 | 2 |
3 | 1 | 3 |
If we take the median by MEDIANX([value]*[weight]) we get 3 as the median. But as the weights are actually number of rows, the right answer would obviously be 1.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
114 | |
100 | |
78 | |
75 | |
52 |
User | Count |
---|---|
144 | |
109 | |
108 | |
88 | |
61 |