tuomo_kareoja Frequent Visitor

Weighted median as a measurement in DAX

I'm working with weighted data and would like to use slicers with it, so I'm creating custom measures for the weighted values. I already have the weighted average:

weighted_mean =
SUMX ( 'data', 'data'[value] * 'data'[weight] )
    / CALCULATE (
        SUM ( 'data'[weight] ),
        FILTER ( 'data', NOT ( ISBLANK ( 'data'[value] ) ) )
    )

Now I'm having difficulties calculating a weighted median. I think I have a clue, but I have not made any progress in hours.

This is what I have come up with so far:

weighted_median =
VAR weight_sum_half = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', NOT ( ISBLANK ( 'data'[value] ) ) ) ) / 2
VAR values = 'data'[value]
VAR ascending = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', 'data'[value] <= values ) )
VAR descending = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', 'data'[value] >= values ) )
RETURN CALCULATE ( AVERAGE ( 'data'[value] ), FILTER ( 'data', ascending >= weight_sum_half && descending >= weight_sum_half ) )

The idea is to compute the cumulative sum of weights in both ascending and descending order of value, and then take the average of the values where the two sums meet in the middle.

I first tried this idea by creating a calculated column for each of the variables. This gave the right observation (the right row) the unweighted median of the values. So it found the right observations for the weighted median but gave the wrong value. When I try it as written above as a measure, Power BI complains "A single value for column 'value' in table 'data' cannot be determined".

Some example data:

value   weight
1       0.2
2       0.5
3       1.5
5       0.3
6       1.5
7       1
null    0.2
2       0.8
4       2
null    2

The unweighted median is 3.5 and the weighted median is 4. There are some missing values, just like in the real data.
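
The error quoted above comes from the bare column reference in the `values` variable: a measure has no row context, so a column cannot be reduced to a single value there. Below is a minimal sketch of the same idea with the comparison moved inside an iterator, so that each row supplies its own value. It has not been run against the real model; it assumes the table and column names from the sample data ('data', [value], [weight]), and the variable names are only illustrative.

```dax
weighted_median =
-- Sketch only: assumes table 'data' with columns [value] and [weight].
VAR rows_with_value =
    FILTER ( 'data', NOT ( ISBLANK ( 'data'[value] ) ) )
VAR weight_sum_half =
    SUMX ( rows_with_value, 'data'[weight] ) / 2
-- Keep the rows whose value has at least half of the total weight
-- at or below it and at least half of the total weight at or above it.
VAR middle_rows =
    FILTER (
        rows_with_value,
        VAR current_value = 'data'[value]
        VAR weight_ascending =
            SUMX (
                FILTER ( rows_with_value, 'data'[value] <= current_value ),
                'data'[weight]
            )
        VAR weight_descending =
            SUMX (
                FILTER ( rows_with_value, 'data'[value] >= current_value ),
                'data'[weight]
            )
        RETURN
            weight_ascending >= weight_sum_half
                && weight_descending >= weight_sum_half
    )
RETURN
    -- Usually one distinct value qualifies; two qualify when the weight
    -- splits exactly in half, and then their midpoint is returned.
    ( MINX ( middle_rows, 'data'[value] ) + MAXX ( middle_rows, 'data'[value] ) ) / 2
```

On the sample rows, the non-blank weights sum to 7.8, so the threshold is 3.9. Only value 4 has at least 3.9 of weight on both sides (5.0 at or below it, 4.8 at or above it), so the sketch should return 4. Because 'data' is evaluated in the current filter context, slicers should still apply. The inner filters scan the table once per row, so this can be slow on large tables.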

5 REPLIES 5

Moderator v-sihou-msft

Re: Weighted median measurement in DAX

@tuomo_kareoja

In this scenario, your weighted value is [Value]*[Weight]. You can create a calculated column like:

Column = [Value]*[Weight]

Then just use the MEDIAN() function on this calculated column. If the calculation needs to respond to the filter context, you can use the MEDIANX() function instead.

Regards,

tuomo_kareoja Frequent Visitor

Re: Weighted median measurement in DAX

Unfortunately this problem needs a slightly more complicated solution, because the weights represent observations. A row with value 5 and weight 3 is equivalent to 3 observations with value 5 and weight 1. The weights are not all integers, so simply repeating each row as many times as its weight will not work (it would also make the dataset huge).

Taking the median of value * weight doesn't take this into account and gives a wrong answer. This can be seen clearly from an extreme example:

value   weight   value * weight
1       10000    10000
2       1        2
3       1        3

If we take the median with MEDIANX([value]*[weight]), we get 3. But since the weights are effectively row counts, the right answer is obviously 1.
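
The contrast can be checked with a small DAX query, for example in DAX Studio. This is only a sketch: the inline table stands in for the three rows of the extreme example, and the names `extreme_rows`, `median_of_product` and `weighted_median` are made up for illustration.

```dax
DEFINE
    -- Inline copy of the three example rows. A table constructor names
    -- its columns Value1 and Value2, so they are renamed here.
    VAR extreme_rows =
        SELECTCOLUMNS (
            { ( 1, 10000 ), ( 2, 1 ), ( 3, 1 ) },
            "value", [Value1],
            "weight", [Value2]
        )
    VAR weight_sum_half = SUMX ( extreme_rows, [weight] ) / 2
EVALUATE
ROW (
    -- Median of the products 10000, 2 and 3
    "median_of_product", MEDIANX ( extreme_rows, [value] * [weight] ),
    -- Smallest value whose cumulative weight reaches half of the total
    "weighted_median",
        MINX (
            FILTER (
                extreme_rows,
                VAR current_value = [value]
                RETURN
                    SUMX (
                        FILTER ( extreme_rows, [value] <= current_value ),
                        [weight]
                    ) >= weight_sum_half
            ),
            [value]
        )
)
```

The first column is the median of 10000, 2 and 3, which is 3. The second looks for the first value whose cumulative weight reaches 5001 (half of 10002), which is already the case at value 1.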

ACfxva Visitor

Re: Weighted median as a measurement in DAX

I have been using the following DAX code for weighted medians. It works, but it is a bit slow to calculate. If anyone has a more efficient formula, I would love to know it.

Weighted Median =
(
    MINX (
        FILTER (
            VALUES ( [value] ),
            CALCULATE ( SUM ( [weight] ), [value] <= EARLIER ( [value] ) )
                > SUM ( [weight] ) / 2
        ),
        [value]
    )
        + MINX (
            FILTER (
                VALUES ( [value] ),
                CALCULATE ( SUM ( [weight] ), [value] <= EARLIER ( [value] ) )
                    > ( SUM ( [weight] ) - 1 ) / 2
            ),
            [value]
        )
) / 2

mmmb1111 Occasional Visitor

Re: Weighted median as a measurement in DAX

I came up with this formula. It seems to work, but I'm a DAX newbie, so I'm not at all confident that it will always work. I'd love to get feedback on the expression and hear whether it works for others. It gives the "upper weighted median" and doesn't attempt to average the upper and lower values when the boundary is exactly in the middle.

wgtmedian:= MINX( FILTER( table, SUMX( filter( table, table[value]<= earlier( table[value])) , table[weight]) >=sum([weight])/2), table[value])