tuomo_kareoja
Frequent Visitor

Weighted median as a measure in DAX

I'm working with weighted data and I would like to use slicers with the weights so I'm creating custom measures for weighted values. I already got the weighted average:

 

 

weighted_mean =
SUMX ( 'data', [value] * [weight] )
    / CALCULATE (
        SUM ( 'data'[weight_age_group] ),
        FILTER ( 'data', NOT ( ISBLANK ( [value] ) ) )
    )

 

Now I'm having difficulties calculating a weighted median. I think I have a clue, but I have not made any progress in hours.

This is what I have come up with so far:

 

 

weighted_median =
VAR weight_sum_half = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', NOT ( ISBLANK ( [value] ) ) ) ) / 2
VAR values = 'data'[values]
VAR ascending = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', 'data'[value] <= values ) )
VAR descending = CALCULATE ( SUM ( 'data'[weight] ), FILTER ( 'data', 'data'[value] >= values ) )
RETURN CALCULATE ( AVERAGE ( [arvo] ), FILTER ( 'data', ascending >= weight_sum_half && descending >= weight_sum_half ) )

 

 

The idea is to calculate the cumulative sum of weights in both ascending and descending order of value, and then to take the average of the values where the two meet in the middle.

 

I first tried this idea by creating calculated columns for each of the variables. That picked out the right observation (the right row) but returned the unweighted median of the values. So it found the right observations for the weighted median but gave the wrong value. When I try this as a measure, as written above, Power BI complains: "A single value for column 'value' in table 'data' cannot be determined".

 

Some example data:

 

value       weight
1           0.2
2           0.5
3           1.5
5           0.3
6           1.5
7           1
null        0.2
2           0.8
4           2
null        2

 

The real median is 5.5 and the weighted median is 4. There are some missing values just like in the real data.

5 REPLIES
ACfxva
Regular Visitor

I have been using the following DAX code for weighted medians. It works but is a bit slow to calculate. If anyone has a more efficient formula, I would love to know it.

 

Weighted Median = (
    MINX(
        FILTER(
            VALUES([value]),
            CALCULATE(
                SUM([weight]),
                [value] <= EARLIER([value])
            ) > SUM([weight]) / 2
        ),
        [value]
    )
    + MINX(
        FILTER(
            VALUES([value]),
            CALCULATE(
                SUM([weight]),
                [value] <= EARLIER([value])
            ) > (SUM([weight]) - 1) / 2
        ),
        [value]
    )
) / 2

I came up with this formula, it seems to work but I'm a DAX newbie so I'm not at all confident that it will always work. I'd love to get feedback on the expression and if it works for others. This gives the "upper weighted median" and doesn't attempt to average upper and lower when the boundary is exactly in the middle.

 

wgtmedian:= MINX( FILTER( table, SUMX( filter( table, table[value]<= earlier( table[value])) , table[weight]) >=sum([weight])/2), table[value])
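
For comparison, here is a minimal sketch of the same cumulative-weight idea written with variables. It is untested against a real model and assumes a table named 'data' with numeric [value] and [weight] columns, as in the question; the measure and variable names are made up for illustration. It skips rows with a blank value and averages the lower and upper weighted medians when the cumulative weight lands exactly on half of the total.

```dax
weighted_median_sketch =
// Assumption: table 'data' with numeric columns [value] and [weight].
VAR valid_rows =
    FILTER ( 'data', NOT ( ISBLANK ( 'data'[value] ) ) ) // ignore rows with a missing value
VAR half_weight =
    SUMX ( valid_rows, 'data'[weight] ) / 2
VAR with_cumulative =
    ADDCOLUMNS (
        valid_rows,
        "@cum_weight",
            // total weight of all rows whose value is <= the current row's value
            VAR current_value = 'data'[value]
            RETURN
                SUMX (
                    FILTER ( valid_rows, 'data'[value] <= current_value ),
                    'data'[weight]
                )
    )
VAR lower_median =
    // smallest value whose cumulative weight reaches half of the total
    MINX ( FILTER ( with_cumulative, [@cum_weight] >= half_weight ), 'data'[value] )
VAR upper_median =
    // smallest value whose cumulative weight exceeds half of the total
    MINX ( FILTER ( with_cumulative, [@cum_weight] > half_weight ), 'data'[value] )
RETURN
    ( lower_median + upper_median ) / 2
```

On the example data from the question, half of the total weight is 3.9. The cumulative weight is 3.0 at value 3 and 5.0 at value 4, so both bounds are 4. Every row rescans the filtered table, so this is unlikely to be any faster than the formulas above on large tables.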

 


@tuomo_kareoja

 

In this scenario, your weighted value is [Value]*[Weight], so you can create a calculated column like:

 

Column = [Value]*[Weight]

Then just use the MEDIAN() function on this calculated column. If the calculation needs to respect the current filter context, you can use the MEDIANX() function instead.

 

 

Regards,

Unfortunately this problem needs a slightly more complicated solution, because the weights represent observations: a row with value 5 and weight 3 is equivalent to 3 observations with value 5 and weight 1. The weights are not all integers, so simply repeating each row as many times as its weight will not work (it would also make the dataset huge).

Taking the median of value * weight doesn't take this into account and gives a wrong answer. This is clear from an extreme example:

 

value       weight      value * weight
1           10000       10000
2           1           2
3           1           3

 

If we take the median with MEDIANX over [value]*[weight], we get 3. But because the weights are actually numbers of rows, the right answer is obviously 1.
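
The extreme example can be checked with a small stand-alone DAX query, for instance in DAX Studio or the DAX query view. This is only a sketch: the inline table and the names sample_rows, half_weight and with_cumulative are assumptions for illustration, not part of the real model.

```dax
// Stand-alone query; the inline table mirrors the three rows of the example.
EVALUATE
VAR sample_rows =
    DATATABLE (
        "value", INTEGER,
        "weight", DOUBLE,
        { { 1, 10000 }, { 2, 1 }, { 3, 1 } }
    )
VAR half_weight =
    SUMX ( sample_rows, [weight] ) / 2
VAR with_cumulative =
    ADDCOLUMNS (
        sample_rows,
        "@cum_weight",
            // total weight of all rows whose value is <= the current row's value
            VAR current_value = [value]
            RETURN
                SUMX ( FILTER ( sample_rows, [value] <= current_value ), [weight] )
    )
RETURN
    ROW (
        // median of the three products 10000, 2 and 3
        "median of value * weight", MEDIANX ( sample_rows, [value] * [weight] ),
        // smallest value whose cumulative weight reaches half of the total
        "weighted median", MINX ( FILTER ( with_cumulative, [@cum_weight] >= half_weight ), [value] )
    )
```

The first column should come back as 3 and the second as 1, matching the numbers in the example. Half of the total weight is 5001 and the row with value 1 already carries 10000 of it.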
