Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
jechris
Frequent Visitor

Calculating median from rows with counts of entries

I am trying to calculate the median value from a table where each row is not a separate data entry, rather each row contains the count of the number of entries per bin. In the case below each bin corresponds to a certain number of bits.

 

That is... Customer "A" in month 1 had 2 instances at 20 Bits and 3 instances at 30 Bits, according to the table below.

 

CustomerMonthBitsCount
A101
A1101
A1202
A1303
A1402
A1502
A1801
A2201
A2301
A2601
A3101
A3301


(update) Plotting this data in PowerBI in a histogram for all Customers and Months:
Capture.PNG

 

I have included Customer and Month fields in the table because I need to be able to do this calculation accross any number of Customers (only A is shown in the above table) and over any combination of months (say month 1 + 2 only). 

 

For certain customers these rowsets can be rather sparse.

 

What I have tried:
The example below centers around trying to calculate a cummulative sum. Filtering out all bins with a count less than that and then taking the smallest bin.... although.. it's not working for me. 

 

Median = 
SELECTCOLUMNS(
    TOPN(1,
        FILTER(
            ADDCOLUMNS(
                'table',
                "CumSum", 
                CALCULATE (
                    SUM ( 'table'[count] ),
                    FILTER (
                        ALL ( 'table'[Bits] ),
                        'table'[Bits] <= 
                            MAX ( 'table'[Bits] )
                    )
                )
            ),
            [CumSum] >= SUM('table'[Count])*0.5
        ),
        'table'[Bits],ASC        
        ),
    "test", 'table'[Bits]
)

If anyone has a correct, better or more efficient way of doing this I would love some help.
Thank you.

Expected result (update):
With no filters added the expected result is 30 bits.
For Month = 1 alone I expect 30 bits,
For Month = 2 alone I expect 30 bits,
For Month = 3 alone I would ideally expect the arithmetic mean of the 2 midlemost items therefore 20 bits.

1 ACCEPTED SOLUTION

@jechris

 

OK, I have misunderstood the meaning of column Count. To achieve your requirement, you should do some trick to expand all the rows first, then calculate the median value.

 

To expand all the rows, we should use power query script. Go to Edit Queries -> Add Column -> Custom Column. We can use List.Generate() function to expand all the rows. The expression is like:

 

 

List.Generate(()=>[Count], each _ > 0, each _ - 1)

 

1111.PNG

 

Then choose expand to new rows. Close & Apply.

 

 

2222.PNG

 

Then you can simply use the expression I posted to calculate the median value.

 

For measure median value: 

 

Measure Median =
CALCULATE (
    MEDIAN ( Table1[Bits] ),
    FILTER ( ALLSELECTED ( Table1 ), Table1[Customer] = MAX ( Table1[Customer] ) )
)

 

For calculated column median value:

 

Col Median =
CALCULATE (
    MEDIANX ( Table1, Table1[Bits] * 1.0 ),
    FILTER ( Table1, Table1[Customer] = EARLIER ( Table1[Customer] ) )
)

 3333.PNG

 

 

Thanks,
Xi Jin.

 

 

View solution in original post

8 REPLIES 8
v-xjiin-msft
Solution Sage
Solution Sage

@jechris

 

In Your scenario, you want to calculate the median value of Bits for each Customer in each month. Right?

 

Since you have shared the sample data. What’s your desired result?

 

And if you want to calculate the median values. You can try Median() function. Please refer to following expression:

 

Measure = CALCULATE(MEDIAN('Table'[Bits]),FILTER(ALLSELECTED('Table'),'Table'[Customer]=MAX('Table'[Customer]) && 'Table'[Month]=MAX('Table'[Month]))) 

1.PNG

Thanks,
Xi Jin.

Hi @v-xjiin-msft
Thank you for your answer. I have added a bit more context to my question to make it clearer.

I am expecting to get 30 Bits as a solution to the problem for all customers and all months. 
Your solution gives me 20 Bits.
Capture2.PNG

@jechris

 

Then try this:

 

Measure = CALCULATE(MEDIAN('Table'[Bits]),FILTER(ALLSELECTED('Table'),'Table'[Customer]=MAX('Table'[Customer])))

Thanks,
Xi Jin.

 

Hi @v-xjiin-msft

 

Thank you for your updated formula.

 

It seems I got the right answer for the original case. I tried with some other data sets and I get some inconsistent results.

To the original data set i add:

CustomerMonthBitsCount
A49020


The results I get are:

 

Capture6.PNG

In this case I expected the card with the Measure "Measure" to be 90 rather than 30.

 

Using this definition:

Measure = CALCULATE(MEDIAN('Table'[Bits]),FILTER(ALLSELECTED('Table'),'Table'[Customer]=MAX('Table'[Customer])))

@jechris

 

Do you want to calculate the median value of your data set? If so, how to return the median value as 90? It is the max value. Right?

 

Thanks,
Xi Jin.

Hi @v-xjiin-msft

 

Thank you for your reply.

In this data set with 20 counts in the 90 Bits bin the maximum is the same as the median.
In any data set where more than half the entries are the same they will be the median (and also the mode).

 

Remember that the basic problem I am having here is that my data set is not an entry per row... rather the data is compressed so each row will tell me how many of each entry value I have.

In this particular case I have:
1 entry at 0 Bits in Month 1 for Customer A, and
1 entry at 10 Bits in Month 1 for Customer A, and
2 entry at 20 Bits in Month 1 for Customer A, and
....
1 entry at 10 Bits in Month 3 for Custoemr A, and
1 entry at 30 Bits in Month 3 for Customer A, and
20 entries at 90 Bits in Month 4 for Customer A (the entries I added in my previous post).



....

@jechris

 

OK, I have misunderstood the meaning of column Count. To achieve your requirement, you should do some trick to expand all the rows first, then calculate the median value.

 

To expand all the rows, we should use power query script. Go to Edit Queries -> Add Column -> Custom Column. We can use List.Generate() function to expand all the rows. The expression is like:

 

 

List.Generate(()=>[Count], each _ > 0, each _ - 1)

 

1111.PNG

 

Then choose expand to new rows. Close & Apply.

 

 

2222.PNG

 

Then you can simply use the expression I posted to calculate the median value.

 

For measure median value: 

 

Measure Median =
CALCULATE (
    MEDIAN ( Table1[Bits] ),
    FILTER ( ALLSELECTED ( Table1 ), Table1[Customer] = MAX ( Table1[Customer] ) )
)

 

For calculated column median value:

 

Col Median =
CALCULATE (
    MEDIANX ( Table1, Table1[Bits] * 1.0 ),
    FILTER ( Table1, Table1[Customer] = EARLIER ( Table1[Customer] ) )
)

 3333.PNG

 

 

Thanks,
Xi Jin.

 

 

Thank you @v-xjiin-msft

 

Expanding the rows was the key to solving this issue. 

I compress a lot of my data sets this way so this will be a helpful tip for other applications as well.

 

Thank you again.

 

Jens

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.