
Applying a Percentile on a Graph

Hi all, 

 

I am trying to work with a dataset that has these columns:

Reporting Date | Form Type                | ID             | Actual Days
12/2/2021      | Submittal                | 01E-XXX-00002  | 467
12/2/2021      | Field Observation Report | 01C-XXX-00004  | 6
12/2/2021      | Field Observation Report | 01C-XXX-00004  | 377
12/2/2021      | Field Observation Report | 01C-XXX-00004  | 6
12/2/2021      | Submittal                | 01C-XXX-00004  | 377
12/2/2021      | Submittal                | 05BB-XXX-00001 | 8
12/2/2021      | RFI                      | 09AA-XXX-00001 | 411
12/2/2021      | RFI                      | 09AC-XXX-00003 | 405
12/2/2021      | Submittal                | 09AS-XXX-00002 | 392
12/2/2021      | Submittal                | 15AB-XXX-00002 | 399
12/2/2021      | RFI                      | 15AC-XXX-00001 | 1

 

I have created this line chart with the data:

[screenshot: felixthecat_nyc_0-1642600016958.png - line chart of the data]

I am wondering how I can exclude the top 5% of 'Actual Days' values.

 

The measure I am using so far to attempt the filter:

 

BOTTOM 95% Latency = 
VAR __rt =
    // running sum of Actual Days over all rows at or below the current value
    CALCULATE ( SUM ( Data[Actual Days] ), FILTER ( ALL ( Data ), Data[Actual Days] <= MAX ( Data[Actual Days] ) ) )
VAR __95p =
    // 95% of the grand total of Actual Days
    CALCULATE ( SUM ( Data[Actual Days] ), ALL ( Data ) ) * 0.95
RETURN
    IF ( __rt <= __95p, 1, 0 )

 

 

I am filtering the chart on 1, but I am not sure about the results: only 4 rows out of 1,045 are classified as '0', and there are rows classified as '1' whose Actual Days values are as high as those classified as '0'.
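Looking at it again, I suspect the measure compares a cumulative sum of Actual Days against 95% of the grand total, i.e., it cuts on the sum (Pareto-style) rather than on 95% of the rows, which would explain why only a few very large values ever get tagged '0'. A sketch of the same pattern cutting on row count instead (untested; the measure name is mine):

Bottom 95% (row count) =
VAR __rowsAtOrBelow =
    // how many rows have a value at or below the current row's value
    CALCULATE ( COUNTROWS ( Data ), FILTER ( ALL ( Data ), Data[Actual Days] <= MAX ( Data[Actual Days] ) ) )
VAR __cutoff =
    // 95% of the total row count
    CALCULATE ( COUNTROWS ( Data ), ALL ( Data ) ) * 0.95
RETURN
    IF ( __rowsAtOrBelow <= __cutoff, 1, 0 )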

 

Any input on this is greatly appreciated. 

 

 

 


6 REPLIES

v-luwang-msft
Community Support

Hi @felixthecat_nyc,

Since I don't have your original data of 1,045 rows, I created a new sample based on what you mentioned, as follows:

[screenshot: vluwangmsft_0-1643007817023.png - sample table]

The sample has 1,045 rows; I then use the measure below:

test1 =
VAR RANK1 =
    // rank each date by its total value; the highest total ranks 1 (RANKX defaults to descending)
    RANKX ( ALL ( 'Table'[Date] ), CALCULATE ( SUM ( 'Table'[value] ) ) )
VAR MIN1 =
    // 5% of the total row count
    CALCULATE ( COUNTROWS ( 'Table' ), ALL ( 'Table' ) ) * 0.05
RETURN
    IF ( RANK1 <= MIN1, 1, 0 )

Final result (values from 994 upward return 1, since 1,045 × 0.05 = 52.25 and the 52 highest values are 994 through 1,045):

[screenshot: vluwangmsft_1-1643007922426.png - flag results on the sample]

 

The first step is to compute 5 percent of the total row count; the second step is to rank the values; the flag then follows from comparing each row's rank against that 5 percent count.

For your data, use the measure below:

BOTTOM 95% Latency = 
VAR __rt =
    // rank of the current row's total Actual Days; the highest value ranks 1
    RANKX ( ALL ( 'Data' ), CALCULATE ( SUM ( 'Data'[Actual Days] ) ) )
VAR __95p =
    // 5% of the total row count
    CALCULATE ( COUNTROWS ( 'Data' ), ALL ( 'Data' ) ) * 0.05
RETURN
    IF ( __rt <= __95p, 1, 0 )
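As written, this returns 1 for the rows in the top 5%, so the visual would filter on 0 to keep the bottom 95%. If you prefer to keep filtering on 1, as in your original measure, a minimal sketch that inverts the comparison (the measure name is mine):

BOTTOM 95% Flag =
VAR __rank =
    // highest Actual Days total ranks 1 (RANKX defaults to descending)
    RANKX ( ALL ( 'Data' ), CALCULATE ( SUM ( 'Data'[Actual Days] ) ) )
VAR __top5 =
    // number of rows that make up the top 5%
    CALCULATE ( COUNTROWS ( 'Data' ), ALL ( 'Data' ) ) * 0.05
RETURN
    // 1 = outside the top 5%, i.e. the bottom 95%
    IF ( __rank > __top5, 1, 0 )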

 

 

 

 

Did I answer your question? Mark my post as a solution!


Best Regards

Lucien

Hi @v-luwang-msft. Thanks for your reply. The problem is that I have repeating values, so this method doesn't grab the right percentage. I am now trying something like the following (although I haven't gotten it to work yet).

 

I have a measure that equals the 95th percentile of the values column. I'm trying to use it to create a filter. I haven't had luck so far, but this is essentially what I am trying:

 

Filter = IF((Data[Days]) >= [95% Days], 1, 0)
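A sketch of what I am aiming for, assuming [95% Days] is built with PERCENTILEX.INC over the whole table (note the direction of the comparison: <= keeps the bottom 95%, while >= as above keeps the top 5%):

95% Days =
// 95th percentile of Actual Days across the whole table, ignoring filters
PERCENTILEX.INC ( ALL ( Data ), Data[Actual Days], 0.95 )

Bottom 95% Filter =
// 1 = the current row's value sits at or below the 95th percentile
IF ( MAX ( Data[Actual Days] ) <= [95% Days], 1, 0 )

With repeated values at the cutoff this still keeps every tied row, so it may not land on exactly 95%.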

Hi @felixthecat_nyc,

According to your description: with 1,045 rows, 95 percent is 1,045 × 0.95 = 992.75, so the cut falls after row 993. With my previous method, when a value appears more than once, all of its duplicates may be selected together, like below:

[screenshot: vluwangmsft_0-1643077800126.png - tied values all flagged together]

In my opinion, if you want tied duplicates to be ranked individually (so that only one of each tie falls on either side of the cut), you need to add an auxiliary index column, which guarantees the 5 percent cut lands on exactly one row.

Insert the index:

[screenshot: vluwangmsft_1-1643078228797.png - table with an added Index column]

Then adjust the measure as below:

test1 = 
VAR RANK1 =
    // add a tiny index-based offset so duplicate values get distinct ranking keys
    RANKX ( ALL ( 'Table' ), CALCULATE ( SUM ( 'Table'[value] ) + 0.0000001 * SUM ( 'Table'[Index] ) ) )
VAR MIN1 =
    // 5% of the total row count
    CALCULATE ( COUNTROWS ( 'Table' ), ALL ( 'Table' ) ) * 0.05
RETURN
    IF ( RANK1 <= MIN1, 1, 0 )
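To see why the tiny offset breaks ties, here is a small self-contained illustration (hypothetical values; the scaled Index makes equal values distinct before RANKX sorts them, so each duplicate gets its own rank):

Tie Break Demo =
VAR _t =
    // sample rows: two duplicate values of 377
    DATATABLE (
        "Index", INTEGER,
        "value", INTEGER,
        { { 1, 467 }, { 2, 377 }, { 3, 377 }, { 4, 6 } }
    )
RETURN
    // each 377 now gets its own rank: "467:1, 377:3, 377:2, 6:4"
    CONCATENATEX (
        ADDCOLUMNS ( _t, "Rank", RANKX ( _t, [value] + 0.0000001 * [Index] ) ),
        [value] & ":" & [Rank],
        ", "
    )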

Final result:

[screenshot: vluwangmsft_2-1643078510903.png - final results]

 

Best Regards

Lucien

This is great, thank you! 

For more context: with 1,045 rows in my dataset, I am looking for 993 rows to be tagged as the lower 95% in terms of 'Actual Days' values.

To expand on this issue, I have also formulated this measure:

95% Data = 
VAR _sum =
    SUM ( Data[Actual Days] )
VAR _top =
    // 95% of the count of Form IDs, ignoring any Form ID filter
    CALCULATE (
        COUNTROWS ( ADDCOLUMNS ( VALUES ( Data[Form ID] ), "_sum", _sum ) ),
        REMOVEFILTERS ( Data[Form ID] )
    ) * 0.95
RETURN
    // average Actual Days over the 95% of Form IDs with the lowest averages
    CALCULATE (
        CALCULATE ( AVERAGEA ( Data[Actual Days] ) ),
        KEEPFILTERS (
            TOPN (
                _top,
                ALLSELECTED ( Data[Form ID] ),
                CALCULATE ( AVERAGEA ( Data[Actual Days] ) ), ASC
            )
        )
    )

This seems to produce an average of the Actual Days values per Form ID with the top 5% excluded, which is great. However, whenever I try to use this measure with other columns (namely the Reporting Date column), it breaks and the data no longer makes sense.
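One direction I may try, sketched here under the assumption that a row-level cut is acceptable (rather than the per-Form-ID averages above): compute the percentile cutoff inside the current filter context, so a Reporting Date on the axis is respected (the measure name is mine):

95% Data By Context =
VAR _cutoff =
    // 95th percentile of Actual Days within the current filter context
    // (e.g. within the Reporting Date shown on the axis)
    PERCENTILEX.INC ( Data, Data[Actual Days], 0.95 )
RETURN
    // average only the rows at or below that cutoff
    CALCULATE (
        AVERAGEA ( Data[Actual Days] ),
        FILTER ( Data, Data[Actual Days] <= _cutoff )
    )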
