Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi everyone,
I am trying to create a percentile table in Powerbi but having trouble doing that.
Below screenshot shows the Excel version:
From this we can see that my percentile jumps from increaing by 0.01 at 0.05 to 0.05 at 0.10.
So I created a "Enter Data" table in PowerBi and pasted this column values:
This allowed me to create a table in PowerBi with the percentile values I wanted to use but I encountered a problem after this step.
I have 2 tables,
stats table - contains Total Sale & Cost
size table - where I have size of place
These 2 tables have relationship.
Now I don't know how to use PERCENTILE.INC to create a table as per first screenshot.
I hope my description is clear, if not please let me know.
Thanks for your time.
Kind Regards,
Rahul
Solved! Go to Solution.
@McCow Thanks for explination, I have tried to follow the same steps for prod data and there seems to be issue.
Could you point out what I am doing wrong please? Here's the updated pbi file: https://1drv.ms/u/s!Aghc93erq8rBcUhH5N_y1X_MXN8
I have created 3 new tables exactly in the same structure:
size_prod, stats_prod, Persentille_prod
Thanks again for your time, patience. I am learning alot from you!
Kind Regards
Nope, sorry, not clear. What is the problem you encountered? What is the final result you are looking to achieve? Can you post sample data from your second table? How are the tables related?
Hi @Greg_Deckler,
The final result I am looking to achieve is like this:
Table 1
Date | ID | Sales | Cost |
01/01/2017 | 40505 | 114 | 357 |
02/01/2017 | 72878 | 901 | 416 |
03/01/2017 | 36370 | 463 | 3810 |
04/01/2017 | 41006 | 740 | 2805 |
05/01/2017 | 57904 | 987 | 3220 |
06/01/2017 | 45632 | 157 | 2348 |
07/01/2017 | 33843 | 856 | 2135 |
08/01/2017 | 71400 | 537 | 2159 |
09/01/2017 | 74717 | 289 | 1483 |
10/01/2017 | 77747 | 973 | 1922 |
11/01/2017 | 60239 | 830 | 2107 |
12/01/2017 | 65054 | 633 | 3405 |
13/01/2017 | 44939 | 326 | 1584 |
14/01/2017 | 71789 | 655 | 1328 |
15/01/2017 | 40704 | 558 | 3636 |
16/01/2017 | 68804 | 668 | 3165 |
17/01/2017 | 41137 | 99 | 138 |
18/01/2017 | 47474 | 759 | 516 |
19/01/2017 | 46244 | 748 | 1534 |
20/01/2017 | 26707 | 797 | 3943 |
21/01/2017 | 45512 | 427 | 1924 |
22/01/2017 | 74276 | 639 | 2995 |
23/01/2017 | 75824 | 605 | 390 |
24/01/2017 | 47363 | 495 | 3676 |
25/01/2017 | 72470 | 685 | 3021 |
26/01/2017 | 48808 | 94 | 2736 |
27/01/2017 | 32450 | 66 | 1238 |
28/01/2017 | 48837 | 213 | 921 |
29/01/2017 | 46723 | 857 | 378 |
30/01/2017 | 56503 | 776 | 2414 |
31/01/2017 | 50555 | 530 | 3044 |
Table 2
ID | Size |
40505 | 980 |
72878 | 7600 |
36370 | 420 |
41006 | 19000 |
57904 | 2100 |
45632 | 610 |
33843 | 2600 |
71400 | 420 |
23441 | 150 |
23471 | 5800 |
23478 | 4500 |
23488 | 3100 |
23554 | 800 |
23568 | 12000 |
23576 | 950 |
23586 | 2100 |
23607 | 7300 |
23662 | 90 |
23678 | 200 |
23702 | 1200 |
23715 | 5500 |
23751 | 430 |
23764 | 2900 |
23941 | 3800 |
24013 | 520 |
24224 | 620 |
24227 | 11000 |
24244 | 1800 |
24318 | 960 |
24333 | 120000 |
24334 | 370 |
24456 | 530 |
24487 | 2500 |
24498 | 410 |
Relationship is on ID columns.
Thanks for your time.
Kind Regards.
I have done some research and found out that this is the dax I need to use to calculate percentile but not sure how to adapt it for each percentile group: = PERCENTILEX.INC(<table>, <expression>;, k)
Does anyone know how I can have a dynamic value for "k"?
Thanks
Hi @micky123,
to build a column like your example, you can create DAX table (for ex. PersentilleT):
PersentilleT = UNION(GENERATESERIES(0;0,05;0,01);GENERATESERIES(0,1;0,9;0,05);GENERATESERIES(0,91;1;0,01))
Now you have one-column table with "Value" column.
And as second calculated column you can use this formula:
Column = PERCENTILEX.INC(Table1;Table1[Cost];[Value])
It's all as you need?
P.S. for dynamic k-value you need to build a measure. It' is a little complicated approach.
Best Regs
@McCow Thanks this was helpful and I think I am getting close to what I am trying to do.
I am unable to get this work
Column = PERCENTILEX.INC(Table1;Table1[Cost];[Value])
I have created a sample PowerBI file using your suggestion. URL: https://1drv.ms/u/s!Aghc93erq8rBcLygGX6qE2p7zXY
Please let me know if this simplify what I am after.
Thanks
Hi @micky123
good job, you was a pretty near.
i corrected you PBIX example and send back, LOOK HERE:
Two remarks:
1) The percentile calculation must be a column, not measure (see above)
2) The source Data type must be "Decimal" not Whole (see bellow):
And i added extra columns for better understanding you troubles. It can be leaved to better understanding, what i did. Enjoy!
If you have a questions ask back.
Best regs
@McCow Thanks alot! This is really helpful and also for explination what I was doing wrong. I am going to put this in production where I have quite a large dataset. Will let you know how it goes!
Thanks again for your help.
Kind Regards
@McCow There are couple of issues I am facing, please can you advise?
- Date selection doesn't affect anything, I would like to filter percentlie table based on different date ranges from stats[Date], do we need to modify the DAX?
- Cost percentile total is wrong cost_percentileC = PERCENTILEX.INC(stats,stats[Cost],[Value]) should be exctly the same as what we have in stats[Cost]?
- Sales percentile using this sales_percentileC = PERCENTILEX.INC(stats,stats[Sales],[Value]) gives me an error "Expressions that yield variant data-type cannot be used to define calculated columns."
Thanks for your time, I really appricate your help.
Kind Regards
Hi @micky123,
see bellow pls
@micky123 wrote:@McCow There are couple of issues I am facing, please can you advise?
- Date selection doesn't affect anything, I would like to filter percentlie table based on different date ranges from stats[Date], do we need to modify the DAX?
Yes, but it is more comlex, than our example
- Cost percentile total is wrong cost_percentileC = PERCENTILEX.INC(stats,stats[Cost],[Value]) should be exctly the same as what we have in stats[Cost]?
I maked extra page for complete table, look, summarize will be explain on picture bellow (yours [Cost] was "Don't summarise"):
- Sales percentile using this sales_percentileC = PERCENTILEX.INC(stats,stats[Sales],[Value]) gives me an error "Expressions that yield variant data-type cannot be used to define calculated columns."
observe on data type for all of yours data source, must be Decimal number, all number types can be summarized (or not):
Please check this LINK, I uploaded corrected to your request version.
Best regs.
@McCow Thanks for explination, I have tried to follow the same steps for prod data and there seems to be issue.
Could you point out what I am doing wrong please? Here's the updated pbi file: https://1drv.ms/u/s!Aghc93erq8rBcUhH5N_y1X_MXN8
I have created 3 new tables exactly in the same structure:
size_prod, stats_prod, Persentille_prod
Thanks again for your time, patience. I am learning alot from you!
Kind Regards
And @micky123
I'm not sure if you somehow can use it, but u can try experiment with this formula:
cost_percentileC2 = PERCENTILEX.INC( CALCULATETABLE( stats_prod; FILTER(stats_prod;stats_prod[_Size]>0); //from min FILTER(stats_prod;stats_prod[_Size]<100000) //to max ); // filter-table stats_prod[cost];[Value] )//end PERCENTILEX.INC
where Filter definition create low and high (min and max) data border from your data set.
Enjoy!
Hi @micky123,
as you know percenttille is a statistical measue (and the DAX functions PERCENTILEX.INC and PERCENTILEX.EXC), that shows how you data set will be exactly distributed from min to max unity value (and not the SUM). The most known is the normal distribution (Gauss).
In your case, this formula will show how often it will meet this or that "size", "cost" or "sale" value. On your example, you will see that the probability of meeting of any digit more than zero, is too small (k-value from "0" to "0.99"). And you'll meet just several digits with the max value of "5" (see [sale_percentileC]) very seldom.
And all of three columns ([size_percentile_prod], [cost_percentileC] and [sale_percentileC]) are independent. They will be calculated separately without any relationships between these values in single record (line of you data set).
The second difficulty is your relationship with [size] only. But in your case it's no matter because stat_prod has about 100K records and size_prod has only 5K. I don't understand what kind of statistic you want to calculate based on size_prod only. I created relation column [_Size] in stat_prod and suggest to make all calculations on the base of the biggest table (like stat_prod in you case).
And the task of Summarization of your columns is completly another / different task.
First version of calculation of Sum of the stats_prod table:
SumOfCost = CALCULATE(SUM(stats_prod[cost]))
The picture below shows why it is impossible to count the amount according to the distributions table:
Sorry for my English, and I hope my description provides a little more clarity to your question. If not, please ask back.
Best regs
Hi @micky123,
i don't see any big problems (except summarization options for Persentille_prod[size_percentile_prod], but it' no matter).
I'll be analyse and come back later.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
114 | |
101 | |
78 | |
75 | |
49 |
User | Count |
---|---|
145 | |
108 | |
107 | |
89 | |
61 |