Hi, I have a large data set of more than 1.5 million rows loaded in Power BI. I'm trying to segment it into the top 2%, next 20%, and remaining 78% by value within each category (A, B, C). I've been adding columns in M rather than DAX because I read that M performs better on a data set this size. I considered creating tables based on top N values, but that could be quite tedious. Sorting from high to low and assigning a percentage is another option, but I expect it to be too taxing on performance. Any suggestions for an efficient way to do this?
Thank you!
The resulting table should add a fourth column with the tag:
Person | Category | Value | Tagging |
Person 1 | A | 100 | Top 2% |
Person 2 | A | 80 | Next 20% |
Person 3 | A | 60 | Remaining 78% |
Person 4 | A | 40 | Remaining 78% |
Person 5 | A | 20 | Remaining 78% |
Person 6 | A | 10 | Remaining 78% |
Person 7 | B | 200 | ... |
Person 8 | C | 50 | ... |
Person 9 | C | 50 | ... |
Hi @kucci
You can do this with two calculated columns. I built a sample with 1 million rows to test it, and it works well.
Rank = RANKX(FILTER(Sheet2, Sheet2[Category] = EARLIER(Sheet2[Category])), Sheet2[Value], , DESC, Dense)
Tag =
VAR _MaxRank = MAXX(FILTER(Sheet2, Sheet2[Category] = EARLIER(Sheet2[Category])), Sheet2[Rank])
RETURN
SWITCH(TRUE(), Sheet2[Rank] <= _MaxRank * 0.02, "Top 2%", Sheet2[Rank] <= _MaxRank * 0.22, "Next 20%", "Remaining 78%")
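The two columns above could also be folded into one calculated column. The following is an untested sketch, not the method from the attached pbix: the column name `Tag Combined` is hypothetical, the table and column names (`Sheet2`, `Category`, `Value`) are taken from the formulas above, and the labels assume the 2% / 20% / 78% split from the question. It computes the dense rank as the number of distinct values in the category that are greater than or equal to the current row's value.

```dax
Tag Combined =
-- Capture the current row's values once, so EARLIER is not needed
VAR _Category = Sheet2[Category]
VAR _Value = Sheet2[Value]
-- All rows that share the current row's category
VAR _SameCategory =
    FILTER ( Sheet2, Sheet2[Category] = _Category )
-- Dense rank (descending): distinct values >= the current value
VAR _Rank =
    COUNTROWS (
        DISTINCT (
            SELECTCOLUMNS (
                FILTER ( _SameCategory, Sheet2[Value] >= _Value ),
                "Value", Sheet2[Value]
            )
        )
    )
-- Highest dense rank in the category = number of distinct values
VAR _MaxRank =
    COUNTROWS (
        DISTINCT ( SELECTCOLUMNS ( _SameCategory, "Value", Sheet2[Value] ) )
    )
RETURN
    SWITCH (
        TRUE (),
        _Rank <= _MaxRank * 0.02, "Top 2%",
        _Rank <= _MaxRank * 0.22, "Next 20%",
        "Remaining 78%"
    )
```

Note that with either version the thresholds apply to dense ranks, so in a very small category `_MaxRank * 0.02` is below 1 and no row is tagged "Top 2%". For example, a category with 6 distinct values gives a threshold of 0.12.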
Result:
If this reply still doesn't solve your problem, could you describe the logic you use to calculate the top N?
Did you calculate the top N by rank within each category?
You can download the pbix file from this link: Efficient way to calculate top % within a category
Best Regards,
Rico Zhou
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
Thanks, this worked well! I didn't realise that I had to create a calculated column instead of a measure.
@kucci, see if these posts on percentile and Pareto analysis help:
https://blog.enterprisedna.co/implementing-80-20-logic-in-your-power-bi-analysis/
https://forum.enterprisedna.co/t/testing-the-pareto-principle-80-20-rule-in-power-bi-w-dax/459
https://finance-bi.com/power-bi-pareto-analysis/
https://community.powerbi.com/t5/DAX-Commands-and-Tips/Calculate-the-sum-of-the-top-80/td-p/763156
I think there is a blog from Greg on cumulative bucketing. I do not have the link handy.