Frequent Visitor

Efficient way to calculate top % within a category

Hi, I have a large data set of more than 1.5 million rows loaded in Power BI. I'm trying to segment the rows into the top 2%, next 20%, and remaining 78% by value within each category (A, B, C). I've been adding columns in M rather than DAX because I read that it performs better on a data set this size. I explored creating tables based on top N values, but that looks tedious. Sorting from high to low and assigning a percentage might also work, but I expect it to be too taxing on performance. Any suggestions for an efficient way to do this?

Thank you!

The resulting table should add a fourth column with the tagging:

| Person | Category | Value | Tagging |
|---|---|---|---|
| Person 1 | A | 100 | Top 2% |
| Person 2 | A | 80 | Next 20% |
| Person 3 | A | 60 | Remaining 78% |
| Person 4 | A | 40 | Remaining 78% |
| Person 5 | A | 20 | Remaining 78% |
| Person 6 | A | 10 | Remaining 78% |
| Person 7 | B | 200 | ... |
| Person 8 | C | 50 | ... |
| Person 9 | C | 50 | ... |

1 ACCEPTED SOLUTION

Community Support

Hi @kucci

You can try building calculated columns; this works well.

I built a sample with 1 million rows to test it.

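```
-- Dense rank of each row's Value within its own Category (highest value = 1)
Rank =
RANKX(FILTER(Sheet2, Sheet2[Category] = EARLIER(Sheet2[Category])), Sheet2[Value], , DESC, Dense)
```
```
-- Compare each row's rank with the highest rank in its Category
Tag =
VAR _MaxRank = MAXX(FILTER(Sheet2, Sheet2[Category] = EARLIER(Sheet2[Category])), Sheet2[Rank])
RETURN
SWITCH(TRUE(),
    Sheet2[Rank] <= _MaxRank * 0.02, "Top 2%",
    Sheet2[Rank] <= _MaxRank * 0.22, "Next 20%",
    "Remaining 78%")
```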
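One thing to be aware of: the formula above compares dense ranks with the highest dense rank in the category, so the 2% and 22% cut-offs apply to distinct values rather than to rows. If a category contains many repeated values, the share of rows in each bucket can differ from 2% / 20% / 78%. Should you need the cut-offs by row count instead, a minimal sketch of a variant follows. It assumes the same `Sheet2` table and columns as above; the column name `TagByRowShare` and the variable names are made up for illustration, and I have not benchmarked it on a data set of 1.5 million rows.

```dax
-- Hypothetical calculated column on Sheet2: bucket by share of rows, not by dense rank
TagByRowShare =
VAR _Cat = Sheet2[Category]    -- category of the current row
VAR _Val = Sheet2[Value]       -- value of the current row
-- All rows that belong to the same category as the current row
VAR _CatRows = FILTER ( Sheet2, Sheet2[Category] = _Cat )
VAR _RowsInCat = COUNTROWS ( _CatRows )
-- Rows in the category with a strictly higher value (BLANK, treated as 0, when there are none)
VAR _RowsAbove = COUNTROWS ( FILTER ( _CatRows, Sheet2[Value] > _Val ) )
-- Position of the current row as a fraction of the category's row count; ties share a position
VAR _Share = DIVIDE ( _RowsAbove + 1, _RowsInCat )
RETURN
    SWITCH (
        TRUE (),
        _Share <= 0.02, "Top 2%",
        _Share <= 0.22, "Next 20%",
        "Remaining 78%"
    )
```

With this variant, a category needs at least 50 rows before any row can land in "Top 2%", because the first row's share is 1 divided by the row count. Each row scans its own category, so it is worth testing on a subset before applying it to the full table.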

Result: (screenshot not included)

Did you calculate the Top N by rank for each category?

You can download the pbix file from this link: Efficient way to calculate top % within a category

Best Regards,

Rico Zhou

If this post helps, please consider accepting it as the solution to help other members find it more quickly.

3 REPLIES
Super User IV

@kucci, see if this percentile blog can help.

I think there is a blog from Greg on cumulative bucketing. I do not have the link handy.

Frequent Visitor

Thanks, this worked well! I didn't realise that I had to create a calculated column instead of a measure.
