Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi, I'm working on a dataset that has 2 (or more) months worth of data. We mark the outliers manually in Excel by sorting the resolution hrs and then getting 5% of the total rows per month and then marking the bottom and top 5% with "Outlier" and the middle values with "Normal". In the final measure, we want the value to exclude all those marked with "Outlier" in a monthly basis.
I need a calculated column that will do this for me in PowerBI so that when I use the slicer for month, the marking for Outlier and Normal will automatically change as well.
Do you have any suggestions? Here's the sample data set.
ID | Restored Hours | Month Restored | Outlier Flag |
C-001 | 0.01 | 2021-05 | Outlier |
C-012 | 1.128878343 | 2021-05 | Outlier |
C-015 | 1.201420498 | 2021-05 | Normal |
C-010 | 1.249110117 | 2021-05 | Normal |
C-016 | 1.300406408 | 2021-05 | Normal |
C-009 | 1.427888319 | 2021-05 | Normal |
C-011 | 1.623073433 | 2021-05 | Normal |
C-013 | 1.642982713 | 2021-05 | Normal |
C-008 | 1.708077848 | 2021-05 | Normal |
C-014 | 1.803207668 | 2021-05 | Normal |
C-006 | 1.81398035 | 2021-05 | Normal |
C-007 | 1.96616962 | 2021-05 | Normal |
C-002 | 3 | 2021-05 | Normal |
C-017 | 5.089730167 | 2021-05 | Normal |
C-019 | 5.256709885 | 2021-05 | Normal |
C-022 | 5.310029202 | 2021-05 | Normal |
C-024 | 5.315079853 | 2021-05 | Normal |
C-018 | 5.469267153 | 2021-05 | Normal |
C-023 | 5.587305329 | 2021-05 | Normal |
C-021 | 5.742316862 | 2021-05 | Normal |
C-020 | 5.993940609 | 2021-05 | Normal |
C-003 | 51 | 2021-05 | Normal |
C-005 | 212 | 2021-05 | Outlier |
C-004 | 5775 | 2021-05 | Outlier |
C-046 | 3.043073878 | 2021-06 | Outlier |
C-042 | 3.18684955 | 2021-06 | Outlier |
C-043 | 3.605238051 | 2021-06 | Normal |
C-044 | 3.685329803 | 2021-06 | Normal |
C-045 | 3.950539466 | 2021-06 | Normal |
C-038 | 4.354058226 | 2021-06 | Normal |
C-041 | 4.648859957 | 2021-06 | Normal |
C-040 | 4.667396348 | 2021-06 | Normal |
C-037 | 4.874279353 | 2021-06 | Normal |
C-039 | 4.972751575 | 2021-06 | Normal |
C-027 | 5.04243555 | 2021-06 | Normal |
C-026 | 5.384948904 | 2021-06 | Normal |
C-025 | 5.439720567 | 2021-06 | Normal |
C-028 | 5.972486935 | 2021-06 | Normal |
C-049 | 6.074714573 | 2021-06 | Normal |
C-050 | 6.471525993 | 2021-06 | Normal |
C-048 | 6.67932056 | 2021-06 | Normal |
C-047 | 6.853102493 | 2021-06 | Normal |
C-035 | 11.2375777 | 2021-06 | Normal |
C-029 | 11.33680521 | 2021-06 | Normal |
C-030 | 11.39072533 | 2021-06 | Normal |
C-036 | 11.39332949 | 2021-06 | Normal |
C-031 | 11.62865016 | 2021-06 | Normal |
C-034 | 11.66961607 | 2021-06 | Normal |
C-032 | 11.83012277 | 2021-06 | Outlier |
C-033 | 11.9201954 | 2021-06 | Outlier |
Thanks!
Solved! Go to Solution.
Hi @Van22 ,
Try to create a new column like below:
outlier_test =
var rank_asc = RANKX(FILTER('Table',EARLIER('Table'[Month Restored])='Table'[Month Restored]),'Table'[Restored Hours],,ASC,Dense)
var rank_desc = RANKX(FILTER('Table',EARLIER('Table'[Month Restored])='Table'[Month Restored]),'Table'[Restored Hours],,DESC,Dense)
var count_row = COUNTROWS(FILTER('Table',EARLIER('Table'[Month Restored])='Table'[Month Restored]))*0.05
return IF(rank_asc<=count_row||rank_desc<=count_row,"Outlier","Normal")
Best Regards,
Liang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @Van22 ,
Try to create a new column like below:
outlier_test =
var rank_asc = RANKX(FILTER('Table',EARLIER('Table'[Month Restored])='Table'[Month Restored]),'Table'[Restored Hours],,ASC,Dense)
var rank_desc = RANKX(FILTER('Table',EARLIER('Table'[Month Restored])='Table'[Month Restored]),'Table'[Restored Hours],,DESC,Dense)
var count_row = COUNTROWS(FILTER('Table',EARLIER('Table'[Month Restored])='Table'[Month Restored]))*0.05
return IF(rank_asc<=count_row||rank_desc<=count_row,"Outlier","Normal")
Best Regards,
Liang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Thank you for this! Worked like a charm. 🙂
Ah, essentially, on my sample, the column Outlier should be changing and recalculated to show outlier and normal depending on the month that the user will select.
Sorry, just getting back into pbi and i'm a bit lost.
"I need a calculated column that will do this for me in PowerBI so that when I use the slicer for month, the marking for Outlier and Normal will automatically change as well. "
Calculated columns do not care about user interactions or slicers. Did you mean to say you need a measure?
User | Count |
---|---|
125 | |
108 | |
99 | |
63 | |
62 |
User | Count |
---|---|
135 | |
116 | |
101 | |
71 | |
61 |