Hello,
I've been trying to figure out a way to filter out outliers when plotting the counts of a string column. For example, my table:
Table = ActivityLogs
Date | Activity |
Jan 1, 2021 | Create |
Jan 1, 2021 | View |
Jan 1, 2021 | View |
Jan 1, 2021 | Delete |
Jan 1, 2021 | Create |
Jan 2, 2021 | Create |
Jan 2, 2021 | Create |
Jan 3, 2021 | View |
Jan 9, 2021 | View |
Jan 10, 2021 | View |
etc | etc |
This goes on for hundreds of thousands of rows.
For instance, there are a few days where View has 10,000+ rows, which is a clear outlier, as I would typically expect fewer than 1,000 views per day. I would like to find a way to filter out those days so that the line graph isn't heavily skewed.
I tried creating the calculated column below. The idea was to express each count as a percentage of the overall sum of Activity counts and filter out anything over 97%, for example. However, ActivityLogs_Count is a measure I created, so the SUM function didn't like that. SUM(Count(ActivityLogs[Activity])) doesn't work either, since SUM doesn't accept a nested COUNT.
Percentage = ActivityLogs[ActivityLogs_Count] / CALCULATE(SUM([ActivityLogs_Count]),ALLSELECTED())*100
I also tried to use STDEV.P(Count(ActivityLogs[Activity])), but it doesn't work ... STDEV.P doesn't like a nested COUNT function either.
Does anyone have any suggestions for handling outliers like this? Thanks
Hi @shinney ,
You can add the measure below to the filter pane and set the condition to "is less than or equal to 0.97":
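Measure =
// Rows dated on or before the latest date in the current filter context (a running total)
VAR _count = CALCULATE ( COUNTROWS ( 'Table' ), FILTER ( ALL ( 'Table' ), 'Table'[Date] <= MAX ( 'Table'[Date] ) ) )
// All rows in the table, ignoring filters
VAR _overall = CALCULATE ( COUNTROWS ( 'Table' ), ALL ( 'Table' ) )
RETURN _count / _overall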
Best Regards,
Eyelyn Qin
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
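For comparison, the STDEV.P attempt from the question can probably be made to work with iterator functions, which take a table and an expression instead of a single column. This is only a rough sketch, not part of the reply above. It assumes the table is named ActivityLogs as in the question, that ActivityLogs[Date] holds dates without a time part, and that Date is on the chart axis. The measure names and the cutoff of 3 standard deviations are illustrative choices.

```dax
-- Rows in the current filter context (one Date when used on the chart axis).
Activity Count = COUNTROWS ( ActivityLogs )

-- Daily count, returned as BLANK when it exceeds the mean of the daily
-- counts by more than 3 standard deviations (assumed cutoff).
Activity Count (No Outliers) =
VAR _days = ALLSELECTED ( ActivityLogs[Date] )    -- dates shown in the visual
VAR _mean = AVERAGEX ( _days, [Activity Count] )  -- mean count per day
VAR _sd = STDEVX.P ( _days, [Activity Count] )    -- std dev of the daily counts
VAR _current = [Activity Count]
RETURN
    IF ( _current <= _mean + 3 * _sd, _current )
```

Plotting the second measure instead of the raw count should leave a gap on the outlier days rather than a spike. A percentile-based cutoff could be substituted if the daily counts are heavily skewed.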
@shinney, see if this blog post helps:
https://bielite.com/blog/scale-down-outliers-power-bi/
Thanks for the link. The log trick may work, but the main component my table lacks is that count column. I had to create a measure for the count of Activity, but it doesn't work in these calculations or with STDEV.P.
Is there any way to make a Count of Activity calculated column instead?
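One possible way to get such a column, sketched under the assumption that the table is named ActivityLogs and that a "count" means the number of rows sharing the same Date and Activity. The column and table names below are illustrative, not from the thread.

```dax
-- Calculated column on ActivityLogs: number of rows that share this
-- row's Date and Activity. CALCULATE turns the current row into a
-- filter, and ALLEXCEPT keeps only the Date and Activity parts of it.
Daily Activity Count =
CALCULATE (
    COUNTROWS ( ActivityLogs ),
    ALLEXCEPT ( ActivityLogs, ActivityLogs[Date], ActivityLogs[Activity] )
)

-- Alternative: a calculated table with one row per Date and Activity,
-- so that column aggregations run over daily counts directly.
Daily Counts =
ADDCOLUMNS (
    SUMMARIZE ( ActivityLogs, ActivityLogs[Date], ActivityLogs[Activity] ),
    "Count", CALCULATE ( COUNTROWS ( ActivityLogs ) )
)
```

The calculated column repeats the same value on every row of a given day and activity, so it suits filtering or MAX/AVERAGE better than SUM. With the summary table, an expression such as STDEV.P ( 'Daily Counts'[Count] ) should be accepted, because the argument is a plain column.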