I am trying to clean time series data which is impacted by upstream processes that result in extreme errors. Initially a large outlier flows into the series, and then the move is reversed the next day when the upstream issues are resolved, resulting in an accurate two-day move but inaccurate daily changes. Any tips on cleaning it? It is easy to filter and replace large values in Power Query with a zero, but what is correct is the running total after two days, and zeroing out would result in an incorrect total.
For example, it would be something like below... how can I filter for outliers of a certain magnitude (100x), but dynamically alter the data such that 7/15 would be 0 and 7/16 would be 4mm? It would be relatively easy to implement this logic in Excel, not so much in Power BI / Power Query.
7/1/2019 | 19,590.02 |
7/3/2019 | 1,561.33 |
7/5/2019 | (1,500,003.00) |
7/8/2019 | (25,000.00) |
7/9/2019 | 35,000.00 |
7/10/2019 | (2,000.00) |
7/11/2019 | 180,000.00 |
7/12/2019 | 250,000.00 |
7/15/2019 | (520,000,000,000.00) |
7/16/2019 | 520,004,000,000.00 |
7/17/2019 | 320,000.00 |
7/18/2019 | 650,000.00 |
7/19/2019 | (250,000.00) |
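One way to pin down the two-day-reversal logic before translating it into Power Query is to sketch it in plain Python: flag any move whose magnitude dwarfs the rest of the series, zero the outlier day, and let the next day absorb the net two-day move. The 100x-median cutoff below is an assumption based on the "100x" magnitude mentioned above; tune it to your data.

```python
# Sketch of the reversal fix on the sample data above.
# Assumption: an outlier is any move more than 100x the median
# absolute move, and each outlier is reversed on the next row.
from statistics import median

raw = [
    ("7/1/2019", 19590.02),
    ("7/3/2019", 1561.33),
    ("7/5/2019", -1500003.00),
    ("7/8/2019", -25000.00),
    ("7/9/2019", 35000.00),
    ("7/10/2019", -2000.00),
    ("7/11/2019", 180000.00),
    ("7/12/2019", 250000.00),
    ("7/15/2019", -520000000000.00),
    ("7/16/2019", 520004000000.00),
    ("7/17/2019", 320000.00),
    ("7/18/2019", 650000.00),
    ("7/19/2019", -250000.00),
]

threshold = 100 * median(abs(v) for _, v in raw)  # "100x" magnitude rule

clean = [v for _, v in raw]
for i in range(len(clean) - 1):
    if abs(clean[i]) > threshold:
        # Next-day reversal assumed: the net two-day move
        # lands on day i+1 and day i is zeroed.
        clean[i + 1] += clean[i]
        clean[i] = 0.0
# clean now holds 0 for 7/15 and 4,000,000 for 7/16
```

In Power Query the same pass could be built by adding an Index column and self-merging on Index vs. Index-1, which is the usual way to reference the previous row in M.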
Hi @Anonymous ,
I am not sure what desired result you want. Could you please share your sample data and desired output screenshots for further analysis? You can also upload a sample pbix to OneDrive and post the link here. Do mask sensitive data before uploading.
Please read this post to get your answer quickly: How to Get Your Question Answered Quickly.
Best Regards,
Amy
Hi Amy,
I did come up with a solution, but it isn't very flexible and only catches one blip in the data. Basically, I filtered for outliers, zeroing out the bad rows. Then I created a cumulative sum of the outliers, placing the final sum (4 million in the table below) into the clean data column.
Date | Raw Data | Clean Data |
7/1/2019 | 19590.02 | 19590.02 |
7/3/2019 | 1561.33 | 1561.33 |
7/5/2019 | -1500003 | -1500003 |
7/8/2019 | -25000 | -25000 |
7/9/2019 | 35000 | 35000 |
7/10/2019 | -2000 | -2000 |
7/11/2019 | 180000 | 180000 |
7/12/2019 | 250000 | 250000 |
7/15/2019 | -5.2E+11 | 0 |
7/16/2019 | 5.20004E+11 | 4000000 |
7/17/2019 | 320000 | 320000 |
7/18/2019 | 650000 | 650000 |
7/19/2019 | -250000 | -250000 |
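The approach described above (zero out the outlier rows and carry a running total of the removed values until the reversal nets out) generalizes to multiple blips. Here is a minimal Python sketch of that logic; the threshold constant and the function name are illustrative assumptions, and in Power Query the same idea could be expressed with `List.Accumulate` over the value column.

```python
# Sketch of the running-total approach described above.
# Assumption: any daily move larger than THRESHOLD is an upstream
# error; the error is zeroed out and its value carried forward
# until the cumulative carry shrinks back below the threshold,
# at which point the net (accurate) move is written out.

THRESHOLD = 25_000_000.0  # illustrative cutoff; tune to your data

def clean_series(values):
    cleaned = []
    carried = 0.0  # running total of zeroed-out outliers
    for v in values:
        if abs(v) > THRESHOLD:
            carried += v
            if abs(carried) <= THRESHOLD:
                # Reversal complete: the net move lands on this day
                # (e.g. 4,000,000 on 7/16 in the table above).
                cleaned.append(carried)
                carried = 0.0
            else:
                cleaned.append(0.0)
        else:
            # Any leftover carry is folded into the next normal day.
            cleaned.append(v + carried)
            carried = 0.0
    return cleaned
```

Because the carry resets each time a reversal completes, this handles any number of separate blips, not just one.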
How about if you force the column to the data type "Fixed decimal number" and then do the aggregation?
Step: In Edit Queries, click on the field, go to the Modeling tab, and under Data type choose "Fixed decimal number".
Yeah, it is already; that's just cut-and-paste formatting from Excel.