cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Anonymous
Not applicable

Cleaning bad time series data

I am trying to clean time series data, which is impact by upstream processes whcih result in extreme errors. Intially a large outlier will flow into the series, and then this move is reversed the next day when upstream issues are resolved, resulting in an accurate two day move, but inaccurate daily changes. Any tips on cleaning it? It is easy to filter and replace large values in powerquery with a zero, but what is correct is the running total after two days, and zeroing out would result in an incorrect total.

 

For example, it would be something like below... how can i filter for outliers of a certain magnitude (100x), but dynamically alter the data such that  7/15 would be 0 and 7/16 would be 4mm? It would be relatively easy to implement this logic in excel, not so much in powerbi/powerquery.

 

7/1/2019                       19,590.02
7/3/2019                           1,561.33
7/5/2019                 (1,500,003.00)
7/8/2019                       (25,000.00)
7/9/2019                         35,000.00
7/10/2019                         (2,000.00)
7/11/2019                       180,000.00
7/12/2019                       250,000.00
7/15/2019   (520,000,000,000.00)
7/16/2019     520,004,000,000.00
7/17/2019                       320,000.00
7/18/2019                       650,000.00
7/19/2019                     (250,000.00)
4 REPLIES 4
Highlighted
Community Support
Community Support

Re: Cleaning bad time series data

Hi @Anonymous ,

 

I am not sure what desired result would you want, could you please share your sample data and desired output screenshots for further analysis? You can also upload sample pbix to OneDrive and post the link here. Do mask sensitive data before uploading.

 

Please read this post to get your answer quickly: How to Get Your Question Answered Quickly.

 

Best Regards,

Amy

 

Highlighted
Anonymous
Not applicable

Re: Cleaning bad time series data

Hi Amy,

I did come up with a solution, but it isn't very flexible, and only catches one blip in the data. Basically, I filtered for outliers, zeroing out bad rows. Then I created a cummulative sum of the outliers , placing the final sum (4 million in below), into clean data column.

 

DateRaw DataClean Data

7/1/2019

19590.0219590.02
7/3/20191561.331561.33
7/5/2019-1500003-1500003
7/8/2019-25000-25000
7/9/20193500035000
7/10/2019-2000-2000
7/11/2019180000180000
7/12/2019250000250000
7/15/2019-5.2E+110
7/16/20195.20004E+114000000
7/17/2019320000320000
7/18/2019650000650000
7/19/2019-250000-250000
Highlighted
Frequent Visitor

Re: Cleaning bad time series data

How about if you force the column in datatype "Fixed decimal number" and then do the aggregation?

 

Step: In the Edit queries, click on the field, go to modelling tab and under data type tab choose "fixed decimal number".

Highlighted
Anonymous
Not applicable

Re: Cleaning bad time series data

yeah it is already, that's just cut and paste formatting from excel.

Helpful resources

Announcements
Community Conference

Power Platform Community Conference

Check out the on demand sessions that are available now!

Upcoming Events

Experience what’s next for Power BI

See the latest Power BI innovations, updates, and demos from the Microsoft Business Applications Launch Event.

secondImage

Power Platform 2020 release wave 2 plan

Features releasing from October 2020 through March 2021

Get Ready for Power BI Dev Camp

Get Ready for Power BI Dev Camp

Mark your calendars and join us for our next Power BI Dev Camp!.

Top Solution Authors
Top Kudoed Authors