
Anonymous
Not applicable

Cleaning bad time series data

I am trying to clean time series data that is impacted by upstream processes which produce extreme errors. Initially a large outlier flows into the series, and then the move is reversed the next day when the upstream issues are resolved, resulting in an accurate two-day move but inaccurate daily changes. Any tips on cleaning it? It is easy to filter and replace large values in Power Query with a zero, but what is correct is the running total after two days, and zeroing out would result in an incorrect total.

 

For example, it would be something like below... how can I filter for outliers of a certain magnitude (100x), but dynamically alter the data such that 7/15 would be 0 and 7/16 would be 4mm (4 million)? It would be relatively easy to implement this logic in Excel, not so much in Power BI / Power Query.

 

7/1/2019             19,590.02
7/3/2019              1,561.33
7/5/2019         (1,500,003.00)
7/8/2019            (25,000.00)
7/9/2019             35,000.00
7/10/2019            (2,000.00)
7/11/2019           180,000.00
7/12/2019           250,000.00
7/15/2019  (520,000,000,000.00)
7/16/2019   520,004,000,000.00
7/17/2019           320,000.00
7/18/2019           650,000.00
7/19/2019          (250,000.00)
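The logic being asked for, zero the blip day and fold its amount into the next day so the running total is preserved, can be sketched outside of Power Query. The snippet below is an illustrative Python/pandas version (the `clean_blips` name and the median-based 100x threshold are assumptions for the sketch, not from this thread):

```python
import pandas as pd

def clean_blips(values, magnitude=100.0):
    """Zero out extreme outliers and carry each zeroed amount into the
    next observation, so the running total is unchanged."""
    s = pd.Series(values, dtype=float)
    typical = s.abs().median()      # crude estimate of the series' scale
    cleaned = []
    carry = 0.0
    for v in s:
        v += carry                  # fold in any amount deferred from a blip
        carry = 0.0
        if abs(v) > magnitude * typical:
            carry = v               # defer the outlier to the next day
            cleaned.append(0.0)
        else:
            cleaned.append(v)
    return cleaned

raw = [19590.02, 1561.33, -1500003.00, -25000.00, 35000.00, -2000.00,
       180000.00, 250000.00, -520000000000.00, 520004000000.00,
       320000.00, 650000.00, -250000.00]
clean = clean_blips(raw)
# the 7/15 row becomes 0.0 and the 7/16 row becomes 4,000,000.0
```

The awkward part in Power Query is the carry-forward step, which needs a way to reference the prior row (typically via an added index column); the loop above is the core of the calculation.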
4 REPLIES
v-xicai
Community Support

Hi @Anonymous ,

 

I am not sure what desired result you want. Could you please share your sample data and desired output screenshots for further analysis? You can also upload a sample pbix to OneDrive and post the link here. Do mask sensitive data before uploading.

 

Please read this post to get your answer quickly: How to Get Your Question Answered Quickly.

 

Best Regards,

Amy

 

Anonymous
Not applicable

Hi Amy,

I did come up with a solution, but it isn't very flexible and only catches one blip in the data. Basically, I filtered for outliers, zeroing out bad rows. Then I created a cumulative sum of the outliers, placing the final sum (4 million below) into the clean data column.

 

Date      | Raw Data            | Clean Data
----------|---------------------|---------------
7/1/2019  | 19,590.02           | 19,590.02
7/3/2019  | 1,561.33            | 1,561.33
7/5/2019  | -1,500,003          | -1,500,003
7/8/2019  | -25,000             | -25,000
7/9/2019  | 35,000              | 35,000
7/10/2019 | -2,000              | -2,000
7/11/2019 | 180,000             | 180,000
7/12/2019 | 250,000             | 250,000
7/15/2019 | -520,000,000,000    | 0
7/16/2019 | 520,004,000,000     | 4,000,000
7/17/2019 | 320,000             | 320,000
7/18/2019 | 650,000             | 650,000
7/19/2019 | -250,000            | -250,000
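The approach described above (zero the flagged rows, then write the running total of the outliers back into the clean column) might look like the following pandas sketch. This is an illustration of the described logic, not the poster's actual Power Query steps; the 25,000,000 threshold and the function name are assumptions:

```python
import pandas as pd

def clean_via_outlier_cumsum(df, threshold):
    """Zero flagged rows and place the cumulative sum of the flagged
    amounts into the last row of the outlier run."""
    is_outlier = df["Raw"].abs() > threshold
    clean = df["Raw"].where(~is_outlier, 0.0).copy()
    # running total of only the outlier amounts
    carry = df["Raw"].where(is_outlier, 0.0).cumsum()
    # last row of an outlier run: flagged now, not flagged on the next row
    last_of_run = is_outlier & ~is_outlier.shift(-1, fill_value=False)
    clean.loc[last_of_run] = carry.loc[last_of_run]
    return df.assign(Clean=clean)

df = pd.DataFrame({"Raw": [19590.02, 1561.33, -1500003.00, -25000.00,
                           35000.00, -2000.00, 180000.00, 250000.00,
                           -520000000000.00, 520004000000.00,
                           320000.00, 650000.00, -250000.00]})
result = clean_via_outlier_cumsum(df, threshold=25_000_000)
# the 7/15 row becomes 0.0 and the 7/16 row becomes 4,000,000.0
```

Because the cumulative sum never resets, a second blip later in the series would absorb residue from the first, which mirrors the "only catches one blip" limitation mentioned above.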
Anonymous
Not applicable

How about if you force the column to the data type "Fixed decimal number" and then do the aggregation?

 

Step: In Edit Queries, click on the field, go to the Modeling tab, and under Data type choose "Fixed decimal number".

Anonymous
Not applicable

Yeah, it is already; that's just cut-and-paste formatting from Excel.
