03-25-2020 14:11 - last edited 03-25-2020 19:56
Here is an interesting one dealing with Internet of Things (IoT) data streaming to Cosmos DB via Azure Stream Analytics and being consumed by Power BI. The thing about IoT data is that it often contains gaps due to network connectivity problems, other communication issues, hardware failures, and so on. As this excellent article by Noah Crowley explains: https://www.influxdata.com/blog/working-with-irregular-time-series/, these data gaps can cause problems when performing aggregations such as averages.
This particular problem came to me via a LinkedIn connection. The data represents power consumption from ships' batteries, sampled at 10-second intervals. But because it is IoT data, sometimes the interval is 9 seconds, sometimes 10 seconds, sometimes 11 seconds, and sometimes there is no data at all! Thus, to calculate the average power consumption, we cannot simply use a standard average aggregation; we have a few hoops to jump through.
First, we create a calculated column in our fact table ('Values' in the formulas below) to give us a numeric date/time stamp:
Column = [Date] * 1
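Multiplying a date/time by 1 in DAX exposes its underlying serial number: whole days since 1899-12-30, with the time of day as a fractional day. As an illustration only (this Python sketch is not part of the Power BI solution), the same conversion looks like this:

```python
from datetime import datetime

# Power BI / Excel date serial numbers count days since 1899-12-30,
# with the time of day stored as a fraction of a day.
EPOCH = datetime(1899, 12, 30)

def to_serial(dt: datetime) -> float:
    """Convert a datetime to a Power BI-style numeric date/time stamp."""
    delta = dt - EPOCH
    return delta.days + delta.seconds / 86400

# Noon on 2020-03-25 -> a whole number of days plus half a day
serial = to_serial(datetime(2020, 3, 25, 12, 0, 0))
```

This fractional-day representation is why 10 seconds must be expressed as 1/24/60/60 * 10 in the table formula below.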
Then we can create a table as follows:
Interpolation =
VAR __10s = 1/24/60/60 * 10    // 10 seconds expressed as a fraction of a day
VAR __Table =
    ADDCOLUMNS(
        GENERATESERIES( MIN( 'Values'[Column] ), MAX( 'Values'[Column] ), __10s ),
        "__Average",
            AVERAGEX(
                FILTER(
                    'Values',
                    'Values'[Column] >= [Value] && 'Values'[Column] < [Value] + __10s
                ),
                [DecimalMeasurement]
            ),
        "__Date", DATE( YEAR( [Value] ), MONTH( [Value] ), DAY( [Value] ) )
    )
RETURN
    __Table
What this table does is generate a standard 10-second interval series between the minimum and maximum date/time values in our dataset. Each row represents 10 seconds, providing us a consistent (regular) time series. We then simply add our __Average column, which averages the rows in our original fact table that fall within each 10-second range.
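The bucketing logic above can be sketched outside DAX as well. Here is a minimal Python illustration (function name and sample data are made up for demonstration) that averages irregular readings into regular 10-second buckets, leaving None where a bucket is empty:

```python
def bucket_average(readings, step=10.0):
    """Average irregular (timestamp, value) readings into regular
    step-second buckets spanning min..max timestamp.
    Empty buckets yield None -- the gaps we interpolate later."""
    times = [t for t, _ in readings]
    start, end = min(times), max(times)
    buckets = []
    t = start
    while t <= end:
        # Same half-open range as the DAX FILTER: t <= ts < t + step
        vals = [v for ts, v in readings if t <= ts < t + step]
        buckets.append((t, sum(vals) / len(vals) if vals else None))
        t += step
    return buckets

# Irregular readings: two samples in the first bucket, then
# nothing between t=10..20 and t=30..40
data = [(0, 100), (9, 110), (20, 90), (41, 120)]
regular = bucket_average(data)
```

Here `regular` contains None for the 10-20 and 30-40 second buckets, which is exactly the gap problem the next step addresses.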
However, this table has gaps wherever no values in our original fact table fall within a particular 10-second range. We could either set those rows to 0 or filter them out completely, but neither option is particularly appealing. So, let's do linear interpolation instead!
We can create this column in our Interpolation table:
__Interpolated =
IF(
    [__Average] <> 0 && NOT( ISBLANK( [__Average] ) ),
    [__Average],
    VAR __x1 = MAXX( FILTER( 'Interpolation', [Value] < EARLIER( [Value] ) && NOT( ISBLANK( [__Average] ) ) ), [Value] )
    VAR __x2 = MINX( FILTER( 'Interpolation', [Value] > EARLIER( [Value] ) && NOT( ISBLANK( [__Average] ) ) ), [Value] )
    VAR __y1 = LOOKUPVALUE( 'Interpolation'[__Average], [Value], __x1 )
    VAR __y2 = LOOKUPVALUE( 'Interpolation'[__Average], [Value], __x2 )
    RETURN
        __y1 + ( [Value] - __x1 ) * ( __y2 - __y1 ) / ( __x2 - __x1 )
)
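This column applies the standard two-point linear interpolation formula y = y1 + (x - x1)(y2 - y1)/(x2 - x1), where (x1, y1) and (x2, y2) are the nearest non-blank buckets before and after the gap. A Python sketch of the same logic (illustrative only; note that leading or trailing gaps with no neighbor on one side are left unfilled, which the DAX column would also need to handle):

```python
def interpolate_gaps(buckets):
    """Fill None values by linear interpolation between the nearest
    non-empty buckets on either side, using the same formula as the
    DAX column: y1 + (x - x1) * (y2 - y1) / (x2 - x1)."""
    known = [(t, v) for t, v in buckets if v is not None]
    out = []
    for t, v in buckets:
        if v is not None:
            out.append((t, v))
            continue
        before = [(kt, kv) for kt, kv in known if kt < t]
        after = [(kt, kv) for kt, kv in known if kt > t]
        if not before or not after:
            out.append((t, None))  # no neighbor on one side; leave the gap
            continue
        x1, y1 = before[-1]  # nearest earlier non-empty bucket (MAXX in DAX)
        x2, y2 = after[0]    # nearest later non-empty bucket (MINX in DAX)
        out.append((t, y1 + (t - x1) * (y2 - y1) / (x2 - x1)))
    return out

# Regular 10-second buckets with two gaps (made-up sample data)
filled = interpolate_gaps([(0, 105.0), (10, None), (20, 90.0), (30, None), (40, 120.0)])
```

Averaging the filled series then gives the gap-corrected mean (103.5 for this sample data), which is the final step below.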
We can now simply take the average of the __Interpolated column and get the correct answer.