This is an update to a request I posted a couple of days back and the fabulous @ImkeF came back with a great solution. I now have a couple of other tricky (to me anyway) requirements that have to be factored into the final dataset.
I have time series data (recorded on a per second basis) in a Dataflow and I would like to condense it into discrete start up events using Power Query as this can then be used for ML purposes.
An actual event (or a job) will be the duration between X and Z. The start up event (i.e. the belt to get to running speed) is X to Y. The start up event does not need to consider the duration between Y and Z. Unfortunately, there aren't any status values.
Requirements (resolved per previous request by @ImkeF )
The dataset is for a conveyor belt sensor readings. Each event needs to contain:
Capture the start up time of a conveyor belt (column B)
Capture the time (MIN of column A for each event) as start time
Capture the time (Where column B stays static for 3 observations, capture the time at the 1st of the 3+ observations). This is the end time of that event
Calculate the Average Belt Driver Current (column C)
Calculate the Average Weight on the Belt (column D)
Calculate the Average Belt Driver Temperature
Calculate if the start up event was good or bad based on column B and D. In this scenario, If the start up event, is 72 and the average of the belt weight is 0, then that's a good start up. If the start up event is greater (90+) and the average of the belt weight is 0, then that's a bad start up. There are other scenarios as well. Again these are 2 examples. So in practice, I'll have to consider those.
Calculate the 3rd Quartile (75th Percentile) & 95th Percentile of Belt Driver Current (Column C)
I believe I have achieved this with the function ImkeF provided me with in the previous post
Calculate the Percentage Change of the last 5 records (from the previous record) of Belt Driver Current (Column C) and then Average them for the final dataset. This is proving tricky...
Image highlights descriptions:
2 events - 1 red box and the other, blue. The first startup event is from row 3 to 75 (highlighted in yellow). As you can see the entire event goes beyond row 75. From a start up event perspective, I have to end at value 72 (column B illustrated by the green dotted line as that value doesn't change until the end of the event. Start up events can vary up to 120 seconds. These are just 2 examples).
The second startup event is from row 5752 to 5829
The arrows indicate the final result that I've after.
The final column highlights if the start up event is good or bad.
The black rectangle highlights the 1st of the 2 new requirements (calculating 3rd quartile and 95th percentile)
Column F (highlighted in light red and blue) point to the 2nd of the 2 new requirements - calculate the percentage change and then average it
I've attached links to a pbix and excel file containing the data.
As you can see from this image, I've taken @ImkeF 's solution, extended it to include quartile and percentile (I think I have got it implemented correctly?). This missing piece is the Average percentage change.
I managed to work out the average for the last 5 records. I don't know if it is as elegant as you would have done it - and maybe you will see something wrong with it. If you do, please let me know 🙂
I had to:
Expand the "Previous" table
Add a percentage change to same column from Previous table
Reference that column, after grouping, to calculate the average percentage change for last 5 records
This image shows the M query
This is the final result. The table with the purple header is the result of the above Power Query. It matches the table with the green header, which was based on the required result from the Excel file.