Mean Time Between Failure (MTBF) and Power BI

Greg_Deckler · ‎01-16-2018

Introduction

Mean Time Between Failure (MTBF) is a common term and concept used in equipment and plant maintenance contexts. In addition, MTBF is an important consideration in the development of products. MTBF, along with other maintenance, repair and reliability information, can be extremely valuable to organizations to help identify problematic systems, predict system outages, improve product designs and improve overall operational efficiency and effectiveness. As such, it would be beneficial to understand how to implement this metric within Power BI.

Background

MTBF is essentially the arithmetic mean time between failures of a system. Now, the definition of a failure will depend greatly on the circumstance in which this metric is applied but in general, failures that do not take the system out of service are generally not considered failures. In addition, routine maintenance tasks that are scheduled are also generally not considered failures.

The below image from this Wikipedia Page probably does about as good a job as anything in depicting MTBF.

In formal math language, MTBF is defined as:

Let's Build It!

OK, enough history and background, let's build this puppy!

Step 1: Prepare Your Data

So, I generated some fake data that you can download here. The data has a little over 4,000 rows and represents 3 years of repair data for about a dozen "machines". This data is pretty bare bones but has the essential elements that we will need to demonstrate the technique. Each row represents a single repair. Here is what the data looks like:

So, we essentially have an identifier for the machine, when the repair started, when the repair ended, the type of repair (a repair or preventative maintenance (PM)) and the cause of the repair.

Step 2: Load into Power BI and Create Calculated Fields

This next step is easy, just fire up Power BI Desktop and from the Home tab of the ribbon, choose Get Data | Excel. Point to the data file you just downloaded (above) and choose either the Sheet (MTBF) or the Table (MTBF1) displayed in the navigation and choose Load.

Next, we are going to create two calculated columns. The first one is how long the repair took to complete. The formula for this is:

Repair Hours = DATEDIFF(Repairs[RepairStarted], Repairs[RepairCompleted],SECOND)/3600

Once you create this column, on the Modeling tab of the ribbon, make sure that the Data type is set to Decimal Number and that you specify at least 2 decimal places.

Obviously, all we are doing is subtracting the time the repair started from the time the repair completed so that we end up with how long the repair took to complete. Perhaps less obvious is the reason we specified SECOND instead of HOUR in our DATEDIFF function even though we want this metric in hours and thus the division by 3600 seconds/hour. The reason is that if we had specified HOUR, then DATEDIFF would have truncated the calculation to whole hours and we would have lost fractions of an hour.

The second calculated column is a bit more complex:

Uptime =

VAR next = MINX(FILTER(Repairs,

Repairs[MachineName]=EARLIER(Repairs[MachineName]) &&

Repairs[RepairStarted]>EARLIER(Repairs[RepairStarted]) &&

Repairs[RepairType]<>"PM"

),Repairs[RepairStarted])

RETURN IF([RepairType]="PM", 0,IF(ISBLANK(next),

DATEDIFF([RepairCompleted],NOW(),SECOND),

DATEDIFF([RepairCompleted],next,SECOND)

)

OK, let me explain what is going on here. Remember back to our definition of MTBF, we need to find the uptime between a machine going into an "up" state and that same machine going into a "down" state. But our data isn't structured that way, the data is focused on collecting repair information.

Thus, the first part of the formula, the VAR portion is the calculation of the variable next. This is about finding the next repair after the current repair. We do this by finding the MIN of the RepairStarted column after filtering our Repair table for machines that are equal to the current machine in the row as well as having a RepairStarted that is after the current repair in the row and we are excluding "PM" rows as those are preventative maintenance and not an actual failure.

The second part of the formula, the RETURN portion is really three pieces. If the current row is a maintenance task (PM), then we return 0 for uptime as we do not want those tasks included in our MTBF calculation. If our calculation for next is BLANK, then we know that this is the most recent failure in our data set. So, we return the difference in seconds between NOW and the completion of the current failure. Finally, if neither of those are the case, then we simply calculate the difference in seconds between when our next repair started and the current repair ended.

Step 3: Create some Measures

OK, now that we have our data loaded and have created some extra calculated columns, we can now create the measures that we will use in our report. Create the following measures:

Repairs = CALCULATE(COUNTROWS(Repairs),FILTER(Repairs,[RepairType]<>"PM"))
MTBF (Hours) = DIVIDE(SUM(Repairs[Uptime]),[Repairs],BLANK())/3600
MDT (Hours) = SUM(Repairs[Repair Hours])/COUNTROWS(Repairs)
Last Repair = MAX([RepairCompleted])
Next Expected Repair = [Last Repair] + [MTBF (Hours)]/24

Repairs: This is the total number of repairs that have occurred excluding preventative maintenance (PM) repairs.

MTBF: This is our MTBF calculation that takes the sum of our uptime in seconds, divides by the number of repairs to get an average and then converts the number to hours by dividing by 3600 seconds/hour.

MDT: This is a measure that is often associated with MTBF, MDT is Mean Down Time or the average amount of time that a repair takes to complete. Therefore, we take the sum of our Repair Hours (which does not exclude preventative maintenance tasks) and divides it by the count of all repairs. Note, we do not use the Repairs measure in this case since we want to include PM repairs as well.

Last Repair: This is simply the date/time of our last repair

Next Expected Repair: By knowing when our last repair occurred, we can add our MTBF calculation converted to days (/24) to determine when we expect our next failure to occur.

Step 4: Build the Report

Now that we have all our measures created, it is mere child's play to create a report similar to the following:

I have left everything default so that the visuals themselves show what columns and measures are being displayed. Even this highly simplified report shows valuable information about each of our machines, the types of failures they experience and because we used measures it is all interactive. For example, if we click on "Worn Component" in the middle bar chart:

We can see that failures caused by worn components occur much less often than the average (a MTBF of 300 versus 199 for all failure types) and they take about half the time to complete (MDT of 1.23 versus 2.19 for all failure types).

Conclusion

Other than one potentially complex DAX calculated column, it is extremely easy and straight-forward to use Power BI to create reports that summarize repair information, including MTBF. And, remember that the definition of "failure" can be subjective and can be creatively applied to other subject domains. This example here really is just based on having a start time and an end time for "something". So, could this be applied to something like HR data and start dates and termination dates? Sure, consider each termination and "failure" and MTBF could shed light on the average time it takes to backfill a position perhaps. The possibilities are endless so start putting this technique into use in your own situations in Power BI!

Greg_Deckler · ‎06-01-2021

@Anonymous I actually have an improved version of this in my book, DAX Cookbook. You can download the PBIX for it here: https://github.com/gdeckler. It is Chapter 9, Recipe 2.

Anonymous · ‎06-02-2021

@Greg_Deckler Thank you so much, but chapter 9, recipe 2 is a sales report. Can you double-check it?

Greg_Deckler · ‎06-02-2021

@Anonymous I double checked. You want this file:

DAXCookbook/Chapter09.pbix at master · gdeckler/DAXCookbook (github.com)

And Recipe 02 page

johnyounie · ‎06-03-2021

Hi @Greg_Decler

Really interesting post, I was a Data Scientist in the RAF and we wrote spares modelling and monte carlo simulations in FORTRAN to monitor the failures of expensive systems (aircraft, ships, vehicles) and to simulate usage of those systems.

We ended up using proprietary software written by a company name TFDG (www.tfdg.com) in the US (full disclosure, I contracted to them once I left the RAF as a programmer) and they have software that will carry these calculations out for systems called VMetric, Edcas and MAAP (and others).

Also, for balance purposes, there is a Swedish Company called Systecon that produce similar software that we also used called Opus which was a powerful Spares Modeling software package.

There is a book by Dr Craig Sherbrooke (of RAND and VariMetric fame Craig C. Sherbrooke | RAND and Craig C Sherbrooke - Home (acm.org)) which is called Optimal Inventory Modeling of Systems ISBN: 0-471-55838-9 Publisher John Wiley and Sons which describes the techniques and the algorithms to produce such a model. I include it here in case any the people who read your post are interested. Craig is an amazing guy and I had the good fortune to meet him a few times, twice on his training courses, he also helped me with my dissertation for my BSc.

Both use the Poisson Distribution (Wikipedia: Poisson expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.) and are classed as Multi-echelon and multi-system.

Hope someone finds this useful and interesting, I could go on but I know when to stop (it is when I hear snoring :-))!
If anyone wants to link up you can find me on

John Younie (BSc(Hons), MCSE, MCAD, MCT, MOS Master) | LinkedIn

Cheers

John

josh123 · ‎12-01-2021

is there any way to see the MTBF in a colum? example

Start time End time MTBF

Machine 1 1/1/20 7:00pm 1/1/20 7:25pm 0

Machine 1 1/1/20 8:00pm 1/1/20 8:28pm Start time-previous end time

Machine 1 1/1/20 9:00pm 1/1/20 9:45pm Start time-previous end time

This is more what i am trying to duplicate. easy in excel but cant figure out how to pull it off in BI.

Greg_Deckler · ‎12-03-2021

@josh123 Yes, it's the same pattern for a column.
The basic pattern is:
Column =
VAR __Current = [Value]
VAR __PreviousDate = MAXX(FILTER('Table','Table'[Date] < EARLIER('Table'[Date])),[Date])
VAR __Previous = MAXX(FILTER('Table',[Date]=__PreviousDate),[Value])
RETURN
__Current - __Previous

dwpayne24 · ‎01-25-2023

@Greg_Deckler I was able to get this to work for the most part but am getting negative values. Would you be able to help resolve this?

Greg_Deckler · ‎01-25-2023

@dwpayne24 Perhaps reverse the parameters in your DATEDIFF function(s)? Hard to say exactly.

vavasudevan · ‎04-19-2023

Hi Greg,

Is there a way we can calculate the availability % similar to https://effectivedashboards.com/course/maintenance-kpi-availability-and-technical-availability-in-po...

Greg_Deckler · ‎04-19-2023

@vavasudevan In theory I imagine something like this:

% Availability Measure =
  VAR __Start = DATE(2010,1,1)
  VAR __End = DATE(2010,3,31)
  VAR __ExpectedUptimeHours = DATEDIFF(__Start, __End, SECOND)/3600
  VAR __DownTimeHours = SUM('Repairs'[Repair Hours])
  VAR __Result = DIVIDE( ( __ExpectedUptimeHours - __DownTimeHours ), __ExpectedUptimeHours )
RETURN
  __Result

vavasudevan · ‎04-19-2023

@Greg_Deckler
I'm relatively new to DAX and after lot of digging I came across your post which was very helpful. Huge thanks to this!

I'm trying to replicate the same MTBF logic with PowerBI Refresh History and I'm stuck on the DAX part in calculating downtime.

Downtime in my case = Start time of first failure TO End time to next successful completion

Ex:

For Report 1: This report failed on 4/10/2023 3:00 to 4/10/2023 4:25 (1hr 25 mins)

For Report 2: This report failed on 4/7/2023 7:09 and failed again, and next successful completion was

4/8/2023 8:29 . Therefore report was down for 25.33 hrs or 1520 minutes.

Any help on how this can downtime be achieved? (I can see I need to use the EARLIER functon in combination with Refersh Status and Type, but I need some hand holding )

I want to calculate the same KPI's -

MTBF = Actual Running Time/# Failures

MDT = Downtime/#Failures

Availability% = (Actual Running time - Scheduled Downtime)/Actual Running Time

Uptime = Total hours in a day (24) - Downtime.

Report Name	Refresh Start	Refresh status	Refresh End	Refresh type
Report 1	4/9/2023 3:00	Completed	4/9/2023 3:15	Scheduled
Report 1	4/9/2023 14:00	Completed	4/9/2023 14:23	Scheduled
Report 1	4/10/2023 3:00	Failed	4/10/2023 3:00	Scheduled
Report 1	4/10/2023 4:14	Completed	4/10/2023 4:25	Scheduled
Report 1	4/10/2023 14:00	Completed	4/10/2023 14:16	Scheduled
Report 1	4/14/2023 3:00	Completed	4/14/2023 3:17	Scheduled
Report 2	3/19/2023 7:09	Completed	3/19/2023 8:13	Scheduled
Report 2	3/20/2023 7:08	Completed	3/20/2023 8:30	Scheduled
Report 2	3/21/2023 7:09	Failed	3/21/2023 8:36	Scheduled
Report 2	3/22/2023 7:08	Completed	3/22/2023 8:28	Scheduled
Report 2	3/23/2023 7:08	Completed	3/23/2023 7:58	Scheduled
Report 2	3/24/2023 7:08	Completed	3/24/2023 8:49	Scheduled
Report 2	3/25/2023 7:08	Failed	3/25/2023 8:07	Scheduled
Report 2	3/26/2023 7:08	Completed	3/26/2023 8:11	Scheduled
Report 2	3/27/2023 7:08	Failed	3/27/2023 8:35	Scheduled
Report 2	3/28/2023 7:08	Completed	3/28/2023 8:58	Scheduled
Report 2	3/29/2023 7:08	Completed	3/29/2023 7:56	Scheduled
Report 2	3/30/2023 7:08	Completed	3/30/2023 9:30	Scheduled
Report 2	3/31/2023 7:08	Completed	3/31/2023 8:34	Scheduled
Report 2	4/1/2023 7:08	Completed	4/1/2023 8:47	Scheduled
Report 2	4/2/2023 7:08	Completed	4/2/2023 7:50	Scheduled
Report 2	4/3/2023 7:08	Completed	4/3/2023 11:26	Scheduled
Report 2	4/4/2023 7:08	Failed	4/4/2023 13:01	Scheduled
Report 2	4/5/2023 6:15	Completed	4/5/2023 7:13	Scheduled
Report 2	4/5/2023 7:08	Completed	4/5/2023 8:18	Scheduled
Report 2	4/6/2023 7:08	Completed	4/6/2023 11:04	Scheduled
Report 2	4/7/2023 7:09	Failed	4/7/2023 7:40	Scheduled
Report 2	4/7/2023 20:19	Failed	4/7/2023 21:14	Scheduled
Report 2	4/8/2023 7:07	Completed	4/8/2023 8:29	Scheduled
Report 2	4/9/2023 7:07	Failed	4/9/2023 7:40	Scheduled
Report 2	4/10/2023 7:07	Failed	4/10/2023 7:54	Scheduled
Report 2	4/11/2023 7:08	Failed	4/11/2023 8:04	Scheduled
Report 2	4/11/2023 20:29	Completed	4/11/2023 21:50	Scheduled
Report 2	4/12/2023 7:08	Failed	4/12/2023 8:09	Scheduled
Report 2	4/12/2023 15:47	Failed	4/12/2023 16:20	OnDemand
Report 2	4/12/2023 17:28	Failed	4/12/2023 17:28	OnDemand
Report 2	4/12/2023 17:32	Failed	4/12/2023 17:32	OnDemand
Report 2	4/12/2023 17:33	Failed	4/12/2023 17:33	OnDemand
Report 2	4/12/2023 17:33	Failed	4/12/2023 17:33	OnDemand
Report 2	4/12/2023 17:34	Failed	4/12/2023 17:34	OnDemand
Report 2	4/12/2023 17:47	Failed	4/12/2023 17:47	OnDemand
Report 2	4/12/2023 17:47	Failed	4/12/2023 17:47	OnDemand
Report 2	4/12/2023 18:37	Failed	4/12/2023 18:37	OnDemand
Report 2	4/12/2023 20:04	Failed	4/12/2023 20:07	Scheduled
Report 2	4/13/2023 8:03	Failed	4/13/2023 8:06	Scheduled
Report 2	4/13/2023 20:03	Disabled	4/13/2023 20:03	Scheduled

Greg_Deckler · ‎04-19-2023

@vavasudevan Bit trickier of a dataset but see this is what you are looking for. These are columns:

Downtime Minutes = 
    VAR __Report = [Report Name]
    VAR __Table = FILTER(ALL('Table'), [Report Name] = __Report)
    VAR __Result = 
        IF(
            [Refresh status] = "Completed" || [Refresh status] = "Disabled", 
            0,
                VAR __Start = [Refresh Start]
                VAR __NexCompleted = MINX(FILTER( __Table, [Refresh Start] > __Start && [Refresh status] = "Completed"), [Refresh End])
                VAR __End = IF( __NexCompleted = BLANK(), MINX(FILTER(__Table, [Refresh Start] > __Start), [Refresh End]), __NexCompleted)
                VAR __Result = DATEDIFF(__Start, __End, MINUTE)
            RETURN
                __Result
        )
RETURN
    __Result

Uptime Minutes = 
    VAR __Report = [Report Name]
    VAR __Table = FILTER(ALL('Table'), [Report Name] = __Report)
    VAR __Result = 
    IF(
        [Refresh status] = "Failed" || [Refresh status] = "Disabled",
        0,
            VAR __End = [Refresh End]
            VAR __Last = MAXX(FILTER( __Table, [Refresh End] < __End), [Refresh End])
            VAR __LastStatus = MAXX(FILTER( __Table, [Refresh End] < __End && [Refresh End] = __Last), [Refresh status])
            VAR __Result = 
                IF(
                    __Last = BLANK() || __LastStatus = "Failed" || __LastStatus = BLANK(),
                    0,
                    DATEDIFF(__Last, __End, MINUTE)
                )
        RETURN
            __Result
    )
RETURN
    __Result

These blog article replies don't allow me to attach a PBIX.

G05DVD · ‎02-15-2024

Thanks for sharing this, still as good as when first posted!

ddvls · ‎02-21-2024

Hello everyone
Thank you for this very interesting article.As my data is extracted from a software I can not add column in Excel because there is no Excel. I can only do it in Power Query but the formulas given in the article don't work....
For the average downtime I found (easily) but not for MTBF....

Does anyone know how I can do this ? Thank you for your help 🙂

Introduction

Background

Let's Build It!

Step 1: Prepare Your Data

Step 2: Load into Power BI and Create Calculated Fields

Step 3: Create some Measures

Step 4: Build the Report

Conclusion

Unveiling the Power of Lakehouse's SQL Analytics E...

How to provide sample data in the Power BI Forum

Re: Power BI Tip: How To Sort A Column With Duplic...

Enhanced Data Profiling in Power Query: GUI and Ta...

Important Links every Power BI enthusiastic shoul...