How long a machine runs before it lets you down. Higher is better.
Mean time between failures (MTBF) is one of the two reliability metrics every TPM program tracks, and the one that most directly answers the question "how dependable is this machine?" Reliability is not the same as quality and not the same as productivity; it is specifically about how long the machine will keep running before something unexpected stops it. A shop that knows MTBF by machine can plan staffing, set realistic delivery promises, and target the right equipment for upgrades.
"Reliability is the difference between a quote you can keep and a quote that depends on luck."
The calculation is simple arithmetic, but the inputs require discipline. MTBF is total run time divided by the number of failures in the same period. Run time is the time the machine was actually in production. It excludes scheduled meetings, lunches, planned maintenance, and changeovers. Failures are unplanned stops that took the machine out of production until someone repaired it. A jam the operator cleared in 90 seconds may or may not count, depending on how the shop defines failure; many programs set a threshold of 10 or 15 minutes to filter out the noise. Whichever rule is chosen, it has to be applied consistently across machines and time periods, or the comparisons mean nothing.
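The arithmetic and the threshold rule can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Stop` record, the `mtbf_hours` function, and the `min_failure_minutes` parameter are hypothetical names, the 10-minute default is one of the thresholds mentioned above, and the stop log is made-up data.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    """One unplanned stop on a machine (hypothetical record layout)."""
    machine: str
    minutes_down: float


def mtbf_hours(run_hours: float, stops: list[Stop],
               min_failure_minutes: float = 10.0) -> float:
    """Run time divided by the number of stops long enough to count as failures."""
    failures = [s for s in stops if s.minutes_down >= min_failure_minutes]
    if not failures:
        # With no qualifying failures the ratio is undefined for this period.
        raise ValueError("no failures recorded; MTBF is undefined for this period")
    return run_hours / len(failures)


# Illustrative stop log for one machine over 480 run hours (not real data).
stops = [
    Stop("mill-1", 1.5),    # 90-second jam, below the threshold
    Stop("mill-1", 45.0),
    Stop("mill-1", 12.0),
    Stop("mill-1", 120.0),
    Stop("mill-1", 8.0),    # below the threshold
]

print(mtbf_hours(480.0, stops))                           # 160.0 (3 failures count)
print(mtbf_hours(480.0, stops, min_failure_minutes=1.0))  # 96.0 (all 5 stops count)
```

The same log yields 160 hours or 96 hours depending only on the threshold, which is why the rule has to stay fixed before machines or periods are compared.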
MTBF combines with mean time to repair (MTTR) to produce availability. The relationship: availability equals MTBF divided by the sum of MTBF and MTTR. A machine with a 200-hour MTBF and a four-hour MTTR has availability of 200 over 204, or 98 percent. Drop MTBF to 50 hours with the same MTTR, and availability falls to about 92.6 percent. Drop MTTR to one hour at the original MTBF, and availability rises to 99.5 percent. Improving either lever helps; understanding which lever moves more for your shop is the diagnostic question.
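The three scenarios above can be checked with a short Python sketch. The function name `availability` and the scenario labels are illustrative choices, not part of any standard library or tool.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), returned as a fraction from 0 to 1."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# (MTBF hours, MTTR hours) for the three cases described in the text.
scenarios = {
    "baseline": (200, 4),
    "lower MTBF": (50, 4),
    "lower MTTR": (200, 1),
}

for name, (mtbf, mttr) in scenarios.items():
    print(f"{name}: {availability(mtbf, mttr):.1%}")
# baseline: 98.0%
# lower MTBF: 92.6%
# lower MTTR: 99.5%
```

Running a shop's own MTBF and MTTR through the same function, then varying one input at a time, is a quick way to see which lever moves availability more.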
The lever to pull depends on the failure pattern. If failures are concentrated on a few wear parts, preventive maintenance on those parts will raise MTBF. If failures are random and varied, predictive maintenance or autonomous maintenance inspections by the operator are usually a better bet.
Picture a 25-person fab shop with two CNC mills, three press brakes, and a laser cutter. The owner has been told the laser is the most reliable machine. MTBF tracking for a quarter tells a different story. The laser has an MTBF of 220 hours and an MTTR of three hours. One of the mills has an MTBF of 95 hours and an MTTR of 45 minutes. Availability is similar, about 98.7 percent for the laser and 99.2 percent for the mill. The laser is not simply the more reliable machine; it breaks less often but takes four times as long to recover when it does.
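One way to see the difference between the two machines is to normalize both to the same amount of run time. The sketch below uses the MTBF and MTTR figures from the example; the function name `per_1000_run_hours` and the 1,000-hour basis are arbitrary choices made for illustration.

```python
def per_1000_run_hours(mtbf_hours: float, mttr_hours: float) -> tuple[float, float]:
    """Expected failure count and repair downtime per 1,000 hours of run time."""
    failures = 1000.0 / mtbf_hours
    downtime_hours = failures * mttr_hours
    return failures, downtime_hours


# (MTBF hours, MTTR hours) taken from the fab shop example.
machines = {"laser": (220.0, 3.0), "mill": (95.0, 0.75)}

for name, (mtbf, mttr) in machines.items():
    failures, downtime = per_1000_run_hours(mtbf, mttr)
    print(f"{name}: {failures:.1f} failures, {downtime:.1f} h down per 1,000 run hours")
# laser: 4.5 failures, 13.6 h down per 1,000 run hours
# mill: 10.5 failures, 7.9 h down per 1,000 run hours
```

The mill stops more than twice as often, yet the laser loses more hours, because each of its repairs takes so much longer.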
That data changes the conversation. The laser's MTTR problem points to parts staging and procedure: the failures are usually the same two or three modules, but the parts are not on the shelf and the procedure is in someone's head. The mill's MTBF problem points to a single chronic issue with the way oil is delivered to one axis. Both fixes are now visible. Without MTBF and MTTR broken out, the shop would have spent capital replacing the laser and never touched the mill.
MTBF pairs with mean time to repair to compute availability, the A in OEE. Raising MTBF is the goal of preventive maintenance and of any sensor-based program for predicting failures. Each failure event also adds to total downtime, which is the raw input that connects reliability data back to lost production time.