Find the right timing to perform preventive maintenance work


By Anthony M. Smith and Glenn R. Hinchcliffe, May 1, 2006

Selection of the correct interval to perform a preventive maintenance task is, by far, the most difficult job confronting the maintenance technician and analyst. We need to understand how physical processes and materials change over time, and how those changes ultimately lead to what we call failure modes. Understanding how failure rates vary as a function of time is essential, and to tackle the problem we enter the world of statistical analysis.

The task selection process should establish at the outset whether we know the age-reliability relationship for the specific failure mode in question. If we do, then we also have the information needed to select the interval for a time-directed (TD) task. That is, we have the failure density function (fdf) for the failure mode population, and we can select the task interval from that statistical knowledge by deciding on the level of consumer risk that we want to accept.

Suppose, for example, that the fdf looks like a bell-shaped curve where the x-axis is operating time and the y-axis is probability of failure. The left-hand tail may be quite long, thus signifying an extended period of time during which the probability of failure is quite small and, for all practical purposes, the item is in a constant failure rate condition.

However, as we proceed to the right, or as we see the probability of failure beginning to increase as additional operating time is accumulated, we can decide how far we want to proceed before doing the TD task. And this is where the level of consumer risk comes into play. We can pick that level of risk by selecting the percentage of area under the fdf that we can tolerate before taking action.

Say we choose 15%. This means that there is a 15% chance that the failure mode could occur before we take the preventive actions. We can choose any percentage value, but decreasing risk leads to more frequent PM actions and higher PM costs.
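The selection described above amounts to reading a percentile off the fdf. The following is a minimal sketch, assuming purely for illustration that the fdf is a normal distribution; the function name `td_task_interval` and the mean and standard deviation used are hypothetical, not values from any real equipment population.

```python
from statistics import NormalDist


def td_task_interval(mean_life_hours, std_dev_hours, accepted_risk):
    """Operating time by which a fraction `accepted_risk` of items has failed.

    Assumption (illustration only): the failure mode's fdf is a normal
    distribution with the given mean and standard deviation.
    """
    if not 0.0 < accepted_risk < 1.0:
        raise ValueError("accepted_risk must be strictly between 0 and 1")
    fdf = NormalDist(mu=mean_life_hours, sigma=std_dev_hours)
    # inv_cdf returns the time t at which the area under the fdf to the
    # left of t equals the accepted risk.
    return fdf.inv_cdf(accepted_risk)


# Hypothetical population: mean life 10,000 h, standard deviation 2,000 h.
interval_15 = td_task_interval(10_000, 2_000, 0.15)
interval_05 = td_task_interval(10_000, 2_000, 0.05)
print(f"15% risk: {interval_15:.0f} h")
print(f" 5% risk: {interval_05:.0f} h")
```

Under these assumed numbers, accepting less risk moves the task interval to the left, which is the trade-off between risk and PM frequency noted above.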

Notice that if we use the mean, or mean time between failures (MTBF), for the bell-shaped fdf, there is a 50% chance of failure before we take preventive actions. For other fdfs, the chance of failure can be as large as 67% when the mean is used. This is not an acceptable level of risk in most circumstances. Hence, using an MTBF value is not a valid or useful technique for selecting task intervals.
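The effect of choosing the mean as the interval can be checked numerically. The sketch below assumes two illustrative distributions, a normal fdf for the bell-shaped case and a Weibull fdf for the others; the helper name `failure_prob_at_mean_weibull` and the parameters are hypothetical, and the exact figure for any real failure mode depends on the shape of its fdf.

```python
import math
from statistics import NormalDist


def failure_prob_at_mean_weibull(shape):
    """P(failure before the mean life) for a Weibull fdf with this shape.

    The scale parameter cancels out: mean = scale * gamma(1 + 1/shape),
    and F(t) = 1 - exp(-(t / scale) ** shape).
    """
    if shape <= 0:
        raise ValueError("shape must be positive")
    mean_over_scale = math.gamma(1.0 + 1.0 / shape)
    return 1.0 - math.exp(-(mean_over_scale ** shape))


# Symmetric bell-shaped fdf: half the area lies to the left of the mean.
bell = NormalDist(mu=10_000, sigma=2_000)  # hypothetical parameters
p_bell = bell.cdf(bell.mean)

# A Weibull with shape 1 is the exponential (constant failure rate) case.
p_exponential = failure_prob_at_mean_weibull(1.0)

print(f"bell-shaped: {p_bell:.3f}")
print(f"exponential: {p_exponential:.3f}")
```

For the exponential case the result is about 63%, which shows how a skewed fdf pushes the risk at the mean well above the 50% of the symmetric case.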

The foregoing discussion has briefly outlined the ideal situation for selecting task intervals. This ideal is not often encountered, because we usually do not have sufficient data from operating experience to define the fdf. So let’s discuss what we can do in the non-ideal situations more commonly encountered.

The first situation is one wherein we have partial knowledge of the age-reliability relationship. This means that the failure cause information in the failure mode and effects analysis (FMEA) leads us to conclude that aging or wearout mechanisms are at play. Or perhaps we have some operating experience to support the conclusion that aging/wearout mechanisms exist. But, in either case, we do not have any statistical data to define when wearout would be expected to occur.

So we tend to use our experience to guess at a task interval for the TD actions. There is overwhelming evidence that this process is highly conservative. That is, we tend to pick intervals that are far too short. We might overhaul a large electric motor every three years when, in reality, the correct interval turns out to be 10 years.

The second situation is one in which we have no idea what the age-reliability relationship might be, and we are now moving on to look for candidate condition-directed (CD) tasks. If the failure mode is hidden, we also extend our search to include candidate failure-finding (FF) tasks. These tasks, too, must have intervals specified for the non-intrusive data acquisition and inspection actions that must be accomplished. And, here again, the statistical basis for specifying these intervals is usually missing, so we guess at what they will be, usually with great conservatism. So Age Exploration, described next, is useful with CD and FF tasks as well as with TD tasks.

When good statistical data is not available, using our experience to guess at task intervals is really the only option that is available to us initially. But there is a proven technique that we can employ to refine that “guesstimate” over time, and to predict more accurately the correct task interval. It is called Age Exploration, or AE. The AE technique is strictly empirical, and works like this (using a TD task for illustrative purposes).

Say our initial overhaul interval for a fan motor is 3 years. When we do the first overhaul, we meticulously inspect and record the as-found condition of the motor and all of its parts and assemblies where aging and wearout are thought to be possible. If our inspection reveals no such wearout or aging signs, when the next fan motor comes due for overhaul we automatically increase the interval by 10% (or more), and repeat the process, continuing until, on one of the overhauls, we see the incipient signs of wearout or aging. At this point, we stop the AE process, perhaps back off by 10%, and define this as our final task interval.
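The AE procedure just described can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the function name `age_explore`, the `wearout_found` callable standing in for the as-found inspection, and the cap on the number of overhauls are all assumptions made for the example.

```python
def age_explore(initial_interval, wearout_found, step=0.10, back_off=0.10,
                max_overhauls=50):
    """Extend an overhaul interval until an inspection shows incipient wearout.

    `wearout_found(interval)` is a hypothetical callable standing in for the
    as-found inspection: it returns True if signs of aging were recorded at
    an overhaul performed after `interval` units of operation.
    Returns (final_interval, list of intervals actually tried).
    """
    if max_overhauls < 1:
        raise ValueError("max_overhauls must be at least 1")
    interval = initial_interval
    history = []
    for _ in range(max_overhauls):
        history.append(interval)
        if wearout_found(interval):
            # Stop exploring and back off from the interval that showed wearout.
            return interval * (1.0 - back_off), history
        # No aging signs: lengthen the interval for the next unit that comes due.
        interval *= 1.0 + step
    # Wearout never observed within max_overhauls: report the last interval tried.
    return history[-1], history


# Hypothetical fan motor whose wearout first becomes visible at 4.5 years.
final_interval, tried = age_explore(3.0, lambda years: years >= 4.5)
print(f"intervals tried: {[round(t, 2) for t in tried]}")
print(f"final interval: {final_interval:.2f} years")
```

In practice the inspection result comes from recorded as-found condition data rather than a known threshold, and each step takes a full overhaul cycle, so the exploration unfolds over years rather than in a single run.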

Figure 1 illustrates how this AE process was successfully used by United Airlines for one of their hydraulic pumps. On the top half of Figure 1, we see that the overhaul interval started at about 6,000 hours, and that the AE process was then employed over a four-year period to extend the interval to 14,000 hours.

The bottom half of Figure 1 presents a second, very interesting statistic for the same population of pumps over the same four-year period: the premature removal rate, or the rate at which corrective maintenance actions were required. The interesting point is that the premature removal rate decreased steadily over the four years in which the overhaul interval was increasing. We interpret this to suggest that as the amount of human handling and intrusive overhaul maintenance decreased, so did the human error resulting from such actions, with the net effect that corrective maintenance actions likewise decreased.

Printed with permission from Butterworth-Heinemann, a division of Elsevier, from RCM—Gateway to World Class Maintenance, by Anthony M. Smith, AMS Associates Inc. in California, and Glenn R. Hinchcliffe, Consulting Professional Engineer, G&S Associates Inc. in North Carolina. Copyright 2004. For more information about this title and similar titles, please visit www.books.elsevier.com .