A better approach to failure prevention

By Rajendra Doshi August 1, 1999

The traditional maintenance organization is often regarded unfavorably because the need for it has been unpredictable. Its role is to fix problems after equipment fails. When equipment fails, it demands immediate attention, upsets schedules, causes unexpected budget drains, and induces mental stress in personnel. These effects stem from the unpredictability.

Furthermore, pressure to minimize production downtime often results in temporary fixes, and permanent fixes tend to slip in priority. One of the goals of true preventive maintenance is to minimize temporary fixes.

Getting to the root

A maintenance program has value when effort is spent to find the root cause of each problem, rather than just trying to correct the apparent fault where it is visible.

To illustrate: Suppose a bearing on a wheel driving a conveyor is freezing up. The electric motor driving the wheel becomes overloaded. The result of the overloaded motor is either a blown fuse or a burned up motor.

Replacing the fuse or the motor may seem to solve the problem temporarily, but the fault occurs again until the root cause is detected and the situation is corrected. Though it looks to be an electrical fault (fuse blown or motor damaged), in reality it is a mechanical fault.
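The conveyor example can be sketched as a short cause chain. This is a minimal illustration, not a diagnostic tool: the `CAUSED_BY` mapping and the `trace_root_cause` function are hypothetical names, and the chain encodes only the bearing, motor, and fuse relationships described above.

```python
# Hypothetical cause map for the conveyor example: each effect points to
# the condition that produced it. Entries mirror the text, nothing more.
CAUSED_BY = {
    "blown fuse": "overloaded motor",
    "burned-up motor": "overloaded motor",
    "overloaded motor": "bearing freezing up",
}


def trace_root_cause(symptom, caused_by=CAUSED_BY):
    """Follow the chain from a visible symptom back to its root cause."""
    chain = [symptom]
    while chain[-1] in caused_by:
        cause = caused_by[chain[-1]]
        if cause in chain:  # guard against a circular cause map
            raise ValueError(f"circular cause chain at {cause!r}")
        chain.append(cause)
    return chain


chain = trace_root_cause("blown fuse")
print(" <- ".join(chain))
```

Replacing the fuse addresses only the first link in the chain. The fault recurs until the last link, the mechanical one, is corrected.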

The common approach

A person usually uses a procedure similar to the following to analyze a failure (Fig. 1).

1. Gather data about the failure. In trying to solve the problem, one may have to go through thousands of possibilities to determine a set of circumstances that yields the same result as that of the failure. Many possibilities are eliminated during the process, based on past experience, judgment, maintenance manuals, design data, expert opinions, and any other available inputs.

2. Form a theory regarding the probable cause.

3. Formulate and carry out tests to verify the theory.

4. If the tests fail to confirm the theory, modify the theory and repeat the tests.

5. If the tests confirm the theory, then the problem is presumed to have been solved.

When tests do not confirm a theory, the human tendency is to “adjust” the facts to suit the theory. Usually, no efforts are made to learn from failed theories. Depending on the seriousness of the results, the tests may be repeated. This procedure may or may not solve the real problem.

In addition, effort is seldom made to isolate the “once in a while” successful test. Apart from known situations created during testing, other situations that contribute to successful test outcomes may be created unknowingly. When the same tests are attempted again, they may yield different results due to changes in the unknown situations.

There is always pressure to perform a quick repair in order to minimize downtime. The entire haste-driven process is error prone. Each error can throw an investigation off track, resulting in inaccurate theories and erroneous test results. Then, repeated inconclusive testing can result in loss of focus, finger pointing, and fault finding. The ultimate result is wasted time and money.

An alternate approach

The common approach to problem solving can be improved by adding some intermediate steps (Fig. 2).

1. Gather data about the failure. Do not critique the data yet.

2. Interview all personnel who have firsthand knowledge regarding the failure. Find out what they heard, smelled, or saw. Ask about anything unusual, any warning signs. Do not critique the data yet.

3. Separate facts from theories.

4. Form a set of questions to clarify conflicting and/or insufficient data.

5. Analyze the data to determine their relevancy to the event.

6. Arrange the data in time or event sequence.

7. Form a theory incorporating as much data as possible.

8. Evaluate the theory with reference to each data item.

9. Modify the theory, if needed. Caution: The assumptions forming the theory must be based on facts only. Past experience may or may not be relevant. (Due to the complexity of systems, two identical-looking failures may have different causes.) The facts must be included in the theory as they are. Do not modify them to suit the theory. Modify the theory to include the facts as they are. This step is very important. The theory must explain all the major facts.

10. Devise tests to verify the validity of the theory. Can you duplicate the field event in the lab? If so, you may be on the right track.

11. If the tests do not confirm the theory, verify the test procedures and/or the theory, modify them if needed, and then retest. Confirm their validity with all the data. Modify the theory until it explains most of the data and the test results. Retest the theory. Repeat the procedure until the theory is confirmed.
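The data-handling steps above (3, 5, 6, and 8) can be sketched in a few lines. This is a rough illustration under stated assumptions, not a prescribed tool: the `Observation` record, the two helper functions, and every sample observation are invented for the example, loosely following the conveyor case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One data item gathered in steps 1 and 2 (all fields are illustrative)."""
    minute: int            # when it occurred, in minutes from an arbitrary start
    text: str
    is_fact: bool          # step 3: verified fact, or someone's theory
    relevant: bool = True  # step 5: judged relevant to the event


def prepare_timeline(observations):
    """Steps 3, 5, and 6: keep relevant facts only, ordered by time."""
    facts = [o for o in observations if o.is_fact and o.relevant]
    return sorted(facts, key=lambda o: o.minute)


def unexplained_facts(timeline, explained_texts):
    """Step 8: list the facts a candidate theory fails to account for."""
    return [o for o in timeline if o.text not in explained_texts]


# Invented sample data for illustration only.
observations = [
    Observation(30, "fuse blew", is_fact=True),
    Observation(0, "squealing noise near drive wheel", is_fact=True),
    Observation(20, "motor housing hot to the touch", is_fact=True),
    Observation(25, "probably a bad fuse batch", is_fact=False),
]

timeline = prepare_timeline(observations)

# Candidate theory "electrical fault" accounts for the fuse only.
electrical = {"fuse blew"}
# Revised theory "seized bearing overloads the motor" covers every fact.
mechanical = {"fuse blew", "squealing noise near drive wheel",
              "motor housing hot to the touch"}

for name, explained in [("electrical", electrical), ("mechanical", mechanical)]:
    gaps = unexplained_facts(timeline, explained)
    print(name, "theory leaves", len(gaps), "fact(s) unexplained")
```

The facts are never edited to fit a theory. A theory that leaves facts unexplained is the thing that gets revised, as step 9 requires.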

This method may appear time consuming, but that is not the case. (As with any new procedure, there is a learning curve.) Experience has shown that overall time spent in this systematic way is about 70% of the time required for the usual procedure of trial and error.

This procedure helps to eliminate faulty assumptions and the “once in a while” result. It also extracts education from each failed theory. “Doing it right the first time” saves not only time and money, but also frustrations and loss of focus due to repeated failure to solve a problem. The end product is a good picture of what really happened and how. The root cause of the problem is identified.

In addition to finding the root cause of the problem, attention is directed to improvements that can reduce or prevent future occurrences. Another advantage of this method is that similar occurrences are detected before they happen, and they can be corrected before they become a major issue.

Can a computer be used in this process? The answer is yes and no. The computer is an excellent storage device, and allows data to be accessed quickly. The computer is reliable and accurate in recalling stored data, but it is a machine and has no independent decision making capabilities. Therefore, the answer is that the computer can be used in this process if it is utilized for storing and recalling relevant history.
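As a sketch of that limited role, the snippet below stores closed investigations and recalls them by equipment and symptom. The `history` store, the function names, and the equipment ID are hypothetical and chosen only for illustration. A real plant would likely use its maintenance management system instead.

```python
from collections import defaultdict

# Hypothetical in-memory history store: equipment ID -> list of past records.
history = defaultdict(list)


def record_failure(equipment_id, symptom, root_cause):
    """Store one closed investigation so it can be recalled later."""
    history[equipment_id].append({"symptom": symptom, "root_cause": root_cause})


def recall(equipment_id, symptom):
    """Return past root causes recorded for the same equipment and symptom."""
    records = history.get(equipment_id, [])
    return [r["root_cause"] for r in records if r["symptom"] == symptom]


record_failure("conveyor-1", "blown fuse", "bearing freezing up")
print(recall("conveyor-1", "blown fuse"))
print(recall("conveyor-1", "belt slipping"))
```

Recalled causes are leads to check against the current facts, not conclusions. The decision remains with the investigator.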

Though the method was developed to improve maintenance, it has also been used in nontechnical areas with equal success. That is not surprising, because the method is based on scientific principles. It is the same procedure taught in school for applying the scientific method. If this method is followed rigorously, it can lead to the root cause of a problem in the shortest possible time, regardless of your area of expertise.

Key concepts

A focus on finding and solving problems reduces maintenance burdens.

Problem solving often depends more on the approach taken than on technical expertise.

What you don’t need to be

An important question: Are you supposed to be knowledgeable in all aspects of engineering to take advantage of this method? The answer is no. You do not need to fill your mind with all available knowledge.

What you do need is an open mind and determination to find the root cause of the problem. You must have good analytic capabilities and be objective in evaluating data. It is important to find what happened, not who caused it. For specific knowledge, plenty of aids, such as books and technical journals, are available. If you are interested in finding a long-term solution to a problem, then your focus must be to get to the root cause.

What you do need to be

You must be open-minded, unbiased, decisive, focused, and bold enough to tackle problems in fields outside your training. Your only objective is to find the root cause of the problem.

You must accept your own responsibilities and must not try to find excuses for the failure. You must treat each misstep as a learning experience and be able to correct your own mistakes. Remember, your own failures are an integral part of the learning curve. Therefore, if you do not fail, you may not be trying hard enough.

Five misconceptions about maintenance

1. Equipment needs maintenance after it is put in service. In reality, the need for maintenance attention begins before equipment is put into service. Indeed, the need for maintenance arises before equipment is even built, while it is still in component stages.

2. Design data can be relied upon for maintenance information. Factors such as design parameters, MTBF, reliability, and maintainability can lose much of their significance after equipment is built, because they depend on several idealistic assumptions. Once equipment is built, its environment determines whether that equipment will fail before it is put into operation, even if it is “packed.”

Failures after start of an operation are frequently due to abuse. Abuse is the harshest reality equipment in operation has to face. Therefore, it is not extraordinary to find equipment with an estimated MTBF of 1 million hr failing after only 100 hr of operation.

3. A machine’s maintenance manual is extremely important. Yes, the manual typically offers maintenance and troubleshooting information. But here are some reasons why a manual may be of limited usefulness. (1) The manual likely assumes ideal operating conditions for the equipment, which are not possible in the real world. (2) Because the manual is a support document, cost is a major factor in its preparation. Therefore, the manual is not likely to cover each and every combination of component failures, yet it is not unusual for an equipment failure to stem from more than one component failure. (3) The manual is commonly developed during early stages of design. Changes made later in the design of the equipment may not be included in the manual, making it frustrating to use.

4. “Experts” are often needed to solve equipment failure problems. They can offer a hefty amount of knowledge and provide good assistance, but their experience can also tend to make them inflexible and biased. Closed-mindedness may become a subtle, but serious, obstruction to solving the problem. Sometimes a newcomer solves a problem while the “experts” are still struggling.

5. Preventive maintenance prevents breakdowns and hence the need for emergency maintenance. In fact, preventive maintenance does not and cannot prevent breakdowns. If done properly, it does minimize breakdowns. As mentioned earlier, failure of equipment depends on realities such as the number of components, environment, and abuse. None of them is affected by preventive maintenance.
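To put the MTBF example from misconception 2 in perspective, the short calculation below estimates how likely a failure within the first 100 hr would be if the design figure held. It assumes a constant failure rate (the exponential model), which is a common simplification and not something stated in this article. `failure_probability` is a hypothetical helper name.

```python
import math


def failure_probability(hours, mtbf_hours):
    """P(failure by `hours`) assuming a constant failure rate of 1/MTBF."""
    # expm1 keeps precision when hours / mtbf_hours is very small.
    return -math.expm1(-hours / mtbf_hours)


p = failure_probability(100, 1_000_000)
print(f"{p:.4%}")
```

Under that assumption, the design data would predict roughly a 1-in-10,000 chance of failing that early. A failure at 100 hr therefore points to environment or abuse rather than to the design figure.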

More info

The author will answer questions concerning this article. Mr. Doshi can be contacted at 770-931-7580 or shuddhatma@iname.com.

See the “Maintenance” channel on www.plantengineering.com for more information related to this topic.