A better approach to failure prevention

The traditional maintenance organization is often regarded unfavorably because the need for it has been unpredictable. Its role is to fix problems after equipment fails. When equipment fails, it demands immediate attention, upsets schedules, causes unexpected budget drains, and induces mental stress in personnel. These effects stem from the unpredictability.

Furthermore, pressure to minimize production downtime often results in temporary fixes. Fixing problems permanently tends to slip to minor importance. One of the goals of true preventive maintenance is to minimize temporary fixes.

Getting to the root

A maintenance program has value when effort is spent to find the root cause of each problem, rather than just trying to correct the apparent fault where it is visible.

To illustrate, suppose a bearing on a wheel driving a conveyor is freezing up. The electric motor driving the wheel becomes overloaded, and the result is either a blown fuse or a burned-up motor.

Replacing the fuse or the motor may seem to solve the problem temporarily, but the fault occurs again until the root cause is detected and the situation is corrected. Though it looks to be an electrical fault (fuse blown or motor damaged), in reality it is a mechanical fault.
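
The chain in this example can be written down explicitly. The following is only an illustrative sketch in Python. The cause map and the function name trace_root_cause are hypothetical, and a real investigation has to establish each link from evidence rather than look it up.

```python
# Hypothetical cause map for the conveyor example: each symptom points to
# the condition that produced it.
CAUSED_BY = {
    "blown fuse": "overloaded motor",
    "burned-up motor": "overloaded motor",
    "overloaded motor": "bearing freezing up",
}


def trace_root_cause(symptom, caused_by):
    """Follow the recorded causes until no deeper cause is known."""
    chain = [symptom]
    while chain[-1] in caused_by:
        deeper = caused_by[chain[-1]]
        if deeper in chain:  # guard against a circular cause map
            raise ValueError(f"circular cause chain at {deeper!r}")
        chain.append(deeper)
    return chain


chain = trace_root_cause("blown fuse", CAUSED_BY)
print(" <- ".join(chain))
```

Stopping at the fuse or the motor means stopping partway along the chain. The repair lasts only when the last entry, the mechanical fault, is corrected.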

The common approach

Most people use a procedure similar to the following to analyze a failure (Fig. 1).

1. Gather data about the failure. In trying to solve the problem, one may have to go through thousands of possibilities to determine a set of circumstances that yields the same result as that of the failure. Many possibilities are eliminated during the process, based on past experience, judgment, maintenance manuals, design data, expert opinions, and any other available inputs.

2. Form a theory regarding the probable cause.

3. Formulate and carry out tests to verify the theory.

4. If the tests fail to confirm the theory, modify the theory and repeat the tests.

5. If the tests confirm the theory, then the problem is presumed to have been solved.

When tests do not confirm a theory, the human tendency is to "adjust" the facts to suit the theory. Usually, no efforts are made to learn from failed theories. Depending on the seriousness of the results, the tests may be repeated. This procedure may or may not solve the real problem.

In addition, effort is seldom made to isolate the "once in a while" successful test. Apart from known situations created during testing, other situations that contribute to successful test outcomes may be created unknowingly. When the same tests are attempted again, they may yield different results due to changes in the unknown situations.

There is always pressure to perform a quick repair in order to minimize downtime. The entire haste-driven process is error prone. Each error can throw an investigation off track, resulting in inaccurate theories and erroneous test results. Then, repeated inconclusive testing can result in loss of focus, finger pointing, and fault finding. The ultimate result is wasted time and money.

An alternate approach

The common approach to problem solving can be improved by adding some intermediate steps (Fig. 2).

1. Gather data about the failure. Do not critique the data yet.

2. Interview all personnel who have firsthand knowledge regarding the failure. Find out what they heard, smelled, or saw. Ask about anything unusual, any warning signs. Do not critique the data yet.

3. Separate facts from theories.

4. Form a set of questions to clarify conflicting and/or insufficient data.

5. Analyze the data to determine their relevancy to the event.

6. Arrange the data in time or event sequence.

7. Form a theory incorporating as much data as possible.

8. Evaluate the theory with reference to each data item.

9. Modify the theory, if needed. Caution: The assumptions forming the theory must be based on facts only. Past experience may or may not be relevant. (Due to the complexity of systems, two identical-looking failures may have different causes.) The facts must be included in the theory as they are. Do not modify them to suit the theory. Modify the theory to include the facts as they are. This step is very important. The theory must explain all the major facts.

10. Devise tests to verify the validity of the theory. Can you duplicate the field event in the lab? If so, you may be on the right track.

11. If the tests do not confirm the theory, verify the test procedures and/or the theory, modify them if needed, and then retest. Confirm their validity with all the data. Modify the theory until it explains most of the data and the test results. Retest the theory. Repeat the procedure until the theory is confirmed.
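
Several of the steps above (separating facts from theories, checking relevancy, arranging data in sequence, and evaluating a theory against each data item) can be sketched in a few lines of Python. This is a rough illustration, not a prescribed tool. The Observation record, the function names, and all the sample data are hypothetical and were invented for the conveyor example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    minute: int            # time relative to the failure, in minutes
    text: str
    is_fact: bool          # False for opinions and guesses (step 3)
    relevant: bool = True  # outcome of the relevancy analysis (step 5)


def build_timeline(observations):
    """Keep relevant facts only and arrange them in time sequence (steps 3, 5, 6)."""
    facts = [o for o in observations if o.is_fact and o.relevant]
    return sorted(facts, key=lambda o: o.minute)


def unexplained(explained_texts, timeline):
    """Evaluate a theory against each data item (step 8)."""
    return [o.text for o in timeline if o.text not in explained_texts]


# Hypothetical data; none of it comes from a real case.
observations = [
    Observation(0, "fuse blew", True),
    Observation(-30, "squeal heard at drive wheel", True),
    Observation(-5, "motor housing too hot to touch", True),
    Observation(-5, "fuses from this supplier are weak", False),
    Observation(-60, "forklift passed the conveyor", True, relevant=False),
]
timeline = build_timeline(observations)

# Two candidate theories, each listing the facts it accounts for.
theories = {
    "undersized fuse": {"fuse blew"},
    "bearing freezing up": {
        "squeal heard at drive wheel",
        "motor housing too hot to touch",
        "fuse blew",
    },
}

for name, explained in theories.items():
    gaps = unexplained(explained, timeline)
    verdict = "explains all facts" if not gaps else f"leaves unexplained: {gaps}"
    print(f"{name} -> {verdict}")
```

The facts are never edited to fit a theory here. A theory that leaves facts unexplained is the thing that gets modified, which is the point of step 9.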

This method may appear time consuming, but that is not the case. (As with any new procedure, there is a learning curve.) Experience has shown that overall time spent in this systematic way is about 70% of the time required for the usual procedure of trial and error.

This procedure helps to eliminate faulty assumptions and the "once in a while" result. It also extracts education from each failed theory. "Doing it right the first time" saves not only time and money, but also frustrations and loss of focus due to repeated failure to solve a problem. The end product is a good picture of what really happened and how. The root cause of the problem is identified.

In addition to finding the root cause of the problem, attention is directed to improvements that can reduce or prevent future occurrences. Another advantage of this method is that conditions likely to cause similar failures are detected early, so they can be corrected before they become a major issue.

Can a computer be used in this process? The answer is yes and no. The computer is an excellent storage device, and allows data to be accessed quickly. The computer is reliable and accurate in recalling stored data, but it is a machine and has no independent decision making capabilities. Therefore, the answer is that the computer can be used in this process if it is utilized for storing and recalling relevant history.
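
As a minimal sketch of that storage-and-recall role, the Python below keeps failure records in memory and retrieves them by equipment or symptom. The class names, fields, and sample record are hypothetical. A real plant would more likely use a database or a maintenance management system, and the decision about what the history means still rests with the investigator.

```python
from dataclasses import dataclass


@dataclass
class FailureRecord:
    equipment: str
    symptom: str
    root_cause: str
    fix: str


class FailureHistory:
    """Stores past failures and recalls them; it makes no decisions."""

    def __init__(self):
        self._records = []

    def store(self, record):
        self._records.append(record)

    def recall(self, equipment=None, symptom_keyword=None):
        """Return records matching the given equipment and/or symptom keyword."""
        matches = self._records
        if equipment is not None:
            matches = [r for r in matches if r.equipment == equipment]
        if symptom_keyword is not None:
            keyword = symptom_keyword.lower()
            matches = [r for r in matches if keyword in r.symptom.lower()]
        return list(matches)


history = FailureHistory()
history.store(FailureRecord(
    equipment="conveyor 3",  # hypothetical equipment name
    symptom="Blown fuse on drive motor",
    root_cause="wheel bearing freezing up",
    fix="replaced bearing",
))

for record in history.recall(symptom_keyword="fuse"):
    print(record.equipment, "|", record.root_cause, "|", record.fix)
```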

Though the method was developed to improve maintenance, it has also been used in nontechnical areas with equal success. That is not surprising, because the method is based on scientific principles. It is the same procedure taught in school for applying the scientific method. If this method is followed rigorously, it can lead to the root cause of a problem in the shortest possible time, regardless of your area of expertise.

Key concepts

A focus on finding and solving problems reduces maintenance burdens.

Problem solving often depends more on the approach taken than on technical expertise.

What you don't need to be

An important question: Are you supposed to be knowledgeable in all aspects of engineering to take advantage of this method? The answer is no. You do not need to fill your mind with all available knowledge.

What you do need is an open mind and determination to find the root cause of the problem. You must have good analytic capabilities and be objective in evaluating data. It is important to find what happened, not who caused it. For specific knowledge, plenty of aids, such as books and technical journals, are available. If you are interested in finding a long-term solution to a problem, then your focus must be to get to the root cause.

What you do need to be

You must be open-minded, unbiased, decisive, focused, and bold enough to tackle problems in fields outside your training. Your only objective is to find the root cause of the problem.

You must accept your own responsibilities and must not try to find excuses for the failure. You must treat each misstep as a learning experience and be able to correct your own mistakes. Remember, your own failures are an integral part of the learning curve. Therefore, if you do not fail, you may not be trying hard enough.

Five misconceptions about maintenance

1. Equipment needs maintenance after it is put in service. In reality, the need for maintenance attention begins before equipment is put into service. Indeed, the need for maintenance arises before equipment is even built, while it is still in component stages.

2. Design data can be relied upon for maintenance information. Factors such as design parameters, MTBF, reliability, and maintainability can lose much of their significance after equipment is built, because they depend on several idealistic assumptions. Once equipment is built, its environment determines whether that equipment will fail before it is put into operation, even if it is "packed."

Failures after start of an operation are frequently due to abuse. Abuse is the harshest reality equipment in operation has to face. Therefore, it is not extraordinary to find equipment with an estimated MTBF of 1 million hr failing after only 100 hr of operation.

3. A machine's maintenance manual is extremely important. Yes, the manual typically offers maintenance and troubleshooting information. But here are some reasons why a manual may be of limited usefulness. (1) The manual likely assumes ideal operating conditions for the equipment, which are not possible in the real world. (2) Because the manual is a support document, cost is a major factor in producing it. Therefore, the manual is not likely to cover each and every combination of component failures, and it is not unusual for an equipment failure to stem from more than one component failure. (3) The manual is commonly developed during early stages of design. Later changes in the design of the equipment may not be included in the manual, making it frustrating to use.

4. "Experts" are often needed to solve equipment failure problems. They can offer a hefty amount of knowledge and provide good assistance, but their experience can also tend to make them inflexible and biased. Closed-mindedness may become a subtle, but serious, obstruction to solving the problem. Sometimes a newcomer solves a problem while the "experts" are still struggling.

5. Preventive maintenance prevents breakdowns and hence the need for emergency maintenance. In fact, preventive maintenance does not and cannot prevent breakdowns. If done properly, it does minimize breakdowns. As mentioned earlier, failure of equipment depends on realities such as the number of components, environment, and abuse. None of them is affected by preventive maintenance.
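
The MTBF figure in misconception 2 can be put in perspective with a short calculation. This is a sketch under an assumption the article does not state: that the design MTBF implies a constant failure rate (the exponential model often used with MTBF figures). The function name is hypothetical.

```python
import math


def failure_probability(hours, mtbf_hours):
    """P(failure by `hours`), assuming a constant failure rate of 1/MTBF."""
    # expm1 keeps precision when hours / mtbf_hours is very small.
    return -math.expm1(-hours / mtbf_hours)


p = failure_probability(100, 1_000_000)
print(f"{p:.4%}")  # prints 0.0100%, roughly 1 unit in 10,000
```

Under those idealized assumptions, a failure within the first 100 hr would be a rare event. When such failures are not rare in practice, the design figure has stopped describing the equipment's real operating conditions, which is where environment and abuse come in.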

More info

The author will answer questions concerning this article. Mr. Doshi can be contacted at 770-931-7580 or shuddhatma@iname.com.

See the "Maintenance" channel on www.plantengineering.com for more information related to this topic.
