Setting a cadence for your failure analysis efforts

The true value from your problem solving efforts comes when we involve a team of people with the right skill and knowledge. Courtesy: Allied Reliability Group

Some simple facts

Let me share a couple of ideas that I hope you will agree with:

  1. Problems occur every day, both big ones and small ones.
  2. We cannot possibly address them all.
  3. The benefit from a failure analysis effort comes from finding the root cause of a problem and permanently eliminating it.
  4. Getting good at problem solving is a learned behavior that we must practice, much like a golf swing or the ability to play music.

If you can accept these ideas as facts, then what I want to talk to you about today is how we can build a process around them that produces results we can benefit from.

The challenge

The reason I have decided to address this topic is that I believe that we are leaving a lot of value on the table with our failure analysis (RCFA, RCA, Problem Solving, 8D) efforts. Here is what I see when I interact with maintenance organizations (some or all of these may apply to you):

  • A general understanding of the RCFA process, with some classroom training being provided in the past that was well accepted and understood.
  • When the big problem (the huge, catastrophic, apocalyptic one) happens, we perform a failure analysis on it and issue a report. We rarely do so for any of the small ones.
  • If we are honest with ourselves, we largely rely on documenting problems, but do not often dig down to the root cause, or make any permanent change that will really prevent it from happening again.
  • When we look at our failure analysis records, we must admit that we often treat symptoms of the failure and fail to dig all of the way to the actual root cause.
  • Our failure analysis efforts are accepted to be an engineering function, and we fail to involve (sufficiently) the people at the front line (operators and maintenance) who have direct knowledge and experience with the failure.
  • Our solutions are more often than not engineering solutions (translated to things that cost money) and we rarely address process, training, or procedural solutions (all human related and relatively inexpensive).

The Fix

All right, I am not going to throw stones here. If any of those bullet points above look familiar, then just take a moment to consider some of these simple solutions I have provided below.

Trigger points

Trigger points help us understand when to act. They are our rules that tell us when we need to perform a failure analysis and when we do not. They need to be set in such a way as to drive a constant level of activity around this process. The level of activity is determined by how many problems you can adequately study and solve in a given time frame – likely far fewer than you would think.

Let’s say that within your span of control (department or workgroup), you can solve only 1 or 2 problems per month (identify the problem, study it and find the root cause, then design, implement, and test a solution). Then we must set trigger points that will drive that level of activity each month.

For example, if you say that we must address every production delay of 60 minutes or more, and you have 23 such events in a given month, that is not a good trigger point for you. Set your trigger point high enough (say 2 or even 4 hours) so that you are addressing those 1 or 2 critical problems per month.
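The arithmetic behind that choice can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the delay values, the candidate thresholds, and the function names count_triggered and pick_trigger are all made up for the example, and it assumes you can list each delay event for the month in minutes.

```python
def count_triggered(delays_minutes, threshold):
    """Count the delay events that meet or exceed the trigger threshold."""
    return sum(1 for delay in delays_minutes if delay >= threshold)


def pick_trigger(delays_minutes, candidates, capacity):
    """Return the lowest candidate threshold that produces no more
    analyses than the team can finish in the period, or None if even
    the highest candidate produces too many."""
    for threshold in sorted(candidates):
        if count_triggered(delays_minutes, threshold) <= capacity:
            return threshold
    return None


# Hypothetical month: 23 production delays of 60 minutes or more.
delays = [60, 62, 65, 70, 75, 75, 80, 85, 90, 90, 95, 100,
          105, 110, 110, 115, 118, 125, 130, 150, 200, 250, 310]

# Assumed capacity: the team can fully solve 2 problems per month.
trigger = pick_trigger(delays, candidates=[60, 120, 240], capacity=2)
print(trigger)  # 240
```

With this invented data, a 60-minute trigger would demand 23 analyses and a 2-hour trigger would demand 6, so the 4-hour trigger is the first one that matches a capacity of 2 per month.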

What about the smaller problems, you may ask? They are coming soon. Our reward for success is an adjustment to our trigger points. If we find ourselves in a position where we have no production delays greater than 2 hours for several months in a row, that means we have improved. Congratulations! Now let’s adjust the triggers downward to 60 minutes and continue on.
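The rule for lowering a trigger can also be written down explicitly so it is applied the same way each time. The sketch below is one possible reading, under stated assumptions: "several months in a row" is taken to mean 3 months (the quiet_months parameter), a month counts as quiet when its longest delay stays below the current trigger, and the function name adjust_trigger and the sample figures are hypothetical.

```python
def adjust_trigger(monthly_longest_delay, current, lower, quiet_months=3):
    """Return the lower trigger once the most recent `quiet_months`
    months all stayed below the current trigger; otherwise keep the
    current one. Values are in minutes, oldest month first."""
    recent = monthly_longest_delay[-quiet_months:]
    if len(recent) < quiet_months:
        return current  # not enough history yet to justify a change
    if all(longest < current for longest in recent):
        return lower
    return current


# Hypothetical history: longest delay in each of the last four months.
history = [180, 110, 95, 100]
print(adjust_trigger(history, current=120, lower=60))  # 60
```

Here the last three months all stayed under the 2-hour trigger, so the trigger steps down to 60 minutes. A single long delay in that window would keep the trigger where it is.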

These simple trigger points set a cadence for your failure analysis efforts and allow you to drive improvements continuously, a little bit at a time.

Quality control and oversight

Failure analysis is a skill that must be built and refined over time. It does not come naturally and it is difficult for us to be critical of our own work. Put a responsible person in charge of your failure analysis program to review your failure analysis records (spot check or all of them, your call) and ask the following questions:

  • Did we identify the root cause or did we address a symptom?
  • Are we applying our triggers correctly and do we have the correct level of activity?
  • Do our solutions address the actual root cause?
  • Are we keeping our solutions small and achievable? (Let’s not solve all of the world’s problems, just this one.)
  • Did we address the problem in a timely manner?
  • Did we involve the right people in the analysis or was this performed in a back office somewhere?

Drive the process to the front line

Finally, true value from your problem solving efforts comes when we involve a team of people with the right skill and knowledge. It is very easy for us to overlook our front line operators and maintenance technicians and look at our failure analysis efforts as an “engineering only” effort.

These people likely have direct knowledge of the failure and may even have seen it firsthand. They can also help you find simple and effective solutions that relate back to the way we maintain and operate our assets.

There is a tendency to treat failure analysis as an “equipment redesign” effort. These types of solutions are generally costly and tend to mask the human-related causes of failures.

Getting results from your failure analysis effort takes discipline and a knowledgeable team of people with the right training, tools, and focus.
You as a leader can provide this focus with the way in which you administer your program. Provide your teams with a framework to function within and they will do the rest.
