Setting a cadence for your failure analysis efforts


The true value from your problem solving efforts comes when we involve a team of people with the right skill and knowledge. Courtesy: Allied Reliability Group

Some simple facts

Let me share a couple of ideas that I hope you will agree with:

  1. Problems occur every day, both big ones and small ones.
  2. We cannot possibly address them all.
  3. The benefit from a failure analysis effort comes from finding the root cause of a problem and permanently eliminating it.
  4. Getting good at problem solving is a learned behavior that we must practice at, much like a golf swing or the ability to play music.

If you can accept these ideas as facts, then what I want to talk to you about today is how we can build a process around them that produces results we can benefit from.

The challenge

The reason I have decided to address this topic is that I believe that we are leaving a lot of value on the table with our failure analysis (RCFA, RCA, Problem Solving, 8D) efforts. Here is what I see when I interact with maintenance organizations (some or all of these may apply to you):

  • A general understanding of the RCFA process, with some classroom training being provided in the past that was well accepted and understood.
  • When the big problem (the huge, catastrophic, apocalyptic one) happens, we perform a failure analysis on it and issue a report. We rarely do so for any of the small ones.
  • If we are honest with ourselves, we largely rely on documenting problems, but do not often dig down to the root cause or make any permanent change that will really prevent them from happening again.
  • When we look at our failure analysis records, we must admit that we often treat symptoms of the failure and fail to dig all of the way to the actual root cause.
  • Our failure analysis efforts are accepted to be an engineering function, and we fail to sufficiently involve the people at the front line (operators and maintenance) who have direct knowledge and experience with the failure.
  • Our solutions are more often than not engineering solutions (translated to things that cost money) and we rarely address process, training, or procedural solutions (all human related and relatively inexpensive).

The Fix

All right, I am not going to throw stones here. If any of those bullet points above look familiar, then just take a moment to consider some of these simple solutions I have provided below.

Trigger points

Trigger points help us understand when to act. They are the rules that tell us when we need to perform a failure analysis and when we do not. They need to be set in a way that drives a constant level of activity around this process. The level of activity is determined by how many problems you can adequately study and solve in a given time frame, which is likely much lower than you would think.

Let’s say that within your span of control (department or workgroup), you can solve only 1 or 2 problems per month (identify the problem, study it and find the root cause, then design, implement, and test a solution). Then we must set trigger points that will drive that level of activity each month.

For example, if you say that we must address every production delay of 60 minutes or more, and you have 23 such events in a given month, that is not a good trigger point for you. Set your trigger point high enough (say 2 or even 4 hours) so that you are addressing those 1 or 2 critical problems per month.
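The arithmetic behind that choice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the candidate thresholds, and the list of delay durations are all hypothetical, made up so that 23 events meet the 60-minute trigger as in the example above.

```python
def count_triggered(delays_minutes, threshold_minutes):
    """Count events whose delay meets or exceeds the trigger point."""
    return sum(1 for d in delays_minutes if d >= threshold_minutes)


def pick_trigger(delays_minutes, capacity, candidates=(60, 120, 240)):
    """Return the lowest candidate threshold that keeps the monthly
    workload within capacity, or None if even the highest does not."""
    for threshold in sorted(candidates):
        if count_triggered(delays_minutes, threshold) <= capacity:
            return threshold
    return None


# Hypothetical month of production delays, in minutes (invented data).
delays = [60, 62, 65, 65, 70, 70, 75, 75, 80, 80, 85, 90,
          90, 95, 95, 100, 100, 105, 110, 110, 115, 130, 250]

# Assumed capacity: the team can solve at most 2 problems per month.
trigger = pick_trigger(delays, capacity=2)
print(trigger, count_triggered(delays, trigger))  # prints: 120 2
```

With this invented data, a 60-minute trigger would demand 23 analyses, while a 2-hour trigger leaves the 2 events the team can actually handle.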

What about the smaller problems, you may ask? They are coming soon. Our reward for success is an adjustment to our trigger points. If we find ourselves in a position where we have no production delays greater than 2 hours for several months in a row, that means we have improved. Congratulations! Now let’s adjust the triggers downward to 60 minutes and continue on.
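As a rough sketch of that adjustment rule in Python: the function name, the ladder of thresholds, and the choice of three months as "several months in a row" are assumptions made for illustration, not part of any prescribed method.

```python
def adjust_trigger(current, monthly_counts, ladder=(240, 120, 60, 30),
                   clean_months=3):
    """Step the trigger down one rung if the most recent `clean_months`
    months each had zero events at or above the current trigger.

    monthly_counts: number of triggered events per month, oldest first.
    ladder: hypothetical set of allowed trigger points, in minutes.
    """
    recent = monthly_counts[-clean_months:]
    # Not enough history yet, or at least one triggered event: no change.
    if len(recent) < clean_months or any(recent):
        return current
    lower_rungs = [t for t in ladder if t < current]
    # Move to the next lower rung, or stay put if already at the bottom.
    return max(lower_rungs) if lower_rungs else current


# Two busy months followed by three clean ones at a 120-minute trigger.
print(adjust_trigger(120, [2, 1, 0, 0, 0]))  # prints: 60
```

Stepping down one rung at a time keeps the workload from jumping suddenly; the count of events at the new trigger can then be checked against what the team can handle.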

These simple trigger points set a cadence for your failure analysis efforts and allow you to drive improvements continuously, a little bit at a time.

Quality control and oversight

Failure analysis is a skill that must be built and refined over time. It does not come naturally and it is difficult for us to be critical of our own work. Put a responsible person in charge of your failure analysis program to review your failure analysis records (spot check or all of them, your call) and ask the following questions:

  • Did we identify the root cause or did we address a symptom?
  • Are we applying our triggers correctly and do we have the correct level of activity?
  • Do our solutions address the actual root cause?
  • Are we keeping our solutions small and achievable? (Let’s not solve all of the world’s problems, just this one.)
  • Did we address the problem in a timely manner?
  • Did we involve the right people in the analysis or was this performed in a back office somewhere?

Drive the process to the front line

Finally, the true value from your problem solving efforts comes when we involve a team of people with the right skill and knowledge. It is very easy for us to overlook our front line operators and maintenance technicians and to look at our failure analysis efforts as an “engineering only” effort.

These people likely have direct knowledge of the failure and may even have seen it firsthand. They can also help you find simple and effective solutions that relate back to the way we maintain and operate our assets.

There is a tendency to treat failure analysis as an “equipment redesign” effort. These types of solutions are generally costly and tend to mask the human-related causes of failures.

Getting results from your failure analysis effort takes discipline and a knowledgeable team of people with the right training, tools, and focus.
You as a leader can provide this focus with the way in which you administer your program. Provide your teams with a framework to function within and they will do the rest.
