Reliability program: How to sell, implement, and sustain one

This article is the first of a two-part series. It is based on a paper originally presented at the Process Plant Reliability, 8th International Conference and Exhibition.

You are sleeping soundly, dreaming about that big rainbow trout you plan to catch on your fishing trip tomorrow. Suddenly you are stirred by an unwelcome sound: the phone ringing at 3 a.m. A familiar voice on the other end says, "Sorry to call in the middle of the night, but the charge pump crashed again and we are shutting down. You need to come out in the morning."

Losing production from the same problem for the tenth time is bad, but telling your son the trip is postponed is worse. Another Saturday on the end of the "firehose."

Stepwise process

The actions taken immediately after fire fighting to get the plant back on line are a measure of the state of your reliability program. Reliability improvement is a stepwise process. All of the following steps must be completed to eliminate each problem and ultimately improve reliability:

  • Failure

  • Fire Fighting

  • Incident Investigation

  • Root Cause Analysis

  • Engineered Solution

  • Implementation

  • Monitoring and Feedback.

      The basic concept is that you must drive for implementation to improve plant performance. Anything short is just a pile of paper and studies. Until changes are implemented in the field, the plant equipment won't reflect that a reliability program exists. These changes can include people, procedures, operations, or physical changes to the equipment. The point is that reliability will not improve unless change is implemented and sustained.

      There are many examples of plants with long lists of ideas but poor performance. This is because the plant reliability program is on a recycle loop somewhere before implementation. Perhaps the plant never gets out of the failure/firefighting mode. Sometimes there is inadequate expertise to perform analyses, or reliability projects don't score high enough to get engineering attention. It is possible to get hung up at any step of the process. If this happens, plant performance will remain static despite increases in staffing and spending targeted for reliability improvement.

      Let's look at an analogy to this situation that could occur in your personal life. Perhaps you wake up with a terrible sore throat. You may try fire fighting with some throat lozenges or by gargling salt water. However, the problem will return, since the root cause has not been addressed. The next step is to call the doctor so an incident investigation and root cause analysis can be conducted. If the root cause is an infection, the engineered solution is antibiotics. You would be expected to implement the solution by taking the pills. Later, the doctor would do a follow-up examination for monitoring and feedback.

      Suppose this simple process broke down. For example, the doctor could be playing golf and unavailable, or there could be an error in diagnosis. The process could also break down during implementation if you forgot to take the medication. As you can see, it is necessary to follow all of the steps to eliminate the problem. The same process is required to eliminate the reliability headaches in your plant.
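
The stepwise process above can be sketched as a simple tracker that shows which problems are still stuck short of implementation. This is only an illustration: the `Step` names mirror the list of steps, while the function name and the sample problems are assumptions made up for the example.

```python
from enum import IntEnum


class Step(IntEnum):
    """The seven steps of the stepwise process, in order."""
    FAILURE = 1
    FIRE_FIGHTING = 2
    INCIDENT_INVESTIGATION = 3
    ROOT_CAUSE_ANALYSIS = 4
    ENGINEERED_SOLUTION = 5
    IMPLEMENTATION = 6
    MONITORING_AND_FEEDBACK = 7


def stalled_before_implementation(problems):
    """Return the names of problems that have not yet reached implementation."""
    return [name for name, step in problems.items()
            if step < Step.IMPLEMENTATION]


# Hypothetical problem list; the names and statuses are invented for illustration.
problems = {
    "charge pump trips": Step.FIRE_FIGHTING,
    "exchanger leak": Step.ROOT_CAUSE_ANALYSIS,
    "compressor seal failures": Step.MONITORING_AND_FEEDBACK,
}

stalled = stalled_before_implementation(problems)
print(stalled)
```

A list like `stalled` that never shrinks is the "recycle loop" described earlier: plenty of analysis, but no change in the field.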

      Program philosophy

      The basic philosophy is that the plant must escalate reliability to the same level as safety, cost, and other critical drivers. If this is accomplished, top safety performance, high onstream time, low cost, and excellent environmental compliance should be achieved.

      The key thought process that needs to be implemented in your plant is:

      • Treat reliability like safety

      • Treat reliability like football

      • Treat reliability like athletics

      • Treat reliability like a bully

      • Treat reliability like a project (sometimes).

          Treat reliability like safety

          Reliability is very similar to safety and needs to be treated the same way in your plant. First, you need a champion of the program. This person must have resources, credibility, and the leadership skills to implement change and make it stick. The second, but equally important, factor is that everyone in the plant must be on board with the program. This includes personnel in maintenance and operations, plus personnel in engineering, purchasing, warehousing, and projects.

          We have a safety policy at our workplace. Safety is a condition of employment. A similar attitude and focus are necessary to achieve reliability performance objectives. Support by the operators and maintenance crafts is essential, because this is where the rubber meets the road. Failure at this level means failure of the program. Therefore, a substantial amount of time needs to be spent communicating the program and expectations to all levels. All personnel must understand how they influence reliability and therefore safety, environmental performance, and profitability.

          Another similarity with safety is that true management support is important for success. Lip service won't work. For example, without commitment of top-notch human resources and funds, the program cannot survive. Management must also be patient to allow the program to develop. Instant success is difficult to achieve in both reliability and safety. However, the ability to showcase "quick hit" successes will help gain credibility while the program matures.

          Treat reliability like football

          A great man once said football is blocking and tackling. Reliability management is the same. Both involve basic tasks, but you have to put the right number of good players on the field to succeed. There is no point in taking on an ailing industrial plant without a skilled team operating at full strength.

          Some of the basic needs for the reliability program include good record keeping, inspection and predictive maintenance programs, planning of maintenance activities, setting expectations for the craftsman, and standard engineering practices. Let's examine the value of each of these items.

          Good records are essential

          Each plant must have confidence in the accuracy of a number of engineering documents. These include process flow diagrams, piping and instrument diagrams, electrical one-line diagrams, interlock system diagrams, relief valve information, and equipment data sheets. In addition, a sound management-of-change program must be implemented and a master file set up. Equipment files stashed in maintenance, operations, and engineering need to be purged so old, inaccurate data is not used.

          Inspection and preventive maintenance programs are also necessary elements of the reliability program. You must know the condition of all plant stationary, rotating, and I&E equipment in order to predict optimum repair intervals and avoid surprises.

          Some plants do not have a basic machinery lubrication or vibration monitoring program and have never conducted internal or external inspections on the stationary equipment. Not only is this risky from a production standpoint, it may also have serious safety, environmental, and legal consequences.

          In our plants, we have implemented simple, cost-effective, risk-based systems to address these issues. In the machinery area, virtually all of the rotating equipment is on a vibration route and is checked at a frequency based on criticality. Criteria have been established to define when additional monitoring or spectrum analysis is needed, and general shutdown limits are defined. In the stationary equipment arena, we are using risk-based inspection programs. These were developed a number of years ago and use a matrix of degree of hazard versus probability of failure. As industry expertise in this area matures, we are upgrading our systems to take advantage of this information.

          Risk-based inspection has allowed us to extend turnaround and inspection intervals. It also places inspection and maintenance resources on the equipment that is most likely to develop problems between turnarounds. When problems do surface, we use pre-engineered guidelines to evaluate flaws and have a structured process to evaluate risks associated with continued operation.
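
A hazard versus probability-of-failure matrix of the kind described above might be sketched as follows. This is a minimal illustration, not our actual system: the 1-to-4 scales, the score thresholds, the equipment tags, and the names `Equipment` and `risk_rank` are all assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Equipment:
    tag: str
    hazard: int       # degree of hazard, 1 (low) to 4 (high); assumed scale
    probability: int  # probability of failure, 1 (unlikely) to 4 (likely); assumed scale


def risk_score(item):
    """Combine the two matrix axes into a single score (assumed: simple product)."""
    return item.hazard * item.probability


def risk_rank(item):
    """Map a score to a risk band; the thresholds are hypothetical."""
    score = risk_score(item)
    if score >= 9:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


# Hypothetical stationary equipment; tags and ratings are invented.
equipment = [
    Equipment("E-205", hazard=2, probability=2),
    Equipment("T-310", hazard=1, probability=2),
    Equipment("V-101", hazard=4, probability=3),
]

# Highest-risk items first, so inspection resources go there first.
ranked = sorted(equipment, key=risk_score, reverse=True)
ranks = {item.tag: risk_rank(item) for item in ranked}
print(ranks)
```

The value of even a crude matrix is the ordering: it points inspection and maintenance effort at the equipment most likely to cause trouble between turnarounds.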

          Sound maintenance planning is an important step in eliminating fire fighting. Standard work plans can be developed so repairs can be accomplished efficiently. All steps of the work process can be considered before the plant is in a crisis situation. It also allows a preventive and predictive program to be implemented and scheduled to optimize the workforce.

          The craftsman is where the critical interface with the equipment occurs. This is of particular concern during a turnaround when the contract staff must expand and adequate control over the work is difficult to maintain.

          Each plant must have strict qualification requirements for contractors and personnel working inside the fence. This indoctrination must continue on a daily basis. We have exposed contract personnel to a "show and tell" program using failed parts to demonstrate the consequences of using the wrong bolt, gasket, or material. This kind of training has been highly effective.

          Finally, engineering standards are very important. This is especially true if you are in a capital growth program. Often, operating companies underestimate the importance of defining engineering requirements and leave this task to the contractor.

          Our engineering, maintenance, and inspection standards can be accessed via the company's internal web page. This has increased awareness of these documents and how to use them.

          Treat reliability like athletics

          Everyone wants to be a champion. We would all like to win gold at the Olympics, hit a home run at the World Series, or win the Masters. However, only a few of us will achieve these lofty goals. The people who get there have the talent and put in long hours to succeed.

          Successful reliability management is the same. Only a limited number of operating companies can claim true world-class programs, and even fewer can sustain the program for more than a few years in the wake of cost reductions, transfers of key personnel, and loss of momentum. This fact of life will be a constant challenge for your reliability team. Therefore, showcasing success and problem resolution is important.

          Treat reliability like a bully

          How do you treat a bully? Smack him in the nose! A similar approach is required on reliability problems. To move ahead with your program, an aggressive approach to problem elimination is necessary. Problem analysis is not problem elimination.

          Sometimes a mediocre attempt is made to solve a problem in order to save time or money. The result is generally a waste of valuable resources since the problem usually returns. This approach is like spraying for roaches one room at a time. They keep bugging you unless you spray the whole house. This example does not mean throwing money at problems. However, an aggressive approach is needed, and the solution has to be implemented. Sometimes organizations make list after list of things they would like to do. The plant must make one list and work through implementation without getting distracted.

          Treat reliability like a project (sometimes)

          Reliability is clearly not a project for a number of reasons. For example, a reliability program has no end, and the plant reliability personnel will never finish their jobs. However, sometimes the reliability personnel need to put on a project hat and execute the work just like a top-notch project manager. This approach is critical during startup of a program and for problem elimination.

          During implementation of one of the first programs I started from scratch, I found we were not making adequate progress. As I analyzed the situation, several things became obvious.

          We were trying to implement too many programs in parallel and had not established a priority for each program. In addition, a realistic schedule had not been established. In other words, we flunked Project Management 101.

          In order to make progress, personnel were assigned to evaluate each project. The projects included lubrication, vibration monitoring, lube oil sampling, failure analysis, spare parts evaluation, repair procedures, and thermography for insulation and electrical gear.

          We developed a priority for each program based on expected value, assigned a project manager, and set up a project schedule. As programs came on line we made sure adequate resources were assigned so the program would flourish. Once we had this project execution plan in place, we were able to move forward quickly and efficiently.
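
Ranking programs by expected value, as described above, can be as simple as the sketch below. The program names come from the list of projects; the benefit figures, the success probabilities, and the definition of expected value as benefit times chance of success are assumptions chosen purely for illustration.

```python
# (program, estimated annual benefit, estimated chance of success)
# All numbers are hypothetical placeholders, not plant data.
programs = [
    ("lubrication", 200_000, 0.9),
    ("vibration monitoring", 500_000, 0.8),
    ("thermography", 150_000, 0.6),
]


def expected_value(benefit, chance):
    """Assumed definition: estimated benefit weighted by chance of success."""
    return benefit * chance


# Highest expected value first; this becomes the implementation order.
ranked = sorted(programs,
                key=lambda p: expected_value(p[1], p[2]),
                reverse=True)
order = [name for name, _, _ in ranked]
print(order)
```

Each entry in `order` would then get its own project manager and schedule, so programs are brought on line one at a time instead of all in parallel.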

          The second part of the series will appear in the June issue.

          Author Information
          Rick Hoffman is currently Manager, Specialty Engineering, for Lyondell Chemical Company and Equistar Chemicals, LP. During his career he has developed reliability programs for refineries, chemical plants, and synthetic fuels operations. He can be reached by phone at 713-309-3915 or by email at .