Reliability program: How to sell, implement, and sustain one

This article is the second of a two-part series. It is based on a paper originally presented at the Process Plant Reliability, 8th International Conference and Exhibition. Selecting the right staff will make or break a reliability program. Programs that are successful have been staffed with personnel with exceptional leadership skills.

06/12/2003


Selecting the right staff will make or break a reliability program. Programs that are successful have been staffed with personnel with exceptional leadership skills. This is because the program is about change and change is about leadership.

The reliability superintendent is a key position in your plant. Take care in selecting this person. Here is a list of important factors for success:

  • Leadership

  • Credibility

  • Communication skills

  • Interested in this job, not the next job

  • Business perspective

  • Broad view

  • Technical competence.

      Sounds like we are looking for Superman. We are! This person will play a critical role in the cost, safety, and environmental performance of the facility. Pick the right person and pay accordingly. These people are hard to find and even harder to keep, because your competitor knows what they are worth.

      The staff

      The staff personnel need to be cut from the same mold as the boss. Leadership, communication, and credibility are critical factors in addition to technical competence. We have all worked with people who are great technically but can't get anything done, because they lack the leadership and credibility to influence people to change. Try to avoid putting these people on the reliability team. They will just frustrate a high-energy work group and reduce the credibility of the reliability department. This may sound harsh, but one poor performer can have a negative impact on the team and therefore the performance of the program.

      During the staffing process, potential successors for the reliability superintendent should be identified. A comprehensive development plan should be initiated to prepare these people to assume responsibility for the department in the future.

      Staff makeup

      The reliability organization needs to cover a lot of ground. Our reliability groups have responsibility for maintenance engineering, I&E reliability, inspection, and the rotating equipment program. Clerical functions, spare parts, and other functions can be incorporated. In a world-class refinery or chemical plant the organization will probably be 15 to 25 people, and in a small plant 5 to 8 people will be required for critical mass. In the larger plants a typical group would consist of the superintendent, inspection supervisor, 4 to 8 inspectors, machinery engineer, 1 to 2 machinery technicians, 2 to 4 I&E engineers, 4 maintenance/reliability engineers, and 2 to 3 personnel for secretarial and clerical functions.

      Incremental personnel will be required for implementing large programs. This could include establishing a piping inspection program, optimizing PM programs, substantial expansion of the facility, or installing a new computerized maintenance management system. This team may seem like a fairly large number of people. However, it is important to have critical mass to get out of fire fighting. Nuclear warheads and reliability programs need critical mass to make things happen.

      Organizational structure

      We have been successful with a number of organizational structures as long as management supports the program and the right people are in place. A new regional reliability concept was implemented in our company. In this arrangement, reliability superintendents report to a regional reliability manager. The regional reliability manager also controls all of the corporate nonprocess engineering specialists and regional warehousing. This concept has proven effective because it provides a consistent approach across the region and elevates the regional reliability manager to the same level as the plant managers.

      The reliability superintendent essentially becomes a direct report of the plant manager rather than reporting lower in the organization. This arrangement:

      • Allows access to all critical meetings and decisions

      • Provides reliability-oriented control of warehousing, spare parts, alliances, parts repairs and positive material identification. These areas can directly impact reliability but are often controlled by nontechnical organizations such as purchasing

      • Provides plants with direct access to corporate specialists in machinery, stationary equipment, materials, combustion, instrumentation, power, and analyzers

      • Provides developmental cross-transfer opportunities in the same regional department.

          The Reliability Toolbox

          Our "Reliability Toolbox" includes:

          • Engineering standards

          • Risk-based inspection standards

          • Maintenance standards

          • Worldwide reliability metrics and reporting system

          • Mechanical integrity risk assessment program (MIRA)

          • Bad-actor elimination program

          • Materials, coatings, and elastomer selection guidelines

          • Guidelines for prevention of stress corrosion cracking

          • Corporate piping specifications

          • Alliances with selected vendors

          • Structured program for vendor qualification

          • Shop surveillance program for engineered equipment

          • Field Q/A program based on standard inspection and test plans

          • Machinery monitoring program

          • Electrical monitoring program

          • Front-end loading program for turnarounds.

              The toolbox was developed over the last 10 yr and was driven by an intense capital expansion program. We also had strong management commitment to improve the safety, reliability and environmental performance of our manufacturing facilities. On the capital side we have implemented programs to assure our requirements are clearly spelled out in engineering standards and that the requirements are actually achieved in the final product. This performance is accomplished by strict rules on design review, vendor selection, shop inspection, and field Q/A. Final acceptance by operations is also strictly controlled through a "Care, Custody, and Transfer" program.

              The toolbox also addresses the needs of daily operations. Standard maintenance procedures for machinery overhaul and other routine maintenance tasks have been developed and incorporated in our job order system. There are also standards to set minimum inspection intervals and scope for stationary equipment.

              We have set up metrics to put each plant on an "apples to apples" basis. For example, each plant measures reliability, availability, production loss, and other critical reliability parameters the same way. We also issue a Pareto analysis of the worst reliability problems for all of the plants. This helps allocate human and financial resources on a worldwide basis. It also helps us focus on programs and major projects that need to be implemented in the future.

              Some of the tools will be reviewed in detail to show how they are applied to drive our program. The selected tools are the reliability metrics, bad-actor program, and heat exchanger program. Information on other programs can be obtained by contacting the author.

              Reliability metrics and bad-actor programs

              These programs are effective since they have been implemented consistently on a worldwide basis. A uniform measurement program allows allocation of resources to eliminate the problems that have the most impact on the bottom line. Reliability metrics measured on a worldwide basis are:

              Plant availability = (maximum rate - planned turnarounds - unplanned outages) / (maximum rate - planned turnarounds)

              Business slowdowns are not considered reliability hits. However, during a business slowdown the plant must be available to go to full rate or a hit is counted. Once production is lost, it cannot be "caught up." This is a tough but uniform measurement of the availability of the plant.

              • Unplanned reliability "hits" in number, type, and amount of production loss

              • Ranking of each "hit" as a slowdown, planned shutdown, or emergency shutdown

              • Pareto ranking of reliability "hits" based on production loss for individual plants and the system

              • Year-to-year comparison of individual plant and overall system performance.

                  This data is presented to executive management on a quarterly basis along with goals for each facility and a problem elimination program based on the data from the metrics.
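As a rough illustration of the metrics above, the following Python sketch computes plant availability and a Pareto ranking of reliability hits by production loss. It assumes that every term of the availability formula is expressed in the same production units over the same period; the `Hit` record, its field names, and the sample figures are hypothetical and are not taken from the original program.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One unplanned reliability hit (field names are illustrative only)."""
    plant: str
    cause: str
    kind: str               # "slowdown", "planned shutdown", or "emergency shutdown"
    lost_production: float  # same units as max_production below


def plant_availability(max_production, planned_turnaround_loss, unplanned_loss):
    """Availability per the formula in the text.

    Assumption: all three arguments are production volumes for the same
    period (production at maximum rate, and production lost to planned
    turnarounds and to unplanned outages).
    """
    scheduled = max_production - planned_turnaround_loss
    if scheduled <= 0:
        raise ValueError("no production was scheduled in this period")
    return (scheduled - unplanned_loss) / scheduled


def pareto(hits):
    """Rank causes by total lost production, worst first.

    Returns a list of (cause, loss, cumulative share of total loss).
    """
    totals = defaultdict(float)
    for hit in hits:
        totals[hit.cause] += hit.lost_production
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for cause, loss in ranked:
        running += loss
        share = running / grand_total if grand_total else 0.0
        rows.append((cause, loss, share))
    return rows


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    hits = [
        Hit("Plant A", "exchanger tube leak", "emergency shutdown", 300.0),
        Hit("Plant A", "compressor trip", "slowdown", 100.0),
        Hit("Plant B", "exchanger tube leak", "planned shutdown", 100.0),
    ]
    print(f"availability: {plant_availability(10000.0, 1000.0, 450.0):.1%}")
    for cause, loss, share in pareto(hits):
        print(f"{cause}: {loss:.0f} lost, {share:.0%} cumulative")
```

Passing only one plant's hits to `pareto` gives the plant-level ranking, and passing all plants' hits gives the system-wide ranking.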

                  The bad-actor program is simply the list of the worst problems the plant faces from a reliability standpoint. The list is usually limited to a maximum of 10 items based on production loss, number of times the problem has impacted the plant and total maintenance costs. We try to maintain focus on the list and communicate progress throughout the facility. A program with discipline can stay on track and eliminate the top-10 list and then go after the next 10 problems. A firefighting organization will make a list but change it after every failure or never make progress on the list.
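The text names three criteria for the bad-actor list but does not say how they are combined. The sketch below shows one possible approach: rank each problem on each criterion and sum the ranks. The rank-sum rule, the `Problem` record, and its field names are assumptions made for illustration, not the author's documented method.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """A candidate bad actor (field names are illustrative only)."""
    name: str                # assumed unique within the plant
    production_loss: float   # production lost to this problem
    occurrences: int         # number of times it has impacted the plant
    maintenance_cost: float  # total maintenance cost attributed to it


def rank_positions(problems, key):
    """Map each problem name to its rank on one criterion (0 = worst)."""
    ordered = sorted(problems, key=key, reverse=True)
    return {p.name: position for position, p in enumerate(ordered)}


def bad_actor_list(problems, limit=10):
    """Return up to `limit` problems, worst first.

    Assumption: the three criteria are weighted equally by summing
    each problem's rank on every criterion (lowest sum = worst actor).
    """
    by_loss = rank_positions(problems, lambda p: p.production_loss)
    by_count = rank_positions(problems, lambda p: p.occurrences)
    by_cost = rank_positions(problems, lambda p: p.maintenance_cost)

    def score(p):
        return by_loss[p.name] + by_count[p.name] + by_cost[p.name]

    return sorted(problems, key=score)[:limit]
```

Capping the result at `limit` mirrors the discipline described above: the list stays fixed at a maximum of 10 items until those problems are eliminated, rather than being reshuffled after every failure.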

                  Heat exchanger program

                  As our program matured and problems were eliminated, reliability of heat exchange equipment became the next target for improvement. This decision was based on reliability metrics and results of the bad-actor program. In 1997, heat exchanger tube failures accounted for 31% of the unplanned downtime in our Gulf Coast plants, resulting in more than $12 million of unplanned production losses. A multidisciplinary team studied the tube failure modes and developed recommendations for improvement of heat exchanger design, construction, operation, maintenance, and inspection.

                  The first step in attacking heat exchanger reliability was to develop a database that included equipment design information, tube test results, failure mechanisms, retube/upgrade data, replacement priority, and production criticality factor.
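A database record of this kind could be sketched as follows. The field list follows the items named above, but the field names, types, and the meaning of the priority and criticality scales are assumptions, and the query helper is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExchangerRecord:
    """One heat exchanger bundle in the database (illustrative layout)."""
    tag: str                                                  # equipment identifier
    design: dict = field(default_factory=dict)                # equipment design information
    tube_test_results: list = field(default_factory=list)     # tube test results
    failure_mechanisms: list = field(default_factory=list)    # observed failure mechanisms
    retube_upgrade_data: list = field(default_factory=list)   # retube/upgrade data
    replacement_priority: int = 0         # assumed scale: higher = more urgent
    production_criticality: float = 0.0   # assumed scale: higher = more critical


def turnaround_candidates(records, min_priority):
    """Return records at or above a priority threshold, most urgent first.

    Ties on priority are broken by production criticality.
    """
    selected = [r for r in records if r.replacement_priority >= min_priority]
    return sorted(
        selected,
        key=lambda r: (r.replacement_priority, r.production_criticality),
        reverse=True,
    )
```

Keeping design data, test history, and criticality in one record is what allows a single query to feed turnaround planning.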

                  The team used this information to develop a formal risk-based approach to prioritize bundles for inspection and retubing. This entire effort boiled down to a one-page tool called the Retube Analysis Matrix. The tool provided a systematic and consistent approach to predict problem heat exchangers and to develop a complete turnaround scope.

                  The Retube Analysis Matrix is based on a point system. Zero points indicates the lowest probability of failure, and 16 the highest. The following four risk factors were used in the matrix: