Using data mining for plant maintenance

By Jack Smith, Senior Editor, Plant Engineering, December 15, 2002
Key Concepts
  • Data mining uses data analysis tools to discover patterns and relationships in data as a basis for predictions.

  • A predictive model is derived based on patterns determined from known results.

  • Equipment reliability is the intelligent use of knowledge and information to increase mean time between failures.

    What is data mining?
    Data mining criteria
    The importance of reliability
    Predictive maintenance
    Another approach
    CMMS and data mining
    Data mining historical sensor data

    Data mining can be used for anything from asset and inventory management to predictive maintenance to banking and finance. However, maintaining manufacturing plants is what we are all about. Knowing when a machine will break before it breaks, in plenty of time for repairs to be conveniently and cost-effectively scheduled and executed, is an exciting application of this technology that can add dollars to the bottom line. This article will focus on data mining within the context of maintaining equipment in industrial manufacturing plants.

    What is data mining?

Data mining means different things to different people. In the ideal sense, data mining is getting to the data you need, when you need them, and then using them to make intelligent business decisions.

Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data and then uses them to make valid predictions (Fig. 1). Data mining is not a cure-all; it is a tool. It doesn't watch your database, send you notifications, or sound alarms to get your attention when it detects an interesting pattern. It doesn't eliminate the need to know your plant, to understand your equipment's data, or to understand analytical methods.

Fig. 1. Making valid predictions that affect plant maintenance decisions is a goal of data mining. This data mining report lists several cross-reference options. (Courtesy 24/7 Systems, Inc.)

    Data mining helps users find patterns and relationships in the data — it does not evaluate the value of the patterns to the plant. The patterns uncovered by data mining must be verified in the real world.

Data mining comprises four steps:

    • Describing the data

    • Building a predictive model

    • Testing the model

    • Verifying the model

When describing data, summarize their statistical properties. Use charts and graphs to visually identify potentially meaningful links among variables, such as coincidence in time.

A predictive model is derived from patterns found in known results. That model is then tested on results outside the original sample. Just as a blueprint is not a perfect representation of the actual piece of equipment, a good model should never be confused with reality. However, it can be a useful guide to understanding your plant. The final step is to verify the model by checking its predictions against what actually happens in the plant.
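
The derive-then-test idea can be sketched with the simplest possible predictive model, a single vibration threshold. Everything here is hypothetical: the records, the "failed within 30 days" label, and the split into a training set and a holdout set. The threshold is chosen only from the training records and then scored on records the model never saw.

```python
# Hypothetical labeled history: (peak vibration in mils, failed within 30 days?)
history = [
    (0.8, False), (1.1, False), (2.9, True), (1.0, False), (3.4, True),
    (1.3, False), (2.6, True), (0.9, False), (3.1, True), (1.2, False),
]

def evaluate(threshold, records):
    """Fraction of records that the rule 'vibration >= threshold means failure' gets right."""
    return sum((v >= threshold) == failed for v, failed in records) / len(records)

def fit_threshold(records):
    """Choose the candidate threshold with the best accuracy on the training records."""
    candidates = sorted({v for v, _ in records})
    return max(candidates, key=lambda t: evaluate(t, records))

# Build the model from one part of the known results...
train, holdout = history[:6], history[6:]
threshold = fit_threshold(train)

# ...then test it on results outside the original sample.
print("threshold:", threshold)
print("holdout accuracy:", evaluate(threshold, holdout))
```

On this made-up data, the threshold that is perfect on the training records misclassifies one of the four holdout records, which is exactly why the testing step exists.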

Data mining relies on artificial intelligence (AI) and statistics. Both disciplines have advanced pattern recognition and classification, and both have contributed greatly to the understanding and application of neural nets and decision trees.

        Data mining does not replace traditional statistical techniques. Rather, it is an extension of statistical methods that is, in part, the result of a major change in the statistics community. The development of most statistical techniques was, until recently, based on elegant theory and analytical methods that worked quite well on the modest amounts of data being analyzed. The increased power of computers and their lower cost, coupled with the need to analyze enormous data sets, have allowed the development of new techniques.

        These new techniques include relatively recent algorithms, such as neural nets and decision trees, and new approaches to older algorithms such as discriminant analysis. By unleashing increased computer power on huge volumes of available data, these techniques can approximate almost any functional form or interaction on their own. Traditional statistical techniques rely on the modeler to specify the functional form and interactions.

        Data mining is the application of these techniques to common business and maintenance problems in a way that makes these techniques available to the skilled knowledge worker. Data mining is a tool for increasing the productivity of people trying to build predictive models.

        Data mining criteria

Data mining applications share some common elements, although not every application has all of them. For our purposes, data mining rests on the following:

      • Data warehouse, archive, and/or historian

      • Data model

      • Online analytical processing (OLAP).

The data warehousing concept places equal emphasis on where data are stored and on what those data contain. Working in concert with query tools and report generators, these repository and analysis tools can bridge gaps between CMMS/EAM and predictive maintenance and reliability-based tools.

Fig. 2. Data mining can answer questions regarding the maintenance of equipment in your plant. Getting good data into the database is extremely important. Results appear as numbers at the intersections of the headers, which are hierarchical in this data mining program.


          OLAP is a category of software technology that enables users to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information. This is information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

          OLAP functionality is characterized by dynamic multidimensional analysis of consolidated enterprise data supporting end-user analytical and navigational activities including:

        • Calculations and modeling applied across dimensions, through hierarchies, and/or across members

        • Trend analysis over sequential time periods

        • Slicing subsets for onscreen viewing

        • Drill-down to deeper levels of consolidation

        • Reach-through to underlying detail data

        • Rotation to new dimensional comparisons in the viewing area.

OLAP is implemented in a multiuser client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various “what-if” data model scenarios.
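
Roll-up and drill-down through a hierarchy can be illustrated without an OLAP server. The sketch below is a plain-Python approximation, not how any OLAP product is implemented; the plant, area, and equipment names and the costs are hypothetical.

```python
from collections import defaultdict

# Hypothetical maintenance-cost facts keyed by a plant > area > equipment hierarchy.
facts = [
    ("Plant A", "Boiler house", "BFP-1", 12000),
    ("Plant A", "Boiler house", "BFP-2", 4500),
    ("Plant A", "Packaging", "CONV-7", 3000),
    ("Plant A", "Packaging", "CONV-8", 2500),
]

def roll_up(rows, depth):
    """Consolidate costs to the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for *path, cost in rows:
        totals[tuple(path[:depth])] += cost
    return dict(totals)

def drill_down(rows, prefix):
    """Show the next level of detail beneath one consolidated member."""
    children = roll_up(rows, len(prefix) + 1)
    return {key: cost for key, cost in children.items() if key[:len(prefix)] == prefix}

print(roll_up(facts, 1))                                # plant level
print(roll_up(facts, 2))                                # area level
print(drill_down(facts, ("Plant A", "Boiler house")))   # equipment within one area
```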


But what does all this mean to the plant engineer? Suppose a manufacturing plant is seeing a higher-than-normal rate of motor failures. What do these motors have in common? Are they from the same area? Are they the same type of motor? Are they from the same manufacturer? Have PMs been performed on them? There are many ways to slice and dice the data to reach a satisfactory answer. The key is to have the data on hand.
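
Slicing the motor-failure question by attribute amounts to counting failures for each value of each attribute. The records, field names, and vendor labels below are hypothetical.

```python
from collections import Counter

# Hypothetical motor-failure records; field names are illustrative only.
failures = [
    {"area": "Line 1", "type": "TEFC 50 hp", "manufacturer": "Vendor X", "pm_done": False},
    {"area": "Line 1", "type": "TEFC 50 hp", "manufacturer": "Vendor X", "pm_done": True},
    {"area": "Line 2", "type": "ODP 25 hp", "manufacturer": "Vendor Y", "pm_done": False},
    {"area": "Line 1", "type": "TEFC 50 hp", "manufacturer": "Vendor X", "pm_done": False},
]

def slice_by(records, attribute):
    """Count failures for each value of one attribute, most common first."""
    return Counter(r[attribute] for r in records).most_common()

for attribute in ("area", "type", "manufacturer", "pm_done"):
    print(attribute, slice_by(failures, attribute))
```

Raw counts can mislead: if Line 1 simply has more motors than Line 2, it will show more failures. In practice, the counts would be divided by the installed population or operating hours of each group.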

Data mining can help answer these questions and many more. How many more? Just about as many as you can think of that make good business sense for a maintenance department (Fig. 2).

            One of the goals of data mining is to allow data to be analyzed by nonprogrammers. To accomplish this, some initial work must occur.

Equipment information comes first, and this is the biggest hurdle for most plants. Many CMMS packages record that a piece of equipment was sent for repair but have no information about its installation. Many plants keep their own information in spreadsheets, but these spreadsheets are often pet projects begun by someone else and abandoned after a retirement or personnel transfer.

Then there is the question of what people have actually entered into the program. If accurate start and end dates for each equipment installation are in the database, you can perform time-period analysis, which you can't do if all you have is the number of days the equipment was installed.
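
The difference between storing installation dates and storing only a day count is easy to see in code. With start and end dates, you can compute how many days each unit was in service during any reporting window; a bare day count cannot be placed on the calendar at all. The serial numbers and dates are hypothetical.

```python
from datetime import date

# Hypothetical installation history: (serial number, installed, removed or None if still in place)
installs = [
    ("M-1001", date(2001, 3, 1), date(2001, 9, 15)),
    ("M-1002", date(2001, 6, 1), None),
    ("M-1003", date(2002, 2, 10), date(2002, 5, 1)),
]

def days_in_service(installed, removed, start, end):
    """Days an installation overlaps the reporting window [start, end)."""
    if removed is None:          # still installed: count through the end of the window
        removed = end
    overlap = (min(removed, end) - max(installed, start)).days
    return max(overlap, 0)

# Exposure during calendar year 2001, which a bare "days installed" total cannot give.
window = (date(2001, 1, 1), date(2002, 1, 1))
for serial, installed, removed in installs:
    print(serial, days_in_service(installed, removed, *window))
```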

            Asset identification is another problem. If General Electric is stored as “GE,” “G.E.,” “General Elec.,” and “General Electric,” the software does not know that these are really the same manufacturer. Some cleanup is necessary.
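
A minimal cleanup step maps every known spelling of a manufacturer to one canonical name. The alias table and function name are hypothetical; a real table would be built by reviewing the distinct values actually present in the database.

```python
import re

# Hypothetical alias table: cleaned-up spelling -> canonical manufacturer name
CANONICAL = {
    "ge": "General Electric",
    "general elec": "General Electric",
    "general electric": "General Electric",
}

def normalize_manufacturer(raw):
    """Return the canonical name for a manufacturer, or the trimmed input if unknown."""
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())  # drop periods and other punctuation
    key = " ".join(key.split())                   # collapse repeated spaces
    return CANONICAL.get(key, raw.strip())

for name in ["GE", "G.E.", "General Elec.", "General Electric"]:
    print(f"{name!r} -> {normalize_manufacturer(name)}")
```

Names that do not match are passed through unchanged so they can be reviewed and added to the table rather than silently merged.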

When a piece of equipment is removed from its installed location and sent for repair, a good data mining program should prompt you to enter the reason for the repair.

            The importance of reliability

Companies use predictive maintenance to achieve a higher degree of reliability. This does not mean that spending money on expensive predictive maintenance solutions is right for every plant. Equipment reliability is not the software; equipment reliability is the intelligent use of knowledge and information to increase mean time between failures (MTBF), using tools such as root cause analysis. If data mining is cost-effective in promoting reliability, then it is a good investment. If not, don't waste your money.
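
For reference, MTBF for repairable equipment is commonly computed as total operating time divided by the number of failures during that time. This is the standard textbook definition, not a formula given in this article:

```latex
\mathrm{MTBF} = \frac{\text{total operating time}}{\text{number of failures}}
```

For example, a pump that runs 8,760 hours in a year and fails three times has an MTBF of 8,760 / 3 = 2,920 hours. A reliability program aims to raise that number.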

            Fig. 4. Boiler feed water pump

The success of any reliability strategy depends on the depth and precision of the data collected, on how those data are organized and categorized, and on an analytical application that allows apples-to-apples comparisons.

One company ensures accurate data by capturing equipment specifications, evaluating operating conditions, and inventorying each machine's assemblies and subassemblies. The data are then arranged in a mechanical hierarchy, using noun modifiers to differentiate equipment by its specific engineering and operating characteristics. This approach allows targeted performance data to be extracted from tens of thousands of historical records, regardless of industry or facility, to provide meaningful, accurate direction for the maintenance management of those assets. The results of this analysis are critical to maintenance managers in evaluating equipment reliability, availability, and maintainability.

            Predictive maintenance

Predictive maintenance also means different things to different people. Its value is usually measured in dollars, whether through regulatory requirements or the cost of unexpected equipment downtime. The level, or even the existence, of predictive maintenance depends on the type of equipment to be maintained, the level of regulatory compliance required, the cost of unplanned downtime, and the amount of money a business is willing to spend to know when a critical piece of equipment will break.

Some equipment warrants a deeper level of predictive maintenance than other equipment. Maintenance on a given machine tends to be less critical when the plant has ample backup equipment that is easily deployed, since that redundancy ensures adequate capacity and maintenance flexibility.

If equipment maintenance costs seem to be escalating, data mining can be used to track statistical similarities among the equipment. Are there commonalities among the manufacturers of motors, drives, filters, belts, or other components that correlate with MTBF? Do motors or belts from one manufacturer fail more often than those from another?
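
Comparing MTBF by manufacturer is a grouped version of the standard calculation: pool operating hours and failures for each manufacturer, then divide. The vendor names and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical belt records: (manufacturer, operating hours observed, failures in that time)
belt_history = [
    ("Vendor X", 8760, 4),
    ("Vendor X", 8760, 3),
    ("Vendor Y", 8760, 1),
    ("Vendor Y", 4380, 1),
]

def mtbf_by_manufacturer(records):
    """Pool operating hours and failures per manufacturer, then divide."""
    hours = defaultdict(float)
    failures = defaultdict(int)
    for maker, op_hours, n_failures in records:
        hours[maker] += op_hours
        failures[maker] += n_failures
    # A manufacturer with no recorded failures has no finite MTBF estimate yet.
    return {m: hours[m] / failures[m] if failures[m] else float("inf") for m in hours}

for maker, mtbf in sorted(mtbf_by_manufacturer(belt_history).items(), key=lambda kv: kv[1]):
    print(f"{maker}: {mtbf:,.0f} h")
```

With only a handful of failures per group, a difference like this is suggestive rather than conclusive; a plant would want more history, or a formal statistical test, before changing suppliers.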

            Another approach

Another data mining approach uses sensor data archived in data historians to develop multivariable models of critical rotating equipment, such as pumps, turbines, generators, and engines. The software builds a model of normal operation, compares incoming sensor data with that multivariable model, and posts anomalies to watch lists, so failures can be spotted before they become catastrophic.

In real time, this technology generates a dynamic band around each signal, using an empirical model to estimate each sensor's value from the values of all the other sensors (see “Data mining historical sensor data”). Signal excursions outside this dynamic band provide the earliest possible warning of trouble, even while each signal is still within its normal operating range.
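
The dynamic-band idea can be sketched as follows, assuming the model's estimate for each reading is already available. The band half-width of three standard deviations of the residuals seen during known-good operation is an assumption for illustration, not a parameter taken from the vendor's product, and all values are hypothetical.

```python
import statistics

def band_width(normal_residuals, k=3.0):
    """Half-width of the band: k standard deviations of residuals from normal operation."""
    return k * statistics.stdev(normal_residuals)

def excursions(actual, estimate, half_width):
    """Indices where a signal leaves the band centered on its model estimate."""
    return [i for i, (a, e) in enumerate(zip(actual, estimate)) if abs(a - e) > half_width]

# Hypothetical residuals (actual minus estimate) from a period of known-good operation.
normal_residuals = [0.02, -0.03, 0.01, 0.00, -0.02, 0.03, -0.01, 0.01]
hw = band_width(normal_residuals)

# Vibration (mils) and the model's estimate of what it should be, given the other sensors.
actual = [1.00, 1.02, 1.01, 1.25, 1.48]
estimate = [1.01, 1.00, 1.02, 1.03, 1.02]
print(excursions(actual, estimate, hw))
```

The fourth and fifth readings (indices 3 and 4) are flagged because they depart from what the other sensors predict, even though vibration of 1.25 or 1.48 mils could still be below a fixed alarm limit. Only the excursions are returned; normal readings produce no output.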

            The science behind this method applies nonparametric regression, which provides high-fidelity modeling capabilities for early detection of abnormal performance. The process models normal equipment performance through its full dynamic range of operation to establish a threshold for the earliest possible warning. It is exception driven, meaning it does not report normal operation — only deviations from it.
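
The article does not describe the vendor's algorithm. As one textbook example of nonparametric regression, the sketch below uses Nadaraya-Watson kernel regression: the estimate for a sensor is a Gaussian-weighted average of values remembered from normal operation, weighted by how closely the other sensors' current readings match each remembered state. The memory, the per-signal scales, and the bandwidth are hypothetical.

```python
import math

# Hypothetical memory of normal operation:
# (turbine load %, first-stage pressure psig) -> inboard bearing vibration (mils)
memory = [
    ((40.0, 300.0), 0.60),
    ((60.0, 420.0), 0.80),
    ((80.0, 540.0), 1.00),
    ((100.0, 660.0), 1.20),
]

def kernel_estimate(query, memory, scales, bandwidth):
    """Nadaraya-Watson estimate: Gaussian-weighted average of remembered outputs."""
    weights = []
    for inputs, _ in memory:
        # Scale each input so signals in different units contribute comparably.
        dist2 = sum(((q - x) / s) ** 2 for q, x, s in zip(query, inputs, scales))
        weights.append(math.exp(-dist2 / (2 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0:
        raise ValueError("query is far outside the remembered operating range")
    return sum(w * y for w, (_, y) in zip(weights, memory)) / total

expected = kernel_estimate((80.0, 540.0), memory, scales=(20.0, 120.0), bandwidth=1.0)
print(f"expected vibration: {expected:.2f} mils")
```

Because no functional form is assumed, the same code works for any mix of input sensors; the residual between the actual reading and this estimate is what would be compared against a tolerance band.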

This technology is well suited to critical rotating equipment and to industrial problems such as:

          • Compressor fouling, erosion, or tip rubs

          • Compressor brush seal leakage

          • Bleed flow problems

          • Inlet guide vane (IGV) problems

          • Combustor transition piece failures

          • Plugged fuel nozzles

          • Combustor bypass valve problems

          • Damage to turbine blades

          • Stator cooling system problems

          • Lube oil system problems

          • Instrumentation failures and drifts

          • Operating inefficiency

          • NOx and CO anomalies.

CMMS and data mining

Some of the newer CMMS/EAM software includes knowledge management tools. These systems not only manage work orders so that they are completed on time and in priority order, but also collect the history of what has been done.

Workflow is also used to manage the purchasing process, routing each purchase order through the steps needed to gather information such as approved vendors, pricing, and approvals, and finally transmitting the order to the vendor.

Workflow engines direct the work (Fig. 3). For example, workflow could send the next highest-priority work order to a wireless hand-held device on the shop floor and, at the same time, retrieve specific work instructions if the task is an infrequent one, such as a bearing or motor change.
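
Dispatching the next highest-priority work order is, at its core, a priority queue. This Python sketch uses the standard `heapq` module; the class and method names, and the convention that priority 1 is most urgent, are assumptions for illustration.

```python
import heapq
import itertools

class WorkOrderQueue:
    """Hypothetical dispatcher: hands out the highest-priority open work order first."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: equal priorities go first-in, first-out

    def add(self, priority, order_id, description):
        # heapq is a min-heap, so priority 1 is treated as most urgent.
        heapq.heappush(self._heap, (priority, next(self._seq), order_id, description))

    def next_for_technician(self):
        """Pop the next order to send to a hand-held device, or None if none are open."""
        if not self._heap:
            return None
        _priority, _seq, order_id, description = heapq.heappop(self._heap)
        return order_id, description

queue = WorkOrderQueue()
queue.add(3, "WO-118", "Lubricate conveyor bearings")
queue.add(1, "WO-121", "Replace boiler feed pump motor")
queue.add(2, "WO-119", "Inspect fan belt")
print(queue.next_for_technician())
```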

Data from completed work orders are combined with shop floor data collection and condition monitoring to build a history of each piece of equipment in the plant. Data mining, as part of a predictive maintenance module, can then identify trends indicating a need for equipment replacement or additional preventive maintenance.
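
One simple way to turn condition-monitoring history into a trend is a least-squares line through the readings, projected forward to an alert limit. The readings, the 2.0-mil limit, and the assumption that degradation is linear are all hypothetical; real wear often accelerates, so a linear projection tends to be optimistic.

```python
# Hypothetical weekly vibration readings (mils) from condition monitoring on one pump.
weeks = [0, 1, 2, 3, 4, 5]
vibration = [1.00, 1.05, 1.12, 1.18, 1.22, 1.30]
ALERT_LIMIT = 2.0  # hypothetical level at which the plant would schedule a bearing change

def linear_trend(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = linear_trend(weeks, vibration)
if slope > 0:
    weeks_to_limit = (ALERT_LIMIT - intercept) / slope
    print(f"trend {slope:.3f} mils/week; limit reached around week {weeks_to_limit:.0f}")
```

On these readings, the slope is about 0.059 mils per week and the projection crosses the limit around week 17, which gives the planner a window for scheduling the work.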

However, a CMMS alone can fall short of data mining goals unless it is specifically designed to provide the necessary information. Historically, CMMS programs have been poor at tracking failure causes and failure progression. The apparent reason for a failure is usually what gets tracked, not the root cause.

Linking and maintaining design documents, condition data, overhaul vendor documents, and maintenance procedures for specific equipment, overhauls, and locations is difficult or infeasible in a CMMS. Tracking specific equipment by tag number or serial number is also difficult or impossible. Often a CMMS can track components, but the task is so awkward that the capability goes unused.

Reliability analysis is not easy to accomplish in a CMMS. For example, a CMMS may not readily answer questions such as:

            • Which areas or equipment have low MTBF?

            • Which areas or equipment have high replacement costs?

            • What are the root causes of failure, by area or equipment?

Many CMMS systems do not track warranties on new equipment or overhauls. Solving this problem can save companies a lot of money.
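
Warranty tracking needs little more than the service date, the warranty length, and the failure date. The record layout and function name below are hypothetical, and warranty terms are simplified to a fixed number of days.

```python
from datetime import date, timedelta

def under_warranty(service_date, warranty_days, failure_date):
    """True if a failure falls within the warranty period starting at installation or overhaul."""
    return service_date <= failure_date <= service_date + timedelta(days=warranty_days)

# Hypothetical overhaul records: (equipment tag, overhaul date, warranty length in days, failure date)
overhauls = [
    ("P-101", date(2002, 1, 15), 365, date(2002, 8, 2)),
    ("P-102", date(2001, 3, 1), 180, date(2001, 11, 20)),
]

for tag, overhauled, days, failed in overhauls:
    if under_warranty(overhauled, days, failed):
        print(f"{tag}: failure on {failed} is within warranty; contact the overhaul vendor")
```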

                PLANT ENGINEERING magazine extends its appreciation to 24/7 Systems, Inc., HSB Reliability Systems, Indus International, and SmartSignal for the use of their materials in the preparation of this article.

                Data mining historical sensor data

Boilers, turbines, generators, and other rotating equipment are examples of high-value assets that cause costly production problems when they fail or require unplanned maintenance. To study such failures, SmartSignal, Lisle, IL, built a data mining model that recreated a turbine-driven pump failure, including estimates of vibration, pressure, and flow.

Historical process data (shown below) are from a boiler feed water pump that experienced an unplanned failure. In the simulation of that failure, the data mining software had to demonstrate that it could not only identify the problematic component but also detect signs of incipient failure earlier than traditional plant monitoring technology.

The model used to recreate the failure was derived from the actual failure data. Multivariable sensor data from data historians show conditions before and during the equipment failure. Because a turbine drove the pump, the sensor data included turbine rotor vibration at the inboard bearing, turbine first-stage pressure, and pump discharge flow rate.

The pump and turbine models were applied to three weeks of data to look for early warning of failure. Neither the pump model nor the pump-turbine efficiency model revealed signs of failure. The turbine model, however, provided interesting results.

Both turbine vibration signals alerted briefly following a pump startup and then alerted continuously following a second pump startup two days later. The alerts continued until the turbine failed catastrophically six days later, when vibration jumped to 7 mils and tripped the high-vibration shutdown interlock.

The early vibration alerts are notable because they flagged process variations of only 0.5 mils, which are within the turbine protection system's vibration threshold limits. They came six days before the catastrophic failure.

By analyzing the critical data and isolating the specific failure mode, the software could have provided early warning of the impending failure. Had it acted on that warning, the plant maintenance department could have prevented a catastrophic failure. Early action reduces maintenance expense by limiting the extent of damage and minimizing the duration of forced downtime.

The top graph shows turbine rotor vibration; the middle graph shows turbine first-stage pressure; the bottom graph shows pump discharge flow.