Using software in predictive maintenance, part 1: Understanding maintenance fundamentals

Michael Hardy and Ed Garibian cover maintenance fundamentals and how they help engineers improve their efficiency as well as what total productive maintenance (TPM) can do.

By Plant Engineering November 8, 2023

Predictive and preventive maintenance insights

  • Proactive maintenance strategies, when coupled with a computerized maintenance management system (CMMS), enhance operational efficiency by enabling advanced preparation, preventing costly downtime, and ensuring regulatory compliance through effective maintenance planning.
  • Utilization-based techniques in maintenance scheduling, facilitated by a CMMS, optimize asset ownership costs and equipment uptime. This approach, reliant on data-driven decision-making, requires proper system setup and accurate asset registry.
  • Total productive maintenance (TPM) is a comprehensive methodology focused on reducing maintenance costs and increasing production output. Autonomous maintenance, a key pillar of TPM, involves operators using digital checklists to monitor machine health, facilitating two-way communication and proactive maintenance actions.

Proactive maintenance strategies, coupled with a computerized maintenance management system (CMMS), can drive a plant’s efficiency and optimize the use of resources.

Creating automated preventive maintenance schedules is the industry standard for proactive digital maintenance management. However, once preventive maintenance becomes the norm, plants can push further and enhance their operations by instituting proactive maintenance as a company-wide responsibility. This extends a plant’s workforce far beyond the handful of dedicated maintenance workers to other departmental staff.

One of the best ways to minimize asset cost of ownership and maximize plant equipment uptime is by implementing sound enterprise asset management (EAM) principles and integrating them with well-defined techniques that lead to many sources of proactive and predictive maintenance.

Michael Hardy, vice president of asset management at Bureau Veritas, and Ed Garibian, CEO of Llumin, go over maintenance fundamentals and what engineers need to know in this excerpt from the Oct. 17, 2023, webcast: “Using software in predictive maintenance.” This has been edited for clarity.

Michael Hardy: I want to talk a bit about what maintenance is, and certainly folks listening in today are experts in that. But just so we’re all on the same page using the same terms, I think the first thing that I would like us all to agree on today is that the idea of “if it ain’t broke, don’t fix it” is definitely not the topic for today and probably shouldn’t be in anybody’s playbook for 2023. Even if we were only here talking about sustainability, that alone should be reason enough to abandon any kind of run-to-failure asset management strategy.

So recurring maintenance, what we’re talking about today, we’re going to use a lot of terms that sound a lot alike. And of course, all maintenance is by nature recurring, otherwise you’d be fixing things as they broke. But proactive approaches to maintenance really are essential for the longevity and optimal performance of assets. And the benefits of proactive strategies are really obvious to anyone who’s responsible for engineering operations in really any industry. Let’s highlight some of those benefits.

Advanced preparation, of course, allows for scheduling of maintenance to prevent downtime and ensure uninterrupted operations, which is everyone’s goal. That also allows us to plan the activities so we can effectively allocate resources. And so we have the right people, the right tools, the right materials to do the job when it’s scheduled, and not get halfway through it and realize you don’t have something. Preventive maintenance, for those of us who work in this industry, it’s not expensive, it’s priceless. Costly downtime, breakdowns and inefficient operations can all be avoided, as well as the costs associated with those. As you know, not everybody setting budgets agrees that that’s the case, but I think today we’ll really talk through how it really should be obvious even to them.

And finally, safe operations and regulatory compliance also rely heavily on effective proactive maintenance strategies. So whether we’re talking about CGMP regulations that maybe are enforced by the U.S. FDA for a whole host of food and medical related industries, or OSHA, EPA, Joint Commission if you’re in healthcare, ACCME accreditation, all of these things really rely on effective, well-documented maintenance strategies that are, of course, proactive. The benefits seem obvious, but far too often engineering departments are underfunded and short-staffed, leaving less time for preventive, proactive maintenance activities.

How do we get from there to optimization? Optimizing recurring maintenance is all about getting the most benefit from the maintenance activities that we perform. And one size here does not fit all. Let’s talk a little bit about some of the terms here as well. So we have predictive maintenance and preventive maintenance. They’re not the same thing; they share some similarities but have many differences. Preventive maintenance is scheduled based on accepted intervals between maintenance activities, and I always think about this as changing the oil in the car. We understand it’s got to happen, we’re going to get it on a schedule and do it.

Predictive maintenance optimization

Predictive maintenance, on the other hand, relies on maybe sensors or inspections, some kind of feedback to the system that allows us to time our maintenance activities to the ideal indication of condition. This allows us to perhaps stretch out our maintenance activities, not do them as often, or it may indicate based on some factors we’ll talk about in a moment, that we need to be doing maintenance more often.

Optimizing recurring maintenance relies on a combination of both preventive and predictive strategies. Again, one size does not fit all. Preventive maintenance sometimes can feel like the poor relation of predictive maintenance because there’s so much focus these days on IoT sensors, high technology and the kinds of solutions that we’d all think are pretty cool and see at conferences. But one proactive strategy we’re going to spend a minute on, which some people consider sort of a bridge between the two, is a strategy that relies on utilization data to drive the preventive maintenance schedule and therefore allows us to optimize that.

Utilization-based techniques are far superior to pure calendar-centric models. They give us more information to go by and they help us to refine the schedule that we have. At the same time, taking advantage of utilization techniques requires a level of maintenance management sophistication and maturity, which is really where having a computerized maintenance management system is essential in keeping track of all of that.

When we talk about utilization driven recurring preventive maintenance, the concept is very simple, if we use the car example again, the dealer may instruct us to do a major maintenance at 20,000 miles, and that’s really expensive. Usually it’s the first big one that we have to do. But imagine if it wasn’t at 20,000 miles, but instead they said you have to do it at six months regardless of how far you’ve driven. So that wouldn’t make sense to us. Applying utilization makes a lot of sense when we’re talking about maintaining our machinery and equipment. There’s quite a bit of our equipment that we can maintain more effectively by adopting this kind of model.
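The difference between a usage trigger and a calendar trigger can be sketched in a few lines. This is a minimal illustration, not how any particular CMMS implements scheduling; the `PmTask` fields and the `is_due` function are hypothetical names, and the optional calendar backstop is an assumption added for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PmTask:
    """A recurring task driven by a usage meter, with an optional calendar backstop."""
    name: str
    usage_interval: float            # e.g. miles or run hours between services
    last_service_reading: float      # meter reading at the last service
    last_service_date: date
    max_days: Optional[int] = None   # hypothetical calendar limit; None disables it


def is_due(task: PmTask, current_reading: float, today: date) -> bool:
    """Due when usage since the last service reaches the interval,
    or when the optional calendar backstop has expired."""
    usage_due = current_reading - task.last_service_reading >= task.usage_interval
    calendar_due = (
        task.max_days is not None
        and today - task.last_service_date >= timedelta(days=task.max_days)
    )
    return usage_due or calendar_due


# The car example: a 20,000-mile service with no calendar limit.
major_service = PmTask("major service", 20_000, 0, date(2023, 1, 1))
lightly_used = is_due(major_service, 8_000, date(2023, 7, 1))    # six months, few miles
heavily_used = is_due(major_service, 20_500, date(2023, 7, 1))   # interval exceeded
```

With a pure six-month calendar rule, both vehicles would be serviced on the same day; with the usage rule, only the one that has actually accumulated the miles is flagged.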

The use of a computerized maintenance management system (CMMS), of which there are many different iterations, set up almost uniquely sometimes at each site, greatly enhances the ability to manage utilization-driven PM. But proper setup is really the key. So, operations and engineering staff need to be willing and able to invest the time, either on their own or working with a consultant, to make sure that the asset registry is complete and accurate and that utilization is being measured at the right tier of the asset hierarchy. You can’t manage what you don’t measure, but if you’re measuring the wrong things, it tends to be even worse because you’ll be managing your utilization to essentially the wrong standard.

Also, we can further refine the utilization driven maintenance schedule, certainly for some equipment by considering additional variables, we might be looking at how the asset’s being used. If it’s a vehicle asset, how that’s being used is a big deal. Is it carrying loads? Is it on nice smooth roads or doing off highway work, climbing up the mountain? These things affect some of the straight miles driven or hours used, and they may affect how we change those schedules.
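One way to picture this refinement is to weight raw usage by how hard that usage was before comparing it with the service interval. The sketch below is only an illustration; the condition names and multiplier values in `SEVERITY_FACTORS` are invented for the example, and real values would have to come from OEM guidance or a site’s own history.

```python
# Hypothetical severity multipliers (assumed values, not from any standard).
SEVERITY_FACTORS = {
    "highway": 1.0,       # smooth roads, no load
    "loaded": 1.3,        # carrying loads
    "off_highway": 1.8,   # rough terrain, steep grades
}


def effective_usage(segments):
    """Sum raw usage (miles or hours) weighted by the severity of each operating condition.

    `segments` is an iterable of (condition, amount) pairs.
    """
    return sum(amount * SEVERITY_FACTORS[condition] for condition, amount in segments)


# A vehicle with 17,000 raw miles, part of them under load or off highway.
history = [("highway", 10_000), ("loaded", 5_000), ("off_highway", 2_000)]
weighted_miles = effective_usage(history)
service_due = weighted_miles >= 20_000
```

In this example the vehicle reaches the 20,000-mile service threshold on weighted miles even though the odometer shows fewer, which is the kind of schedule adjustment the paragraph above describes.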

Another thing that might allow us to change that is the products that are being produced, perhaps on a production line. A good CMMS will allow you to switch from one product to another, one use to another, and have all of the configurations, the setup, the procedures, the timing, all of that stuff, reflect the current use of that asset.

It’s really important to think about all these variables, but again, that requires a certain amount of commitment, sophistication. And if you don’t have that or you don’t have the time for that in your shop, then talk with your CMMS provider and make sure that you’ve found a way to address that, get some outside help or however you have to do it, because having a CMMS to do all of this and then not getting it set up properly is sort of worse than not having it maybe at all.

Effective maintenance management strategies also tie into this idea of the mean time to repair (MTTR), right? If we have an effective maintenance program, we’re using a combination of proactive strategies, preventive maintenance, predictive maintenance, utilization-based scheduling, and ultimately what we want to make sure that we take advantage of is the ability to minimize our MTTR. You don’t want to get halfway through maintenance and realize you don’t have the right materials, the spare parts aren’t there, or you’re not following the correct procedures.

You also need to understand that minimizing MTTR is really a multifaceted process that involves keeping all of our procedures and resources completely up to date and communicating effectively with all the parties that are involved in scheduling or performing those activities. And we have to be able to monitor our performance by continuously striving for improvement and keeping track of how that’s going at all times. CMMS software, when it is properly set up, configured and managed, is a major contributor to an effective preventive maintenance program.
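For readers who want to track this metric, MTTR is commonly computed as total repair time divided by the number of repairs. The sketch below assumes each repair is recorded as a pair of timestamps (failure reported, asset restored); the function name and record layout are hypothetical, not taken from any CMMS.

```python
from datetime import datetime


def mean_time_to_repair(repairs):
    """Average hours from failure report to return-to-service.

    `repairs` is a list of (reported, restored) datetime pairs for completed repairs.
    """
    if not repairs:
        raise ValueError("no completed repairs to average")
    total_seconds = sum(
        (restored - reported).total_seconds() for reported, restored in repairs
    )
    return total_seconds / len(repairs) / 3600


# Two completed repairs, one of 2 hours and one of 4 hours.
completed = [
    (datetime(2023, 10, 2, 8, 0), datetime(2023, 10, 2, 10, 0)),
    (datetime(2023, 10, 9, 13, 0), datetime(2023, 10, 9, 17, 0)),
]
mttr_hours = mean_time_to_repair(completed)
```

Tracking this number over time is one way to see whether parts availability, procedures and communication are actually improving.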

So I’m going to switch this over and introduce Ed, who’s going to talk about another type of proactive maintenance strategy, which is called total productive maintenance (TPM).

The role of total productive maintenance in manufacturing

Ed Garibian: TPM is essentially a commitment by an organization. It’s a methodology that’s focused on improving and maintaining machinery or plant infrastructure, that sort of thing. And the cool thing about TPM is its aim is really to do two things. One, continuously drive down costs of maintenance, cost of ownership of the asset, but also simultaneously increase production output. It’s quite an impactful methodology. And these are the eight pillars of TPM and its philosophy. We’re going to talk about autonomous maintenance in a little more detail today. All of these pillars impact your maintenance strategies and contribute to elevating your overall maintenance processes. But we’re going to talk a little bit more in detail about autonomous maintenance because it is so impactful to our discussion today.

So what you’re seeing is an application that’s focused on the user experience, the operator experience, in a TPM or autonomous maintenance scenario. On the right, you see the application embedded in a mobile device, running in a browser, could be a phone, tablet, whatever. The application’s actually embedded right into an HMI screen, an HMI application. The screen you see open is basically a TPM checklist, and this can be run at the beginning of a shift, once a day or whatever periodicity makes sense for the operation. And the form itself is configured based on obviously the machine that the operator’s using, the attributes of that machine, what’s important, and may partly be driven by the process that’s running on the machine.

The types of questions or queries that can be presented are, first, certainly at the minimum, a checklist for making sure that the operator did whatever they were supposed to do before they get started. Another dataset might be taking entries based on observations that the operator’s making before they get started, environment observations, that sort of thing. And finally, measurements. Measurements can be recorded on the same form or as part of the same process.

Behind all that, as the data is collected, rules can be put around that data so that automatically, if there’s an attribute that’s collected that’s out of spec or that just needs to be looked at, automatic notifications can occur to either other folks in production or maybe the maintenance department or the engineering department. The other nice thing about this is that the tool should allow an operator at any time, not just during a TPM inspection or whatever, but anytime while they’re running the machine, to easily create an immediate work request or work order based on maybe an anomaly they observe or whatever. This really facilitates two-way communication and also near real-time communication when needed.
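The kind of rule described here, applied to a submitted operator checklist, might look like the following. It is a minimal sketch under stated assumptions: the `Measurement` record, the `review_checklist` function and the example limits are all hypothetical, and a real system would route the findings to notifications or work requests rather than return strings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Measurement:
    """One operator-recorded value with its acceptable range (hypothetical structure)."""
    label: str
    value: float
    low: float
    high: float


def review_checklist(checks: Dict[str, bool], measurements: List[Measurement]) -> List[str]:
    """Return one finding per incomplete checklist item or out-of-spec measurement."""
    findings = []
    for item, done in checks.items():
        if not done:
            findings.append(f"Checklist item not completed: {item}")
    for m in measurements:
        if not (m.low <= m.value <= m.high):
            findings.append(
                f"{m.label} out of spec: {m.value} (allowed {m.low} to {m.high})"
            )
    return findings


# A start-of-shift submission with one skipped item and one high reading.
findings = review_checklist(
    {"Guards in place": True, "Work area clear": False},
    [Measurement("Oil temperature (C)", 92.0, 40.0, 85.0)],
)
```

Each finding could then become a notification to production, maintenance or engineering, or a work request, depending on the organization’s rules.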

As part of the communication strategy, information can be presented back to the operator from engineering or maintenance: upcoming maintenance schedules maybe, information about the asset, O&M documentation, safety documentation, et cetera.

In terms of the impact overall, the whole notion of TPM outside of just autonomous maintenance is a proactive strategy, no question about it. But as you can see from an autonomous maintenance perspective, just that one pillar produces so much of an impact because it facilitates a two-way communication between operations, maintenance and engineering. And autonomous maintenance can also, being so proactive, dramatically elevate any type of sustainability initiatives. It can give you opportunities to save resources, prevent loss, worker safety, and ultimately even looking at areas of energy spend and all that.

Michael Hardy: Okay, so with that foundation of total productive maintenance and understanding how that comes together. Next, Ed, let’s talk about adding in the predictive maintenance to the proactive maintenance strategy.

Ed Garibian: Yes, obviously adding an element of predictive technology and predictive maintenance to your maintenance processes is certainly a game changer. Obviously it has to be done in the right context, but it’s super impactful. Here are the tenets of that. Michael also alluded to this early in the presentation: condition monitoring is sort of the foundation. You’re monitoring machine data, setting rules around that data, thresholds, and then using predictive algorithms and modeling to determine outcomes. And then taking a risk-based approach to deciding when to invest in predictive maintenance is another part of what we’re going to chat about. And finally, once an outcome has been indicated that there’s potentially a fault or a failure, what actions to take and how to set those up.

Essentially, condition monitoring is ongoing monitoring of the health of an asset. Parameters have to be smartly selected based on the asset type and the machine type that indicate health. And so in some cases, temperatures, some cases vibrations, others, pressures or in certain scenarios based again on the asset type, there might be multiple conditions or multiple KPIs that have to be simultaneously monitored in order to really get a true indicator of the health of the machine. Either way, these things all have to be considered and managed in the system.

And then once the condition parameters are set up and monitored, now we have to start setting some rules around these to act against. One way to set up rules is strictly by thresholds. Whether it’s a single parameter or multiple parameters, you can set rules that just say, look, if these conditions or these parameters go to a certain level that’s too high or too low, set up triggers that either set alarms or notifications or create actions that are triggered based on these thresholds being violated.
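A threshold rule set of this kind can be sketched as data plus one evaluation function. The parameter names, limits and action labels below are invented for illustration; they are assumptions, not values from the webcast or from any product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ThresholdRule:
    """Fire `action` when a monitored parameter leaves its allowed band."""
    parameter: str
    low: Optional[float]     # None means no lower limit
    high: Optional[float]    # None means no upper limit
    action: str              # e.g. "notify", "alarm", "create_work_order"


def evaluate(rules: List[ThresholdRule], readings: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (parameter, action) for every rule whose threshold is violated."""
    triggered = []
    for rule in rules:
        value = readings.get(rule.parameter)
        if value is None:
            continue  # no reading for this parameter in this sample
        too_low = rule.low is not None and value < rule.low
        too_high = rule.high is not None and value > rule.high
        if too_low or too_high:
            triggered.append((rule.parameter, rule.action))
    return triggered


rules = [
    ThresholdRule("bearing_temp_c", None, 80.0, "create_work_order"),
    ThresholdRule("oil_pressure_psi", 25.0, None, "alarm"),
]
actions = evaluate(rules, {"bearing_temp_c": 85.0, "oil_pressure_psi": 30.0})
```

Keeping the rules as data rather than code means limits can be tuned per asset type without changing the evaluation logic.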

Another way of doing this or using data, another use case would be not only based on absolute data thresholds and values, but combining real-time data values with historical data that’s collected. And then using predictive modeling to then say, okay, here’s a forecasted failure or forecasted degradation based on these models. And the models can be regression analysis, time series, and ultimately machine learning and AI, all can contribute to another way of looking at data and doing predictive modeling. Either way, rules are super important.
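The simplest version of this idea is a straight-line trend: fit a line to historical readings and estimate when it will cross a limit. The sketch below uses ordinary least squares on made-up vibration data; it stands in for the regression case only, and the function names, sample values and threshold are all assumptions. Production systems would typically use richer models and validate them against real failure history.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same x value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours, values, threshold):
    """Estimate run hours from the latest sample until the fitted trend reaches
    `threshold`. Returns None if the trend is flat or falling."""
    slope, intercept = fit_line(hours, values)
    if slope <= 0:
        return None
    crossing_hour = (threshold - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])


# Hypothetical vibration readings (mm/s) sampled every 10 run hours.
run_hours = [0, 10, 20, 30]
vibration = [2.0, 2.5, 3.0, 3.5]
remaining = hours_until_threshold(run_hours, vibration, threshold=5.0)
```

A forecast like `remaining` can then feed the same rule mechanism as an absolute threshold, for example by raising a work order when the estimated time to the limit drops below the lead time for parts.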

One factor that has to be addressed is the investment. So, how do you decide what assets to invest this technology into? Well, risk-based asset management gives us a framework that makes it a little easier to decide where to invest. Really the most important piece of this, at least in my opinion, is to get an understanding, across the database of your assets, of what the cost of an asset going down is. And that to me is something that you set up, and consequence of failure and its associated levels and ratings really will determine how to invest. And a consequence of failure is basically, like I said, what does it cost when this thing goes down, how much does it cost to fix, spare parts, labor or replacement? And most importantly, if there is a lengthy time to bring this asset back up and running, depending on its criticality to the operation, how much are we losing in production dollars and operations dollars? Those are all factors that contribute to a rating of consequence of failure.

Also, workforce safety might be an element. What happens? Is there a risk to our workforce if this thing goes down, or risk to EPA compliance or whatever? These are all factors that have to be looked at. Then you set consequence of failure levels and ratings and act on them. And over time, of course, as the asset ages and you’re monitoring its health, its performance over time, its conditions and all that, probability of failure will kick in as well. And that overall risk plays a role in deciding over time what assets you’re going to be continually investing in on a predictive basis.
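A common way to combine these two ratings, assumed here rather than stated in the webcast, is to score risk as consequence of failure multiplied by probability of failure and rank assets by that score. The asset names, the 1-to-5 consequence scale and the probabilities below are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AssetRisk:
    name: str
    consequence: int      # hypothetical scale: 1 (minor) to 5 (severe)
    probability: float    # estimated likelihood of failure in the planning period, 0 to 1

    @property
    def score(self) -> float:
        """Risk score as consequence times probability (an assumed convention)."""
        return self.consequence * self.probability


def rank_by_risk(assets: List[AssetRisk]) -> List[AssetRisk]:
    """Highest-risk assets first: candidates for predictive-maintenance investment."""
    return sorted(assets, key=lambda a: a.score, reverse=True)


assets = [
    AssetRisk("chiller", consequence=5, probability=0.2),
    AssetRisk("conveyor", consequence=3, probability=0.5),
    AssetRisk("exhaust fan", consequence=1, probability=0.9),
]
ranked_names = [a.name for a in rank_by_risk(assets)]
```

Because probability of failure changes as an asset ages, the ranking is worth recomputing periodically rather than treating it as a one-time exercise.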

Once we’ve set up rules for predictive maintenance and we’ve decided which assets it makes sense to invest in, now we have to create some actions that actually are meaningful and just in time. If our predictive systems indicate a potential for failure, actions and rules have to be in place so that upon that incident, we get in front of it, we remediate and we prevent a downtime incident from occurring. And these actions are based on these four attributes. Obviously, we talked about criticality already and how important the asset is. Certainly, that’s going to affect what actions you take when there’s an impending fault. Asset type, what type of asset it is, that’s going to affect what you do and how you react, as will parameters and organizational policies. Certainly organizational policies are going to truly impact what gets done.

The thing that has to be acknowledged about this is that it’s based on context. Even if the same exact incident, the same exact condition has occurred, one that means there’s something impending in terms of a failure, the response depends on its location, on the time of day, on the day of the week or whatever. These are all context attributes that impact how an organization will react. In other words, who gets notified, who gets assigned the work? Is it an inside person? Is it a contractor? What suppliers do we engage if there are parts needed? All of that is based on an organizational strategy or policy, and that’s influenced by other aspects, not just the asset itself.
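Context-dependent routing can be pictured as a small policy function that takes the same alert and returns a different response depending on when it fires. Everything in the sketch below is hypothetical: the shift hours, the criticality labels, and who gets assigned or notified are placeholder policy values that each organization would define for itself.

```python
from datetime import datetime


def route_alert(criticality: str, when: datetime) -> dict:
    """Choose assignee, notifications and priority for a predicted fault.

    Assumed policy: weekends and hours outside 07:00-19:00 count as off-hours.
    """
    off_hours = when.weekday() >= 5 or not (7 <= when.hour < 19)
    if criticality == "high":
        return {
            "assignee": "on-call contractor" if off_hours else "in-house technician",
            "notify": ["maintenance manager"],
            "priority": "urgent",
        }
    return {
        "assignee": "in-house technician",
        "notify": [],
        "priority": "next shift" if off_hours else "today",
    }


# The same high-criticality alert on a weekday afternoon and on a Saturday.
weekday_plan = route_alert("high", datetime(2023, 10, 17, 14, 0))
weekend_plan = route_alert("high", datetime(2023, 10, 21, 14, 0))
```

The two plans differ only because of the timestamp, which is the point made above: the condition is identical, but the organizational response is not.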

So an example of this might be setting up conditions and rules around those conditions, and then using those attributes, those factors, those elements that we just talked about to form outputs and outcomes that include notifications, workflows, escalation, and ultimately actions and assignments. So these are all part of what I think makes a lot of sense: if you’ve invested in predictive technology, you make sure that once those outcomes are indicated, you’re planning ahead and ready to act.