Analyze oil thoroughly
Individual results are not individual when it comes to maintaining the health of your machinery. This is why:
- Individual statistics in an oil analysis report don’t exist in a vacuum but closely correlate with each other.
- The common element of many erroneous report interpretations is poorly managing the time factor.
- Laboratories are often blamed for machine failure when the equipment manager doesn’t read the report thoroughly (or at all).
There is a tendency to focus on one aspect of an oil analysis report and ignore everything else, such as looking only at the most recent sample or trending only one test. When sample results are not reviewed thoroughly, they can become meaningless or even detrimental to determining the root cause of a machinery problem. Missing one piece of the puzzle sends technicians in the wrong direction, which wastes time and money.
Individual results in the sample report are not actually individual. Every item of data correlates to another item of data to form the big picture.
Mistakes and consequences
Experts from all fields that work with oil analysis have seen common mistakes that often lead to dire consequences for the machinery. Darren Goll, Reliability, CNRL Albian Sands, is one of those experts.
“One of the major mistakes by many customers I’ve observed is overreacting to a sample with one reportable element,” Goll said. “If there are no severe elements or fluid breakdown in a sample, many times the customer can either change the oil and keep sampling to develop a trend or just run the component and take frequent samples to quickly establish a trackable and historical trend and then take action accordingly. This gives them valuable historical data for the future.”
Michael Potts, CET, CMRP, regional account manager at Fluid Life, cites looking at elements in isolation as an example. “There are many mistakes I see made every day when interpreting used oil analysis results,” he said. “The most common one is not looking at the whole picture. People tend to focus heavily on the colors or flags of one singular test as opposed to looking at the entire report holistically. This can lead to serious mistakes, like tearing an engine apart at the first sign of copper or condemning perfectly good oil because oxidation values are elevated. By only focusing on pieces of data, people lose sight of the whole picture and then blame the lab for flagging things that might not yet be a serious problem.”
According to Potts, “The second most common mistake is jumping to conclusions based on a single sample result as opposed to monitoring a trend of data. A single bad result doesn’t tell you if the condition is abnormal, because without more data you don’t truly know what normal is. When starting a used oil analysis program, the first step should be to establish what the normal operating conditions are so going forward you can confidently recognize when something is irregular.”
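Potts' point about establishing normal before judging a single result can be sketched in a few lines of Python. This is only an illustration: the function name, the iron readings and the choice of "three standard deviations above the baseline mean" as the limit are assumptions made for the example, not a method described by anyone quoted here.

```python
from statistics import mean, stdev

def is_irregular(history, new_value, k=3.0):
    """Return True if new_value exceeds the baseline mean by more than
    k sample standard deviations. The limit rule is an assumed example."""
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    baseline = mean(history)
    spread = stdev(history)
    return new_value > baseline + k * spread

# Hypothetical iron readings (ppm) from earlier samples of one component.
iron_history = [18, 20, 22, 19, 21]

slightly_high = is_irregular(iron_history, 24)   # inside the assumed limit
clearly_high = is_irregular(iron_history, 30)    # outside the assumed limit
```

With no history at all, the function refuses to answer, which mirrors the point that one result alone cannot show what normal is.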
Naimesh Vadhwana, M.Sc., P.Eng., pointed to neglecting to note the age of both the oil and component being tested. “Among the most serious mistakes is not understanding the age of the oil/component,” he said. “This leads to assuming a sample is clean, but if the oil was only in service for a few hours (e.g., three to four hours instead of a few shifts) the contaminants do not show up. Conversely, when the sample duration is two to three times the normal change interval, extra contaminants can show up, leading one to believe the sample is worse than it really is. Oil in a new component (first change) may still have assembly grease/oil in it, leading to elements from the assembly process being flagged; an older component may be breaking down more (or be closer to failure) than a newer component, so the oil might be expected to show more contamination.”
Vadhwana said looking at elements individually can lead to troubleshooting the wrong part of a component. For example, when looking at copper flagged as an individual element, one can miss brass/bronze elements, which would be detected by looking at tin and lead in conjunction with copper. Another issue he cites is not knowing the interval between the sample date and interpretation date, especially if a negative result would require immediate action or follow-up testing.
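The copper example can be written as a small rule that looks at flagged elements together rather than one at a time. The sketch below is a simplification under stated assumptions: the function name, the element labels and the wording of the hints are hypothetical, and a real interpretation would also weigh trend, component type and oil age.

```python
def copper_source_hint(flagged):
    """Suggest where to look based on which elements are flagged together.

    `flagged` is a set of lowercase element names flagged on one report.
    The rule and the hint text are illustrative assumptions.
    """
    if "copper" not in flagged:
        return "copper not flagged"
    # Tin or lead alongside copper points toward brass/bronze wear.
    companions = sorted(flagged & {"tin", "lead"})
    if companions:
        return "copper with " + " and ".join(companions) + ": check brass/bronze parts"
    return "copper alone: keep sampling and review the trend"

hint_alone = copper_source_hint({"copper", "iron"})
hint_alloy = copper_source_hint({"copper", "tin", "lead"})
```

Passing the whole set of flagged elements, rather than checking copper by itself, is the design choice that matters here.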
Part of the issue lies with the oil analysis process itself, and another part lies with poor interpretation of the results. Roger L. Young, CLS, senior field technical advisor, Imperial, said, “Some common mistakes may include rushing through the report, only paying attention to sample ranking or generic comments or taking the wrong action with insufficient data. I think it takes time and experience to feel comfortable interpreting the data and making the right choices with the data provided.”
Young added that some pieces of equipment have different “personalities” when compared to another of the same model and learning these personalities takes time. “Some assets will tend to make more iron, copper, lead, soot, fuel, etc., than others, so it is important to make sound decisions regarding oil changes or maintenance triggers,” he said. “Acting too quickly may result in throwing away perfectly good oil, but not being reactive enough may result in premature wear and asset damage. It can be difficult to find this balance, but once it is achieved the result should be improved reliability at the lowest cost.”
Goll cites automatically defaulting to another oil change when a severe sample report comes back as a common error. “The next sample reports are the same; then they change the fluid again and so on,” he said. “The result is they never repair the problem and never get useful data. Then the component fails early and the owner wonders why.
“If the customer ignores or misses the warning signs that may be present in the sample trend and continues to run the component, ignoring these trends, this will no doubt lead to early component failure. Also, very often, as a result, they will make the unfair assumption that fluid sampling is useless and stop doing it.”
There are many reasons end-users don’t thoroughly review and interpret the report — one is sheer lack of time. “It seems that most, if not all companies, are trying to do more with less,” Young said. “This is true with human resources as well.”
According to Young, “A few years ago, a maintenance and reliability department for a larger site might have had two to four reliability engineers dedicated to oil and vibration analysis, and the program produced valuable information regarding equipment health and predictive maintenance practices. It seems the trend has been to make these departments smaller and add more tasks to the roles. I would think the most common cause of not using the data correctly is simply lack of resources at the site level.”
Overlooked data sources
Beyond not knowing how much time has elapsed between samples, a big issue is overlooking the report data that shows it. In other words, the data is there; it’s just not taken into account. “Many people only look for the flagged elements and miss the time factor between samples,” Vadhwana said. “By missing the time factor, a sample may appear to have ‘cleaned up,’ but in fact the duration between samples may have been significantly different.”
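One way to see the time factor is to divide each reading by the hours the oil was in service. The Python sketch below is illustrative only. The function name and the sample values are invented, and a simple ppm-per-hour rate is an assumption, since wear metals do not necessarily accumulate at a constant rate.

```python
def wear_rate(ppm, oil_hours):
    """Return wear metal per hour of oil service (ppm/hour).

    A linear rate is an assumed simplification for illustration.
    """
    if oil_hours <= 0:
        raise ValueError("oil_hours must be positive")
    return ppm / oil_hours

# Hypothetical iron readings for the same component.
previous = {"iron_ppm": 60, "oil_hours": 500}
latest = {"iron_ppm": 30, "oil_hours": 100}

prev_rate = wear_rate(previous["iron_ppm"], previous["oil_hours"])   # 0.12 ppm/hour
latest_rate = wear_rate(latest["iron_ppm"], latest["oil_hours"])     # 0.30 ppm/hour
```

In this made-up case the latest sample shows half the iron of the previous one and looks as if it has cleaned up, yet iron is being generated more than twice as fast per hour of service.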
Potts cites additive chemistry as a frequently missed statistic. “The most overlooked source of useful data in an oil analysis report is definitely the additive chemistry,” he said. “Like a fingerprint, additive chemistry in an oil is unique. Additives can tell you valuable information such as if your sample has been mislabeled, if the oil has been topped off incorrectly or if your maintenance team isn’t using the proper oil type specific to the application. Most people see viscosity flags and just assume there is a problem, but 80% of the time, it’s just labeled incorrectly. You can confirm this by looking at the additives.”
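The fingerprint idea can be sketched as a comparison between a sample's additive elements and a reference profile for the oil that is supposed to be in the component. Everything specific here is an assumption for illustration: the function name, the 25% tolerance, and the reference and sample values, which do not describe any real product.

```python
def additive_mismatches(sample, reference, tolerance=0.25):
    """List additive elements whose sample ppm differs from the reference
    by more than `tolerance` (a fraction of the reference value)."""
    mismatches = []
    for element, expected in reference.items():
        if expected <= 0:
            raise ValueError("reference values must be positive")
        measured = sample.get(element, 0)
        if abs(measured - expected) / expected > tolerance:
            mismatches.append(element)
    return sorted(mismatches)

# Hypothetical additive profiles in ppm (not real product data).
reference_oil = {"zinc": 1200, "phosphorus": 1000, "calcium": 2500}
sample_oil = {"zinc": 1150, "phosphorus": 980, "calcium": 300}

suspect_elements = additive_mismatches(sample_oil, reference_oil)
```

A large mismatch does not say what went wrong. It only suggests checking the sample label, the top-off history and the oil type before acting on other flags.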
According to Goll, the most overlooked source of important information is how all the data relates, i.e., what soft wear metal is present along with hard wear metal and what part in the component that mix of wear metal identifies. Is it normal break-in wear or an actual early internal component failure? “This happens because that analysis judgment call requires more experience than anything else,” Goll said. “It can take multiple years of experience for successful interpretation. Unfortunately, that skill can’t be taught in a few weeks because very few vendors publish that info; it takes time to learn the trends for each type and brand of machinery or equipment.”
Bad data sources
Sometimes the data in the oil analysis report is bad, and there are many reasons for this. “The most common source of bad data is inputting incorrect hours in the system,” said Potts. “Ensuring correct oil hours are input into the system is one of the key elements of any successful oil analysis program. Oil analysis programs are put in place to extend component life, extend lubricant life and catch in-progress failures. If you do not have the proper component or oil hours, it is next to impossible to be able to confidently extend component or oil life. If the correct hours are not logged at all, it makes interpretation of the data extremely complicated for end-users. Once hours start getting recorded incorrectly or not at all, it is such a large undertaking to fix that it may never get fully resolved.”
Young said while bad data can originate from more than one source, it starts with sample quality. “Oil analysis laboratories can make mistakes, but it is really quite uncommon,” he said. “Flawless sample collection practices need to be followed, including accurate data collection, cleanliness, sample location and repeatability. For instance, if a sample is labeled with a wrong product name or it is contaminated with dirt or water, it is not worth sending in for analysis. The analysis performed will not be usable, and the money spent will be wasted. This can even be more serious if you are trending a certain pattern on a piece of equipment; a bad sample can skew the data or cause unwanted delays while a new sample is collected and analyzed.”
Goll thinks the most common source of bad data is poor sample collection. “If contamination (water/debris) is allowed to enter the sample or the sample is taken from the wrong source, it’s a waste of the sample and a waste of time,” he said. “Also, if the sample is incorrectly entered in the database — the wrong unit or component or the wrong fluid — it also is a waste of resources. Many customers spend thousands of dollars on troubleshooting non-existent problems because the sample entry data was incorrect. A good sample is worth the price of the component and the unit it was drawn from. It’s the same old cliché: ‘If it’s worth doing, it’s worth doing correctly.’”
Vadhwana agrees with Goll. “The most common source of poor data is bad/inconsistent sample methods. If the sampling ports and/or containers have not been handled correctly, the results could indicate a bad sample, while, in fact, it was the container that introduced the contamination,” he said.
An obvious solution to all the above is thorough report interpretation. This is where Goll is on the front line. And when it comes to meaningful report interpretation, experience matters. “In my many years in maintenance and reliability roles, I have reviewed reportable and severe samples and also reviewed the maintenance interval cycle per sample as well as the component health,” he said. “I look at onboard software event reporting as well as reliability real-time analysis software reporting to analyze the component operation.”
Goll said, “Most equipment vendors today offer software to record historical operational data as well as real-time data to monitor the equipment. For example, if a component is running hot or cold, you may see wear elements or fluid breakdown not caused by internal failure; it is only an undesirable operational condition that can be corrected. If the fluid is run over the recommended service interval, it might not be a problem at all; the intervals might only need to be reduced.”
An added benefit of thorough report interpretation is it may result in operators being able to extend the fluid life and drain intervals.
Vadhwana believes most responsibility for interpreting the report lies with the provider, since the provider flags samples and the customer generally only looks at the flagged results. The provider is assuming the sample was taken correctly and labeled accurately.
Young said the responsibility lies with both the provider and customer. “The oil analysis provider can offer generic results interpretations based on experiences and typical trends. However, it is the customers who should have somewhat intimate knowledge of their assets and trends that are occurring,” he said. “This scenario is extremely important, and the foundation of a successful maintenance and reliability program. If results are not acknowledged, analyzed and acted on when appropriate, it may not be worth the time, effort and money to be sending in samples. A successful oil analysis program is all about watching trends over time to help judge equipment health, and not an indicator of catastrophic failure occurring.”
In theory, if a customer doesn’t understand an aspect of a report, he or she wouldn’t hesitate to reach out for help. In practice, this is not always the case.
“It can go either way,” Potts said. “Some users reach out to me or customer care with questions when they arise. Others may not be aware of the importance of certain elements of oil analysis such as flagging, or they may not be aware of the additional resources available to them. These users tend to become frustrated and begin to ignore the results. When this happens, users eventually lose faith in the program altogether and usually stop sampling. When I have a new customer startup, I discuss and interpret the initial results with them so they can see the value in the data.”
Some experts would like to see better oil analysis training at every level. Goll believes a good start would be including fluid analysis in trade and post-secondary schools as a component of relevant programs.
Most commercial oil analysis labs, and even some reliability service providers, offer training courses to aid their customers in every aspect of an oil analysis program — from pulling and labeling samples to interpreting the reports, or even training toward STLE’s OMA I certification. “We have a customer care department our customers can contact at no charge to discuss and understand their results,” Potts said. “We have account managers who do complimentary data interpretation training with customers. And we also have paid services that go over and above our lab services to provide customers with direct recommendations and manage their overall program.”
Goll advises those new to the field to enroll in a quality fluid analysis training program. “This is with the understanding that it takes many years to hone those skills,” he said. “They also should insist that their workplace support fluid sample collection and database training and then follow up with refresher training.”
The best advice is twofold: read the whole report so nothing is missed, and avoid partial interpretations along the way so less time is spent reading reports overall.
Oil analysis in the real world
On the surface, the entire oil analysis process seems simple: Select the equipment, take the sample, send it to the lab and wait for results. In the real world:
- Equipment selection is arbitrary, based on what one person thinks should be tested.
- Samples are taken from the locations with the easiest access and not from where they best represent the oil in use.
- Samples sit around for several days before anyone sends them to the lab.
- The lab doesn’t process the sample in a timely fashion.
- The lab report is formatted in a way that assumes customers have a background in statistics.
- The lab report doesn’t really matter anyway because either no one reads it thoroughly or it doesn’t get read at all.
- No one follows through on obvious action items.
Choosing machinery to monitor — The criticality index
The criticality index determines the logical extent of condition monitoring required for a piece of equipment. It considers such factors as:
- The importance of the machine’s function
- Whether or not there is another piece of equipment that can take over the function if that piece of machinery fails
- The overall impact of downtime
- The projected repair cost
This index assigns all machines to one of the following three categories:
Critical machinery. These machines are so important that the rest of the operating environment cannot function without them, e.g., power plant turbines. Equipment in this category requires complete online and (where possible) inline condition monitoring, regardless of cost. The specifics of monitoring often are included in insurance policies and warranties. This equipment is a prime candidate for predictive maintenance.
Essential machinery. This equipment is key to the operating environment, but its failure does not cripple operations. Some equipment in this category would be considered critical if not for the fact that a backup piece of equipment is readily available. While testing is not as important as it is with critical machinery, it is recommended to prevent costly repairs and inconvenience.
General purpose machinery. The balance of operations equipment falls into this category. These machines usually are monitored informally and periodically.
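The three categories can be expressed as a simple decision rule. The Python sketch below is a rough illustration, not an actual criticality index: the function and parameter names are hypothetical, and it uses only two yes/no inputs plus backup availability, leaving out the downtime impact and projected repair cost that a real index would weigh.

```python
def classify_machine(operations_stop_without_it, key_to_operations, backup_available):
    """Assign a machine to one of the three categories described above.

    Simplified, assumed rule: downtime impact and repair cost are not modeled.
    """
    # Nothing else can run without it and nothing can take over its function.
    if operations_stop_without_it and not backup_available:
        return "critical"
    # Important, but a failure does not cripple operations,
    # including would-be critical machines that have a ready backup.
    if operations_stop_without_it or key_to_operations:
        return "essential"
    return "general purpose"

# Hypothetical equipment examples.
turbine = classify_machine(True, True, backup_available=False)
pump_with_spare = classify_machine(True, True, backup_available=True)
shop_fan = classify_machine(False, False, backup_available=False)
```

The second example shows the case the text describes: a machine that would be critical is treated as essential because a backup is readily available.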
This article first appeared in Tribology & Lubrication Technology (TLT), the monthly magazine of the Society of Tribologists and Lubrication Engineers, an international not-for-profit professional society headquartered in Park Ridge, Ill., www.stle.org. The STLE is a CFE Media content partner.