The impact of imperfect manual testing on safety systems

Safety instrumented system standards (IEC 61508 & 61511) are performance based. Essentially, the greater the level of process risk, the better the systems needed to control it. A variety of techniques may be used to determine the level of performance (referred to as Safety Integrity Level, or SIL) needed of safety instrumented functions.


A variety of techniques may also be used to analyze system performance to see if the hardware meets the SIL targets that have been assigned. These modeling techniques, often referred to as SIL verification, account for many different factors such as failure rates, failure modes, quantities, levels of automatic diagnostic coverage, and manual test intervals. Assumptions made during modeling can have a dramatic impact on the results. For example, assuming 95% manual test coverage (rather than the optimistic 100% often used) results in an average 40% reduction in performance; assuming 90% results in an average 57% reduction.

Manual test coverage has a significant impact on calculated performance and should be accounted for in system modeling. The results also indicate that manual testing needs to be as realistic and thorough as possible.

Failure modes

Safety instrumented system (SIS) hardware is normally dormant (e.g., an isolation valve remains open for long periods of time). Such hardware may fail in two ways: safe and dangerous. Safe failures result in nuisance trips and lost production (e.g., the valve slams shut because the normally-energized solenoid coil burned out; there was no process hazard). Dangerous failures result in the system not being able to perform its safety function (e.g., the solenoid de-energizes, but the valve is stuck open and does not close on demand).

Automatic diagnostics, manual testing

Some hardware has built-in automatic diagnostics to detect dangerous failures. For example, PLCs can detect a variety of internal problems such as being stuck in an endless loop. However, automatic diagnostics can never be 100% effective. Some devices, such as non-intelligent sensors and valves, have no automatic diagnostics at all. A solenoid-operated valve, for instance, cannot tell by itself whether it is stuck open.

All safety devices must be manually tested in order to detect potentially dangerous failures. SIL verification calculations are based on a manual test interval. The more often devices are tested, the quicker potentially dangerous failures can be detected and repaired, which results in better safety performance. The performance requirements for a safety instrumented function, as listed in the ANSI/ISA 84 standard, are shown in Table 1.

Table 1: Performance requirements for a safety instrumented function

Safety Integrity Level (SIL)   Probability of Failure on Demand (PFD)   Risk Reduction Factor (RRF = 1/PFD)
4                              ≥ 0.00001 to < 0.0001                    > 10,000 to ≤ 100,000
3                              ≥ 0.0001 to < 0.001                      > 1,000 to ≤ 10,000
2                              ≥ 0.001 to < 0.01                        > 100 to ≤ 1,000
1                              ≥ 0.01 to < 0.1                          > 10 to ≤ 100
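
The bands in Table 1 can be expressed as a small lookup. The following is a minimal sketch rather than anything taken from the standard itself; the function names `sil_from_pfd` and `rrf` are hypothetical, and returning 0 for a PFD outside the listed ranges is an assumption made for illustration.

```python
def sil_from_pfd(pfd: float) -> int:
    """Return the SIL band (1 to 4) for a PFD per Table 1, or 0 if outside the table."""
    # Each band includes its lower PFD bound and excludes its upper bound.
    bands = [
        (4, 1e-5, 1e-4),
        (3, 1e-4, 1e-3),
        (2, 1e-3, 1e-2),
        (1, 1e-2, 1e-1),
    ]
    for sil, low, high in bands:
        if low <= pfd < high:
            return sil
    return 0  # assumption: 0 means "outside the ranges listed in Table 1"


def rrf(pfd: float) -> float:
    """Risk reduction factor, defined in Table 1 as 1/PFD."""
    return 1.0 / pfd


print(sil_from_pfd(0.005), rrf(0.005))
```

A PFD of 0.005 corresponds to a risk reduction factor of 200, which falls in the SIL 2 band.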

Assuming no automatic diagnostics, the probability of failure on demand (PFD) of a non-redundant device is calculated using the following formula:

$$\mathrm{PFD} = \lambda_{d} \cdot \frac{TI_{m}}{2}$$

Where:

$\lambda_{d}$ is the dangerous failure rate

$TI_{m}$ is the manual test interval
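
As a worked illustration of the single-device formula, the sketch below compares a yearly test interval with a two-year interval. The failure rate of 0.02 per year is an assumed value chosen only for the example, and the name `pfd_simple` is hypothetical.

```python
def pfd_simple(lambda_d: float, ti_m: float) -> float:
    """PFD of a non-redundant device with no automatic diagnostics."""
    return lambda_d * ti_m / 2.0


# Assumed dangerous failure rate (per year); not a figure from the article.
LAMBDA_D = 0.02

pfd_yearly = pfd_simple(LAMBDA_D, ti_m=1.0)    # tested once a year
pfd_biennial = pfd_simple(LAMBDA_D, ti_m=2.0)  # tested every two years

print(f"yearly test:   PFD = {pfd_yearly:.3f}")
print(f"two-year test: PFD = {pfd_biennial:.3f}")
```

Because PFD is proportional to the test interval in this formula, doubling the interval doubles the PFD.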

If a device has a level of automatic diagnostics, then dangerous failures are split into two categories: dangerous detected and dangerous undetected. Accounting for automatic diagnostics, the PFD calculation becomes:

$$\mathrm{PFD} = \lambda_{dd} \cdot \frac{TI_{a}}{2} + \lambda_{du} \cdot \frac{TI_{m}}{2}$$

Where:

$\lambda_{dd}$ is the dangerous detected failure rate

$TI_{a}$ is the automatic diagnostic test interval

$\lambda_{du}$ is the dangerous undetected failure rate

$TI_{m}$ is the manual test interval

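In almost every case, the PFD contribution from dangerous detected failures (the automatic diagnostics term) is insignificant compared to the contribution from dangerous undetected failures (the manual testing term), usually by two orders of magnitude, and can therefore be ignored.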
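
To see why the automatic diagnostics term tends to be negligible, the sketch below evaluates both terms separately. All numbers are assumptions chosen for illustration (a dangerous failure rate of 0.02 per year, 90% diagnostic coverage, an 8-hour automatic diagnostic interval, yearly manual testing); `pfd_terms` is a hypothetical helper name.

```python
HOURS_PER_YEAR = 8760.0


def pfd_terms(lambda_dd, ti_a, lambda_du, ti_m):
    """Return the (detected, undetected) contributions to PFD."""
    detected = lambda_dd * ti_a / 2.0
    undetected = lambda_du * ti_m / 2.0
    return detected, undetected


# Assumed values for illustration only (rates per year, intervals in years).
lambda_d = 0.02             # total dangerous failure rate
diagnostic_coverage = 0.90  # fraction of dangerous failures caught automatically

detected, undetected = pfd_terms(
    lambda_dd=diagnostic_coverage * lambda_d,
    ti_a=8.0 / HOURS_PER_YEAR,  # 8-hour automatic diagnostic interval
    lambda_du=(1.0 - diagnostic_coverage) * lambda_d,
    ti_m=1.0,                   # yearly manual test
)
ratio = undetected / detected

print(f"detected term:   {detected:.2e}")
print(f"undetected term: {undetected:.2e}")
print(f"ratio:           {ratio:.0f}")
```

With these assumed inputs the manual testing term is a little over a hundred times larger than the automatic diagnostics term. Other intervals and coverage values would give a different ratio.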

What is 'manual test coverage'?

Assuming that manual testing is 100% effective is unrealistic. For example, full or partial stroking of a valve does not determine whether the valve will seat properly or whether it might leak. Partial stroking will not determine whether the seat is eroded or whether there is a welding rod stuck in the valve.

Testing the electronics of a sensor does not determine whether the sensing element itself is responding properly. Removing a sensor and testing it in a laboratory or maintenance shop does not determine whether the sensor will respond properly in the actual process. (Stories have been told at conferences of sensors that did not.) Testing a level float switch by moving the float with a rod will not ensure that the float will actually float. So in reality, the effectiveness of manual testing (referred to as manual test coverage by some) is not 100%: a manual test will not detect all possible failures. This can be accounted for by adding a third term to the calculation:

$$\mathrm{PFD} = \lambda_{dd} \cdot \frac{TI_{a}}{2} + \lambda_{du} \cdot \frac{TI_{m}}{2} + \lambda_{dn} \cdot \frac{\mathrm{Life}}{2}$$

Where:

$\lambda_{dd}$ is the dangerous detected failure rate

$TI_{a}$ is the automatic diagnostic test interval

$\lambda_{du}$ is the dangerous undetected failure rate

$TI_{m}$ is the manual test interval

$\lambda_{dn}$ is the dangerous never-detected failure rate

$\mathrm{Life}$ is the proposed life of the hardware.

In other words, some dangerous failures will remain in the system for the life of the system.
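
The three-term formula can be sketched as follows. The article does not spell out how the failure rate is divided between the second and third terms, so the helper `split_by_coverage` encodes an assumption: failures not caught by automatic diagnostics are split in proportion to the manual test coverage. The function names and all input values are hypothetical.

```python
def split_by_coverage(lambda_undiagnosed: float, coverage: float):
    """Assumed split: coverage goes to lambda_du, the remainder to lambda_dn."""
    lambda_du = coverage * lambda_undiagnosed
    lambda_dn = (1.0 - coverage) * lambda_undiagnosed
    return lambda_du, lambda_dn


def pfd_three_term(lambda_dd, ti_a, lambda_du, ti_m, lambda_dn, life):
    """PFD including failures that manual testing never reveals."""
    return (
        lambda_dd * ti_a / 2.0
        + lambda_du * ti_m / 2.0
        + lambda_dn * life / 2.0
    )


# Assumed example: 0.02 failures per year, no automatic diagnostics,
# 95% manual test coverage, yearly testing, 15-year life.
lambda_du, lambda_dn = split_by_coverage(0.02, coverage=0.95)
pfd = pfd_three_term(
    lambda_dd=0.0, ti_a=0.0,
    lambda_du=lambda_du, ti_m=1.0,
    lambda_dn=lambda_dn, life=15.0,
)
print(f"PFD = {pfd:.4f}")
```

In this example only 5% of the failure rate falls into the never-detected category, yet its term (0.0075) is nearly as large as the manual testing term (0.0095), because those failures persist for the full life of the hardware.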

Imperfect manual testing

The impact of imperfect manual testing is significant. Calculations may be done by hand or by using a number of commercially available programs that do account for imperfect manual testing. The impact is the same regardless of the hardware or configuration.

In other words, calculations for single devices (switches or valves), triplicated transmitters with 99% automatic diagnostics, dual valves, and valves with partial stroke testing all reveal the same results.

If the manual test coverage drops to 95%, the risk reduction is reduced by an average of 40%. If the manual test coverage drops to 90%, the risk reduction is reduced by an average of 57%. These results assume a 15-year life and yearly manual testing. Assumptions that can vary the final answer by a factor of two are significant enough to warrant attention.
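
The size of these reductions can be checked with a rough calculation for the simplest case: a single device with no automatic diagnostics. This sketch assumes the undetected failure rate splits in proportion to the manual test coverage, so the failure rate cancels out of the ratio. The article's figures are averages over several configurations, so this single case is expected to land near them rather than on them. `rrf_reduction` is a hypothetical name.

```python
def rrf_reduction(coverage: float, ti_m: float = 1.0, life: float = 15.0) -> float:
    """Fractional loss of risk reduction relative to perfect manual testing.

    Works per unit of dangerous failure rate, since the rate cancels
    when the two PFD values are divided.
    """
    perfect = ti_m / 2.0
    imperfect = coverage * ti_m / 2.0 + (1.0 - coverage) * life / 2.0
    return 1.0 - perfect / imperfect


for c in (0.95, 0.90):
    print(f"coverage {c:.0%}: risk reduction falls by {rrf_reduction(c):.0%}")
```

With a 15-year life and yearly testing, this single-device case gives reductions of about 41% and 58%, close to the averages of 40% and 57% quoted above.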

SIL performance targets vary by an entire order of magnitude (as shown in Table 1). A system with perfect manual testing and a risk reduction factor of 400 will be reduced to 170, assuming 90% manual test coverage.

Both numbers are in the SIL 2 range. However, a system with an initial risk reduction of 200 will be reduced to 86 assuming 90% manual test coverage. This is enough to slip from an assumed SIL 2 level of performance, down to an actual SIL 1 level of performance.
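
The slip from one band to the next can be reproduced by applying the quoted average reduction of 57% to an initial risk reduction factor and looking the result up against the Table 1 ranges. This is a minimal sketch; `degrade` and `sil_from_rrf` are hypothetical names, and returning 0 for values outside the table is an assumption.

```python
def degrade(initial_rrf: float, reduction: float) -> float:
    """Apply a fractional reduction (e.g. 0.57) to a risk reduction factor."""
    return initial_rrf * (1.0 - reduction)


def sil_from_rrf(rrf: float) -> int:
    """SIL band for a risk reduction factor per Table 1, or 0 if outside it."""
    # Each band excludes its lower RRF bound and includes its upper bound.
    bands = ((1, 10, 100), (2, 100, 1_000), (3, 1_000, 10_000), (4, 10_000, 100_000))
    for sil, low, high in bands:
        if low < rrf <= high:
            return sil
    return 0


for initial in (400, 200):
    actual = degrade(initial, reduction=0.57)  # 90% manual test coverage
    print(f"RRF {initial} (SIL {sil_from_rrf(initial)}) -> "
          f"RRF {actual:.0f} (SIL {sil_from_rrf(actual)})")
```

The system starting at 400 stays in SIL 2, while the one starting at 200 drops to SIL 1, matching the two cases described above.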

Thorough manual testing

The results summarized here show that manual testing needs to be as realistic and thorough as possible or else intended performance levels may not be met. The goal is to have the manual test coverage percentage as close to 100% as possible. While most will accept that manual test coverage is rarely 100% (just as automatic diagnostics can never be 100%, and no redundant system can have 0% common cause), determining an accurate assessment of the manual test coverage percentage is problematic.

Until detailed failure rate data is accumulated (which is certainly possible considering the databases that most users now have available), estimating the manual test coverage may remain a SWAG (Scientific Wild Ass Guess).

Author Information
Paul Gruhn, PE, CFSE is the training manager at

The ISA 84 committee wrote a technical report in 2002 titled Guidance for Testing of Process Sector Safety Instrumented Functions (SIF) Implemented as or Within Safety Instrumented Systems (SIS) (ISA-TR84.00.03-2002). As the name implies, this 222-page document describes methods for testing safety devices. The document is currently being rewritten by the committee.
