The impact of imperfect manual testing on safety systems

Safety instrumented system standards (IEC 61508 & 61511) are performance based. Essentially, the greater the level of process risk, the better the systems needed to control it. A variety of techniques may be used to determine the level of performance (referred to as Safety Integrity Level, or SIL) needed of safety instrumented functions.


A variety of techniques may also be used to analyze system performance to see if the hardware meets the SIL targets that have been assigned. These modeling techniques, often referred to as SIL verification, account for many different factors such as failure rates, failure modes, quantities, levels of automatic diagnostic coverage, manual test intervals, and more. Assumptions made during modeling can have a dramatic impact on the results. For example, assuming 95% manual test coverage (rather than the optimistic 100% often used) reduces calculated performance by an average of 40%; assuming 90% reduces it by an average of 57%.

Manual test coverage has a significant impact on calculated performance and should be accounted for in system modeling. The results also indicate that manual testing needs to be as realistic and thorough as possible.

Failure modes

Safety instrumented system (SIS) hardware is normally dormant (e.g., an isolation valve remains open for long periods of time). Such hardware may fail in two ways: safe and dangerous. Safe failures result in nuisance trips and lost production (e.g., the valve slams shut because the normally-energized solenoid coil burned out; there was no process hazard). Dangerous failures result in the system not being able to perform its safety function (e.g., the solenoid de-energizes, but the valve is stuck open and does not close on demand).

Automatic diagnostics, manual testing

Some hardware has built-in automatic diagnostics to detect dangerous failures. For example, PLCs can detect a variety of internal problems such as being stuck in an endless loop. However, automatic diagnostics can never be 100% effective. Some devices, such as non-intelligent sensors and valves, have no automatic diagnostics at all. A solenoid-operated valve, for instance, cannot tell by itself whether it is stuck open.

All safety devices must be manually tested in order to detect potentially dangerous failures. SIL verification calculations are based on a manual test interval. The more often devices are tested, the quicker potentially dangerous failures can be detected and repaired, which results in better safety performance. The performance requirements for a safety instrumented function, as listed in the ANSI/ISA 84 standard, are shown in Table 1.

Table 1: Performance requirements for a safety instrumented function

Safety Integrity Level (SIL)   Probability of Failure on Demand (PFD)   Risk Reduction Factor (RRF = 1/PFD)
4                              ≥ 0.00001 to < 0.0001                    > 10,000 to ≤ 100,000
3                              ≥ 0.0001 to < 0.001                      > 1,000 to ≤ 10,000
2                              ≥ 0.001 to < 0.01                        > 100 to ≤ 1,000
1                              ≥ 0.01 to < 0.1                          > 10 to ≤ 100
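
The bands in Table 1 can be expressed as a simple lookup. The following Python sketch is illustrative only: the function names `sil_from_pfd` and `rrf_from_pfd` are hypothetical, and returning 0 for a value outside the table is a convention assumed here, not something the standard defines.

```python
def sil_from_pfd(pfd: float) -> int:
    """Return the SIL band (1 to 4) for a PFD, or 0 if it falls outside Table 1."""
    # Each band spans one decade: lower bound inclusive, upper bound exclusive.
    bands = [
        (4, 1e-5, 1e-4),
        (3, 1e-4, 1e-3),
        (2, 1e-3, 1e-2),
        (1, 1e-2, 1e-1),
    ]
    for sil, low, high in bands:
        if low <= pfd < high:
            return sil
    return 0  # assumed convention: value is not covered by Table 1


def rrf_from_pfd(pfd: float) -> float:
    """Risk reduction factor is the reciprocal of the PFD."""
    return 1.0 / pfd


for example_pfd in (0.05, 0.005, 0.0005):
    print(example_pfd, sil_from_pfd(example_pfd), round(rrf_from_pfd(example_pfd)))
```

Because each band spans a full decade, two systems in the same SIL can differ in risk reduction by nearly a factor of ten.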

Assuming no automatic diagnostics, the probability of failure on demand (PFD) of a non-redundant device is calculated using the following formula:

$$PFD = \lambda_{d} \times (TI_{m} / 2)$$

Where:

$\lambda_{d}$ is the dangerous failure rate

$TI_{m}$ is the manual test interval
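
As a rough illustration of how the manual test interval drives this result, the sketch below evaluates the formula for a few intervals. The failure rate of 0.01 per year is a hypothetical value chosen for round numbers, and the function name `pfd_simple` is likewise an assumption.

```python
def pfd_simple(lambda_d: float, ti_manual: float) -> float:
    """PFD of a non-redundant device with no automatic diagnostics.

    lambda_d and ti_manual must use consistent time units (here: per year, years).
    """
    return lambda_d * ti_manual / 2.0


LAMBDA_D = 0.01  # hypothetical dangerous failure rate, failures per year

for ti_years in (0.5, 1.0, 2.0):
    pfd = pfd_simple(LAMBDA_D, ti_years)
    print(f"Test interval {ti_years} yr: PFD = {pfd:.4f}, RRF = {1.0 / pfd:.0f}")
```

With these assumed values, halving the test interval halves the PFD and doubles the risk reduction factor, which is the sense in which more frequent testing gives better safety performance.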

If a device has a level of automatic diagnostics, then dangerous failures are split into two categories: dangerous detected and dangerous undetected. Accounting for automatic diagnostics, the PFD calculation becomes:

$$PFD = (\lambda_{dd} \times (TI_{a} / 2)) + (\lambda_{du} \times (TI_{m} / 2))$$

Where:

$\lambda_{dd}$ is the dangerous detected failure rate

$TI_{a}$ is the automatic diagnostic test interval

$\lambda_{du}$ is the dangerous undetected failure rate

$TI_{m}$ is the manual test interval

In almost every case, the PFD contribution from failures detected by automatic diagnostics is insignificant compared to the contribution from failures found only by manual testing (usually by two orders of magnitude) and can therefore be ignored.
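
The comparison between the two terms can be checked numerically. In the sketch below, every input is hypothetical: a dangerous failure rate of 0.01 per year, 90% automatic diagnostic coverage, diagnostics that run once an hour, and yearly manual testing. The size of the gap depends entirely on these assumed values.

```python
HOURS_PER_YEAR = 8760.0


def pfd_terms(lambda_dd, ti_auto, lambda_du, ti_manual):
    """Return the (detected, undetected) terms of the two-term PFD formula."""
    detected_term = lambda_dd * ti_auto / 2.0
    undetected_term = lambda_du * ti_manual / 2.0
    return detected_term, undetected_term


# All values below are hypothetical, chosen only to illustrate the comparison.
lambda_d = 0.01              # total dangerous failure rate, per year
diagnostic_coverage = 0.90   # fraction of dangerous failures caught automatically
lambda_dd = lambda_d * diagnostic_coverage
lambda_du = lambda_d * (1.0 - diagnostic_coverage)

detected, undetected = pfd_terms(
    lambda_dd, ti_auto=1.0 / HOURS_PER_YEAR, lambda_du=lambda_du, ti_manual=1.0
)
print(f"Detected term:   {detected:.2e}")
print(f"Undetected term: {undetected:.2e}")
print(f"Ratio:           {undetected / detected:.0f}")
```

Even though 90% of the failures in this example are caught automatically, the manual-test term dominates because those failures sit undetected for far longer.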

What is 'manual test coverage'?

Assuming that manual testing is 100% effective is unrealistic. For example, full or partial stroking of a valve does not determine whether the valve will seat properly or whether it might leak. Partial stroking will not determine whether the seat is eroded or whether there is a welding rod stuck in the valve.

Testing the electronics of a sensor does not determine whether the sensing element itself is responding properly. Removing a sensor and testing it in a laboratory or maintenance shop does not determine whether the sensor will respond properly in the actual process. (Stories have been told at conferences of sensors that did not.) Testing a level float switch by moving the float with a rod will not ensure that the float will actually float. So in reality, manual testing is not 100% effective in detecting all possible failures; its effectiveness is referred to by some as manual test coverage. This can be accounted for in the PFD calculation:

$$PFD = (\lambda_{dd} \times (TI_{a} / 2)) + (\lambda_{du} \times (TI_{m} / 2)) + (\lambda_{dn} \times (Life / 2))$$

Where:

$\lambda_{dd}$ is the dangerous detected failure rate

$TI_{a}$ is the automatic diagnostic test interval

$\lambda_{du}$ is the dangerous undetected failure rate

$TI_{m}$ is the manual test interval

$\lambda_{dn}$ is the dangerous never-detected failure rate

$Life$ is the proposed life of the hardware.

In other words, some dangerous failures will remain in the system for the life of the system.
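
A short sketch of the three-term calculation follows. It assumes a device with no automatic diagnostics, and it assumes that manual test coverage splits the dangerous failure rate into a part found at each test and a part never found; that split, the 0.01 per year failure rate, and the function name `pfd_imperfect` are all illustrative assumptions.

```python
def pfd_imperfect(lambda_dd, ti_auto, lambda_du, ti_manual, lambda_dn, life):
    """Three-term PFD: detected, undetected, and never-detected failures."""
    return (
        lambda_dd * ti_auto / 2.0
        + lambda_du * ti_manual / 2.0
        + lambda_dn * life / 2.0
    )


# Hypothetical inputs: no automatic diagnostics, yearly testing, 15-year life.
lambda_d = 0.01        # dangerous failure rate, per year
test_coverage = 0.95   # assumed fraction of failures a manual test can find
lambda_du = lambda_d * test_coverage          # found at the next manual test
lambda_dn = lambda_d * (1.0 - test_coverage)  # never found by testing

total = pfd_imperfect(0.0, 0.0, lambda_du, 1.0, lambda_dn, 15.0)
never_detected_share = (lambda_dn * 15.0 / 2.0) / total
print(f"PFD = {total:.4f}")
print(f"Share of PFD from never-detected failures: {never_detected_share:.0%}")
```

Under these assumptions, the 5% of failures that testing never finds account for a disproportionately large share of the total PFD, because they persist for the full hardware life rather than for one test interval.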

Imperfect manual testing

The impact of imperfect manual testing is significant. Calculations may be done by hand or with any of a number of commercially available programs that account for imperfect manual testing. The impact is the same regardless of the hardware or configuration.

In other words, calculations for single devices (switches or valves), triplicated transmitters with 99% automatic diagnostics, dual valves, valves with partial stroke testing, all reveal the same results.

If the manual test coverage drops to 95%, the risk reduction is reduced by an average of 40%. If the manual test coverage drops to 90%, the risk reduction is reduced by an average of 57%. These results assume a 15-year life and yearly manual testing. Assumptions that can change the final answer by a factor of two are significant enough to warrant attention.
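
The size of this effect can be approximated with a single-device model. The sketch below assumes no automatic diagnostics, yearly testing, a 15-year life, and a coverage split between failures found at each test and failures never found; the function name `rrf_reduction` is hypothetical. It is not the calculation behind the averages quoted above, which span several configurations, so the figures it produces should only be expected to land close to them.

```python
def rrf_reduction(coverage: float, ti_manual: float = 1.0, life: float = 15.0) -> float:
    """Fractional loss of risk reduction relative to perfect (100%) test coverage.

    The dangerous failure rate cancels out of the ratio, so the PFD values
    here are expressed per unit failure rate.
    """
    pfd_perfect = ti_manual / 2.0
    pfd_actual = coverage * ti_manual / 2.0 + (1.0 - coverage) * life / 2.0
    # RRF = 1 / PFD, so RRF_actual / RRF_perfect = pfd_perfect / pfd_actual.
    return 1.0 - pfd_perfect / pfd_actual


for coverage in (1.00, 0.95, 0.90):
    print(f"Coverage {coverage:.0%}: risk reduction lowered by {rrf_reduction(coverage):.1%}")
```

This simple model gives reductions of roughly 41% and 58% for 95% and 90% coverage, in line with the averages of 40% and 57%.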

SIL performance targets vary by an entire order of magnitude (as shown in Table 1). A system with a risk reduction factor of 400 under perfect manual testing drops to about 170 when 90% manual test coverage is assumed.

Both numbers are in the SIL 2 range. However, a system with an initial risk reduction of 200 will be reduced to 86 assuming 90% manual test coverage. This is enough to slip from an assumed SIL 2 level of performance, down to an actual SIL 1 level of performance.
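
The two cases can be reproduced by applying the 57% average reduction quoted for 90% coverage and looking up the result against the risk reduction bands in Table 1. This is a sketch only; the function name `sil_from_rrf` is hypothetical, and returning 0 for values outside the table is an assumed convention.

```python
def sil_from_rrf(rrf: float) -> int:
    """Return the SIL band (1 to 4) for a risk reduction factor, or 0 if outside Table 1."""
    # Bands from Table 1: lower bound exclusive, upper bound inclusive.
    bands = [
        (4, 10_000, 100_000),
        (3, 1_000, 10_000),
        (2, 100, 1_000),
        (1, 10, 100),
    ]
    for sil, low, high in bands:
        if low < rrf <= high:
            return sil
    return 0  # assumed convention: value is not covered by Table 1


REDUCTION_AT_90_PERCENT = 0.57  # average reduction quoted for 90% coverage

for rrf_assumed in (400, 200):
    rrf_actual = rrf_assumed * (1.0 - REDUCTION_AT_90_PERCENT)
    print(
        f"Assumed RRF {rrf_assumed} (SIL {sil_from_rrf(rrf_assumed)}) -> "
        f"actual RRF {rrf_actual:.0f} (SIL {sil_from_rrf(rrf_actual)})"
    )
```

The first case yields about 172, which rounds to the 170 quoted above and stays in SIL 2; the second yields 86 and falls into SIL 1.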

Thorough manual testing

The results summarized here show that manual testing needs to be as realistic and thorough as possible or else intended performance levels may not be met. The goal is to have the manual test coverage percentage as close to 100% as possible. While most will accept that manual test coverage is rarely 100% (just as automatic diagnostics can never be 100%, and no redundant system can have 0% common cause), determining an accurate assessment of the manual test coverage percentage is problematic.

Until detailed failure rate data is accumulated (which is certainly possible considering the databases that most users now have available), estimating the manual test coverage may remain a SWAG (Scientific Wild Ass Guess).

Author Information
Paul Gruhn, PE, CFSE is the training manager at

The ISA 84 committee wrote a technical report in 2002 titled Guidance for Testing of Process Sector Safety Instrumented Functions (SIF) Implemented as or Within Safety Instrumented Systems (SIS) (ISA-TR84.00.03-2002). As the name would imply, this 222-page document describes methods for testing safety devices. The document is currently being rewritten by the committee.
