The impact of imperfect manual testing on safety systems

Safety instrumented system standards (IEC 61508 & 61511) are performance based. Essentially, the greater the level of process risk, the better the systems needed to control it. A variety of techniques may be used to determine the level of performance (referred to as Safety Integrity Level, or SIL) required of safety instrumented functions.

07/15/2008



A variety of techniques may also be used to analyze system performance to see if the hardware meets the SIL targets that have been assigned. These modeling techniques, often referred to as SIL verification, account for many different factors such as failure rates, failure modes, quantities, levels of automatic diagnostic coverage, manual test intervals and more. Assumptions made during modeling can have a dramatic impact on the overall answers. For example, assuming 95% manual test coverage (rather than the optimistic 100% often used) results in an average 40% reduction in performance; assuming 90% results in an average 57% reduction.

Manual test coverage therefore has a significant impact that should be accounted for in system modeling. The results also indicate that manual testing needs to be as realistic and thorough as possible.

Failure modes

Safety instrumented system (SIS) hardware is normally dormant (e.g., an isolation valve remains open for long periods of time). Such hardware may fail in two ways: safe and dangerous. Safe failures result in nuisance trips and lost production (e.g., the valve slams shut because the normally-energized solenoid coil burned out; there was no process hazard). Dangerous failures result in the system not being able to perform its safety function (e.g., the solenoid de-energizes, but the valve is stuck open and does not close on demand).

Automatic diagnostics, manual testing

Some hardware has built-in automatic diagnostics to detect dangerous failures. For example, PLCs can detect a variety of internal problems, such as being stuck in an endless loop. However, automatic diagnostics can never be 100% effective, and some devices, such as non-intelligent sensors and valves, have no automatic diagnostics at all. A solenoid-operated valve, for instance, cannot tell by itself whether it is stuck open.

All safety devices must be manually tested in order to detect potentially dangerous failures. SIL verification calculations are based on a manual test interval. The more often devices are tested, the quicker potentially dangerous failures can be detected and repaired, which results in better safety performance. The performance requirements for a safety instrumented function, as listed in the ANSI/ISA 84 standard, are shown in Table 1.

Table 1: Performance requirements for safety instrumented functions

SIL   Probability of Failure on Demand (PFD)   Risk Reduction Factor (RRF = 1/PFD)
4     ≥ 0.00001 to < 0.0001                    > 10,000 to ≤ 100,000
3     ≥ 0.0001 to < 0.001                      > 1,000 to ≤ 10,000
2     ≥ 0.001 to < 0.01                        > 100 to ≤ 1,000
1     ≥ 0.01 to < 0.1                          > 10 to ≤ 100
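As a quick sanity check on Table 1, a small helper (hypothetical, not from the article) can map a computed risk reduction factor back to its SIL band:

```python
# Hypothetical helper: classify a risk reduction factor (RRF = 1/PFD)
# into a Safety Integrity Level using the bands in Table 1.
def sil_from_rrf(rrf):
    """Return the SIL band for a given RRF, or 0 if below SIL 1 performance."""
    if rrf > 100_000:
        raise ValueError("RRF above the SIL 4 band")
    for sil, lower_bound in ((4, 10_000), (3, 1_000), (2, 100), (1, 10)):
        if rrf > lower_bound:
            return sil
    return 0  # less risk reduction than SIL 1 requires

print(sil_from_rrf(400))  # 2
print(sil_from_rrf(86))   # 1
```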


Assuming no automatic diagnostics, the probability of failure on demand (PFD) of a non-redundant device is calculated using the following formula:

PFD = λd * (TIm / 2)

Where: λd is the dangerous failure rate

TIm is the manual test interval
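A minimal numeric sketch of this formula, using an illustrative dangerous failure rate and a one-year test interval (both numbers are assumptions, not from the article):

```python
# Illustrative values: 0.002 dangerous failures/year, tested yearly.
lambda_d = 0.002  # dangerous failure rate, failures per year
ti_m = 1.0        # manual test interval, years

pfd = lambda_d * (ti_m / 2)  # average probability of failure on demand
rrf = 1 / pfd                # risk reduction factor

print(pfd)  # 0.001
print(rrf)  # 1000.0
```

At these assumed rates the device would sit in the SIL 2 band of Table 1 (RRF > 100 to ≤ 1,000).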

If a device has a level of automatic diagnostics, then dangerous failures are split into two categories: dangerous detected and dangerous undetected. Accounting for automatic diagnostics, the PFD calculation becomes:

PFD = (λdd * (TIa / 2)) + (λdu * (TIm / 2))

Where: λdd is the dangerous detected failure rate

TIa is the automatic diagnostic test interval

λdu is the dangerous undetected failure rate

TIm is the manual test interval

In almost every case, the PFD term associated with automatic diagnostics is insignificant compared to the term associated with manual testing (usually by two orders of magnitude) and can therefore be ignored.
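To illustrate why the automatic-diagnostic term can usually be ignored, here is a sketch with assumed rates: 60% of a 0.002/yr dangerous failure rate is caught by hourly diagnostics, while the rest is found only by the yearly manual test:

```python
HOURS_PER_YEAR = 8760

lambda_dd = 0.0012         # dangerous detected failure rate, per year (assumed)
lambda_du = 0.0008         # dangerous undetected failure rate, per year (assumed)
ti_a = 1 / HOURS_PER_YEAR  # automatic diagnostic interval: one hour, in years
ti_m = 1.0                 # manual test interval, years

pfd_auto = lambda_dd * (ti_a / 2)    # contribution of detected failures
pfd_manual = lambda_du * (ti_m / 2)  # contribution of undetected failures

print(pfd_auto)    # ~6.85e-08
print(pfd_manual)  # 0.0004
```

Here the manual-test term is several orders of magnitude larger, so PFD ≈ λdu * (TIm / 2).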

What is 'manual test coverage'?

Assuming that manual testing is 100% effective is unrealistic. For example, full or partial stroking of a valve does not determine whether the valve will seat properly or whether it might leak. Partial stroking will not determine whether the seat is eroded or whether there is a welding rod stuck in the valve.

Testing the electronics of a sensor does not determine whether the sensing element itself is responding properly. Removing a sensor and testing it in a laboratory or maintenance shop does not determine whether the sensor will respond properly in the actual process. (Stories have been told at conferences where they did not.) Testing a level float switch by moving the float with a rod will not ensure that the float will actually float. So in reality, manual testing (its effectiveness is referred to as manual test coverage) is not 100% effective in detecting all possible failures. This can be accounted for in these calculations:

PFD = (λdd * (TIa / 2)) + (λdu * (TIm / 2)) + (λdn * (Life / 2))

Where: λdd is the dangerous detected failure rate

TIa is the automatic diagnostic test interval

λdu is the dangerous undetected failure rate

TIm is the manual test interval

λdn is the dangerous never-detected failure rate

Life is the proposed life of the hardware.

In other words, some dangerous failures will remain in the system for the life of the system.
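The three-term formula makes it easy to see where coverage figures like those quoted in this article come from. A back-of-the-envelope sketch, assuming yearly testing and a 15-year life: with coverage c, a fraction c of the dangerous undetected failure rate is found at each manual test and the remaining (1 - c) persists for the system life:

```python
def rrf_degradation(coverage, ti_m=1.0, life=15.0):
    """Ratio of the imperfect-coverage RRF to the perfect-coverage RRF.

    The dangerous undetected failure rate cancels out of the ratio, so
    only the coverage, test interval, and system life matter.
    """
    pfd_perfect = ti_m / 2
    pfd_actual = coverage * (ti_m / 2) + (1 - coverage) * (life / 2)
    return pfd_perfect / pfd_actual

for c in (0.95, 0.90):
    reduction = (1 - rrf_degradation(c)) * 100
    print(c, round(reduction))  # ~41% and ~58% reductions
```

This single-device sketch lands close to the article's averaged 40% and 57% reductions, and the 90%-coverage factor (about 0.42) is also consistent with the RRF drops from 400 to roughly 170 and from 200 to roughly 86 discussed in the article.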

Imperfect manual testing

The impact of imperfect manual testing is significant. Calculations may be done by hand or with a number of commercially available programs that account for imperfect manual testing. The impact is the same regardless of the hardware or configuration: calculations for single devices (switches or valves), triplicated transmitters with 99% automatic diagnostics, dual valves, and valves with partial stroke testing all reveal the same results.

If the manual test coverage drops to 95%, the risk reduction is reduced by an average of 40%. If the manual test coverage drops to 90%, the risk reduction is reduced by an average of 57%. These results assume a 15-year life and yearly manual testing. Assumptions that can vary the final answer by a factor of two are significant enough to warrant attention.

SIL performance targets vary by an entire order of magnitude (as shown in Table 1). A system with perfect manual testing and a risk reduction factor of 400 will be reduced to 170, assuming 90% manual test coverage.

Both numbers are in the SIL 2 range. However, a system with an initial risk reduction of 200 will be reduced to 86 assuming 90% manual test coverage. This is enough to slip from an assumed SIL 2 level of performance, down to an actual SIL 1 level of performance.

Thorough manual testing

The results summarized here show that manual testing needs to be as realistic and thorough as possible or else intended performance levels may not be met. The goal is to have the manual test coverage percentage as close to 100% as possible. While most will accept that manual test coverage is rarely 100% (just as automatic diagnostics can never be 100%, and no redundant system can have 0% common cause), arriving at an accurate assessment of the manual test coverage percentage is problematic.

Until detailed failure rate data is accumulated (which is certainly possible considering the databases that most users now have available), estimating the manual test coverage may remain a SWAG (Scientific Wild Ass Guess) for the time being.


Author Information

Paul Gruhn, PE, CFSE is the training manager at


The ISA 84 committee wrote a technical report in 2002 titled Guidance for Testing of Process Sector Safety Instrumented Functions (SIF) Implemented as or Within Safety Instrumented Systems (SIS) (ISA-TR84.00.03-2002). As the name implies, this 222-page document describes methods for testing safety devices. The document is currently being rewritten by the committee.


