Control system failure survival strategies

When bad things happen to good control systems, it’s best to know in advance which of several strategies will be used, why, and how.


Steps to prevent control systems failure and the instrumentation required for safety.What happens when control systems fail? It is not so much the response, but the response planning that is crucial to failure situations. Like all other fields of engineering, control systems require careful planning and design to deal with every foreseeable situation, including failures, whenever feasible. No technology is perfect, so possible failure types must be identified and strategies developed for the handling of each one of significance. This is not to say that an engineered approach is required every time, but factors such as how the system is monitored, alarm system capabilities, availability of operators, process uptime requirements, probability of occurrence, possible consequences, and process safety, just to name a few, must always be considered.

Why? Control system robustness and reliability have come a long way in the last 30 years. The automated control system is now considered to be a vital part of the infrastructure that helps ensure the smooth running of plants, pumping stations, utilities, and processing facilities. In all but the most specialized and small scale processes, the days of operators routinely turning valves, starting and stopping pumps by hand, and monitoring mechanical gauges are over. Low-level tasks are now typically handled by automated control systems.

Certain approaches are commonly used to deal with control system failures.

1. Manual operator shutdown: Perhaps the simplest response to a control system failure is to have an operator (staff person) shut down the process manually. In some systems where there are no significant safety concerns, operators are always available, and there is a way the operator can easily notice a problem, an operator-initiated approach is sometimes used. If using a manual shutdown approach, careful consideration has to be given to how the operator would notice a failure and what the consequences would be if there was a time delay before the operator was able to shut down the process. For facilities without personnel and/or plants with limited alarm system capability, this approach can be difficult to use effectively.

2. Emergency stop: In many jurisdictions, processes must have emergency stop and/or process stop buttons installed. In the case of a control system failure, these can provide an easier method for an operator to quickly shut down a process. The downside is that these types of buttons typically initiate hard shutdowns, which may not always be the best response for certain types of control system failure. For example, in some water pumping applications having an operator start an auxiliary pump is preferable to an operator-initiated shutdown. Like the previously mentioned manual shutdown method, the manually initiated emergency/process stop approach also depends on the operator noticing the problem and being able to react to it in a timely manner.

3. Automatic to manual: Another option is to have an operator switch the process equipment from automatic to manual mode and then continue to run the process in manual mode. Depending on process design, this may or may not be possible. While this approach may seem attractive for a process uptime perspective, simply running a process manually without the aid of the automated control system should only be considered with great caution. The process must be simple enough that an operator can run it manually without become overwhelmed, and the operator must be able to deal with any process upsets without the aid of the now-disabled control system. Furthermore, any potential safety risks must be carefully identified, evaluated, and managed. Likewise, any potential impacts on the alarm system and/or reduced data logging capability also must be carefully considered before attempting to run a process manually. For some processes, the approach of having an operator run them in manual mode might not be acceptable due to the potential risk involved.

4. Safety system shutdown: In processes that warrant automated independent safety systems, the safety system is another way that control system failures can be handled. For processes that use them, the safety system is typically designed to generate a hard shutdown. Safety systems can range from simple emergency stop circuits to highly redundant ISA84-style SIS (safety-instrumented-systems) that use multiple automatic shutdown techniques. The caveat with relying on the safety system to handle a control system failure is that the process often has to get out of control for the safety system to be activated. It is also a reactive approach rather than a proactive method of dealing with control system failures. The end result of a safety system being activated is almost always a hard shutdown, which has obvious impacts on process uptime.

The approaches above react to the results of a control system failure, rather than the control system failures themselves.

5. Redundancy: An alternate method to dealing with control system failures is to make the control system less prone to failures by using redundancy. Redundant components such as backup power supplies, multiple processors, fault-tolerant I/O cards, dual-trunked networks, and backup instrumentation, along with automatic fail-over logic, can significantly reduce the chance of control system failure. By handling failures with built-in redundancy in the primary control system, costly hard shutdowns can often be avoided. This approach does come with a cost since implementing redundancy into a control system can often be an expensive undertaking. Implementation challenges may exist in some primary control systems to provide for bumpless transfer among redundant system components.

6. Secondary redundancy: When having a fully redundant primary control system is not feasible, a slightly different approach is to have a secondary control system as a backup. Use of secondary control systems can be particularly well suited for applications where the cost of a fully redundant primary control system is prohibitive. A secondary control system is typically designed to “take over” the process in one or both of the following situations: (a) when a process value (such as a tank level) exceeds a value that the normal control system would prevent it from reaching, or (b) when a system health signal from the primary control system indicates that the secondary system needs to take over. Secondary control systems often have their own sensors, which have the benefit of being unaffected by instrumentation failures in the primary control system.

A secondary control system can also be triggered by a wired “failsafe,” often an electrical signal, that indicates a failure in the primary control system. For example, a sewage pumping station with a sewage holding tank could use a PLC with a level transmitter for its primary control system, and a set of float switches with electric relay logic as its backup control system.

If the PLC or level transmitter fails, the float switches and relays would then take over control of the pumps. In many systems, a well-implemented secondary control system can result in significant uptime improvement at an attractive price-point. However, implementing secondary backup control systems can present challenges when designing how output devices, such as pumps and valves, are to smoothly transition from primary to secondary control when required. Being able to transition output devices and final control elements in an automated fashion requires careful planning and system integration.

Choose among options

In general, when a control system failure occurs, it is usually preferable to rely on control system redundancy and/or backup control systems. When no redundancy and/or backup control systems are used, control system problems typically result in shutdowns. Furthermore, the response of having an operator continue to run the processes in manual mode is not always an option due to staffing levels and potential safety concerns. Systems that can auto-recover from minor control system problems are almost always preferable to ones that require operator intervention each time a problem is encountered.

Effective plant design requires a diverse team of professionals and a multifaceted approach. Starting with knowledge of the process itself, the design team must consider the individual characteristics of the process and control system characteristics. Control system failures are a possibility that we all have to be aware of—the key is to make sure that when they happen, there is a solid plan and strategy in place on how to deal with them.

- Graham Nasby, P.Eng., PMP, engineer, Eramosa Engineering Inc., Guelph, Ontario, Canada, is among Control Engineering Leaders Under 40, Class of 2011. Posted by Chris Vavra, Control Engineering, 

Nasby, a licensed professional engineer, has worked in various industries ranging from IT and software development to pharmaceuticals and semiconductor manufacturing. He designs automated control and monitoring systems for the municipal water/wastewater sector at Eramosa Engineering Inc. Graham is also a contributing member of the ISA18, ISA101, and CSC/IEC TC65 standards committees. 

Process Control Channel

No comments
The Top Plant program honors outstanding manufacturing facilities in North America. View the 2013 Top Plant.
The Product of the Year program recognizes products newly released in the manufacturing industries.
The Engineering Leaders Under 40 program identifies and gives recognition to young engineers who...
The true cost of lubrication: Three keys to consider when evaluating oils; Plant Engineering Lubrication Guide; 11 ways to protect bearing assets; Is lubrication part of your KPIs?
Contract maintenance: 5 ways to keep things humming while keeping an eye on costs; Pneumatic systems; Energy monitoring; The sixth 'S' is safety
Transport your data: Supply chain information critical to operational excellence; High-voltage faults; Portable cooling; Safety automation isn't automatic
Case Study Database

Case Study Database

Get more exposure for your case study by uploading it to the Plant Engineering case study database, where end-users can identify relevant solutions and explore what the experts are doing to effectively implement a variety of technology and productivity related projects.

These case studies provide examples of how knowledgeable solution providers have used technology, processes and people to create effective and successful implementations in real-world situations. Case studies can be completed by filling out a simple online form where you can outline the project title, abstract, and full story in 1500 words or less; upload photos, videos and a logo.

Click here to visit the Case Study Database and upload your case study.

Maintaining low data center PUE; Using eco mode in UPS systems; Commissioning electrical and power systems; Exploring dc power distribution alternatives
Synchronizing industrial Ethernet networks; Selecting protocol conversion gateways; Integrating HMIs with PLCs and PACs
Why manufacturers need to see energy in a different light: Current approaches to energy management yield quick savings, but leave plant managers searching for ways of improving on those early gains.

Annual Salary Survey

Participate in the 2013 Salary Survey

In a year when manufacturing continued to lead the economic rebound, it makes sense that plant manager bonuses rebounded. Plant Engineering’s annual Salary Survey shows both wages and bonuses rose in 2012 after a retreat the year before.

Average salary across all job titles for plant floor management rose 3.5% to $95,446, and bonus compensation jumped to $15,162, a 4.2% increase from the 2010 level and double the 2011 total, which showed a sharp drop in bonus.

2012 Salary Survey Analysis

2012 Salary Survey Results

Maintenance and reliability tips and best practices from the maintenance and reliability coaches at Allied Reliability Group.
The One Voice for Manufacturing blog reports on federal public policy issues impacting the manufacturing sector. One Voice is a joint effort by the National Tooling and Machining...
The Society for Maintenance and Reliability Professionals an organization devoted...
Join this ongoing discussion of machine guarding topics, including solutions assessments, regulatory compliance, gap analysis...
IMS Research, recently acquired by IHS Inc., is a leading independent supplier of market research and consultancy to the global electronics industry.
Maintenance is not optional in manufacturing. It’s a profit center, driving productivity and uptime while reducing overall repair costs.
The Lachance on CMMS blog is about current maintenance topics. Blogger Paul Lachance is president and chief technology officer for Smartware Group.