Reduce downtime and risk with effective alarm management
A more intelligent design that can deliver the right alarms at the right time is critical in reducing downtime and risk for operators.
- Improper alarm management contributes to lost production and can create the potential for a major industrial incident.
- Users need to streamline alarm systems and determine which alarms require immediate response and which ones don’t.
- Extensive in-house training and education on alarm management may also be needed.
Improper alarm management contributes to lost production, costing industrial facilities millions in unplanned downtime – not to mention creating the potential for a major industrial incident. When operators face dozens of alarms during a facility upset, it’s almost impossible to quickly separate important from unimportant alarms, slowing response time for addressing a problem before it escalates.
Oftentimes, these “alarm flood” issues stem from an improperly conceived alarm system with poor prioritization, improperly set alarm points, ineffective annunciation, vague or confusing graphics and/or alarm definitions on the human-machine interface (HMI). Facilities experiencing these real-world alarm issues can greatly benefit from a properly functioning alarm management system that’s holistically integrated with the plant automation systems.
Many automation suppliers have developed tools to help tame the alarm problem and reduce the quantity and rate of alarms to which operators must respond. Reducing the quantity only addresses part of the problem, however.
The ultimate objective is directing an operator’s focus to the most critical information related to the problem. This requires a more intelligent design that can deliver the right alarms to the right operator at the right time – and with appropriate importance, context and guidance so they can correct or quickly mitigate the situation.
Alarm best practices and philosophy
The ANSI/ISA-18.2-2016 Management of Alarm Systems for the Process Industries standard focuses on alarms found in modern process automation solutions based on a distributed control system (DCS), supervisory control and data acquisition (SCADA) or programmable logic controller (PLC) platform. While it’s often applied in the continuous process industries, its scope also targets other manufacturing processes like batch, discrete and hybrid – so its applicability is universal.
The standard provides guiding principles and well-defined processes for managing the lifecycle of alarm systems. It clarifies the alarm philosophy and rationalization process as a holistic lifecycle approach, starting with the idea of creating alarms where the process and safety considerations call for them. It also sets a higher standard for selecting and implementing alarms.
Highlighting best practices, the standard focuses on defining the right number of alarms for the current circumstances and not any specific minimum or maximum for the total number. Instead, emphasis is placed on the rate of alarms. A process can deviate from the norm in multiple ways at the same time, and this is a real concern when there are too many unmanaged alarms. When this occurs, operators can become overwhelmed and find it difficult to separate the truly important alarms from duplicate or irrelevant ones. As a result, situational awareness is compromised, and operators end up making poor operational decisions or can even escalate the problem.
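The emphasis on alarm rate rather than alarm count can be made concrete with a small calculation. The following is a minimal sketch, assuming alarm timestamps are available from a historian export. The function names and the flood threshold of more than 10 alarms in a 10-minute window are illustrative assumptions; the actual limits belong in the site's alarm philosophy.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed settings (site-specific in practice): window length and the
# alarm count above which a window is treated as an alarm flood.
WINDOW = timedelta(minutes=10)
FLOOD_THRESHOLD = 10


def alarms_per_window(timestamps, start, window=WINDOW):
    """Count alarms in consecutive fixed windows beginning at `start`."""
    counts = Counter()
    for ts in timestamps:
        if ts >= start:
            # timedelta // timedelta gives the integer window index
            counts[(ts - start) // window] += 1
    return counts


def flood_windows(timestamps, start, window=WINDOW, threshold=FLOOD_THRESHOLD):
    """Return the start time of every window whose count exceeds the threshold."""
    counts = alarms_per_window(timestamps, start, window)
    return [start + i * window for i, n in sorted(counts.items()) if n > threshold]


# Hypothetical log: 3 alarms in the first window, 12 in the second.
shift_start = datetime(2024, 1, 1, 6, 0)
log = [shift_start + timedelta(minutes=m) for m in (1, 4, 8)]
log += [shift_start + timedelta(minutes=10, seconds=20 * k) for k in range(12)]

floods = flood_windows(log, shift_start)
```

In this made-up log only the second window is flagged, even though the total alarm count for the shift is small. That is the distinction the standard draws between rate and quantity.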
Like many ISA standards, ANSI/ISA-18.2 is considered a recognized and accepted good engineering practice (RAGAGEP), so its application is treated by many safety regulating bodies as an operating requirement for manufacturers. A foundational aspect of that application is the creation of an alarm philosophy document that defines the criteria for rating an alarm’s severity, urgency, and response. In the simplest terms, the alarm philosophy is a set of guidelines on how to manage alarms effectively and provides the basis for a properly functioning alarm management system. With it, facilities can standardize, design, develop, implement, modify, manage, maintain, and continuously improve their alarms. Alarm response procedures also can be developed and integrated with an HMI to help operators respond effectively to mitigate abnormal situations.
After the alarm philosophy criteria are established, the alarm rationalization process helps to minimize the number of alarms required to keep operating conditions efficient and safe. An alarm rationalization team reviews, justifies, validates, and documents each alarm based on the alarm philosophy criteria. The main goal of rationalization is to evaluate the alarms, identify root causes and determine which alarms an operator actually needs, so that only those are included in the pool of useful alarms.
Alarm criticality to help process operators understand
To illustrate this process further, let’s consider that a compressor goes down and impacts numerous processes simultaneously. The incident can cause the operator’s screen to light up with an overwhelming number of alarms. In this situation, what does the operator really need to know? The compressor went down – that’s the critical alarm. The myriad of other alarms blurs that reality. Causes of the shutdown might be related to temperature, pressure, an electrical trip, or something else – all of which is valuable information related to the causal condition. Alarms resulting from that shutdown event confuse the situation.
Once an event’s criticality and effect are established, a focused analysis can help separate the potential associated alarms into root causes, the trigger event and all other conditions and events that are a result of the initiating incident. Each of these types of alarms should be treated differently. The causal alarms should be configured so that the operator is given enough early warning to react in time to prevent the critical event. If the event does occur, the resultant alarms should be suppressed so as not to affect the operator’s situational awareness.
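The three-way split described above (causal alarms, the trigger event and resultant alarms) can be sketched in a few lines. This is an illustration only. The tag names, the `Role` categories and the `annunciate` function are hypothetical, and a real system would implement the logic in the DCS or HMI alarm configuration rather than in a script.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    CAUSAL = "causal"                # early warning of a developing problem
    TRIGGER = "trigger"              # the critical event itself
    CONSEQUENTIAL = "consequential"  # a result of the trigger event


@dataclass(frozen=True)
class Alarm:
    tag: str
    role: Role
    trigger_tag: Optional[str] = None  # for CONSEQUENTIAL: the event behind it


def annunciate(active):
    """Return alarms to show, dropping resultant alarms whose trigger is active."""
    tripped = {a.tag for a in active if a.role is Role.TRIGGER}
    return [
        a for a in active
        if not (a.role is Role.CONSEQUENTIAL and a.trigger_tag in tripped)
    ]


# Hypothetical compressor trip: one cause, the trip, and two downstream effects.
active = [
    Alarm("COMP_DISCH_TEMP_HI", Role.CAUSAL),
    Alarm("COMP_TRIP", Role.TRIGGER),
    Alarm("HEADER_PRESS_LO", Role.CONSEQUENTIAL, trigger_tag="COMP_TRIP"),
    Alarm("UNIT2_FLOW_LO", Role.CONSEQUENTIAL, trigger_tag="COMP_TRIP"),
]
shown = [a.tag for a in annunciate(active)]
```

With the trip active, only the causal alarm and the trip itself remain on the operator's screen. A resultant alarm is still shown if its trigger event is not active, since in that case it may point to a different problem.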
Rationalizing alarms: Separate information from action
The assessment described above is a good first step for reining in the number of alarms an operator might experience when a critical event occurs, but it’s also very important to review every alarm to ensure it meets the fundamental requirement: the operator must act on it. The alarm rationalization process is key to weeding out so-called “alarms” that don’t meet this requirement and should be relegated to a separate group of information-only events. The process also helps determine the necessary operator response time and the consequence of not acting in time.
In the compressor scenario above, the first qualifying question should be, “Does the operator have to act on this alarm?” If “yes,” then the rationalization process should determine how much time the operator has to act before a consequence occurs, such as a drop in pressure or the compressor stopping altogether. Also important is determining the severity of the consequence if the operator does nothing, which helps separate out the critical alarms. This brings into focus the potential for unexpected downtime, safety incidents, or environmental damage, and the shutdown costs or injuries to personnel that could follow.
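The qualifying questions above map naturally onto a rationalization record and a priority lookup. The sketch below assumes a simple matrix of consequence severity against response urgency. The field names, the 10-minute urgency cutoff and the priority labels are hypothetical placeholders for whatever the site's alarm philosophy document defines.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff: responses needed within this many minutes count as urgent.
URGENT_MINUTES = 10

# Hypothetical priority matrix: (severity of doing nothing, urgent?) -> priority.
PRIORITY = {
    ("minor", False): "low",
    ("minor", True): "medium",
    ("major", False): "medium",
    ("major", True): "high",
    ("severe", False): "high",
    ("severe", True): "critical",
}


@dataclass
class RationalizationRecord:
    tag: str
    operator_action: Optional[str] = None       # None: no action is required
    minutes_to_respond: Optional[float] = None  # time before the consequence
    severity: Optional[str] = None              # "minor", "major" or "severe"


def classify(record):
    """Return 'event' for information-only items, otherwise an alarm priority."""
    if record.operator_action is None:
        return "event"  # fails the fundamental test, so it is not an alarm
    urgent = record.minutes_to_respond <= URGENT_MINUTES
    return PRIORITY[(record.severity, urgent)]


# Hypothetical records from a compressor rationalization session.
lube_oil = RationalizationRecord(
    "COMP_LUBE_OIL_PRESS_LO", "Start standby lube oil pump", 5, "severe"
)
run_status = RationalizationRecord("COMP_RUN_STATUS")

results = {r.tag: classify(r) for r in (lube_oil, run_status)}
```

Here the low lube oil pressure alarm is rated critical because the consequence is severe and the response window is short, while the run-status indication is demoted to an information-only event.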
Due to the sheer number of alarms that must be considered, the rationalization exercise is a major endeavor – but the effort is worthwhile. The process highlights the important realization that many “alarms” are not alarms at all. Even those that are true alarms are not always useful or necessary. The key is to annunciate alarms when they are most needed and to suppress them when they are not.
Suppressing alarms: Three ways, dynamic and static
As mentioned, suppressing alarms when they aren’t useful is a key aspect of managing a vast pool of potential alarms. In fact, many times it’s the best way to reduce alarm rates to a manageable level. For this purpose, ANSI/ISA-18.2 defines the following three forms of alarm suppression:
- Shelving – An operator manually suppresses an alarm temporarily.
- Design – The process automation system suppresses an alarm based on a specific set of conditions.
- Out of service – An alarm has been suppressed because a portion of the equipment is shut down for maintenance or some other reason.
The most interesting and challenging of the three is design suppression, which is further divided into two categories – dynamic and static. Dynamic suppression is the more challenging of the two, because it requires creating rules the system uses to determine an alarm’s importance. It provides the automation system with enough intelligence to determine the most important alarms, make sure they are annunciated, and suppress the unnecessary and irrelevant alarms. It’s this intelligence that helps to avoid alarm floods during upsets and other complicated situations.
Static suppression is based on the state of the process and equipment. Specific alarms are enabled or suppressed during defined procedures or conditions. For example, some alarms may only be enabled during a unit startup. Of the two suppression types, this technique is simpler and is more commonly implemented.
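The three suppression forms, together with the state-based case described above, can be combined into one check. This is a minimal sketch under stated assumptions: the `ENABLED_STATES` table, the tag names, the unit states and the order in which the reasons are checked are all hypothetical, and nothing here reflects a particular vendor's configuration.

```python
# Hypothetical table: alarm tag -> unit states in which the alarm is enabled.
# Tags not listed here are treated as enabled in every state.
ENABLED_STATES = {
    "STARTUP_PURGE_TIMEOUT": {"startup"},
    "COMP_SUCTION_PRESS_LO": {"startup", "running"},
    "COMP_VIBRATION_HI": {"running"},
}


def suppression_reason(tag, unit_state, shelved=frozenset(), out_of_service=frozenset()):
    """Return why `tag` is suppressed, or None if it should be annunciated."""
    if tag in out_of_service:
        return "out of service"  # equipment is down for maintenance
    if tag in shelved:
        return "shelved"         # operator suppressed it temporarily
    enabled = ENABLED_STATES.get(tag)
    if enabled is not None and unit_state not in enabled:
        return "design"          # not applicable in the current unit state
    return None


# The startup-only alarm is suppressed by design once the unit is running.
running_reason = suppression_reason("STARTUP_PURGE_TIMEOUT", "running")
startup_reason = suppression_reason("STARTUP_PURGE_TIMEOUT", "startup")
```

Returning the reason rather than a plain true/false keeps the suppression auditable, which matters because shelved and out-of-service alarms need to be reviewed and restored.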
Qualified personnel: Training to define, process
As mentioned above, critical alarms should be identified and documented, and their treatment defined in an alarm philosophy document. An alarm management program incorporates this documentation, along with a properly rationalized alarm system, to enable operators to identify root causes and resolve issues. This process will be long and tedious: the rationalization team might have to process more than a hundred alarms each day from a pool of 10,000 or more for a large plant. For this reason, management awareness and buy-in are necessary to see the process through to completion.
Training in-house personnel may also be required to properly develop the alarm list and to design, deploy and maintain it – but few companies have enough people with the bandwidth or the depth of skill and experience to execute this work. In these instances, an experienced third-party partner should be consulted – someone who can create and maintain an alarm management program and has the breadth of experience to provide guidance and assistance with its implementation. Incorporating the standards-based alarm management program described above, though resource intensive, is well worth the effort when the potential safety risk to personnel and the potentially huge cost of lost production are considered.
Richard Slaugenhaupt is a consultant for Maverick Technologies, a CFE Media content partner. Edited by Chris Vavra, web content manager, Control Engineering, CFE Media and Technology, firstname.lastname@example.org.
Keywords: Alarm management, process safety
How are you managing your alarms and making the process more efficient?
Original content can be found at Control Engineering.