Process risk assessment uses big data
Technology Update: Predictive process risk assessment can use big data to assess risks dynamically and report automatically, empowering plant personnel to identify issues and take the preventive measures needed to avoid a related shutdown or accident.
It’s a typical Monday morning scene at a refinery: the team (plant manager, supervisors, and head operators) gets together to review the past week’s performance and the coming week’s plans. They talk about the fluid catalytic cracking unit and the key question, “How was the catalyst stand pipe’s performance?” The team’s answers: “Not great; there were more alarms than usual; and we’re not sure why.”
Plant management knows the regenerated catalyst stand pipe is prone to disturbances, which lead to frustrating operational “hiccups” (and trips) every now and then. The cracking unit is one of the most profitable units in the refinery, with a best-in-class historian and manufacturing intelligence software. The systems generate hundreds of thousands of data points. Yet the magnitude of the risks and the reliability associated with the stand pipe (and how they change dynamically) remain unknown, creating challenges in managing its operation for optimum efficiency.
This type of scene plays out often in refineries across the globe and indicates a growing problem as equipment ages and experienced operators retire. With recent advances in control and monitoring systems, facilities are getting overloaded with data yet lack clear insights into process performance, especially into how process risks develop. Hence, over the past few years, facilities have become data rich but information poor; this is typically referred to as the “big data challenge.”
Big data is indeed big. Typically, more than 5 billion data points are recorded every 6 months in a plant with about 320 tags, each recording a sensor measurement every second. Big data is often characterized by four Vs: volume, variety, velocity, and variability, all of which change with time. Lost in the big data flood are indicators that can help plants understand the dynamically changing risks and avoid some of the $10 billion in losses that the U.S. chemical and petrochemical industry experiences annually due to unexpected shutdowns.
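The volume figure can be checked with simple arithmetic. The short Python sketch below multiplies the number of tags by the sampling rate and the length of the period; the function name and the choice of 365/2 days for "6 months" are assumptions made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def data_points(tags: int, samples_per_second: float, days: float) -> int:
    """Total values recorded by `tags` sensors sampled at a fixed rate."""
    return round(tags * samples_per_second * SECONDS_PER_DAY * days)


# Figures from the text: about 320 tags, one sample per second, six months.
# Assumption: six months is taken as 365 / 2 days.
six_month_total = data_points(tags=320, samples_per_second=1, days=365 / 2)
print(f"{six_month_total:,}")  # prints 5,045,760,000
```

Under these assumptions the total is just over 5 billion values, consistent with the figure quoted above.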
Research shows that taking a different-in-kind approach to harnessing big data—based on processing the information directly with advanced data mining techniques—creates a wealth of insights that were previously unavailable. This has significant potential to transform the way facilities operate, and to reduce unexpected disruptions.
Current process risk analysis methods leave gaps in the risk assessment landscape. A more innovative approach to predictive risk assessment can help facilities prevent accidents and unexpected shutdowns, and operate reliably with a reduced risk profile.
Current risk analyses, gaps
Improved process risk management is the primary outcome of the widely used Process Safety Management (PSM) standard, which is promulgated by the U.S. Occupational Safety and Health Administration (OSHA) to maintain and improve safety, operability, and productivity of plant operations. Advances have been made in the process risk assessment area in the last decade, though significant gaps remain for some facilities.
Risk analysis techniques and associated gaps are as follows:
1. Quantitative risk assessment (QRA). Most facilities conduct a QRA once every 3-5 years. QRAs use various data sources available to the industry, such as incident data, material safety data, and equipment and human reliability data, to identify incident scenarios and evaluate risks by defining the probability of failures and their potential consequences. They help users identify areas for risk reduction.
Gaps: Because QRA mostly involves incident and failure data (excluding day-to-day process and alarm data that contain information on precursor events), it has limited predictive power. Interestingly, a summary report (Lauridsen et al., 2002) by Risø National Laboratory in Denmark and the Joint Research Centre of the European Commission indicates that risk estimates based on generic reliability/failure databases are prone to biases and can deviate widely depending on the data sources. In that project, seven partners conducted risk analyses for the same ammonia storage facility and found "large differences in frequency assessments of the same hazardous scenarios." For these reasons, the importance of using process-specific databases for objective risk analyses has been gaining recognition.
2. Safety audits. Many facilities conduct safety, health, and environmental audits using internal teams and large consulting companies, which requires significant resources. The frequency and effectiveness of internal safety audits depend highly on the facility's resource availability. In most cases, safety professionals, with some support from engineers, operators, and sometimes even managers, periodically review operating procedures and safety records, and conduct a limited number of interviews about safety practices.
Gaps: Formal, in-depth safety audits are conducted periodically, with frequency ranging from once a year (in extremely rare cases more than once a year) to once in several years. An integral part of these audits is to review incident history and observable near-misses that are reported by employees. The latter depends upon the safety culture at the facility and may not always provide a true picture of risks.
Furthermore, these approaches cannot monitor changes in process risk levels in real time, or even near real time.
3. Operations management and manufacturing intelligence tools. Operations management and manufacturing intelligence software provides key performance indicators (KPIs) for monitoring the performance of operations and for assessing the availability and effectiveness of equipment. It focuses on trending, reporting, and visual analytics of a select data slice, which helps users monitor the variability of different parameters over a time period (shift, day, week, etc.).
Gaps: These systems fall short when it comes to big data analytics, particularly when users need insights on when parts of the operation are becoming riskier and how anomalies are creeping in. This requires comparing operating conditions with their normal behavior to identify new changes and shifts, which is not the focus of these systems. With aging equipment and the expected departure of many seasoned operators from the workforce, this handicap becomes even more significant.
4. Condition-based monitoring tools. These tools identify abnormal situations in real or near-real time by comparing plant performance with its expected behavior and alerting the user when there is a mismatch. Both model-driven tools (based on quantitative process models) and data-driven tools (based on clustering and dimensionality reduction approaches) are available on the market; their real-time alerts help operators take immediate corrective actions.
Gaps: Because they are designed to monitor operations in real or near-real time, they do not focus on identifying how risks and the likelihood of incidents evolve over a period of time (days, weeks, months). Stated differently, although they provide smart alarms (superior to traditional alarms with fixed thresholds) that cater to the needs of operators on the floor, they are limited in scope when it comes to assessing the magnitude of process risks and performance, which is critical information for plant managers, engineers, and reliability personnel making strategic decisions. Further, many require a lot of “care and pain” to maintain the baseline, making them less attractive. In addition, they often involve remote monitoring and diagnosis of plant data at an offsite facility, which is resource- and capital-intensive.
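The difference between a fixed-threshold alarm and an alert based on comparison with expected behavior can be illustrated with a minimal Python sketch. It is not how any particular commercial tool works: it swaps the clustering and dimensionality reduction approaches mentioned above for a much simpler check against a baseline mean and standard deviation, and all readings, limits, and function names are hypothetical.

```python
from statistics import mean, stdev


def fixed_threshold_alarms(values, high_limit):
    """Indices of readings that exceed a fixed high limit."""
    return [i for i, v in enumerate(values) if v > high_limit]


def baseline_deviation_alerts(values, baseline, z_limit=3.0):
    """Indices of readings more than z_limit baseline standard
    deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [i for i, v in enumerate(values) if abs(v - mu) > z_limit * sigma]


# Hypothetical readings for one tag (arbitrary units).
baseline = [50.0, 50.4, 49.8, 50.1, 49.9, 50.2, 49.7, 50.3]  # normal operation
recent = [50.1, 50.0, 51.5, 52.0, 49.9]                      # latest samples
HIGH_LIMIT = 55.0                                            # assumed alarm setting

fixed = fixed_threshold_alarms(recent, HIGH_LIMIT)
smart = baseline_deviation_alerts(recent, baseline)
print(fixed)  # [] : no reading crosses the fixed limit
print(smart)  # [2, 3] : two readings drift well outside normal variation
```

In this made-up example the fixed limit stays silent while the baseline comparison flags the drift. The sketch also shows why baseline upkeep matters: the alerts are only as good as the data chosen to represent normal operation.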
Risk assessment gaps continue to be problematic. Identifying risk levels and drivers dynamically can play an important role in helping busy plant personnel harness the insights in the big data and take appropriate actions rapidly.
Real-time risk analysis
Accidents are rare events that occur when a series of risk management barriers fail in succession, which implies that a “chance” factor is involved in their occurrence. However, post-incident investigations show that several near-misses typically occur before these unexpected events and evolve (gradually or, often, rapidly) into abnormal situations (Phimister et al., 2003; Kleindorfer et al., 2003; Pariyani et al., 2010; Kleindorfer et al., 2012). This concept is captured in the well-known “safety pyramid.”
Figure 1 introduces an extended version of the safety pyramid (developed by Near-Miss Management LLC), indicating two categories of near-misses that are precursors to accidents. Observable near-misses are typically noted by the operations team, such as equipment failures, leaks, etc. Hidden near-miss events can be detected only through rigorous data analysis and are typically not observable to the human eye. Finding such events in the process and alarm databases permits detection of operational problems at their developing stages.
With new advances in information technology, a typical industrial plant operation monitors hundreds (even thousands) of parameters continuously, generating extensive data sets (on the order of 10 million to 50 million data points or more daily) that exceed the billion mark within a few weeks of operation. Advanced data-mining techniques find new anomalies that often cannot be detected using manual or visual data analysis or engineering models. This information is then used to estimate a risk level that indicates the likelihood of normal conditions escalating to abnormal levels, providing insights on potential performance issues in operations before they become visible or threatening.
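As a rough illustration of turning anomalies in process data into a trend that can be tracked over days, the Python sketch below counts excursions outside a normal operating band for each day and compares each later day with a baseline period. It is a deliberately simplified stand-in, not the data-mining or risk-estimation method described in this article; the band, the readings, and the function names are all hypothetical.

```python
def count_excursions(values, low, high):
    """Count departures from the band [low, high]; a run of consecutive
    out-of-band readings counts as one excursion."""
    count = 0
    outside = False
    for v in values:
        now_outside = v < low or v > high
        if now_outside and not outside:
            count += 1
        outside = now_outside
    return count


def relative_indicator(daily_counts, baseline_days):
    """Ratio of each later day's count to the mean daily count over
    the first `baseline_days` days."""
    base = sum(daily_counts[:baseline_days]) / baseline_days
    if base == 0:
        raise ValueError("baseline period contains no excursions")
    return [c / base for c in daily_counts[baseline_days:]]


# Hypothetical readings for one tag, grouped by day (arbitrary units).
days = [
    [50, 53, 50, 50],
    [47, 50, 53, 50, 53, 50],
    [50, 53, 53, 50, 47, 50],
    [53, 50, 47, 50, 53, 50, 47, 50, 53],
]
LOW, HIGH = 48, 52  # assumed normal operating band

counts = [count_excursions(day, LOW, HIGH) for day in days]
indicator = relative_indicator(counts, baseline_days=3)
print(counts)     # [1, 3, 2, 5]
print(indicator)  # [2.5]
```

Here the fourth day shows 2.5 times the baseline excursion rate even though no single reading is extreme, which is the kind of gradual shift that is hard to spot by eye.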
The dynamic risk levels guide plant personnel to the sources and nature of the risk (at the lowest data levels) to deploy the right resources in a timely manner, to plan just-in-time (JIT) maintenance, and to head off potential problems, several days or even weeks in advance. Results can be made accessible to all users (plant managers, supervisors, engineers, reliability and maintenance crew, as well as operators) to promote transparency among the operating team and to complement existing PSM, hazard identification, and quantitative risk analysis activities.
See two detailed graphics and two case studies on next page.