How to quantify, compute system failure profiles

Developing computational frameworks to improve network resilience is a must for companies looking to weather a cyber attack or a natural disaster like a major storm.

By Gregory Hale October 17, 2020

Whether the threat is a major cyber attack or a severe storm, networks critical to the flow of data, people, goods, and services must be made more resilient to failure, a team of scientists said.

“To be able to quantify and compute system failure profiles is critical information that planners might need to make decisions in terms of how to recover these systems,” said Samrat Chatterjee, a data and operations research scientist at Pacific Northwest National Laboratory (PNNL) in Richland, Washington.

“If we are unable to characterize how the system is failing or might fail, how can we intervene and recover in an efficient manner?” he added.

Chatterjee is serving as principal investigator on a project, conducted with colleagues at Northeastern University in Boston, that is developing computational frameworks to improve network resilience.

“What we mean by resilience in this context is robustness,” said project co-principal investigator Auroop Ganguly, a professor of civil and environmental engineering at Northeastern University. “Robustness means things will be slower to break, you will not lose service levels too fast or too much and, once something happens, you have a plan to recover efficiently, fast, and reliably.”

Computational framework developed to study resilience

The team developed and implemented a generalizable computational framework to study the resilience of the multilayered London Rail Network to the compound threat of intense flooding and a targeted cyberattack. The team plans to use the framework to inform the design and engineering of other interdependent networks, including military installations.

The team is particularly interested in understanding how changes in one network affect the other networks connected to it, which together form a network-of-networks. The London Rail Network, for example, consists of the Underground subway, the Overground passenger trains, and the Docklands Light Railway. These three networks interconnect at shared nodes, or rail stations.
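To make the idea of layers joined at shared stations concrete, here is a minimal Python sketch. It is an illustration only, not the team's framework: the layer names, the single-letter station names, and the helper functions `build_graph` and `shared_stations` are all hypothetical, and the track segments are toy data rather than the real London network.

```python
from collections import defaultdict

# Toy layers: each layer is a list of (station, station) track segments.
# Station names are placeholders (assumption), not real London stations.
LAYERS = {
    "underground": [("A", "B"), ("B", "C"), ("C", "D")],
    "overground": [("C", "E"), ("E", "F")],
    "light_rail": [("F", "G"), ("B", "G")],
}


def build_graph(layers):
    """Merge all layers into one adjacency map keyed by station."""
    adjacency = defaultdict(set)
    for edges in layers.values():
        for u, v in edges:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return dict(adjacency)


def shared_stations(layers):
    """Return the stations that appear in more than one layer."""
    membership = defaultdict(set)
    for layer_name, edges in layers.items():
        for u, v in edges:
            membership[u].add(layer_name)
            membership[v].add(layer_name)
    return {s: sorted(names) for s, names in membership.items() if len(names) > 1}


if __name__ == "__main__":
    for station, names in sorted(shared_stations(LAYERS).items()):
        print(station, "links", ", ".join(names))
```

In this toy layout, the stations returned by `shared_stations` are the only points where a disruption in one layer can pass into another, which is why such interchange nodes matter in a network-of-networks.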

If a bout of intense rain and flooding shutters a rail station, how might that closure ripple across the interconnected networks? If an adversary timed a cyberattack to follow the flooding, would the compound impact be disproportionate and greater than the sum of the parts?

“Cyberattacks are almost continuously happening,” Ganguly said. “Sometimes it is nation states, sometimes it is rogue agencies, sometimes it is just a few bad actors. And weather extremes have been growing in frequency and duration. So, the chance that they will be concurrent at some point is high.”

Computer simulations used to model compound threats

The researchers used computer simulations to model such compound threat events directed toward networks-of-networks to begin to understand how failures cascade through these interconnected systems. The insights gained inform design and engineering recommendations on how to make interconnected networks more resilient to natural and human threats and better primed to recover efficiently.
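A simulation of this kind can be sketched in a few lines of Python. The example below is a rough illustration under stated assumptions, not the researchers' model: it uses a toy seven-station network with placeholder names, treats a flood as the removal of a chosen set of stations, treats a targeted cyberattack as the removal of the best-connected surviving station, and scores the outcome by the share of all stations left in the largest connected group. That score is a common robustness proxy, but the article does not say which metric the team used.

```python
from collections import deque

# Toy merged network (assumption): placeholder stations, not real data.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"),
         ("E", "F"), ("F", "G"), ("B", "G")]


def surviving_graph(edges, removed):
    """Adjacency map of the stations that are still open."""
    nodes = {n for edge in edges for n in edge} - set(removed)
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def largest_component_fraction(edges, removed=()):
    """Share of ALL stations that remain in the largest connected group."""
    total = len({n for edge in edges for n in edge})
    adj = surviving_graph(edges, removed)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best / total


def targeted_attack(edges, removed):
    """Pick the surviving station with the most links (ties: alphabetical)."""
    adj = surviving_graph(edges, removed)
    return max(sorted(adj), key=lambda n: len(adj[n]))


def compound_event(edges, flooded):
    """Flood first, then a targeted attack on what is left."""
    removed = set(flooded)
    removed.add(targeted_attack(edges, removed))
    return largest_component_fraction(edges, removed)


if __name__ == "__main__":
    print("flood only:  ", largest_component_fraction(EDGES, {"E"}))
    print("attack only: ", compound_event(EDGES, set()))
    print("flood+attack:", compound_event(EDGES, {"E"}))
```

Comparing the three printed scores shows how a compound event can be set against each threat on its own. Any pattern seen in this toy network says nothing about the real London Rail Network.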

Real-world networks-of-networks such as the London Rail Network are constrained by geography and were originally designed for efficiency. For instance, some rail stations are located near major waterways to facilitate the transfer of people and cargo from ships to rail.

“By doing multiple simulations on this network-of-networks, you find that some of the design principles that were used back in the older days like having your prominent stations near the river may make the system vulnerable to failure,” said Nishant Yadav, a graduate student in Ganguly’s sustainability and data sciences laboratory at Northeastern University.

That vulnerability exists because stations near rivers are more likely to flood during extreme weather events. If these major network nodes are shut early in a disaster, the impacts will cascade through the system, Yadav added.

This article originally appeared on ISSSource’s website. ISSSource is a CFE Media content partner.

Original content can be found at isssource.com.


Author Bio: Gregory Hale is the editor and founder of Industrial Safety and Security Source (ISSSource.com), a news and information website covering safety and security issues in the manufacturing automation sector.