How to protect OT/ICS systems from ransomware attacks

Reduce the risk of ransomware attacks on OT/ICS systems by following these prevention tips

By Rick Kaun and Ron Brash August 3, 2021

Between May 6 and May 12, 2021, Colonial Pipeline, owner of 5,500 miles of pipeline carrying gasoline, diesel, and jet fuel from Texas to New Jersey, shut down its operations in response to what it said was a ransomware attack targeting its information technology (IT) network. In a media statement, Colonial officials indicated the damage was limited to their IT systems, but that the company “proactively took certain systems offline to contain the threat.”

That response, which included disabling select operational technology (OT)/industrial control systems (ICS), “temporarily halted all pipeline operations … which we are actively in the process of restoring.” The company added that its OT systems were fine and that the shutdown was a measured response to enable a quick recovery. Without such an “abundance of caution,” the IT malware might have proven much more disruptive because of the interconnectedness of pipeline infrastructure and of participants upstream and downstream (e.g., custody transfers, shared remote metering, available storage/capacity, etc.).

Since the Colonial incident, several other major ransomware attacks on operating entities have been reported: Martha’s Vineyard Ferry Service, FUJIFILM, and JBS, the meat company that supplies 40% of the US meat supply. These come on the heels of several other large, public ransomware events this year alone at WestRock (the second-largest paper company), Molson Coors, and others.

The reality is that industrial organizations are now in the crosshairs of ransomware gangs: because the impact of lost availability runs into the millions of dollars, ransom demands can be quite high. A recent report by Digital Shadows found that industrial goods and services was the most targeted industry in 2020, at 29%, with more attacks than the next three industries (retail, construction, and technology) combined.

What is ransomware?

Ransomware is a form of malicious software, more commonly called malware. Essentially, the attackers first find a way into the target network (through phishing, social engineering, etc.). Their software then spreads across the network (traversing network shares, local drives, etc.), encrypting everything it finds with a key that only the attackers know. To unlock the files, the victim has to pay for the key. The cost of obtaining the key and decrypting the files can range from hundreds to thousands or even millions of dollars, depending on the specifics of the attacker and victim.

Why is ransomware used and what are the potential impacts?

Ransomware has roots in the scam and extortion criminal world, but by nature, it can also be used to target larger asset owners and organizations or to mask other activities that might be more devious.

Let’s first look at why ransomware is becoming such a challenge for industrial organizations today:

  • Ransomware exploits “availability” risks and is highly profitable against industrial organizations. Cyber theft of personal information used to be quite profitable, but prices for that information have fallen dramatically as supply has increased, so cyber criminals have found new business models. They have shifted from the “C” in the Confidentiality-Integrity-Availability (CIA) triad to the “A.” Industrial organizations require availability to operate, so payment is usually quick and large.
  • In most cases, insurance covers a significant portion of the cost of the ransom and recovery, so under current policies the payment process is greased by the presence of insurance. This is changing, however, as insurers revise their policies, as seen in AXA’s recent announcement that it will stop covering ransomware payments.
  • Even IT attacks can shut down OT operations. OT systems are usually highly susceptible to ransomware if it reaches them, so the first step in any incident response plan is to stop the spread by disconnecting OT systems. While IT systems are costly to restore, OT systems may be three to four times as costly and may take much longer. Hence the “abundance of caution” that is cited so frequently. In many cases, operations do not rely solely on OT systems; IT systems such as billing or supply chain software are now necessary to operate effectively. Thus, shutting down key IT systems can effectively require an OT shutdown as well.
  • Why is OT so susceptible?
    • Most ransomware takes advantage of older vulnerabilities that have been left unpatched. In OT, there are a huge number of both unpatched systems and exploits that target them.
    • Ransomware often exploits network-based weaknesses to gain access (e.g., through Remote Desktop Protocol (RDP)) but spreads from endpoint to endpoint. Compensating controls, system hardening, vulnerability management, and other techniques such as network isolation all play a critical role in reducing the impact and spread of an attack.
  • Ransomware is often very effective because many organizations are insufficiently equipped to recognize and avoid potential incidents. Large numbers of legacy, unpatched assets, often poorly monitored and supervised by a handful of non-cybersecurity personnel, are a recipe for disaster.

To put the cycle into perspective, the diagram below illustrates the typical path ransomware takes into a facility:

What happened to Colonial Pipeline?

According to published reports, part of Colonial’s immediate reaction to the attack was to enlist the services of incident response specialist FireEye. Those investigators have since attributed the attack to a prolific Russian criminal ransomware group known as DarkSide, a crew credited with around 40 similar attacks with ransom demands ranging from $200,000 to more than $2 million.

DarkSide has claimed its attacks feature a professional “experience,” focusing on providing “quality products” to its consumers. The crew claims it will attack only those who have the means to pay or who are known to have cybersecurity insurance. The group has also been known to use a double extortion method: getting victims to pay to decrypt their data or, failing that, blackmailing them with the threat of publicly releasing data exfiltrated during the attack.

By the following Monday, the DarkSide attackers expressed contrition for the Colonial Pipeline attack. Perhaps in response to the international publicity and the focused government and law enforcement efforts spun up in the wake of the incident, the hackers posted on their dark web site that they never intended to disrupt public utilities.

“We are apolitical,” the hackers said. “We do not participate in geopolitics, do not need to tie us with a defined government and look for other of our motives. Our goal is to make money, not creating problems for society.”

As mentioned above, the Colonial attack specifically targeted the IT systems that run functions such as billing and inventory. In fact, the ransomware never crossed over to infect the company’s OT systems. However, operations were halted anyway because of the risk of further spread into OT.

What does this mean for OT? Are OT systems immune because they are less connected to the Internet? Or are they just “later in the spread,” so that rather than being patients 1 to 100 they are patients 101 and onward, and it is just a matter of time? Should all resources be focused on stopping ransomware from impacting IT, and if that can be done, is OT then safe? Or is the solution more about incident response: protecting operations from IT ransomware by creating redundancy for those critical IT systems, or barriers that let OT run without relying on them? These questions should prompt strategic discussion among all industrial operators.

How to protect against a ransomware attack on industrial organizations

No security program can guarantee 100% avoidance of downtime or incidents. Rather, the true measure of security is resilience: how quickly can an organization detect, respond to, and recover from a threat or malicious activity?
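One way to make “how quickly” measurable is to record a timeline for every incident and drill and track the intervals between its milestones. The Python sketch below assumes a hypothetical `IncidentTimeline` record with four timestamps, treating “time to contain” as the response phase; the milestone names and the `resilience_metrics` helper are illustrative, not taken from any standard or product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    """Hypothetical incident record; the milestone names are illustrative."""
    compromised: datetime  # best estimate of initial compromise
    detected: datetime     # first alert or report
    contained: datetime    # spread stopped (e.g., IT/OT link disconnected)
    recovered: datetime    # normal operations restored


def resilience_metrics(t: IncidentTimeline) -> dict[str, timedelta]:
    """Return the detect, contain (respond), and recover intervals for one incident."""
    if not (t.compromised <= t.detected <= t.contained <= t.recovered):
        raise ValueError("timeline milestones are out of order")
    return {
        "time_to_detect": t.detected - t.compromised,
        "time_to_contain": t.contained - t.detected,
        "time_to_recover": t.recovered - t.contained,
    }


# Example with made-up timestamps.
example = IncidentTimeline(
    compromised=datetime(2024, 1, 1, 0, 0),
    detected=datetime(2024, 1, 1, 6, 0),
    contained=datetime(2024, 1, 1, 7, 30),
    recovered=datetime(2024, 1, 3, 7, 30),
)
for name, interval in resilience_metrics(example).items():
    print(f"{name}: {interval}")
```

Comparing these intervals across successive drills and real incidents is one way to see whether resilience is actually improving.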

While an overall security program (like the NIST CSF, IEC 62443, or CSC18) is the proper end game for operational security, a few specific security controls deserve emphasis because they relate directly to ransomware. They are listed here, with specific ‘OT challenge’ notes where applying these practices is more difficult because of the nature of OT.

How to protect against ransomware in an OT environment

Know how an IT attack can impact OT, build clear incident response playbooks, and prioritize risks to minimize the impact on operations in an emergency.
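As a rough illustration of knowing how an IT outage propagates into OT, dependencies can be recorded as data and queried: if a given system is disconnected, what else stops? The sketch below is a plain breadth-first search over a hypothetical dependency map; every system name in `DEPENDS_ON` is invented for illustration, and a real map would be built from the asset inventory together with operations staff.

```python
from collections import deque

# Hypothetical dependency map: each system lists the systems it needs to run.
DEPENDS_ON: dict[str, set[str]] = {
    "pipeline_scheduling": {"billing", "historian"},
    "custody_transfer": {"metering_server", "billing"},
    "historian": {"ot_domain_controller"},
    "hmi_station_1": {"ot_domain_controller"},
    "billing": {"it_erp"},
    "metering_server": set(),
    "ot_domain_controller": set(),
    "it_erp": set(),
}


def impacted_by(outage: set[str], depends_on: dict[str, set[str]]) -> set[str]:
    """Return every system that directly or transitively needs a system in `outage`."""
    # Invert the map so we can walk from a failed system to the systems that need it.
    dependents: dict[str, set[str]] = {name: set() for name in depends_on}
    for system, needs in depends_on.items():
        for dep in needs:
            dependents.setdefault(dep, set()).add(system)

    impacted: set[str] = set()
    queue = deque(outage)
    while queue:  # breadth-first search over the inverted map
        current = queue.popleft()
        for user in dependents.get(current, set()):
            if user not in impacted and user not in outage:
                impacted.add(user)
                queue.append(user)
    return impacted


print(sorted(impacted_by({"it_erp"}, DEPENDS_ON)))
```

In this made-up map, disconnecting `it_erp` takes down `billing` and, through it, `pipeline_scheduling` and `custody_transfer`: the kind of indirect IT-to-OT dependency that can force an operational shutdown.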

  • Well-defined maps of potential threats and impacts: One of the biggest questions concerns the risk levels and priorities of assets and systems. Which systems are tied to which other systems, not just technically but operationally? The good news is that many industrial organizations already have disaster recovery plans. Those plans need to extend to cyber events so organizations understand what they can disconnect, what they can keep operating, and so on. This is key because attacks can spread so easily from IT to OT.
  • Risk prioritization: These exercises can identify the true crown jewels, the systems that are the linchpins of operations, all the way down to individual servers. The organization can then prioritize risk management on those systems and add extra layers of security to protect those key assets.
    • OT challenge: OT-specific policies and procedures. Most IT tools and behaviors MUST be modified to provide similar effects without disrupting OT. Striking this balance requires significant knowledge of security practices as well as operational awareness.
  • Robust backup and recovery: Expanded backup coverage (more hosts) and frequent snapshots are needed. The more hosts that are backed up frequently and securely, and assuming an adequate pipeline for restoring systems from those backups (e.g., enough network bandwidth), the faster an organization can recover from a ransomware attack. However, organizations must ensure the vulnerability is mitigated or the host is isolated when a backup is restored, or the host may be re-infected.
    • OT challenge: Legacy systems, lack of bandwidth, and the need to track multiple backup solutions/products in most OT environments make management difficult.
  • Have offline backups of critical assets: Offline backups are a critical resilience and disaster recovery strategy to ensure your most important OT assets are protected and can be readily restored if your infrastructure is down. This includes PLC logic code, configurations, documentation, and system images/files. It may sound expensive, but it is often accomplished with encrypted USB drives that are periodically rotated so that file integrity is maintained.
    • OT challenge: The complexity of OT environments, and the number and variety of source code types, locations, etc., require a holistic backup and recovery program.
  • Regularly hold “cyber fire drills” to test backups and their recovery: A regular training regime should cover both OT and cyber-related events. Forensic investigations of failed hardware, shutdowns, and similar events should include at least an initial check for a cyber cause, to confirm the event was not cyber-related and, if it was, to ensure chain of custody and due diligence. Drills also ensure your people know what to do when there is an issue, double-checking processes while improving the likelihood of a quick recovery.
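Testing backups includes confirming that the files are still intact. A minimal Python sketch, assuming backups are plain files in a directory, records a SHA-256 digest for every file when the backup is taken and compares against it before a restore. The function names are hypothetical, and the manifest should be stored separately from the backup media so that anyone who alters the backup cannot also alter the reference.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large system images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to backup_dir) to its SHA-256 digest."""
    return {
        p.relative_to(backup_dir).as_posix(): sha256_file(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_backup(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the backup matches."""
    current = build_manifest(backup_dir)
    problems = [f"missing: {name}" for name in manifest if name not in current]
    problems += [f"unexpected: {name}" for name in current if name not in manifest]
    problems += [
        f"modified: {name}"
        for name, digest in manifest.items()
        if name in current and current[name] != digest
    ]
    return problems


# Usage (paths are hypothetical):
#   manifest = build_manifest(Path("/mnt/offline_backup"))
#   ...store the manifest somewhere other than the backup drive...
#   problems = verify_backup(Path("/mnt/offline_backup"), manifest)
```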

Endpoint Management

As stated above, one of the reasons organizations use an abundance of caution and shut down their OT processes is the fundamental endpoint risks on these assets. While it may be easier to avoid this hard topic, the reality is that resilience requires more secure OT endpoints.

The first step in this effort (and in monitoring for potential threats) is knowing the endpoints. This fundamentally requires the following:

  • Asset inventory: Effective endpoint management begins with a robust asset inventory. A rich, 360-degree view of each endpoint enables proper endpoint management.
    • OT challenge: Building an automated asset inventory that covers all asset types, from OS-based hosts to networking and embedded devices, with deep asset profiles that include assigned criticality, users and accounts, presence of compensating controls, etc.
  • OT systems management: An OT asset inventory is only the beginning of a robust endpoint management program. A robust OT systems management program includes configuration hardening, user and account management, software management, etc. In many cases, OT systems are insecurely designed and unpatched, making them ripe for ransomware.
  • Patch management: Most threats enter through commodity systems such as Windows machines. A company cannot patch everything in OT, but an end-to-end patch management program (i.e., automated and intelligent application of patches) is of great importance for several reasons, including compliance, legislation, and risk management. Prioritization matters: for example, patches on hosts with RDP enabled or on firewalls connected to the Internet should take priority over a programmable logic controller (PLC) protected by several layers of defense. Where patching is unfeasible, application whitelisting and policy enforcement make an attacker’s life very difficult and improve the chances of defending against or denying a ransomware attack on an OT organization.
    • OT challenge: Companies need to have a prioritized patching process and move to compensating controls when/where necessary.
  • Removable media: USB drives, other removable media, and transient devices are another form of low-hanging fruit, especially if a network is “air-gapped” or heavily controlled, because users will bypass network controls by way of removable media. As a best practice, deploy system policies, use whitelisting software, require registered secure drives, and use other technologies such as 802.1X to ensure that only authorized systems are allowed on network segments.
    • OT challenge: Enumerating, applying, monitoring and enforcing removable media policies as well as extending to transient cyber assets.
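The prioritization idea in the patch management item above can be sketched as a simple scoring rule over asset inventory records. Everything here is an assumption for illustration: the `Asset` fields, the weights in `patch_priority`, and the split between a patch queue and a compensating-controls list are hypothetical, not a published methodology.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical inventory record; fields and weights below are illustrative."""
    name: str
    missing_patches: int
    internet_facing: bool = False
    rdp_enabled: bool = False
    criticality: int = 1            # 1 (low) to 3 (high), set with operations staff
    layers_of_protection: int = 0   # compensating controls in front of the asset
    patchable: bool = True          # False for many PLCs and legacy hosts


def patch_priority(a: Asset) -> int:
    """Higher score = patch sooner. A toy scoring rule, not a standard."""
    if a.missing_patches == 0:
        return 0
    score = a.criticality
    if a.internet_facing:
        score += 5
    if a.rdp_enabled:
        score += 3
    score -= a.layers_of_protection
    return max(score, 1)  # anything with missing patches stays in the queue


def plan(assets: list[Asset]) -> tuple[list[str], list[str]]:
    """Split assets into a patch queue and a compensating-controls list."""
    needs_work = [a for a in assets if a.missing_patches > 0]
    queue = sorted((a for a in needs_work if a.patchable),
                   key=patch_priority, reverse=True)
    compensate = [a.name for a in needs_work if not a.patchable]
    return [a.name for a in queue], compensate


inventory = [
    Asset("plc_line_3", missing_patches=4, criticality=3,
          layers_of_protection=3, patchable=False),
    Asset("remote_access_host", missing_patches=2, internet_facing=True,
          rdp_enabled=True, criticality=2, layers_of_protection=1),
    Asset("engineering_ws", missing_patches=5, rdp_enabled=True,
          criticality=3, layers_of_protection=2),
]
queue, compensate = plan(inventory)
print("patch first:", queue)
print("use compensating controls:", compensate)
```

Under these example weights, an Internet-facing host with RDP enabled scores well above a host shielded by several layers of protection, and assets marked unpatchable (such as many PLCs) go to the compensating-controls list rather than the patch queue.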

Monitor network, system and application logs for anomalies

An attack often has precursors that indicate an infection, or that a vulnerable system is under attack or about to be compromised. Spotting them gives a defensive team an advantage in preventing a wide-scale infection or attack. One way of doing this is with a “canary”: a system placed in the network that acts as the “canary in the coal mine” and raises an alert as soon as ransomware affects it, allowing a quicker response.
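A file-based variant of the canary idea can be sketched in a few lines of Python: plant decoy files, record their hashes, and periodically check whether any have been modified (for example, encrypted) or deleted. The decoy content, function names, and polling approach are assumptions for illustration; a production deployment would more likely use real-time file system notifications and send alerts to a monitored console.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Hypothetical decoy content.
CANARY_TEXT = b"Decoy file. Any change to this file triggers a security alert.\n"


def fingerprint(path: Path) -> Optional[str]:
    """SHA-256 of a file, or None if it was deleted or renamed."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        return None


def plant_canaries(paths: list[Path]) -> dict[Path, str]:
    """Write decoy files and return their baseline hashes."""
    baseline: dict[Path, str] = {}
    for p in paths:
        p.write_bytes(CANARY_TEXT)
        baseline[p] = hashlib.sha256(CANARY_TEXT).hexdigest()
    return baseline


def check_canaries(baseline: dict[Path, str]) -> list[Path]:
    """Return canaries that were modified (e.g., encrypted) or removed."""
    return [p for p, digest in baseline.items() if fingerprint(p) != digest]
```

A scheduler (for example, a task running every few seconds) would call `check_canaries` and raise an alert whenever the returned list is non-empty.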

  • OT challenge: Providing “OT context” to traditional SIEM and alerting tools.
  • Monitored external attack surfaces: Many attacks succeed because of a misconfiguration or an inadvertent hole caused by a gap in change management. It is a best practice to monitor for exposed services (e.g., with a search engine for Internet-connected devices such as Shodan).
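A minimal self-check of external exposure can be sketched with Python’s standard `socket` module: attempt TCP connections to ports that usually should not be reachable from the Internet. The port list is an assumption (RDP, SMB, and a few common industrial protocols) and should be adjusted to the site. Run it only against addresses your organization owns and is authorized to test, from outside the perimeter; active probing of fragile OT devices from inside the network can disrupt them.

```python
import socket

# Ports that commonly should not be reachable from the Internet (illustrative list).
RISKY_PORTS = {
    102: "Siemens S7 (ISO-TSAP)",
    445: "SMB",
    502: "Modbus/TCP",
    3389: "RDP",
    20000: "DNP3",
    44818: "EtherNet/IP",
}


def is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, etc.
        return False


def exposed_services(host: str, ports: dict[int, str] = RISKY_PORTS,
                     timeout: float = 2.0) -> list[tuple[int, str]]:
    """List (port, service) pairs on `host` that accept TCP connections."""
    return [(port, name) for port, name in sorted(ports.items())
            if is_open(host, port, timeout)]


# Usage: 203.0.113.10 is a documentation-range address; use your own public IPs.
#   print(exposed_services("203.0.113.10"))
```

An open port found this way is a prompt to check change management records, not proof of compromise.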

Access control and network segmentation

Stopping the spread of ransomware often comes down to placing firebreaks in its path. These can be in the form of network protections such as firewalls, other forms of segmentation or strict access control.

  • Implement network separation or segmentation: One key way to slow the spread of ransomware is to place network barriers between IT and OT networks (or even between segments within IT and/or OT). This approach is foundational but, because of its technical challenges, often underutilized.
    • OT challenge: Segmentation is not easy in IT or OT, but OT raises particular challenges due to legacy equipment, the need for physical cabling, the downtime required to move systems onto new firewalls, etc. OT segmentation requires a team with deep knowledge of both networking and the OT systems themselves.
  • Isolate systems based on software, user role, and function: To protect systems that could be compromised through remote access, local Windows networking flaws (e.g., Print Spooler or Server Message Block (SMB)/Network Basic Input/Output System (NetBIOS)), or Office/Acrobat, isolate them based on function. Ensure that unnecessary software is not included in standardized golden images and that the same Active Directory (AD) server is not serving policy for both IT and OT. This also applies to user accounts: if a system is a human-machine interface (HMI), treat its user as an operator, not as an administrator.
    • OT challenge: Finding, profiling, and securing these types of controls, and being able to correct and enforce baselines.
  • Technical diversity between zones or systems: Consistency across systems has scaling advantages, but when a single vulnerability affects multiple products, one exploit can ground entire operations. Barriers such as a virtual private network (VPN) with two-factor authentication (2FA), a remote access terminal server, and firewalls from multiple vendors greatly increase the effort an external attacker needs to succeed.
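The firebreak idea can be checked against actual traffic with a default-deny zone model, loosely in the spirit of the zones and conduits in IEC 62443. In the Python sketch below, the zone names, address ranges, and permitted conduits are all hypothetical; the only flows allowed between zones pass through an OT DMZ, and anything from an unknown address is denied.

```python
import ipaddress
from typing import Optional

# Hypothetical zone layout; addresses are examples only.
ZONES = {
    "enterprise_it": ipaddress.ip_network("10.10.0.0/16"),
    "ot_dmz": ipaddress.ip_network("10.50.0.0/24"),
    "control": ipaddress.ip_network("10.60.0.0/24"),
}

# (source zone, destination zone, TCP port) flows that are explicitly permitted.
ALLOWED_CONDUITS = {
    ("control", "ot_dmz", 443),        # control-side historian pushes to a DMZ replica
    ("enterprise_it", "ot_dmz", 443),  # IT users read the DMZ replica
}


def zone_of(ip: str) -> Optional[str]:
    """Return the zone containing `ip`, or None if it is not in any zone."""
    addr = ipaddress.ip_address(ip)
    for name, net in ZONES.items():
        if addr in net:
            return name
    return None


def flow_allowed(src_ip: str, dst_ip: str, port: int) -> bool:
    """Default deny: permit only intra-zone traffic or an explicit conduit."""
    src, dst = zone_of(src_ip), zone_of(dst_ip)
    if src is None or dst is None:
        return False  # unknown addresses are denied
    if src == dst:
        return True   # intra-zone traffic is outside the scope of this check
    return (src, dst, port) in ALLOWED_CONDUITS


print(flow_allowed("10.10.4.20", "10.50.0.5", 443))  # IT to DMZ on 443
print(flow_allowed("10.10.4.20", "10.60.0.7", 502))  # IT straight to control
```

Feeding firewall or flow logs through a check like `flow_allowed` is one way to spot traffic that crosses a firebreak it should not.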

Improving these five categories reduces the risk and impact of a ransomware attack, leverages existing technology investments, and improves recovery in the event of a compromise. Each adds successive protections and safeguards against a possible ransomware attack.

Conclusion and success stories

OT-specific challenges are identified in this document not to show that a robust OT security program is unattainable or improbable, but rather to help the reader identify key decision points that will help a program achieve maximum protection with minimal challenges.

The application of ‘IT-like’ security controls in OT is increasingly being achieved in numerous industries, companies, and countries around the world. But the true measure of success is in maintaining and monitoring those initial efforts. The companies that are significantly improving their security posture acknowledge the unique challenges of an OT environment and make decisions such as:

  1. Building robust, 360-degree asset views
  2. Incorporating multiple functions into a single platform
  3. Tying together IT and OT skill sets at an enterprise level to review, monitor, plan and execute systemic security controls
  4. Automating data collection and remediation tasks
  5. Partnering with proven OT safe software and services vendors/consultants.

This story originally appeared on Verve’s website. Verve is a content partner of CFE Media.

Original content can be found at Control Engineering.


Author Bio: Verve Industrial