Achieve fault tolerance with a real-time software design

Data Distribution Service (DDS) specification from Object Management Group (OMG) is a data-centric publish/subscribe (DCPS) messaging standard for integrating distributed real-time applications. Here’s how process replication can increase a system’s fault tolerance.

04/29/2013


The most basic approach is to run multiple, identical Manager applications. Courtesy: Real-Time InnovationsData Distribution Service (DDS) specification from Object Management Group (OMG) is often used to build mission critical systems consisting of multiple components executing simultaneously and collaborating to achieve a certain task. In such systems, there are usually several components that are critically important for the overall functioning. Those components require extra attention when designing the system to ensure that the likelihood of them failing is minimized. Its focus on reliability, availability and efficiency make the DDS infrastructure particularly suitable to connect the plant floor with other areas of the enterprise.

DDS provides advanced features that help achieve such goals. While there are many methods, fault tolerance for failing publishing applications or machines can be achieved by means of active process replication.

Guiding example

For the purpose of illustrating the theory here with an example, the use case of the simple distributed finite state machine (FSM) is described. The FSM consists of a few components: a Master Participant, a Worker Participant, and a Manager. The Observer component, not essential for the functionality of the pattern, is passively observing only. This example focuses on the interaction between the Manager and the Observer.

Let’s assume that the Manager is the most critical process in the pattern, which is likely to be the case, especially if the pattern is expanded to support multiple Masters and Workers. The most obvious way to make the Manager component more robust is by running multiple processes simultaneously, all with the same responsibility. With that approach, the Manager component is defined as a conceptual item that may consist of multiple, simultaneously running application processes. For the sake of simplicity, the collection of those processes is referred to as the Manager component. Note that those could be executing in a distributed fashion on multiple machines.

Active process replication

Simultaneously running multiple Manager processes with the same purpose will decrease the likelihood that the Manager component will break down completely. DDS allows for adding and removing participants on the fly without requiring configuration changes, so the basic concept of process replication is supported. Still, there are multiple options from which to choose.

Plain process replication

To avoid disadvantages, another option is to make Manager processes aware of each other at the application level. Courtesy: Real-Time InnovationsThe most basic approach is to run multiple, identical Manager applications. Each acts the same, receiving the same transition requests and publishing the same state updates in response. As a consequence, applications observing the state machine will see multiple instances of that state machine simultaneously. This approach requires a minor adjustment to the data-model to allow multiple task instances to exist side-by-side. This is achieved by adding a key attribute that identifies the originating Manager application.

An advantage to this approach is its low complexity; just replicating everything is an approach everybody understands. There is no impact on the Manager application code, and all participants remain completely decoupled from each other. Additionally, the mechanism does not rely on any built-in replication mechanism from the middleware. This could be considered an advantage as well, because the developer now has every aspect of the replication process under control.

On the other hand, that latter argument could be seen as a disadvantage. Requiring all Observers of the state machine to be able to deal with multiple, identical versions simultaneously does have an impact on the application code and introduces an extra burden on the application developer. Also, the number of instances an instance state updates scales linearly with the number of Manager processes. For this particular state machine example that is not that big of a deal, but in the general case, it could have a significant impact on network bandwidth consumption and memory usage.


<< First < Previous 1 2 Next > Last >>

No comments
The Top Plant program honors outstanding manufacturing facilities in North America. View the 2015 Top Plant.
The Product of the Year program recognizes products newly released in the manufacturing industries.
The Engineering Leaders Under 40 program identifies and gives recognition to young engineers who...
Your leaks start here: Take a disciplined approach with your hydraulic system; U.S. presence at Hannover Messe a rousing success
Hannover Messe 2016: Taking hold of the future - Partner Country status spotlights U.S. manufacturing; Honoring manufacturing excellence: The 2015 Product of the Year Winners
Inside IIoT: How technology, strategy can improve your operation; Dry media or web scrubber?; Six steps to design a PM program
Getting to the bottom of subsea repairs: Older pipelines need more attention, and operators need a repair strategy; OTC preview; Offshore production difficult - and crucial
Digital oilfields: Integrated HMI/SCADA systems enable smarter data acquisition; Real-world impact of simulation; Electric actuator technology prospers in production fields
Special report: U.S. natural gas; LNG transport technologies evolve to meet market demand; Understanding new methane regulations; Predictive maintenance for gas pipeline compressors
Warehouse winter comfort: The HTHV solution; Cooling with natural gas; Plastics industry booming
Managing automation upgrades, retrofits; Making technical, business sense; Ensuring network cyber security
Designing generator systems; Using online commissioning tools; Selective coordination best practices

Annual Salary Survey

Before the calendar turned, 2016 already had the makings of a pivotal year for manufacturing, and for the world.

There were the big events for the year, including the United States as Partner Country at Hannover Messe in April and the 2016 International Manufacturing Technology Show in Chicago in September. There's also the matter of the U.S. presidential elections in November, which promise to shape policy in manufacturing for years to come.

But the year started with global economic turmoil, as a slowdown in Chinese manufacturing triggered a worldwide stock hiccup that sent values plummeting. The continued plunge in world oil prices has resulted in a slowdown in exploration and, by extension, the manufacture of exploration equipment.

Read more: 2015 Salary Survey

Maintenance and reliability tips and best practices from the maintenance and reliability coaches at Allied Reliability Group.
The One Voice for Manufacturing blog reports on federal public policy issues impacting the manufacturing sector. One Voice is a joint effort by the National Tooling and Machining...
The Society for Maintenance and Reliability Professionals an organization devoted...
Join this ongoing discussion of machine guarding topics, including solutions assessments, regulatory compliance, gap analysis...
IMS Research, recently acquired by IHS Inc., is a leading independent supplier of market research and consultancy to the global electronics industry.
Maintenance is not optional in manufacturing. It’s a profit center, driving productivity and uptime while reducing overall repair costs.
The Lachance on CMMS blog is about current maintenance topics. Blogger Paul Lachance is president and chief technology officer for Smartware Group.
This article collection contains several articles on the vital role that compressed air plays in manufacturing plants.
This article collection contains several articles on the Industrial Internet of Things (IIoT) and how it is transforming manufacturing.
This article collection contains several articles on strategic maintenance and understanding all the parts of your plant.
click me