How to build scalable data models with MQTT Sparkplug

A key to bridging the OT/IT gap is enabling successful data modeling, which is how organizations define and organize their business processes.

By Arlen Nipper May 20, 2021
Courtesy: Cirrus Link

Some companies undergoing digital transformation expect a straight, simple line from operational technology (OT) data to enterprise applications (see Figure 1). They hope to collect the data, add some information technology (IT)/Cloud tooling and achieve a simple internet of things (IoT) solution. In reality, OT data comes from myriad sources with various data types, and it takes complex IT/Cloud tooling to make sense of it all. OT data needs differ greatly from IT data needs, and companies need a way to satisfy both sides to successfully embrace IoT and digital transformation.

OT data is delivered over proprietary protocols in multiple data formats, varies across market segments and carries no contextual information. It is designed for operations, retrieved with a poll/response methodology and coupled directly to applications over isolated networks.

IT needs data that supports data objects and modeling, arrives in standard formats with contextual information, and is secure and easy to integrate. The data should be decoupled from its sources and made available to the enterprise, and it is best retrieved with a publish/subscribe methodology.

Figure 1: Digital transformation looks simple, but OT data is very complex. Courtesy: Cirrus Link

A key to bridging the OT/IT gap is enabling successful data modeling. Data modeling is how organizations define and organize their business processes to unlock the value of their data. Unless they put the data into the Cloud in a unified format (the model), they cannot do anything useful with it. As I like to say, “garbage in will be garbage out.” Data models give everyone in the organization a single source of truth, with no garbage, so they can understand and use the data more effectively. Successful data modeling can lead to business improvements ranging from reduced cycle time to fewer errors to improved collaboration.

However, there is a challenge in the IoT solutions market today: how to connect OT data to IT systems for data modeling and data integration.

Companies have done it, but it requires a great deal of custom work, code and a spider web of technologies. Many customers collect their data and get it to the Cloud, only to end up with a huge number of process variables sitting in a data lake somewhere. They haven’t really solved the data model problem; they have just moved it down the road. System integrators often oversimplify the process, telling customers they will come in and write some code, but the most common result is a solution that will not scale. No matter how much code they write, without a data model the solution is not scalable.

Message queuing telemetry transport (MQTT), an open-standard, publish/subscribe network protocol, combined with the Eclipse Sparkplug specification, provides a much simpler answer.

Figure 2: Using MQTT Sparkplug to connect OT data as a single source of truth. Courtesy: Cirrus Link

An OT-centric data model

MQTT Sparkplug has been touted as an excellent IoT protocol because it is lightweight, publish/subscribe, simple, efficient, secure and open, with no vendor lock-in. MQTT is message-oriented middleware: a client connects to a broker and then publishes information. Because the data is decoupled, one edge device can publish a metric and 100 (or more) applications can subscribe to it. The benefits are well documented. The focus here, however, is on one benefit of the Sparkplug B specification: it defines an OT-centric data model/asset.
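
To make the decoupling concrete, here is a minimal in-memory sketch in Python. It is not an MQTT client or broker: `InMemoryBroker` and `topic_matches` are hypothetical names, the topic `plant/line1/temperature` is illustrative, and the matching follows a simplified form of MQTT's topic-filter rules (`+` matches exactly one level, `#` matches the current level and everything below it). The point it shows is that the publisher only knows a topic, never who is subscribed.

```python
from typing import Callable, List, Tuple

Callback = Callable[[str, bytes], None]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT topic-filter matching ('$'-prefixed topics are not special-cased)."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # multi-level wildcard: matches this level and all below
        if i >= len(t_levels):
            return False  # filter has more levels than the topic
        if level != "+" and level != t_levels[i]:
            return False  # literal mismatch ('+' matches any single level)
    return len(f_levels) == len(t_levels)


class InMemoryBroker:
    """Toy stand-in for an MQTT broker: publishers and subscribers never reference each other."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Callback]] = []

    def subscribe(self, topic_filter: str, callback: Callback) -> None:
        self._subscriptions.append((topic_filter, callback))

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver the payload to every matching subscriber and return the delivery count."""
        delivered = 0
        for topic_filter, callback in self._subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)
                delivered += 1
        return delivered


# One edge device publishes once; 100 independent applications each receive it.
broker = InMemoryBroker()
received: List[str] = []
for app_id in range(100):
    broker.subscribe("plant/+/temperature", lambda t, p, app_id=app_id: received.append(f"{app_id}:{t}"))
count = broker.publish("plant/line1/temperature", b"72.5")
print(count)  # 100
```

Adding a 101st application is one more `subscribe` call; the publishing device does not change.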

Sparkplug is a new specification within the Eclipse Tahu project that defines how to use MQTT in a mission-critical, real-time environment. Sparkplug defines a standard MQTT topic namespace, payload and session state management for industrial applications while meeting the requirements of real-time supervisory control and data acquisition (SCADA) implementations. The Sparkplug B specification provides the data model needed to define a tag value for OT use while also delivering that data to IT in a form that is 100% self-discoverable and easy to consume.
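
As a rough illustration of the standard topic namespace: Sparkplug B topics follow the pattern `spBv1.0/<group_id>/<message_type>/<edge_node_id>[/<device_id>]`, where node-level messages (NBIRTH, NDEATH, NDATA, NCMD) omit the device ID and device-level messages (DBIRTH, DDEATH, DDATA, DCMD) require it. The sketch below builds and parses such topics; the function names and the group, node and device IDs are hypothetical, and the host-application STATE topic is deliberately left out.

```python
from typing import NamedTuple, Optional

NAMESPACE = "spBv1.0"
NODE_TYPES = {"NBIRTH", "NDEATH", "NDATA", "NCMD"}
DEVICE_TYPES = {"DBIRTH", "DDEATH", "DDATA", "DCMD"}


class SparkplugTopic(NamedTuple):
    group_id: str
    message_type: str
    edge_node_id: str
    device_id: Optional[str] = None


def validate(t: SparkplugTopic) -> None:
    """Check the message type against the presence or absence of a device ID."""
    if t.message_type in DEVICE_TYPES:
        if not t.device_id:
            raise ValueError(f"{t.message_type} requires a device_id")
    elif t.message_type in NODE_TYPES:
        if t.device_id:
            raise ValueError(f"{t.message_type} must not carry a device_id")
    else:
        raise ValueError(f"unsupported message type: {t.message_type}")


def build_topic(t: SparkplugTopic) -> str:
    validate(t)
    parts = [NAMESPACE, t.group_id, t.message_type, t.edge_node_id]
    if t.device_id:
        parts.append(t.device_id)
    return "/".join(parts)


def parse_topic(topic: str) -> SparkplugTopic:
    parts = topic.split("/")
    if parts[0] != NAMESPACE or len(parts) not in (4, 5):
        raise ValueError(f"not a Sparkplug B node/device topic: {topic}")
    device_id = parts[4] if len(parts) == 5 else None
    t = SparkplugTopic(parts[1], parts[2], parts[3], device_id)
    validate(t)
    return t


topic = build_topic(SparkplugTopic("Windfarm", "DDATA", "Gateway1", "Turbine01"))
print(topic)  # spBv1.0/Windfarm/DDATA/Gateway1/Turbine01
```

Because the message type and the node and device identities are all encoded in the topic, a consumer can subscribe to `spBv1.0/Windfarm/#` and learn about every node and device in that group without prior configuration.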

MQTT Sparkplug establishes a single source of truth for models, assets and tags at the edge, taking OT data from various sources and protocols and defining it for IT (see Figure 2). When customers begin designing an IoT system, the data model should sit as close to the edge as possible. Ideally, it should reside in the device itself to establish that reliable, single source of truth.

Tags are the only piece of this puzzle typically addressed by IoT platforms and solutions, but MQTT Sparkplug goes beyond tags to create a single source of truth for models and assets as well. It does this without custom code, scripts in Python or Java, or any other complex, homegrown tooling, which rarely scales or works long term.

When OT data is collected and its models, assets and tags are converted to MQTT Sparkplug, the data can be sent to Cloud and enterprise applications, which auto-create the data models without any programming. OT data is converted to IT data and then exposed through a standard interface for Big Data, which leads to scalable data insights and business improvements.
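
The auto-creation step can be sketched from the consumer's side. In Sparkplug B, a device announces itself with a DBIRTH message that lists each metric's name, datatype and, optionally, a numeric alias; later DDATA messages may carry only the alias to save bandwidth. The registry below assumes payloads have already been decoded from Sparkplug B's Protobuf encoding into plain Python dicts; `ModelRegistry`, its methods and the device key format are hypothetical.

```python
from typing import Any, Dict, List

Metric = Dict[str, Any]  # decoded metric, e.g. {"name": ..., "alias": ..., "datatype": ..., "value": ...}


class ModelRegistry:
    """Builds a model per device from BIRTH metrics, then applies DATA updates to it."""

    def __init__(self) -> None:
        self.models: Dict[str, Dict[str, Dict[str, Any]]] = {}
        self._aliases: Dict[str, Dict[int, str]] = {}

    def on_birth(self, device_key: str, metrics: List[Metric]) -> None:
        # A BIRTH replaces any previous model: it is the device's full self-description.
        model: Dict[str, Dict[str, Any]] = {}
        aliases: Dict[int, str] = {}
        for m in metrics:
            model[m["name"]] = {"datatype": m["datatype"], "value": m.get("value")}
            if "alias" in m:
                aliases[m["alias"]] = m["name"]
        self.models[device_key] = model
        self._aliases[device_key] = aliases

    def on_data(self, device_key: str, metrics: List[Metric]) -> None:
        if device_key not in self.models:
            # A real host application would typically request a rebirth here.
            raise LookupError(f"DATA for {device_key} arrived before its BIRTH")
        model = self.models[device_key]
        for m in metrics:
            # Resolve alias-only metrics back to the names announced in the BIRTH.
            name = m["name"] if "name" in m else self._aliases[device_key][m["alias"]]
            model[name]["value"] = m["value"]


registry = ModelRegistry()
key = "Windfarm/Gateway1/Turbine01"
registry.on_birth(key, [
    {"name": "windspeed", "alias": 1, "datatype": "Float", "value": 11.0},
    {"name": "rpm", "alias": 2, "datatype": "Float", "value": 14.2},
])
registry.on_data(key, [{"alias": 1, "value": 12.5}])  # alias-only update
print(registry.models[key]["windspeed"])  # {'datatype': 'Float', 'value': 12.5}
```

No per-device code is written here: the structure of each model comes entirely from what the device announced in its birth message.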

Windfarm example

Cirrus Link built a sample use case for the MQTT Sparkplug data modeling capabilities at a windfarm. We connected a wind turbine, added attributes and process variables with MQTT Sparkplug, then created the model in AWS IoT SiteWise. The benefit of this solution is that companies can start where the expertise is, at the edge with the wind turbine, and then create the model to be consumed by any third-party or Cloud application. MQTT Sparkplug provides the technology to create a model that says, “This is a wind turbine, at this location, with these process variables: windspeed, RPM and direction.” MQTT Sparkplug then carries that model all the way from the edge to the Cloud as a single source of truth.
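
The wind turbine model can be approximated with Sparkplug B's Template concept, in which a template definition describes a generic asset type and each instance references that definition by name. The Python classes below are a simplified, hypothetical mirror of that structure (the real payload is Protobuf, not dataclasses); `location` is modeled as a template parameter and `windspeed`, `rpm` and `direction` as process-variable metrics, with illustrative names and values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple, Type

# Python types accepted for the Sparkplug B datatype names used here (illustrative subset).
ACCEPTED: Dict[str, Tuple[Type, ...]] = {"Float": (int, float), "String": (str,)}


@dataclass
class MetricDef:
    name: str
    datatype: str


@dataclass
class TemplateDefinition:
    """Generic model, e.g. 'WindTurbine' (a Template with is_definition=true in Sparkplug B)."""
    name: str
    parameters: Dict[str, str]  # parameter name -> datatype
    metrics: List[MetricDef]


@dataclass
class TemplateInstance:
    """Concrete asset; template_ref names the definition it was built from."""
    name: str
    template_ref: str
    parameters: Dict[str, Any]
    values: Dict[str, Optional[Any]] = field(default_factory=dict)


def instantiate(defn: TemplateDefinition, name: str, **params: Any) -> TemplateInstance:
    """Create an asset from the generic model; every metric starts unpopulated."""
    missing = set(defn.parameters) - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return TemplateInstance(name, defn.name, dict(params), {m.name: None for m in defn.metrics})


def populate(inst: TemplateInstance, defn: TemplateDefinition, metric: str, value: Any) -> None:
    """Set a tag value, checking it against the definition's metric name and datatype."""
    if inst.template_ref != defn.name:
        raise ValueError("instance was built from a different definition")
    datatypes = {m.name: m.datatype for m in defn.metrics}
    if metric not in datatypes:
        raise KeyError(metric)
    if isinstance(value, bool) or not isinstance(value, ACCEPTED[datatypes[metric]]):
        raise TypeError(f"{metric} expects {datatypes[metric]}")
    inst.values[metric] = value


wind_turbine = TemplateDefinition(
    "WindTurbine",
    parameters={"location": "String"},
    metrics=[MetricDef("windspeed", "Float"), MetricDef("rpm", "Float"), MetricDef("direction", "Float")],
)
turbine = instantiate(wind_turbine, "Turbine01", location="North ridge")
populate(turbine, wind_turbine, "windspeed", 12.5)
print(turbine.values)  # {'windspeed': 12.5, 'rpm': None, 'direction': None}
```

Because every asset is built from the same definition, adding another turbine is one more `instantiate` call rather than new integration code, which is what lets the model/asset/tag split be replicated at scale.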

Now, any IoT platform, solution or application can be either a consumer or a provider of data models. There is no other technology besides MQTT Sparkplug that allows companies to build a generic data model, then an asset, and then populate the asset. Without coding? Unheard of. OPC UA data models compete on some level, but you can’t create those yourself. Plus, the true beauty of the solution is that the proper model/asset/tag definition enabled by MQTT Sparkplug allows it to be replicated at scale. The unique capabilities built into MQTT Sparkplug to define the data model and asset are proving to be an important differentiator in the IoT marketplace.

Author Bio: Arlen Nipper is president and CTO of Cirrus Link, where he brings more than 40 years of experience in the SCADA industry. He was one of the early architects of pervasive computing and the Internet of Things and co-invented MQTT, a publish/subscribe network protocol that has become the dominant messaging standard in IoT. Arlen holds a bachelor’s degree in Electrical and Electronics Engineering (BSEE) from Oklahoma State University.