
Bringing data modeling into focus

March 30, 2021

By Tony Paine, CEO of HighByte 

By now, most manufacturers understand the value of Industry 4.0 data, but many struggle to standardize information coming from disparate sources. As organizations connect more devices, they need a consistent way to deliver information to ensure interoperability and data integrity.

This is where data modeling comes into play.

The data model helps standardize raw input data and adds contextual information, such as a source description, unit of measure, and min/max ranges. Pulled together, these elements define a real “thing” such as an asset, process, system, or role.

Understanding the basic attributes and function of a data model helps manufacturers share and analyze information more effectively as they scale.

What is data modeling?

Data models are familiar to industrial users, but they take on different definitions based on a person’s experience or job function. For instance, a controls engineer might refer to the data model as a “user-defined tag.” Others associate data models with the library of built-in complex tags provided by the PLC vendor.
 
There are also standardized data models and data sets, such as ISA-95 and the OPC UA companion specifications, some of which target specific industries. These models define nomenclature, how data should be represented, and the model’s structure (how it is laid out).
 
But many data models are vendor-specific (sometimes even device-specific). In addition, IT systems and cloud applications were designed to model, receive, and store information in a specific format. Across OT and IT devices, systems and applications, there is a lot of data-model diversity with little standardization in real-world practice.

Why is data modeling important?

Data modeling is important because models standardize information, enable interoperability, show intent, determine trust, and ensure proper data governance.
 
To expand on these ideas: data modeling standardizes how data is categorized and pulled together for additional meaning. Modeling enables interoperability when information is shared across various applications or between people with different knowledge of, and uses for, the data. Users in different roles need to be able to look at data and quickly understand its source, its structure, and what the model represents (like a pump or a production line).
 
Data modeling also shows intent: what a value is, what it should be, whether it is within an acceptable range, and whether it can be trusted. Effective data modeling requires a separate abstraction layer dedicated to merging, modeling, and securely sharing data for proper data governance across the enterprise.

What does a data model look like?

A data model is not, and should not be, complicated. At its most basic, a data model is one or more name-value pairs. Data models are created as logical collections of these name-value pairs that are related in some way and, when put together, become a valuable and useful information object.
 
For example, you might create a data model that represents a thermostat. The first attribute is the current value, the second is the set-point value, and the third is the unit. The model clearly articulates how a thermostat should be represented for the enterprise. In this example, every thermostat will have a name, a current value (a floating-point number), a set point, and a unit of measure (a static character indicating degrees Fahrenheit or degrees Celsius).
 
A thermostat is obviously a simple thing to model, but the same concept applies to even the most complex process or piece of equipment you may want to model. Distill the model down to its most primitive data points, which together carry greater meaning, and include any contextual attributes that describe what the data points are and what they should be, so the information becomes self-describing.
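To make this concrete, below is a minimal sketch in Python of how the thermostat model described above might be captured as name-value pairs with contextual attributes. The class name, description, and min/max limits are illustrative assumptions, not a specific product schema.

from dataclasses import dataclass, asdict

@dataclass
class ThermostatModel:
    # Name-value pairs plus context that make each instance self-describing
    name: str                              # which thermostat this instance represents
    current_value: float                   # measured temperature (floating point)
    set_point: float                       # desired temperature
    unit: str                              # "F" or "C"
    description: str = "Zone thermostat"   # contextual attribute (assumed)
    min_value: float = 0.0                 # acceptable range, useful for trust checks
    max_value: float = 120.0

# One instance of the model, ready to share as an information object
lobby = ThermostatModel(name="Lobby", current_value=71.5, set_point=70.0, unit="F")
print(asdict(lobby))

Every instance carries the same structure and context, so any consumer that understands the model can interpret any thermostat in the plant.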

Where should I model my data?

To achieve a data infrastructure that can really scale, you need a dedicated abstraction layer. Let’s examine the reasons why.

A single environment

A dedicated abstraction, or DataOps, layer is essential because not every application conforms to one standard. Orchestrating integrations within a dedicated layer makes data modeling generic, so a user can work in a single environment to model any number of things. That layer then becomes responsible for transforming data into the specific data-modeling schemas of all consuming applications. This is game-changing for users who need to collect, merge, transform, and share information with many applications on-premises and in the cloud, because they can make changes or scale without breaking existing integrations.
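As a rough sketch of that transformation responsibility, the two functions below map one generic model instance onto two hypothetical target schemas, one shaped like flat historian tags and one shaped like a cloud JSON payload. The field names in each target are assumptions for illustration, not any vendor’s actual schema.

from datetime import datetime, timezone

def to_historian_schema(model: dict) -> dict:
    # Flatten the generic model into hypothetical tag-style names
    return {
        f"{model['name']}.PV": model["current_value"],
        f"{model['name']}.SP": model["set_point"],
        f"{model['name']}.Units": model["unit"],
    }

def to_cloud_schema(model: dict) -> dict:
    # Reshape the same model into a hypothetical cloud payload
    return {
        "assetId": model["name"],
        "temperature": {"value": model["current_value"], "unit": model["unit"]},
        "setPoint": model["set_point"],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

generic = {"name": "Lobby", "current_value": 71.5, "set_point": 70.0, "unit": "F"}
print(to_historian_schema(generic))
print(to_cloud_schema(generic))

The model is defined once; adding or changing a consumer only means adding or changing one mapping in the layer.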

Visibility

Oftentimes, automation engineers know they have hardware and software on the plant floor producing and collecting raw data, but they don’t know who is connecting to those systems or what data is being shared. DataOps provides visibility into who is producing and consuming the raw data and how changes will impact the rest of the enterprise.

One-step data prep

A centralized location also reduces data-preparation redundancy and decreases system-integration time. Information can be automatically propagated to any vendor’s application without touching each application individually.
 
Passive connectivity

A DataOps layer provides passive connectivity, meaning users won’t need to schedule downtime or rewire integrations to establish communication with the solution. An industrial DataOps solution can passively drop in, make connections to existing data sources, pull data, transform it, add context to it, and then push out real-time modeled information to running applications using their respective APIs. 
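A simplified sketch of that read, transform, and publish loop follows. The read_source function and endpoint URL are placeholders for whatever source connections and application APIs already exist in a given plant; the point is that nothing on the source side has to be rewired.

import time
import requests  # assumes the consuming application exposes an HTTP API

def read_source() -> dict:
    # Placeholder for an existing connection (e.g., a polled device driver)
    return {"name": "Lobby", "current_value": 71.5, "set_point": 70.0, "unit": "F"}

def to_payload(raw: dict) -> dict:
    # Add structure and context so the shared information is self-describing
    return {
        "assetId": raw["name"],
        "temperature": {"value": raw["current_value"], "unit": raw["unit"]},
        "setPoint": raw["set_point"],
    }

while True:
    modeled = to_payload(read_source())   # pull and transform; read-only on the source
    requests.post("https://example.invalid/api/thermostats",  # hypothetical endpoint
                  json=modeled, timeout=5)
    time.sleep(10)                        # poll on a schedule; no downtime or rewiring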
 
Data quality and accuracy

The DataOps layer transforms raw data before making it available to all consuming applications so there is less chance of errors being made upstream. It provides a central location to manage conversions and transformations. If there is an error, it is detected quickly and easily fixed without troubleshooting each application or mining custom code.
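For example, a unit conversion and a range check can live in one place in the layer rather than being re-implemented in every consuming application. The conversion below is standard arithmetic, and the limits are illustrative assumptions.

def normalize_temperature(value: float, unit: str) -> float:
    # Convert raw readings to one agreed-upon unit (Fahrenheit here)
    if unit == "C":
        return value * 9 / 5 + 32
    if unit == "F":
        return value
    raise ValueError(f"Unknown unit: {unit}")

def validate(value: float, low: float = 0.0, high: float = 120.0) -> float:
    # Flag out-of-range values centrally instead of in each application
    if not low <= value <= high:
        raise ValueError(f"Value {value} outside expected range {low}-{high}")
    return value

# Every consumer receives the same converted, checked value
reading_f = validate(normalize_temperature(22.5, "C"))   # 72.5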
  
Security

The modeling environment enables only authorized individuals to determine which applications should receive data and exactly what data they should receive. Consuming applications no longer have unfettered access to raw data sources; the DataOps solution abstracts away that direct connection. And rather than being buried in custom code, integrations are visible to authorized users, which helps protect potentially critical infrastructure.
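One way to picture this is a per-application allow list maintained in the layer, so each consumer receives only the attributes an authorized user has approved. The application names and attribute sets below are hypothetical.

# Hypothetical governance table: which attributes each consumer may receive
ALLOWED = {
    "maintenance_dashboard": {"name", "current_value", "unit"},
    "cloud_analytics": {"name", "current_value", "set_point", "unit"},
}

def filter_for(consumer: str, model: dict) -> dict:
    # Unknown consumers get nothing; known consumers get only approved fields
    allowed = ALLOWED.get(consumer, set())
    return {key: value for key, value in model.items() if key in allowed}

model = {"name": "Lobby", "current_value": 71.5, "set_point": 70.0, "unit": "F"}
print(filter_for("maintenance_dashboard", model))   # set point is not exposed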

Data modeling doesn’t need to be complex. Start small. The key point to remember is that effective models distill data sets down to their simplest form so they can easily be reused, helping you achieve standardization at scale to fully leverage the benefits of Industry 4.0.