Modern methods to unravel data complexity

Emerson's Stuart Harris

Data lake. Data fabric. Data brokerage. Data plumbing.

Whatever you want to call it, those of us in the industrial world acutely understand the challenge of extracting and integrating the various fragmented sets of data that industrial facilities generate. We hear from customers that data-access issues have delayed and even stopped projects from moving forward, despite promising business cases.

Thankfully, technology exists today to simplify data management. But it is more difficult than simply transporting all the operational technology (OT) data to the enterprise cloud and turning it into gold with advanced analytics. You have to rely on data-management technology that can connect to and collect relevant data from disparate sources across the enterprise and then contextualize that data so that it can be translated into the actionable information needed to make good business decisions. And these complex tasks must be carried out without taking down networks, adding security vulnerabilities or disrupting operations.

Many companies have found the right balance to succeed with this approach, but others are still feeling their way through. While every company is unique, there are some common data complexity trends that vex industrial manufacturers:

Diverse data sources. It’s hard to overstate how diverse the data sets are in typical manufacturing facilities: potentially hundreds of dedicated interfaces developed to different standards by a multitude of vendors, carrying data generated by equipment and systems that use different protocols. From structured time-series data (temperature, pressure, flow and alarms) to unstructured data (videos, images, maintenance manuals and operations logs), OT data is highly complex and varied.

Connectivity. All the diverse data sources also have their own connectivity issues. Besides connecting to data from physical assets, manufacturers also need a broad library of connectors that integrate with other data sources like control systems, other software platforms or even interfaces for non-production information that is sometimes as obscure as weather data. If you can’t get all the data to where it needs to go, it just remains siloed—and, ultimately, unused.

Legacy systems. Traditional process historians were developed decades ago using a broad range of protocols, some of them industry-specific, custom-built or outdated. They were not designed for today’s world of big data, and connecting the data from these legacy systems to external computing platforms is difficult. Furthermore, if you transport large historical data sets or stream large volumes of real-time data without managing the communications at the edge, you run the risk of disrupting production systems or networks, a serious consequence.

Data models. Collecting and connecting diverse data is the first step, but being able to contextualize the data and understand the relationships within it is critical. For example, in a manufacturing plant, a reliability engineer and a production accountant may access some of the same data in their job functions, but they will use it in very different ways.
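As a rough illustration of that point, the sketch below attaches context from a small asset model to raw readings and then derives two role-specific views from the same records. The tag names, the asset model and both view functions are hypothetical, chosen only to show the idea; they do not describe any particular product.

```python
from collections import defaultdict

# Hypothetical asset model: maps raw tag names to the asset and unit they belong to.
ASSET_MODEL = {
    "P101.VIB": {"asset": "Pump-101", "unit": "Unit-A", "kind": "vibration"},
    "P101.FLOW": {"asset": "Pump-101", "unit": "Unit-A", "kind": "flow"},
    "P202.VIB": {"asset": "Pump-202", "unit": "Unit-B", "kind": "vibration"},
    "P202.FLOW": {"asset": "Pump-202", "unit": "Unit-B", "kind": "flow"},
}


def contextualize(readings):
    """Attach asset-model context to raw (tag, value) readings."""
    return [{"tag": tag, "value": value, **ASSET_MODEL[tag]} for tag, value in readings]


def reliability_view(records):
    """Reliability engineer's view: peak vibration per asset."""
    peaks = {}
    for r in records:
        if r["kind"] == "vibration":
            peaks[r["asset"]] = max(peaks.get(r["asset"], r["value"]), r["value"])
    return peaks


def accounting_view(records):
    """Production accountant's view: total flow per unit."""
    totals = defaultdict(float)
    for r in records:
        if r["kind"] == "flow":
            totals[r["unit"]] += r["value"]
    return dict(totals)


# Illustrative readings only; not real plant data.
readings = [
    ("P101.VIB", 2.1), ("P101.VIB", 4.7), ("P101.FLOW", 120.0),
    ("P202.VIB", 1.3), ("P202.FLOW", 80.0), ("P101.FLOW", 130.0),
]
records = contextualize(readings)
```

Both views read the same contextualized records; only the grouping and the question asked of the data differ.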

Networks and firewalls. OT systems and process-data historians reside on the plant-level network, with one or more firewalls between that network and the enterprise cloud. Like legacy systems, plant-level networks were not designed to move massive amounts of data to the cloud. The way network loading is balanced during the transport of large volumes of data is critical in successfully landing this data in cloud platforms. Network firewalls, as well as OT and IT security policies, are additional barriers to collecting and organizing vast amounts of operational data.
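One simple way to picture balanced network loading is to pace outgoing batches against a bandwidth budget. The sketch below is a minimal example under that assumption; the function name and the budget figure are hypothetical, and a real deployment would also have to account for retries, priorities and competing traffic.

```python
def pace_batches(batch_sizes, max_bytes_per_sec):
    """Return a start time in seconds for each batch.

    Each batch starts only after the previous one has used up its share
    of the bandwidth budget, so the link is never asked to carry more
    than max_bytes_per_sec on average.
    """
    if max_bytes_per_sec <= 0:
        raise ValueError("max_bytes_per_sec must be positive")
    schedule = []
    next_free = 0.0
    for size in batch_sizes:
        schedule.append(next_free)
        next_free += size / max_bytes_per_sec
    return schedule


# Three batches of 1000, 500 and 2000 bytes against a 1000 bytes/s budget.
print(pace_batches([1000, 500, 2000], 1000))  # [0.0, 1.0, 1.5]
```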

The solution

With these data-complexity issues in mind, the ideal solution for securely moving all the disparate OT data to the cloud is to aggregate it at a central site and transport it from there as a single compressed, encrypted stream across network boundaries to the cloud.
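A minimal sketch of the single-stream idea, using only the Python standard library: records from several sources are merged, tagged with their origin and compressed into one payload. The source names and record fields are hypothetical. Encryption is deliberately not implemented here; this sketch assumes it would be handled by the transport layer (for example, TLS) or a vetted cryptography library.

```python
import json
import zlib


def pack_stream(sources):
    """Merge records from several sources into one compressed payload.

    `sources` maps a source name to a list of record dicts. Encryption is
    assumed to happen in the transport layer and is not done here.
    """
    merged = [
        {"source": name, **record}
        for name, records in sources.items()
        for record in records
    ]
    raw = json.dumps(merged, sort_keys=True).encode("utf-8")
    return zlib.compress(raw, 9)


def unpack_stream(payload):
    """Reverse pack_stream on the receiving side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical sources and records, for illustration only.
sources = {
    "historian": [{"tag": "TI-100", "value": 71.5}],
    "maintenance": [{"tag": "WO-1", "value": 1}],
}
payload = pack_stream(sources)
restored = unpack_stream(payload)
```

Crossing the network boundary with one payload rather than one connection per source keeps the number of firewall openings small, which is the property the paragraph above is after.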

A single system that identifies, collects and contextualizes disparate data across an enterprise has many advantages, including delivering data that is meaningful, ready to use and specific to a user’s needs. Centralized data also reduces the overall number of systems that must be managed and protected from potential threat vectors.

Unifying data into a single repository offers the ultimate architectural flexibility and scalability, enabling connections to manufacturing data from multiple applications that can be leveraged at scales not previously possible. Through this approach, organizations can establish a sound data foundation that enables them to realize the true benefit of digital transformation: achieving significant operational improvements across the enterprise.

Stuart Harris is group president of digital transformation at Emerson.