Data analytics for everyone: Beyond spreadsheets

Love it or hate it, the spreadsheet has been the backbone of every engineer's analytics effort for close to 40 years. With spreadsheets approaching middle age, now older than many of their users, there is a sense that an era is ending. Two things are driving that: the blunt approach of the spreadsheet and its associated issues, and a new generation of data analytics offerings.

Consider the recent introductions of Schneider Electric's EcoStruxure products, Rockwell's Scio and ABB's Ability platform, which share cloud, advanced analytics and machine learning as common messaging elements. Certainly, these companies' marketing teams are working hard to brand and promote their new offerings, but at the same time it's hard to miss the need for vendors to say and deliver something new to the market.

Seeq's Michael Risse

The previous analytics generation, call it the "actionable insights era," was defined by the spreadsheet and associated applications. It assumed a single, desktop-based user working with disconnected data to produce, after hours of labor, a moment of insight.

It's a condemnation of the actionable insights generation that, after so many years of promises, there was so much concern about "Grey2K," the wave of experienced employees retiring from the industry. Where were all the insight and expertise that actionable insights had supposedly delivered? At worst, they were locked in the heads of the employees streaming out the door, inaccessible; at best, they survived as undecipherable formulas and files on the hard drives of the departed.

But as we look forward to a new generation of analytics, leaving behind actionable insights, it's important to recognize that whatever succeeds it has to equal and improve on the core values of the spreadsheet generation. The reality is that no product comes close to spreadsheets for data analytics in terms of breadth of users across industries. Although spreadsheets seem particularly ill-suited to working with continuous analog signals, and despite marketing claims for new and improved offerings, spreadsheet use continues for two main reasons.

First is simply the accessibility of spreadsheets to users. In an age of technology innovations for other audiences—machine learning for data scientists, cloud computing for IT departments, Google search for consumers—the spreadsheet remains the accessible solution at hand for every engineer.

The second is flexibility: the ability to solve any type of problem, given enough time and effort. If the spreadsheet doesn’t have the needed feature, there’s always a function or code snippet to close the gap, either available online or custom coded.

These two foundations of the spreadsheet's broad acceptance must be the starting point for any discussion of process industry data analytics going forward. From there, we can consider the following four requirements for whatever will replace the spreadsheet and address its shortcomings:

Subject Matter Expert (SME) Focus and Flexibility
Simple Sharing and Saving of Analysis Results
Big Data and Cognitive Computing Capability
Process Manufacturing Focus

SME focus & flexibility

The first requirement for the next generation of analytics tools is the strength of the spreadsheet user experience: its personalization. It turns out that only the people who know what they are looking for can recognize it when they find it, so a self-service, SME focus is the absolute starting point for any new analytics solution. This requires a reasonable price point, ease of use and a fit for data analytics in process manufacturing.

Meeting these requirements enables data experimentation, letting SMEs try things and investigate data, assets and processes without a five-figure purchase requisition or the need for IT assistance. Self-service data analytics delivers ad hoc, ask-anything interaction between SMEs and data located in one or multiple silos.

For SMEs, a self-service approach and the flexibility to solve a variety of problems are the nucleus of the front-line analytics experience in process manufacturing. New solutions will offer features paralleling the changes already seen in IT departments' alternatives to spreadsheets, such as the cloud-based, multi-user, large-scale and visualization solutions from companies such as Smartsheet, Anaplan and Tableau.

It’s interesting that there are so many billion-dollar companies already competing in the IT space as spreadsheet replacements, whereas options for process manufacturing data analytics have lagged behind. This situation is not likely to continue, as data analytics solutions for process industry applications are emerging.

Simple sharing and saving of results

As previously discussed, one of the main failures of spreadsheets has been their focus on a single user. New data analytic solutions must therefore move beyond individual users to support collaboration among teams, colleagues and even off-site vendors and partners. These solutions must also support knowledge capture in the analytics process to support greater transparency and re-use of efforts.

The prior spreadsheet model is single user-centric, with collaboration after-the-fact in network files and SharePoint sites at best, and frequently no sharing of results at worst. Going forward, new data analytics offerings will have collaboration features to enable teams and organizations to participate in the development of solutions, and then easily find and leverage work already completed [Figure 1].

Figure 1: Collaboration screenshot. Collaboration through online publication and sharing of results is a must for a modern data analytics solution.

The actionable insights generation had two levels of disconnect. First, data was disconnected from its source and copied or transformed into a spreadsheet for processing. Second, the output of the analytics effort was typically distributed as a static image, on paper or in a PowerPoint slide.

In the future, by leveraging a web-based approach to data analytics distribution, a viewer should be able to click through from data visualizations to the underlying data, and even back to the source data. This doesn’t just mean a live dashboard with constantly updating sensor data, it also means automated support for calculated outputs on data streams and batches beyond simple measures and metrics. The result is a single model spanning KPIs and documents that can be easily shifted across time ranges because the underlying connection to the source data is always maintained.
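To make the idea of a calculation that stays connected to its source a little more concrete, here is a minimal Python sketch. It assumes a hypothetical `fake_historian(start, end)` function standing in for a real data source, and `LiveCalculation` is likewise an illustrative name rather than any product's API. The calculation stores a reference to its source and re-queries it for whatever time range is requested, instead of holding a pasted copy of values.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


@dataclass
class LiveCalculation:
    """Hypothetical calculation that keeps a reference to its data source, not a copy."""
    source: Callable[[float, float], List[Sample]]  # fetch(start, end) -> samples
    transform: Callable[[List[Sample]], float]

    def evaluate(self, start: float, end: float) -> float:
        # Re-query the source for the requested window on every call, so shifting
        # the time range never requires re-exporting or re-pasting data.
        return self.transform(self.source(start, end))


def fake_historian(start: float, end: float) -> List[Sample]:
    # Stand-in for a process historian: one sample per second, value = t mod 10.
    return [(float(t), float(t % 10)) for t in range(int(start), int(end))]


def mean_value(samples: List[Sample]) -> float:
    # Average of the sample values; NaN when the window contains no samples.
    return sum(v for _, v in samples) / len(samples) if samples else float("nan")


avg_signal = LiveCalculation(source=fake_historian, transform=mean_value)
print(avg_signal.evaluate(0, 10))  # 4.5 (first window)
print(avg_signal.evaluate(5, 10))  # 7.0 (same calculation, shifted window)
```

In a real deployment the source would be a connection to a historian or another system of record; the point of the design is that the time range is a parameter of the calculation, not something baked into a copy of the data.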

The term sometimes used for these last two points, broad engagement with and access to insights across the analytics process and its outcomes, is "data democratization." The result is more employees participating in, accessing and interpreting data.

Big data and cognitive computing capability

Data volumes, data diversity and data contextualization are the obvious starting point for what the next generation of analytics needs to address. Current volumes are overwhelming existing tools that were never meant for such tasks: millions of rows, thousands of columns. And we should expect only more sensors and more data as wireless data gathering, cloud-based data storage and other technologies continue to advance. The unfortunate result for many process industry firms is lots of data, but not much in the way of useful information.

In particular, the need for data contextualization (ensuring that relevant manufacturing and business information is accessible to place data in the wider context of the organization) will grow as more data becomes available from more sources. Typical solutions today may tap up to 10 sources for recipes, sensor data, quality attributes, personnel information, asset data and so on, each stored in its own silo. The number of data sources required to solve a problem is likely to grow as SMEs are empowered to pursue ad hoc investigations across any data set at any time.
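As a sketch of what contextualization can mean in practice, the Python below tags each sensor sample with the batch that was running when it was recorded, joining two silos by time. The batch records, sample values and the `contextualize` helper are all hypothetical; the sketch assumes batches are sorted by start time, do not overlap, and each cover a half-open [start, end) interval.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical batch records from a batch system: (batch_id, start_s, end_s),
# sorted by start time, non-overlapping, each covering [start, end).
batches = [("B-101", 0.0, 100.0), ("B-102", 120.0, 220.0), ("B-103", 220.0, 300.0)]

# Hypothetical readings from a historian: (timestamp_s, temperature).
samples = [(10.0, 71.2), (110.0, 65.0), (150.0, 73.8), (220.0, 74.1), (310.0, 60.0)]


def contextualize(
    samples: List[Tuple[float, float]],
    batches: List[Tuple[str, float, float]],
) -> List[Tuple[float, float, Optional[str]]]:
    """Tag each (timestamp, value) sample with the batch running at that time, or None."""
    starts = [start for _, start, _ in batches]
    tagged = []
    for ts, value in samples:
        i = bisect_right(starts, ts) - 1  # last batch that started at or before ts
        in_batch = i >= 0 and ts < batches[i][2]
        tagged.append((ts, value, batches[i][0] if in_batch else None))
    return tagged


for row in contextualize(samples, batches):
    print(row)
```

Samples that fall between batches (here, the readings at 110 s and 310 s) come back with no batch ID, which is itself useful context when investigating idle or transition periods.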

Figure 2: Accelerating Impact of ML. The right data analytics solution, in the hands of a process engineer or expert, can be used to accomplish all the steps needed to create and distribute insights.

The second component of the next generation of analytics is leveraging innovation in data management, data processing and cognitive computing. The more popular terms for these innovations include big data and machine learning, among others [Figure 2]. These innovations represent perhaps the biggest gap between the past and future of data analytics, because tapping the incredible intellectual property contributed by the world's leading software companies will result in radical improvements to the data analytics user experience. Any solution that does not leverage these technologies, and other new advances as they occur, will quickly become obsolete.

Process-manufacturing focus

Data cleansing, wrangling and transformation requirements are key reasons why the actionable insights era needs to be replaced with solutions purpose-built for time series data analytics. About 70% of the top 10 Excel functions are for manipulating and organizing data, including IF, LOOKUP, VLOOKUP, MATCH, CHOOSE, FIND and INDEX. These functions are required to reformat time series data into rows and columns before the actual analysis begins, and thus cut into productivity.
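For example, a common spreadsheet pattern for aligning a slowly sampled signal (such as lab results) with a faster one is VLOOKUP with approximate match, which returns the entry with the largest key less than or equal to the lookup value in a sorted table. A minimal Python equivalent using the standard `bisect` module might look like this; the signal names and values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def lookup_last_value(keys: Sequence[float], values: Sequence[float], t: float) -> Optional[float]:
    """Return the value for the largest key <= t, like VLOOKUP(t, table, 2, TRUE).

    `keys` must be sorted ascending. Returns None where Excel would return #N/A
    (t is earlier than the first key).
    """
    i = bisect_right(keys, t) - 1  # index of the last key <= t
    return values[i] if i >= 0 else None


# Hypothetical data: lab pH results every 60 s, sensor readings every 30 s.
lab_times = [0.0, 60.0, 120.0]
lab_values = [7.1, 7.3, 6.9]
sensor_times = [0.0, 30.0, 60.0, 90.0, 150.0]

# Attach the most recent lab result to each sensor timestamp.
aligned: List[Optional[float]] = [lookup_last_value(lab_times, lab_values, t) for t in sensor_times]
print(aligned)  # [7.1, 7.1, 7.3, 7.3, 6.9]
```

In a spreadsheet this logic is typically rebuilt by hand, cell by cell, for every new pair of signals, which is the kind of preparatory work the paragraph above describes.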

Further, data manipulation activities typically represent 50-90% of the time spent in spreadsheets. These querying, slicing, dicing, sorting, affiliating, synchronizing, interpolating, pivoting, contextualizing and indexing activities must all take place before the actual analysis can start. A new generation of analytics that addresses these mundane, repeated efforts required for working with time series data is long overdue [Figure 3].
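Synchronizing and interpolating are a good example of this repeated effort. The Python sketch below takes hypothetical flow and pressure signals logged at different intervals and linearly interpolates each onto a shared time grid so the values line up in rows. Linear interpolation is an assumption chosen for illustration; step (hold-last-value) interpolation is often more appropriate for discrete or setpoint signals. Timestamps are assumed to be sorted ascending.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Signal = Tuple[Sequence[float], Sequence[float]]  # (sorted timestamps, values)


def interpolate(times: Sequence[float], values: Sequence[float], t: float) -> float:
    """Linearly interpolate a signal at time t, holding the end values outside its range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # first index with times[i] >= t; here 1 <= i <= len - 1
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def synchronize(grid: Sequence[float], *signals: Signal) -> List[List[float]]:
    """Resample every signal onto a shared time grid: one row per grid timestamp."""
    return [[t] + [interpolate(ts, vs, t) for ts, vs in signals] for t in grid]


# Hypothetical signals logged at different rates.
flow = ([0.0, 10.0, 20.0], [100.0, 110.0, 90.0])               # every 10 s
pressure = ([0.0, 7.0, 14.0, 21.0], [20.0, 27.0, 27.0, 20.0])  # every 7 s

rows = synchronize([0.0, 5.0, 10.0, 15.0, 20.0], flow, pressure)
for row in rows:
    print(row)  # [time, flow, pressure]
```

The output rows ([0.0, 100.0, 20.0], [5.0, 105.0, 25.0] and so on) are what a spreadsheet user would otherwise assemble with helper columns and lookup formulas before any actual analysis could begin.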

Figure 3: Analyzing data screenshot. Creating value from process data requires direct and visual interaction, as shown in this Seeq screenshot, which integrates time series data and calculations in one user interface.

Finally, the next generation of analytics solutions must leverage, but not require, a web-based computing model. This will support the current user experience while allowing a gradual adoption of web or cloud deployment on the customer's own timeline.

For the user, this means browser-based access to data analytics from a variety of locations and devices. Engineers shouldn't have to be wherever their data or their computer happens to be in order to gain insights and act on them. At the same time, the migration to the cloud for data storage and access will continue for at least another decade, and in some cases far longer, so a cloud-only deployment model will not suffice for most firms today.

These requirements may be balanced by having a web-based architecture for on-premise or cloud-based deployment, with the ability to connect to data sources residing either on-premise or in the cloud.

Conclusion

Saying goodbye to spreadsheets for process data analytics will help firms deliver on the expectations and requirements of new business imperatives such as digital transformation, Industry 4.0 and smart manufacturing. Whatever the term used, the common outcome of these initiatives is delivering benefits to organizations in terms of improved margins, yield, quality and safety by producing insights from new and collected data sources. These benefits will be delivered by a new generation of data analytics offerings.

Author bio

Michael Risse is a vice president at Seeq Corporation, a company building innovative productivity applications for engineers and analysts that accelerate insights into industrial process data. He was formerly a consultant with big data platform and application companies, and prior to that worked with Microsoft for 20 years. Michael is a graduate of the University of Wisconsin at Madison, and he lives in Seattle.