How OT Analytics Turns Raw Data Into Trusted Reporting

Published 1 May 2019 | Updated 10 Jun 2024 | Est. reading time 10 minutes

One question we hear often: what is the best reporting system for an operational system? Here is how we think through that question.

What Is the Best Reporting System for a Process Historian or SCADA System?

The most common reporting tools paired with process historians are Power BI, Microsoft SQL Server Reporting Services (SSRS), Tableau, Oracle BI, Qlik, and TIBCO. Most historian vendors also bundle native analysis tools, trending widgets, or plugins that help novice users start consuming data quickly. Those bundled tools are usually well optimised for their own product. A generic BI tool, by contrast, needs extra effort to shape the data before even an experienced report developer can use it well.

In general, data needs to be shaped, contextualised, and organised for easy consumption before report writers arrive to build something new. We call this dimensioning the data. Done manually by a developer or DBA, it is complex. Done with a purpose-built layer between the plant historian and the enterprise, it is far simpler. This abstraction layer creates a single source of truth for reporting data, captured at the rate most likely to be consumed by reports and dashboards, while raw data stays available to the engineers, data scientists, and analysts who need it. We regularly see report writers freed from building reports with eight or nine joins, simply because the underlying data was already dimensioned for efficient retrieval.
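To make the idea concrete, here is a minimal sketch of dimensioning in Python. It is an illustration under stated assumptions, not a description of any particular product: the tag names, the context table, the hourly reporting rate, and the dimension function are all hypothetical. It rolls raw samples up to one row per tag per hour and attaches site, asset, and unit context so a report writer needs no further joins.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical raw historian samples: (tag, ISO timestamp, value).
RAW = [
    ("PS1.P101.FLOW", "2024-06-10T08:00:05", 41.0),
    ("PS1.P101.FLOW", "2024-06-10T08:30:05", 43.0),
    ("PS1.P101.FLOW", "2024-06-10T09:10:00", 40.0),
    ("PS1.P102.FLOW", "2024-06-10T08:15:00", 18.0),
]

# Hypothetical context for each tag (the "dimensions").
TAG_CONTEXT = {
    "PS1.P101.FLOW": {"site": "Pump Station 1", "asset": "Pump 101",
                      "measure": "Flow", "unit": "L/s"},
    "PS1.P102.FLOW": {"site": "Pump Station 1", "asset": "Pump 102",
                      "measure": "Flow", "unit": "L/s"},
}


def dimension(raw, context):
    """Return one flat, report-ready row per tag per hour."""
    buckets = defaultdict(list)
    for tag, timestamp, value in raw:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0)
        buckets[(tag, hour)].append(value)

    rows = []
    for (tag, hour), values in sorted(buckets.items()):
        rows.append({
            **context[tag],              # site, asset, measure, unit
            "hour": hour.isoformat(),
            # Plain average of the samples; a real layer might
            # time-weight values instead (assumption for brevity).
            "avg_value": mean(values),
            "samples": len(values),
        })
    return rows


rows = dimension(RAW, TAG_CONTEXT)
for row in rows:
    print(row)
```

Each output row already carries its own context, which is the point: the report tool reads one flat table at the reporting rate, and the raw samples remain untouched for anyone who needs them.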

So, back to the original question: the best reporting system for your process historian might be the one you already have. Most organisations have already selected a BI tool independent of the OT stack, and that makes sense given the relative number of users. The honest answer is that the question is loaded with trade-offs either way.

How Important Is Dimensioning Data?

The larger the data sets and the more complex the relationships between devices, assets, and instruments, the more important it becomes to dimension the data efficiently. Compounding this, the people who genuinely understand process data, usually process engineers and analysts, are a precious resource. They rarely have the technical skills to build data warehouses manually, just as data warehouse specialists rarely understand process data. Technology exists to close exactly this gap, taking on the complex, repetitive work that humans shouldn't have to replicate by hand. Ideally, this is a one-time setup, not an ongoing burden.

How Expensive Are Business Reports?

We have heard of an organisation paying well over a million dollars for a single report. Without knowing the specifics, the likely explanation is that the underlying data set was never dimensioned, organised, or simplified for reporting in the first place. If an organisation needs a thousand reports, templates built on undimensioned data simply don't scale economically.

Core products, whether the BI tool or the process historian, often get blamed for a poor outcome, especially when one party implements the OT stack and another implements the BI layer. The polished presentation of a good BI tool can create the illusion that anyone could produce the same result, but the data underneath is frequently poor quality, either unqualified at the source or never dimensioned for consumption. Dimensioning can thankfully be applied retrospectively to systems with existing historical data, so organisations struggling with reporting outcomes today don't need to start from scratch. It's a matter of rethinking how the data is organised and shared, not throwing it away.

How Important Is Trusted Data?

Having data stored and connected to an instrument doesn't make it accurate or useful; it can be repeatable and still repeatably inaccurate. Sampling rate and configuration matter, particularly when publishing to audiences who aren't process specialists, and once confidence in the data is lost, it's difficult to rebuild. The last decade solved the problem of getting data into process historians efficiently. The next challenge is understanding the nature of that data well enough that querying trillions of stored data points doesn't require querying all of them. Data needs context to be properly understood. Two decades ago, the rule was to store everything in case it was needed later. Today, with acquisition rates far faster and decisions increasingly made on the data's face value, that data has to stand on its own. Reporting systems need to be architected for that from the start, something that hasn't always been done well.
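The difference between repeatable and accurate can be shown with a few lines of Python. This is a simple illustration, assuming a trusted reference value is available (for example from a calibration check); the function name and the readings are hypothetical.

```python
from statistics import mean, pstdev


def repeatability_and_bias(readings, reference):
    """Separate how tightly readings agree from how far off they are."""
    errors = [reading - reference for reading in readings]
    return {
        "bias": mean(errors),        # average offset from the reference
        "spread": pstdev(readings),  # how much the readings vary
    }


# Hypothetical instrument readings against a reference of 50.0.
result = repeatability_and_bias([52.1, 52.0, 51.9, 52.0], reference=50.0)
print(result)
```

Here the spread is small, so the instrument looks stable on a trend, yet every reading sits roughly two units above the reference. A dashboard built on this signal would be consistently wrong, which is why context about calibration has to travel with the data.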

How Do We Ensure Process Data Is Correct?

Edge processing, new storage technology, and data cleansing techniques all help, but there's no substitute for calibrating what gets stored in the first instance, an engineering discipline as essential today as it was 20 years ago. Beyond that, the next step is detecting anomalies in existing data streams. Traditionally this happens at the operational level, where operators flag problems to maintenance. That approach works until the volume of exceptions exceeds what a person can manage. Machine and process learning, as an extension of anomaly detection, can determine deviations from operating envelopes for individual instruments or complex multivariable processes with a level of confidence that now exceeds manual detection, and can pinpoint the most significant contributors to that change.
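As a rough sketch of what an operating envelope means for a single instrument, the Python below learns a band from a known-good baseline and flags readings that fall outside it. This is a deliberately simple statistical stand-in (mean plus or minus k standard deviations), not the machine learning approach described above; the function names, the value of k, and the sample data are all assumptions for illustration.

```python
from statistics import mean, stdev


def learn_envelope(baseline, k=3.0):
    """Derive a (low, high) band from known-good baseline readings."""
    centre = mean(baseline)
    width = k * stdev(baseline)
    return centre - width, centre + width


def flag_deviations(stream, envelope):
    """Return (index, value) pairs for readings outside the envelope."""
    low, high = envelope
    return [(i, value) for i, value in enumerate(stream)
            if not low <= value <= high]


# Hypothetical baseline taken from a period of normal operation.
baseline = [50.0, 50.4, 49.8, 50.1, 49.7, 50.2, 49.9, 50.3]
envelope = learn_envelope(baseline)

# Hypothetical new readings to check against the envelope.
flagged = flag_deviations([50.0, 50.6, 53.2, 49.9, 48.0], envelope)
print(envelope, flagged)
```

A per-instrument band like this automates the first pass an operator would make by eye. Multivariable processes need models that account for how signals move together, which is where the learning techniques mentioned above come in.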

Do I Need New Technology for Digital Transformation?

While sorting out data systems and working toward reliable data sets, it's easy to miss that the same new technologies can be used to clean up what already exists. Machine learning isn't only about a future, fully automated asset that predicts failure months ahead. It's also about leveraging today's investment, even before calibration and accuracy are fully established. The most important step in data management is setting that new baseline.

A useful baseline needs reliable infrastructure and an architecture that supports dimensioning, so advanced analytics and AI can find the value hidden in an organisation's existing data. Big data analytics and better real-time information drive process improvements that often transfer across an industry, but the real hidden value is specific and local to each enterprise, found by correlating seemingly unrelated local circumstances and process inputs about specific equipment and assets. The repeatability of the data stream is what unlocks that value, and it's achievable today with existing technology. Digital transformation depends on reliable data streams built this way.