
Data Lake for Industrial Control Systems

Published 29 Nov 2019. Updated 2 Nov 2023.

In May 2018, Urban Utilities (UU) and Parasyn commenced their journey of liberating UU's disparate SCADA and other process information system data into one centralised industrial data lake.

Industrial Data Lake at UU

The technologies chosen for this undertaking came from the Aveva APM product suite, including eDNA Historian and Intelligence. On 21 October 2019, after two data centres had been stood up, 75 terabytes of historical source data managed, and a fully redundant infrastructure and its server software applications tested, the 500,000-point Enterprise Historian system went live and began actively collecting real-time data.


The configured data sources currently include CitectSCADA (multiple versions), ClearSCADA, Mosaic, Radtel, LIMS, Elpro and BOM. The first data source, a regional SCADA system, began storing real-time data in December 2018. Other sites were then added systematically, the secondary data centre was activated, and system redundancy was tested to make the eDNA Enterprise Historian system a highly available Big Data repository.

When the system went live, Parasyn was partway through training more than 200 power users in how best to use the process historian Client Tools and Web interface. After formal training and hands-on practice, the power users in turn become champions of the system, ensuring that plenty of process experts are available to help the organisation maximise its use of real-time and historical data.

Surprised Users

So, what were the real wins for UU? We gathered feedback and noted what surprised the stakeholders, users and trainees. Here is a summary of what we heard:

  • Do you mean I can get data about assets in both networks and plant in the one place?
  • Is it that easy for me to export data?
  • Can I really replay what happened in the system even though the data is coming from different unrelated systems?
  • Is this data updating in real time? I thought this was just a database!

Technical Challenges

In more technical terms, the enterprise historian system and its components provide a number of benefits that matter even more for a large organisation like UU, and are essential for one formed by amalgamating other businesses. The amalgamation meant UU inherited other organisations' standards, conventions, assets, asset metadata and related data systems as is. Creating a unified user interface over inherited systems is challenging because starting from scratch is never feasible.

In large organisations, the user base, stakeholder groups and business processes are wide and varied. UU's approach to change management was correspondingly broad, involving technical, executive and business user groups. Change management, a fundamental element of all digital transformation projects, was key to a successful outcome for the Enterprise Historian implementation. Stakeholders were consulted throughout the journey, ultimately leading to an outcome where the stakeholders own what is provided, rather than the project team acting as “advocates” or “defenders” of the new technology.


Technical Benefits

The new system has many features which demonstrate what happens when enterprise requirements are successfully gathered and implemented into a single system to deliver coordinated outcomes:

  • Single centralised data store bridging all infrastructure (networks and plant), which simplifies data access, administration, analytics and report writing, and provides a platform for machine learning.
  • Uniform data point configuration using a structure that makes sense to the entire UU group, not just a third-party contractor or any one stakeholder group. Standards or hierarchy can be used to search, locate and consume data, with no data specialists required.
  • Native interfaces for ClearSCADA and CitectSCADA, the chosen platforms for UU Network SCADA and Plant HMI.
  • Tools for backloading and maintenance of historical data from legacy systems, so years of old system data can be referenced and correlated with other events such as weather, operational controls, planning and regulatory changes.
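
As a rough illustration of how a uniform point hierarchy lets users locate data without knowing which source system it came from, the sketch below maps source-specific tags to a common path and searches by hierarchy node. The naming scheme, the tag names and the `Point` and `PointCatalogue` classes are all hypothetical; they are not the eDNA configuration model or UU's actual standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """One historian point, described independently of its source system."""
    path: str        # hypothetical uniform name, e.g. "Network/North/PS01/Flow"
    source: str      # originating system, e.g. "ClearSCADA"
    source_tag: str  # tag name as it exists in the source system (made up here)
    unit: str


class PointCatalogue:
    """In-memory lookup of points by uniform hierarchical path."""

    def __init__(self):
        self._points = {}

    def register(self, point):
        if point.path in self._points:
            raise ValueError(f"duplicate path: {point.path}")
        self._points[point.path] = point

    def find(self, prefix):
        """Return all points at or below a hierarchy node, sorted by path."""
        node = prefix.rstrip("/")
        return sorted(
            (p for p in self._points.values()
             if p.path == node or p.path.startswith(node + "/")),
            key=lambda p: p.path,
        )


catalogue = PointCatalogue()
catalogue.register(Point("Network/North/PS01/Flow", "ClearSCADA", "N.PS01.FT100", "L/s"))
catalogue.register(Point("Network/North/PS01/Level", "ClearSCADA", "N.PS01.LT200", "m"))
catalogue.register(Point("Plant/STP1/Inlet/Flow", "CitectSCADA", "STP1_INL_FIT01", "L/s"))

# A user browses by location, not by source system or vendor tag name.
for p in catalogue.find("Network/North"):
    print(p.path, "<-", p.source, p.source_tag)
```

The point of the design is that the search key is the organisation's own hierarchy, while the source system and its native tag are carried along only as metadata.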

Why UU Needs an Enterprise Historian

Putting sheer size to one side, the options include relational databases, plant process historians, cloud-hosted relational databases and on-premise historians. Only a couple of these choices won't send you broke, or to the asylum, while you wait for basic data about what happened last year.

In terms of system architecture, the eDNA solution provides speed, history storage redundancy, support for data source redundancy, application server redundancy, and a distributed architecture of buffered field data collectors with secure transport and storage. The technology and architecture are flexible and have been tailored to the existing UU source technologies, with plans in place to adjust them as SCADA systems are upgraded, added to or replaced.
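
To make the idea of a buffered field data collector concrete, here is a minimal store-and-forward sketch. It is a generic illustration of the pattern, not eDNA's implementation: the `BufferedCollector` class, the `send_to_historian` function and the tag name are hypothetical, and a real collector would also persist its buffer to disk and secure the transport.

```python
from collections import deque


class BufferedCollector:
    """Holds samples locally and forwards them when the historian is reachable."""

    def __init__(self, send, max_samples=10_000):
        self._send = send  # callable(sample) -> bool, True means the sample was stored
        self._buffer = deque(maxlen=max_samples)  # oldest samples drop first when full

    def collect(self, timestamp, tag, value):
        """Buffer a new sample, then try to forward everything that is pending."""
        self._buffer.append((timestamp, tag, value))
        self.flush()

    def flush(self):
        """Forward buffered samples in arrival order; stop at the first failure."""
        sent = 0
        while self._buffer:
            if not self._send(self._buffer[0]):
                break  # link down: keep the sample for the next attempt
            self._buffer.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._buffer)


# Simulated historian link that can be switched off and on.
link_up = False
stored = []


def send_to_historian(sample):
    if link_up:
        stored.append(sample)
    return link_up


collector = BufferedCollector(send_to_historian)
collector.collect(1, "PS01.Flow", 12.5)  # link down: sample stays in the buffer
collector.collect(2, "PS01.Flow", 12.7)  # link down: sample stays in the buffer
link_up = True
collector.collect(3, "PS01.Flow", 12.9)  # link up: all three forwarded in order
print(collector.pending, [s[0] for s in stored])
```

A sample is removed from the buffer only after the send succeeds, so a dropped link delays data rather than losing it, up to the buffer's capacity.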

What Next for Liberated Big Data?

For a generation, the challenge has been to capture enough data, store it, and retrieve relevant information in a timely fashion. With Digital Transformation initiatives driving process improvement, asset optimisation and workforce efficiency gains, quality data is essential for moving to decisions based on empirical evidence. The enterprise historian and intelligent data repositories are a fundamental building block for this transformation journey, but they are only as good as the data itself and how it is organised.

Many Big Data platform providers and researchers are now concerned that the adoption rate of Artificial Intelligence models into production environments is less than 10%, meaning the best intentions are not leading to long-term results. The data lake itself, the performance of the technology and the availability of infrastructure do not produce business results; people do. Engagement, technology take-up and change management have always been the deciding factors.

Projects without interested users are the mothballed big data solutions of the 2020s. What comes next is decided by how you start and who gets involved. We have the technology, but do we have the right plan and the people to carry it through?