Advanced analytics allows engineers and data scientists to work together

Empowering engineers to interact directly with data, often using algorithms developed by data scientists, accelerates time to insight.

By Krista Novstrup December 1, 2021
Courtesy: Seeq

The speed by which people gain insights defines competitive advantage. A manager recently made that statement while responding enthusiastically to a series of presentations by engineers and subject matter experts (SMEs) on the advanced analytics work they had done to increase production, improve reliability and monitor emissions.

The common thread of the presentation was the use of an advanced analytics application to provide the company’s engineers and SMEs with a streamlined method to access data, perform analysis and share findings. By infusing existing knowledge of their respective areas into the analyses, the teams experienced quicker time to insight.

Reflecting on this example, a key factor in its success was bringing together process knowledge with advanced analytics and machine learning (ML) algorithms. In organizations today, these two pieces of the puzzle are often siloed. Engineers are on the front lines of the manufacturing facilities while data scientists are in a separate, central organization. But what if that divide were eliminated?

Practically speaking, retraining all engineers in data science is not realistic, nor do all engineers have an interest in becoming proficient Python programmers. However, that does not mean this idea is without merit. Given the right advanced analytics application, engineers can efficiently collaborate with data scientists, providing companies with a competitive advantage through accelerated time to insights.

Data access and analysis challenges

To empower engineers and other SMEs, the challenges of accessing data, analyzing it and engaging the right expertise at the right time must be considered. First, engineers and data scientists need to access the data. Relevant data for analysis comes from multiple on-premises process historians and other databases. It may include data stores for lab data, maintenance tracking, production planning and production accounting — along with new greenfield sensors streaming directly to the cloud.

Information technology (IT) departments have been focused on consolidating this data, moving it from on-premises to the cloud and striving for the holy grail of organizing data for easier discovery. While progress is being made, these projects take months or years to complete when insights are needed today to improve operations. Therefore, engineers find themselves with the onerous task of wrangling various exports from different data sources into a spreadsheet before analysis begins.
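
To make that wrangling concrete, here is a minimal sketch of one common step: matching each lab sample to the most recent process reading at or before it. It uses only the Python standard library. The function name `align_asof`, the five-minute tolerance and the sample values are assumptions made for illustration, not part of any vendor's product.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def align_asof(process_rows, lab_rows, tolerance):
    """Attach to each lab sample the latest process reading at or before it.

    Both inputs are lists of (timestamp, value) tuples sorted by timestamp.
    Returns a list of (timestamp, lab_value, process_value_or_None).
    """
    process_times = [t for t, _ in process_rows]
    aligned = []
    for lab_time, lab_value in lab_rows:
        # Index of the last process reading with timestamp <= lab_time.
        i = bisect_right(process_times, lab_time) - 1
        if i >= 0 and lab_time - process_times[i] <= tolerance:
            aligned.append((lab_time, lab_value, process_rows[i][1]))
        else:
            # No reading close enough: leave the gap visible instead of guessing.
            aligned.append((lab_time, lab_value, None))
    return aligned


# Hypothetical exports: a historian tag sampled every minute and two lab results.
start = datetime(2021, 12, 1, 8, 0)
process_rows = [(start + timedelta(minutes=m), 100.0 + m) for m in range(5)]
lab_rows = [
    (start + timedelta(minutes=2, seconds=30), 0.82),
    (start + timedelta(minutes=30), 0.79),
]

aligned = align_asof(process_rows, lab_rows, tolerance=timedelta(minutes=5))
for row in aligned:
    print(row)
```

The second lab sample falls outside the tolerance, so it is returned with no process value. In a spreadsheet, this matching is typically done by hand for every pair of sources, which is where much of the time goes.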

With data finally in hand, the complexity of analysis will be limited by access to applications, knowledge of advanced analytics and ML algorithms and available time. Many engineers have become adept at manipulating a spreadsheet to explore and analyze data, and they are eager to expand their skill sets in this area. However, the job of the front-line engineer is one of many competing priorities, typically leaving minimal time to focus on developing complex analyses, especially when using spreadsheets, a tool not designed for the task (see Figure 1).

Figure 1: Spreadsheets were not designed to analyze process data, leading to difficulties. Courtesy: Seeq

An antidote has been the hiring of data scientists, creating centralized teams and occasionally embedding a data scientist within the business. These data scientists have knowledge of advanced analytics and ML algorithms, and they have been trained to use Python and various other software platforms to apply the algorithms. However, the data scientists are siloed from the engineers or, more specifically, the process knowledge and experience that resides with the engineers.

As a result, analyses completed solely by data scientists are often not optimal due to a lack of process knowledge, making them less likely to succeed and less likely to be trusted and acted upon by plant personnel. Alternatively, analyses require excessive time spent by data scientists and engineers collaborating with each other. Either way, an opportunity exists for companies to realize increased benefits.

Advanced analytics applications address issues

The right advanced analytics application can address these and other issues by connecting engineers and SMEs directly to the data of interest. This type of application can connect to the data wherever it resides, allowing engineers and others to investigate and analyze data as questions occur to them. It also enables data scientists to develop and deploy algorithms embedded in these applications for use by front-line engineers.

Advanced analytics applications connect to all relevant data sources (e.g., process, lab and maintenance) using a suite of connectors. This provides access to data where it is currently stored, without the need to copy it and then store it in a data lake or the application. Furthermore, advanced analytics applications connect to multiple data sources simultaneously. Implementation of the solution does not depend on IT departments completing their data projects. Instead, as those new data stores come online, the application can be updated to use them (see Figure 2).

Figure 2: Seeq can be used to automatically access data from a wide variety of data sources. Courtesy: Seeq

Next, advanced analytics applications provide engineers with tools to cleanse, contextualize, investigate, model and monitor. Because these tools are tailored to the unique challenges of time series data, tasks that are tedious in a spreadsheet or business intelligence tool become trivial. Additionally, the application must incorporate collaboration tools to document the analysis steps and to create reports and dashboards for sharing insights with colleagues, operators and managers. This ensures the results are seen by the right audience. With the right advanced analytics application, engineers can not only accomplish existing tasks with greater efficiency, which allows them more time to focus on other activities, but also investigate and follow hunches they otherwise would not have pursued.

Finally, advanced analytics applications that can incorporate ML algorithms mean engineers do not need to learn how to use new software tools to expand their skill set. Instead, they can access advanced algorithms and ML models in the same place they are already trending data, monitoring operations and performing analytics. Application of these algorithms becomes a natural extension of their workflow and not a separate effort. In this environment, engineers can infuse their process understanding into the creation of an ML model, and then immediately vet the results against related process data and analysis. Now, instead of the data scientist’s efforts resulting in one model for one asset, the data scientist’s efforts can be scaled to allow many front-line engineers to develop models, each for their respective assets.

Advanced analytics in action

Value chain optimization. In a refinery, intermediate feedstocks are regularly purchased to allow reactors to run at production capacity when upstream units are not producing enough feed. The composition and quality of the purchased feedstock affect the amount of feed that can be temporarily stored onsite and then processed. Poor-quality feedstocks are sold at a discount, which means a greater profit margin if a refinery can purchase and process them. Typically, the window of opportunity to purchase these feedstocks is limited, so a process engineer must quickly confirm the refinery will be able to store and process the feedstock.

One process engineer at an oil and gas company questioned the existing models, which indicated they needed to limit their purchase of poor-quality feedstock during the summer months. Using Seeq’s advanced analytics application, the engineer found periods of similar and dissimilar operation during both summer and winter months and was then able to cleanse the data and generate a new model based on actual plant data and seasonal operation (see Figure 3).

Figure 3: Using Seeq, an engineer developed a correlation algorithm and deployed it as an add-on tool to identify input signals with greatest effect on operations. Courtesy: Seeq

Using the new model, the engineer determined the plant could process two to three times as much poor-quality feedstock. The analysis and results were then summarized in a report for review with management, who decided to purchase additional poor-quality feedstock, resulting in a realized value of more than $1 million per year.
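
The caption for Figure 3 mentions a correlation algorithm for identifying the input signals with the greatest effect on operations. The article does not describe that algorithm, so the following is only a generic sketch of the idea: rank candidate inputs by the absolute value of their Pearson correlation with a target signal. The signal names and sample values are made up, and the data is assumed to be already cleansed and aligned.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat signal carries no linear information
    return cov / sqrt(var_x * var_y)


def rank_inputs(inputs, target):
    """Sort input signal names by absolute correlation with the target."""
    scores = {name: pearson(values, target) for name, values in inputs.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up samples; the signal names are hypothetical.
inputs = {
    "feed_temperature": [310.0, 315.0, 320.0, 325.0, 330.0],
    "ambient_temperature": [12.0, 30.0, 8.0, 27.0, 15.0],
    "stuck_sensor": [5.0, 5.0, 5.0, 5.0, 5.0],
}
throughput = [50.0, 52.1, 53.9, 56.2, 58.0]

ranking = rank_inputs(inputs, throughput)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Correlation captures only linear relationships and says nothing about cause. That is one reason the engineer's process knowledge matters when deciding which highly ranked signals deserve a closer look, and whether summer and winter periods should be ranked separately.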

Environmental stewardship. For manufacturing facilities in any industry vertical, environmental stewardship is on the shortlist of priorities to meet regulatory requirements and improve environmental, social and governance (ESG) scorecards. Accounting for, reporting on and minimizing emissions is not a trivial task. A single manufacturing facility can have numerous emission sources and an even greater number of operating variables that influence emissions.

Data scientists at an oil and gas company developed a neural network algorithm to estimate NOx emissions based on current operating conditions. The ML algorithm was deployed via an add-on tool within Seeq and used by the company’s engineers worldwide, who applied it to their specific facilities. Furthermore, the engineers used the ML algorithm to run what-if scenarios, allowing them to make informed decisions about operational changes to reduce emissions.
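
A what-if scenario of this kind amounts to evaluating the same model at the current conditions and at one or more modified sets of conditions, then comparing the estimates. The sketch below shows that pattern in plain Python. The model is a placeholder with invented coefficients, not the company's neural network, and the variable names `firing_rate` and `excess_o2` are assumptions chosen for illustration.

```python
def run_what_if(model, baseline, scenarios):
    """Evaluate a model at the baseline and at each modified set of conditions.

    `model` is any callable that maps a dict of operating conditions to an
    estimate. Each scenario is a dict of overrides applied to the baseline.
    Returns {scenario_name: (estimate, change_from_baseline)}.
    """
    base_estimate = model(baseline)
    results = {"baseline": (base_estimate, 0.0)}
    for name, overrides in scenarios.items():
        conditions = {**baseline, **overrides}  # baseline with overrides applied
        estimate = model(conditions)
        results[name] = (estimate, estimate - base_estimate)
    return results


def placeholder_nox_model(conditions):
    """Stand-in for a trained model. Coefficients are invented for illustration."""
    return 2.0 * conditions["firing_rate"] + 10.0 * conditions["excess_o2"]


# Hypothetical operating conditions and proposed changes.
baseline = {"firing_rate": 50.0, "excess_o2": 3.0}
scenarios = {
    "lower_excess_o2": {"excess_o2": 2.5},
    "reduced_firing": {"firing_rate": 45.0},
}

results = run_what_if(placeholder_nox_model, baseline, scenarios)
for name, (estimate, delta) in results.items():
    print(f"{name}: estimate = {estimate:.1f}, change = {delta:+.1f}")
```

Because `run_what_if` accepts any callable, a model deployed by the data science team could be swapped in for the placeholder without changing how an engineer defines or compares scenarios.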

The result was a widely adopted tool, which became a best practice for monitoring and reducing emissions. Instead of the data science team trying to apply their algorithm to each facility and report model results, the company scaled their efforts by deploying the algorithm across sites. In turn, front-line engineers seamlessly accessed and applied the algorithm from within the Seeq advanced analytics application they were already using for routine monitoring and troubleshooting.

The data scientists leveraged the engineers’ ability to access, cleanse and contextualize the data and to use a new algorithm deployed from within a single application. By applying the algorithm developed by the data scientists, the engineers became an extension of the data science team.

Final thoughts

Engineers and SMEs have a wealth of domain knowledge and understanding of their specific processes. Deploying ML algorithms in a way that makes it easy for engineers to access and apply them ensures that process knowledge is incorporated in model creation and used to vet the results, producing rapid insights.

Consequently, when the data science team isn’t focused on applying their algorithms to every asset, they are able to focus on developing the next algorithm to increase sustainability, reduce downtime or increase production.

So, is it realistic to imagine every engineer a data scientist? In the strictest sense of formal education, probably not, but the lines are becoming increasingly blurred as companies adopt advanced analytics applications that incorporate ML algorithms.

Author Bio: Krista Novstrup is a principal analytics engineer with Seeq Corporation, where she has spent the last three years helping customers gain insight and value from their process data. In addition to her customer-facing role, she is the analytics engineering group manager for EMEA and Partners. Prior to joining Seeq, she was with ExxonMobil Research and Engineering Company for eight years. Her most recent position at ExxonMobil was as global technology lead, supporting planning optimization. Dr. Novstrup earned a BS degree in chemical engineering from the University of Washington and a PhD in chemical engineering from Purdue University.