The worlds of analytics and data management are colliding. Digitally leading industrial enterprises are shifting their focus from how data is retained and controlled to how data is used and accessed, especially in the cloud. Data operations (DataOps) focused on industrial subject-matter experts and industrial data scientists simplifies the processes of connecting to multiple IT and OT data sources, discovering data, performing conditioning and modeling, and building analytics models.
In this guide, you will learn more about the following aspects of DataOps and see examples of DataOps in action:
If there is one technology trend aside from “AI” that is set to define the 2020s, it is “Ops.” From DevOps to DataOps to MLOps, focus is rightfully put on end-to-end operationalization rather than initial code, data, or algorithm development alone.
According to Gartner, “DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization."
Forrester defines DataOps as “the ability to enable solutions, develop data products, and activate data for business value across all technology tiers from infrastructure to experience."
It’s interesting to note that both of the definitions emphasize the centrality of collaboration to the concept, and they focus on integrating data across different parts of an organization. DataOps is very much about breaking down silos and optimizing the broad availability and usability of data.
Ultimately, DataOps aims for predictable data delivery and change management, using technology to automate, orchestrate, and operationalize data use and value dynamically.
DataOps is fundamental to driving up data literacy, the ability to read, write, and communicate contextualized data. Data literacy, in turn, is key to obtaining optimal value from data and technology. Therefore, the more organizations embrace DataOps, the better equipped they will be to truly harness the transformative potential of data.
Learn how Industrial DataOps powers OMV's asset integrity operations framework, cutting planning time by 25% and operating expenses by 50%, and helping the company reduce its environmental footprint.
These unprecedented times compel us to examine our operational efficiency and resilience, including how we operationalize data to address the needs of our diverse data consumers.
It is no secret that data management and analytics workflows have always been complex, siloed, and costly to enterprises all over the world.
Similar to DevOps, DataOps has a compelling and topical value proposition. With DataOps, you reduce the number of specialized roles in your data-to-value workflows and enable greater data consumer autonomy and empowerment, thus creating greater resilience and a leaner, more cost-efficient core for digital transformation.
According to Gartner, “the goal of DataOps is to create predictable delivery and change management of data, data models and related artifacts. DataOps uses technology to orchestrate and automate data delivery with the appropriate levels of security, quality and metadata to improve the use and value of data in a dynamic environment.”
Additionally, Gartner (2019) found that analytics and data management leaders’ highest-ranked observation of change in environment is the existence of an “increased demand for new data types and the ability to model it independently” and that “the number of data and analytics experts in business is growing at three times the rate of experts in IT departments, which will force companies to rethink their organizational models and skillsets”.
Similar to DevOps for professionally developed software, DataOps and low-code form the technological foundation for citizen data science and citizen-developed applications. Imagine what your data consumers can do when they are all empowered to speak data, to independently and collectively access all relevant, contextualized data, and to safely develop the next generation of productivity-enhancing digital applications.
In the tailwinds of DevOps’ transformative impact on SaaS companies and, more recently, enterprise-specific custom software development, DataOps is poised to deliver similar transformative impact on the speed of delivering converged data management and advanced analytics solutions to businesses.
To adequately extract the value of industrial data insights, it’s essential to make operationalizing data core to your business strategy. For industrial companies, the path to value from data liberation requires three crucial steps: data must be made available, useful, and valuable in the industrial context.
Data engineers working on industrial digitalization projects often struggle with access to key source system data, a situation reminiscent of non-industrial verticals a decade ago. But industrial companies are not only facing the same challenges as, for example, their retail peers. They are presented with a superset of challenges resulting from IT/OT convergence and the associated data velocity, variety, and volume that exceed those of conventional IT.
OT/IT convergence is not about turning IT pros into plant engineers or machine operators into data scientists — although the latter is indeed happening — but about executing on a strategy to align and bring together formerly isolated SMEs, cultures, data, and platforms deployed by OT and IT teams to improve operational performance through unified goals and KPIs.
Many organizations have already achieved step one towards value from data: liberating data from siloed source systems and aggregating it in a traditional data warehouse (DWH). In today’s mature DWH market, progressive data-driven organizations are actively using data fabric solutions as a complement to existing DWH strategies. With data fabric, organizations can liberate their data once again, lifting it from the pool of aggregation and turning it into contextualized knowledge to deliver on their ambitions for advanced analytics.
But what is data fabric? And what sets it apart from data warehousing? More than 90% of your data lake data is not used by anyone in your business. Data scientists cannot find the data they need, nor discover contextually related data sets. And executive patience with orchestrating the delivery of relevant, trusted data to the business is wearing thin.
Companies need to take the plunge into DataOps. They need to focus on a semantically enriched data fabric, not legacy data lakes. Complementing existing DWH solutions with data fabric has dramatically reduced costs while simultaneously improving scalability, development speed, and data openness throughout many complex customer organizations.
The two main pillars of data fabric are context and discovery. These define data fabric and make it both distinctly different from and complementary to existing DWH. Data context is the sum of meaningful, use-case-supporting relationships within and across different data types and data artifacts. Data discovery is about making data effortlessly available to the right user in the right format.
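As a loose sketch (all entity names and relationships below are invented for illustration), these two pillars can be pictured as a graph of data artifacts: context is the set of edges linking OT and IT artifacts, and discovery is the ability to traverse them from any starting point:

```python
from collections import defaultdict

# Hypothetical sketch: "context" as a graph of relationships between
# data artifacts (sensors, equipment, documents); "discovery" as
# retrieving everything contextually related to a given artifact.

class ContextGraph:
    def __init__(self):
        self.edges = defaultdict(set)

    def relate(self, a, b, relation):
        # Relationships are stored in both directions so that
        # discovery can start from either artifact.
        self.edges[a].add((relation, b))
        self.edges[b].add((relation, a))

    def discover(self, artifact):
        # Everything contextually related to an artifact.
        return sorted(self.edges[artifact])

graph = ContextGraph()
# An OT time series linked to the equipment it measures...
graph.relate("sensor:TT-1012", "pump:P-101", "measures")
# ...and the equipment linked to an IT document describing it.
graph.relate("pump:P-101", "doc:P-101-maintenance.pdf", "documented_by")

print(graph.discover("pump:P-101"))
```

The point of the sketch is that once relationships are explicit, a data consumer can start from any artifact they know (a pump, a sensor tag, a document) and find everything related to it, rather than searching a flat catalog.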
Context — the direct and indirect relationships and metadata that turn "dumb data" into meaningful information — can only be unlocked by connecting OT with IT. Liberated data without full OT/IT context remains largely worthless, resulting in organizations drowning in data, but starving for insights — no matter where the raw data resides or who has access to it. When liberating data as a first step before contextualization, the focus needs to be on liberation from the OT core as opposed to the IT core.
Raw data — absent of well-documented and well-communicated contextual meaning — is like a set of coordinates in the absence of a mapping service. While some applications benefit from raw data, most applications — especially low-code application development — require data that has undergone some additional layer of contextual processing. This includes aggregated data, enriched data, and synthetic data resulting from machine learning processes.
While raw data is theoretically available across realms of immediate, potential, and not-yet-identified interest, active metadata management is often a mere afterthought: it winds up as the technology project’s flagship KPI, yet attracts little enthusiasm or investment from the stakeholders along the way. This is where the value of data contextualization becomes most pronounced. Aggregated, enriched, and synthetic data delivered as an active data catalog is far more useful to application developers. Data lakes that only store data in an untransformed, raw form offer little relative value. The solution is to complement existing data lake practices with data contextualization.
Data contextualization goes beyond conventional data catalog features by providing relationship mining services using a combination of machine learning, rules-based decision-making, and subject-matter expert empowerment. Many mid-sized organizations, operating mostly with IT data, may benefit from a simpler data catalog solution. However, large industrial asset operators dealing with the synthesis of OT and IT data need enterprise-grade data contextualization solutions.
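A minimal sketch of the rules-based side of relationship mining, assuming a hypothetical tag-naming convention (the tag format and equipment IDs here are invented; in practice this would be combined with ML-based matching and subject-matter expert review):

```python
import re

# Hypothetical naming convention: sensor tags look like "TT-101-A",
# where the two letters are the instrument type and the three digits
# are the unit number shared with the equipment ID (e.g. "P-101").
TAG_RULE = re.compile(r"^(?P<type>[A-Z]{2})-(?P<unit>\d{3})")

def infer_equipment(sensor_tag, equipment_ids):
    """Rules-based mining: suggest equipment whose unit number
    matches the unit number embedded in the sensor tag."""
    match = TAG_RULE.match(sensor_tag)
    if not match:
        return []  # tag does not follow the convention; leave for ML/SME review
    unit = match.group("unit")
    return [eq for eq in equipment_ids if eq.endswith(unit)]

equipment = ["P-101", "P-102", "K-201"]
print(infer_equipment("TT-101-A", equipment))  # temperature transmitter on unit 101
```

Rules like this are cheap and explainable, which is why they are typically applied first, with ML models and SME confirmation handling the tags that naming conventions alone cannot resolve.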
Contextualized data generates immediate business value and significant time savings in many industrial performance optimization applications, as well as across advanced analytics workstreams. We have found that data science teams are one of the greatest beneficiaries of data contextualization.
Enabling all data consumers to have instant access to all data with context is not easy, as data management and analytics workflows have always been complex, siloed, and costly. Adding insult to injury, the competition for AI/ML data scientists is imposing its own set of requirements on data modeling, data source availability, data integrity, and out-of-the-box contextual metadata. In the rush to show digital execution, many have embraced the AI hype, which has produced quickly demonstrable digital proofs of concept but little truly operationalized — and even less scaled — concrete business OPEX value.
As contextual intelligence emerges as a crucial discipline on the new landscape, organizations are set to increase their investments in automated and guided data contextualization capabilities.
Moreover, there is a move away from imposing a single structure on data sets, in favor of an active metadata approach. This is a product of the multiple diverse structures and insights that emerge from data via AI and ML augmentation. Augmented data management has emerged as a vital tool in various offerings, including active metadata, AI and ML algorithms, and data fabric designs using semantic knowledge graphs.
Augmented data management is ushering in a new phase of data management, where the long-anticipated collaboration between humans and machines — specifically the AI and ML engines — becomes a reality. Together the two work across the flow of data within the company, with humans performing creative and strategic activity, supported by the processing and heavy lifting power of artificial intelligence.
It is not the data alone that holds the keys to value, but our ability to understand and operationalize the data: to make our data speak human and drive data literacy, “the ability to read, write, and communicate data in context” (Gartner), among our data consumers. This is why metadata is the new black. Metadata is the key to almost all forms of automation and augmentation in modern data utilization scenarios. Indeed, the ability to identify meaningful relationships — across data types, people, places, and objects — is absolutely fundamental to generating real value from data and analytics.
When assessing modern data management solutions, augmented capabilities are becoming a key differentiator. Under increasing commercial pressures, data and analytics leaders need ways to connect, ingest, analyze, and share data more efficiently, with both increased speed and lower cost.
In line with the trend towards democratization of data and data management, the AI/ML convergence is bringing new, less specialist users into the picture. The traditional roles of data architect, data scientist, and application developer are being joined by roles such as citizen data integrator, citizen data scientist, and citizen developer for greater productivity across a greater scope of use case solving.
In this new environment, the organizations that succeed will be those who can take the leap beyond traditional approaches. Mainstream self-service analytics are no longer adequate. Neither is a continued reliance on specialist data scientists, given their scarcity and high cost.
Instead, industry must embrace the citizen data scientist, who is not a specialist in data by background, but is empowered with the capabilities and practices that enable them to harness data effectively. The democratization of data management and technology means that more team members can glean predictive and prescriptive insights from data. They can come from a variety of roles. Despite not having the same analytical or technical skills as expert data scientists, they are still able to drive real value for the enterprise.
DataOps represents a new way of servicing data consumers, both old and new — data and analytics professionals as well as business and engineering professionals — with the same real-time, contextualized, data-at-your-fingertips experience.
Learn how Industrial DataOps enabled secure sharing of selected live data between Aker BP and Framo, cutting emissions and waste by reducing maintenance needs by 30% and shutdowns by 70%, and increasing pump availability by 40%.
With always-on secure data access for all data consumers within and around our industrial enterprises becoming the new norm, the focus of chief data officers is turning to the intersection of data engineering and data custodianship (IT) and the business domain expertise for that data (OT/ET). This in turn is fueling a reversal of the past years’ rush toward the data lake as the single source of all data — replaced by interconnected domain Data-Products-as-a-Service.
To successfully implement Industrial DataOps, it is essential to move from a conventional centralized data architecture into a domain data architecture (also referred to as a data mesh). This solves many of the challenges associated with centralized, monolithic data lakes and data warehouses. The goal becomes domain-based data as a service, not providing rows and columns of data.
For domain data architecture to work, the data product owner teams need to ensure their data is discoverable, trustworthy, self-describing, interoperable, secure, and governed by global access control. In other words, they need to manage their data products as a service, not as data.
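One way to picture a self-describing, governed data product (the class, fields, and roles here are illustrative, not any specific vendor's API) is an interface where every data set carries its own description, schema, and access policy, and is served rather than handed over as raw rows and columns:

```python
from dataclasses import dataclass, field

# Illustrative sketch of a domain data product that is discoverable,
# self-describing, interoperable, and governed by access control.

@dataclass
class DataProduct:
    name: str
    owner_domain: str            # the business function that co-owns the data
    description: str             # self-describing: what the data means
    schema: dict                 # interoperable: documented fields and units
    allowed_roles: set = field(default_factory=set)  # global access control

    def serve(self, role):
        if role not in self.allowed_roles:
            raise PermissionError(f"{role} is not authorized for {self.name}")
        # In a real service this would return governed, contextualized data,
        # not just the product's metadata.
        return {"product": self.name, "schema": self.schema}

vibration = DataProduct(
    name="pump-vibration",
    owner_domain="maintenance",
    description="Hourly vibration aggregates per pump",
    schema={"timestamp": "ISO 8601", "rms_velocity": "mm/s"},
    allowed_roles={"reliability-engineer", "data-scientist"},
)

print(vibration.serve("data-scientist")["product"])
```

The design choice worth noting is that ownership, meaning, and access policy travel with the product itself, so consumers never receive anonymous rows and columns stripped of their domain context.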
Any organization hoping to successfully transition into offering Data-Products-as-a-Service needs to embrace a shift in data governance ownership. The necessary move is from a centralized data team, such as a digital or data center of excellence, to a collaborative setup where each data domain is co-owned by the respective business function producing the data in its primary business tools.
AI-driven active metadata creation permeates industrial data management, shifting the emphasis from data storage and cataloging to a truly human data discovery experience. For application developers and data scientists, understanding — and handling — industrial data is not as straightforward as dealing with most tabular data. What is even more difficult (for all who are not SMEs with years of intimate experience with the asset) is understanding the context of industrial data. Using NLP, OCR, computer vision, trained ontologies, and graph data models, ET/OT/IT data will be automatically contextualized for intuitive human as well as programmatic discovery and analysis.
This means that the same shift that has already taken place on the consumer side, for example in e-commerce — where the primary navigation has shifted from product catalog navigation to automated content recommendations — will transform enterprise data discovery experiences. New data consumers will gain self-service access to large, varied, and complex data sets for the first time, unlocking citizen data scientist and citizen developer innovation potential at scale.
To operationalize data at scale with Industrial DataOps, you need to contextualize your data first. Contextualization — finding relationships in data and across data types, dimensions, things, and objects — forms the foundation of modern data and analytics. This applies to knowledge graphs, data fabrics, explainable AI, analytics on all types of content, and providing richer context for ML and AI.
True augmented data management delivered using DataOps will reduce the reliance on IT specialists for repetitive and low-impact data management tasks, while making your data consumers more independent and successful.
For organizations keen to extract the benefits of the new and emerging practice of DataOps, Gartner offers the following advice for launching a project:
1. Target a tight, well-defined scope.
2. Ensure there is a level of executive sponsorship: ideally from the CDO or other high-level data and analytics leader.
3. Be prepared to encounter and overcome resistance to changing existing practices as the concept is introduced.
4. Exploit other emerging practices, like data literacy, to deliver on the promise of DataOps.
5. Remember: DataOps is a practice, not a technology or tool. It is a cultural change that is supported by tooling.
Continuous and reliable delivery of data will improve the speed and effectiveness of:
In addition, the drive towards automation and more reliable pipelines of data is set to minimize risk and multiply data deployment opportunities within organizations.