When companies evaluate Cognite Data Fusion as an industrial AI and data platform, we are almost universally asked this question: "Do you persist data from our systems of record?"
This question is often framed as a straightforward yes or no, with the implication that persisting (storing) data is detrimental, while federating (integrating without physically moving or duplicating) data is ideal.
At Cognite, we boldly persist data. We always have. Counterintuitively, persisting data reduces complexity and total cost of ownership while providing our customers and partners an interactive and unparalleled experience when engaging with their data.
While data federation sounds appealing, solving critical operational challenges with industrial AI demands that data be accessible in real time, interoperable, and reusable. Scaling use cases that optimize processes, accelerate troubleshooting, reduce downtime, and increase efficiency requires interactive-level user performance when accessing data.
The Upside of Persisting Data
The benefits of data persistence are often overlooked when building a scalable data foundation for industrial AI use cases. For example, use cases like advanced troubleshooting with AI agents require both people and AI to engage with OT, IT, and engineering data. To deploy and scale this and similar use cases across multiple sites or units, there are three distinct advantages to persisting industrial data:
1. Interactive-level user performance
Whether operations and maintenance teams are trying to access relevant information in the field or process engineers are troubleshooting process disruptions, speed is paramount. Persisting data in Cognite Data Fusion provides unparalleled interactivity with industrial data. 3D models are optimized to be accessible in the field, and five years' worth of time series data can be retrieved in under one second. This is a real-time experience built for enterprise scale, as time-sensitive workflows don't need to call back to source systems for every action.
Federated models, by their very nature, struggle to deliver this level of performance when dealing with the sheer volume and velocity of industrial data. Persisting data enables optimized indexing and retrieval, making interactive exploration and analysis a delightful experience.
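As a rough illustration, the performance gap can be sketched with a back-of-envelope latency model. All numbers and function names here are illustrative assumptions, not Cognite benchmarks:

```python
# Back-of-envelope latency model: federated fan-out vs. a persisted,
# indexed store. All figures are made-up assumptions for illustration.

def federated_latency_ms(source_rtts_ms, merge_ms=50):
    """A federated query must wait for the slowest source system to
    respond before it can merge results, so the worst round trip
    dominates end-to-end latency."""
    return max(source_rtts_ms) + merge_ms

def persisted_latency_ms(index_lookup_ms=20, read_ms=80):
    """A single persisted, indexed store answers locally: one index
    lookup plus one optimized read, with no cross-system fan-out."""
    return index_lookup_ms + read_ms

# Three sources with uneven round-trip times (e.g. a slow historian):
fan_out = federated_latency_ms([120, 300, 90])   # dominated by the 300 ms source
local = persisted_latency_ms()
```

Under these assumed numbers, the federated path is bounded below by its slowest source, while the persisted path stays constant no matter how many sources originally supplied the data.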

2. Content-rich Contextualization
The importance of context is commonly accepted in industry. However, contextualizing complex and inconsistent data with different formats, nomenclatures, and structures must extend beyond simply building associations. When data is federated from many source systems, contextualization is limited to metadata associations: a P&ID can be linked to an asset in an ERP, to time series data from a historian, or to a point cloud or 3D model, but nothing deeper. Where federating data falls short is the inability to access context within the content itself.
By persisting data, Cognite Data Fusion takes contextualization one step further. Pieces of equipment are identified within P&IDs, and users can select this equipment, access the associated time series, and access the 3D model that automatically opens to where the piece of equipment is located. This is the difference between linking metadata and contextualizing data within the content. A maintenance planner creating a work package does not need to manually sift through associated information; the right data is at their fingertips. This deep, persistent contextualization is what transforms raw data into actionable insights for AI.
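The step from metadata links to content-level context can be sketched in a few lines. The tag pattern, tag names, and lookup table below are hypothetical examples, not Cognite's actual data model or API:

```python
import re

# Sketch: detect equipment tags *inside* P&ID text and resolve each one
# to its linked context (asset, time series, 3D location). The pattern
# and the records are invented for illustration.

TAG_PATTERN = re.compile(r"\b\d{2}-[A-Z]{2}-\d{4}\b")  # e.g. "21-PT-1019"

CONTEXT = {
    "21-PT-1019": {
        "asset": "Pressure transmitter, 1st stage compressor",
        "time_series": ["21PT1019.PV"],
        "3d_node": 48123,  # hypothetical node id in the site's 3D model
    },
}

def contextualize_pnid(text):
    """Return every known tag found in the P&ID text with its linked context."""
    return {tag: CONTEXT[tag] for tag in TAG_PATTERN.findall(text) if tag in CONTEXT}

links = contextualize_pnid("Relief line downstream of 21-PT-1019 and 21-XX-9999.")
```

A tag that matches the pattern but has no contextual record ("21-XX-9999" above) is simply not returned; the value comes from the equipment that is recognized inside the drawing and already linked to its time series and 3D position.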
3. Reusable Data Products
When data is persisted and intelligently organized, it can be reused, enriched, and applied across many industrial AI use cases. Instead of building one-off integrations for each use case, a persistently stored and contextualized data model can serve as the foundation for numerous data products. The same contextualized time series data, maintenance records, P&IDs, and point clouds can be reused across predictive maintenance, process optimization, anomaly detection, and energy efficiency initiatives. Each use case contains a predefined view into the relevant information, structured as a data product, referencing a single source with real-time data access.

By treating data as a product, use cases can be lifted and shifted from one unit or site to the next, eliminating the rework to query data from heterogeneous source systems. Achieving scale and speed requires the simplification offered through data persistence.
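A minimal sketch of this "lift and shift" idea, assuming a data product is a predefined, scoped view over one shared contextualized model (the class, fields, and site names below are illustrative, not a Cognite API):

```python
from dataclasses import dataclass

# Sketch: a data product as a predefined view over a single contextualized
# foundation. Reusing it at a new unit changes only the scope, not the view.

@dataclass(frozen=True)
class DataProduct:
    name: str
    sources: tuple   # entity types drawn from the shared data model
    filters: dict    # scoping, e.g. which site and unit

predictive_maintenance = DataProduct(
    name="predictive-maintenance",
    sources=("time_series", "maintenance_records", "pnids"),
    filters={"site": "plant-A", "unit": "compressor-1"},
)

def lift_and_shift(product, **new_filters):
    """Reuse the same view at another unit or site by re-scoping it."""
    return DataProduct(product.name, product.sources, {**product.filters, **new_filters})

unit2 = lift_and_shift(predictive_maintenance, unit="compressor-2")
```

The point of the sketch: because every product references the same persisted foundation, scaling to the next unit is a re-scoping exercise rather than a rebuild of queries against heterogeneous source systems.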
Addressing the Common Objections
Across hundreds of conversations with our customers, those who initially prefer a federated data model raise a consistent set of objections. When these discussions focus on the end goal, delivering meaningful and scalable value to the business, they often shift from reviewing a set of requirements to a partnership mentality on how to maximize business impact while still satisfying architectural needs. Here are the most common objections, and how Cognite directly addresses each:
Unnecessary storage costs
Storage costs are pennies per gigabyte, negligible relative to the cost of compute needed to deliver a performant user experience. Cognite Data Fusion optimizes how data is stored based on its type: time series, event data, files, images, and 3D models are each understood and stored in a best-in-class database. This storage optimization happens automatically, without any management required from users.
For example, the time series database is both read- and write-optimized, meaning that Cognite Data Fusion can store trillions of time series data points while still maintaining sub-second access to this data. As of this writing, Cognite Data Fusion hosts more than 66 trillion data points for customers globally. This meticulous optimization provides the sub-second performance and interactivity necessary to implement high-value use cases without excessive storage costs.
Added system complexity
Paradoxically, persisting data with a platform like Cognite Data Fusion can reduce the overall complexity of working with data. By unifying all data types from disparate systems into a single, contextualized knowledge graph, users do not need to reconcile information across multiple, siloed systems of record. With content-rich contextualization, users can confidently and simply find the information they need without having to search within metadata associations.
Additionally, by adopting a data products approach, applications and workflows running on top of Cognite Data Fusion all reference the same data, ensuring consistency of the information. Federated data approaches have the added complexity of optimizing queries across heterogeneous sources and managing the frequency of queries to avoid overloading a source system. Cognite Data Fusion provides a system of engagement from which you can interact with all systems of record without concern.
Lower Data Quality
Many operational tasks are still executed by exporting data to spreadsheets, analyzing that information, and then using the snapshot to make decisions. This data becomes stale, and multiple versions of the spreadsheet circulate around the organization, with some teams making decisions based on conflicting information. To reduce copying and exporting data across the organization, Cognite Data Fusion uses data workflows to ensure everyone works from the same real-time information.
With data workflows, users can link many different processing steps into one comprehensive series of jobs, delivering more observability, providing transparency, and reducing data management efforts. Combining data workflows with an interactive user experience in tools like Industrial Canvas allows users to collaborate around the same real-time information. By using trusted data pipelines into a foundation designed for user engagement, data quality consistently improves across our customers.
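The "linked series of jobs with observability" idea can be sketched as a pipeline runner that records what each step did. The step names and logic here are invented for illustration and are not Cognite's data workflows API:

```python
# Sketch: a data workflow as an ordered chain of processing steps with a
# simple run log for observability. Steps and data are illustrative.

def run_workflow(data, steps):
    """Run each named step in order, recording (step name, row count)
    after each one so the run is transparent and auditable."""
    log = []
    for name, fn in steps:
        data = fn(data)
        log.append((name, len(data)))
    return data, log

steps = [
    ("drop_nulls",  lambda rows: [r for r in rows if r is not None]),
    ("deduplicate", lambda rows: sorted(set(rows))),
]

clean, log = run_workflow([3, None, 1, 3], steps)
```

Each downstream consumer reads `clean` from the same pipeline output rather than re-exporting and re-cleaning a private copy, which is the mechanism by which stale, conflicting spreadsheets get eliminated.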
The Hidden Cost of Data Federation
Data federation works well for small-scale use cases, such as certain types of predictive maintenance, where predictive models may run once an hour. When trying to scale industrial AI use cases, the limitations of data federation start to become clear:
Performance Degrades at Scale
Federated queries are fundamentally limited by network latency between the federation layer and each source system. As the volume of data requested or the number of connected sources increases, performance quickly degrades. If a federated approach is used to query data from a historian, you will quickly find the scalability limits of powering interactive applications. A field technician may be trying to complete a task in the field, waiting minutes for all of the necessary data to appear. Industrial operations teams will struggle to adopt solutions that cannot provide the information they need when they need it.
Overloading Source Systems
Relying on live queries directly against operational source systems can impose an unmanageable load, potentially impacting their primary functions. Consider again the historian, which is not designed for many users and applications to request data simultaneously. If requests are not managed appropriately, these systems can become overloaded and even crash. By persisting data in a system of engagement, data needs to be pulled from the historian only once, and users and applications can then access it through a highly optimized data store.
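The "query the source once, serve everyone" pattern is essentially read-through persistence. A minimal sketch, with stand-in classes and tag names that are purely illustrative:

```python
# Sketch: persist once, serve many. The Historian class is a stand-in for
# an operational source system that must not be overloaded; the store
# queries it once per tag and serves all subsequent reads itself.

class Historian:
    """Illustrative source system; counts how often it is queried."""
    def __init__(self):
        self.query_count = 0

    def fetch(self, tag):
        self.query_count += 1
        return [(0, 1.0), (1, 1.2)]  # (timestamp, value) samples

class EngagementStore:
    """Persists each tag after a single source query, then serves reads."""
    def __init__(self, source):
        self.source = source
        self.store = {}

    def read(self, tag):
        if tag not in self.store:
            self.store[tag] = self.source.fetch(tag)  # one source call per tag
        return self.store[tag]

historian = Historian()
store = EngagementStore(historian)
for _ in range(1000):            # a thousand user and application reads...
    store.read("21PT1019.PV")
# ...while the historian itself was queried exactly once.
```

In practice the persisted copy is kept fresh by a managed extraction pipeline rather than a lazy first read, but the load profile on the source is the same: one controlled ingestion stream instead of unbounded ad hoc queries.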
Limited Scope of Use Cases
Federation often means you cannot access all the data in its full context. Addressing workflows like turnaround planning requires more than data to be linked across systems of record. Maintenance planners need to be able to look at a P&ID or CAD model, see open work orders within a given unit, and create a work package that a maintenance engineer can use to execute a repair. Without content-rich contextualization, industrial users still need to manually sift through linked sources.
Additionally, data in context is critical for training and deploying industrial AI agents to execute specific tasks like troubleshooting or creating work orders. Allowing AI agents to work alongside your teams requires data persistence to provide trusted and transparent recommendations in a timely manner.
Continued Off-Line Data Duplication
The irony of avoiding "persistence" through federation is that it often leads to uncontrolled off-line data duplication. Teams, frustrated by the performance and contextual limitations of federated views, resort to manually combining data in spreadsheets or creating even more information silos. Federated data approaches struggle to offer the same level of interactive user experience, lowering the overall quality of data used to drive decisions.
Summary
At Cognite, we believe that for industrial AI to truly deliver on its promise of process optimization, reduced downtime, and increased efficiency, a robust and intelligent data persistence strategy is essential. While the challenges of cost, complexity, and data quality are valid considerations, our approach to optimizing storage, unifying data, and ensuring data quality directly addresses and mitigates these concerns.
By embracing the upside of persisting data, Cognite empowers industrial organizations to unlock the full potential of their operational data, enabling a new generation of powerful, contextualized, and performant industrial AI solutions that drive tangible business value.