Technology vendors love to talk about their platform’s ability to provide a so-called “single source of truth” in live system operations. But because a typical enterprise runs several core platforms across a choice of cloud hyperscalers – and because that same enterprise might also run a handful of databases serving hundreds (in some cases thousands) of applications – fragmented data landscapes are a simple fact of life. In a universe of variegated data, data silos are always with us. So how did we get into this mess and what can we do about it?
Saket Saurabh, CEO and co-founder of enterprise-grade data integration platform company Nexla, reminds us that technology did indeed begin with a single source of truth: the mainframe. But as IDC research suggests, enterprise data is now growing at 42.2% annually, with organisations managing an average of 347.56 terabytes of data across seven or more siloed systems.
Centralised data management failures
“Every application, database, filesystem and SaaS service inevitably creates another data silo. From Hadoop-based data lakes to modern data warehouses and lakehouses, enterprises have invested millions in the promise of a single source of truth,” said Saurabh. “Yet time and again, these grand visions have fallen short. So why does centralised data management repeatedly fail? Every few years, a dominant platform – Hadoop, data lakes, warehouses, lakehouses – makes the same promise: put all your data in [insert latest technology] and enjoy an end to data silos.”
But is this goal even realistic? More importantly, is it necessary? Making more copies of source data in a central system becomes a governance nightmare. MIT Sloan Management Review found that 78% of data professionals report that data governance becomes exponentially more complex with each additional copy of data maintained across the enterprise.
To understand the problem, Saurabh says we need to step away from data for a moment. He asks us to imagine a company in its early days with a single office. Collaboration is easy, meetings happen naturally and alignment is effortless. But as the company scales, new offices emerge across different locations and time zones. The dream of keeping everyone in one building becomes impractical. Instead of forcing everyone back into a single centralised office, companies reduce friction with tools like video conferencing and Slack – even though each of those tools creates a silo of its own. So could connectivity and accessibility hold the answer?
A copy of a copy of a copy
“Centralising data means creating new copies in a centralised location. While storage costs seem negligible, the operational challenges multiply rapidly as copies proliferate. These copies quickly fall out of sync with the source of truth, creating a cascading impact on an unknown number of consumers,” explained Saurabh. “The data management industry has attempted to address these challenges through catalogues and lineage tools, but this approach treats symptoms rather than the underlying problem. Which raises the question: is there a better approach?”
So what does the Nexla chief think we should do next?
Instead of chasing the utopia of “eliminating silos”, Saurabh says a more practical approach embraces their existence while ensuring access and usability. He lays down three core truths for us as follows:
#1 Accept data silos as inevitable
Every new system – whether an application, SaaS tool, or internal service – creates data. If teams are to operate efficiently, they must be free to choose the best tools for their work. Each new tool generates a data silo… and (says Saurabh) that’s okay.
#2 Centralise on the 20%
“Trying to centralise 100% of data is a losing battle. However, centralising key assets – roughly the 20% that truly drives business decisions – makes sense. The McKinsey Global Institute found that following this 80/20 rule in data management yields an average 3.2x ROI compared with attempts at full centralisation. With modern data integration platforms, organisations can build pipelines that bring essential data into a unified view without forcing everything into a single repository,” said Saurabh.
#3 Access the remaining 80%
Instead of force-fitting all data into a central system, the suggestion here is that organisations should use “data products” to simplify discoverability and accessibility. Forrester Research reports that organisations implementing virtual data products achieve 47% faster time-to-insight and a 35% reduction in data integration costs compared with traditional centralisation approaches.
Saurabh says that virtual data products support two approaches (sketched in code after the list below):
- For low- to medium-volume data, keep the data in place and enable real-time access across silos. This is done via virtual data products that act as a gateway to any system.
- For high-volume data requiring central computation, connect data products to a destination store and trigger an efficient ETL or ELT pipeline that brings data together from across silos into a cloud warehouse or lakehouse.
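To make that routing decision concrete, here is a minimal Python sketch. The `SiloSource` type, the volume threshold and the label strings are hypothetical illustrations rather than anything from Nexla’s platform; the point is simply that a single policy decides whether a product serves data in place or triggers an ELT landing.

```python
from dataclasses import dataclass

# Hypothetical cut-off for "high volume"; a real system would apply
# its own criteria (volume, latency, central compute needs).
HIGH_VOLUME_ROWS_PER_DAY = 10_000_000

@dataclass
class SiloSource:
    name: str
    est_rows_per_day: int

def choose_delivery(source: SiloSource) -> str:
    """Route a silo to one of the two patterns described above."""
    if source.est_rows_per_day < HIGH_VOLUME_ROWS_PER_DAY:
        # Low/medium volume: leave the data in place; the virtual
        # data product acts as a real-time gateway into the silo.
        return "virtual-gateway"
    # High volume needing central computation: trigger an ETL/ELT
    # pipeline that lands the data in a warehouse or lakehouse.
    return "elt-to-warehouse"

print(choose_delivery(SiloSource("crm", 50_000)))                 # virtual-gateway
print(choose_delivery(SiloSource("clickstream", 2_000_000_000)))  # elt-to-warehouse
```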
“When accessing data from a silo, three steps need to be performed. Ingest: connect to the source of data, authenticate and read out the bytes. Prepare: parse the data, infer the schema and translate it. Deliver: document the data entity, manage access control and deliver to the requirements of the data consumer,” said Saurabh. “Virtualised data products package all three into a single entity. Being virtualised, they can materialise the data at runtime depending on the needs of the consumer. For example, the same data product can provide a data API and also deliver to Iceberg tables.”
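As a rough illustration of how those three steps might be packaged behind one entity, consider this sketch in plain Python (standard library only). The `VirtualDataProduct` class and its output formats are assumptions made for illustration, not Nexla’s implementation; it shows the same records materialising as JSON for a data API or as rows bound for a table writer.

```python
import csv
import io
import json

class VirtualDataProduct:
    """Ingest, prepare and deliver packaged behind a single entity.

    Nothing materialises until a consumer asks; the same product can
    then render the same records in whatever shape is requested.
    """

    def __init__(self, source_bytes: bytes):
        # Ingest: stand-in for connecting to the source,
        # authenticating and reading out the bytes.
        self._source = source_bytes

    def _prepare(self) -> list[dict]:
        # Prepare: parse the data and infer a simple schema
        # from the CSV header row.
        reader = csv.DictReader(io.StringIO(self._source.decode("utf-8")))
        return [dict(row) for row in reader]

    def deliver(self, fmt: str = "json"):
        # Deliver: materialise at runtime in the shape the consumer
        # needs; access control would also be enforced here.
        records = self._prepare()
        if fmt == "json":   # e.g. backing a data API
            return json.dumps(records)
        if fmt == "rows":   # e.g. rows handed to a table writer
            return records
        raise ValueError(f"unsupported format: {fmt}")

product = VirtualDataProduct(b"id,amount\n1,9.99\n2,4.50\n")
print(product.deliver("json"))  # one product, two materialisations
print(product.deliver("rows"))
```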
Virtual data products bring additional benefits because they address the copy-of-a-copy-of-a-copy problem: new data products are not new copies of data but virtual derivatives of existing data products. Polyglot output matters here too, because being virtual means materialisation of the data is delayed until delivery and shaped to the needs of the data consumer. There is also a reusability element: the same data product can now be used by multiple consumers, with access control built into the data product itself.
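A hedged sketch of that derivative idea, again with hypothetical names: a derived product stores a reference to its parent plus a transform rather than a new copy, access control travels with the product, and records only materialise when an authorised consumer reads them.

```python
from typing import Callable, Iterable

class DataProduct:
    """A virtual data product: a recipe plus a reference, not a copy."""

    def __init__(self, reader: Callable[[], Iterable[dict]], allowed: set):
        self._read = reader
        self._allowed = allowed  # access control travels with the product

    def derive(self, transform: Callable[[dict], dict]) -> "DataProduct":
        # A derivative holds its parent's reader and a transform;
        # no new copy of the underlying data is made.
        return DataProduct(lambda: map(transform, self._read()),
                           set(self._allowed))

    def materialise(self, consumer: str) -> list:
        # Delayed materialisation: records are produced only now,
        # and only for consumers the product itself authorises.
        if consumer not in self._allowed:
            raise PermissionError(f"{consumer} may not read this product")
        return list(self._read())

orders = DataProduct(lambda: iter([{"id": 1, "amount": 9.99}]), {"finance"})
masked = orders.derive(lambda r: {"id": r["id"]})  # drop the amount field
print(masked.materialise("finance"))               # [{'id': 1}]
```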
Making data silos invisible, not extinct
“Organizations with a virtual data product approach significantly reduce data-related compliance incidents compared to those pursuing comprehensive data centralisation. The key insight is that connectivity and accessibility matter more than physical location. At Nexla, we believe in a practical, scalable approach: centralising what’s necessary, enabling access to the rest, and reducing friction so businesses can move faster. Because in the end, it’s not about where your data lives – it’s about how easily you can use it,” concluded Saurabh.
All said and done, the story here leads us to one thought: the pursuit of “no data silos” is a distraction from what really matters, which is frictionless access to the right data at the right time. Instead of forcing all data into one place, businesses should focus on integrating intelligently. The future of data isn’t about eliminating silos; it’s about making them invisible to the people who need the data.