Tracking data lineage from data archaeology to digital twins

Data management must now grow a new arm that extends to data lineage control. As data streams flow through IoT machines, users, applications and of course AI agents, there is a more urgent need to discover and track data provenance, the state of data in motion, which datasets interconnect with each other and where information channels are headed next. The question is, just how deep do we need to dig in archaeological terms to ensure we’re in control of data lineage?

Philip Dutton, CEO of Solidatus, says he’s bringing a big shovel and is ready to dig.

Solidatus is a data lineage and metadata management tool that helps organisations understand, govern and manage their data by providing a visual, connected view of how data flows through systems. It enables data scientists and developers to automate processes, reduce costs, reduce (and hopefully mitigate) risks, drill down into data quality and meet regulatory compliance requirements by establishing a trusted, holistic understanding of their data’s lifecycle and origins.

In today’s regulated world, financial firms and large institutions in highly scrutinised industries face unprecedented pressure to know not just what data they use but precisely where it came from, how it has transformed and how it is governed.

Traditional tools fall short. Dutton says he recognises this fact and suggests that lineage stuck in documentation systems or buried in legacy silos simply won’t do. This is the realm and era being radically redefined by digital twins for compliance, i.e. live, granular models of an organisation’s data estate that reshape lineage into a foundation for governance, trust and insight.

Digging into data archaeology

“At its core, data archaeology is the painstaking process of reconstructing lineage after the fact, cobbled together from logs, outdated documentation, or tribal knowledge,” explained Dutton. “It is reactive, tedious and invariably incomplete. This patchwork method fails to scale and it cannot support fast-moving AI deployments, intrusive audits, or stringent regulatory demands.”

Instead of exhuming what is already broken, he suggests that organisations can now build a living, accurate representation of how data flows, changes and impacts business systems, both historically and in real time. This is the idea behind Solidatus’s approach of essentially creating digital twins for highly complex data environments using advanced data lineage: a continuously updated model of every compute and data environment that shows lineage, ownership, sensitive data usage, transformation history and governance controls in one connected view.

“With this model in place, organisations can dynamically inventory and visualise how data moves and transforms across every system in the enterprise. They can audit and trace at fine-grained levels, right down to attributes rather than just tables or files. They can respond proactively to regulatory inquiries with confidence and speed. And, crucially, they can support AI governance and trust frameworks by showing the full lineage behind every dataset and every model,” noted Dutton.
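To make the idea of attribute-level tracing concrete, here is a minimal sketch in Python. It is not Solidatus’s API or data model; the `LineageGraph` class, its methods and the dotted attribute names are all hypothetical, chosen only to show how lineage recorded per attribute (rather than per table or file) lets an auditor walk back from a report field to every upstream source.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical attribute-level lineage store (illustrative only)."""

    def __init__(self):
        # Maps each target attribute to the set of attributes feeding it.
        self._upstream = defaultdict(set)

    def add_flow(self, source, target):
        """Record that `source` feeds `target`."""
        self._upstream[target].add(source)

    def trace_upstream(self, attribute):
        """Return every attribute that directly or indirectly feeds `attribute`."""
        seen = set()
        stack = [attribute]
        while stack:
            node = stack.pop()
            for parent in self._upstream.get(node, ()):
                if parent not in seen:  # guards against cycles and repeats
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Assumed example flows, named as system.table.attribute.
graph = LineageGraph()
graph.add_flow("crm.customers.email", "staging.contacts.email")
graph.add_flow("staging.contacts.email", "warehouse.dim_customer.email")
graph.add_flow("erp.accounts.balance", "warehouse.fact_exposure.amount")
graph.add_flow("warehouse.dim_customer.email", "report.risk_summary.contact")

sources = graph.trace_upstream("report.risk_summary.contact")
for name in sorted(sources):
    print(name)
```

In this toy example the trace for the report’s contact field returns the three email attributes along its path and leaves out the unrelated balance flow, which is the kind of precision that table-level lineage cannot offer.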

Why this data lineage matters… now

The timing is said to be critical. Regulations like BCBS 239, DORA, the EU AI Act, ESG disclosure frameworks and GDPR demand both accountability and visibility. Institutions must now demonstrate where data came from, how it has changed, who has stewarded it and which controls are in place.

“Building a digital twin of the systems in which the data exists turns this from an exercise in scrambling into a strategic advantage, one that can build trust with regulators, investors and stakeholders. The implications go beyond compliance. AI infrastructure depends on datasets that are trustworthy and traceable. Without a live, bi-temporal backbone for lineage and governance, scaling AI becomes reckless, as decisions are based on data whose origins and integrity are unverified. For explainable AI to move from aspiration to reality, lineage has to be transparent, dynamic and embedded,” said Dutton.

This is the space where Solidatus is focused. Its platform builds a previously unattainable enterprise-wide visualisation of the data estate that maps flows at a granular level and enriches them with context: who owns the data, how it transforms, what controls apply and what history it carries. The system is versioned and bi-temporal, so teams can reconstruct their entire data landscape at any point in time or simulate “what-if” scenarios to test resilience.
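For readers unfamiliar with the term, a bi-temporal record carries two timelines: when a fact was true in the real world and when the system recorded it. The sketch below is a generic Python illustration of that idea, not a description of how Solidatus stores data; the `OwnershipRecord` type, the `owner_as_of` function, the dataset and team names and the dates are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OwnershipRecord:
    """Hypothetical bi-temporal fact: who owns a dataset."""
    dataset: str
    owner: str
    valid_from: date               # when the fact became true in the real world
    valid_to: Optional[date]       # None means still true
    recorded_from: date            # when the system started believing it
    recorded_to: Optional[date]    # None means this is the current belief


def _covers(start, end, point):
    """True if `point` falls in the half-open interval [start, end)."""
    return start <= point and (end is None or point < end)


def owner_as_of(records, dataset, valid_on, known_on):
    """Who owned `dataset` on `valid_on`, according to what was known on `known_on`?"""
    for r in records:
        if (r.dataset == dataset
                and _covers(r.valid_from, r.valid_to, valid_on)
                and _covers(r.recorded_from, r.recorded_to, known_on)):
            return r.owner
    return None


# Assumed history: on 1 March a correction is recorded stating that
# ownership had actually moved to the finance team on 1 February.
records = [
    OwnershipRecord("trades", "risk-team", date(2024, 1, 1), None,
                    date(2024, 1, 1), date(2024, 3, 1)),   # superseded belief
    OwnershipRecord("trades", "risk-team", date(2024, 1, 1), date(2024, 2, 1),
                    date(2024, 3, 1), None),
    OwnershipRecord("trades", "finance-team", date(2024, 2, 1), None,
                    date(2024, 3, 1), None),
]

believed_then = owner_as_of(records, "trades", date(2024, 2, 15), date(2024, 2, 20))
believed_now = owner_as_of(records, "trades", date(2024, 2, 15), date(2024, 3, 10))
print(believed_then, believed_now)
```

Asking the same question about 15 February gives a different answer depending on the knowledge date, which is what allows a team to show an auditor both what the data estate looked like and what the organisation believed about it at the time.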

Regulatory rigour & machine learning maturity

The result (says the firm) is a platform that supports diverse use cases like regulatory compliance and reporting, AI deployment and governance, change management and operational resilience. It is not data lineage as an afterthought, but lineage engineered for regulatory rigour and accelerated AI maturity.

The combination of rising regulatory demands and the growth of AI means that traditional, reactive approaches to data are increasingly under strain. The ability to map and manage living data systems could be set to become an essential part of how enterprises build resilience and maintain trust.