Technology nomenclature evolves, perhaps nowhere more visibly than in cloud computing and data science.
Where we once thought of operations as nothing more than the Ops element of sysadmin and database administrator (DBA) work that eventually morphed into DevOps, we now have new notions of operations management that surface at various tangents throughout the new ephemeral, abstracted IT stack.
Just as MLOps is the presence of operations services aligned to serve the effectiveness of Machine Learning (ML) engines, DataOps is operations for the data pipeline, i.e. it is the combined and coalesced set of practices, processes and products used by data science-centric operations teams to improve the velocity, veracity and value of the data lifeblood that modern businesses run on.
The rise and popularisation of DataOps has developed in line with the wider appreciation of the so-called ‘data pipeline’ as it exists today, a term which of course expresses the journey that data takes from creation and ingestion, through parsing and preparation… and onward to analytics, reporting and extension.
Life, the universe & data
Working in this space with a mission to provide answers to life, the universe and everything is Berlin, Germany-headquartered Y42. The company offers a cloud-based DataOps service that runs on top of Snowflake (data cloud, data warehousing and data-as-a-service platform) and BigQuery (Google’s fully managed data warehouse with data management and analytics spanning ML, geospatial analysis and business intelligence).
Aiming to define a point of differentiation between itself and the rest of the data platform cosmos, Y42 says its product is a combined entity formulated from both software engineering and product management best practices simultaneously. Presumably suggesting that this is agile and intelligent data management, but with a peculiarly acute sense of real world application use case requirements – a claim that would arguably hold little water in terms of justifying ‘uniqueness’ in the eyes of its competitors.
The company itself launched in 2020 offering a no-code approach for building and managing data pipelines. The next evolution of the product has been built to address the core problems businesses face around data, including accessibility, lack of governance and collaboration challenges.
So in fact, these software and product management best practices have now resulted in new features including anomaly detection, data contracts and collaboration features, to name a few, that the firm says are now baked into every step of the pipeline.
“The use case for data has moved beyond ad-hoc reporting into the very lifeblood of a company. However, data pipelines built on an ad-hoc basis are inherently brittle and inevitably break over time, leading to an overflow of fire-fighting requests and – ultimately – mistrust in business data,” said Y42 CEO and founder, Hung Dang.
Dang says his firm’s mission is one that applies to every organisation in every business vertical, whether it has one single data engineer, data analyst, or a whole data team.
“[We want those people] to be able to deploy production-ready data pipelines efficiently and consume data in any downstream application to make better business decisions,” he said.
Better ingredients, better pizza
The Y42 DataOps cloud runs on top of Snowflake and BigQuery, acting as a kind of mission-control centre for an organisation’s data pipelines on top of its cloud data warehouse. Key features include increased accessibility to data (infrastructure/tooling) through data tools including Airbyte for integrations, dbt Core for transformations and Cube Dev for headless BI.
The company also talks about its improved governance and control options. Better role-based multi-level access control to data allows organisations to have full internal and external control of every data asset across the entire data pipeline. There is also asset intelligence to provide a system of ownership for better change management and accountability.
The aforementioned data contracts function enables organisations to enforce semantic and relationship-based standards between data tables.
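For readers who want a feel for what a data contract means in practice, the idea can be sketched in a few lines of Python. This is purely illustrative and is not Y42's implementation: the function names check_schema and check_reference, the sample customers and orders tables and the column names are all hypothetical. The first check covers the semantic side (expected columns and types), the second covers the relationship side (every order must point at a known customer).

```python
def check_schema(rows, schema):
    """Semantic check: every row has each expected column with the expected type."""
    violations = []
    for i, row in enumerate(rows):
        for column, expected_type in schema.items():
            if column not in row:
                violations.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                violations.append(
                    f"row {i}: '{column}' is not {expected_type.__name__}"
                )
    return violations


def check_reference(child_rows, child_col, parent_rows, parent_col):
    """Relationship check: every child value must exist in the parent table."""
    parent_keys = {row[parent_col] for row in parent_rows}
    return [
        f"row {i}: {child_col}={row[child_col]!r} has no match in parent table"
        for i, row in enumerate(child_rows)
        if row[child_col] not in parent_keys
    ]


# Hypothetical sample tables, represented as lists of dicts.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 3, "amount": "n/a"},  # breaks both checks
]

schema_violations = check_schema(
    orders, {"order_id": int, "customer_id": int, "amount": float}
)
reference_violations = check_reference(
    orders, "customer_id", customers, "customer_id"
)

for message in schema_violations + reference_violations:
    print(message)
```

In a real pipeline, checks of this kind would typically run before a table is published downstream, so that a broken contract stops the pipeline rather than quietly corrupting a report.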
Also possible now is complete observability and monitoring of the whole data pipeline through anomaly detection, tests and alerts alongside column-level data lineage that helps visualise the flow of data across the pipeline. This helps identify the root causes of performance and quality issues more quickly.
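To make the anomaly detection idea concrete, here is a minimal sketch, assuming the simplest possible approach: compare the latest daily row count for a table against the mean and standard deviation of recent history. Y42 has not said which method it uses, so the is_anomalous function, the threshold of three standard deviations and the sample figures below are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` standard deviations
    away from the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # History is perfectly flat, so any change at all is unusual.
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


# Hypothetical daily row counts for one table over the past six days.
daily_row_counts = [1000, 1020, 990, 1010, 1005, 995]

print(is_anomalous(daily_row_counts, 1008))  # an ordinary day
print(is_anomalous(daily_row_counts, 120))   # a sudden drop in loaded rows
```

A check like this only says that something looks wrong. It is the lineage view that would then help an engineer trace which upstream source or transformation caused the drop.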
To facilitate better collaboration between teams, Y42 has also addressed version control by offering a Git engine that works with local changes in the browser and on the local machine. A modular canvas provides a commenting function as well as sticky notes, text, and drawing capabilities available across the service. Lastly, for now, a data catalogue function enables users to discover data assets, definitions and metrics.
A new approach to data?
Has Y42 produced a coalesced concoction of data tools, functions, platform integrations and higher-level services born out of software engineering best practices intertwined with product management excellence to create something genuinely new? In a sense yes, this is a relatively valid account of an organisation reimagining its platform approach at a level that is, if not quite radical, then at least holistic and all-encompassing.
Is it possible to find all these services in various guises elsewhere? Inevitably the answer is always going to be yes, but the manner in which the company has fused its total offering may well be appealing to customers who already appreciate the worth (and potential fragility if not cared for properly) of their data pipeline.
Is Y42 the answer? Let’s ask Marvin the paranoid android. You know, brain the size of a planet… and this is the sort of thing you lifeforms enjoy, is it? It sounds awful, but that’s only what Marvin thinks, the rest of the known universe might love it.