Two years after the general availability of the Unity Catalog, Databricks has decided to make the product open source.
Many companies using Databricks’ lakehouse architecture also choose the Unity Catalog, a tool for managing their data assets. The Unity Catalog adds value to a data strategy because it integrates tightly with the rest of the Databricks ecosystem. Until now, however, it was a closed-source product: the documentation describes the object model well, but the exact implementation details were not publicly available.
Databricks is now taking the open source route. This is a logical move, given that the company has taken similar steps with other technologies such as Delta Lake and MLflow. “This initiative builds on Databricks’ commitment to open ecosystems, ensuring customers have the flexibility and control they need without vendor lock-in,” the company said during the announcement at the Data + AI Summit.
The project has been donated to the Linux Foundation, which officially accepted it on Thursday morning during the Summit.
Interoperability, openness and unified governance
Unity Catalog OSS, where OSS stands for open source software, has three core features, according to Databricks. The first is a universal interface that, according to the company, supports any data format and compute engine. This includes reading tables in the Delta Lake, Apache Iceberg, and Apache Hudi formats. The catalog also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards.
Databricks has also ensured that Unity Catalog OSS is compatible with the cloud platforms Microsoft Azure, AWS, Google Cloud, and Salesforce. On the compute side, it interoperates with Apache Spark, Presto, Trino, DuckDB, Daft, PuppyGraph, and StarRocks. Finally, it is compatible with the data and AI platforms dbt Labs, Confluent, Eventual, Fivetran, Granica, Immuta, Informatica, LanceDB, LangChain, Tecton, and Unstructured.
The second feature is openness. Unity Catalog OSS provides open APIs and an Apache 2.0-licensed open source server, giving customers flexibility and choice in engines, tools, and platforms. The third feature Databricks cites is unified governance. Unity Catalog OSS offers unified governance for tabular and non-tabular data as well as AI assets such as models and generative AI tools. This should make it simpler for companies to manage and discover their assets.
Techzine is attending Databricks’ Data + AI Summit this week. Keep an eye on the website for the latest developments on the company.