Databricks releases data sharing, catalog, and automated pipelines

Databricks’ annual conference changed names from “Spark Summit” to “Spark + AI Summit” and finally to “Data + AI Summit”, mirroring the company’s transition from the ‘Spark company’ to the ‘AI on Spark company’ and finally to what’s essentially a ‘Delta Lakehouse’ company.

Databricks has announced that it is rolling out a new project named Delta Sharing, an open protocol for sharing data across platforms. Alongside it, the company introduced Delta Live Tables, a proprietary SQL-based data pipeline platform, and a new proprietary Unity Catalog for data cataloging.

Databricks has been busy

In a briefing, Databricks CEO Ali Ghodsi went into technical detail on how the company has been spending the $1 billion it raised in February to expand its offering. The Databricks Unified Data Analytics Platform can run on all three major public clouds and comes with features for governance, SQL analytics, MLOps, data engineering & data science, data sharing, and pipelines.

The company built all this on top of an ACID-compliant data lake that is decked out with an optimized query engine.

The highlights

The most impactful announcements would have to be:

  • Delta Sharing: This open standard for sharing files in Parquet and Delta Lake formats is independent of the platform on which the data resides. It comes with built-in controls that make it easy to manage permissions, among other capabilities.
  • Delta Live Tables: Think of it as a system for ETL (extract, transform and load) pipelines, with some twists along the way.
  • The Unity Catalog: A data catalog, underpinned by Delta Sharing.
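To give a concrete feel for how platform-independent sharing works, here is a minimal sketch of the small JSON "profile" file a Delta Sharing client uses to find and authenticate against a sharing server. The field names follow the published profile format; the endpoint URL and token below are made-up example values, not real credentials.

```python
import json

# Example Delta Sharing profile (illustrative values only).
# A data provider hands a recipient a file like this; the recipient's
# client then talks to the endpoint with the bearer token, regardless
# of which platform actually hosts the data.
PROFILE_JSON = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing",
  "bearerToken": "faaie590d541265bcab1f2de9813274bf233"
}
"""

def auth_header_from_profile(profile_text: str) -> dict:
    """Build the HTTP Authorization header a sharing client would send."""
    profile = json.loads(profile_text)
    return {"Authorization": f"Bearer {profile['bearerToken']}"}

header = auth_header_from_profile(PROFILE_JSON)
print(header["Authorization"])
```

Because access is governed by the token rather than by platform membership, the provider can revoke or scope permissions centrally, which is what makes the built-in permission controls possible.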

Databricks is trying its hardest to promote its data lakehouse model and is building a platform to support that goal.
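To illustrate the declarative-ETL idea behind Delta Live Tables, here is a toy Python sketch (using the standard library, not the actual Delta Live Tables API): each "live table" declares which upstream tables it reads from, and a runner resolves the execution order from those declarations instead of the developer sequencing the steps by hand.

```python
# Toy sketch of declarative ETL: tables declare dependencies,
# and the runner topologically sorts them before materializing.
from graphlib import TopologicalSorter

TABLES = {}   # table name -> (dependencies, build function)
RESULTS = {}  # table name -> materialized rows

def live_table(name, depends_on=()):
    """Register a function as a 'live table' with declared upstream tables."""
    def register(fn):
        TABLES[name] = (tuple(depends_on), fn)
        return fn
    return register

@live_table("raw_events")
def raw_events():
    # Extract step: pretend this reads from a source system.
    return [{"user": "a", "amount": 10}, {"user": "b", "amount": -5}]

@live_table("clean_events", depends_on=["raw_events"])
def clean_events():
    # Transform step: drop invalid rows from the upstream table.
    return [r for r in RESULTS["raw_events"] if r["amount"] > 0]

def run_pipeline():
    # Build the dependency graph and execute tables upstream-first.
    graph = {name: deps for name, (deps, _) in TABLES.items()}
    for name in TopologicalSorter(graph).static_order():
        RESULTS[name] = TABLES[name][1]()
    return RESULTS

run_pipeline()
print(len(RESULTS["clean_events"]))  # 1 valid row remains
```

The "twist" the declarative style buys you is that adding a new table only requires stating what it depends on; ordering, and in the real product also data quality checks and incremental refreshes, are handled by the pipeline engine.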