
Databricks unifies data engineering work through LakeFlow

The new product promises to simplify building reliable data pipelines.

LakeFlow is one of the solutions Databricks unveiled at the Data + AI Summit. According to the company, LakeFlow is an essential innovation because data engineering remains a challenging and complex discipline. Data teams must ingest data from siloed, often proprietary systems, think databases and enterprise apps, and frequently have to build complex, fragile connectors to get that data flowing. Data preparation adds the burden of maintaining complex logic, and latency spikes can cause disruptions that ultimately result in dissatisfied customers. On top of that, deploying pipelines and monitoring data quality typically requires yet more tools.

LakeFlow should address all of these challenges. It provides a single environment within Databricks’ Data Intelligence Platform and is integrated with Unity Catalog for governance.

Tip: Unity Catalog is now open source software

Connect, Pipelines and Jobs

Databricks envisions a better future for engineers by first addressing the ingestion process. This is done through LakeFlow Connect, which offers a range of scalable connectors for databases such as MySQL, Postgres, SQL Server and Oracle, as well as for enterprise software such as Salesforce, Microsoft Dynamics, SharePoint, Workday and NetSuite. This is also where the technology of Arcion, acquired last year for nearly 100 million euros, comes into its own: it ensures low latency and high efficiency. LakeFlow Connect aims to make all data, regardless of size, format or location, available for batch and real-time analysis.

Another component that unifies engineering tasks within LakeFlow is Pipelines. It builds on Databricks’ Delta Live Tables and lets data professionals implement data transformations and ETL in SQL and Python. “LakeFlow eliminates the need for manual orchestration and unifies batch and stream processing. It offers incremental data processing for optimal price/performance. LakeFlow Pipelines makes even the most complex of streaming and batch data transformations simple to build and easy to operate,” Databricks said during the announcement.
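To give an idea of what that looks like in practice, below is a minimal sketch written against the existing Delta Live Tables Python API that LakeFlow Pipelines builds on. The table names, landing path and data-quality rule are hypothetical examples, not part of the announcement.

```python
import dlt  # Delta Live Tables API, the foundation LakeFlow Pipelines builds on
from pyspark.sql.functions import col

# 'spark' is provided by the pipeline runtime; the landing path is hypothetical.
@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def raw_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/sales/landing/orders")
    )

# Declarative quality rule: rows that violate it are dropped and reported.
@dlt.table(comment="Cleaned orders; the same code serves batch and streaming")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def clean_orders():
    return (
        dlt.read_stream("raw_orders")
        .select("order_id", "customer_id", col("amount").cast("double").alias("amount"))
    )
```

The framework infers the dependency between the two tables and handles orchestration and incremental processing itself, which is precisely the manual work Databricks says LakeFlow eliminates.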

Finally, Databricks has looked at how the new solution can orchestrate workflows in the Data Intelligence Platform. This is handled by LakeFlow Jobs, a component that automates the deployment, orchestration and monitoring of pipelines in a single environment and provides the observability needed to detect, diagnose and mitigate data issues, improving pipeline reliability.
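As a rough illustration of the orchestration concept, rather than of LakeFlow Jobs’ own interface, the sketch below uses the existing Databricks SDK for Python to define a job that refreshes a pipeline and then runs a dependent reporting notebook. The job name, pipeline ID and notebook path are hypothetical, and the sketch assumes serverless job compute (otherwise each task needs a cluster specification).

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # authentication is picked up from the environment

# Hypothetical job: refresh a pipeline, then run a reporting notebook.
job = w.jobs.create(
    name="orders-etl",
    tasks=[
        jobs.Task(
            task_key="refresh_pipeline",
            pipeline_task=jobs.PipelineTask(pipeline_id="<pipeline-id>"),
        ),
        jobs.Task(
            task_key="publish_report",
            depends_on=[jobs.TaskDependency(task_key="refresh_pipeline")],
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/reports/orders"),
        ),
    ],
)
print(f"Created job {job.job_id}")
```

Deployment, scheduling and monitoring of such a job then take place in the same environment, which is where the observability Databricks highlights comes in.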

Tip: Databricks solidifies Mosaic AI as a foundation for building AI apps