
Databricks moves from lakehouse to data intelligence

Databricks moves from lakehouse to data intelligence

Databricks has spent recent years developing the lakehouse. Now that this architecture is in place, the company is talking more about the Data Intelligence Platform. What makes Databricks' latest approach suitable for businesses?

The choice of the term "data intelligence", and what it entails, shows that Databricks is continuing to pursue its existing vision. However, Databricks now has more technology at its disposal than when the lakehouse concept took off five years ago. Developments in generative AI, along with broader advances in AI and BI, have also made it possible to do more with data. Databricks envisions a future in which every employee can access data easily and AI applications are ubiquitous. The Data Intelligence Platform is its answer to that vision.

A data platform according to Databricks

At its core, a Data Intelligence Platform is a platform on which data engineers and data scientists manage data. Its foundation is the lakehouse, which handles data management. On top of that, the platform applies AI to understand the data in the architecture, which is what Databricks calls data intelligence. To explain a Data Intelligence Platform properly, then, we first need to understand the lakehouse. The image below gives a rough idea of the technologies Databricks deploys for the lakehouse.

Databricks Lakehouse Platform.

The lakehouse technologies and frameworks are primarily open-source components that unify structured and unstructured data in a single architecture. Until now, the two types often required separate architectures. Structured data has been around for a long time and comes from sources such as Excel files and databases. Long-established companies are used to housing this type of data in data warehouses.

Unstructured data, however, is becoming increasingly important. Such data may come from video and audio files, for example, and generally does not work well in a data warehouse architecture. The data lake was therefore built for unstructured data, as it is far better suited to delivering appropriate performance for it.

Running both a warehouse and a lake, however, left enterprises with two separate environments that are not always interoperable. The lakehouse targets precisely that gap. It aims to combine the best of the warehouse and the lake by storing all data in open formats. Databricks' frameworks for this purpose let employees query all data sources, and all data workloads, whether the end goal is business intelligence or artificial intelligence, run on the same architecture.

Addressing challenges

The lakehouse concept caught on in the marketplace. Competitors of Databricks now offer such an architecture as well, and many organizations have implemented the lakehouse to modernize their data infrastructure. According to Databricks, however, companies face challenges with competitors' data platforms. For example, those platforms do not consistently deliver the right performance, require considerable technical skills to use and manage, and are not optimally suited to large language models.

The Data Intelligence Platform should change that. It combines the lakehouse with AI models. These models analyze the data itself (content and metadata) and how the data is used (e.g., queries and reports). This allows the platform to understand an organization's language. Databricks' architecture is widely used in healthcare and financial services, for example. Institutions in those sectors use a lot of jargon, so standard models do not fully understand how they communicate. Databricks' models, by contrast, look at the data in workloads to learn how a business communicates. Users can then work with the Databricks platform using the terminology of their own profession.

With this additional intelligence about data, the Data Intelligence Platform should also be able to support more AI applications. Companies with a lakehouse architecture could already use any data source. However, new AI models allow the Data Intelligence Platform to derive new insights from data, such as metrics and KPIs. Extracting such insights has traditionally required a lot of programming work, but AI models can take over part of that work.

The image below summarizes Databricks' new platform approach.

Databricks Data Intelligence Platform.

Generative AI understands data

The difference from the lakehouse architecture comes primarily from the addition of a Data Intelligence Engine. To build this engine into the platform, Databricks acquired MosaicML in mid-2023. Databricks paid a whopping $1.3 billion (about 1.2 billion euros) for MosaicML, a historic sum for the company. The amount is all the more remarkable given that MosaicML had raised just under $64 million across all its funding rounds combined, at a valuation of $222 million. Databricks was nevertheless willing to go far for the technology, which was seen as a competitor to OpenAI. Given the popularity of generative AI and how sought-after the technology is, the record acquisition is understandable.

The MosaicML team quickly set to work making its technology interoperable with Databricks' existing platform. This produced the Data Intelligence Engine, also known as DatabricksIQ. DatabricksIQ also strengthens the lakehouse architecture itself, for example by automatically indexing columns and improving data partitioning for better query performance.

The platform can also automatically add descriptions and tags to all data assets in Unity Catalog, Databricks' governance environment. These descriptions and tags help the platform understand jargon and acronyms, which enables better semantic search and improves the quality of AI assistants.

With the Data Intelligence Platform, Databricks has further improved its architecture to support any data application, from modern business intelligence reporting to modern software applications that make heavy use of AI. With this added intelligence, the lakehouse has become a platform on which every data workload can come into its own.
