Databricks has announced a new Ingest platform to simplify data management for business intelligence (BI) and machine learning applications.
The provider of data science software describes its approach as a data lakehouse, a unification of data lakes and data warehouses that is meant to support a wider range of user applications. Traditionally, data warehouses and data lakes have been separate products, each storing a different type of data to prepare it for analytics. With Ingest, Databricks wants to bring every type of data together in one place: the data lakehouse.
According to the data science firm, the usual approach is not ideal. Organisations split data into two traditional categories: structured data and big data. The data sets are then used separately for BI and machine learning applications. As a result, data lakes and data warehouses are kept apart, which leads to slow processing or fragmented results. In addition, the traditional approach leads to data silos.
With Ingest, data teams can load data from a variety of common business applications. Databricks has built a partner network for this purpose. For example, data from applications such as Salesforce and SAP, databases such as Oracle and MongoDB, and storage services such as Amazon S3 and Google Cloud Storage can be combined in one data lakehouse. Databricks also intends to expand the integrations further: integrations with Informatica and Talend, for example, are planned.
Users can also set up auto-load capabilities, so that data flows continuously into the data lakehouse without users having to perform maintenance themselves. Ingest automatically stores data from various sources.
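The general idea behind this kind of auto-loading can be illustrated with a minimal Python sketch. It is not Databricks' implementation and does not use any Ingest API. The class `IncrementalLoader`, its `poll` method and the in-memory `table` list are hypothetical names chosen for illustration, and the sketch assumes that new data arrives as CSV files in a single directory. It shows only the core principle: remember which files have already been ingested and pick up only the ones that have appeared since.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class IncrementalLoader:
    """Hypothetical helper: ingests each file in a source directory exactly once."""

    source_dir: Path
    # Names of files that have already been ingested.
    seen: set[str] = field(default_factory=set)
    # In-memory stand-in for the destination table (an assumption for this sketch).
    table: list[str] = field(default_factory=list)

    def poll(self) -> int:
        """Ingest files that appeared since the last poll and return how many were new."""
        new_files = sorted(
            p for p in self.source_dir.glob("*.csv") if p.name not in self.seen
        )
        for path in new_files:
            # Append every line of the new file to the destination table.
            self.table.extend(path.read_text().splitlines())
            self.seen.add(path.name)
        return len(new_files)
```

Calling `poll()` repeatedly, for instance from a scheduler, would append only newly arrived files to `table`; a file that has already been ingested is skipped on later calls.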
In addition, Databricks wants Ingest to work well with Delta Lake, the company's framework that runs as a storage layer on top of data lakes and now also on top of data lakehouses.