Databricks, the company behind Apache Spark, aims to unite data, engineering and people. It wants to do this by defining standards for various processes, including distributed machine learning training, implementation and deployment. That emerged from ZDNet's interview with Databricks co-founder Matei Zaharia, the creator of Spark.

Databricks wants to do much of this work with its own creation, MLflow. This toolkit is meant to help standardise the process of developing machine learning applications and moving them into production. According to Zaharia, however, everything starts with data engineering.

“In about 80 percent of the use cases, people's ultimate goal is to use data science or machine learning. But to do this, you need a pipeline that can reliably collect data over a long period of time. Both are important, but you need data engineering before you can do the rest. We focus on users with large data volumes, which is more challenging. If you use Spark for distributed processing, you have a lot of data.”

However, this often also means that the data comes from many different sources. Spark and the Databricks platform (the company's cloud platform built on Spark) both already support reading from and writing to a large number of data sources. But Databricks now wants to go one step further by using MLflow to unify different machine learning frameworks, from the lab to production.

Through Project Hydrogen, Databricks is also building a standard framework for data and execution. The aim is to bring data processing and execution together, let different ML frameworks exchange data, and standardise the training and inference processes.

MLflow

The goal of MLflow is to support tracking experiments, sharing and reusing projects, and developing production models. Not only will it be possible to deploy ML models on Spark and Delta, but MLflow can also export them as REST services that can run on any platform, including Kubernetes. Cloud environments are supported as well; currently, these are AWS SageMaker and Azure ML.
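To make the REST-service option more concrete, here is a minimal client sketch that uses only the Python standard library. It assumes a model is already being served locally with MLflow's `mlflow models serve` command (default port 5000, scoring endpoint `/invocations`) and that the server accepts the `dataframe_split` JSON layout used by MLflow 2.x. Older 1.x releases, current when this article was written, used a different request format. The feature names in the example are hypothetical.

```python
import json
import urllib.request

# Assumption: a model served locally with `mlflow models serve`, which listens
# on port 5000 by default and scores requests posted to /invocations.
SCORING_URL = "http://127.0.0.1:5000/invocations"


def build_payload(columns, rows):
    """Build a JSON body in the "dataframe_split" layout (assumed MLflow 2.x):
    a list of column names plus a list of row values."""
    return json.dumps(
        {"dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}}
    )


def score(columns, rows, url=SCORING_URL, timeout=10.0):
    """POST the rows to the scoring endpoint and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=build_payload(columns, rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical feature names; replace them with the served model's input schema.
    print(build_payload(["sepal_length", "sepal_width"], [[5.1, 3.5], [6.2, 2.9]]))
```

Because the client only speaks plain HTTP and JSON, the same code would work whether the model runs on a laptop, in a Kubernetes pod or behind a cloud endpoint. Only the URL (and any authentication the platform requires) changes.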

This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.