Google Cloud Dataproc, Google Cloud's managed service for running Apache Spark and Hadoop clusters, is coming to Kubernetes. The service spares users of Apache Spark and Hadoop from having to manage the underlying infrastructure themselves.

Cloud Dataproc on Kubernetes will initially launch as an alpha version. The aim is to let enterprise organisations run Apache Spark workloads on Google Kubernetes Engine (GKE) clusters. Because GKE is available almost anywhere through Google Anthos, Dataproc users can also move those workloads to their own data centres.

Unified management

Apache Spark workloads often run on Hadoop YARN clusters. With Cloud Dataproc, users can manage both YARN and Kubernetes clusters from a single overview, so they no longer need separate cluster management systems. “Supporting both YARN and Kubernetes can bring your enterprise the needed flexibility to modernize certain hybrid workloads while continuing to monitor YARN-based workloads,” says Google.

Expansions in the long term

TechCrunch reports that the service currently supports only Apache Spark, but that Google wants to add support for other open-source projects in the future. “Enterprises are increasingly looking for products and services that support data processing across multiple locations and platforms,” said Matt Aslett, research vice president at 451 Research. “The launch of Cloud Dataproc on Kubernetes is significant in that it provides customers with a single control plane for deploying and managing Apache Spark jobs on Google Kubernetes Engine in both public cloud and on-premises environments.” In short, the launch of Cloud Dataproc on Kubernetes is a new step towards supporting the hybrid cloud.