Google has updated its Dataproc cloud service with new features. With the update, Dataproc users can add GPUs to their clusters and use several new automation features.

According to Google, the new features are aimed mainly at speeding up machine learning projects and simplifying their day-to-day maintenance.

Google Dataproc

Dataproc is a cloud service that simplifies running Apache Spark and Apache Hadoop clusters. Tasks that would normally take hours or days can be completed in seconds or minutes, thanks to Dataproc and the new GPUs.

Eight graphics processing units

With the new features, Google makes the service more efficient. For example, Dataproc users can now add GPUs to Hadoop and Spark clusters in machine learning projects. GPUs can run AI models many times faster than a standard central processing unit (CPU). Users can benefit from eight Nvidia GPUs in the public cloud, including the Tesla V100 model.
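To make the idea of adding GPUs to a cluster concrete, here is a minimal sketch in Python. It only builds a plain dictionary describing a cluster whose worker nodes each carry a GPU; it does not contact Google Cloud. The helper `gpu_cluster_config` and the cluster name are hypothetical, and the field names (`workerConfig`, `accelerators`, `acceleratorTypeUri`, `acceleratorCount`) are assumed to follow the Dataproc REST API, so check them against the official reference before relying on them.

```python
# Hypothetical helper, for illustration only. It builds a cluster description
# as a plain dictionary and sends nothing to Google Cloud.
# Assumption: field names mirror the Dataproc REST API cluster resource.

def gpu_cluster_config(cluster_name, num_workers, gpus_per_worker,
                       gpu_type="nvidia-tesla-v100"):
    """Describe a cluster whose worker nodes each carry GPUs."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if gpus_per_worker < 1:
        raise ValueError("gpus_per_worker must be at least 1")
    return {
        "clusterName": cluster_name,
        "config": {
            "workerConfig": {
                "numInstances": num_workers,
                # One accelerator entry: the GPU model and how many per worker.
                "accelerators": [
                    {
                        "acceleratorTypeUri": gpu_type,
                        "acceleratorCount": gpus_per_worker,
                    }
                ],
            }
        },
    }


# Example: two workers with one Tesla V100 each ("ml-training" is a made-up name).
config = gpu_cluster_config("ml-training", num_workers=2, gpus_per_worker=1)
print(config["config"]["workerConfig"]["accelerators"])
```

A dictionary like this could serve as the request body when creating a cluster, but the exact request shape is an assumption here rather than something the announcement spells out.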

Automatic cluster scaling

Besides the GPUs, the update also gives Dataproc users an autoscaling feature. It scales the size of a cluster up or down automatically, depending on demand. This has several advantages. For example, the feature makes it easier to handle sudden peaks when an application sends a large amount of data to a Spark project. In addition, engineers no longer have to provision the extra infrastructure for an algorithm manually when a test cluster needs to be scaled up.

Chris Crosbie of Google’s cloud analytics group explains that with the autoscale feature, a cluster will automatically grow as needed to process the entire dataset and then automatically scale down when processing is complete.
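The grow-then-shrink behavior described above can be illustrated with a toy calculation. This is a simplified sketch, not Dataproc's actual autoscaling algorithm: the function `target_workers`, its parameters, and the assumption that each worker handles a fixed number of pending tasks are all invented for the example.

```python
import math


def target_workers(pending_tasks, tasks_per_worker, min_workers, max_workers):
    """Return how many workers the pending work calls for, within fixed limits.

    Toy model: assumes every worker can handle `tasks_per_worker` tasks.
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    needed = math.ceil(pending_tasks / tasks_per_worker)
    # Clamp to the allowed range so the cluster never shrinks below the
    # minimum or grows past the maximum.
    return max(min_workers, min(max_workers, needed))


# A quiet period, a sudden peak, and the idle state after processing finishes.
for pending in (10, 400, 0):
    size = target_workers(pending, tasks_per_worker=20,
                          min_workers=2, max_workers=10)
    print(f"{pending} pending tasks -> {size} workers")
```

In this model the cluster sits at its minimum size when little work is queued, grows to the maximum during the peak, and returns to the minimum once the queue is empty.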

Other new features

Another new feature is the ability to limit how long a cluster may sit idle before Dataproc automatically deletes it. Furthermore, Dataproc users can now also automate certain tasks in SparkR. SparkR is an extension of Spark that allows R programs to run on the framework.
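The idle limit mentioned above amounts to a simple time comparison, sketched here in Python. The function `should_delete`, the 30-minute limit, and the timestamps are arbitrary values chosen for illustration; how Dataproc itself decides that a cluster is idle is not described in the announcement.

```python
from datetime import datetime, timedelta


def should_delete(last_activity, now, idle_limit):
    """Return True once the cluster has been idle for at least `idle_limit`."""
    return now - last_activity >= idle_limit


# Assumed values for the example: a 30-minute idle limit and made-up times.
idle_limit = timedelta(minutes=30)
last_job_finished = datetime(2019, 1, 1, 12, 0)

# Ten minutes after the last job the cluster is kept.
print(should_delete(last_job_finished, datetime(2019, 1, 1, 12, 10), idle_limit))
# Forty-five minutes after the last job it is eligible for deletion.
print(should_delete(last_job_finished, datetime(2019, 1, 1, 12, 45), idle_limit))
```

The first call prints `False` and the second prints `True`, because only the second timestamp lies beyond the 30-minute limit.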