AWS has released Amazon EMR Studio. The integrated development environment (IDE) lets data scientists build applications in R, Python, Scala and PySpark and integrates directly with Amazon EMR.

EMR Studio was announced as a preview at the AWS re:Invent event in 2020. With the IDE, Amazon is targeting data scientists and data engineers, who can use it to develop, visualise and debug applications in the aforementioned languages.

Jupyter Notebooks

To simplify debugging, Amazon makes use of Jupyter Notebooks. Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualisations and text, similar to how Google Docs works. Tools such as the Spark UI and the YARN Timeline Service should simplify debugging further. Code written in a Jupyter Notebook can be run directly on Amazon EMR, on either Amazon EC2 or Amazon EKS.

New features based on feedback

Based on feedback from preview users, Amazon has added some new features to EMR Studio. A new EMR Studio instance can be created with the EMR console, AWS CloudFormation or the AWS CLI. The EMR console guides the user through a number of steps to set up access management and assign users and groups to an EMR Studio. In the user interface, the configuration can be viewed again and deleted if necessary. With AWS CloudFormation, the creation of Studio instances can even be automated based on a template. Support for authentication with Microsoft Active Directory has also been added.
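Besides the console, CloudFormation and the CLI, a Studio can also be created from a script. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and its EMR create_studio call; the helper names build_create_studio_request and create_studio, and all IDs, role ARNs and the bucket name in the usage note, are hypothetical placeholders rather than anything taken from the announcement.

```python
from typing import Any, Dict, List, Optional


def build_create_studio_request(
    name: str,
    vpc_id: str,
    subnet_ids: List[str],
    service_role: str,
    workspace_sg: str,
    engine_sg: str,
    default_s3_location: str,
    auth_mode: str = "IAM",
    user_role: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the EMR CreateStudio API (hypothetical helper)."""
    if auth_mode not in ("SSO", "IAM"):
        raise ValueError("auth_mode must be 'SSO' or 'IAM'")
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    if auth_mode == "SSO" and user_role is None:
        raise ValueError("a user role is required when auth_mode is 'SSO'")

    request: Dict[str, Any] = {
        "Name": name,
        "AuthMode": auth_mode,
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "ServiceRole": service_role,
        "WorkspaceSecurityGroupId": workspace_sg,
        "EngineSecurityGroupId": engine_sg,
        "DefaultS3Location": default_s3_location,
    }
    if user_role is not None:
        request["UserRole"] = user_role
    return request


def create_studio(request: Dict[str, Any]) -> str:
    """Send the request to AWS and return the new Studio ID.

    boto3 is imported lazily so the request builder works without the SDK.
    Calling this needs valid AWS credentials and creates a real resource.
    """
    import boto3

    emr = boto3.client("emr")
    response = emr.create_studio(**request)
    return response["StudioId"]
```

A call might look like create_studio(build_create_studio_request("analytics-studio", "vpc-0123", ["subnet-0a"], "arn:aws:iam::111122223333:role/StudioServiceRole", "sg-0aaa", "sg-0bbb", "s3://example-bucket/studio/")), where every value is a placeholder. Keeping the request builder separate from the network call makes the parameters easy to check before anything is created.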

Templates

Another new feature is that administrators can now limit the parameters available in cluster templates. When a user creates a cluster from such a template, only the specified parameters can be changed. Furthermore, Amazon has added some notebook examples to make it easier to build data science applications in EMR Studio. These include examples of PySpark code for querying a Hive metastore and Python code for visualisation. Users can copy the code into their own EMR Studio workspace, modify it as needed and run it from there.
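To give an idea of what such a notebook cell could look like, here is a minimal sketch. It is not one of Amazon's bundled examples; the function names and the database, table and column used in the usage note are hypothetical. It assumes a PySpark kernel attached to a cluster whose Hive metastore is already configured.

```python
import re

# Only plain identifiers are accepted, so that names can be placed in the SQL text safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_count_query(database: str, table: str, column: str, limit: int = 10) -> str:
    """Build a query that counts rows per value of one column (hypothetical helper)."""
    for name in (database, table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT {column}, COUNT(*) AS n "
        f"FROM {database}.{table} "
        f"GROUP BY {column} ORDER BY n DESC LIMIT {int(limit)}"
    )


def run_on_hive(query: str):
    """Run the query against the Hive metastore and return a pandas DataFrame.

    pyspark is imported lazily; this only works where Spark, Hive support
    and pandas are available, for example on an attached EMR cluster.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    return spark.sql(query).toPandas()
```

In a notebook, the result of run_on_hive(build_count_query("sales", "orders", "country")) could then be visualised with the pandas plotting API, for example df.plot.bar(x="country", y="n"), assuming matplotlib is installed in the kernel.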

Availability

Amazon EMR Studio is now available. A tutorial to get started with the IDE is available on the Amazon website.
