New AWS tool allows users to do data cleansing without coding

DataBrew allows data scientists and data managers to visually prepare data.

Amazon Web Services (AWS) has announced the general availability of AWS Glue DataBrew, a new visual data preparation tool. DataBrew enables customers to clean and normalize data without writing code.

Since 2016, data engineers have used AWS Glue to create, run, and monitor extract, transform, and load (ETL) jobs. AWS Glue provides both code-based and visual interfaces, and has dramatically simplified extracting, orchestrating, and loading data in the cloud for customers.

DataBrew is the next step for AWS Glue

According to AWS, data analysts and data scientists have wanted an easier way to clean and transform their data. The company claims DataBrew is designed to meet that need.

DataBrew allows data exploration and experimentation directly from AWS data lakes, data warehouses, and databases. Most importantly, it does this without the user writing code. The solution offers customers over 250 pre-built transformations to automate data preparation tasks. For example, it filters anomalies, standardizes formats, and corrects invalid values. Such work would otherwise require days or weeks of writing hand-coded transformations.

After preparing the data, customers can immediately start using it with AWS and third-party analytics. They can also use it with machine learning (ML) services to query the data and train machine learning models.

The new tool makes data cleansing accessible to all users

AWS has released a demonstration video that shows the capabilities of DataBrew. In the video, DataBrew removes special characters (for example, ampersands) from a database entry because they are of no use in data analysis.
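
For comparison, the kind of hand-coded transformation that DataBrew is meant to replace might look like the sketch below. It is not DataBrew's implementation or API; the function name, the character pattern, and the sample records are all hypothetical and only illustrate removing special characters such as ampersands in plain Python.

```python
import re

# Hypothetical rule: keep letters, digits, and whitespace; drop everything else.
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9\s]")


def remove_special_characters(value: str) -> str:
    """Strip special characters, then collapse the whitespace runs left behind."""
    stripped = SPECIAL_CHARS.sub("", value)
    return " ".join(stripped.split())


# Made-up sample entries, for illustration only.
records = ["Smith & Sons", "R&D  Dept.", "Plain entry"]
cleaned = [remove_special_characters(r) for r in records]
print(cleaned)  # ['Smith Sons', 'RD Dept', 'Plain entry']
```

Even a rule this small has to be written, tested, and maintained by someone who can code, which is the effort a visual tool aims to remove.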

“AWS customers are using data for analytics and machine learning at an unprecedented pace,” said Raju Gulabani, VP of Database and Analytics, AWS.

“However, these customers regularly tell us that their teams spend too much time on the undifferentiated, repetitive, and mundane tasks associated with data preparation.

“Customers love the scalability and flexibility of code-based data preparation services like AWS Glue, but they could also benefit from allowing business users, data analysts, and data scientists to visually explore and experiment with data independently, without writing code. AWS Glue DataBrew features an easy-to-use visual interface that helps data analysts and data scientists of all technical levels understand, combine, clean, and transform data.”

There are no upfront commitments or costs to use AWS Glue DataBrew. Customers only pay for creating and running transformations on datasets. More information can be found at https://aws.amazon.com/glue/features/databrew.
