AWS launches a new data preparation service for machine learning

Named SageMaker Data Wrangler, Amazon’s new service aims to make it easier for data scientists to prepare data for machine learning training. The company also launched SageMaker Feature Store, which is available in SageMaker Studio, itself a relatively new service.

Feature Store lets users name, find, organize, and share machine learning features.

Amazon is also planning to launch SageMaker Pipelines, a new service that integrates with the platform. It will provide CI/CD for machine learning, letting users create and automate workflows and keep an audit trail for model components such as data configurations and training.

Infrastructure won’t be a problem for too long

AWS’ CEO Andy Jassy said in his keynote at the company’s re:Invent conference that data preparation remains one of the significant problems in the machine learning industry. Typically, users have to write their own queries and code to get the data from the data store.

Then, they have to write queries to transform the data and combine features to get the desired outcome.
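To make the manual steps above concrete, here is a minimal sketch of what they can look like when done by hand: query a data store, transform the rows, and combine the results into features. It uses only the Python standard library and an in-memory SQLite database as a stand-in for a real data store. The `orders` table, its columns, and the feature names are hypothetical and chosen purely for illustration; none of this is Data Wrangler's own API.

```python
import sqlite3

# Hypothetical stand-in for a real data store: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0);
    """
)

# Step 1: write a query to pull the raw data out of the store.
rows = conn.execute("SELECT customer_id, amount FROM orders").fetchall()
conn.close()

# Step 2: transform the rows into per-customer aggregates.
totals = {}
counts = {}
for customer_id, amount in rows:
    totals[customer_id] = totals.get(customer_id, 0.0) + amount
    counts[customer_id] = counts.get(customer_id, 0) + 1

# Step 3: combine the aggregates into one feature record per customer.
features = {
    customer_id: {
        "total_spend": totals[customer_id],
        "order_count": counts[customer_id],
        "avg_order": totals[customer_id] / counts[customer_id],
    }
    for customer_id in totals
}

for customer_id, record in sorted(features.items()):
    print(customer_id, record)
```

Even in this toy case, all three steps are plumbing rather than modeling, which is the overhead the keynote pointed to.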

All this work has nothing to do with building the models and everything to do with the infrastructure used to create them. Inefficiencies like this make it harder to get things done on time.

Making modeling easier

Data Wrangler has more than 300 pre-configured data transformations built in, which users can apply to convert column types or impute missing data with mean or median values.
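For readers unfamiliar with these two kinds of transformation, the following is a minimal sketch of column type conversion and mean or median imputation written by hand in plain Python. The helper names `convert_column` and `impute_missing` and the sample rows are hypothetical and for illustration only; this is not how Data Wrangler implements its built-in transformations.

```python
from statistics import mean, median


def convert_column(rows, column, cast):
    """Return new rows with `column` cast via `cast`; blank or None becomes None."""
    converted = []
    for row in rows:
        value = row.get(column)
        new_row = dict(row)
        new_row[column] = None if value in (None, "") else cast(value)
        converted.append(new_row)
    return converted


def impute_missing(rows, column, strategy="mean"):
    """Fill None values in `column` with the mean or median of the observed values."""
    if strategy not in ("mean", "median"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical raw data: every value arrives as a string, one age is missing.
raw_rows = [
    {"id": "1", "age": "20"},
    {"id": "2", "age": ""},
    {"id": "3", "age": "30"},
    {"id": "4", "age": "100"},
]

typed_rows = convert_column(raw_rows, "age", float)
mean_filled = impute_missing(typed_rows, "age", strategy="mean")
median_filled = impute_missing(typed_rows, "age", strategy="median")

print(mean_filled[1]["age"])    # 50.0
print(median_filled[1]["age"])  # 30.0
```

The two strategies give different fill values here because the outlier of 100 pulls the mean up while leaving the median unchanged, which is a common reason to prefer the median for skewed columns.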

There are also built-in visualization tools that can help identify potential errors and tools to check if there are inconsistencies in the data before deploying the model.

All the workflows can be saved in a notebook or as a script for teams to replicate. With the introduction of SageMaker Pipelines, users can automate the rest of the workflow.

