AWS launches a new data preparation service for machine learning

Named SageMaker Data Wrangler, Amazon’s new service is meant to make it easier for data scientists to prepare data for machine learning training. The company also launched SageMaker Feature Store, which is available in SageMaker Studio, itself a relatively new service.

With Feature Store, users can name, find, organize, and share machine learning features.

Amazon is also planning to launch SageMaker Pipelines, a new service that integrates with the platform. It will bring CI/CD to machine learning, letting users create and automate workflows and keep an audit trail for model components like data configurations and training.

Infrastructure won’t be a problem for much longer

AWS CEO Andy Jassy said in his keynote at the company’s re:Invent conference that data preparation remains one of the significant problems in the machine learning industry. Typically, users have to write their own queries and code to get the data out of the data store.

Then they have to write queries to transform the data and combine features to get the desired outcome.

None of this work has anything to do with building the models, but it has everything to do with the infrastructure used to create them. Inefficiencies like this make it harder to get things done on time.

Making modeling easier

Data Wrangler has more than 300 pre-configured data transformations built in, which users can apply to convert column types or impute missing data with mean or median values.
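To give a sense of what these two transformations involve when written by hand, here is a minimal sketch in plain Python. It is not the Data Wrangler API; the function names `to_float` and `impute` and the sample column are made up for illustration.

```python
from statistics import mean, median


def to_float(value):
    """Convert a raw cell to float; return None for blank or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def impute(column, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [v for v in column if v is not None]
    if not observed:
        # Nothing to compute a fill value from; return the column unchanged.
        return list(column)
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in column]


# Hypothetical raw column: strings, with blank and placeholder cells.
raw_ages = ["34", "", "29", "n/a", "51"]

ages = [to_float(v) for v in raw_ages]   # column type conversion
ages_mean = impute(ages, "mean")         # fill gaps with the mean
ages_median = impute(ages, "median")     # fill gaps with the median
```

Even this small case needs decisions about what counts as a missing value and which statistic to use as the fill, which is the kind of repetitive work that pre-configured transformations are meant to take off the user’s hands.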

There are also built-in visualization tools that can help identify potential errors, along with tools to check for inconsistencies in the data before the model is deployed.

All the workflows can be saved in a notebook or as a script for teams to replicate. With the introduction of SageMaker Pipelines, users can automate the rest of the workflow.
