AWS SageMaker is designed for developing, training and deploying machine learning models. Amazon added more than 60 features to the solution in the past year. Investments are significant. Releases are paced at such a speed that keeping up with every new capability poses quite a challenge for current and interested users. That’s where we come in.
AWS SageMaker is one of the fastest-growing and most complex solutions in the history of AWS, and updates follow one another at a quick pace. The organization has good reasons to value SageMaker highly: it facilitates the development of machine learning models, and such models are the future for fast innovation.
As mentioned earlier, the basics of SageMaker are anything but simple. Understanding the future that Amazon envisions with SageMaker is a daunting task. An intensive update schedule doesn’t help. Yet, understanding is essential. SageMaker plays a starring role in the future of AWS. As such, we clarify the solution’s workings by exploring the six SageMaker updates launched at AWS re:Invent.
What does SageMaker stand for?
SageMaker is an umbrella term for Amazon’s software and hardware services for developing, training and deploying machine learning models. Each step in the development, training and deployment process has a SageMaker tool. Think of connecting data sources, structuring data, tagging data, training models and deploying them.
AWS is not the only provider of solutions for the process. In fact, in almost every part of the SageMaker stack, you encounter applications with an open-source origin. SageMaker’s differentiating factor is infrastructure. Each solution hooks directly into AWS’s computing services. Users do not pay per application but rather for the computing power they use, billed per hour.
Where is SageMaker heading?
Accessibility is at the heart of recent SageMaker updates. The goal is not only to strengthen technology for data collection and model development but to bring the technology to a broader user base. AWS wants to democratize machine learning.
SageMaker Canvas
That ambition brings us to SageMaker Canvas, the first of six additions that Amazon introduced at AWS re:Invent.
Canvas was designed to make SageMaker accessible to users without a technical background, such as professionals in finance, marketing and human resources, and to give them the ability to do data science. The tool includes a no-code interface for the entire modelling process. Users can collect and prepare data and turn it into machine learning models without writing a single line of code.
The resulting models are useful for a variety of predictions. Trends in turnover, employee retention and customer retention are simple but realistic examples. The professionals who need such insights usually depend on data scientists or pre-built software. With SageMaker Canvas, AWS hopes to partially eliminate that dependence. The foundation of a fully functional, predictive machine learning model can be built without traditional data science knowledge.
SageMaker Ground Truth Plus
Like humans, ML models reason based on experience. Training an ML model revolves around supplying that experience in the form of data, and the data must be relevant and clean. If you put garbage into your model, your model will put garbage out. Make sure the data used for training is as accurate as it can be.
SageMaker has always made it possible to connect the data in a storage environment with ML models. In 2018, AWS expanded that capability. AWS launched SageMaker Ground Truth: a tool to make data more readable to ML models through a process called ‘labelling’.
At AWS re:Invent, AWS introduced an extension of Ground Truth called Ground Truth Plus. It is a service in which AWS partners with specialized professionals, who use Ground Truth to label an organization’s datasets as desired. The result is rapid preparation of high-quality data to train ML models with. In short: outsourcing of data preparation. An example AWS gave is a group of radiologists who can identify cancer or heart problems based on pictures or scans of the human body.
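To make the "garbage in, garbage out" point concrete, here is a minimal sketch of what checking labelled data before training might look like. It does not use Ground Truth or any AWS API; the label set, the file names and the `split_clean` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical label set and records; labelling attaches one of these labels to each item.
ALLOWED_LABELS = {"healthy", "abnormal"}

records = [
    {"image": "scan_001.png", "label": "healthy"},
    {"image": "scan_002.png", "label": "abnormal"},
    {"image": "scan_003.png", "label": None},        # never labelled
    {"image": "scan_004.png", "label": "abnromal"},  # typo in the label
]


def split_clean(records):
    """Separate records with a valid label from those that need another labelling pass."""
    clean = [r for r in records if r["label"] in ALLOWED_LABELS]
    rejected = [r for r in records if r["label"] not in ALLOWED_LABELS]
    return clean, rejected


clean, rejected = split_clean(records)
print(Counter(r["label"] for r in clean))   # label distribution of the usable data
print([r["image"] for r in rejected])       # items to send back for labelling
```

Only the records with a valid label would go on to training; the rest go back to the people doing the labelling.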
SageMaker Studio
Workflows for ML modelling can be developed in any notebook and most programming languages. Since ML projects consist of contributions from a variety of specialists, and specialists work with different languages and notebooks, bringing code together is typically time-consuming. With SageMaker Studio, the third introduction announced at AWS re:Invent, AWS hopes to solve the problem.
SageMaker Studio can be thought of as a Visual Studio Code for data science. The launch entails an integrated development environment (IDE) for developing workflows for each step of the ML modelling process. Because fast access to data sources is essential in this process, SageMaker Studio has built-in integration with data lakes in Amazon S3 and Spark, Hive and Presto in Amazon EMR. Popular frameworks for ML model development (e.g. TensorFlow, PyTorch and MXNet) are supported as well.
Compared to existing IDEs, SageMaker Studio does not bring anything groundbreaking to the table. However, the IDE centralizes the necessities for ML modelling into one environment, in an effort to make itself attractive to any developer involved in an ML project. If these developers adopt the IDE collectively, their contributions can be more easily assimilated into a coherent whole.
SageMaker Training Compiler
The fourth introduction, SageMaker Training Compiler, is a specialized compiler for optimizing the code that makes up deep learning models. Doing so lets training make more efficient use of GPU instances. Data scientists working on deep learning models regularly test multiple versions of a model to find the least resource-intensive one, a process that is tremendously time-intensive. According to Amazon, code in TensorFlow and PyTorch can be compiled with a single operation to demand less of the underlying hardware.
SageMaker Inference Recommender
The fifth introduction builds on Amazon’s bridge between infrastructure and development. SageMaker Inference Recommender advises on the optimal compute instance for an ML model. That choice normally depends on lengthy testing periods and specialist knowledge about the relationship between ML models and infrastructure. The tool performs simulations to recommend the optimal compute instance. The simulations produce benchmarks, which are presented in SageMaker Studio. Because the benchmarks themselves are visible, SageMaker Inference Recommender can be used for both infrastructure selection and theoretical benchmarking.
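The idea of choosing an instance from benchmarks can be sketched in a few lines. The example below is not the Inference Recommender API and does not reflect how AWS ranks instances; it assumes a simple rule (the cheapest option that meets a latency target), and the instance labels, latencies and prices are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Benchmark:
    instance_type: str    # hypothetical label, not a real instance name
    latency_ms: float     # measured response time per request
    cost_per_hour: float  # hypothetical hourly price


def recommend(benchmarks: list[Benchmark], max_latency_ms: float) -> Optional[Benchmark]:
    """Return the cheapest benchmarked instance that meets the latency target."""
    fast_enough = [b for b in benchmarks if b.latency_ms <= max_latency_ms]
    if not fast_enough:
        return None  # no benchmarked instance is fast enough
    return min(fast_enough, key=lambda b: b.cost_per_hour)


# Hypothetical benchmark results for three instance sizes.
results = [
    Benchmark("small", latency_ms=180.0, cost_per_hour=0.10),
    Benchmark("medium", latency_ms=90.0, cost_per_hour=0.25),
    Benchmark("large", latency_ms=40.0, cost_per_hour=0.60),
]

choice = recommend(results, max_latency_ms=100.0)
print(choice.instance_type if choice else "no instance meets the target")
```

With a 100 ms target, the smallest option is too slow and the largest is more expensive than needed, so the sketch settles on the middle one.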
SageMaker Serverless Inference
The final introduction, SageMaker Serverless Inference, also covers the topic of infrastructure. The tool keeps an eye on the moments at which a deployed ML model is accessed, for example the moment a chatbot responds to a custom message or a finance professional runs a forecast. Serverless Inference ensures that infrastructure is made available only when necessary. At all other times, an organization does not pay computing costs. Computing instances scale up and down automatically, depending on real-time requirements. Cost-effectiveness is key.
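A rough calculation shows why paying only during use matters for models that are accessed sporadically. The sketch below compares an instance that runs around the clock with compute billed only while requests are served. The prices, the per-second billing granularity and the workload are assumptions made for illustration, not AWS figures.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_day: int
    seconds_per_request: float


def always_on_cost(hourly_rate: float, days: int = 30) -> float:
    """Cost of an instance that runs around the clock, busy or idle."""
    return hourly_rate * 24 * days


def on_demand_cost(workload: Workload, rate_per_second: float, days: int = 30) -> float:
    """Cost when compute is billed only while requests are being served."""
    busy_seconds = workload.requests_per_day * workload.seconds_per_request * days
    return busy_seconds * rate_per_second


# Hypothetical figures for illustration only, not AWS prices.
HOURLY_RATE = 0.20        # always-on instance, per hour
RATE_PER_SECOND = 0.0001  # pay-per-use compute, per second

# A forecasting tool that is called 500 times a day, 2 seconds per call.
forecast_tool = Workload(requests_per_day=500, seconds_per_request=2.0)

fixed = always_on_cost(HOURLY_RATE)
usage_based = on_demand_cost(forecast_tool, RATE_PER_SECOND)
print(f"always-on: {fixed:.2f}, pay-per-use: {usage_based:.2f}")
```

Under these assumed numbers the model is busy for less than half an hour per day, so the always-on instance mostly bills for idle time. For a model that is busy most of the day, the comparison would look very different.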