
Cerebras partnership breathes new life into AWS Trainium


When it comes to AI workloads, inference is by far the most prevalent. There are many ways to serve AI models day to day, but the most efficient method has long been elusive. AWS and Cerebras are now collaborating in a way that redefines the nature of these workloads. What can ‘disaggregated’ inference deliver?

The distinction between AI training and inference is straightforward. While LLMs rely on training to become functional systems, inference is how an LLM is actually deployed: every output, in whatever form, is the result of inference. But the breakdown of AI workloads goes beyond this dichotomy. In Transformer models, inference itself consists of two phases: prefill and decode. AWS and Cerebras are now separating these two components as well.

Prefill and decode

AI training requires massive computing power and is often the primary reason “AI factories” are established. Inference is less demanding and can run via the public cloud at a manageable cost. But AWS has discovered that Trainium, originally intended for heavy training workloads, excels at prefill. Cerebras, the maker of massive “wafer-scale” AI chips, appears to excel at decode.

Prefill involves processing the input, whether it’s a message from an end user to a chatbot, an image, or an API call via MCP from another application. Here, raw computing power is the limiting factor. AWS Trainium, described elsewhere as a “disaster,” seems far removed from the performance level demanded by major AI labs. Anthropic, said to be the “only meaningful Trainium customer,” employs a multi-cloud strategy: in addition to AWS, it relies on computing power from Google Cloud and, by extension, the TPUs on that platform.

AWS Trainium therefore needs a new raison d’être, and the way out appears to be AI inference. This is something of a downgrade in ambition: inference is a less demanding workload, and it is often what a former training chip ends up running once it no longer offers the state-of-the-art performance it once did.

Cerebras, however, offers something else: bandwidth. The latest CS-3, equipped with 900,000 cores, reportedly reaches a maximum throughput of 21 petabytes per second. A single “chip” is in fact an entire wafer, which would normally be cut up into many separate processors. Such petabyte-level speeds are only possible because connectivity within a chip is always many times faster than connectivity between chips, as between a GPU and its separate memory modules.

And that is precisely what AI inference requires in its second and final step. Decode, the phase following prefill, revolves around generating tokens and thus the output: the end result, such as a chatbot’s response to a question, an AI-generated image, and so on.
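The asymmetry between the two phases can be illustrated with a toy sketch. This is not the AWS/Cerebras stack, just a minimal, assumption-laden illustration: prefill processes the entire prompt in one parallel pass (compute-bound), after which decode generates tokens one at a time, each step re-reading the cached state (bandwidth-bound).

```python
# Toy illustration of the two inference phases (not the actual AWS/Cerebras
# stack). The token names and cache format are placeholders.

def prefill(prompt_tokens):
    # One large batched pass over all prompt tokens at once.
    # In a real Transformer this builds the KV cache for every position.
    return [("k", "v", tok) for tok in prompt_tokens]

def decode(kv_cache, max_new_tokens):
    output = []
    for _ in range(max_new_tokens):
        # Every step must read the entire cache to attend over the
        # context -- this is why decode is bandwidth-hungry.
        context_size = len(kv_cache)
        next_token = f"tok{context_size}"  # stand-in for the sampling step
        kv_cache.append(("k", "v", next_token))
        output.append(next_token)
    return output

cache = prefill(["The", "answer", "is"])
print(decode(cache, 3))  # → ['tok3', 'tok4', 'tok5']
```

The key point the sketch makes: prefill is one big, parallelizable operation, while decode is an inherently sequential loop whose cost per step scales with the size of the cache it must stream through.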

The bottom line is a new idea

The magic word in the announced collaboration between AWS and Cerebras is “disaggregation”: the splitting of prefill and decode across different hardware. With this combination now available in production for the first time, we can safely say that a new era for AI inference is dawning.

The technology itself didn’t appear out of thin air: research on splitting prefill and decode was published in September. That work spanned different GPU vendors, but the exact branding of the AI chips matters little here.

Another technical term for this phenomenon is heterogeneous parallelism: running different types of chips on the same workload, performing their calculations simultaneously. We suspect a more memorable term will emerge as other hyperscalers adopt the same methodology.
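Conceptually, such a setup amounts to a router that dispatches each phase to the pool best suited for it and hands the KV cache across. The following is a minimal sketch under stated assumptions: the pool names, the handoff format, and the `route` function are all illustrative, not the actual AWS/Cerebras interface.

```python
# Hypothetical sketch of disaggregated, heterogeneous serving: two worker
# pools of different chip types, prefill on one, decode on the other.

class Pool:
    def __init__(self, name, strength):
        self.name = name          # e.g. "trainium" or "cerebras" (assumed labels)
        self.strength = strength  # "compute" or "bandwidth"

def route(request, prefill_pool, decode_pool):
    # Step 1: prompt processing goes to the compute-optimized pool,
    # producing the KV cache.
    kv_cache = {"source": prefill_pool.name, "prompt": request["prompt"]}
    # Step 2: the cache is handed off over the network; token generation
    # then runs on the bandwidth-optimized pool. The transfer cost is what
    # the split must amortize over the generated tokens.
    return {"cache_from": kv_cache["source"], "decoded_on": decode_pool.name}

result = route({"prompt": "hello"},
               Pool("trainium", "compute"),
               Pool("cerebras", "bandwidth"))
print(result)  # → {'cache_from': 'trainium', 'decoded_on': 'cerebras'}
```

The design question such a router has to answer is whether the tokens generated per request are numerous enough to pay back the cost of shipping the cache between pools.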

From Loss to Profit

The announcement will have to prove itself. AWS states that Anthropic and OpenAI remain committed to Trainium, which will also have to do with the billions of dollars AWS is investing in both parties.

Now, however, AWS appears to have a new plan. Trainium 4 is expected to launch in 2027, with the goal once again being to become the go-to choice for AI training among AI labs. But eventually—whether shortly after release or later—Trainium 4 will likely follow in the footsteps of Trainium 3 and be utilized in a similar partnership with Cerebras chips.

Even more AI processors could follow this paradigm. This goes beyond benchmark wins: it leverages, for the long term, the AI capacity currently being built out. In this regard, AWS and Cerebras have positioned themselves for the future.

Read also: Nvidia is working on a chip for AI inference with Groq technology