
Thinking Machines wants to make AI more predictable

Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, wants to solve a persistent problem: the unpredictability of AI models.

In its first blog post, the lab explains how it intends to combat randomness in AI responses. Researcher Horace He argues that tighter control over how GPU kernels carry out their calculations could be the key, leading to more reliable AI for science, business, and training techniques.

Until now, the cause of this inconsistency was usually attributed to floating-point rounding errors and parallel calculations on GPUs. Because floating-point addition is not associative, the order of the calculations can produce small differences. Combined with the fact that GPUs execute thousands of threads in parallel, without a guaranteed execution order, this seemed the logical explanation. The new research shows, however, that this picture is not entirely accurate: many GPU kernels actually deliver bit-identical results when run repeatedly on the same input.
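The non-associativity of floating-point addition is easy to demonstrate in a few lines of Python (a toy illustration, not the lab's code):

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.1 + 0.2 first rounds to 0.30000000000000004
right = a + (b + c)  # 0.2 + 0.3 first rounds to exactly 0.5

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False: same numbers, different order, different bits
```

The intermediate roundings differ, so the final bits differ. On a GPU, the order in which thousands of such additions are combined is exactly what a kernel's parallelization strategy decides.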

The real culprit appears to be a lack of batch invariance: the outcome of a calculation for a single input can change depending on the batch size in which that input is processed, or on the number of other requests running on the server at the same time. Three core components of transformer architectures turn out to be sensitive: RMSNorm, matrix multiplication, and attention. Because these operations are optimized for performance, the calculation order can change with the batch size, which leads to small rounding differences that ultimately become visible in the output.
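A toy sketch of what a missing batch-invariance guarantee looks like: the function names, numbers, and the chunk-size heuristic below are all invented, but real kernels make analogous trade-offs when they pick tile and split sizes based on how much work is in flight.

```python
def rowsum(batch, chunk):
    """Sum each row by first summing chunks, then combining the partials."""
    return [sum(sum(row[i:i + chunk]) for i in range(0, len(row), chunk))
            for row in batch]

# One request's input; the large terms make rounding effects visible.
row = [1e16, 1.0, -1e16, 1.0]

# A performance-tuned kernel might pick a different reduction split
# depending on the batch size (a toy stand-in for real tiling heuristics):
def tuned_rowsum(batch):
    chunk = 1 if len(batch) < 8 else 2
    return rowsum(batch, chunk)

print(tuned_rowsum([row])[0])      # 1.0  (row processed alone)
print(tuned_rowsum([row] * 8)[0])  # 0.0  (same row inside a batch of 8)
```

The row's data never changed; only the company it kept on the server did. That is precisely the effect the researchers observed in production inference.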

Modest slowdown

Thinking Machines Lab has rewritten these operations to make them batch-invariant: the reductions and additions always take place in the same order, regardless of batch size or server load. This eliminates the small numerical differences and makes the results truly deterministic. In one experiment, a thousand repetitions of the same prompt without batch invariance yielded eighty different answers; with the new approach, all thousand runs gave exactly the same result. The price is a moderate performance penalty, often between twenty and fifty percent, but the researchers stress that this is acceptable in practice.
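The essence of the fix can be sketched in the same toy terms: fix the reduction order once, so a row's result no longer depends on how many rows accompany it. This is only an illustration of the principle; the lab's actual work is batch-invariant GPU kernels for RMSNorm, matrix multiplication, and attention.

```python
def rowsum_fixed(batch, chunk=2):
    """Batch-invariant toy reduction: the per-row chunk size is a
    constant, never a function of batch size or server load, so the
    additions for any given row always happen in the same order."""
    out = []
    for row in batch:
        partials = [sum(row[i:i + chunk]) for i in range(0, len(row), chunk)]
        out.append(sum(partials))
    return out

row = [1e16, 1.0, -1e16, 1.0]

alone = rowsum_fixed([row])[0]
batched = rowsum_fixed([row] * 64)[0]
print(alone == batched)  # True: bit-identical, whatever the batch size
```

The design choice is the trade-off the article describes: a fixed reduction order forgoes some batch-size-specific tuning, which is where the performance penalty comes from.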

According to Thinking Machines, this is more than a technical detail. For researchers, it makes experiments more reproducible; for companies, it makes debugging and testing easier and more reliable. In reinforcement learning the researchers even speak of a breakthrough, because training and sampling can now deliver bit-identical results and thus remain truly on-policy.

Thinking Machines Lab presented this work as the first contribution in a new blog series called Connectionism. The company aims to share more publications, code, and research results to foster an open research culture. Thinking Machines has now raised $2 billion in seed funding and has managed to attract a team of former OpenAI researchers.

The company is developing its first product, which will cater to researchers and startups seeking to adapt or customize their models. Whether the batch invariance techniques will be incorporated directly into this product has not yet been confirmed. Still, the vision is clear: AI must not only be powerful, but also consistent and reliable.