
Thinking Machines wants to make AI more predictable

Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, wants to solve a persistent problem: the unpredictability of AI models.

In its first blog post, the lab explains how it intends to combat randomness in AI responses. Researcher Horace He argues that tighter control over how GPU kernels carry out their calculations could be the key, leading to more reliable AI for science, business, and training techniques.

Until now, the cause of this inconsistency was usually attributed to floating-point rounding errors and parallel calculations on GPUs. Because floating-point addition is not associative, the order of the calculations can produce small differences. Combined with the fact that GPUs execute thousands of threads in parallel, without a guaranteed execution order, this seemed the logical explanation. The new research shows, however, that this picture is not entirely accurate: many GPU kernels actually deliver bit-identical results when run repeatedly on the same input.
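The non-associativity of floating-point addition is easy to demonstrate in a few lines of Python (a toy illustration, not the lab's code):

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.1 + 0.2 first rounds to 0.30000000000000004
right = a + (b + c)  # 0.2 + 0.3 first rounds to exactly 0.5

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False: same numbers, different order, different bits
```

The intermediate roundings differ, so the final bits differ. On a GPU, the order in which thousands of such additions are combined is exactly what a kernel's parallelization strategy decides.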

The real culprit appears to be a lack of batch invariance: the outcome of a calculation for a single input can change depending on the batch size in which that input is processed, or on the number of other requests running on the server at the same time. Three core components of transformer architectures turn out to be sensitive: RMSNorm, matrix multiplication, and attention. Because these operations are optimized for performance, the calculation order can change with the batch size, which leads to small rounding differences that ultimately become visible in the output.
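A toy sketch of what a missing batch-invariance guarantee looks like: the function names, numbers, and the chunk-size heuristic below are all invented, but real kernels make analogous trade-offs when they pick tile and split sizes based on how much work is in flight.

```python
def rowsum(batch, chunk):
    """Sum each row by first summing chunks, then combining the partials."""
    return [sum(sum(row[i:i + chunk]) for i in range(0, len(row), chunk))
            for row in batch]

# One request's input; the large terms make rounding effects visible.
row = [1e16, 1.0, -1e16, 1.0]

# A performance-tuned kernel might pick a different reduction split
# depending on the batch size (a toy stand-in for real tiling heuristics):
def tuned_rowsum(batch):
    chunk = 1 if len(batch) < 8 else 2
    return rowsum(batch, chunk)

print(tuned_rowsum([row])[0])      # 1.0  (row processed alone)
print(tuned_rowsum([row] * 8)[0])  # 0.0  (same row inside a batch of 8)
```

The row's data never changed; only the company it kept on the server did. That is precisely the effect the researchers observed in production inference.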

Modest slowdown

Thinking Machines Lab has rewritten these operations to make them batch-invariant: the reductions and additions always take place in the same order, regardless of batch size or server load. This eliminates the small numerical differences and makes the results truly deterministic. In one experiment, a thousand repetitions of the same prompt without batch invariance yielded eighty different answers; with the new approach, all thousand runs gave exactly the same result. The price is a moderate performance penalty, often between twenty and fifty percent, but the researchers stress that this is acceptable in practice.
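The essence of the fix can be sketched in the same toy terms: fix the reduction order once, so a row's result no longer depends on how many rows accompany it. This is only an illustration of the principle; the lab's actual work is batch-invariant GPU kernels for RMSNorm, matrix multiplication, and attention.

```python
def rowsum_fixed(batch, chunk=2):
    """Batch-invariant toy reduction: the per-row chunk size is a
    constant, never a function of batch size or server load, so the
    additions for any given row always happen in the same order."""
    out = []
    for row in batch:
        partials = [sum(row[i:i + chunk]) for i in range(0, len(row), chunk)]
        out.append(sum(partials))
    return out

row = [1e16, 1.0, -1e16, 1.0]

alone = rowsum_fixed([row])[0]
batched = rowsum_fixed([row] * 64)[0]
print(alone == batched)  # True: bit-identical, whatever the batch size
```

The design choice is the trade-off the article describes: a fixed reduction order forgoes some batch-size-specific tuning, which is where the performance penalty comes from.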

According to Thinking Machines, this is more than a technical detail. For researchers, it makes experiments more reproducible; for companies, it makes debugging and testing easier and more reliable. In reinforcement learning the researchers even speak of a breakthrough, because training and sampling can now deliver bit-identical results and thus remain truly on-policy.

Thinking Machines Lab presented this work as the first contribution in a new blog series called Connectionism. The company aims to share more publications, code, and research results to foster an open research culture. Thinking Machines has now raised $2 billion in seed funding and has managed to attract a team of former OpenAI researchers.

The company is developing its first product, which will cater to researchers and startups seeking to adapt or customize their models. Whether the batch invariance techniques will be incorporated directly into this product has not yet been confirmed. Still, the vision is clear: AI must not only be powerful, but also consistent and reliable.