New Tülu 3 claims to beat DeepSeek

Tülu 3 405B is a new, larger version of Tülu 3, which was released in November. According to developer Ai2, the model beats DeepSeek thanks to its post-training recipe.

Shortly after the release of DeepSeek, competition in the AI world is heating up. Alibaba has already released a model that it says outperforms DeepSeek. Now it is Ai2’s turn with Tülu 3 405B, which the organization used to stress-test its Reinforcement Learning from Verifiable Rewards (RLVR) approach and training infrastructure. RLVR reinforces specific skills of the model by rewarding outputs whose correctness can be checked automatically.
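To make the idea of a verifiable reward concrete, here is a minimal Python sketch. It is not Ai2's implementation: the function names, the number-extraction rule, and the reward value of 1.0 are all assumptions chosen for illustration. It shows only the core principle, which is that the reward comes from an automatic check against a known answer rather than from a learned reward model.

```python
import re
from typing import Optional


def extract_final_number(completion: str) -> Optional[str]:
    """Return the last number in the text, without thousands separators.

    Hypothetical helper: treating the last number as the model's final
    answer is an assumption made for this sketch.
    """
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def verifiable_reward(completion: str, ground_truth: str,
                      reward_value: float = 1.0) -> float:
    """Return reward_value if the final number matches the ground truth, else 0.0.

    The default reward_value of 1.0 is an illustrative assumption.
    """
    predicted = extract_final_number(completion)
    if predicted is None:
        return 0.0
    try:
        expected = float(ground_truth.replace(",", ""))
        return reward_value if float(predicted) == expected else 0.0
    except ValueError:
        # Unparseable ground truth or prediction earns no reward.
        return 0.0


if __name__ == "__main__":
    print(verifiable_reward("18 - 6 = 12, so she has 12 apples.", "12"))
    print(verifiable_reward("I believe the answer is 13.", "12"))
```

In a reinforcement learning loop, a score like this would be computed for each sampled completion and passed to the policy update. That loop is left out here because the article does not describe it.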

RLVR is one component of the post-training recipe that Ai2 employs. The recipe also includes the following:

Careful data acquisition and synthesis, focusing on core skills
Supervised fine-tuning (SFT) on a carefully selected mix of prompts and their completions
Direct Preference Optimization (DPO) on both off- and on-policy preference data
A standardized evaluation suite for the development, decontamination and final evaluation phases

Where does the claim come from?

DeepSeek quickly attracted attention with its open-source approach and its ability to run on cheaper hardware. At the same time, its performance is good, as benchmarks show. It is precisely with benchmarks that Ai2 now aims to show that it can beat DeepSeek. Tülu 3 scores especially well on PopQA (factual recall from the model’s own knowledge), GSM8K (grade-school math word problems), and HumanEval+ (code generation). Below is the comparison chart.

Table of performance benchmarks for various models, with the Tülu 3 405B variants marked as the best performers.

However, DeepSeek performs better in some tests, and Llama 3.1 and GPT-4o also do well. For example, the BigBenchHard and MATH benchmarks show DeepSeek ahead in reasoning and math, respectively.

