2025-01-30

The new AI model Tülu 3 405B is a larger version of Tülu 3, which was released in November. According to its developer, Ai2, the model outperforms DeepSeek V3 on several benchmarks, a result the company attributes to its post-training recipe.

Shortly after DeepSeek's release, competition in the AI world is heating up. Alibaba has already come out with a model that it says also outperforms DeepSeek. Now it is Ai2's turn with Tülu 3 405B, which the company used to stress-test its Reinforcement Learning from Verifiable Rewards (RLVR) approach and its training infrastructure at scale. RLVR reinforces specific skills by rewarding the model only when its output can be checked as correct, such as the final answer to a math problem.
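To make the idea of a verifiable reward concrete, here is a minimal Python sketch of the kind of check RLVR relies on, assuming a math task whose reference answer is a single number. The function names `extract_final_number` and `verifiable_reward`, the "take the last number" heuristic, and the binary reward value are hypothetical illustrations, not Ai2's actual code.

```python
import re
from typing import Optional

# Hypothetical heuristic: treat the last number in the completion as the final answer.
NUMBER_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(completion: str) -> Optional[str]:
    """Return the last number in the text with thousands separators removed, or None."""
    matches = NUMBER_PATTERN.findall(completion)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def verifiable_reward(completion: str, ground_truth: str, reward_value: float = 1.0) -> float:
    """Binary reward (an assumption): reward_value if the answer checks out, else 0.0."""
    answer = extract_final_number(completion)
    if answer is None:
        return 0.0  # nothing to verify, so no reward
    try:
        is_correct = float(answer) == float(ground_truth)
    except ValueError:
        return 0.0  # reference answer was not numeric
    return reward_value if is_correct else 0.0


if __name__ == "__main__":
    print(verifiable_reward("3 + 4 = 7, so the answer is 7.", "7"))  # 1.0
    print(verifiable_reward("The answer is 8.", "7"))                # 0.0
```

Because the reward comes from a hard check rather than a learned reward model, a completion earns nothing for an answer that merely sounds plausible.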

RLVR is one component of Ai2's post-training recipe, which also includes:

Careful data acquisition and synthesis, focusing on core skills

Supervised fine-tuning (SFT) on a carefully selected mix of prompts and their completions

Direct Preference Optimization (DPO) on both off- and on-policy preference data

A standardized evaluation suite for the development, decontamination and final evaluation phases
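The DPO step in the list above trains the model directly on pairs of preferred ("chosen") and rejected responses. Below is a minimal Python sketch of the standard per-pair DPO loss, assuming you already have the summed log-probabilities of each response under the model being trained and under a frozen reference model. The function names and the `beta` value of 0.1 are illustrative assumptions; Ai2's exact DPO variant and hyperparameters may differ. Whether the preference data is off-policy or on-policy changes where the two responses come from, not the form of the loss.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not Ai2's setting
) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more (in log space) the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


if __name__ == "__main__":
    # The policy already prefers the chosen answer more than the reference does,
    # so the loss (about 0.598) is below log(2).
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

The loss equals log 2 when the model and the reference agree, drops below it when the model favors the chosen response more strongly than the reference does, and rises above it otherwise; minimizing it pushes the model toward the preferred answers.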

Where does the claim come from?

DeepSeek quickly attracted attention with its open-source approach and its ability to run on cheaper hardware, while benchmarks show that its performance is strong. It is precisely on benchmarks that Ai2 now claims Tülu 3 405B can beat DeepSeek. The model scores particularly well on PopQA (factual knowledge recalled without outside sources), GSM8K (math word problems), and HumanEval+ (code generation). Below is the comparison chart.

On some tests, however, DeepSeek performs better, and Llama 3.1 and GPT-4o also do well. On BigBenchHard and MATH, for example, DeepSeek scores higher, suggesting it is stronger at reasoning and math, respectively.