
DeepSeek Coder V2: Chinese open source model challenges America


China’s DeepSeek is grabbing attention with the release of DeepSeek Coder V2. The model is aimed at developers and supports 338 programming languages. Moreover, it is the first open-source model to surpass GPT-4 Turbo in benchmarks.

DeepSeek Coder V2 wins out over GPT-4 Turbo, but the contest between the two models has several dimensions. It pits Chinese innovation against American innovation, and open-source AI against a closed model.

The U.S. currently has an edge in AI development, as the top-scoring models come mainly from American companies. With DeepSeek, China can shake up that market. The newly released model challenges not only OpenAI, but also Anthropic with Claude 3 Opus and Google with Gemini 1.5 Pro.

Benchmarks show specialization in coding

The model’s performance is noteworthy, but what can a developer expect in practice? It assists with coding tasks and handles 338 programming languages. Thanks to a context window of 128,000 tokens, even more complex tasks are no problem. The context window determines how much input a user can give the AI in a single prompt.

The Chinese model specializes in coding and mathematics. This is evident from benchmarks such as MBPP+, HumanEval and Aider, which evaluate code generation, code editing and problem-solving ability. Only GPT-4o scores better on these tests. A summary of the benchmark results is shown in the chart below.

Targeted performance

Developers can exploit these specializations to get the most out of the model. DeepSeek uses a Mixture-of-Experts design, so only a subset of expert parameters is active at any time: 2.4 billion active parameters for the smaller model of 16 billion parameters, and 21 billion active parameters for the expanded model of 236 billion parameters. The model acquired these specializations through training on datasets of six trillion tokens, consisting mainly of coding- and math-related data.
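To make those figures concrete, the short sketch below (an illustration, not DeepSeek code) works out what share of each model’s parameters is active per token.

```python
# Rough illustration of the Mixture-of-Experts figures quoted above:
# only a fraction of each model's total parameters is active for a given token.
models = [
    ("Lite (16B total)", 16.0, 2.4),   # smaller model: 2.4B active parameters
    ("Full (236B total)", 236.0, 21.0) # expanded model: 21B active parameters
]

for name, total_b, active_b in models:
    print(f"{name}: {active_b}B of {total_b}B parameters active (~{active_b / total_b:.0%} per token)")
```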

The model is available on Hugging Face. It can also be accessed via an API, although that is not free.
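For developers who want to try it locally, a minimal sketch along the following lines should work with the Hugging Face transformers library. The repository id deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct and the generation settings are assumptions; check the model card on Hugging Face for the exact name and recommended usage.

```python
# Minimal sketch: loading the smaller instruct variant with Hugging Face transformers.
# The repo id below is an assumption based on DeepSeek's Hugging Face organization;
# verify it against the model card before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory use versus fp32
    device_map="auto",           # spread layers over available GPUs
    trust_remote_code=True,
)

# Ask the model for a small coding task via its chat template.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Strip the prompt tokens and print only the generated answer.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```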

Also read: AI coding tools divide developers’ opinions