Chinese AI startup DeepSeek is expected to release its next-generation AI model V4, with strong coding capabilities, in mid-February.
According to The Information, internal tests by DeepSeek employees suggest that V4 may outperform rivals such as Anthropic’s Claude and OpenAI’s GPT series, particularly in coding tasks.
V4 has reportedly also achieved breakthroughs in processing extremely long code prompts, which could be a significant advantage for developers working on complex software projects. This long-context capability builds on the sparse attention technology (DeepSeek Sparse Attention) introduced in V3.2-Exp.
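To illustrate the general idea behind this kind of token-selecting sparse attention, here is a minimal sketch in PyTorch: each query attends only to its few highest-scoring keys instead of the full sequence, so the attention cost scales with the number of selected keys rather than the context length. The function name, dimensions, and simple top-k selection rule are illustrative assumptions, not DeepSeek's actual implementation.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_keep=4):
    """Each query attends only to its k_keep highest-scoring keys,
    so per-query cost depends on k_keep rather than the full sequence length."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (n_q, n_k) dot-product scores
    keep = scores.topk(k_keep, dim=-1).indices               # indices of selected keys per query
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, keep, 0.0)                             # 0 for kept keys, -inf for the rest
    return F.softmax(scores + mask, dim=-1) @ v              # weights of masked keys become 0

q = k = v = torch.randn(16, 32)                 # 16 tokens, 32-dimensional heads
print(topk_sparse_attention(q, k, v).shape)     # torch.Size([16, 32])
```

In production systems the selection step itself must be cheap (DeepSeek describes a lightweight indexer for this in V3.2-Exp), but the principle is the same: compute full attention weights only over a small, per-query subset of the context.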
DeepSeek uses a Mixture of Experts (MoE) architecture, which is more energy-efficient than classic dense models. V3 already had 671 billion parameters, of which only about 37 billion are activated per token.
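As a rough sketch of how MoE keeps computation low, the toy PyTorch layer below routes each token to its top 2 of 8 small feed-forward experts, so only a fraction of the layer's parameters is exercised per token. All sizes, the expert count, and the top-k value here are illustrative assumptions, far smaller than DeepSeek's production models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: a router scores the experts for each
    token and only the top-k experts actually run on that token."""

    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden),
                          nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                # x: (n_tokens, d_model)
        scores = self.router(x)                          # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only top-k experts
        weights = F.softmax(weights, dim=-1)             # normalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64]); each token used 2 of 8 experts
```

Because each token activates only 2 of the 8 experts, roughly a quarter of the expert parameters do work per token, which is the same principle that lets V3 hold 671 billion parameters while activating only about 37 billion per token.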
Growing international attention
DeepSeek has attracted worldwide attention with its efficient approach. Training the R1 model reportedly cost only $294,000, significantly less than what US companies estimate for comparable models.
Nevertheless, the company is under increasing scrutiny. Reuters previously reported that the Chinese AI startup, which claimed in January to have built a low-cost alternative to ChatGPT, is being investigated in some countries for its security and privacy practices. The launch of V4 in mid-February will reveal whether DeepSeek can further strengthen its position against the established players in the AI market.