
DeepSeek-V3 overcomes challenges of Mixture of Experts technique

High-quality answers in an energy-efficient way


DeepSeek is releasing the third version of its model as an open-source product. The model contains 671 billion parameters, but it doesn’t activate all of them at once when generating responses.

DeepSeek, a Chinese AI developer, competes with commercial developers through open-source products, and it is regularly successful in doing so. The latest example is DeepSeek-V3, which is available for download via Hugging Face.

The model improves upon its predecessor and surpasses Llama 3.1 405B and Qwen2.5 72B in benchmarks, particularly excelling in coding tasks and mathematical calculations. While it slightly underperforms compared to Anthropic and OpenAI models, it introduces innovative features that will contribute to future LLM development.

Mixture of Experts

DeepSeek-V3 is based on a MoE (Mixture of Experts) architecture, a technique that has already proven its worth for other players. Microsoft, for example, launched its Phi-3.5 models on the same foundation last summer.

The Mixture of Experts technique combines multiple specialized sub-models, called “experts,” each with its own domain expertise. Based on the input query or prompt, a routing mechanism forwards the request to the most suitable expert(s), so that the user gets the best possible result without the entire model having to do the work.
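To make the routing idea concrete, below is a minimal, purely illustrative sketch in Python. The expert count, top-k value and random router are invented for the example and say nothing about DeepSeek’s actual implementation.

```python
# Minimal sketch of Mixture-of-Experts routing (illustrative only; the expert
# count, top-k value and random weights are assumptions for this example).
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical number of experts
TOP_K = 2         # how many experts each token is routed to
HIDDEN = 16       # hypothetical hidden dimension

# A tiny "router" that scores every expert for a given token representation.
router_weights = rng.normal(size=(HIDDEN, NUM_EXPERTS))

def route(token_vec):
    """Return the indices and mixing weights of the experts chosen for one token."""
    scores = token_vec @ router_weights   # one score per expert
    top = np.argsort(scores)[-TOP_K:]     # keep only the best TOP_K experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over the chosen experts
    return top, weights

token = rng.normal(size=HIDDEN)
experts, weights = route(token)
print("Token routed to experts", experts, "with weights", weights.round(2))
# Only the selected experts run their feed-forward computation; the others stay idle.
```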

More energy efficient

This approach enhances efficiency and reduces hardware requirements. Although the complete LLM contains 671 billion parameters, only around 37 billion of them are activated for any given token. This sparse activation makes query processing significantly more energy-efficient.
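A quick back-of-the-envelope calculation, using only the figures mentioned above, shows how small the active share per token is:

```python
# Share of the model that is active for a single token (figures from the text).
total_params = 671e9    # total parameters in DeepSeek-V3
active_params = 37e9    # parameters activated per token
print(f"Active share per token: {active_params / total_params:.1%}")  # roughly 5.5%
```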

The MoE technique also provides training advantages. The model was trained on 14.8 trillion tokens in 2.788 million GPU hours, which is relatively modest compared to other projects that keep tens of thousands of GPUs running for days. This training method also keeps development costs down, an expense that still plagues OpenAI to this day.
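Those two published figures also imply a rough training throughput, which can be derived directly:

```python
# Throughput implied by the published training figures (no further assumptions).
tokens = 14.8e12        # training tokens
gpu_hours = 2.788e6     # GPU hours spent on training
print(f"{tokens / gpu_hours:,.0f} tokens processed per GPU hour")  # about 5.3 million
```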

Also read: OpenAI’s business model isn’t working as bankruptcy looms

Constraint addressed

All this efficiency comes with a downside. Earlier developers ran up against the fact that the workload is distributed unevenly across the various “experts”: some experts receive far more queries than others. That imbalance can hurt the quality of the answers the model then gives.
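The synthetic sketch below illustrates that imbalance: a naive router with a built-in preference sends most tokens to a couple of experts while the others see very little traffic. It demonstrates the problem, not DeepSeek’s remedy.

```python
# Illustration of the MoE load-balancing problem with synthetic numbers:
# a skewed router overloads a few experts and starves the rest.
import numpy as np

rng = np.random.default_rng(1)
NUM_EXPERTS, TOP_K, HIDDEN, NUM_TOKENS = 8, 2, 16, 10_000

router_weights = rng.normal(size=(HIDDEN, NUM_EXPERTS))
router_bias = np.array([2.0, 1.5, 0, 0, 0, 0, 0, 0])   # built-in preference for experts 0 and 1

tokens = rng.normal(size=(NUM_TOKENS, HIDDEN))
scores = tokens @ router_weights + router_bias
chosen = np.argsort(scores, axis=1)[:, -TOP_K:]        # top-k experts per token

counts = np.bincount(chosen.ravel(), minlength=NUM_EXPERTS)
print("Tokens per expert:", counts)
# The overloaded experts become a bottleneck, while rarely chosen experts see
# too little data to specialize well: the quality problem described above.
```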

DeepSeek claims to have developed a method to avoid these problems. This method is called attention and identifies the key elements in a sentence. The technique itself isn’t new, but DeepSeek’s implementation makes multiple passes over the input to capture important details that might be overlooked on a first read.
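For readers unfamiliar with the mechanism, here is textbook scaled dot-product self-attention in a few lines of Python. It shows how a model weighs which parts of a sentence matter for each token; it is the generic version, not DeepSeek’s specific variant or its multi-pass implementation.

```python
# Generic scaled dot-product self-attention (textbook version, for illustration).
import numpy as np

def attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, dim)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights per token
    return weights @ V                              # weighted mix of the value vectors

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 8))     # 5 tokens with 8-dimensional embeddings
out = attention(x, x, x)        # self-attention over the "sentence"
print(out.shape)                # (5, 8)
```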

Finally, DeepSeek-V3 deploys another trick to enable faster inference: the model generates multiple tokens at a time, whereas other models handle tokens one by one.
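The toy example below shows why producing several tokens per step cuts the number of decoding steps. The “model” is a stand-in that returns pre-set tokens, so only the call counting matters; it is not how DeepSeek-V3 actually predicts multiple tokens.

```python
# Toy contrast: one token per decoding step versus several tokens per step.
TARGET = ["DeepSeek", "-", "V3", " is", " an", " open", "-source", " model"]

def predict_next(prefix):
    return TARGET[len(prefix)]                   # one token per model call

def predict_next_k(prefix, k=2):
    return TARGET[len(prefix):len(prefix) + k]   # k tokens per model call

out, calls = [], 0
while len(out) < len(TARGET):                    # classic decoding: 8 calls
    out.append(predict_next(out))
    calls += 1
print(calls, "calls with single-token decoding")

out, calls = [], 0
while len(out) < len(TARGET):                    # multi-token decoding: 4 calls
    out.extend(predict_next_k(out, k=2))
    calls += 1
print(calls, "calls with multi-token decoding")
```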

Currently, the new version is offered at the same price as DeepSeek-V2. As of Feb. 8, this will change.