DeepSeek introduces its experimental V3.2-Exp model with sparse attention technology. The technique promises to process long texts far more efficiently while maintaining output quality virtually identical to that of the previous V3.1-Terminus model.
Chinese AI company DeepSeek has launched V3.2-Exp, an intermediate step towards its next-generation architecture. The experimental version builds on the V3.1-Terminus model and introduces DeepSeek Sparse Attention (DSA), a sparse attention technology that is expected to significantly improve training and inference efficiency in long contexts.
V3.2-Exp is immediately available to developers through various platforms. HuggingFace provides access to the model, while vLLM offers day-0 support. The model works on various hardware configurations, from Nvidia H200 to AMD chips.
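Given the day-0 vLLM support, running the model can be as simple as the sketch below. The model identifier and parallelism settings are assumptions; check the model card on HuggingFace for the exact values.

```python
# Minimal offline-inference sketch with vLLM. The model ID and
# tensor_parallel_size are assumptions to adapt to your own hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2-Exp",  # assumed HuggingFace model ID
    tensor_parallel_size=8,                 # spread the model over 8 GPUs
    trust_remote_code=True,                 # DeepSeek models ship custom code
)
params = SamplingParams(temperature=0.7, max_tokens=512)

outputs = llm.generate(["Summarize this 100-page report in three bullets."], params)
print(outputs[0].outputs[0].text)
```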
For developers who want to run the model locally, DeepSeek has made inference code available. Converting the HuggingFace model weights for local use does require adjusting the GPU configuration and expert settings.
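DeepSeek's conversion script is not reproduced here, so the snippet below is only a hypothetical sketch of that step, modeled on the convert.py interface in DeepSeek's earlier V3 inference code; every path, flag name, and value is an assumption to verify against the V3.2-Exp repository.

```python
# Hypothetical invocation of DeepSeek's weight-conversion script.
# Flag names mirror the earlier V3 inference code and are assumptions
# for V3.2-Exp; adjust expert count and parallelism to your checkpoint.
import subprocess

subprocess.run(
    [
        "python", "convert.py",
        "--hf-ckpt-path", "/models/DeepSeek-V3.2-Exp",         # downloaded HF weights
        "--save-path", "/models/DeepSeek-V3.2-Exp-converted",  # local checkpoint
        "--n-experts", "256",     # expert setting: must match the checkpoint
        "--model-parallel", "8",  # GPU configuration: degree of parallelism
    ],
    check=True,
)
```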
Sparse attention as a breakthrough
The core of the update lies in the sparse attention mechanism. This technology selects only the relevant parts of long texts for processing, drastically reducing the computing power required. Traditional attention mechanisms view each word in relation to all other words, so the required computing power grows quadratically with text length: doubling the length of a document quadruples the attention cost.
According to DeepSeek, DSA achieves “fine-grained sparse attention” for the first time. The system maintains model quality while substantially improving efficiency in long contexts. For developers, this means faster training and cheaper inference for extensive documents.
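To make the selection idea concrete, the toy sketch below keeps only the k highest-scoring keys per query. It illustrates the general principle of sparse attention, not DeepSeek's actual DSA design; the function and its parameters are invented for illustration.

```python
# Toy top-k sparse attention in NumPy: each query attends only to its
# k most relevant keys instead of the full sequence. Note that this demo
# still computes the full score matrix to find the top k, so it saves no
# compute; a real implementation avoids materializing it in the first place.
import numpy as np

def sparse_attention(q, k, v, top_k=4):
    scores = q @ k.T / np.sqrt(q.shape[-1])              # (n_q, n_kv) scores
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]   # per-row top_k cutoff
    scores = np.where(scores >= kth, scores, -np.inf)    # drop irrelevant keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over survivors
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
print(sparse_attention(q, k, v).shape)  # (16, 8)
```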
Benchmark performance
DeepSeek has thoroughly tested V3.2-Exp against the earlier V3.1-Terminus model. On benchmarks such as MMLU-Pro, both models score an identical 85.0. On programming challenges such as Codeforces, V3.2-Exp even scores slightly higher: 2121 versus 2046 for V3.1-Terminus. The company states that it deliberately used identical training configurations to enable a fair comparison.
DeepSeek has also released open-source kernels. TileLang offers kernels for research purposes, while DeepGEMM and FlashMLA provide high-performance CUDA kernels for production use. These tools are designed to help developers maximize their use of sparse attention.
The V3.2-Exp model operates under an MIT license, allowing for both commercial and academic use. For organizations working with lengthy documents, sparse attention technology can lead to a significant improvement in efficiency.
Read also: DeepSeek delayed by GPU export restrictions