DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention
Published on: October 06, 2025
High-Level Architectural Shift
```mermaid
graph TD
    subgraph A["DeepSeek-V3.1-Terminus (Dense Attention)"]
        direction TB
        style A fill:#f9f,stroke:#333,stroke-width:2px
        Input1[Input Query] --> Attention1{Core Attention};
        All_KV1[All Key-Value Pairs from Context] --> Attention1;
        Attention1 --> Output1[Output];
        Complexity1["Complexity: O(L²)"] -.-> Attention1;
    end
    subgraph B["DeepSeek-V3.2-Exp (Sparse Attention)"]
        direction TB
        style B fill:#ccf,stroke:#333,stroke-width:2px
        Input2[Input Query] --> DSA{"DeepSeek Sparse Attention (DSA)"};
        All_KV2[All Key-Value Pairs from Context] --> DSA;
        DSA -- "Filters to Top-K Pairs" --> Attention2{Core Attention};
        Input2 --> Attention2;
        Attention2 --> Output2[Output];
        Complexity2["Complexity: O(Lk)"] -.-> Attention2;
    end
```
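To make the complexity contrast concrete, here is a minimal PyTorch sketch, not DeepSeek's implementation: all shapes, sizes, and function names are illustrative assumptions. Dense attention scores the query against every cached key, so per-token cost grows with the full context length L, while the sparse variant scores only k pre-selected positions.

```python
# Illustrative sketch (not DeepSeek's code): dense attention touches all L cached tokens
# per query (O(L^2) over a sequence), while sparse attention touches only k selected
# tokens (O(Lk)). Sizes below are assumed for demonstration.
import torch

def dense_attention(q, K, V):
    # q: (d,), K/V: (L, d) -> scores every cached token
    scores = K @ q / K.shape[-1] ** 0.5           # (L,)
    weights = torch.softmax(scores, dim=-1)       # (L,)
    return weights @ V                            # (d,)

def sparse_attention(q, K, V, top_idx):
    # top_idx: (k,) indices chosen by an indexer; only k << L tokens are scored
    K_sel, V_sel = K[top_idx], V[top_idx]         # (k, d)
    scores = K_sel @ q / K_sel.shape[-1] ** 0.5   # (k,)
    weights = torch.softmax(scores, dim=-1)       # (k,)
    return weights @ V_sel                        # (d,)

L, d, k = 8192, 128, 512                          # hypothetical sizes
q, K, V = torch.randn(d), torch.randn(L, d), torch.randn(L, d)
top_idx = torch.randint(0, L, (k,))               # stand-in for the indexer's selection
print(dense_attention(q, K, V).shape, sparse_attention(q, K, V, top_idx).shape)
```

Summed over a sequence of L query tokens, this is exactly the O(L²) versus O(Lk) core-attention cost shown in the diagram.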
Inside DeepSeek Sparse Attention (DSA)
graph TD subgraph "DSA Internal Workflow" A[Input: Query Token & Full Context] --> B[1.Lightning Indexer]; B -- "Computes relevancy scores for all tokens" --> C[2.Top-k Selector]; C -- "Selects only the most relevant k tokens" --> D[Output: Sparse Key-Value Pairs]; end D --> E{Main Attention Mechanism}; F[Original Query Token] --> E; E --> G[Final Output];
Core Innovation and Benefits
```mermaid
mindmap
  root((DeepSeek Sparse Attention))
    ::icon(fa fa-lightbulb)
    Core Innovation: Selective Token Processing
      Lightning Indexer
        ::icon(fa fa-bolt)
        Rapidly scores token relevance
      Top-k Selector
        ::icon(fa fa-filter)
        Picks only the highest-scored tokens
    Problem Solved
      High Computational Cost of Dense Attention
        Complexity is O(L²)
        Scales poorly with long contexts
    Key Benefits
      ::icon(fa fa-rocket)
      Improved Efficiency
        New Complexity is O(Lk)
        Faster inference for long sequences
        Reduced API & compute costs
      Comparable Performance
        Maintains model quality despite sparsity
```
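As a back-of-the-envelope illustration of the efficiency benefit (with assumed, not published, numbers): for a 128K-token context and a hypothetical selection budget of k = 2,048, core attention scores roughly 60x fewer key-value pairs per query.

```python
# Rough comparison of core-attention score counts per query token, using illustrative
# (assumed) values: a 128K-token context and k = 2,048 selected tokens.
L, k = 128_000, 2_048
dense_scores_per_query = L     # O(L) scores per query -> O(L^2) over the whole sequence
sparse_scores_per_query = k    # O(k) scores per query -> O(Lk) over the whole sequence
print(f"dense:  {dense_scores_per_query:>7,} scores per query")
print(f"sparse: {sparse_scores_per_query:>7,} scores per query")
print(f"reduction: ~{L / k:.0f}x fewer core-attention scores")
```

Note that the lightweight indexer still evaluates all L tokens, so the overall speedup depends on how cheap the indexer is relative to full attention.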