Defeating Nondeterminism in LLM Inference
Published on: September 11, 2025
Tags: #non-determinism #ai #llm
The Root Cause of Non-Determinism
graph TD
    A["User Experience:<br/>Non-Deterministic Output<br/>(Different results for the<br/>same query)"] --> B{"Why does this happen?"}
    B --> C["LLM Inference Server uses<br/>Dynamic Batching"]
    C --> D["Batch size changes constantly<br/>based on server load"]
    D --> E{"Core Issue: Lack of 'Batch<br/>Invariance' in GPU Kernels"}
    E --> F["The kernel changes its computation strategy<br/>(e.g., how it sums numbers)<br/>based on batch size"]
    F --> G["This changes the order of<br/>floating-point math<br/>operations"]
    G --> H["Fundamental Property:<br/>Floating-point math is not associative<br/>(a + b) + c ≠ a + (b + c)"]
    H --> I["Tiny numerical differences<br/>appear and accumulate"]
    I --> A
    style E fill:#f9f,stroke:#333,stroke-width:2px
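The non-associativity at the bottom of this chain is easy to check directly. The following is a minimal Python sketch, assuming standard IEEE 754 double precision (which CPython uses for `float`). The specific values are chosen for illustration and do not come from any real kernel.

```python
# Floating-point addition is not associative: each intermediate sum is
# rounded, so the grouping decides which rounding errors occur.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds after a + b, then again after adding c
right = a + (b + c)  # rounds after b + c, then again after adding a

print(left == right)  # False
print(left, right)    # the two results differ in the last digit

# With very different magnitudes, the gap stops being tiny:
big = 1e20
lost = (big + 1.0) - big   # the 1.0 is absorbed by big, then big cancels
kept = (big - big) + 1.0   # big cancels first, so the 1.0 survives
print(lost, kept)          # 0.0 1.0
```

Any kernel that reorders a sum is exposed to the same effect, just at larger scale and with smaller individual differences.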
How Different Batch Sizes Lead to Different Outputs
flowchart TD
    subgraph "Two Identical User Requests"
        Req1(Request 1)
        Req2(Request 2)
    end
    Req1 --> Server1{Server is busy}
    Req2 --> Server2{Server is idle}
    Server1 --> Batch1[Processed in a LARGE batch]
    Server2 --> Batch2[Processed in a SMALL batch]
    Batch1 --> Kernel1["Kernel splits calculation<br/>across many cores for<br/>efficiency"]
    Batch2 --> Kernel2["Kernel uses a different<br/>(less parallel) calculation<br/>path"]
    Kernel1 --> Ops1["Math Order A:<br/>(a+b) + (c+d)"]
    Kernel2 --> Ops2["Math Order B:<br/>((a+b) + c) + d"]
    Ops1 --> Result1[Result is 1.234567]
    Ops2 --> Result2[Result is 1.234568]
    Result1 --> Output1["Final Output A"]
    Result2 --> Output2["Final Output B"]
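The two math orders in the diagram can be imitated on a CPU. This is a rough sketch, not a real GPU kernel: `reduce_in_chunks` and `pick_num_chunks` are hypothetical names, and the batch-size threshold (and which side of it triggers the split) is an illustrative assumption, since real libraries choose a strategy from the hardware and tensor shapes. The input row uses exaggerated magnitudes so that the difference is visible at a glance.

```python
def reduce_in_chunks(values, num_chunks):
    """Sum `values` by summing contiguous chunks, then adding the partials.

    num_chunks=1 reproduces Math Order B: ((a+b) + c) + d
    num_chunks=2 reproduces Math Order A: (a+b) + (c+d)
    """
    chunk_len = -(-len(values) // num_chunks)  # ceiling division
    total = 0.0
    for start in range(0, len(values), chunk_len):
        partial = 0.0
        for v in values[start:start + chunk_len]:
            partial += v
        total += partial
    return total


def pick_num_chunks(batch_size):
    # Hypothetical heuristic standing in for a kernel's strategy selection.
    # Assumption: mirrors the diagram, where the large batch takes the
    # split path. Real kernels may decide differently.
    return 2 if batch_size >= 8 else 1


# The same request row, processed under two different server loads.
row = [1e20, 1.0, -1e20, 1.0]

result_busy = reduce_in_chunks(row, pick_num_chunks(batch_size=32))
result_idle = reduce_in_chunks(row, pick_num_chunks(batch_size=1))

print(result_busy)  # 0.0  (Math Order A)
print(result_idle)  # 1.0  (Math Order B)
```

The request data never changed. Only the batch size did, and that alone was enough to change the result.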
The Solution and Experimental Proof
graph TD
    subgraph "Problem"
        P["Non-Deterministic LLM<br/>Inference"]
    end
    subgraph "Proposed Solution"
        S["Engineer 'Batch-Invariant'<br/>Kernels"]
        S_Desc["This forces the kernel to use the same computation path<br/>and math order, no matter<br/>the batch size."]
        S --> S_Desc
    end
    subgraph "Experiment: 1 Prompt, 1000 Requests"
        Before["Before (Default Kernels)"] --> Result_Before["Result: 80 Unique Outputs<br/>(Non-Deterministic)"]
        After["After (Batch-Invariant<br/>Kernels)"] --> Result_After["Result: 1 Identical Output<br/>(Deterministic!)"]
    end
    P --> S
    S --> After
    style Result_Before fill:#fbb,stroke:#333,stroke-width:2px
    style Result_After fill:#bbf,stroke:#333,stroke-width:2px
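The idea behind a batch-invariant kernel can be sketched in a few lines of Python: fix the reduction layout once and never let the batch size influence it. This is a toy model under stated assumptions, not the kernels used in the experiment. `default_sum`, `batch_invariant_sum`, and `FIXED_CHUNK_LEN` are hypothetical names, the batch-size heuristic is invented for illustration, and the counts it prints (2 versus 1) only mirror the shape of the 80 versus 1 result.

```python
FIXED_CHUNK_LEN = 2  # hypothetical constant: the one layout we always use


def chunked_sum(values, chunk_len):
    """Sum contiguous chunks of `chunk_len`, then add the partials in order."""
    total = 0.0
    for start in range(0, len(values), chunk_len):
        partial = 0.0
        for v in values[start:start + chunk_len]:
            partial += v
        total += partial
    return total


def default_sum(values, batch_size):
    # Assumed heuristic: the reduction layout depends on the batch size,
    # so the math order changes with server load.
    chunk_len = 2 if batch_size >= 8 else len(values)
    return chunked_sum(values, chunk_len)


def batch_invariant_sum(values, batch_size):
    # batch_size is ignored on purpose: same layout, same math order, always.
    return chunked_sum(values, FIXED_CHUNK_LEN)


# One "prompt", 1000 requests, each landing in a batch of a different size.
row = [1e20, 1.0, -1e20, 1.0]
batch_sizes = [1 + (i % 16) for i in range(1000)]

default_outputs = {default_sum(row, b) for b in batch_sizes}
invariant_outputs = {batch_invariant_sum(row, b) for b in batch_sizes}

print(len(default_outputs))    # 2 unique results
print(len(invariant_outputs))  # 1 unique result
```

The batch-invariant result is not more accurate than the default one. It is simply the same every time, which is the property the experiment measures.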