Defeating Nondeterminism in LLM Inference
Published on: 11 September 2025
Tags: #non-determinism #ai #llm
The Root Cause of Non-Determinism
graph TD
A["User Experience:
Non-Deterministic Output
(Different results for the
same query)"] --> B{Why does this happen?};
B --> C["LLM Inference Server uses
Dynamic Batching"];
C --> D[Batch size changes constantly
based on server load];
D --> E{Core Issue: Lack of 'Batch
Invariance' in GPU Kernels};
E --> F["The kernel changes its computation strategy
(e.g., how it sums numbers)
based on batch size"];
F --> G[This changes the order of
floating-point math
operations];
G --> H["Fundamental Property:
Floating-point addition is not associative,
so (a + b) + c can differ from a + (b + c)"];
H --> I[Tiny numerical differences
appear and accumulate];
I --> A;
style E fill:#f9f,stroke:#333,stroke-width:2px
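Non-associativity is easy to observe directly. Here is a minimal check in Python, whose `float` is an IEEE 754 double. The variable names are arbitrary and chosen only for this illustration.

```python
# Floating-point addition is not associative: grouping the same three
# operands differently changes how intermediate results are rounded.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# Mixed magnitudes make the effect dramatic: 1.0 is smaller than half the
# spacing between adjacent doubles near 1e20, so it is rounded away when it
# is added to -1e20 before the two large terms cancel.
x, y, z = 1e20, -1e20, 1.0
print((x + y) + z)  # 1.0
print(x + (y + z))  # 0.0
```

Lower-precision formats common on GPUs, such as bfloat16, keep far fewer mantissa bits, so reordering the same sum typically perturbs the result more.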
How Different Batch Sizes Lead to Different Outputs
flowchart TD
subgraph "Two Identical User Requests"
Req1(Request 1)
Req2(Request 2)
end
Req1 --> Server1{Server is busy}
Req2 --> Server2{Server is idle}
Server1 --> Batch1[Processed in a LARGE batch]
Server2 --> Batch2[Processed in a SMALL batch]
Batch1 --> Kernel1["Enough rows to occupy every core:
each core sums its own row
sequentially"]
Batch2 --> Kernel2["Too few rows to occupy every core:
kernel splits each row's sum across
several cores (split reduction)"]
Kernel1 --> Ops1["Math Order A:
((a+b) + c) + d"]
Kernel2 --> Ops2["Math Order B:
(a+b) + (c+d)"]
Ops1 --> Result1[Result is 1.234567]
Ops2 --> Result2[Result is 1.234568]
Result1 --> Output1["Final Output A"]
Result2 --> Output2["Final Output B"]
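The two reduction orders in this diagram (one worker summing a row left to right, versus splitting the row into chunks whose partial sums are combined) can be simulated on the CPU. The sketch below is plain Python with hypothetical helper names (`sequential_reduce`, `split_reduce`); it models only the order of additions, not real GPU kernels. It uses explicit loops instead of the built-in `sum()`, because since Python 3.12 `sum()` over floats uses compensated (Neumaier) summation, which would mask the rounding differences.

```python
from typing import List


def sequential_reduce(row: List[float]) -> float:
    """One worker adds the whole row strictly left to right: ((a+b)+c)+d."""
    total = 0.0
    for x in row:
        total += x
    return total


def split_reduce(row: List[float], num_splits: int) -> float:
    """Split the row into `num_splits` contiguous chunks, reduce each chunk
    separately (as if on different cores), then combine the partial sums.
    With 2 splits of [a, b, c, d] this computes (a+b) + (c+d)."""
    chunk = -(-len(row) // num_splits)  # ceiling division
    partials = [sequential_reduce(row[i:i + chunk])
                for i in range(0, len(row), chunk)]
    return sequential_reduce(partials)


row = [1e20, 1.0, -1e20, 1.0]  # mixed magnitudes make rounding visible
print(sequential_reduce(row))  # 1.0
print(split_reduce(row, 2))    # 0.0
```

Both functions add exactly the same four numbers; only the grouping differs, and that alone changes the answer.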
The Solution and Experimental Proof
graph TD
subgraph "Problem"
P[Non-Deterministic LLM
Inference]
end
subgraph "Proposed Solution"
S[Engineer 'Batch-Invariant'
Kernels]
S_Desc["Each kernel uses the same reduction strategy,
and therefore the same order of floating-point
operations, no matter the batch size."]
S --> S_Desc
end
subgraph "Experiment: 1 Prompt, 1000 Requests at Temperature 0"
Before["Before (Default Kernels)"] --> Result_Before["Result: 80 Unique Outputs
(Non-Deterministic)"]
After["After (Batch-Invariant
Kernels)"] --> Result_After["Result: All 1000 Outputs
Identical (Deterministic)"]
end
P --> S
S --> After
style Result_Before fill:#fbb,stroke:#333,stroke-width:2px
style Result_After fill:#bbf,stroke:#333,stroke-width:2px
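To make the idea concrete, here is a toy CPU model in plain Python rather than GPU code. All names (`NUM_CORES`, `batch_variant_row_sums`, `batch_invariant_row_sums`) and the split heuristic are hypothetical; real batch-invariant kernels have to fix the reduction strategy inside operations such as RMSNorm, matmul, and attention. The loop mimics the shape of the experiment by serving one fixed input 1000 times in batches of varying size, but its counts describe only this toy, not the LLM result.

```python
from typing import List

Row = List[float]

NUM_CORES = 2  # hypothetical number of parallel workers in this toy model


def sequential_reduce(values: Row) -> float:
    """Add values strictly left to right (explicit loop, no compensated sum)."""
    total = 0.0
    for v in values:
        total += v
    return total


def split_reduce(row: Row, num_splits: int) -> float:
    """Reduce `num_splits` contiguous chunks separately, then combine partials."""
    chunk = -(-len(row) // num_splits)  # ceiling division
    partials = [sequential_reduce(row[i:i + chunk])
                for i in range(0, len(row), chunk)]
    return sequential_reduce(partials)


def batch_variant_row_sums(batch: List[Row]) -> List[float]:
    # Hypothetical heuristic: when there are fewer rows than cores, split each
    # row's reduction across the spare cores. The order of additions for a
    # given row therefore depends on how many other rows share the batch.
    num_splits = max(1, NUM_CORES // len(batch))
    return [split_reduce(row, num_splits) for row in batch]


def batch_invariant_row_sums(batch: List[Row]) -> List[float]:
    # One fixed reduction order for every row, whatever the batch size.
    return [sequential_reduce(row) for row in batch]


# Toy version of "1 prompt, 1000 requests": the same row is served in
# batches of size 1..8 alongside other, unrelated rows.
row = [1e20, 1.0, -1e20, 1.0]
filler = [2.0, 3.0, 4.0, 5.0]
variant_results, invariant_results = set(), set()
for i in range(1000):
    batch = [row] + [filler] * (i % 8)  # batch size cycles through 1..8
    variant_results.add(batch_variant_row_sums(batch)[0])
    invariant_results.add(batch_invariant_row_sums(batch)[0])

print(len(variant_results), len(invariant_results))  # 2 1
```

The invariant version gives up the option of splitting a row's reduction when the batch is small, so in this model some cores can sit idle; that is the kind of performance trade-off batch invariance may involve.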
Sources: