vLLM: temperature=NaN and temperature=Infinity bypass validation and propagate to GPU kernels
🔗 CVE IDs covered (1)
📋 Description
Summary
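Every temperature validation gate relies on comparison operators (<, >). Under Python's IEEE 754 float semantics, any ordered comparison involving NaN evaluates to False, and for positive Infinity the specific checks used here (temperature < _MAX_TEMP and temperature < 0.0) also evaluate to False, so neither guard fires. Both values pass every guard and propagate to the GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker. Note: -Infinity is correctly caught by the temperature < 0.0 check.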
Root Cause
sampling_params.py:384:
if 0 < self.temperature < _MAX_TEMP: # NaN → False; +Inf → False
sampling_params.py:462:
if self.temperature < 0.0: # NaN → False; +Inf → False
    raise VLLMValidationError(...)
No math.isnan() or math.isinf() check exists anywhere in sampling_params.py.
Python semantics (verified): float('nan') < 0.0 → False, float('inf') < 0.0 → False.
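The comparison behavior described above can be reproduced in isolation. The following is a minimal sketch, not vLLM code: guard_results is a hypothetical helper, and max_temp is a placeholder threshold, since the actual value of _MAX_TEMP is not given in this advisory.

```python
def guard_results(temperature: float, max_temp: float = 1e-2) -> tuple[bool, bool]:
    """Evaluate the two comparison-based guards for a given temperature.

    max_temp is a placeholder; the real _MAX_TEMP value is an assumption here.
    """
    # Same shape as the range check at sampling_params.py:384.
    range_guard = 0 < temperature < max_temp
    # Same shape as the negativity check at sampling_params.py:462.
    negative_guard = temperature < 0.0
    return range_guard, negative_guard


print(guard_results(float("nan")))   # (False, False): neither guard fires
print(guard_results(float("inf")))   # (False, False): neither guard fires
print(guard_results(float("-inf")))  # (False, True): the negativity check fires
```

Only -Infinity trips a guard; NaN and +Infinity fall through both conditions unchanged.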
Impact
A request with a NaN or Infinity temperature can crash the inference worker when the GPU sampling kernel executes on NaN/Inf softmax input, degrading service for all concurrent users.
Remediation
Add a math.isfinite(self.temperature) check in _verify_args() and reject non-finite float values with a 400 error.
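One way the recommended check could look is sketched below. This is an illustration under stated assumptions, not the merged patch: TemperatureValidationError is a hypothetical stand-in for VLLMValidationError, verify_temperature is a hypothetical free function rather than the real _verify_args() method, and mapping the exception to an HTTP 400 response is assumed to happen elsewhere in the server.

```python
import math


class TemperatureValidationError(ValueError):
    """Hypothetical stand-in for vLLM's VLLMValidationError."""


def verify_temperature(temperature: float) -> float:
    """Reject NaN, +Inf, -Inf, and negative temperatures before sampling."""
    # math.isfinite returns False for NaN and for both infinities, so this
    # single explicit check closes the gap left by the comparison-only guards.
    if not math.isfinite(temperature):
        raise TemperatureValidationError(
            f"temperature must be a finite number, got {temperature!r}"
        )
    # The existing negativity check is kept for ordinary negative values.
    if temperature < 0.0:
        raise TemperatureValidationError(
            f"temperature must be non-negative, got {temperature!r}"
        )
    return float(temperature)
```

Placing the finiteness test first means every later comparison operates only on ordinary finite values, so those comparisons behave as their authors intended.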
Fix
A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/45116
🎯 Affected products (1)
- pip/vllm:<= 0.23.0