
CVE-2025-48944 – vLLM Tool Schema allows DoS via Malformed pattern and type Fields
https://notcve.org/view.php?id=CVE-2025-48944
30 May 2025 — vLLM is an inference and serving engine for large language models (LLMs). In versions 0.8.0 up to but excluding 0.9.0, the vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted. Version 0.9.0 fixes the issue. • https://github.com/vllm-project/vllm/pull/17623 • CWE-20: Improper Input Validation •
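A minimal sketch of the kind of validation that was missing: check user-supplied "pattern" and "type" fields from a tool schema before they reach the guided-decoding backend, so a bad value yields a client error instead of crashing the worker. The function and exception names here are illustrative, not vLLM's actual fix.

```python
import re

# JSON Schema primitive type names; anything else is rejected up front.
ALLOWED_JSON_TYPES = {"string", "number", "integer", "boolean", "object", "array", "null"}

class SchemaValidationError(ValueError):
    """Raised for malformed tool-schema properties (hypothetical name)."""

def validate_tool_property(prop: dict) -> None:
    # Reject unknown "type" values instead of passing them downstream.
    type_field = prop.get("type")
    if type_field is not None and type_field not in ALLOWED_JSON_TYPES:
        raise SchemaValidationError(f"unsupported type: {type_field!r}")
    # Try compiling "pattern" here, where the failure is catchable,
    # rather than deep inside the inference worker.
    pattern = prop.get("pattern")
    if pattern is not None:
        try:
            re.compile(pattern)
        except re.error as exc:
            raise SchemaValidationError(f"invalid pattern: {exc}") from exc
```

A request handler would call this per property and map `SchemaValidationError` to an HTTP 400.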

CVE-2025-48943 – vLLM allows clients to crash the openai server with invalid regex
https://notcve.org/view.php?id=CVE-2025-48943
30 May 2025 — vLLM is an inference and serving engine for large language models (LLMs). Versions 0.8.0 up to but excluding 0.9.0 have a Denial of Service (ReDoS) vulnerability that causes the vLLM server to crash if an invalid regex is provided while using structured output. This vulnerability is similar to GHSA-6qc9-v4r8-22xg/CVE-2025-48942, but for a regex instead of a JSON schema. Version 0.9.0 fixes the issue. • https://github.com/vllm-project/vllm/commit/08bf7840780980c7568c573c70a6a8db94fd45ff • CWE-248: Uncaught Exception •
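The crash class here (CWE-248) is an uncaught `re.error` escaping from compilation of a user-supplied guided-decoding regex. A hedged sketch of the guard, with hypothetical names:

```python
import re

def compile_guided_regex(user_regex: str):
    # Catching re.error at request time turns what was a server crash
    # into an error the client can be told about.
    try:
        return re.compile(user_regex), None
    except re.error as exc:
        return None, f"invalid guided regex: {exc}"
```

Usage: `compiled, err = compile_guided_regex("(unclosed")` leaves `compiled` as `None` and `err` describing the unterminated group, keeping the server alive.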

CVE-2025-48942 – vLLM DOS: Remotely kill vllm over http with invalid JSON schema
https://notcve.org/view.php?id=CVE-2025-48942
30 May 2025 — vLLM is an inference and serving engine for large language models (LLMs). In versions 0.8.0 up to but excluding 0.9.0, hitting the /v1/completions API with an invalid json_schema as a Guided Param kills the vLLM server. This vulnerability is similar to GHSA-9hcf-v7m4-6m2j/CVE-2025-48943, but for a JSON schema instead of a regex. Version 0.9.0 fixes the issue. • https://github.com/vllm-project/vllm/commit/08bf7840780980c7568c573c70a6a8db94fd45ff • CWE-248: Uncaught Exception •
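A minimal stdlib-only sketch of the pre-flight check (not a full JSON Schema validator, and not vLLM's actual patch): reject guided params that are not even a JSON object before handing them to the guided-decoding backend.

```python
import json

def parse_guided_json_schema(raw: str):
    # Malformed input returns an error tuple instead of raising deep
    # inside the serving path and killing the process.
    try:
        schema = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"schema is not valid JSON: {exc}"
    if not isinstance(schema, dict):
        return None, "schema must be a JSON object"
    return schema, None
```

A real validator would additionally check schema keywords (e.g. with the `jsonschema` package's `check_schema`).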

CVE-2025-48887 – vLLM has a Regular Expression Denial of Service (ReDoS, Exponential Complexity) Vulnerability in `pythonic_tool_parser.py`
https://notcve.org/view.php?id=CVE-2025-48887
30 May 2025 — vLLM, an inference and serving engine for large language models (LLMs), has a Regular Expression Denial of Service (ReDoS) vulnerability in the file `vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py` of versions 0.6.4 up to but excluding 0.9.0. The root cause is the use of a highly complex and nested regular expression for tool call detection, which can be exploited by an attacker to cause severe performance degradation or make the service unavailable. The pattern contains multiple nested quantifiers. • https://github.com/vllm-project/vllm/commit/4fc1bf813ad80172c1db31264beaef7d93fe0601 • CWE-1333: Inefficient Regular Expression Complexity •
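To illustrate the vulnerability class (the patterns below are textbook examples, not the actual vLLM regex): a pattern with nested quantifiers such as `(a+)+b` backtracks exponentially on near-miss inputs like `"aaaa...c"`, while an equivalent pattern without nesting runs in linear time.

```python
import re

# Vulnerable shape: the outer + and inner + can partition the same run
# of "a"s in exponentially many ways when the final "b" is missing.
nested = re.compile(r"(a+)+b")

# Equivalent, safe rewrite: one quantifier, no ambiguous partitioning.
flat = re.compile(r"a+b")

def safe_match(s: str) -> bool:
    return flat.fullmatch(s) is not None
```

Both patterns accept the same language; only the nested one exhibits catastrophic backtracking on rejection (deliberately not demonstrated here, since it would hang the interpreter).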

CVE-2025-46722 – vLLM has a Weakness in MultiModalHasher Image Hashing Implementation
https://notcve.org/view.php?id=CVE-2025-46722
29 May 2025 — vLLM is an inference and serving engine for large language models (LLMs). In versions starting from 0.7.0 to before 0.9.0, in the file vllm/multimodal/hasher.py, the MultiModalHasher class has a security and data integrity issue in its image hashing method. Currently, it serializes PIL.Image.Image objects using only obj.tobytes(), which returns only the raw pixel data, without including metadata such as the image’s shape (width, height, mode). As a result, two images of different sizes (e.g., 30x100 and 100x30) with the same raw pixel bytes can produce the same hash, leading to hash collisions. • https://github.com/vllm-project/vllm/commit/99404f53c72965b41558aceb1bc2380875f5d848 • CWE-1023: Incomplete Comparison with Missing Factors CWE-1288: Improper Validation of Consistency within Input •
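A sketch of the flaw and the obvious remedy, using plain bytes as a stand-in for PIL images (function names are illustrative): hashing only the raw pixel bytes cannot distinguish a 30x100 image from a 100x30 one, whereas mixing the shape and mode into the hashed material does.

```python
import hashlib

def weak_hash(pixels: bytes) -> str:
    # Shape-blind: any two images with identical raw bytes collide.
    return hashlib.sha256(pixels).hexdigest()

def strong_hash(pixels: bytes, size: tuple, mode: str) -> str:
    # Prefix the digest input with metadata so shape/mode disambiguate.
    header = f"{mode}:{size[0]}x{size[1]}:".encode()
    return hashlib.sha256(header + pixels).hexdigest()
```

With `pixels = bytes(3000)`, `weak_hash` gives one value for both a 30x100 and a 100x30 grayscale image, while `strong_hash` separates them.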

CVE-2025-46570 – vLLM’s Chunk-Based Prefix Caching Vulnerable to Potential Timing Side-Channel
https://notcve.org/view.php?id=CVE-2025-46570
29 May 2025 — vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PagedAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0. • https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f • CWE-208: Observable Timing Discrepancy •
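A deterministic toy model of the side channel (chunk size and numbers are illustrative, not vLLM's): prefill cost is proportional to the number of chunks that must be computed, and a prefix-cache hit skips the matched chunks, so TTFT shrinks by a measurable amount. That cost difference is exactly what an observer can use to infer that another tenant's prompt shared a prefix.

```python
CHUNK = 16  # tokens per cache chunk (illustrative)

def prefill_work(prompt_tokens: int, cached_prefix_tokens: int) -> int:
    # Work units stand in for time: each uncached chunk must be prefilled.
    cached_chunks = cached_prefix_tokens // CHUNK
    total_chunks = -(-prompt_tokens // CHUNK)  # ceiling division
    return max(total_chunks - cached_chunks, 0)
```

A 128-token prompt costs 8 units cold but only 4 units when a 64-token prefix is already cached; the gap is the observable discrepancy.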

CVE-2025-47277 – vLLM Allows Remote Code Execution via PyNcclPipe Communication Service
https://notcve.org/view.php?id=CVE-2025-47277
20 May 2025 — vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. vLLM supports the use of the `PyNcclPipe` class to establish a peer-to-peer communication domain for data transmission between distributed nodes. The GPU-side KV-Cache transmission is implemented through the `PyNcclCommunicator` class, while CPU-side co... • https://docs.vllm.ai/en/latest/deployment/security.html • CWE-502: Deserialization of Untrusted Data •
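The root issue (CWE-502) is that unpickling untrusted bytes executes code: `pickle.loads` invokes whatever callable a payload's `__reduce__` names. A self-contained demonstration with a benign callable (`operator.add`); a real exploit against an exposed `PyNcclPipe` endpoint would name something like `os.system` instead.

```python
import operator
import pickle

class Payload:
    def __reduce__(self):
        # Unpickling will call operator.add("attacker ", "controlled").
        # Any importable callable works, e.g. (os.system, ("cmd",)).
        return (operator.add, ("attacker ", "controlled"))

blob = pickle.dumps(Payload())        # what an attacker sends on the wire
result = pickle.loads(blob)            # merely loading runs the callable
```

Note that the deserialized object is not a `Payload` at all but the callable's return value, which is why no "safe" class whitelist on the sender side helps: the receiver must refuse pickle from untrusted peers entirely.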

CVE-2025-30165 – Remote Code Execution Vulnerability in vLLM Multi-Node Cluster Configuration
https://notcve.org/view.php?id=CVE-2025-30165
06 May 2025 — vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary vLLM host. When data is received on this `SUB` socket, it is deserialized with `pickle`. This is unsafe, as it can be abused to execute code on a remote machine. • https://github.com/vllm-project/vllm/blob/c21b99b91241409c2fdf9f3f8c542e8748b317be/vllm/distributed/device_communicators/shm_broadcast.py#L295-L301 • CWE-502: Deserialization of Untrusted Data •
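One common hardening pattern for this class of bug (a sketch of the general mitigation, not vLLM's actual fix): an `Unpickler` subclass whose `find_class` only resolves an explicit allowlist, so a payload referencing `os.system` or any other global is rejected at load time rather than executed.

```python
import io
import pickle

# Only these globals may be resolved during unpickling (illustrative set).
ALLOWED = {("builtins", "list"), ("builtins", "dict"), ("builtins", "set")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"forbidden global: {module}.{name}")

def restricted_loads(blob: bytes):
    return RestrictedUnpickler(io.BytesIO(blob)).load()
```

Plain containers round-trip normally, while any pickle that names a callable outside the allowlist raises `UnpicklingError`. Even so, the stronger fix is a non-executable wire format (JSON, msgpack) for data arriving over a `SUB` socket.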

CVE-2025-32444 – vLLM Vulnerable to Remote Code Execution via Mooncake Integration
https://notcve.org/view.php?id=CVE-2025-32444
30 Apr 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5 that integrate vLLM with mooncake are vulnerable to remote code execution due to pickle-based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack. vLLM instances that do not make use of the mooncake integration are not affected. • https://github.com/vllm-project/vllm/blob/32b14baf8a1f7195ca09484de3008063569b43c5/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py#L179 • CWE-502: Deserialization of Untrusted Data •
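A sketch of the exposure half of this issue, using a plain stdlib socket rather than ZeroMQ: internal control-plane sockets should bind to a private interface (or localhost) instead of 0.0.0.0, so only intended peers can reach them. This reduces reachability only; it does not make pickle deserialization safe, and the two mitigations are independent.

```python
import socket

def open_control_socket(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    # Binding to "0.0.0.0" would listen on all interfaces, exposing the
    # endpoint to any host that can route to this machine. Port 0 asks
    # the OS for an ephemeral port (illustrative default).
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(1)
    return sock
```

The same principle applies to a ZeroMQ `bind()` address string: prefer `tcp://127.0.0.1:*` or a specific internal interface over `tcp://*:*`.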

CVE-2025-46560 – vLLM phi4mm: Quadratic Time Complexity in Input Token Processing leads to denial of service
https://notcve.org/view.php?id=CVE-2025-46560
30 Apr 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.8.0 and prior to 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious inputs to exhaust server resources and cause a denial of service. • https://github.com/vllm-project/vllm/blob/8cac35ba435906fb7eb07e44fe1a8c26e8744f4e/vllm/model_executor/models/phi4mm.py#L1182-L1197 • CWE-1333: Inefficient Regular Expression Complexity •
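The quadratic pattern and its linear fix, in miniature (token values and function names are illustrative stand-ins for the phi4mm placeholder expansion): rebuilding a list with `tokens = tokens + chunk` copies the entire accumulated list on every iteration, giving O(n²) total work, while `extend()` appends in place in amortized O(1) per token.

```python
def expand_quadratic(counts, token_id=7):
    tokens = []
    for n in counts:
        tokens = tokens + [token_id] * n  # full copy of `tokens` each time: O(n^2)
    return tokens

def expand_linear(counts, token_id=7):
    tokens = []
    for n in counts:
        tokens.extend([token_id] * n)     # in-place append: O(n) overall
    return tokens
```

Both produce identical output; only their scaling differs, which is why the bug is invisible on small prompts and catastrophic on maliciously long ones.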