
CVE-2025-48887 – vLLM has a Regular Expression Denial of Service (ReDoS, Exponential Complexity) Vulnerability in `pythonic_tool_parser.py`
https://notcve.org/view.php?id=CVE-2025-48887
30 May 2025 — vLLM, an inference and serving engine for large language models (LLMs), has a Regular Expression Denial of Service (ReDoS) vulnerability in the file `vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py` in versions 0.6.4 up to but excluding 0.9.0. The root cause is a highly complex, nested regular expression used for tool call detection, which an attacker can exploit to cause severe performance degradation or make the service unavailable. The pattern contains multiple nested quantifiers... • https://github.com/vllm-project/vllm/commit/4fc1bf813ad80172c1db31264beaef7d93fe0601 • CWE-1333: Inefficient Regular Expression Complexity •
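
This class of bug is easy to reproduce in isolation. The sketch below uses an illustrative nested-quantifier pattern, not the actual expression from `pythonic_tool_parser.py`, to show how catastrophic backtracking makes matching time grow roughly exponentially with input length.

```python
import re
import time

# Illustrative pattern with nested quantifiers -- NOT the actual vLLM regex.
# The inner group "(a+)+" can partition the input in exponentially many ways,
# so a non-matching suffix forces the engine to try all of them.
EVIL_PATTERN = re.compile(r"^(a+)+$")

for n in (20, 22, 24, 26):
    payload = "a" * n + "!"   # the trailing "!" guarantees a mismatch
    start = time.perf_counter()
    EVIL_PATTERN.match(payload)
    print(f"len={n + 1:3d}  {time.perf_counter() - start:.3f}s")
# Each extra character roughly doubles the runtime on CPython's backtracking engine.
```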

CVE-2025-46722 – vLLM has a Weakness in MultiModalHasher Image Hashing Implementation
https://notcve.org/view.php?id=CVE-2025-46722
29 May 2025 — vLLM is an inference and serving engine for large language models (LLMs). In versions starting from 0.7.0 to before 0.9.0, the MultiModalHasher class in the file vllm/multimodal/hasher.py has a security and data-integrity issue in its image hashing method. It serializes PIL.Image.Image objects using only obj.tobytes(), which returns the raw pixel data without metadata such as the image's shape (width, height, mode). As a result, two images of different sizes (e.g., 30x100 and 100x30) with identical pixel bytes can produce the same hash value... • https://github.com/vllm-project/vllm/commit/99404f53c72965b41558aceb1bc2380875f5d848 • CWE-1023: Incomplete Comparison with Missing Factors CWE-1288: Improper Validation of Consistency within Input •
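
The underlying issue can be reproduced with PIL directly. This is a minimal sketch of the concept, not vLLM's `MultiModalHasher` code: `tobytes()` discards geometry, so folding the mode and size into the hash, as the fix does conceptually, removes the collision.

```python
import hashlib
from PIL import Image

def naive_hash(img: Image.Image) -> str:
    # Pixel data only -- shape and mode are lost.
    return hashlib.sha256(img.tobytes()).hexdigest()

def shape_aware_hash(img: Image.Image) -> str:
    # Mix in mode and size so images with identical pixel bytes
    # but different geometry no longer collide.
    meta = f"{img.mode}:{img.size}".encode()
    return hashlib.sha256(meta + img.tobytes()).hexdigest()

a = Image.new("RGB", (30, 100))   # 30x100 all-black image
b = Image.new("RGB", (100, 30))   # 100x30 all-black image

print(naive_hash(a) == naive_hash(b))              # True  -- collision
print(shape_aware_hash(a) == shape_aware_hash(b))  # False -- distinct
```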

CVE-2025-46570 – vLLM’s Chunk-Based Prefix Caching Vulnerable to Potential Timing Side-Channel
https://notcve.org/view.php?id=CVE-2025-46570
29 May 2025 — vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PagedAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0. • https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f • CWE-208: Observable Timing Discrepancy •
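
A minimal sketch of how such a side channel is measured from the client side, assuming an OpenAI-compatible streaming endpoint; the URL, model name, and prompts are illustrative.

```python
import time
import requests

# Hypothetical local endpoint; adjust URL and model to the actual deployment.
URL = "http://localhost:8000/v1/completions"

def time_to_first_token(prompt: str) -> float:
    """Measure the delay until the first streamed chunk arrives."""
    start = time.perf_counter()
    with requests.post(URL,
                       json={"model": "my-model", "prompt": prompt,
                             "max_tokens": 1, "stream": True},
                       stream=True, timeout=30) as resp:
        for line in resp.iter_lines():
            if line:  # skip keep-alive blank lines
                return time.perf_counter() - start
    return float("inf")

# If `guess` shares a cached prefix with another user's earlier prompt,
# its TTFT will be measurably lower than a cold prompt of similar length.
cold = time_to_first_token("X" * 512)
guess = time_to_first_token("The customer's account number is")
print(f"cold TTFT={cold:.3f}s  guess TTFT={guess:.3f}s")
```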

CVE-2025-47277 – vLLM Allows Remote Code Execution via PyNcclPipe Communication Service
https://notcve.org/view.php?id=CVE-2025-47277
20 May 2025 — vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that only impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. vLLM supports the use of the `PyNcclPipe` class to establish a peer-to-peer communication domain for data transmission between distributed nodes. The GPU-side KV-Cache transmission is implemented through the `PyNcclCommunicator` class, while CPU-side control-message transmission... • https://docs.vllm.ai/en/latest/deployment/security.html • CWE-502: Deserialization of Untrusted Data •
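
The danger is the generic unpickle-what-the-network-sends pattern rather than anything specific to NCCL. The sketch below illustrates why deserializing attacker-controlled bytes with pickle leads to code execution; it is not the PyNcclPipe wire protocol.

```python
import pickle

class Exploit:
    # pickle records the result of __reduce__ during serialization; at load
    # time the receiver re-executes the returned callable with the given args.
    def __reduce__(self):
        import os
        return (os.system, ("id",))  # arbitrary command runs on the receiver

payload = pickle.dumps(Exploit())

# What a vulnerable receiver effectively does with attacker-controlled bytes:
pickle.loads(payload)   # executes `id` on the host doing the loading
```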

CVE-2025-32444 – vLLM Vulnerable to Remote Code Execution via Mooncake Integration
https://notcve.org/view.php?id=CVE-2025-32444
30 Apr 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5 that use vLLM's integration with Mooncake are vulnerable to remote code execution due to pickle-based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker could reach them to carry out an attack. vLLM instances that do not make use of the Mooncake integration are not vulnerable. • https://github.com/vllm-project/vllm/blob/32b14baf8a1f7195ca09484de3008063569b43c5/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py#L179 • CWE-502: Deserialization of Untrusted Data •
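
A common mitigation pattern, sketched below with generic ZeroMQ code rather than vLLM's or Mooncake's actual implementation, is to bind only to trusted interfaces and to replace pickle with a data-only serializer, so that received bytes can never encode executable behavior.

```python
import json
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5555")   # bind to loopback or a private interface,
                                    # never to all interfaces

msg = json.loads(sock.recv())       # data-only deserialization: the worst case
                                    # is a ValueError, never code execution
sock.send(json.dumps({"ack": True, "echo": msg}).encode())
```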

CVE-2025-30202 – Data exposure via ZeroMQ on multi-node vLLM deployment
https://notcve.org/view.php?id=CVE-2025-30202
30 Apr 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.5.2 and prior to 0.8.5 are vulnerable to denial of service and data exposure via ZeroMQ on multi-node vLLM deployments. In a multi-node vLLM deployment, vLLM uses ZeroMQ for some multi-node communication purposes. The primary vLLM host opens an XPUB ZeroMQ socket and binds it to all interfaces. While the socket is always opened for a multi-node deployment, it is only used when doing tensor parallelism... • https://github.com/vllm-project/vllm/commit/a0304dc504c85f421d38ef47c64f83046a13641c • CWE-770: Allocation of Resources Without Limits or Throttling •
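
The exposure comes down to where the XPUB socket is bound. The sketch below is generic ZeroMQ code, not vLLM's internals, and the addresses are illustrative.

```python
import zmq

ctx = zmq.Context()
xpub = ctx.socket(zmq.XPUB)

# Exposed pattern: listens on every interface, so any host that can reach
# the machine can subscribe and receive the published traffic.
# xpub.bind("tcp://*:5557")

# Safer pattern: bind only to loopback (single host) or to the address of
# the private cluster interface used for node-to-node traffic.
xpub.bind("tcp://127.0.0.1:5557")
```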

CVE-2025-29783 – vLLM Allows Remote Code Execution via Mooncake Integration
https://notcve.org/view.php?id=CVE-2025-29783
19 Mar 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP on all network interfaces will allow attackers to execute remote code on distributed hosts. This is a remote code execution vulnerability impacting any deployments using Mooncake to distribute KV across distributed hosts. This vulnerability is fixed in 0.8.0. • https://github.com/vllm-project/vllm/commit/288ca110f68d23909728627d3100e5a8db820aa2 • CWE-502: Deserialization of Untrusted Data •

CVE-2025-29770 – vLLM denial of service via outlines unbounded cache on disk
https://notcve.org/view.php?id=CVE-2025-29770
19 Mar 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem, and this cache has been on by default in vLLM. Outlines is also available by default through the OpenAI-compatible API server. Because the cache is keyed by client-supplied schemas and grows without bound, a client able to issue structured-output requests can fill the server's disk and cause a denial of service. • https://github.com/vllm-project/vllm/blob/53be4a863486d02bd96a59c674bbec23eec508f6/vllm/model_executor/guided_decoding/outlines_logits_processors.py • CWE-770: Allocation of Resources Without Limits or Throttling •
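
The failure mode is generic to any on-disk cache keyed by client-controllable input and never evicted. The toy cache below illustrates the pattern; it is not outlines' actual implementation, and the paths and schemas are made up.

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("/tmp/grammar-cache")   # illustrative location
CACHE_DIR.mkdir(exist_ok=True)

def compile_grammar(schema: dict) -> Path:
    """Toy stand-in for compiling a guided-decoding grammar to disk."""
    key = hashlib.sha256(json.dumps(schema, sort_keys=True).encode()).hexdigest()
    path = CACHE_DIR / key
    if not path.exists():                 # no size limit, no eviction
        path.write_text(json.dumps(schema))
    return path

# Every request with a slightly different schema creates a new file, so a
# client can grow the cache without bound by varying one field per request.
for i in range(1000):
    compile_grammar({"type": "object",
                     "properties": {f"field_{i}": {"type": "string"}}})

print(sum(1 for _ in CACHE_DIR.iterdir()), "cache entries")
```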

CVE-2025-25183 – vLLM using built-in hash() from Python 3.12 leads to predictable hash collisions in vLLM prefix cache
https://notcve.org/view.php?id=CVE-2025-25183
07 Feb 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try to exploit hash collisions. • https://github.com/python/cpython/commit/432117cd1f59c76d97da2eaff55a7d758301dbc7 • CWE-354: Improper Validation of Integrity Check Value •
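
The change is easy to confirm from a Python 3.12 interpreter, and a common mitigation is a content-based hash mixed with a process-local secret. The sketch below illustrates both; it is not vLLM's exact fix.

```python
import hashlib
import secrets

# On Python 3.12+, this prints the same value in every interpreter run
# (unlike str hashing, which PYTHONHASHSEED randomizes per process), so
# collisions involving None become predictable across processes.
print(hash(None))

# Mitigation pattern: hash the serialized content with a process-local
# secret so collisions cannot be precomputed by a remote client.
_SEED = secrets.token_bytes(32)

def cache_key(parts: tuple) -> bytes:
    return hashlib.sha256(_SEED + repr(parts).encode()).digest()

print(cache_key((None, "some", "prompt", "tokens")).hex())
```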

CVE-2025-24357 – vLLM allows a malicious model RCE by torch.load in hf_model_weights_iterator
https://notcve.org/view.php?id=CVE-2025-24357
27 Jan 2025 — vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load model checkpoints downloaded from Hugging Face. It uses the torch.load function, whose weights_only parameter defaults to False; when torch.load processes malicious pickle data, it executes arbitrary code during unpickling. This vulnerability is fixed in v0.7.0. • https://github.com/vllm-project/vllm/commit/d3d6bb13fb62da3234addf6574922a4ec0513d04 • CWE-502: Deserialization of Untrusted Data •
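
The distinction is a single argument to torch.load, contrasted in the sketch below; the checkpoint filename is illustrative. weights_only=True (available since PyTorch 1.13) restricts unpickling to tensors and a small allowlist of types.

```python
import torch

# Unsafe: a .bin/.pt file from an untrusted model repo is arbitrary pickle
# data, so loading it this way can execute code during unpickling.
# state_dict = torch.load("pytorch_model.bin")

# Restricted: refuses to unpickle arbitrary objects and only reconstructs
# tensors and basic container types.
state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
print(list(state_dict)[:5])
```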