
CVSS: 10.0 | EPSS: 0% | CPEs: 1 | EXPL: 0

19 Mar 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization is exposed directly over ZMQ/TCP on all network interfaces, allowing attackers to execute remote code on distributed hosts. This is a remote code execution vulnerability impacting any deployment that uses Mooncake to distribute the KV cache across hosts. This vulnerability is fixed in 0.8.0. • https://github.com/vllm-project/vllm/commit/288ca110f68d23909728627d3100e5a8db820aa2 • CWE-502: Deserialization of Untrusted Data •
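As a hedged illustration of the CWE-502 pattern this entry describes (not the actual Mooncake/vLLM integration code; the endpoint, class, and function names below are made up), unpickling messages received on a ZMQ/TCP socket bound to all interfaces lets any reachable client run code on the receiver:

# Hedged sketch of the CWE-502 pattern above, not the actual Mooncake/vLLM
# integration code; the endpoint and function names are made up.
import pickle
import zmq


class Payload:
    """What an attacker could send: unpickling triggers __reduce__."""
    def __reduce__(self):
        import os
        return (os.system, ("id",))  # arbitrary command runs on the receiver


def vulnerable_receiver(endpoint: str = "tcp://0.0.0.0:5555") -> None:
    """Binds on all interfaces and unpickles whatever arrives (the flaw)."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    sock.bind(endpoint)
    while True:
        obj = pickle.loads(sock.recv())  # code execution happens here
        print("received:", obj)


def attacker(endpoint: str) -> None:
    """Any host that can reach the port can trigger execution."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUSH)
    sock.connect(endpoint)
    sock.send(pickle.dumps(Payload()))

Mitigations for this class of bug generally avoid pickle for messages that cross a trust boundary or restrict such sockets to trusted interfaces; the linked commit is the project's actual fix.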

CVSS: 6.8 | EPSS: 0% | CPEs: 1 | EXPL: 0

19 Mar 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also available by default through the OpenAI compatible API server. • https://github.com/vllm-project/vllm/blob/53be4a863486d02bd96a59c674bbec23eec508f6/vllm/model_executor/guided_decoding/outlines_logits_processors.py • CWE-770: Allocation of Resources Without Limits or Throttling •
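The snippet above is cut off, but the CWE-770 tag points at resource allocation without limits or throttling. A purely illustrative sketch of that weakness class (none of the paths or function names below are outlines or vLLM code) shows how a client-driven on-disk cache with no size cap or eviction can grow without bound:

# Illustrative sketch of CWE-770 (allocation without limits), not outlines'
# or vLLM's caching code; paths and function names are hypothetical.
import hashlib
import os

CACHE_DIR = "/tmp/grammar_cache"  # hypothetical location


def compile_grammar(schema: str) -> bytes:
    """Stand-in for an expensive grammar compilation step."""
    return schema.encode("utf-8") * 1024


def get_compiled(schema: str) -> bytes:
    """Every unique client-supplied schema adds a file and nothing is
    evicted, so total disk usage is bounded only by request variety."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(schema.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    compiled = compile_grammar(schema)
    with open(path, "wb") as f:  # no size cap, no eviction policy
        f.write(compiled)
    return compiled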

CVSS: 2.6 | EPSS: 0% | CPEs: 1 | EXPL: 0

07 Feb 2025 — vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value, which makes it more feasible for someone to try to exploit hash collisions. • https://github.com/python/cpython/commit/432117cd1f59c76d97da2eaff55a7d758301dbc7 • CWE-354: Improper Validation of Integrity Check Value •
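A short, hedged sketch of the underlying weakness (not vLLM's prefix-caching code; the key functions below are hypothetical) shows why keying a cache on Python's built-in hash() is fragile and contrasts it with a content digest:

# Hedged illustration, not vLLM's prefix-cache code; the key functions are
# hypothetical. hash(None) is a fixed constant from Python 3.12 onward, and
# 64-bit hash() values can collide, so two different prefixes may map to the
# same cache entry; a digest over a stable serialization avoids that.
import hashlib
import pickle
import sys


def weak_block_key(token_ids: tuple, extra=None) -> int:
    """Collision-prone: built-in hash over request-influenced content."""
    return hash((token_ids, extra))


def stronger_block_key(token_ids: tuple, extra=None) -> str:
    """Collision-resistant alternative: digest of a stable serialization."""
    return hashlib.sha256(pickle.dumps((token_ids, extra))).hexdigest()


if __name__ == "__main__":
    print(sys.version_info)
    print(hash(None))               # predictable constant on Python >= 3.12
    print(weak_block_key((1, 2, 3)))
    print(stronger_block_key((1, 2, 3)))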

CVSS: 7.6 | EPSS: 0% | CPEs: 1 | EXPL: 0

27 Jan 2025 — vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from Hugging Face. It uses the torch.load function, whose weights_only parameter defaults to False. When torch.load loads malicious pickle data, it executes arbitrary code during unpickling. This vulnerability is fixed in v0.7.0. • https://github.com/vllm-project/vllm/commit/d3d6bb13fb62da3234addf6574922a4ec0513d04 • CWE-502: Deserialization of Untrusted Data •
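A hedged sketch of the pattern this entry describes (the file name and script below are illustrative, not vLLM's loader): a checkpoint file is pickle data, so loading it with weights_only=False executes embedded code, while weights_only=True uses a restricted unpickler and rejects such payloads:

# Hedged sketch of the torch.load pickle issue above, not vLLM's loader;
# the file name is illustrative.
import torch


class MaliciousCheckpoint:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))  # runs during unpickling


if __name__ == "__main__":
    path = "evil_checkpoint.pt"
    torch.save({"state_dict": MaliciousCheckpoint()}, path)

    # Unsafe pattern described by the advisory (commented out on purpose):
    # torch.load(path, weights_only=False)  # would run "echo pwned"

    # Safer pattern: restricted unpickler refuses non-allowlisted globals.
    try:
        torch.load(path, weights_only=True)
    except Exception as exc:
        print("rejected malicious checkpoint:", exc)

Loading weights from the safetensors format sidesteps pickle entirely and is another common mitigation for this class of issue.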