// For flags

CVE-2024-5206

Sensitive Data Leakage in sklearn.feature_extraction.text.TfidfVectorizer in scikit-learn/scikit-learn

Severity Score

4.7
*CVSS v3

Exploit Likelihood

*EPSS

Affected Versions

*CPE

Public Exploits

0
*Multiple Sources

Exploited in Wild

-
*KEV

Decision

Track*
*SSVC
Descriptions

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer, specifically in versions up to and including 1.4.1.post1, which was fixed in version 1.5.0. The vulnerability arises from the unexpected storage of all tokens present in the training data within the `stop_words_` attribute, rather than only storing the subset of tokens required for the TF-IDF technique to function. This behavior leads to the potential leakage of sensitive information, as the `stop_words_` attribute could contain tokens that were meant to be discarded and not stored, such as passwords or keys. The impact of this vulnerability varies based on the nature of the data being processed by the vectorizer.

Se identificó una vulnerabilidad de fuga de datos confidenciales en TfidfVectorizer de scikit-learn, específicamente en versiones hasta la 1.4.1.post1 incluida, que se solucionó en la versión 1.5.0. La vulnerabilidad surge del almacenamiento inesperado de todos los tokens presentes en los datos de entrenamiento dentro del atributo `stop_words_`, en lugar de almacenar solo el subconjunto de tokens necesarios para que funcione la técnica TF-IDF. Este comportamiento conduce a una posible fuga de información confidencial, ya que el atributo `stop_words_` podría contener tokens que debían descartarse y no almacenarse, como contraseñas o claves. El impacto de esta vulnerabilidad varía según la naturaleza de los datos que procesa el vectorizador.

*Credits: N/A
CVSS Scores
Attack Vector
Local
Attack Complexity
High
Privileges Required
Low
User Interaction
None
Scope
Unchanged
Confidentiality
High
Integrity
None
Availability
None
* Common Vulnerability Scoring System
SSVC
  • Decision:Track*
Exploitation
Poc
Automatable
No
Tech. Impact
Partial
* Organization's Worst-case Scenario
Timeline
  • 2024-05-22 CVE Reserved
  • 2024-06-06 CVE Published
  • 2024-06-07 EPSS Updated
  • 2024-08-01 CVE Updated
  • ---------- Exploited in Wild
  • ---------- KEV Due Date
  • ---------- First Exploit
CWE
  • CWE-921: Storage of Sensitive Data in a Mechanism without Access Control
CAPEC
Affected Vendors, Products, and Versions
Vendor Product Version Other Status
---- -