CVE-2025-38472

netfilter: nf_conntrack: fix crash due to removal of uninitialised entry

Severity Score

5.5

*CVSS v3

Descriptions

In the Linux kernel, the following vulnerability has been resolved:

netfilter: nf_conntrack: fix crash due to removal of uninitialised entry

A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list:

    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
     #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
     #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
     #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state:

- The ct hlist pointer is garbage; it looks like the ct hash value (hence the crash).
- ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected.
- ct->timeout is 30000 (=30s), which is unexpected.

Everything else looks like a normal udp conntrack entry. If we ignore ct->status and pretend it is 0, the entry matches those that are newly allocated but not yet inserted into the hash:

- ct hlist pointers are overloaded and store/cache the raw tuple hash
- ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry.

The theory is that we hit the following race:

    cpu x                 cpu y                 cpu z
    found entry E         found entry E
    E is expired          <preemption>
    nf_ct_delete()
    return E to rcu slab
                                                init_conntrack
                                                E is re-inited,
                                                ct->status set to 0
                                                reply tuplehash hnnode.pprev
                                                stores hash value.

cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit.

                                                ->refcnt set to 1
                                                E now owned by skb
                                                ->timeout set to 30000

If cpu y were to resume now, it would observe E as expired but would skip E due to the missing CONFIRMED bit.

                                                nf_conntrack_confirm gets called
                                                sets: ct->status |= CONFIRMED
                                                This is wrong: E is not yet added
                                                to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:

                          <resumes>
                          nf_ct_expired()
                           -> yes (ct->timeout is 30s)
                          confirmed bit set.

cpu y will try to delete E from the hashtable:

                          nf_ct_delete() -> set DYING bit
                          __nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks:

                          wait for spinlock held by z
                                                CONFIRMED is set but there is no
                                                guarantee ct will be added to hash:
                                                "chaintoolong" or "clash resolution"
                                                logic both skip the insert step.
                                                reply hnnode.pprev still stores the
                                                hash value.
                                                unlocks spinlock
                                                return NF_DROP
                          <unblocks, then
                           crashes on hlist_nulls_del_rcu pprev>

In case cpu z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs.

Without the 'cpu y' race, the 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually the skb will be freed and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock.

Pablo points out that the confirm-bit store could be reordered to happen before the hlist add or the timeout fixup, so switch to set_bit and a before_atomic memory barrier to prevent this.

It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: such an event cannot be distinguished from the above "E is the old incarnation" case: the entry will be skipped.

Also change nf_ct_should_gc() to first check the confirmed bit.

The gc sequence is:

1. Check if entry has expired, if not skip to next entry
2. Obtain a reference to the expired entry.
3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check pas
---truncated---
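
The interleaving described above can be replayed deterministically in user space. The following is a minimal sketch, not kernel code: `Entry`, `run`, the flag values and the `hash_cache` field are hypothetical stand-ins for `struct nf_conn`, the confirm path, the `IPS_*` bits and the overloaded `hnnode.pprev`. It assumes cpu y evaluates expiry and the confirmed bit while cpu z holds the bucket lock, as in the scenario in the description.

```python
from dataclasses import dataclass
from typing import Optional

CONFIRMED = 0x1  # stand-in for IPS_CONFIRMED (illustrative value only)
DYING = 0x2      # stand-in for IPS_DYING (illustrative value only)


@dataclass
class Entry:
    status: int = 0                      # re-initialised entry: no bits set
    timeout: int = 30000                 # relative timeout of a new udp flow
    hash_cache: Optional[int] = 0x1234   # models pprev caching the raw tuple hash
    in_table: bool = False


def run(fixed: bool, skip_insert: bool, now: int = 1_000_000):
    """Replay the cpu y / cpu z interleaving; returns (outcome, entry).

    fixed=False models the old order (CONFIRMED set before insertion),
    fixed=True models the patched order (CONFIRMED set after insertion).
    skip_insert models the "chaintoolong" / "clash resolution" paths.
    """
    table = []
    e = Entry()

    # cpu z: confirm starts, bucket lock(s) are held.
    if not fixed:
        e.status |= CONFIRMED            # old order: bit set too early

    # cpu y resumes: a relative timeout compared against 'now' looks expired.
    expired = e.timeout - now <= 0
    y_deletes = expired and bool(e.status & CONFIRMED)
    if y_deletes:
        e.status |= DYING                # nf_ct_delete(), then y blocks on the lock

    # cpu z: either inserts the entry or bails out.
    if not skip_insert:
        e.timeout += now                 # timeout fixup: relative -> absolute
        e.hash_cache = None              # list pointers are valid from here on
        e.in_table = True
        table.append(e)
        if fixed:
            e.status |= CONFIRMED        # new order: after insert, before unlock

    # cpu z unlocks; cpu y continues with the unlink.
    if not y_deletes:
        return "skipped", e
    if not e.in_table:
        return "crash", e                # unlink would use the cached hash as a pointer
    table.remove(e)
    return "unlinked", e
```

With the old ordering and a skipped insert, `run(False, True)` ends in `"crash"` with `status == CONFIRMED | DYING` and `timeout == 30000`, which is the state seen in the crash dump. With the patched ordering, cpu y skips the entry in both cases because the confirmed bit is not yet visible when it checks.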

Several vulnerabilities have been discovered in the Linux kernel that may lead to a privilege escalation, denial of service or information leaks. For the stable distribution (trixie), these problems have been fixed in version 6.12.41-1.

*Credits: N/A

CVSS v3

Metric	Value
Attack Vector	Local
Attack Complexity	Low
Privileges Required	Low
User Interaction	None
Scope	Unchanged
Confidentiality	None
Integrity	None
Availability	High

CVSS v2

Metric	Value
Attack Vector	Local
Attack Complexity	Low
Authentication	None
Confidentiality	None
Integrity	None
Availability	Complete

* Common Vulnerability Scoring System
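
As a cross-check, the 5.5 severity score can be recomputed from the CVSS v3 metrics listed above (Local / Low / Low / None / Unchanged / None / None / High). This is a minimal sketch that assumes the CVSS v3.1 weights and rounding rule, since the page only says "CVSS v3". It handles the Scope: Unchanged case only, and the function names are illustrative.

```python
import math

# Metric weights as given in the CVSS v3.1 specification (Scope: Unchanged).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """Round up to one decimal place using the v3.1 integer-based rule."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for Scope: Unchanged vectors only."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR_UNCHANGED[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:L / AC:L / PR:L / UI:N / S:U / C:N / I:N / A:H
score = base_score_unchanged("L", "L", "L", "N", "N", "N", "H")
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is ≈ 1.83; their sum (≈ 5.43) rounds up to 5.5, which agrees with the score shown on this page.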

Timeline

2025-04-16 CVE Reserved
2025-07-28 CVE Published
2025-07-29 CVE Updated
2025-08-03 EPSS Updated
---------- Exploited in Wild
---------- KEV Due Date
---------- First Exploit

References (7)

URL	Tag	Source
https://git.kernel.org/stable/c/1397af5bfd7d32b0cf2adb70a78c9a9e8f11d912	Vuln. Introduced
https://git.kernel.org/stable/c/594cea2c09f7cd440d1ee1c4547d5bc6a646b0e4	Vuln. Introduced

URL	Date	SRC
https://git.kernel.org/stable/c/a47ef874189d47f934d0809ae738886307c0ea22	2025-07-24
https://git.kernel.org/stable/c/76179961c423cd698080b5e4d5583cf7f4fcdde9	2025-07-24
https://git.kernel.org/stable/c/fc38c249c622ff5e3011b8845fd49dbfd9289afc	2025-07-24
https://git.kernel.org/stable/c/938ce0e8422d3793fe30df2ed0e37f6bc0598379	2025-07-24
https://git.kernel.org/stable/c/2d72afb340657f03f7261e9243b44457a9228ac7	2025-07-17

Affected Vendors, Products, and Versions

Vendor	Product	Version	Other	Status
Linux	Linux Kernel	>= 5.19 < 6.1.147	en	Affected
Linux	Linux Kernel	>= 5.19 < 6.6.100	en	Affected
Linux	Linux Kernel	>= 5.19 < 6.12.40	en	Affected
Linux	Linux Kernel	>= 5.19 < 6.15.8	en	Affected
Linux	Linux Kernel	>= 5.19 < 6.16	en	Affected
Linux	Linux Kernel	5.18.13	en	Affected
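
The version ranges in the table above can be turned into a quick triage check. The sketch below rests on an interpretation the page does not state explicitly: each upper bound is read as the first fixed release of its stable series (6.1.y, 6.6.y, 6.12.y, 6.15.y, and mainline 6.16), and any other series between 5.19 and 6.16 is treated as affected because no fix is listed for it. The helper names are hypothetical, and distribution suffixes such as `-1` are not parsed.

```python
# First fixed patch release per stable series, read from the table above.
FIXED_IN = {(6, 1): 147, (6, 6): 100, (6, 12): 40, (6, 15): 8}
FIRST_AFFECTED = (5, 19, 0)        # lower bound shared by all ranges
FIRST_FIXED_MAINLINE = (6, 16, 0)  # upper bound of the widest range
EXTRA_AFFECTED = {(5, 18, 13)}     # listed as a single affected version


def parse(version: str) -> tuple:
    """Parse 'x.y' or 'x.y.z' into a 3-tuple of ints (no distro suffixes)."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def is_affected(version: str) -> bool:
    v = parse(version)
    if v in EXTRA_AFFECTED:
        return True
    if v < FIRST_AFFECTED or v >= FIRST_FIXED_MAINLINE:
        return False
    series = v[:2]
    if series in FIXED_IN:
        return v[2] < FIXED_IN[series]
    # Assumption: series with no listed fix are treated as affected.
    return True
```

Under this reading, `is_affected("6.12.39")` is true while `is_affected("6.12.41")` is false, which is consistent with the Debian note that trixie is fixed in 6.12.41-1. A check like this only compares version numbers; vendor kernels with backported fixes need to be verified against the vendor's own advisory.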