CVE-2024-26976

KVM: Always flush async #PF workqueue when vCPU is being destroyed

Severity Score: 7.0* (CVSS v3.1)
Public Exploits: 0* (multiple sources)
Exploited in Wild: -* (KEV)
Decision: Track* (SSVC)

Descriptions

In the Linux kernel, the following vulnerability has been resolved:

KVM: Always flush async #PF workqueue when vCPU is being destroyed

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs are being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put. Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock. async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Workqueue: events async_pf_execute [kvm]
RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
Call Trace:
<TASK>
async_pf_execute+0x198/0x260 [kvm]
process_one_work+0x145/0x2d0
worker_thread+0x27e/0x3a0
kthread+0xba/0xe0
ret_from_fork+0x2d/0x50
ret_from_fork_asm+0x11/0x20
</TASK>
---[ end trace 0000000000000000 ]---
INFO: task kworker/8:1:251 blocked for more than 120 seconds.
Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/8:1 state:D stack:0 pid:251 ppid:2 flags:0x00004000
Workqueue: events async_pf_execute [kvm]
Call Trace:
<TASK>
__schedule+0x33f/0xa40
schedule+0x53/0xc0
schedule_timeout+0x12a/0x140
__wait_for_common+0x8d/0x1d0
__flush_work.isra.0+0x19f/0x2c0
kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
kvm_put_kvm+0x1c1/0x320 [kvm]
async_pf_execute+0x198/0x260 [kvm]
process_one_work+0x145/0x2d0
worker_thread+0x27e/0x3a0
kthread+0xba/0xe0
ret_from_fork+0x2d/0x50
ret_from_fork_asm+0x11/0x20
</TASK>
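The circular wait in the trace above can be illustrated with a deliberately simplified, single-threaded C model. Everything in it is hypothetical: toy_work, toy_flush() and the two callbacks are not KVM or workqueue APIs. A real flush_work() would simply block forever when the running callback waits on itself, so the model returns -1 in that case instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one work item (not the kernel's struct). */
struct toy_work {
	void (*func)(struct toy_work *w);
	bool running;       /* callback is currently executing */
	bool done;          /* callback has returned */
	int flush_result;   /* what a flush issued from inside the callback saw */
};

/* Modelled flush: make sure the callback has finished. If the caller is
 * the running callback itself, waiting could never end, so report -1. */
static int toy_flush(struct toy_work *w)
{
	assert(w->func != NULL);
	if (w->running)
		return -1;          /* self-flush: the wait would deadlock */
	if (!w->done) {
		w->running = true;
		w->func(w);
		w->running = false;
		w->done = true;
	}
	return 0;
}

/* Stand-in for the old pattern: the callback drops what turns out to be
 * the last VM reference, and VM teardown then flushes pending work,
 * including the item that is currently running. */
static void toy_put_vm_and_flush(struct toy_work *w)
{
	w->flush_result = toy_flush(w);
}

static void buggy_callback(struct toy_work *w)
{
	toy_put_vm_and_flush(w);
}

/* After the fix the callback holds no VM reference, so it never
 * triggers teardown from inside itself. */
static void fixed_callback(struct toy_work *w)
{
	(void)w;
}
```

Flushing an item that uses buggy_callback leaves its flush_result at -1, the model's version of the hung kworker, while an item using fixed_callback never attempts the self-flush.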

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed. And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.
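The fixed ordering, in which teardown flushes every outstanding item before freeing and the callback takes no reference of its own, can be sketched with the same kind of toy model. This is a single-threaded illustration under stated assumptions: toy_vcpu, toy_queue(), toy_flush() and toy_destroy_vcpu() are hypothetical names, and "flush" here just runs a still-queued callback to completion rather than waiting on a real kernel workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_WORK 8

struct toy_vcpu;

/* Hypothetical stand-in for one async #PF work item. */
struct toy_async_work {
	struct toy_vcpu *vcpu;  /* back-pointer; no reference is taken */
	bool queued;            /* still waiting in the modelled workqueue */
	bool done;              /* callback has run to completion */
};

/* Hypothetical stand-in for a vCPU that owns its pending work items. */
struct toy_vcpu {
	struct toy_async_work work[TOY_MAX_WORK];
	int nr_work;
	int completed;          /* how many callbacks touched this vCPU */
};

/* Modelled callback: it dereferences the vCPU, which is only safe
 * because teardown flushes every item before freeing the vCPU. */
static void toy_async_execute(struct toy_async_work *w)
{
	w->vcpu->completed++;
	w->done = true;
}

/* Modelled flush: if the item has not run yet, run it to completion. */
static void toy_flush(struct toy_async_work *w)
{
	if (w->queued) {
		w->queued = false;
		toy_async_execute(w);
	}
}

static void toy_queue(struct toy_vcpu *v)
{
	assert(v->nr_work < TOY_MAX_WORK);
	struct toy_async_work *w = &v->work[v->nr_work++];
	w->vcpu = v;
	w->queued = true;
	w->done = false;
}

/* Teardown: flush all outstanding items first, so no callback can run
 * after the free; returns how many callbacks completed in total. */
static int toy_destroy_vcpu(struct toy_vcpu *v)
{
	for (int i = 0; i < v->nr_work; i++) {
		toy_flush(&v->work[i]);
		assert(v->work[i].done);
	}
	int ran = v->completed;
	free(v);
	return ran;
}
```

For example, queuing three items, flushing one of them early, and then destroying the vCPU makes toy_destroy_vcpu() return 3: every callback finished before free() ran.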

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the workqueue, as the
work will not be seen by kvm_clear_async_pf_completion_queue(). Waiting
on the workqueue could theoretically delay a vCPU until the work
completes, but the chance of that is very, very small, and the delay
would likely be very small too. kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

trace_kvm_async_pf_completed(addr, cr2_or_gpa);

__kvm_vcpu_wake_up(vcpu);

mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.
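A similarly hedged sketch of the completion-check path: because an item taken off the completion queue is invisible to the teardown path, whoever dequeues it flushes it before freeing it. The two-phase toy_item (publish first, then a pending "tail" standing in for the trace, wake-up and mmput() steps listed above) and all toy_* functions are hypothetical, and the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical two-phase item: the callback publishes it on the
 * completion queue first and only afterwards finishes its "tail". */
struct toy_item {
	bool tail_pending;       /* published, but callback not yet returned */
	struct toy_item *next;
};

struct toy_vcpu {
	struct toy_item *done_head;  /* modelled completion queue (LIFO) */
	int tails_run;               /* tails that ran to completion */
};

/* Phase 1 of the modelled callback: publish the item. */
static void toy_publish(struct toy_vcpu *v, struct toy_item *it)
{
	it->tail_pending = true;
	it->next = v->done_head;
	v->done_head = it;
}

/* Modelled flush: let the callback finish its remaining work. */
static void toy_flush(struct toy_vcpu *v, struct toy_item *it)
{
	if (it->tail_pending) {
		it->tail_pending = false;
		v->tails_run++;
	}
}

/* Modelled completion check: pop one item. Teardown will never see it
 * again, so flush it here before freeing it. */
static bool toy_check_completion(struct toy_vcpu *v)
{
	struct toy_item *it = v->done_head;
	if (!it)
		return false;
	v->done_head = it->next;
	toy_flush(v, it);
	assert(!it->tail_pending);   /* nothing can still touch it */
	free(it);
	return true;
}
```

Freeing the popped item without the toy_flush() call would, in the real code, let the callback's remaining steps touch freed memory; in the model it would simply leave tails_run short.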

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue. Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).
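The flushing helper described here might look roughly like the following userspace model. It assumes, hypothetically, that a genuinely queued item is recognisable by a non-NULL callback pointer and a "wakeup all" item by a flag; toy_flush_and_free() and the other toy_* names are illustrative only and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: a real item gets a callback when it is queued;
 * a "wakeup all" item is never queued and never gets one. */
struct toy_work {
	void (*func)(struct toy_work *w);
};

struct toy_async_pf {
	struct toy_work work;
	bool wakeup_all;
};

static int flushes;  /* number of successful modelled flushes */

static void toy_callback(struct toy_work *w)
{
	(void)w;
}

/* Modelled flush with a sanity check in the spirit of the one the text
 * credits to __flush_work(): refuse an item that has no callback. */
static bool toy_flush(struct toy_work *w)
{
	if (!w->func)
		return false;
	flushes++;
	return true;
}

/* Hypothetical helper: flush only genuine work items, then free. */
static void toy_flush_and_free(struct toy_async_pf *item)
{
	if (item->wakeup_all) {
		/* Never queued, so there is nothing to flush. */
		assert(item->work.func == NULL);
	} else {
		bool ok = toy_flush(&item->work);
		assert(ok);
		(void)ok;
	}
	free(item);
}
```

Routing every free through one helper keeps the "is this a real work item?" decision in a single place, so callers on both the completion-check and teardown paths cannot accidentally flush a bogus entry.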

Note, commit 5f6de5cbebee ("KVM: Prevent module exit until al
---truncated---

*Credits: N/A
CVSS Scores
Attack Vector: Local
Attack Complexity: High
Privileges Required: Low
User Interaction: None
Scope: Unchanged
Confidentiality: High
Integrity: High
Availability: High
* Common Vulnerability Scoring System
SSVC
  • Decision: Track*
  • Exploitation: None
  • Automatable: No
  • Tech. Impact: Total
* Organization's Worst-case Scenario
Timeline
  • 2024-02-19 CVE Reserved
  • 2024-05-01 CVE Published
  • 2024-05-01 EPSS Updated
  • 2024-08-02 CVE Updated
  • ---------- Exploited in Wild
  • ---------- KEV Due Date
  • ---------- First Exploit
CWE
  • CWE-400: Uncontrolled Resource Consumption
Affected Vendors, Products, and Versions
Vendor  Product       Version                Other  Status
Linux   Linux Kernel  >= 2.6.38 < 4.19.312   en     Affected
Linux   Linux Kernel  >= 2.6.38 < 5.4.274    en     Affected
Linux   Linux Kernel  >= 2.6.38 < 5.10.215   en     Affected
Linux   Linux Kernel  >= 2.6.38 < 5.15.154   en     Affected
Linux   Linux Kernel  >= 2.6.38 < 6.1.84     en     Affected
Linux   Linux Kernel  >= 2.6.38 < 6.6.24     en     Affected
Linux   Linux Kernel  >= 2.6.38 < 6.7.12     en     Affected
Linux   Linux Kernel  >= 2.6.38 < 6.8.3      en     Affected
Linux   Linux Kernel  >= 2.6.38 < 6.9        en     Affected