// For flags

CVE-2024-26939

drm/i915/vma: Fix UAF on destroy against retire race

Severity Score

5.5
*CVSS v3.1

Exploit Likelihood

*EPSS

Affected Versions

*CPE

Public Exploits

0
*Multiple Sources

Exploited in Wild

-
*KEV

Decision

Track*
*SSVC
Descriptions

In the Linux kernel, the following vulnerability has been resolved:

drm/i915/vma: Fix UAF on destroy against retire race

Object debugging tools were sporadically reporting illegal attempts to
free a still active i915 VMA object when parking a GT believed to be idle.

[161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915]
[161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0
...
[161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1
[161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
[161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915]
[161.360592] RIP: 0010:debug_print_object+0x80/0xb0
...
[161.361347] debug_object_free+0xeb/0x110
[161.361362] i915_active_fini+0x14/0x130 [i915]
[161.361866] release_references+0xfe/0x1f0 [i915]
[161.362543] i915_vma_parked+0x1db/0x380 [i915]
[161.363129] __gt_park+0x121/0x230 [i915]
[161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915]

That has been tracked down to be happening when another thread is
deactivating the VMA inside __active_retire() helper, after the VMA's
active counter has been already decremented to 0, but before deactivation
of the VMA's object is reported to the object debugging tool.

We could prevent from that race by serializing i915_active_fini() with
__active_retire() via ref->tree_lock, but that wouldn't stop the VMA from
being used, e.g. from __i915_vma_retire() called at the end of
__active_retire(), after that VMA has been already freed by a concurrent
i915_vma_destroy() on return from the i915_active_fini(). Then, we should
rather fix the issue at the VMA level, not in i915_active.

Since __i915_vma_parked() is called from __gt_park() on last put of the
GT's wakeref, the issue could be addressed by holding the GT wakeref long
enough for __active_retire() to complete before that wakeref is released
and the GT parked.

I believe the issue was introduced by commit d93939730347 ("drm/i915:
Remove the vma refcount") which moved a call to i915_active_fini() from
a dropped i915_vma_release(), called on last put of the removed VMA kref,
to i915_vma_parked() processing path called on last put of a GT wakeref.
However, its visibility to the object debugging tool was suppressed by a
bug in i915_active that was fixed two weeks later with commit e92eb246feb9
("drm/i915/active: Fix missing debug object activation").

A VMA associated with a request doesn't acquire a GT wakeref by itself.
Instead, it depends on a wakeref held directly by the request's active
intel_context for a GT associated with its VM, and indirectly on that
intel_context's engine wakeref if the engine belongs to the same GT as the
VMA's VM. Those wakerefs are released asynchronously to VMA deactivation.

Fix the issue by getting a wakeref for the VMA's GT when activating it,
and putting that wakeref only after the VMA is deactivated. However,
exclude global GTT from that processing path, otherwise the GPU never goes
idle. Since __i915_vma_retire() may be called from atomic contexts, use
async variant of wakeref put. Also, to avoid circular locking dependency,
take care of acquiring the wakeref before VM mutex when both are needed.

v7: Add inline comments with justifications for:
- using untracked variants of intel_gt_pm_get/put() (Nirmoy),
- using async variant of _put(),
- not getting the wakeref in case of a global GTT,
- always getting the first wakeref outside vm->mutex.
v6: Since __i915_vma_active/retire() callbacks are not serialized, storing
a wakeref tracking handle inside struct i915_vma is not safe, and
there is no other good place for that. Use untracked variants of
intel_gt_pm_get/put_async().
v5: Replace "tile" with "GT" across commit description (Rodrigo),
-
---truncated---

En el kernel de Linux, se resolvió la siguiente vulnerabilidad: drm/i915/vma: Corrección de UAF al destruir contra ejecución de retirada. Las herramientas de depuración de objetos informaban esporádicamente intentos ilegales de liberar un objeto i915 VMA aún activo al estacionar un GT que se creía que estaba inactivo. [161.359441] ODEBUG: objeto activo libre (estado activo 0): ffff88811643b958 tipo de objeto: i915_active sugerencia: __i915_vma_active+0x0/0x50 [i915] [161.360082] ADVERTENCIA: CPU: 5 PID: 276 en lib/debugobjects.c:514 _imprimir_objeto+ 0x80/0xb0 ... [161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 No contaminado 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1 [161.360314] Nombre de hardware: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 21/04/2022 [161.360322] Cola de trabajo: i915 desordenado __intel_wakeref_put_work [i915] [161.360592] RIP 0010:debug_print_object+0x 80/0xb0... [161.361347] debug_object_free +0xeb/0x110 [161.361362] i915_active_fini+0x14/0x130 [i915] [161.361866] referencias de versión+0xfe/0x1f0 [i915] [161.362543] i915_vma_parked+0x1db/0x380 [i915] .363129] __gt_park+0x121/0x230 [i915] [161.363515 ] ____intel_wakeref_put_last+0x1f/0x70 [i915] Se ha rastreado que eso sucede cuando otro subproceso desactiva el VMA dentro del asistente __active_retire(), después de que el contador activo del VMA ya se haya reducido a 0, pero antes de que se desactive la desactivación del objeto del VMA. reportado a la herramienta de depuración de objetos. Podríamos evitar esa ejecución serializando i915_active_fini() con __active_retire() a través de ref->tree_lock, pero eso no impediría que se use VMA, por ejemplo, desde __i915_vma_retire() llamado al final de __active_retire(), después de ese VMA ya ha sido liberado por un i915_vma_destroy() concurrente al regresar de i915_active_fini(). Entonces, deberíamos solucionar el problema a nivel de VMA, no en i915_active. Dado que __i915_vma_parked() se llama desde __gt_park() en la última colocación del wakeref del GT, el problema podría solucionarse manteniendo el wakeref del GT el tiempo suficiente para que __active_retire() se complete antes de que se libere el wakeref y se estacione el GT. Creo que el problema fue introducido por el commit d93939730347 ("drm/i915: Eliminar el recuento de vma") que movió una llamada a i915_active_fini() desde un i915_vma_release() eliminado, llamado en la última colocación del kref de VMA eliminado, a i915_vma_parked() ruta de procesamiento llamada en la última colocación de un wakeref GT. Sin embargo, su visibilidad para la herramienta de depuración de objetos fue suprimida por un error en i915_active que se solucionó dos semanas después con el commit e92eb246feb9 ("drm/i915/active: Reparar la activación del objeto de depuración que falta"). Un VMA asociado con una solicitud no adquiere un wakeref GT por sí solo. En cambio, depende de un wakeref mantenido directamente por el intel_context activo de la solicitud para un GT asociado con su VM, e indirectamente del wakeref del motor de ese intel_context si el motor pertenece al mismo GT que la VM del VMA. Esos wakerefs se liberan de forma asincrónica con la desactivación de VMA. Solucione el problema obteniendo un wakeref para el GT del VMA al activarlo y colocando ese wakeref solo después de que se desactive el VMA. Sin embargo, excluya el GTT global de esa ruta de procesamiento; de lo contrario, la GPU nunca quedará inactiva. Dado que se puede llamar a __i915_vma_retire() desde contextos atómicos, utilice la variante asíncrona de wakeref put. Además, para evitar la dependencia del bloqueo circular, tenga cuidado de adquirir el wakeref antes del mutex de VM cuando ambos sean necesarios. v7: agregue comentarios en línea con justificaciones para: - usar variantes sin seguimiento de intel_gt_pm_get/put() (Nirmoy), - usar la variante asíncrona de _put(), - no obtener el wakeref en caso de un GTT global, - obtener siempre el primer wakeref fuera de vm->mutex. ---truncado---

A use-after-free flaw was found in drivers/gpu/drm/i915/i915_vma.c in the Linux kernel that may lead to a crash.

*Credits: N/A
CVSS Scores
Attack Vector
Local
Attack Complexity
Low
Privileges Required
Low
User Interaction
None
Scope
Unchanged
Confidentiality
None
Integrity
None
Availability
High
* Common Vulnerability Scoring System
SSVC
  • Decision:Track*
Exploitation
None
Automatable
No
Tech. Impact
Total
* Organization's Worst-case Scenario
Timeline
  • 2024-02-19 CVE Reserved
  • 2024-05-01 CVE Published
  • 2024-05-01 EPSS Updated
  • 2024-08-02 CVE Updated
  • ---------- Exploited in Wild
  • ---------- KEV Due Date
  • ---------- First Exploit
CWE
  • CWE-416: Use After Free
CAPEC
Affected Vendors, Products, and Versions
Vendor Product Version Other Status
Vendor Product Version Other Status <-- --> Vendor Product Version Other Status
Linux
Search vendor "Linux"
Linux Kernel
Search vendor "Linux" for product "Linux Kernel"
>= 5.19 < 6.1.88
Search vendor "Linux" for product "Linux Kernel" and version " >= 5.19 < 6.1.88"
en
Affected
Linux
Search vendor "Linux"
Linux Kernel
Search vendor "Linux" for product "Linux Kernel"
>= 5.19 < 6.6.29
Search vendor "Linux" for product "Linux Kernel" and version " >= 5.19 < 6.6.29"
en
Affected
Linux
Search vendor "Linux"
Linux Kernel
Search vendor "Linux" for product "Linux Kernel"
>= 5.19 < 6.8.3
Search vendor "Linux" for product "Linux Kernel" and version " >= 5.19 < 6.8.3"
en
Affected
Linux
Search vendor "Linux"
Linux Kernel
Search vendor "Linux" for product "Linux Kernel"
>= 5.19 < 6.9
Search vendor "Linux" for product "Linux Kernel" and version " >= 5.19 < 6.9"
en
Affected