CVE-2023-52738

drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

Severity Score: - (CVSS v-)
Exploit Likelihood: - (EPSS)
Affected Versions: listed under Affected Vendors, Products, and Versions (CPE)
Public Exploits: 0 (multiple sources)
Exploited in Wild: - (KEV)
Decision: Track (SSVC)

Descriptions

In the Linux kernel, the following vulnerability has been resolved:

drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine. That function is expected to be called only after its
respective init function, drm_sched_init(), has executed successfully.

We recently faced a driver probe failure on the Steam Deck, and
drm_sched_fini() was called even though its counterpart had never
been called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
<TASK>
amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
devm_drm_dev_init_release+0x49/0x70
[...]

To prevent that, check whether the drm_sched was properly initialized for
a given ring before calling its fini counterpart.
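The init/fini pairing and the guard described above can be illustrated with a small userspace model. This is a hypothetical sketch, not the actual amdgpu patch: struct sched_model, struct ring_model, sched_init_model(), sched_fini_model() and ring_sw_fini_model() are invented names, and the assert() in the fini model only stands in for the NULL pointer dereference from the oops. The teardown loop keys off the ops pointer, which the model assigns only when init succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scheduler model: ops is non-NULL only after a
 * successful init, which is the property the guard relies on. */
struct sched_model {
	const void *ops;
	int fini_calls; /* counts how often teardown ran */
};

/* Hypothetical ring model that embeds its scheduler. */
struct ring_model {
	struct sched_model sched;
};

static const int example_ops = 0; /* placeholder "ops table" */

/* Models drm_sched_init(): on failure, ops is left as NULL. */
static int sched_init_model(struct sched_model *s, bool fail)
{
	if (fail)
		return -1;
	s->ops = &example_ops;
	return 0;
}

/* Models drm_sched_fini(): it must only run after a successful init.
 * The assert stands in for the NULL dereference seen in the oops. */
static void sched_fini_model(struct sched_model *s)
{
	assert(s->ops != NULL);
	s->fini_calls++;
}

/* Guarded teardown: skip rings whose scheduler was never initialized. */
static void ring_sw_fini_model(struct ring_model *rings, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!rings[i].sched.ops)
			continue;
		sched_fini_model(&rings[i].sched);
	}
}
```

For example, with three rings where only the first completes init (the second fails and the third is never reached because probe aborted), the loop runs the fini model exactly once and leaves the other two untouched. Without the ops check, the assert would fire on the second ring.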

Ideally we'd use sched.ready for that, since this field is set as the last
step of drm_sched_init(). But amdgpu seems to "override" the meaning of this
field: in the oops above, for example, a GFX ring caused the crash, and its
sched.ready field had been set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended up using sched.ops, as per
Christian's suggestion [0], and also removed the no_scheduler check [1].
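To see why a ready-style flag is an unreliable guard here, consider a second hypothetical model in which ring-level code sets ready on its own, as the text describes for the GFX ring. The names below (sched_model, sched_init_model(), ring_init_model(), should_fini_by_ready(), should_fini_by_ops()) are illustrative assumptions, not amdgpu or DRM functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scheduler model with the two candidate guard fields. */
struct sched_model {
	const void *ops; /* assigned only by the scheduler init model */
	bool ready;      /* may also be written by ring-level code */
};

static const int example_ops = 0; /* placeholder "ops table" */

/* Models a successful drm_sched_init(), which sets ready last. */
static void sched_init_model(struct sched_model *s)
{
	s->ops = &example_ops;
	s->ready = true;
}

/* Models a ring init routine that marks the ring ready on its own,
 * regardless of whether the scheduler was ever initialized. */
static void ring_init_model(struct sched_model *s)
{
	s->ready = true;
}

/* Candidate guard 1: trust ready (fooled by ring_init_model()). */
static bool should_fini_by_ready(const struct sched_model *s)
{
	return s->ready;
}

/* Candidate guard 2: trust ops, which only the scheduler init sets. */
static bool should_fini_by_ops(const struct sched_model *s)
{
	return s->ops != NULL;
}
```

For a ring that only went through ring_init_model(), the ready-based guard would still request teardown of a scheduler that was never initialized, while the ops-based guard correctly skips it. For a ring whose scheduler init succeeded, both guards agree.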

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/

*Credits: N/A
CVSS Scores
Attack Vector: -
Attack Complexity: -
Privileges Required: -
User Interaction: -
Scope: -
Confidentiality: -
Integrity: -
Availability: -
* Common Vulnerability Scoring System
SSVC
  • Decision: Track
  • Exploitation: None
  • Automatable: No
  • Tech. Impact: Partial
* Organization's Worst-case Scenario
Timeline
  • 2024-05-21 CVE Reserved
  • 2024-05-21 CVE Published
  • 2024-05-22 EPSS Updated
  • 2024-08-02 CVE Updated
  • ---------- Exploited in Wild
  • ---------- KEV Due Date
  • ---------- First Exploit
CWE
CAPEC
Affected Vendors, Products, and Versions
Vendor  Product       Version             Other  Status
Linux   Linux Kernel  >= 5.15 < 5.15.94   en     Affected
Linux   Linux Kernel  >= 5.15 < 6.1.12    en     Affected
Linux   Linux Kernel  >= 5.15 < 6.2       en     Affected
Linux   Linux Kernel  5.14.10             en     Affected