CVE-2023-52738
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Severity Score
Exploit Likelihood
Affected Versions
Public Exploits
0Exploited in Wild
-Decision
Descriptions
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.
Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:
amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
<TASK>
amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
devm_drm_dev_init_release+0x49/0x70
[...]
To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.
Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].
[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/
En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: drm/amdgpu/fence: se solucionó el error debido a que drm_sched init/fini no coincide. Actualmente, amdgpu llama a drm_sched_fini() desde la rutina SW fini del controlador de valla; se espera que se llame a dicha función. sólo después de que la función de inicio respectiva, drm_sched_init(), se haya ejecutado correctamente. Sucede que recientemente nos enfrentamos a una falla en la sonda del controlador en Steam Deck, y se llamó a la función drm_sched_fini() incluso sin que su contraparte se hubiera llamado previamente, lo que provocó el siguiente error: amdgpu: la sonda de 0000:04:00.0 falló con error -110 ERROR: desreferencia del puntero NULL del kernel, dirección: 0000000000000090 PGD 0 P4D 0 Ups: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd No contaminado 6.2.0-rc3-gpiccoli #338 Nombre del hardware : Valve Jupiter/Jupiter, BIOS F7A0113 04/11/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Seguimiento de llamadas: amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [ amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...] Para evitar eso, verifique si drm_sched se inicializó correctamente para un anillo determinado antes de llamar a su contraparte fini. Observe que idealmente usaríamos sched.ready para eso; dicho campo se establece como lo último en drm_sched_init(). Pero amdgpu parece "sobreescribir" el significado de dicho campo; en el ejemplo anterior, por ejemplo, fue un anillo GFX el que provocó el bloqueo y el campo sched.ready se configuró en verdadero en la rutina de inicio del anillo, independientemente del estado de el programador DRM. Por lo tanto, terminamos usando sched.ops según la sugerencia de Christian [0] y también eliminamos la verificación no_scheduler [1]. [0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/ [1] https://lore.kernel.org/amd-gfx/cd0e2994- f85f-d837-609f-7056d5fb7231@amd.com/
CVSS Scores
SSVC
- Decision:Track
Timeline
- 2024-05-21 CVE Reserved
- 2024-05-21 CVE Published
- 2024-05-22 EPSS Updated
- 2024-12-19 CVE Updated
- ---------- Exploited in Wild
- ---------- KEV Due Date
- ---------- First Exploit
CWE
CAPEC
References (5)
URL | Tag | Source |
---|---|---|
https://git.kernel.org/stable/c/067f44c8b4590c3f24d21a037578a478590f2175 | Vuln. Introduced | |
https://git.kernel.org/stable/c/8ba968ae672b3075794c8086aa164595b0175abe | Vuln. Introduced |
URL | Date | SRC |
---|
URL | Date | SRC |
---|
Affected Vendors, Products, and Versions
Vendor | Product | Version | Other | Status | ||||||
---|---|---|---|---|---|---|---|---|---|---|
Vendor | Product | Version | Other | Status | <-- --> | Vendor | Product | Version | Other | Status |
Linux Search vendor "Linux" | Linux Kernel Search vendor "Linux" for product "Linux Kernel" | >= 5.15 < 5.15.94 Search vendor "Linux" for product "Linux Kernel" and version " >= 5.15 < 5.15.94" | en |
Affected
| ||||||
Linux Search vendor "Linux" | Linux Kernel Search vendor "Linux" for product "Linux Kernel" | >= 5.15 < 6.1.12 Search vendor "Linux" for product "Linux Kernel" and version " >= 5.15 < 6.1.12" | en |
Affected
| ||||||
Linux Search vendor "Linux" | Linux Kernel Search vendor "Linux" for product "Linux Kernel" | >= 5.15 < 6.2 Search vendor "Linux" for product "Linux Kernel" and version " >= 5.15 < 6.2" | en |
Affected
| ||||||
Linux Search vendor "Linux" | Linux Kernel Search vendor "Linux" for product "Linux Kernel" | 5.14.10 Search vendor "Linux" for product "Linux Kernel" and version "5.14.10" | en |
Affected
|