CVE-2024-36028 – mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
https://notcve.org/view.php?id=CVE-2024-36028
In the Linux kernel, the following vulnerability has been resolved: mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio() When I did memory failure tests recently, below warning occurs: DEBUG_LOCKS_WARN_ON(1) WARNING: CPU: 8 PID: 1011 at kernel/locking/lockdep.c:232 __lock_acquire+0xccb/0x1ca0 Modules linked in: mce_inject hwpoison_inject CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 RIP: 0010:__lock_acquire+0xccb/0x1ca0 RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082 RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0 RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10 R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004 FS: 00007ff9f32aa740(0000) GS:ffffa1ce5fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff9f3134ba0 CR3: 00000008484e4000 CR4: 00000000000006f0 Call Trace: <TASK> lock_acquire+0xbe/0x2d0 _raw_spin_lock_irqsave+0x3a/0x60 hugepage_subpool_put_pages.part.0+0xe/0xc0 free_huge_folio+0x253/0x3f0 dissolve_free_huge_page+0x147/0x210 __page_handle_poison+0x9/0x70 memory_failure+0x4e6/0x8c0 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x380/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xbc/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff9f3114887 RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887 RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001 RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00 </TASK> Kernel panic - not syncing: kernel: panic_on_warn set ... CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> panic+0x326/0x350 check_panic_on_warn+0x4f/0x50 __warn+0x98/0x190 report_bug+0x18e/0x1a0 handle_bug+0x3d/0x70 exc_invalid_op+0x18/0x70 asm_exc_invalid_op+0x1a/0x20 RIP: 0010:__lock_acquire+0xccb/0x1ca0 RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082 RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0 RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10 R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004 lock_acquire+0xbe/0x2d0 _raw_spin_lock_irqsave+0x3a/0x60 hugepage_subpool_put_pages.part.0+0xe/0xc0 free_huge_folio+0x253/0x3f0 dissolve_free_huge_page+0x147/0x210 __page_handle_poison+0x9/0x70 memory_failure+0x4e6/0x8c0 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x380/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xbc/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff9f3114887 RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887 RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001 RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00 </TASK> After git bisecting and digging into the code, I believe the root cause is that _deferred_list field of folio is unioned with _hugetlb_subpool field. In __update_and_free_hugetlb_folio(), folio->_deferred_ ---truncated--- En el kernel de Linux, se resolvió la siguiente vulnerabilidad: mm/hugetlb: corrija DEBUG_LOCKS_WARN_ON(1) cuando dissolve_free_hugetlb_folio() Cuando hice pruebas de falla de memoria recientemente, aparece la siguiente advertencia: DEBUG_LOCKS_WARN_ON(1) ADVERTENCIA: CPU: 8 PID: 1011 en kernel/locking/lockdep.c:232 __lock_acquire+0xccb/0x1ca0 Módulos vinculados en: mce_inject hwpoison_inject CPU: 8 PID: 1011 Comm: bash Kdump: cargado No contaminado 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3 Hardware nombre: PC estándar QEMU (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 01/04/2014 RIP: 0010:__lock_acquire+0xccb/0x1ca0 RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082 RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0 RBP: 1c6865d3280 R08: fffffffb0f570a8 R09: 0000000000009ffb R10: 0000000000000286 R11: fffffffb0f2ad50 R12: ffffa1c6865d3d10 R13: ffffa1c6865d3c70 : 0000000000000000 R15: 0000000000000004 FS: 00007ff9f32aa740( 0000) GS:ffffa1ce5fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff9f3134ba0 CR3: 00000008484e 4000 CR4: 00000000000006f0 Seguimiento de llamadas: lock_acquire+0xbe/0x2d0 _raw_spin_lock_irqsave+0x3a/0x60 hugepage_subpool_put_pages. part.0+0xe/0xc0 free_huge_folio+0x253/0x3f0 dissolve_free_huge_page+0x147/0x210 __page_handle_poison+0x9/0x70 Memory_failure+0x4e6/0x8c0 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/ 0x1d0 vfs_write+0x380/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xbc /0x1d0 Entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff9f3114887 RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887 RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001 RBP: 0000564494164e10 00007ff9f31d1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00 Pánico del kernel: no se sincroniza: kernel: pánico_on_warn configurado... CPU: 8 PID: 1011 Comm: bash Kdump: cargado No contaminado 6.9 .0-rc3-next-20240410-00012-gdb69f219f4be #3 Nombre del hardware: PC estándar QEMU (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 01/04/2014 Llamada Seguimiento: pánico+0x326/0x350 check_panic_on_warn+0x4f/0x50 __warn+0x98/0x190 report_bug+0x18e/0x1a0 handle_bug+0x3d/0x70 exc_invalid_op+0x18/0x70 asm_exc_invalid_op+0x1a/0x20 RIP: 0010:__lock_adquirir+0xccb/0x1ca0 RSP : 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082 RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8 RDX: 00000000ffffffd8 RSI: 027 RDI: ffffa1ce5fc1c9c0 RBP: ffffa1c6865d3280 R08: fffffffb0f570a8 R09: 0000000000009ffb R10: 00000000000000286 R11: fffffffb0f2ad50 R12: ffffa1c6865d 3d10 R13: ffffa1c6865d3c70 R14: 0000000000000000 R15 : 0000000000000004 lock_acquire+0xbe/0x2d0 _raw_spin_lock_irqsave+0x3a/0x60 hugepage_subpool_put_pages.part.0+0xe/0xc0 free_huge_folio+0x253/0x3f0 dissolve_free_huge_page+0x147/0x210 __page_handle_ veneno+0x9/0x70 error_de_memoria+0x4e6/0x8c0 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/ 0x1d0 vfs_write+0x380/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xbc/0x1d0 Entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff9f3114887 RSP: 002b:00007ffecbacb4 58 EFLAGS: 00000246 ORIG_RAX: 00000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887 RDX: 000000000000000c RSI : 0000564494164e10 RDI: 0000000000000001 RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007ffffff R10: 0000000000000000 R11: 000000000000246 R12: 000000000000000c R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00 Después de que git divida e indague en el código, creo que la causa raíz es que el campo _deferred_list del folio está unido con el campo _hugetlb_subpool. En __update_and_free_hugetlb_folio(), folio->_deferred_ ---truncado--- • https://git.kernel.org/stable/c/1b4ce2952b4f33e198d5e993acff0611dff1e399 https://git.kernel.org/stable/c/32c877191e022b55fe3a374f3d7e9fb5741c514d https://git.kernel.org/stable/c/9a1a43a0e7e96911eaa00ad20b20f2edefb31d8a https://git.kernel.org/stable/c/2effe407f7563add41750fd7e03da4ea44b98099 https://git.kernel.org/stable/c/7e0a322877416e8c648819a8e441cf8c790b2cce https://git.kernel.org/stable/c/9c9b32d46afab2d911897914181c488954012300 https://git.kernel.org/stable/c/52ccdde16b6540abe43b6f8d8e1e1ec90b0983af •
CVE-2024-36027 – btrfs: zoned: do not flag ZEROOUT on non-dirty extent buffer
https://notcve.org/view.php?id=CVE-2024-36027
In the Linux kernel, the following vulnerability has been resolved: btrfs: zoned: do not flag ZEROOUT on non-dirty extent buffer Btrfs clears the content of an extent buffer marked as EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is introduced to prevent a write hole of an extent buffer, which is once allocated, marked dirty, but turns out unnecessary and cleaned up within one transaction operation. Currently, btrfs_clear_buffer_dirty() marks the extent buffer as EXTENT_BUFFER_ZONED_ZEROOUT, and skips the entry function. If this call happens while the buffer is under IO (with the WRITEBACK flag set, without the DIRTY flag), we can add the ZEROOUT flag and clear the buffer's content just before a bio submission. As a result: 1) it can lead to adding faulty delayed reference item which leads to a FS corrupted (EUCLEAN) error, and 2) it writes out cleared tree node on disk The former issue is previously discussed in [1]. The corruption happens when it runs a delayed reference update. • https://git.kernel.org/stable/c/aa6313e6ff2bfbf736a2739047bba355d8241584 https://git.kernel.org/stable/c/f4b994fccbb6f294c4b31a6ca0114b09f7245043 https://git.kernel.org/stable/c/68879386180c0efd5a11e800b0525a01068c9457 •
CVE-2024-36026 – drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11
https://notcve.org/view.php?id=CVE-2024-36026
In the Linux kernel, the following vulnerability has been resolved: drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11 While doing multiple S4 stress tests, GC/RLC/PMFW get into an invalid state resulting into hard hangs. Adding a GFX reset as workaround just before sending the MP1_UNLOAD message avoids this failure. En el kernel de Linux, se resolvió la siguiente vulnerabilidad: drm/amd/pm: corrige un bloqueo aleatorio en S4 para SMU v13.0.4/11 Al realizar múltiples pruebas de estrés de S4, GC/RLC/PMFW entra en un estado no válido, lo que resulta en cuelga duro. Agregar un reinicio de GFX como workaround justo antes de enviar el mensaje MP1_UNLOAD evita este error. • https://git.kernel.org/stable/c/bd9b94055c3deb2398ee4490c1dfdf03f53efb8f https://git.kernel.org/stable/c/1e3b8874d55c0c28378beb9007494a7a9269a5f5 https://git.kernel.org/stable/c/7521329e54931ede9e042bbf5f4f812b5bc4a01d https://git.kernel.org/stable/c/31729e8c21ecfd671458e02b6511eb68c2225113 •
CVE-2024-36025 – scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
https://notcve.org/view.php?id=CVE-2024-36025
In the Linux kernel, the following vulnerability has been resolved: scsi: qla2xxx: Fix off by one in qla_edif_app_getstats() The app_reply->elem[] array is allocated earlier in this function and it has app_req.num_ports elements. Thus this > comparison needs to be >= to prevent memory corruption. En el kernel de Linux, se resolvió la siguiente vulnerabilidad: scsi: qla2xxx: arreglada por uno en qla_edif_app_getstats() La matriz app_reply->elem[] se asignó anteriormente en esta función y tiene elementos app_req.num_ports. Por lo tanto, esta > comparación necesita ser >= para evitar la corrupción de la memoria. • https://git.kernel.org/stable/c/7878f22a2e03b69baf792f74488962981a1c9547 https://git.kernel.org/stable/c/8c820f7c8e9b46238d277c575392fe9930207aab https://git.kernel.org/stable/c/9fc74e367be4247a5ac39bb8ec41eaa73fade510 https://git.kernel.org/stable/c/60b87b5ecbe07d70897d35947b0bb3e76ccd1b3a https://git.kernel.org/stable/c/ea8ac95c22c93acecb710209a7fd10b851afe817 https://git.kernel.org/stable/c/4406e4176f47177f5e51b4cc7e6a7a2ff3dbfbbd https://access.redhat.com/security/cve/CVE-2024-36025 https://bugzilla.redhat.com/show_bug.cgi?id=2284421 • CWE-787: Out-of-bounds Write •
CVE-2024-36024 – drm/amd/display: Disable idle reallow as part of command/gpint execution
https://notcve.org/view.php?id=CVE-2024-36024
In the Linux kernel, the following vulnerability has been resolved: drm/amd/display: Disable idle reallow as part of command/gpint execution [Why] Workaroud for a race condition where DMCUB is in the process of committing to IPS1 during the handshake causing us to miss the transition into IPS2 and touch the INBOX1 RPTR causing a HW hang. [How] Disable the reallow to ensure that we have enough of a gap between entry and exit and we're not seeing back-to-back wake_and_executes. En el kernel de Linux, se resolvió la siguiente vulnerabilidad: drm/amd/display: deshabilite la reasignación inactiva como parte de la ejecución del comando/gpint [Por qué] Workaroud para una condición de ejecución en la que DMCUB está en el proceso de comprometerse con IPS1 durante el protocolo de enlace que causa Nos perdemos la transición a IPS2 y tocamos el RPTR de INBOX1 provocando un bloqueo del HW. [Cómo] Deshabilite la reallow para asegurarnos de que tengamos un espacio suficiente entre la entrada y la salida y que no veamos wake_and_executes consecutivos. • https://git.kernel.org/stable/c/2aac387445610d6dfd681f5214388e86f5677ef7 https://git.kernel.org/stable/c/6226a5aa77370329e01ee8abe50a95e60618ce97 •