CVE-2024-26671 – blk-mq: fix IO hang from sbitmap wakeup race
https://notcve.org/view.php?id=CVE-2024-26671
In the Linux kernel, the following vulnerability has been resolved: blk-mq: fix IO hang from sbitmap wakeup race In blk_mq_mark_tag_wait(), __add_wait_queue() may be re-ordered with the following blk_mq_get_driver_tag() in case of getting driver tag failure. Then in __sbitmap_queue_wake_up(), waitqueue_active() may not observe the added waiter in blk_mq_mark_tag_wait() and wake up nothing, meantime blk_mq_mark_tag_wait() can't get driver tag successfully. This issue can be reproduced by running the following test in loop, and fio hang can be observed in < 30min when running it on my test VM in laptop. modprobe -r scsi_debug modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4 dev=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename` fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --iodepth=1 \ --runtime=100 --numjobs=40 --time_based --name=test \ --ioengine=libaio Fix the issue by adding one explicit barrier in blk_mq_mark_tag_wait(), which is just fine in case of running out of tag. En el kernel de Linux, se resolvió la siguiente vulnerabilidad: blk-mq: corrige el bloqueo de IO de la carrera de activación del mapa de bits En blk_mq_mark_tag_wait(), __add_wait_queue() se puede reordenar con el siguiente blk_mq_get_driver_tag() en caso de que se produzca un error en la etiqueta del controlador. Luego, en __sbitmap_queue_wake_up(), waitqueue_active() puede no observar al camarero agregado en blk_mq_mark_tag_wait() y no despertar nada, mientras tanto, blk_mq_mark_tag_wait() no puede obtener la etiqueta del controlador correctamente. Este problema se puede reproducir ejecutando la siguiente prueba en bucle, y se puede observar un bloqueo de fio en < 30 minutos cuando lo ejecuto en mi máquina virtual de prueba en una computadora portátil. modprobe -r scsi_debug modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4 dev=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block /* | cabeza -1 | xargs basename` fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --io Depth=1 \ --runtime=100 --numjobs=40 --time_based - -name=test \ --ioengine=libaio Solucione el problema agregando una barrera explícita en blk_mq_mark_tag_wait(), lo cual está bien en caso de quedarse sin etiqueta. • https://git.kernel.org/stable/c/9525b38180e2753f0daa1a522b7767a2aa969676 https://git.kernel.org/stable/c/ecd7744a1446eb02ccc63e493e2eb6ede4ef1e10 https://git.kernel.org/stable/c/7610ba1319253225a9ba8a9d28d472fc883b4e2f https://git.kernel.org/stable/c/89e0e66682e1538aeeaa3109503473663cd24c8b https://git.kernel.org/stable/c/1d9c777d3e70bdc57dddf7a14a80059d65919e56 https://git.kernel.org/stable/c/6d8b01624a2540336a32be91f25187a433af53a0 https://git.kernel.org/stable/c/f1bc0d8163f8ee84a8d5affdf624cfad657df1d2 https://git.kernel.org/stable/c/5266caaf5660529e3da53004b8b7174ca • CWE-362: Concurrent Execution using Shared Resource with Improper Synchronization ('Race Condition') •
CVE-2023-52635 – PM / devfreq: Synchronize devfreq_monitor_[start/stop]
https://notcve.org/view.php?id=CVE-2023-52635
In the Linux kernel, the following vulnerability has been resolved: PM / devfreq: Synchronize devfreq_monitor_[start/stop] There is a chance if a frequent switch of the governor done in a loop result in timer list corruption where timer cancel being done from two place one from cancel_delayed_work_sync() and followed by expire_timers() can be seen from the traces[1]. while true do echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor echo "performance" > /sys/class/devfreq/1d84000.ufshc/governor done It looks to be issue with devfreq driver where device_monitor_[start/stop] need to synchronized so that delayed work should get corrupted while it is either being queued or running or being cancelled. Let's use polling flag and devfreq lock to synchronize the queueing the timer instance twice and work data being corrupted. [1] ... .. <idle>-0 [003] 9436.209662: timer_cancel timer=0xffffff80444f0428 <idle>-0 [003] 9436.209664: timer_expire_entry timer=0xffffff80444f0428 now=0x10022da1c function=__typeid__ZTSFvP10timer_listE_global_addr baseclk=0x10022da1c <idle>-0 [003] 9436.209718: timer_expire_exit timer=0xffffff80444f0428 kworker/u16:6-14217 [003] 9436.209863: timer_start timer=0xffffff80444f0428 function=__typeid__ZTSFvP10timer_listE_global_addr expires=0x10022da2b now=0x10022da1c flags=182452227 vendor.xxxyyy.ha-1593 [004] 9436.209888: timer_cancel timer=0xffffff80444f0428 vendor.xxxyyy.ha-1593 [004] 9436.216390: timer_init timer=0xffffff80444f0428 vendor.xxxyyy.ha-1593 [004] 9436.216392: timer_start timer=0xffffff80444f0428 function=__typeid__ZTSFvP10timer_listE_global_addr expires=0x10022da2c now=0x10022da1d flags=186646532 vendor.xxxyyy.ha-1593 [005] 9436.220992: timer_cancel timer=0xffffff80444f0428 xxxyyyTraceManag-7795 [004] 9436.261641: timer_cancel timer=0xffffff80444f0428 [2] 9436.261653][ C4] Unable to handle kernel paging request at virtual address dead00000000012a [ 9436.261664][ C4] Mem abort info: [ 9436.261666][ C4] ESR = 0x96000044 [ 9436.261669][ C4] EC = 0x25: DABT (current EL), IL = 32 bits [ 9436.261671][ C4] SET = 0, FnV = 0 [ 9436.261673][ C4] EA = 0, S1PTW = 0 [ 9436.261675][ C4] Data abort info: [ 9436.261677][ C4] ISV = 0, ISS = 0x00000044 [ 9436.261680][ C4] CM = 0, WnR = 1 [ 9436.261682][ C4] [dead00000000012a] address between user and kernel address ranges [ 9436.261685][ C4] Internal error: Oops: 96000044 [#1] PREEMPT SMP [ 9436.261701][ C4] Skip md ftrace buffer dump for: 0x3a982d0 ... [ 9436.262138][ C4] CPU: 4 PID: 7795 Comm: TraceManag Tainted: G S W O 5.10.149-android12-9-o-g17f915d29d0c #1 [ 9436.262141][ C4] Hardware name: Qualcomm Technologies, Inc. (DT) [ 9436.262144][ C4] pstate: 22400085 (nzCv daIf +PAN -UAO +TCO BTYPE=--) [ 9436.262161][ C4] pc : expire_timers+0x9c/0x438 [ 9436.262164][ C4] lr : expire_timers+0x2a4/0x438 [ 9436.262168][ C4] sp : ffffffc010023dd0 [ 9436.262171][ C4] x29: ffffffc010023df0 x28: ffffffd0636fdc18 [ 9436.262178][ C4] x27: ffffffd063569dd0 x26: ffffffd063536008 [ 9436.262182][ C4] x25: 0000000000000001 x24: ffffff88f7c69280 [ 9436.262185][ C4] x23: 00000000000000e0 x22: dead000000000122 [ 9436.262188][ C4] x21: 000000010022da29 x20: ffffff8af72b4e80 [ 9436.262191][ C4] x19: ffffffc010023e50 x18: ffffffc010025038 [ 9436.262195][ C4] x17: 0000000000000240 x16: 0000000000000201 [ 9436.262199][ C4] x15: ffffffffffffffff x14: ffffff889f3c3100 [ 9436.262203][ C4] x13: ffffff889f3c3100 x12: 00000000049f56b8 [ 9436.262207][ C4] x11: 00000000049f56b8 x10: 00000000ffffffff [ 9436.262212][ C4] x9 : ffffffc010023e50 x8 : dead000000000122 [ 9436.262216][ C4] x7 : ffffffffffffffff x6 : ffffffc0100239d8 [ 9436.262220][ C4] x5 : 0000000000000000 x4 : 0000000000000101 [ 9436.262223][ C4] x3 : 0000000000000080 x2 : ffffff8 ---truncated--- En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: PM / devfreq: Sincronizar devfreq_monitor_[start/stop] Existe la posibilidad de que un cambio frecuente del gobernador realizado en un bucle provoque corrupción en la lista de temporizadores, donde la cancelación del temporizador se realiza desde dos coloque uno de cancel_delayed_work_sync() y seguido de expire_timers() se puede ver en los rastros [1]. mientras que es verdadero, haga echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor echo "rendimiento" > /sys/class/devfreq/1d84000.ufshc/governor done Parece ser un problema con el controlador devfreq donde device_monitor_[start /stop] debe sincronizarse para que el trabajo retrasado se corrompa mientras está en cola, ejecutándose o cancelándose. Usemos el indicador de sondeo y el bloqueo devfreq para sincronizar la cola de la instancia del temporizador dos veces y los datos de trabajo que se corrompen. [1] ... .. -0 [003] 9436.209662: timer_cancel timer=0xffffff80444f0428 -0 [003] 9436.209664: timer_expire_entry timer=0xffffff80444f0428 now=0x10022da1c function=__typeid__ZTS FvP10timer_listE_global_addr baseclk=0x10022da1c -0 [003] 9436.209718: timer_expire_exit timer=0xffffff80444f0428 kworker/u16:6-14217 [003] 9436.209863: timer_start timer=0xffffff80444f0428 function=__typeid__ZTSFvP10timer_listE_global_addr expira = 0x10022da2b ahora = 0x10022da1c banderas = 182452227 proveedor.xxxyyy.ha-1593 [004] 9436.209888: timer_cancel temporizador=0xffffff80444f0428 proveedor.xxxyyy.ha-1593 [004] 9436.216390: timer_init temporizador=0xffffff80444f0428 proveedor.xxxyyy.ha-1593 [004] 9436.216392: timer_start temporizador=0xffffff80444f042 8 función=__typeid__ZTSFvP10timer_listE_global_addr expires=0x10022da2c ahora=0x10022da1d flags=186646532 proveedor.xxxyyy .ha-1593 [005] 9436.220992: timer_cancel timer=0xffffff80444f0428 xxxyyyTraceManag-7795 [004] 9436.261641: timer_cancel timer=0xffffff80444f0428 [2] 9436.261653][ C4] No se puede para manejar la solicitud de paginación del kernel en la dirección virtual dead00000000012a [ 9436.261664][ C4] Mem información de cancelación: [ 9436.261666][ C4] ESR = 0x96000044 [ 9436.261669][ C4] EC = 0x25: DABT (EL actual), IL = 32 bits [ 9436.261671][ C4] SET = 0, FnV = 0 [ 9436.261673][ C4 ] EA = 0, S1PTW = 0 [ 9436.261675][ C4] Información de cancelación de datos: [ 9436.261677][ C4] ISV = 0, ISS = 0x00000044 [ 9436.261680][ C4] CM = 0, WnR = 1 [ 9436.261682][ C4] [dead00000000012a] dirección entre los rangos de direcciones del usuario y del kernel [ 9436.261685][ C4] Error interno: Vaya: 96000044 [#1] SMP PREEMPT [ 9436.261701][ C4] Omitir el volcado del búfer md ftrace para: 0x3a982d0 ... [ 9436.262138][ C4 ] CPU: 4 PID: 7795 Comunicaciones: TraceManag Tainted: GSWO 5.10.149-android12-9-o-g17f915d29d0c #1 [ 9436.262141][ C4] Nombre de hardware: Qualcomm Technologies, Inc. (DT) [ 9436.262144][ C4] pstate : 22400085 (nzCv daIf +PAN -UAO +TCO BTYPE=--) [ 9436.262161][ C4] pc : expire_timers+0x9c/0x438 [ 9436.262164][ C4] lr : expire_timers+0x2a4/0x438 [ 9436.262168][ C4] sp : ffffffc010023dd0 [ 9436.262171][ C4] x29: ffffffc010023df0 x28: ffffffd0636fdc18 [ 9436.262178][ C4] x27: ffffffd063569dd0 x26: ffffffd063536008 [ 9436 .262182][ C4] x25: 0000000000000001 x24: ffffff88f7c69280 [ 9436.262185][ C4] x23: 00000000000000e0 x22: muerto000000000122 [ 9436.262188][ C4] x21: 000000010022da29 x20: ffffff8af72b4e80 [ 9436.262191][ C4] x19: ffffffc010023e50 x18: ffffffc010025038 [ 9436.262195][ C4] x 17: 0000000000000240 x16: 0000000000000201 [ 9436.262199][ C4] x15: ffffffffffffffff x14: ffffff889f3c3100 [ 9436.262203] [ C4] x13: ffffff889f3c3100 x12: 00000000049f56b8 [ 9436.262207][ C4] x11: 00000000049f56b8 x10: 00000000ffffffff [ 9436.262212][ C4] x9 : ffffffc0 10023e50 x8: muerto000000000122 [9436.262216][C4] x7: ffffffffffffffff x6: ffffffc0100239d8 [9436.262220][C4 ]---truncado--- A flaw was found in the Linux kernel resulting from race conditions and a lack of synchronization in handling the delayed work timers in the devfreq component. This issue can lead to inconsistencies and a corruption of the timer list. • https://git.kernel.org/stable/c/3399cc7013e761fee9d6eec795e9b31ab0cbe475 https://git.kernel.org/stable/c/099f6a9edbe30b142c1d97fe9a4748601d995675 https://git.kernel.org/stable/c/31569995fc65007b73a3fff605ec2b3401b435e9 https://git.kernel.org/stable/c/0aedb319ef3ed39e9e5a7b7726c8264ca627bbd9 https://git.kernel.org/stable/c/ae815e2fdc284ab31651d52460698bd89c0fce22 https://git.kernel.org/stable/c/aed5ed595960c6d301dcd4ed31aeaa7a8054c0c6 https://lists.debian.org/debian-lts-announce/2024/06/msg00017.html https://access.redhat.com/security/cve/CVE-2023 • CWE-414: Missing Lock Check •
CVE-2023-52634 – drm/amd/display: Fix disable_otg_wa logic
https://notcve.org/view.php?id=CVE-2023-52634
In the Linux kernel, the following vulnerability has been resolved: drm/amd/display: Fix disable_otg_wa logic [Why] When switching to another HDMI mode, we are unnecesarilly disabling/enabling FIFO causing both HPO and DIG registers to be set at the same time when only HPO is supposed to be set. This can lead to a system hang the next time we change refresh rates as there are cases when we don't disable OTG/FIFO but FIFO is enabled when it isn't supposed to be. [How] Removing the enable/disable FIFO entirely. En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: drm/amd/display: corrige la lógica enable_otg_wa [Por qué] Cuando cambiamos a otro modo HDMI, deshabilitamos/habilitamos FIFO innecesariamente, lo que hace que los registros HPO y DIG se configuren al mismo tiempo. momento en el que se supone que sólo se debe configurar HPO. Esto puede provocar que el sistema se cuelgue la próxima vez que cambiemos las frecuencias de actualización, ya que hay casos en los que no deshabilitamos OTG/FIFO pero FIFO está habilitado cuando no debería estarlo. [Cómo] Eliminar completamente la activación/desactivación de FIFO. • https://git.kernel.org/stable/c/ce29728ef6485a367934cc100249c66dd3cde5b6 https://git.kernel.org/stable/c/2ce156482a6fef349d2eba98e5070c412d3af662 https://access.redhat.com/security/cve/CVE-2023-52634 https://bugzilla.redhat.com/show_bug.cgi?id=2272806 • CWE-99: Improper Control of Resource Identifiers ('Resource Injection') •
CVE-2023-52633 – um: time-travel: fix time corruption
https://notcve.org/view.php?id=CVE-2023-52633
In the Linux kernel, the following vulnerability has been resolved: um: time-travel: fix time corruption In 'basic' time-travel mode (without =inf-cpu or =ext), we still get timer interrupts. These can happen at arbitrary points in time, i.e. while in timer_read(), which pushes time forward just a little bit. Then, if we happen to get the interrupt after calculating the new time to push to, but before actually finishing that, the interrupt will set the time to a value that's incompatible with the forward, and we'll crash because time goes backwards when we do the forwarding. Fix this by reading the time_travel_time, calculating the adjustment, and doing the adjustment all with interrupts disabled. En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: um: viaje en el tiempo: corrige la corrupción del tiempo En el modo de viaje en el tiempo 'básico' (sin =inf-cpu o =ext), todavía obtenemos interrupciones del temporizador. Esto puede suceder en momentos arbitrarios en el tiempo, es decir, mientras está en timer_read(), lo que adelanta un poco el tiempo. • https://git.kernel.org/stable/c/0c7478a2da3f5fe106b4658338873d50c86ac7ab https://git.kernel.org/stable/c/4f7dad73df4cdb2b7042103d3922745d040ad025 https://git.kernel.org/stable/c/de3e9d8e8d1ae0a4d301109d1ec140796901306c https://git.kernel.org/stable/c/b427f55e9d4185f6f17cc1e3296eb8d0c4425283 https://git.kernel.org/stable/c/abe4eaa8618bb36c2b33e9cdde0499296a23448c •
CVE-2023-52632 – drm/amdkfd: Fix lock dependency warning with srcu
https://notcve.org/view.php?id=CVE-2023-52632
In the Linux kernel, the following vulnerability has been resolved: drm/amdkfd: Fix lock dependency warning with srcu ====================================================== WARNING: possible circular locking dependency detected 6.5.0-kfd-yangp #2289 Not tainted ------------------------------------------------------ kworker/0:2/996 is trying to acquire lock: (srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0 but task is already holding lock: ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}, at: process_one_work+0x211/0x560 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}: __flush_work+0x88/0x4f0 svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu] svm_range_set_attr+0xd6/0x14c0 [amdgpu] kfd_ioctl+0x1d1/0x630 [amdgpu] __x64_sys_ioctl+0x88/0xc0 -> #2 (&info->lock#2){+.+.}-{3:3}: __mutex_lock+0x99/0xc70 amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu] restore_process_helper+0x22/0x80 [amdgpu] restore_process_worker+0x2d/0xa0 [amdgpu] process_one_work+0x29b/0x560 worker_thread+0x3d/0x3d0 -> #1 ((work_completion)(&(&process->restore_work)->work)){+.+.}-{0:0}: __flush_work+0x88/0x4f0 __cancel_work_timer+0x12c/0x1c0 kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu] __mmu_notifier_release+0xad/0x240 exit_mmap+0x6a/0x3a0 mmput+0x6a/0x120 do_exit+0x322/0xb90 do_group_exit+0x37/0xa0 __x64_sys_exit_group+0x18/0x20 do_syscall_64+0x38/0x80 -> #0 (srcu){.+.+}-{0:0}: __lock_acquire+0x1521/0x2510 lock_sync+0x5f/0x90 __synchronize_srcu+0x4f/0x1a0 __mmu_notifier_release+0x128/0x240 exit_mmap+0x6a/0x3a0 mmput+0x6a/0x120 svm_range_deferred_list_work+0x19f/0x350 [amdgpu] process_one_work+0x29b/0x560 worker_thread+0x3d/0x3d0 other info that might help us debug this: Chain exists of: srcu --> &info->lock#2 --> (work_completion)(&svms->deferred_list_work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&svms->deferred_list_work)); lock(&info->lock#2); lock((work_completion)(&svms->deferred_list_work)); sync(srcu); En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: drm/amdkfd: corrige la advertencia de dependencia de bloqueo con srcu ============================== ========================== ADVERTENCIA: posible dependencia de bloqueo circular detectada 6.5.0-kfd-yangp #2289 No contaminado ------ ------------------------------------------------ ktrabajador/ 0:2/996 está intentando adquirir el bloqueo: (srcu){.+.+}-{0:0}, en: __synchronize_srcu+0x5/0x1a0 pero la tarea ya mantiene el bloqueo: ((work_completion)(&svms->deferred_list_work )){+.+.}-{0:0}, en: Process_one_work+0x211/0x560 cuyo bloqueo ya depende del nuevo bloqueo. la cadena de dependencia existente (en orden inverso) es: -> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}: __flush_work+0x88/0x4f0 svm_range_list_lock_and_flush_work+0x3d/0x110 [ amdgpu] svm_range_set_attr+0xd6/0x14c0 [amdgpu] kfd_ioctl+0x1d1/0x630 [amdgpu] __x64_sys_ioctl+0x88/0xc0 -> #2 (&info->lock#2){+.+.}-{3:3}: __mutex_lock+ 0x99/0xc70 amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu] restaurar_proceso_helper+0x22/0x80 [amdgpu] restaurar_proceso_trabajador+0x2d/0xa0 [amdgpu] proceso_one_work+0x29b/0x560 trabajador_thread+0x3d/0x3d 0 -> #1 ((finalización_trabajo)(&(&proceso- >restore_work)->work)){+.+.}-{0:0}: __flush_work+0x88/0x4f0 __cancel_work_timer+0x12c/0x1c0 kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu] __mmu_notifier_release+0xad/0x240 exit_mmap+0x6a/0x3 a0 mmentrada +0x6a/0x120 do_exit+0x322/0xb90 do_group_exit+0x37/0xa0 __x64_sys_exit_group+0x18/0x20 do_syscall_64+0x38/0x80 -> #0 (srcu){.+.+}-{0:0}: __lock_acquire+0x1521/0x2 510 lock_sync +0x5f/0x90 __synchronize_srcu+0x4f/0x1a0 __mmu_notifier_release+0x128/0x240 exit_mmap+0x6a/0x3a0 mmput+0x6a/0x120 svm_range_deferred_list_work+0x19f/0x350 [amdgpu] Process_one_work+0x29b/0 x560 trabajador_thread+0x3d/0x3d0 otra información que podría ayudarnos a depurar esto : Existe cadena de: srcu --> &info->lock#2 --> (work_completion)(&svms->deferred_list_work) Posible escenario de bloqueo inseguro: CPU0 CPU1 ---- ---- lock((work_completion)(&svms- >lista_trabajo_diferido)); bloquear(&info->bloquear#2); lock((work_completion)(&svms->deferred_list_work)); sincronización(srcu); • https://git.kernel.org/stable/c/b602f098f716723fa5c6c96a486e0afba83b7b94 https://git.kernel.org/stable/c/752312f6a79440086ac0f9b08d7776870037323c https://git.kernel.org/stable/c/1556c242e64cdffe58736aa650b0b395854fe4d4 https://git.kernel.org/stable/c/2a9de42e8d3c82c6990d226198602be44f43f340 https://access.redhat.com/security/cve/CVE-2023-52632 https://bugzilla.redhat.com/show_bug.cgi?id=2272804 • CWE-667: Improper Locking •