CVE-2024-47745 – mm: call the security_mmap_file() LSM hook in remap_file_pages()
https://notcve.org/view.php?id=CVE-2024-47745
In the Linux kernel, the following vulnerability has been resolved: mm: call the security_mmap_file() LSM hook in remap_file_pages() The remap_file_pages syscall handler calls do_mmap() directly, which doesn't contain the LSM security check. And if the process has called personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for RW pages, this will actually result in remapping the pages to RWX, bypassing a W^X policy enforced by SELinux. So we should check prot by security_mmap_file LSM hook in the remap_file_pages syscall handler before do_mmap() is called. Otherwise, it potentially permits an attacker to bypass a W^X policy enforced by SELinux. The bypass is similar to CVE-2016-10044, which bypass the same thing via AIO and can be found in [1]. The PoC: $ cat > test.c int main(void) { size_t pagesz = sysconf(_SC_PAGE_SIZE); int mfd = syscall(SYS_memfd_create, "test", 0); const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0); unsigned int old = syscall(SYS_personality, 0xffffffff); syscall(SYS_personality, READ_IMPLIES_EXEC | old); syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0); syscall(SYS_personality, old); // show the RWX page exists even if W^X policy is enforced int fd = open("/proc/self/maps", O_RDONLY); unsigned char buf2[1024]; while (1) { int ret = read(fd, buf2, 1024); if (ret <= 0) break; write(1, buf2, ret); } close(fd); } $ gcc test.c -o test $ ./test | grep rwx 7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted) [PM: subject line tweaks] • https://git.kernel.org/stable/c/49d3a4ad57c57227c3b0fd6cd4188b2a5ebd6178 https://git.kernel.org/stable/c/3393fddbfa947c8e1fdcc4509226905ffffd8b89 https://git.kernel.org/stable/c/ce14f38d6ee9e88e37ec28427b4b93a7c33c70d3 https://git.kernel.org/stable/c/ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 •
CVE-2024-47744 – KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
https://notcve.org/view.php?id=CVE-2024-47744
In the Linux kernel, the following vulnerability has been resolved: KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock on x86 due to a chain of locks and SRCU synchronizations. Translating the below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the fairness of r/w semaphores). CPU0 CPU1 CPU2 1 lock(&kvm->slots_lock); 2 lock(&vcpu->mutex); 3 lock(&kvm->srcu); 4 lock(cpu_hotplug_lock); 5 lock(kvm_lock); 6 lock(&kvm->slots_lock); 7 lock(cpu_hotplug_lock); 8 sync(&kvm->srcu); Note, there are likely more potential deadlocks in KVM x86, e.g. the same pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with __kvmclock_cpufreq_notifier(): cpuhp_cpufreq_online() | -> cpufreq_online() | -> cpufreq_gov_performance_limits() | -> __cpufreq_driver_target() | -> __target_index() | -> cpufreq_freq_transition_begin() | -> cpufreq_notify_transition() | -> ... __kvmclock_cpufreq_notifier() But, actually triggering such deadlocks is beyond rare due to the combination of dependencies and timings involved. E.g. the cpufreq notifier is only used on older CPUs without a constant TSC, mucking with the NX hugepage mitigation while VMs are running is very uncommon, and doing so while also onlining/offlining a CPU (necessary to generate contention on cpu_hotplug_lock) would be even more unusual. The most robust solution to the general cpu_hotplug_lock issue is likely to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq notifier doesn't to take kvm_lock. For now, settle for fixing the most blatant deadlock, as switching to an RCU-protected list is a much more involved change, but add a comment in locking.rst to call out that care needs to be taken when walking holding kvm_lock and walking vm_list. ====================================================== WARNING: possible circular locking dependency detected 6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O ------------------------------------------------------ tee/35048 is trying to acquire lock: ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm] but task is already holding lock: ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (kvm_lock){+.+.}-{3:3}: __mutex_lock+0x6a/0xb40 mutex_lock_nested+0x1f/0x30 kvm_dev_ioctl+0x4fb/0xe50 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #2 (cpu_hotplug_lock){++++}-{0:0}: cpus_read_lock+0x2e/0xb0 static_key_slow_inc+0x16/0x30 kvm_lapic_set_base+0x6a/0x1c0 [kvm] kvm_set_apic_base+0x8f/0xe0 [kvm] kvm_set_msr_common+0x9ae/0xf80 [kvm] vmx_set_msr+0xa54/0xbe0 [kvm_intel] __kvm_set_msr+0xb6/0x1a0 [kvm] kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm] kvm_vcpu_ioctl+0x485/0x5b0 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&kvm->srcu){.+.+}-{0:0}: __synchronize_srcu+0x44/0x1a0 ---truncated--- • https://git.kernel.org/stable/c/0bf50497f03b3d892c470c7d1a10a3e9c3c95821 https://git.kernel.org/stable/c/4777225ec89f52bb9ca16a33cfb44c189f1b7b47 https://git.kernel.org/stable/c/a2764afce521fd9fd7a5ff6ed52ac2095873128a https://git.kernel.org/stable/c/760a196e6dcb29580e468b44b5400171dae184d8 https://git.kernel.org/stable/c/44d17459626052a2390457e550a12cb973506b2f •
CVE-2024-47743 – KEYS: prevent NULL pointer dereference in find_asymmetric_key()
https://notcve.org/view.php?id=CVE-2024-47743
In the Linux kernel, the following vulnerability has been resolved: KEYS: prevent NULL pointer dereference in find_asymmetric_key() In find_asymmetric_key(), if all NULLs are passed in the id_{0,1,2} arguments, the kernel will first emit WARN but then have an oops because id_2 gets dereferenced anyway. Add the missing id_2 check and move WARN_ON() to the final else branch to avoid duplicate NULL checks. Found by Linux Verification Center (linuxtesting.org) with Svace static analysis tool. • https://git.kernel.org/stable/c/7d30198ee24f2ddcc4fefcd38a9b76bd8ab31360 https://git.kernel.org/stable/c/3322fa8f2aa40b0b3651034cd541647a600cc6c0 https://git.kernel.org/stable/c/a3765b497a4f5224cb2f7a6a2d3357d3066214ee https://git.kernel.org/stable/c/13b5b401ead95b5d8266f64904086c55b6024900 https://git.kernel.org/stable/c/0d3b0706ada15c333e6f9faf19590ff715e45d1e https://git.kernel.org/stable/c/70fd1966c93bf3bfe3fe6d753eb3d83a76597eef •
CVE-2024-47742 – firmware_loader: Block path traversal
https://notcve.org/view.php?id=CVE-2024-47742
In the Linux kernel, the following vulnerability has been resolved: firmware_loader: Block path traversal Most firmware names are hardcoded strings, or are constructed from fairly constrained format strings where the dynamic parts are just some hex numbers or such. However, there are a couple codepaths in the kernel where firmware file names contain string components that are passed through from a device or semi-privileged userspace; the ones I could find (not counting interfaces that require root privileges) are: - lpfc_sli4_request_firmware_update() seems to construct the firmware filename from "ModelName", a string that was previously parsed out of some descriptor ("Vital Product Data") in lpfc_fill_vpd() - nfp_net_fw_find() seems to construct a firmware filename from a model name coming from nfp_hwinfo_lookup(pf->hwinfo, "nffw.partno"), which I think parses some descriptor that was read from the device. (But this case likely isn't exploitable because the format string looks like "netronome/nic_%s", and there shouldn't be any *folders* starting with "netronome/nic_". The previous case was different because there, the "%s" is *at the start* of the format string.) - module_flash_fw_schedule() is reachable from the ETHTOOL_MSG_MODULE_FW_FLASH_ACT netlink command, which is marked as GENL_UNS_ADMIN_PERM (meaning CAP_NET_ADMIN inside a user namespace is enough to pass the privilege check), and takes a userspace-provided firmware name. (But I think to reach this case, you need to have CAP_NET_ADMIN over a network namespace that a special kind of ethernet device is mapped into, so I think this is not a viable attack path in practice.) Fix it by rejecting any firmware names containing ".." path components. For what it's worth, I went looking and haven't found any USB device drivers that use the firmware loader dangerously. • https://git.kernel.org/stable/c/abb139e75c2cdbb955e840d6331cb5863e409d0e https://git.kernel.org/stable/c/d1768e5535d3ded59f888637016e6f821f4e069f https://git.kernel.org/stable/c/9b1ca33ebd05b3acef5b976c04e5e791af93ce1b https://git.kernel.org/stable/c/c30558e6c5c9ad6c86459d9acce1520ceeab9ea6 https://git.kernel.org/stable/c/a77fc4acfd49fc6076e565445b2bc5fdc3244da4 https://git.kernel.org/stable/c/3d2411f4edcb649eaf232160db459bb4770b5251 https://git.kernel.org/stable/c/7420c1bf7fc784e587b87329cc6dfa3dca537aa4 https://git.kernel.org/stable/c/28f1cd94d3f1092728fb775a0fe26c5f1 •
CVE-2024-47741 – btrfs: fix race setting file private on concurrent lseek using same fd
https://notcve.org/view.php?id=CVE-2024-47741
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix race setting file private on concurrent lseek using same fd When doing concurrent lseek(2) system calls against the same file descriptor, using multiple threads belonging to the same process, we have a short time window where a race happens and can result in a memory leak. The race happens like this: 1) A program opens a file descriptor for a file and then spawns two threads (with the pthreads library for example), lets call them task A and task B; 2) Task A calls lseek with SEEK_DATA or SEEK_HOLE and ends up at file.c:find_desired_extent() while holding a read lock on the inode; 3) At the start of find_desired_extent(), it extracts the file's private_data pointer into a local variable named 'private', which has a value of NULL; 4) Task B also calls lseek with SEEK_DATA or SEEK_HOLE, locks the inode in shared mode and enters file.c:find_desired_extent(), where it also extracts file->private_data into its local variable 'private', which has a NULL value; 5) Because it saw a NULL file private, task A allocates a private structure and assigns to the file structure; 6) Task B also saw a NULL file private so it also allocates its own file private and then assigns it to the same file structure, since both tasks are using the same file descriptor. At this point we leak the private structure allocated by task A. Besides the memory leak, there's also the detail that both tasks end up using the same cached state record in the private structure (struct btrfs_file_private::llseek_cached_state), which can result in a use-after-free problem since one task can free it while the other is still using it (only one task took a reference count on it). Also, sharing the cached state is not a good idea since it could result in incorrect results in the future - right now it should not be a problem because it end ups being used only in extent-io-tree.c:count_range_bits() where we do range validation before using the cached state. Fix this by protecting the private assignment and check of a file while holding the inode's spinlock and keep track of the task that allocated the private, so that it's used only by that task in order to prevent user-after-free issues with the cached state record as well as potentially using it incorrectly in the future. • https://git.kernel.org/stable/c/3c32c7212f1639471ec0197ff1179b8ef2e0f3d3 https://git.kernel.org/stable/c/f56a6d9c267ec7fa558ede7755551c047b1034cd https://git.kernel.org/stable/c/a412ca489ac27b9d0e603499315b7139c948130d https://git.kernel.org/stable/c/33d1310d4496e904123dab9c28b2d8d2c1800f97 https://git.kernel.org/stable/c/7ee85f5515e86a4e2a2f51969795920733912bad •