In the Linux kernel, the following vulnerability has been resolved:
xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete
KASAN reproduces a slab-use-after-free in __xfrm_state_delete()'s
hlist_del_rcu calls under syzkaller load on linux-6.12.y stable
(reproduced on 6.12.47, also reachable via the same code path on
torvalds/master and on the ipsec tree). Nine unique signatures cluster
in the xfrm_state lifecycle, the load-bearing one being:
BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
BUG: KASAN: slab-use-after-free in hlist_del_rcu include/linux/rculist.h:516 [inline]
BUG: KASAN: slab-use-after-free in __xfrm_state_delete net/xfrm/xfrm_state.c
Write of size 8 at addr ffff8881198bcb70 by task kworker/u8:9/435
Workqueue: netns cleanup_net
Call Trace:
__hlist_del / hlist_del_rcu
__xfrm_state_delete
xfrm_state_delete
xfrm_state_flush
xfrm_state_fini
ops_exit_list
cleanup_net
The other observed signatures hit the same slab object from
__xfrm_state_lookup, xfrm_alloc_spi, __xfrm_state_insert and an OOB
write variant of __xfrm_state_delete, all on the byseq/byspi
hash chains.
__xfrm_state_delete() guards its byseq and byspi unhashes with
value-based predicates:
if (x->km.seq)
hlist_del_rcu(&x->byseq);
if (x->id.spi)
hlist_del_rcu(&x->byspi);
while everywhere else in the file (e.g. state_cache, state_cache_input)
the safer hlist_unhashed() check is used. xfrm_alloc_spi() sets
x->id.spi = newspi inside xfrm_state_lock and then immediately inserts
into byspi, but a path that observes x->id.spi != 0 outside of
xfrm_state_lock can still skip-or-hit the byspi unhash inconsistently
with whether x is actually on the list. The same holds for x->km.seq
versus byseq, and the bydst/bysrc unhashes have no predicate at all,
so a second __xfrm_state_delete() on the same object writes through
LIST_POISON pprev.
The defensive change here:
- Use hlist_del_init_rcu() instead of hlist_del_rcu() on bydst,
bysrc, byseq and byspi so a second deletion is a no-op rather
than a write through LIST_POISON pprev. The byseq/byspi nodes
are already initialised in xfrm_state_alloc().
- Test hlist_unhashed() rather than the value predicate for
byseq/byspi, so the unhash decision tracks list state rather than
mutable scalar fields.
Empirical verification: applied this patch on top of v6.12.47, rebuilt,
and re-ran the same syzkaller harness for 1h16m on a previously-crashy
configuration that produced ~100 hits each of slab-use-after-free
Read in xfrm_alloc_spi / Read in __xfrm_state_lookup / Write in
__xfrm_state_delete. After the patch, 7.1M execs across 32 VMs at
~1550 exec/sec produced zero xfrm_state UAF/OOB hits. /proc/slabinfo
confirms the xfrm_state slab is actively allocated and freed during
the run (~143 KiB resident), so the fuzzer is still exercising those
code paths -- they just no longer crash.
Reproduction:
- Linux 6.12.47 x86_64 + KASAN_GENERIC + KASAN_INLINE + KCOV
- syzkaller @ 746545b8b1e4c3a128db8652b340d3df90ce61db
- 32 QEMU/KVM VMs x 2 vCPU on AWS c5.metal bare metal
- 9 unique signatures collected in ~9h, all within xfrm_state
lifecycle
In the Linux kernel, the following vulnerability has been resolved:
KVM: x86: Fix shadow paging use-after-free due to unexpected GFN
The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus
the SPTE index. This assumption breaks for shadow paging if the guest
page tables are modified between VM entries (similar to commit
aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even
when creating an MMIO SPTE", 2026-03-27). The flow is as follows:
- a PDE is installed for a 2MB mapping, and a page in that area is
accessed. KVM creates a kvm_mmu_page consisting of 512 4KB pages;
the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because
the guest's mapping is a huge page (and thus contiguous).
- the PDE mapping is changed from outside the guest.
- the guest accesses another page in the same 2MB area. KVM installs
a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN
(i.e. based on the new mapping, as changed in the previous step) but
that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore
the rmap entry cannot be found and removed when the kvm_mmu_page
is zapped.
- the memslot that covers the first 2MB mapping is deleted, and the
kvm_mmu_page for the now-invalid GPA is zapped. However, rmap_remove()
only looks at the [sp->gfn, sp->gfn + 511] range established in step 1,
and fails to find the rmap entry that was recorded by step 3.
- any operation that causes an rmap walk for the same page accessed
by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page.
This includes dirty logging or MMU notifier invalidations (e.g., from
MADV_DONTNEED).
The underlying issue is that KVM's walking of shadow PTEs assumes that
if a SPTE is present when KVM wants to install a non-leaf SPTE, then the
existing kvm_mmu_page must be for the correct gfn. Because the only way
for the gfn to be wrong is if KVM messed up and failed to zap a SPTE...
which shouldn't happen, but *actually* only happens in response to a
guest write.
That bug dates back literally forever, as even the first version of KVM
assumes that the GFN matches and walks into the "wrong" shadow page.
However, that was only an imprecision until 2032a93d66fa ("KVM: MMU:
Don't allocate gfns page for direct mmu pages") came along.
Fix it by checking for a target gfn mismatch and zapping the existing
SPTE. That way the old SP and rmap entries are gone, KVM installs
the rmap in the right location, and everyone is happy.
In the Linux kernel, the following vulnerability has been resolved:
ipmi:si: Return state to normal if message allocation fails
There were places where nothing would get started if a message
allocation failed, so the driver needs to return to normal state.
In the Linux kernel, the following vulnerability has been resolved:
ibmasm: fix heap over-read in ibmasm_send_i2o_message()
The ibmasm_send_i2o_message() function uses get_dot_command_size() to
compute the byte count for memcpy_toio(), but this value is derived from
user-controlled fields in the dot_command_header (command_size: u8,
data_size: u16) and is never validated against the actual allocation size.
A root user can write a small buffer with inflated header fields, causing
memcpy_toio() to read up to ~65 KB past the end of the allocation into
adjacent kernel heap, which is then forwarded to the service processor
over MMIO.
Silently clamping the copy size is not sufficient: if the header fields
claim a larger size than the buffer, the SP receives a dot command whose
own header is inconsistent with the I2O message length, which can cause
the SP to desynchronize. Reject such commands outright by returning
failure.
Validate command_size before calling get_mfa_inbound() to avoid leaking
an I2O message frame: reading INBOUND_QUEUE_PORT dequeues a hardware
frame from the controller's free pool, and returning without a
corresponding set_mfa_inbound() call would permanently exhaust it.
Additionally, clamp command_size to I2O_COMMAND_SIZE before the
memcpy_toio() so the MMIO write stays within the I2O message frame,
consistent with the clamping already performed by outgoing_message_size()
for the header field.
In the Linux kernel, the following vulnerability has been resolved:
mm: fix deferred split queue races during migration
migrate_folio_move() records the deferred split queue state from src and
replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
makes dst visible before it is requeued, so a concurrent rmap-removal path
can mark dst partially mapped and trip the WARN in deferred_split_folio().
Move the requeue before remove_migration_ptes() so dst is back on the
deferred split queue before it becomes visible again.
Because migration still holds dst locked at that point, teach
deferred_split_scan() to requeue a folio when folio_trylock() fails.
Otherwise a fully mapped underused folio can be dequeued by the shrinker
and silently lost from split_queue.
[ziy@nvidia.com: move the comment]
In the Linux kernel, the following vulnerability has been resolved:
misc: ibmasm: fix OOB MMIO read in ibmasm_handle_mouse_interrupt()
ibmasm_handle_mouse_interrupt() performs an out-of-bounds MMIO read
when the queue reader or writer index from hardware exceeds
REMOTE_QUEUE_SIZE (60).
A compromised service processor can trigger this by writing an
out-of-range value to the reader or writer MMIO register before
asserting an interrupt. Since writer is re-read from hardware on
every loop iteration, it can also be set to an out-of-range value
after the loop has already started.
The root cause is that get_queue_reader() and get_queue_writer() return
raw readl() values that are passed directly into get_queue_entry(),
which computes:
queue_begin + reader * sizeof(struct remote_input)
with no bounds check. This unchecked MMIO address is then passed to
memcpy_fromio(), reading 8 bytes from unintended device registers.
For sufficiently large values the address falls outside the PCI BAR
mapping entirely, triggering a machine check exception.
Fix by checking both indices against REMOTE_QUEUE_SIZE at the top of
the loop body, before any call to get_queue_entry(). On an out-of-range
value, reset the reader register to 0 via set_queue_reader() before
breaking, so that normal queue operation can resume if the corrupted
hardware state is transient.
In the Linux kernel, the following vulnerability has been resolved:
dm mirror: fix integer overflow in create_dirty_log()
The argument count calculation in create_dirty_log() performs
`*args_used = 2 + param_count` before validating against argc. When a
user provides a param_count close to UINT_MAX via the device mapper
table string, this unsigned addition wraps around to a small value,
causing the subsequent `argc < *args_used` check to be bypassed.
The overflowed param_count is then passed as argc to dm_dirty_log_create(),
where it can cause out-of-bounds reads on the argv array.
Fix by comparing param_count against argc - 2 before performing the
addition, following the same pattern used by parse_features() in the
same file. Since argc >= 2 is already guaranteed, the subtraction is
safe.
In the Linux kernel, the following vulnerability has been resolved:
ext2: reject inodes with zero i_nlink and valid mode in ext2_iget()
ext2_iget() already rejects inodes with i_nlink == 0 when i_mode is
zero or i_dtime is set, treating them as deleted. However, the case of
i_nlink == 0 with a non-zero mode and zero dtime slips through. Since
ext2 has no orphan list, such a combination can only result from
filesystem corruption - a legitimate inode deletion always sets either
i_dtime or clears i_mode before freeing the inode.
A crafted image can exploit this gap to present such an inode to the
VFS, which then triggers WARN_ON inside drop_nlink() (fs/inode.c) via
ext2_unlink(), ext2_rename() and ext2_rmdir():
WARNING: CPU: 3 PID: 609 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 3 UID: 0 PID: 609 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_unlink+0x26c/0x300 fs/ext2/namei.c:295
vfs_unlink+0x2fc/0x9b0 fs/namei.c:4477
do_unlinkat+0x53e/0x730 fs/namei.c:4541
__x64_sys_unlink+0xc6/0x110 fs/namei.c:4587
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 646 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 646 Comm: syz.0.17 Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rename+0x35e/0x850 fs/ext2/namei.c:374
vfs_rename+0xf2f/0x2060 fs/namei.c:5021
do_renameat2+0xbe2/0xd50 fs/namei.c:5178
__x64_sys_rename+0x7e/0xa0 fs/namei.c:5223
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 634 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 634 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rmdir+0xca/0x110 fs/ext2/namei.c:311
vfs_rmdir+0x204/0x690 fs/namei.c:4348
do_rmdir+0x372/0x3e0 fs/namei.c:4407
__x64_sys_unlinkat+0xf0/0x130 fs/namei.c:4577
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Extend the existing i_nlink == 0 check to also catch this case,
reporting the corruption via ext2_error() and returning -EFSCORRUPTED.
This rejects the inode at load time and prevents it from reaching any
of the namei.c paths.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
In the Linux kernel, the following vulnerability has been resolved:
ibmasm: fix OOB reads in command_file_write due to missing size checks
The command_file_write() handler allocates a kernel buffer of exactly
count bytes and copies user data into it, but does not validate the
buffer against the dot command protocol before passing it to
get_dot_command_size() and get_dot_command_timeout().
Since both the allocation size (count) and the header fields (command_size,
data_size) are independently user-controlled, an attacker can cause
get_dot_command_size() to return a value exceeding the allocation,
triggering OOB reads in get_dot_command_timeout() and an out-of-bounds
memcpy_toio() that leaks kernel heap memory to the service processor.
Fix with two guards: reject writes smaller than sizeof(struct
dot_command_header) before allocation, then after copying user data
reject commands where the buffer is smaller than the total size declared
by the header (sizeof(header) + command_size + data_size). This ensures
all subsequent header and payload field accesses stay within the buffer.
In the Linux kernel, the following vulnerability has been resolved:
hfsplus: return error when node already exists in hfs_bnode_create
When hfs_bnode_create() finds that a node is already hashed (which should
not happen in normal operation), it currently returns the existing node
without incrementing its reference count. This causes a reference count
inconsistency that leads to a kernel panic when the node is later freed
in hfs_bnode_put():
kernel BUG at fs/hfsplus/bnode.c:676!
BUG_ON(!atomic_read(&node->refcnt))
This scenario can occur when hfs_bmap_alloc() attempts to allocate a node
that is already in use (e.g., when node 0's bitmap bit is incorrectly
unset), or due to filesystem corruption.
Returning an existing node from a create path is not normal operation.
Fix this by returning ERR_PTR(-EEXIST) instead of the node when it's
already hashed. This properly signals the error condition to callers,
which already check for IS_ERR() return values.