Commit Graph

2593 Commits

Author SHA1 Message Date
Keir Fraser da662aecc8 FROMLIST: KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()
Device MMIO registration may happen quite frequently during VM boot,
and the SRCU synchronization each time has a measurable effect
on VM startup time. In our experiments it can account for around 25%
of a VM's startup time.

Replace the synchronization with a deferred free of the old kvm_io_bus
structure.

Bug: 395485007
Bug: 357781595
Link: https://lore.kernel.org/all/20250624092256.1105524-4-keirf@google.com/

Change-Id: I52af8db54259f4423e3dcdfa298573944796d88d
Signed-off-by: Keir Fraser <keirf@google.com>
2025-07-11 03:09:09 -07:00
Greg Kroah-Hartman dba4f359fc Merge 6.12.30 into android16-6.12-lts
GKI (arm64) relevant 18 out of 143 changes, affecting 32 files +213/-83
  10d1496f85 fs/xattr.c: fix simple_xattr_list to always include security.* xattrs [1 file, +24/-0]
  bc4c54cbb4 binfmt_elf: Move brk for static PIE even if ASLR disabled [1 file, +47/-24]
  f0d70d8dca cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks [1 file, +4/-2]
  517c11fe4f tracing: probes: Fix a possible race in trace_probe_log APIs [5 files, +27/-3]
  94e7272b63 HID: uclogic: Add NULL check in uclogic_input_configured() [1 file, +4/-3]
  28826a89fd Bluetooth: MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags [1 file, +6/-3]
  d1365ca80b net_sched: Flush gso_skb list too during ->change() [7 files, +21/-6]
  ddfa034da3 nvme-pci: make nvme_pci_npages_prp() __always_inline [1 file, +1/-1]
  a3c147040b nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable [1 file, +2/-0]
  c88f4ff535 ALSA: usb-audio: Add sample rate quirk for Audioengine D1 [1 file, +2/-0]
  93152dac0b ALSA: usb-audio: Add sample rate quirk for Microdia JP001 USB Camera [1 file, +2/-0]
  fe1bebd0ed dma-buf: insert memory barrier before updating num_fences [1 file, +3/-2]
  7d353da580 ftrace: Fix preemption accounting for stacktrace trigger command [1 file, +1/-1]
  bffc3038a2 scsi: sd_zbc: block: Respect bio vector limits for REPORT ZONES buffer [3 files, +7/-2]
  20d6e621be ring-buffer: Fix persistent buffer when commit page is the reader page [1 file, +5/-3]
  fe0756daad mm: userfaultfd: correct dirty flags set for both present and swap pte [1 file, +10/-2]
  74953f93f4 mm/page_alloc: fix race condition in unaccepted memory handling [1 file, +0/-23]
  5924b32446 usb: typec: ucsi: displayport: Fix deadlock [3 files, +47/-8]

Changes in 6.12.30
	arm64: dts: rockchip: Assign RT5616 MCLK rate on rk3588-friendlyelec-cm3588
	fs/xattr.c: fix simple_xattr_list to always include security.* xattrs
	drivers/platform/x86/amd: pmf: Check for invalid sideloaded Smart PC Policies
	drivers/platform/x86/amd: pmf: Check for invalid Smart PC Policies
	riscv: dts: sophgo: fix DMA data-width configuration for CV18xx
	binfmt_elf: Move brk for static PIE even if ASLR disabled
	platform/x86/amd/pmc: Declare quirk_spurious_8042 for MECHREVO Wujie 14XA (GX4HRXL)
	platform/x86: asus-wmi: Fix wlan_ctrl_by_user detection
	arm64: dts: imx8mp-var-som: Fix LDO5 shutdown causing SD card timeout
	cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks
	tracing: probes: Fix a possible race in trace_probe_log APIs
	tpm: tis: Double the timeout B to 4s
	uio_hv_generic: Fix sysfs creation path for ring buffer
	KVM: Add member to struct kvm_gfn_range to indicate private/shared
	KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing
	iio: adc: ad7266: Fix potential timestamp alignment issue.
	iio: chemical: pms7003: use aligned_s64 for timestamp
	iio: pressure: mprls0025pa: use aligned_s64 for timestamp
	drm/amd: Add Suspend/Hibernate notification callback support
	Revert "drm/amd: Stop evicting resources on APUs in suspend"
	xhci: dbc: Improve performance by removing delay in transfer event polling.
	xhci: dbc: Avoid event polling busyloop if pending rx transfers are inactive.
	iio: adc: ad7768-1: Fix insufficient alignment of timestamp.
	iio: chemical: sps30: use aligned_s64 for timestamp
	virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()
	virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx
	RDMA/rxe: Fix slab-use-after-free Read in rxe_queue_cleanup bug
	HID: thrustmaster: fix memory leak in thrustmaster_interrupts()
	HID: uclogic: Add NULL check in uclogic_input_configured()
	nfs: handle failure of nfs_get_lock_context in unlock path
	spi: loopback-test: Do not split 1024-byte hexdumps
	RDMA/core: Fix "KASAN: slab-use-after-free Read in ib_register_device" problem
	Bluetooth: MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags
	net_sched: Flush gso_skb list too during ->change()
	tools/net/ynl: ethtool: fix crash when Hardware Clock info is missing
	mctp: no longer rely on net->dev_index_head[]
	net: mctp: Don't access ifa_index when missing
	selftests: ncdevmem: Redirect all non-payload output to stderr
	selftests: ncdevmem: Separate out dmabuf provider
	selftests: ncdevmem: Unify error handling
	selftests: ncdevmem: Make client_ip optional
	selftests: ncdevmem: Switch to AF_INET6
	tests/ncdevmem: Fix double-free of queue array
	net: mctp: Ensure keys maintain only one ref to corresponding dev
	ALSA: seq: Fix delivery of UMP events to group ports
	ALSA: ump: Fix a typo of snd_ump_stream_msg_device_info
	net: cadence: macb: Fix a possible deadlock in macb_halt_tx.
	net: dsa: sja1105: discard incoming frames in BR_STATE_LISTENING
	nvme-pci: make nvme_pci_npages_prp() __always_inline
	nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable
	ALSA: sh: SND_AICA should depend on SH_DMA_API
	net: dsa: b53: prevent standalone from trying to forward to other ports
	vsock/test: Fix occasional failure in SIOCOUTQ tests
	net/mlx5e: Disable MACsec offload for uplink representor profile
	qlcnic: fix memory leak in qlcnic_sriov_channel_cfg_cmd()
	regulator: max20086: fix invalid memory access
	drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value
	netlink: specs: tc: fix a couple of attribute names
	netlink: specs: tc: all actions are indexed arrays
	octeontx2-pf: macsec: Fix incorrect max transmit size in TX secy
	net: ethernet: mtk_eth_soc: fix typo for declaration MT7988 ESW capability
	octeontx2-af: Fix CGX Receive counters
	octeontx2-pf: Do not reallocate all ntuple filters
	wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request
	mlxsw: spectrum_router: Fix use-after-free when deleting GRE net devices
	net/tls: fix kernel panic when alloc_page failed
	tsnep: fix timestamping with a stacked DSA driver
	NFSv4/pnfs: Reset the layout state after a layoutreturn
	dmaengine: Revert "dmaengine: dmatest: Fix dmatest waiting less when interrupted"
	sched_ext: bpf_iter_scx_dsq_new() should always initialize iterator
	udf: Make sure i_lenExtents is uptodate on inode eviction
	HID: bpf: abort dispatch if device destroyed
	LoongArch: Prevent cond_resched() occurring within kernel-fpu
	LoongArch: Move __arch_cpu_idle() to .cpuidle.text section
	LoongArch: Save and restore CSR.CNTC for hibernation
	LoongArch: Fix MAX_REG_OFFSET calculation
	LoongArch: uprobes: Remove user_{en,dis}able_single_step()
	LoongArch: uprobes: Remove redundant code about resume_era
	btrfs: fix discard worker infinite loop after disabling discard
	btrfs: fix folio leak in submit_one_async_extent()
	btrfs: add back warning for mount option commit values exceeding 300
	Revert "drm/amd/display: Hardware cursor changes color when switched to software cursor"
	drm/amdgpu: fix incorrect MALL size for GFX1151
	drm/amdgpu: csa unmap use uninterruptible lock
	drm/amd/display: Correct the reply value when AUX write incomplete
	drm/amd/display: Avoid flooding unnecessary info messages
	MAINTAINERS: Update Alexey Makhalov's email address
	gpio: pca953x: fix IRQ storm on system wake up
	ACPI: PPTT: Fix processor subtable walk
	ALSA: es1968: Add error handling for snd_pcm_hw_constraint_pow2()
	ALSA: usb-audio: Add sample rate quirk for Audioengine D1
	ALSA: usb-audio: Add sample rate quirk for Microdia JP001 USB Camera
	dma-buf: insert memory barrier before updating num_fences
	hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages
	hv_netvsc: Preserve contiguous PFN grouping in the page buffer array
	hv_netvsc: Remove rmsg_pgcnt
	arm64: dts: amlogic: dreambox: fix missing clkc_audio node
	arm64: dts: rockchip: Remove overdrive-mode OPPs from RK3588J SoC dtsi
	Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges
	Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()
	kbuild: Disable -Wdefault-const-init-unsafe
	ftrace: Fix preemption accounting for stacktrace trigger command
	ftrace: Fix preemption accounting for stacktrace filter command
	tracing: samples: Initialize trace_array_printk() with the correct function
	phy: tegra: xusb: Use a bitmask for UTMI pad power state tracking
	phy: Fix error handling in tegra_xusb_port_init
	phy: renesas: rcar-gen3-usb2: Fix role detection on unbind/bind
	phy: renesas: rcar-gen3-usb2: Set timing registers only once
	scsi: sd_zbc: block: Respect bio vector limits for REPORT ZONES buffer
	smb: client: fix memory leak during error handling for POSIX mkdir
	spi: tegra114: Use value to check for invalid delays
	tpm: Mask TPM RC in tpm2_start_auth_session()
	wifi: mt76: disable napi on driver removal
	ring-buffer: Fix persistent buffer when commit page is the reader page
	net: qede: Initialize qede_ll_ops with designated initializer
	mm: userfaultfd: correct dirty flags set for both present and swap pte
	dmaengine: ti: k3-udma: Add missing locking
	dmaengine: ti: k3-udma: Use cap_mask directly from dma_device structure instead of a local copy
	dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs
	dmaengine: idxd: fix memory leak in error handling path of idxd_setup_engines
	dmaengine: idxd: fix memory leak in error handling path of idxd_setup_groups
	dmaengine: idxd: Add missing cleanup for early error out in idxd_setup_internals
	dmaengine: idxd: Add missing cleanups in cleanup internals
	dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call
	dmaengine: idxd: fix memory leak in error handling path of idxd_alloc
	dmaengine: idxd: fix memory leak in error handling path of idxd_pci_probe
	dmaengine: idxd: Refactor remove call with idxd_cleanup() helper
	CIFS: New mount option for cifs.upcall namespace resolution
	drm/xe/gsc: do not flush the GSC worker from the reset path
	mm/page_alloc: fix race condition in unaccepted memory handling
	accel/ivpu: Rename ivpu_log_level to fw_log_level
	accel/ivpu: Reset fw log on cold boot
	accel/ivpu: Refactor functions in ivpu_fw_log.c
	accel/ivpu: Fix fw log printing
	iio: light: opt3001: fix deadlock due to concurrent flag access
	Bluetooth: btnxpuart: Fix kernel panic during FW release
	drm/fbdev-dma: Support struct drm_driver.fbdev_probe
	drm/panel-mipi-dbi: Run DRM default client setup
	drm/tiny: panel-mipi-dbi: Use drm_client_setup_with_fourcc()
	usb: typec: ucsi: displayport: Fix deadlock
	phy: tegra: xusb: remove a stray unlock
	drm/amdgpu: fix pm notifier handling
	Linux 6.12.30

Change-Id: I4fefed85c02f1ed826b7ee014700b80c10300bb5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2025-06-05 11:53:12 +00:00
Isaku Yamahata 3d962ec543 KVM: Add member to struct kvm_gfn_range to indicate private/shared
[ Upstream commit dca6c88532322830d5d92486467fcc91b67a9ad8 ]

Add new members to struct kvm_gfn_range to indicate which mapping
(private-vs-shared) to operate on: enum kvm_gfn_range_filter
attr_filter. Update the core zapping operations to set them appropriately.

TDX utilizes two GPA aliases for the same memslots, one for private memory
and one for shared memory. For private memory, KVM
cannot always perform the same operations it does on memory for default
VMs, such as zapping pages and having them be faulted back in, as this
requires guest coordination. However, some operations such as guest driven
conversion of memory between private and shared should zap private memory.

Internally to the MMU, private and shared mappings are tracked on separate
roots. Mapping and zapping operations will operate on the respective GFN
alias for each root (private or shared). So zapping operations will by
default zap both aliases. Add fields in struct kvm_gfn_range to allow
callers to specify which aliases to operate on, so they can target only
the aliases appropriate for their specific operation.

There was feedback that target aliases should be specified such that the
default value (0) is to operate on both aliases. Several options were
considered, including several variations of separate bools defined such
that the default behavior was to process both aliases. These either allowed
nonsensical configurations or were confusing for the caller. A simple
enum was also explored and was close, but was hard to process in the
caller. Instead, use an enum with the default value (0) reserved as a
disallowed value. Catch ranges that didn't have the target aliases
specified by looking for that specific value.

Set target alias with enum appropriately for these MMU operations:
 - For KVM's mmu notifier callbacks, zap shared pages only because private
   pages won't have a userspace mapping
 - For setting memory attributes, kvm_arch_pre_set_memory_attributes()
   chooses the aliases based on the attribute.
 - For guest_memfd invalidations, zap private only.

Link: https://lore.kernel.org/kvm/ZivIF9vjKcuGie3s@google.com/
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Message-ID: <20240718211230.1492011-3-rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stable-dep-of: 9129633d568e ("KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22 14:29:36 +02:00
Greg Kroah-Hartman f56453cbd7 Merge 6.12.24 into android16-6.12-lts
GKI (arm64) relevant 98 out of 394 changes, affecting 131 files +1443/-762
  40bc55e4fc cgroup/cpuset: Fix incorrect isolated_cpus update in update_parent_effective_cpumask() [1 file, +3/-3]
  9701dcbf5f cgroup/cpuset: Fix error handling in remote_partition_disable() [1 file, +20/-9]
  2dbd1b1660 cgroup/cpuset: Revert "Allow suppression of sched domain rebuild in update_cpumasks_hier()" [1 file, +14/-25]
  6b145f8b22 cgroup/cpuset: Enforce at most one rebuild_sched_domains_locked() call per operation [1 file, +33/-16]
  1b06f00eda cgroup/cpuset: Further optimize code if CONFIG_CPUSETS_V1 not set [1 file, +19/-20]
  cdb6e724e7 cgroup/cpuset: Fix race between newly created partition and dying one [4 files, +25/-4]
  179ef2f810 gpiolib: of: Fix the choice for Ingenic NAND quirk [1 file, +2/-0]
  cb8372e54f ublk: refactor recovery configuration flag helpers [1 file, +42/-20]
  caa5c8a235 ublk: fix handling recovery & reissue in ublk_abort_queue() [1 file, +26/-4]
  7c5957f790 tipc: fix memory leak in tipc_link_xmit [1 file, +1/-0]
  4d55144b12 codel: remove sch->q.qlen check before qdisc_tree_reduce_backlog() [2 files, +3/-8]
  b2f3c3d57a tc: Ensure we have enough buffer space when sending filter netlink notifications [1 file, +45/-21]
  a065b99605 net: ethtool: Don't call .cleanup_data when prepare_data fails [1 file, +5/-3]
  70449ca406 net_sched: sch_sfq: use a temporary work area for validating configuration [1 file, +44/-12]
  f86293adce net_sched: sch_sfq: move the limit validation [1 file, +6/-4]
  6d98cd6342 net: phy: move phy_link_change() prior to mdio_bus_phy_may_suspend() [1 file, +13/-13]
  a6ed6f8ec8 net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY [1 file, +29/-2]
  cc16f7402a ipv6: Align behavior across nexthops during path selection [1 file, +4/-4]
  c61feda373 perf/core: Add aux_pause, aux_resume, aux_start_paused [4 files, +110/-5]
  7ef5aa081f perf/core: Simplify the perf_event_alloc() error path [2 files, +78/-76]
  fa1827fa96 perf: Fix hang while freeing sigtrap event [2 files, +18/-47]
  52535688c2 fs: consistently deref the files table with rcu_dereference_raw() [1 file, +17/-9]
  67e85cfa95 umount: Allow superblock owners to force umount [1 file, +2/-1]
  1b3ebfb15d perf: arm_pmu: Don't disable counter in armpmu_add() [1 file, +3/-5]
  11ae4fec1f PM: hibernate: Avoid deadlock in hibernate_compressor_param_set() [1 file, +3/-3]
  ead1fc9f93 Flush console log from kernel_power_off() [3 files, +8/-3]
  cb58e90920 arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD [1 file, +2/-0]
  3c057a4904 media: uvcvideo: Add quirk for Actions UVC05 [1 file, +9/-0]
  cb1c6cb110 ALSA: usb-audio: Fix CME quirk for UF series keyboards [1 file, +74/-6]
  a6bf0fd322 net: page_pool: don't cast mp param to devmem [1 file, +1/-1]
  c6e50cb8bf f2fs: don't retry IO for corrupted data scenario [1 file, +4/-0]
  de94d0ca9e net: usb: asix_devices: add FiberGecko DeviceID [1 file, +17/-0]
  7204335d19 page_pool: avoid infinite loop to schedule delayed worker [1 file, +7/-1]
  ecc4613316 f2fs: fix to avoid out-of-bounds access in f2fs_truncate_inode_blocks() [1 file, +8/-1]
  5f815757e6 ext4: protect ext4_release_dquot against freezing [1 file, +17/-0]
  aa39d45071 Revert "f2fs: rebuild nat_bits during umount" [3 files, +59/-95]
  eb59cc31b6 ext4: ignore xattrs past end [1 file, +10/-1]
  a8a8076210 cdc_ether|r8152: ThinkPad Hybrid USB-C/A Dock quirk [3 files, +19/-0]
  299d7d27af net: vlan: don't propagate flags on open [1 file, +4/-27]
  40c70ff44b tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER [1 file, +3/-1]
  6b7a32fa9b Bluetooth: hci_uart: fix race during initialization [1 file, +2/-1]
  fe6f1f349d Bluetooth: hci_qca: use the power sequencer for wcn6750 [1 file, +1/-1]
  feed98579d Bluetooth: qca: simplify WCN399x NVM loading [1 file, +6/-7]
  035e1bffc0 Bluetooth: Add quirk for broken READ_VOICE_SETTING [3 files, +15/-0]
  09246dfb5c Bluetooth: Add quirk for broken READ_PAGE_SCAN_TYPE [2 files, +10/-1]
  044c1b3528 drm: allow encoder mode_set even when connectors change for crtc [1 file, +1/-1]
  df33b535f0 drm: panel-orientation-quirks: Add support for AYANEO 2S [1 file, +2/-2]
  6fe4ed94ee drm: panel-orientation-quirks: Add quirks for AYA NEO Flip DS and KB [1 file, +18/-0]
  5dd6fdb889 drm: panel-orientation-quirks: Add quirk for AYA NEO Slide [1 file, +6/-0]
  a64e097426 drm: panel-orientation-quirks: Add new quirk for GPD Win 2 [1 file, +6/-0]
  ba5a998f84 drm: panel-orientation-quirks: Add quirk for OneXPlayer Mini (Intel) [1 file, +12/-0]
  f04612890c drm/debugfs: fix printk format for bridge index [1 file, +1/-1]
  b22cb42a5e drm/bridge: panel: forbid initializing a panel with unknown connector type [1 file, +4/-1]
  1c38108a49 drivers: base: devres: Allow to release group on device release [1 file, +7/-0]
  8feefd106a PCI: Enable Configuration RRS SV early [1 file, +5/-3]
  73d2b96250 PCI: Check BAR index for validity [4 files, +57/-10]
  9a6be23eb0 tracing: probe-events: Add comments about entry data storing code [1 file, +28/-0]
  7b9bdd7059 erofs: set error to bio if file-backed IO fails [1 file, +2/-0]
  806908d5d9 bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags [1 file, +44/-36]
  dd3edffae8 ext4: don't treat fhandle lookup of ea_inode as FS corruption [1 file, +48/-20]
  2ff58c5b26 arm64: cputype: Add MIDR_CORTEX_A76AE [1 file, +2/-0]
  4af2858435 arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list [1 file, +1/-0]
  3b0f2526c8 arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB [2 files, +102/-102]
  20c105f587 arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list [1 file, +3/-0]
  c322789613 KVM: arm64: Tear down vGIC on failed vCPU creation [1 file, +5/-1]
  baea1762cd media: v4l2-dv-timings: prevent possible overflow in v4l2_detect_gtf() [1 file, +2/-2]
  0828d6e9ad io_uring/net: fix accept multishot handling [1 file, +2/-0]
  b7c6d081c1 io_uring/net: fix io_req_post_cqe abuse by send bundle [3 files, +6/-2]
  3e0356857e io_uring/kbuf: reject zero sized provided buffers [1 file, +2/-0]
  16d9067f00 ext4: fix off-by-one error in do_split [1 file, +1/-1]
  a1dde7457d f2fs: fix to avoid atomicity corruption of atomic file [2 files, +5/-3]
  e6bba32857 i3c: Add NULL pointer check in i3c_master_queue_ibi() [1 file, +3/-0]
  9eaec071f1 jbd2: remove wrong sb->s_sequence check [1 file, +0/-1]
  eec737e17e arm64: mops: Do not dereference src reg for a set operation [1 file, +2/-2]
  1dd288783d arm64: mm: Correct the update of max_pfn [1 file, +2/-1]
  5f7f6abd92 net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod. [2 files, +43/-2]
  53dc6b00c0 mm/rmap: reject hugetlb folios in folio_make_device_exclusive() [1 file, +1/-1]
  83b6b5061e mm: make page_mapped_in_vma() hugetlb walk aware [1 file, +9/-4]
  6dd8d9440f mm: fix lazy mmu docs and usage [1 file, +8/-6]
  2532df0a9b mm/mremap: correctly handle partial mremap() of VMA starting at 0 [1 file, +5/-5]
  cc98577f91 mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock [1 file, +1/-1]
  14936034de mm/userfaultfd: fix release hang over concurrent GUP [1 file, +25/-26]
  65b259e3e0 mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper [3 files, +8/-4]
  9e7c37fadb sctp: detect and prevent references to a freed transport in sendmsg [3 files, +18/-9]
  474b3194c8 tracing: Do not add length to print format in synthetic events [1 file, +0/-1]
  74f01c2ca8 dm-verity: fix prefetch-vs-suspend race [1 file, +8/-0]
  fae0a8796c KVM: Allow building irqbypass.ko as as module when kvm.ko is a module [3 files, +7/-7]
  dc83eccc93 of/irq: Fix device node refcount leakage in API of_irq_parse_one() [1 file, +27/-32]
  3540164c75 of/irq: Fix device node refcount leakage in API of_irq_parse_raw() [1 file, +8/-0]
  29cb94963c of/irq: Fix device node refcount leakages in of_irq_count() [1 file, +3/-1]
  d0f25a9977 of/irq: Fix device node refcount leakage in API irq_of_parse_and_map() [1 file, +5/-1]
  712d84459a of/irq: Fix device node refcount leakages in of_irq_init() [1 file, +3/-0]
  d69ad6e1a5 PCI: Fix reference leak in pci_alloc_child_bus() [1 file, +4/-1]
  9707d0c932 PCI: Fix reference leak in pci_register_host_bridge() [1 file, +7/-2]
  869202291a PCI: Fix wrong length of devres array [1 file, +1/-1]
  92ca7270fe ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio() [1 file, +3/-2]
  9ca4fe3574 arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists [1 file, +14/-1]
  281782d2c6 Bluetooth: hci_uart: Fix another race during initialization [2 files, +15/-6]

Changes in 6.12.24
	ASoC: Intel: adl: add 2xrt1316 audio configuration
	cgroup/cpuset: Fix incorrect isolated_cpus update in update_parent_effective_cpumask()
	cgroup/cpuset: Fix error handling in remote_partition_disable()
	cgroup/cpuset: Revert "Allow suppression of sched domain rebuild in update_cpumasks_hier()"
	cgroup/cpuset: Enforce at most one rebuild_sched_domains_locked() call per operation
	cgroup/cpuset: Further optimize code if CONFIG_CPUSETS_V1 not set
	cgroup/cpuset: Fix race between newly created partition and dying one
	gpiolib: of: Fix the choice for Ingenic NAND quirk
	selftests/futex: futex_waitv wouldblock test should fail
	ublk: refactor recovery configuration flag helpers
	ublk: fix handling recovery & reissue in ublk_abort_queue()
	drm/i915: Disable RPG during live selftest
	x86/acpi: Don't limit CPUs to 1 for Xen PV guests due to disabled ACPI
	drm/xe/hw_engine: define sysfs_ops on all directories
	ata: pata_pxa: Fix potential NULL pointer dereference in pxa_ata_probe()
	objtool: Fix INSN_CONTEXT_SWITCH handling in validate_unret()
	tipc: fix memory leak in tipc_link_xmit
	codel: remove sch->q.qlen check before qdisc_tree_reduce_backlog()
	net: tls: explicitly disallow disconnect
	octeontx2-pf: qos: fix VF root node parent queue index
	tc: Ensure we have enough buffer space when sending filter netlink notifications
	net: ethtool: Don't call .cleanup_data when prepare_data fails
	drm/tests: modeset: Fix drm_display_mode memory leak
	drm/tests: helpers: Create kunit helper to destroy a drm_display_mode
	drm/tests: cmdline: Fix drm_display_mode memory leak
	drm/tests: modes: Fix drm_display_mode memory leak
	drm/tests: probe-helper: Fix drm_display_mode memory leak
	net: libwx: handle page_pool_dev_alloc_pages error
	ata: sata_sx4: Add error handling in pdc20621_i2c_read()
	drm/i915/huc: Fix fence not released on early probe errors
	nvmet-fcloop: swap list_add_tail arguments
	net_sched: sch_sfq: use a temporary work area for validating configuration
	net_sched: sch_sfq: move the limit validation
	smb: client: fix UAF in decryption with multichannel
	net: phy: move phy_link_change() prior to mdio_bus_phy_may_suspend()
	net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY
	ipv6: Align behavior across nexthops during path selection
	net: ppp: Add bound checking for skb data on ppp_sync_txmung
	nft_set_pipapo: fix incorrect avx2 match of 5th field octet
	iommu/exynos: Fix suspend/resume with IDENTITY domain
	iommu/mediatek: Fix NULL pointer deference in mtk_iommu_device_group
	perf/core: Add aux_pause, aux_resume, aux_start_paused
	perf/core: Simplify the perf_event_alloc() error path
	perf: Fix hang while freeing sigtrap event
	fs: consistently deref the files table with rcu_dereference_raw()
	umount: Allow superblock owners to force umount
	pm: cpupower: bench: Prevent NULL dereference on malloc failure
	x86/mm: Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
	x86/percpu: Disable named address spaces for UBSAN_BOOL with KASAN for GCC < 14.2
	x86/ia32: Leave NULL selector values 0~3 unchanged
	x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine
	perf: arm_pmu: Don't disable counter in armpmu_add()
	perf/dwc_pcie: fix some unreleased resources
	PM: hibernate: Avoid deadlock in hibernate_compressor_param_set()
	Flush console log from kernel_power_off()
	arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD
	xen/mcelog: Add __nonstring annotations for unterminated strings
	zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
	platform/chrome: cros_ec_lpc: Match on Framework ACPI device
	ASoC: SOF: topology: Use krealloc_array() to replace krealloc()
	HID: pidff: Convert infinite length from Linux API to PID standard
	HID: pidff: Do not send effect envelope if it's empty
	HID: pidff: Add MISSING_DELAY quirk and its detection
	HID: pidff: Add MISSING_PBO quirk and its detection
	HID: pidff: Add PERMISSIVE_CONTROL quirk
	HID: pidff: Add hid_pidff_init_with_quirks and export as GPL symbol
	HID: pidff: Add FIX_WHEEL_DIRECTION quirk
	HID: Add hid-universal-pidff driver and supported device ids
	HID: pidff: Add PERIODIC_SINE_ONLY quirk
	HID: pidff: Fix null pointer dereference in pidff_find_fields
	ASoC: amd: ps: use macro for ACP6.3 pci revision id
	ALSA: hda: intel: Fix Optimus when GPU has no sound
	ALSA: hda: intel: Add Lenovo IdeaPad Z570 to probe denylist
	ASoC: fsl_audmix: register card device depends on 'dais' property
	media: uvcvideo: Add quirk for Actions UVC05
	media: s5p-mfc: Corrected NV12M/NV21M plane-sizes
	mmc: dw_mmc: add a quirk for accessing 64-bit FIFOs in two halves
	ALSA: usb-audio: Fix CME quirk for UF series keyboards
	ASoC: amd: Add DMI quirk for ACP6X mic support
	ASoC: amd: yc: update quirk data for new Lenovo model
	platform/x86: x86-android-tablets: Add select POWER_SUPPLY to Kconfig
	wifi: ath11k: Fix DMA buffer allocation to resolve SWIOTLB issues
	wifi: ath11k: fix memory leak in ath11k_xxx_remove()
	wifi: ath12k: fix memory leak in ath12k_pci_remove()
	wifi: ath12k: Fix invalid entry fetch in ath12k_dp_mon_srng_process
	ata: libata-core: Add 'external' to the libata.force kernel parameter
	scsi: mpi3mr: Avoid reply queue full condition
	scsi: mpi3mr: Synchronous access b/w reset and tm thread for reply queue
	net: page_pool: don't cast mp param to devmem
	f2fs: don't retry IO for corrupted data scenario
	wifi: mac80211: add strict mode disabling workarounds
	wifi: mac80211: ensure sdata->work is canceled before initialized.
	scsi: target: spc: Fix RSOC parameter data header size
	net: usb: asix_devices: add FiberGecko DeviceID
	page_pool: avoid infinite loop to schedule delayed worker
	can: flexcan: Add quirk to handle separate interrupt lines for mailboxes
	can: flexcan: add NXP S32G2/S32G3 SoC support
	jfs: Fix uninit-value access of imap allocated in the diMount() function
	fs/jfs: cast inactags to s64 to prevent potential overflow
	fs/jfs: Prevent integer overflow in AG size calculation
	jfs: Prevent copying of nlink with value 0 from disk inode
	jfs: add sanity check for agwidth in dbMount
	ata: libata-eh: Do not use ATAPI DMA for a device limited to PIO mode
	net: sfp: add quirk for 2.5G OEM BX SFP
	wifi: ath12k: Fix invalid data access in ath12k_dp_rx_h_undecap_nwifi
	f2fs: fix to avoid out-of-bounds access in f2fs_truncate_inode_blocks()
	net: sfp: add quirk for FS SFP-10GM-T copper SFP+ module
	ahci: add PCI ID for Marvell 88SE9215 SATA Controller
	ext4: protect ext4_release_dquot against freezing
	Revert "f2fs: rebuild nat_bits during umount"
	ext4: ignore xattrs past end
	cdc_ether|r8152: ThinkPad Hybrid USB-C/A Dock quirk
	scsi: st: Fix array overflow in st_setup()
	ahci: Marvell 88SE9215 controllers prefer DMA for ATAPI
	btrfs: harden block_group::bg_list against list_del() races
	wifi: mt76: mt76x2u: add TP-Link TL-WDN6200 ID to device table
	net: vlan: don't propagate flags on open
	tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER
	Bluetooth: btintel_pcie: Add device id of Whale Peak
	Bluetooth: hci_uart: fix race during initialization
	Bluetooth: btusb: Add 2 HWIDs for MT7922
	Bluetooth: hci_qca: use the power sequencer for wcn6750
	Bluetooth: qca: simplify WCN399x NVM loading
	Bluetooth: Add quirk for broken READ_VOICE_SETTING
	Bluetooth: Add quirk for broken READ_PAGE_SCAN_TYPE
	drm: allow encoder mode_set even when connectors change for crtc
	drm/xe/bmg: Add new PCI IDs
	drm/xe/vf: Don't try to trigger a full GT reset if VF
	drm/amd/display: Update Cursor request mode to the beginning prefetch always
	drm/amdgpu: Unlocked unmap only clear page table leaves
	drm: panel-orientation-quirks: Add support for AYANEO 2S
	drm: panel-orientation-quirks: Add quirks for AYA NEO Flip DS and KB
	drm: panel-orientation-quirks: Add quirk for AYA NEO Slide
	drm: panel-orientation-quirks: Add new quirk for GPD Win 2
	drm: panel-orientation-quirks: Add quirk for OneXPlayer Mini (Intel)
	drm/debugfs: fix printk format for bridge index
	drm/bridge: panel: forbid initializing a panel with unknown connector type
	drm/amd/display: stop DML2 from removing pipes based on planes
	drivers: base: devres: Allow to release group on device release
	drm/amdkfd: clamp queue size to minimum
	drm/amdkfd: Fix mode1 reset crash issue
	drm/amdkfd: Fix pqm_destroy_queue race with GPU reset
	drm/amdkfd: debugfs hang_hws skip GPU with MES
	drm/xe/xelp: Move Wa_16011163337 from tunings to workarounds
	drm/mediatek: mtk_dpi: Move the input_2p_en bit to platform data
	drm/mediatek: mtk_dpi: Explicitly manage TVD clock in power on/off
	PCI: Add Rockchip Vendor ID
	drm/amdgpu: handle amdgpu_cgs_create_device() errors in amd_powerplay_create()
	PCI: Enable Configuration RRS SV early
	drm/amdgpu: Fix the race condition for draining retry fault
	PCI: Check BAR index for validity
	PCI: vmd: Make vmd_dev::cfg_lock a raw_spinlock_t type
	drm/amdgpu: grab an additional reference on the gang fence v2
	fbdev: omapfb: Add 'plane' value check
	tracing: probe-events: Add comments about entry data storing code
	ktest: Fix Test Failures Due to Missing LOG_FILE Directories
	tpm, tpm_tis: Workaround failed command reception on Infineon devices
	tpm: End any active auth session before shutdown
	pwm: mediatek: Prevent divide-by-zero in pwm_mediatek_config()
	pwm: rcar: Improve register calculation
	pwm: fsl-ftm: Handle clk_get_rate() returning 0
	erofs: set error to bio if file-backed IO fails
	bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
	ext4: don't treat fhandle lookup of ea_inode as FS corruption
	s390/pci: Fix s390_mmio_read/write syscall page fault handling
	HID: pidff: Clamp PERIODIC effect period to device's logical range
	HID: pidff: Stop all effects before enabling actuators
	HID: pidff: Completely rework and fix pidff_reset function
	HID: pidff: Simplify pidff_upload_effect function
	HID: pidff: Define values used in pidff_find_special_fields
	HID: pidff: Rescale time values to match field units
	HID: pidff: Factor out code for setting gain
	HID: pidff: Move all hid-pidff definitions to a dedicated header
	HID: pidff: Simplify pidff_rescale_signed
	HID: pidff: Use macros instead of hardcoded min/max values for shorts
	HID: pidff: Factor out pool report fetch and remove excess declaration
	HID: pidff: Make sure to fetch pool before checking SIMULTANEOUS_MAX
	HID: hid-universal-pidff: Add Asetek wheelbases support
	HID: pidff: Comment and code style update
	HID: pidff: Support device error response from PID_BLOCK_LOAD
	HID: pidff: Remove redundant call to pidff_find_special_keys
	HID: pidff: Rename two functions to align them with naming convention
	HID: pidff: Clamp effect playback LOOP_COUNT value
	HID: pidff: Compute INFINITE value instead of using hardcoded 0xffff
	HID: pidff: Fix 90 degrees direction name North -> East
	HID: pidff: Fix set_device_control()
	auxdisplay: hd44780: Fix an API misuse in hd44780.c
	dt-bindings: media: st,stmipid02: correct lane-polarities maxItems
	media: mediatek: vcodec: Fix a resource leak related to the scp device in FW initialization
	media: mtk-vcodec: venc: avoid -Wenum-compare-conditional warning
	media: uapi: rkisp1-config: Fix typo in extensible params example
	media: mgb4: Fix CMT registers update logic
	media: i2c: adv748x: Fix test pattern selection mask
	media: mgb4: Fix switched CMT frequency range "magic values" sets
	media: intel/ipu6: set the dev_parent of video device to pdev
	media: venus: hfi: add a check to handle OOB in sfr region
	media: venus: hfi: add check to handle incorrect queue size
	media: vim2m: print device name after registering device
	media: siano: Fix error handling in smsdvb_module_init()
	media: rockchip: rga: fix rga offset lookup
	xenfs/xensyms: respect hypervisor's "next" indication
	arm64: cputype: Add MIDR_CORTEX_A76AE
	arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
	arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB
	arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list
	KVM: arm64: Tear down vGIC on failed vCPU creation
	spi: cadence-qspi: Fix probe on AM62A LP SK
	mtd: rawnand: brcmnand: fix PM resume warning
	tpm, tpm_tis: Fix timeout handling when waiting for TPM status
	accel/ivpu: Fix PM related deadlocks in MS IOCTLs
	media: streamzap: prevent processing IR data on URB failure
	media: hi556: Fix memory leak (on error) in hi556_check_hwcfg()
	media: visl: Fix ERANGE error when setting enum controls
	media: platform: stm32: Add check for clk_enable()
	media: imx219: Adjust PLL settings based on the number of MIPI lanes
	media: v4l2-dv-timings: prevent possible overflow in v4l2_detect_gtf()
	Revert "media: imx214: Fix the error handling in imx214_probe()"
	media: i2c: ccs: Set the device's runtime PM status correctly in remove
	media: i2c: ccs: Set the device's runtime PM status correctly in probe
	media: i2c: ov7251: Set enable GPIO low in probe
	media: i2c: ov7251: Introduce 1 ms delay between regulators and en GPIO
	media: nuvoton: Fix reference handling of ece_node
	media: nuvoton: Fix reference handling of ece_pdev
	media: venus: hfi_parser: add check to avoid out of bound access
	media: venus: hfi_parser: refactor hfi packet parsing logic
	media: i2c: imx319: Rectify runtime PM handling probe and remove
	media: i2c: imx219: Rectify runtime PM handling in probe and remove
	media: i2c: imx214: Rectify probe error handling related to runtime PM
	media: chips-media: wave5: Fix gray color on screen
	media: chips-media: wave5: Avoid race condition in the interrupt handler
	media: chips-media: wave5: Fix a hang after seeking
	media: chips-media: wave5: Fix timeout while testing 10bit hevc fluster
	mptcp: sockopt: fix getting IPV6_V6ONLY
	mptcp: sockopt: fix getting freebind & transparent
	mtd: Add check for devm_kcalloc()
	net: dsa: mv88e6xxx: workaround RGMII transmit delay erratum for 6320 family
	net: dsa: mv88e6xxx: fix internal PHYs for 6320 family
	mtd: Replace kcalloc() with devm_kcalloc()
	clocksource/drivers/stm32-lptimer: Use wakeup capable instead of init wakeup
	wifi: mt76: Add check for devm_kstrdup()
	wifi: mac80211: fix integer overflow in hwmp_route_info_get()
	wifi: mt76: mt7925: ensure wow pattern command align fw format
	wifi: mt76: mt7925: fix country count limitation for CLC
	wifi: mt76: mt7925: fix the wrong link_idx when a p2p_device is present
	wifi: mt76: mt7925: fix the wrong simultaneous cap for MLO
	io_uring/net: fix accept multishot handling
	io_uring/net: fix io_req_post_cqe abuse by send bundle
	io_uring/kbuf: reject zero sized provided buffers
	ASoC: codecs: wcd937x: fix a potential memory leak in wcd937x_soc_codec_probe()
	ASoC: q6apm: add q6apm_get_hw_pointer helper
	ASoC: q6apm-dai: schedule all available frames to avoid dsp under-runs
	ASoC: q6apm-dai: make use of q6apm_get_hw_pointer
	ASoC: qdsp6: q6apm-dai: set 10 ms period and buffer alignment.
	ASoC: qdsp6: q6apm-dai: fix capture pipeline overruns.
	ASoC: qdsp6: q6asm-dai: fix q6asm_dai_compr_set_params error path
	ALSA: hda/realtek: Enable Mute LED on HP OMEN 16 Laptop xd000xx
	accel/ivpu: Fix warning in ivpu_ipc_send_receive_internal()
	accel/ivpu: Fix deadlock in ivpu_ms_cleanup()
	bus: mhi: host: Fix race between unprepare and queue_buf
	ext4: fix off-by-one error in do_split
	f2fs: fix to avoid atomicity corruption of atomic file
	vdpa/mlx5: Fix oversized null mkey longer than 32bit
	udf: Fix inode_getblk() return value
	tpm: do not start chip while suspended
	svcrdma: do not unregister device for listeners
	soc: samsung: exynos-chipid: Add NULL pointer check in exynos_chipid_probe()
	smb311 client: fix missing tcon check when mounting with linux/posix extensions
	ima: limit the number of open-writers integrity violations
	ima: limit the number of ToMToU integrity violations
	i3c: master: svc: Use readsb helper for reading MDB
	i3c: Add NULL pointer check in i3c_master_queue_ibi()
	jbd2: remove wrong sb->s_sequence check
	kbuild: exclude .rodata.(cst|str)* when building ranges
	leds: rgb: leds-qcom-lpg: Fix pwm resolution max for Hi-Res PWMs
	leds: rgb: leds-qcom-lpg: Fix calculation of best period Hi-Res PWMs
	mfd: ene-kb3930: Fix a potential NULL pointer dereference
	mailbox: tegra-hsp: Define dimensioning masks in SoC data
	locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class()
	lib: scatterlist: fix sg_split_phys to preserve original scatterlist offsets
	mptcp: fix NULL pointer in can_accept_new_subflow
	mptcp: only inc MPJoinAckHMacFailure for HMAC failures
	mtd: inftlcore: Add error check for inftl_read_oob()
	mtd: rawnand: Add status chack in r852_ready()
	arm64: mops: Do not dereference src reg for a set operation
	arm64: tegra: Remove the Orin NX/Nano suspend key
	arm64: mm: Correct the update of max_pfn
	arm64: dts: mediatek: mt8173: Fix disp-pwm compatible string
	arm64: dts: exynos: gs101: disable pinctrl_gsacore node
	backlight: led_bl: Hold led_access lock when calling led_sysfs_disable()
	btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers
	btrfs: tests: fix chunk map leak after failure to add it to the tree
	btrfs: zoned: fix zone activation with missing devices
	btrfs: zoned: fix zone finishing with missing devices
	iommufd: Fix uninitialized rc in iommufd_access_rw()
	iommu/tegra241-cmdqv: Fix warnings due to dmam_free_coherent()
	iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled
	iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes
	iommu/vt-d: Fix possible circular locking dependency
	iommu/vt-d: Wire up irq_ack() to irq_move_irq() for posted MSIs
	sparc/mm: disable preemption in lazy mmu mode
	sparc/mm: avoid calling arch_enter/leave_lazy_mmu() in set_ptes
	net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod.
	mm/damon/ops: have damon_get_folio return folio even for tail pages
	mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
	mm: make page_mapped_in_vma() hugetlb walk aware
	mm: fix lazy mmu docs and usage
	mm/mremap: correctly handle partial mremap() of VMA starting at 0
	mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock
	mm/userfaultfd: fix release hang over concurrent GUP
	mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
	mm/hugetlb: move hugetlb_sysctl_init() to the __init section
	mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
	sctp: detect and prevent references to a freed transport in sendmsg
	x86/xen: fix balloon target initialization for PVH dom0
	tracing: fprobe events: Fix possible UAF on modules
	tracing: Do not add length to print format in synthetic events
	thermal/drivers/rockchip: Add missing rk3328 mapping entry
	CIFS: Propagate min offload along with other parameters from primary to secondary channels.
	cifs: avoid NULL pointer dereference in dbg call
	cifs: fix integer overflow in match_server()
	cifs: Ensure that all non-client-specific reparse points are processed by the server
	clk: renesas: r9a07g043: Fix HP clock source for RZ/Five
	clk: qcom: clk-branch: Fix invert halt status bit check for votable clocks
	clk: qcom: gdsc: Release pm subdomains in reverse add order
	clk: qcom: gdsc: Capture pm_genpd_add_subdomain result code
	clk: qcom: gdsc: Set retain_ff before moving to HW CTRL
	crypto: ccp - Fix check for the primary ASP device
	crypto: ccp - Fix uAPI definitions of PSP errors
	dlm: fix error if inactive rsb is not hashed
	dlm: fix error if active rsb is not hashed
	dm-ebs: fix prefetch-vs-suspend race
	dm-integrity: set ti->error on memory allocation failure
	dm-integrity: fix non-constant-time tag verification
	dm-verity: fix prefetch-vs-suspend race
	dt-bindings: coresight: qcom,coresight-tpda: Fix too many 'reg'
	dt-bindings: coresight: qcom,coresight-tpdm: Fix too many 'reg'
	ftrace: Add cond_resched() to ftrace_graph_set_hash()
	ftrace: Properly merge notrace hashes
	gpio: tegra186: fix resource handling in ACPI probe path
	gpio: zynq: Fix wakeup source leaks on device unbind
	gve: handle overflow when reporting TX consumed descriptors
	KVM: Allow building irqbypass.ko as as module when kvm.ko is a module
	KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests
	KVM: x86: Explicitly zero-initialize on-stack CPUID unions
	KVM: x86: Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses
	landlock: Move code to ease future backports
	landlock: Add the errata interface
	landlock: Add erratum for TCP fix
	landlock: Always allow signals between threads of the same process
	landlock: Prepare to add second errata
	selftests/landlock: Split signal_scoping_threads tests
	selftests/landlock: Add a new test for setuid()
	misc: pci_endpoint_test: Fix displaying 'irq_type' after 'request_irq' error
	net: mana: Switch to page pool for jumbo frames
	ntb: use 64-bit arithmetic for the MSI doorbell mask
	of/irq: Fix device node refcount leakage in API of_irq_parse_one()
	of/irq: Fix device node refcount leakage in API of_irq_parse_raw()
	of/irq: Fix device node refcount leakages in of_irq_count()
	of/irq: Fix device node refcount leakage in API irq_of_parse_and_map()
	of/irq: Fix device node refcount leakages in of_irq_init()
	PCI: brcmstb: Fix missing of_node_put() in brcm_pcie_probe()
	PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
	PCI: pciehp: Avoid unnecessary device replacement check
	PCI: Fix reference leak in pci_alloc_child_bus()
	PCI: Fix reference leak in pci_register_host_bridge()
	PCI: Fix wrong length of devres array
	phy: freescale: imx8m-pcie: assert phy reset and perst in power off
	pinctrl: qcom: Clear latched interrupt status when changing IRQ type
	pinctrl: samsung: add support for eint_fltcon_offset
	ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
	s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs
	s390/virtio_ccw: Don't allocate/assign airqs for non-existing queues
	s390: Fix linker error when -no-pie option is unavailable
	sched_ext: create_dsq: Return -EEXIST on duplicate request
	selftests: mptcp: close fd_in before returning in main_loop
	selftests: mptcp: fix incorrect fd checks in main_loop
	thermal/drivers/mediatek/lvts: Disable monitor mode during suspend
	thermal/drivers/mediatek/lvts: Disable Stage 3 thermal threshold
	arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists
	iommufd: Make attach_handle generic than fault specific
	iommufd: Fail replace if device has not been attached
	x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
	ACPI: platform-profile: Fix CFI violation when accessing sysfs files
	NFSD: fix decoding in nfs4_xdr_dec_cb_getattr
	NFSD: Fix CB_GETATTR status fix
	nfsd: don't ignore the return code of svc_proc_register()
	x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions()
	libbpf: Prevent compiler warnings/errors
	kbuild: Add '-fno-builtin-wcslen'
	media: mediatek: vcodec: mark vdec_vp9_slice_map_counts_eob_coef noinline
	Bluetooth: hci_uart: Fix another race during initialization
	s390/cpumf: Fix double free on error in cpumf_pmu_event_init()
	HSI: ssi_protocol: Fix use after free vulnerability in ssi_protocol Driver Due to Race Condition
	Linux 6.12.24

Change-Id: I272e8aac67399f2eb57ca25e05cded24172d2d76
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2025-05-10 12:51:04 +00:00
Sean Christopherson fae0a8796c KVM: Allow building irqbypass.ko as as module when kvm.ko is a module
commit 459a35111b0a890172a78d51c01b204e13a34a18 upstream.

Convert HAVE_KVM_IRQ_BYPASS into a tristate so that selecting
IRQ_BYPASS_MANAGER follows KVM={m,y}, i.e. doesn't force irqbypass.ko to
be built-in.

Note, PPC allows building KVM as a module, but selects HAVE_KVM_IRQ_BYPASS
from a boolean Kconfig, i.e. KVM PPC unnecessarily forces irqbypass.ko to
be built-in.  But that flaw is a longstanding PPC specific issue.

Fixes: 61df71ee99 ("kvm: move "select IRQ_BYPASS_MANAGER" to common code")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250315024623.2363994-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20 10:15:54 +02:00
Mostafa Saleh b0fccd926d ANDROID: kvm/vfio: pviommu: Add set config IOCTL
Add KVM_PVIOMMU_SET_CONFIG that allows user space to map an index in
the device SID to a virtual SID.

Bug: 357781595
Bug: 348382247
Bug: 236685427
Change-Id: I276c504573ed2564becd572e57a0b2710a1cf147
Signed-off-by: Mostafa Saleh <smostafa@google.com>
2025-04-16 13:39:54 +00:00
Mostafa Saleh e581022df4 ANDROID: kvm/vfio: Add pviommu group and attach/get attrs
pvIOMMU would configure a paravirtualised IOMMU exposed to guests
that is controlled by the hypervisor.

This pvIOMMU would be used with VFIO assigned devices. As this
requires both KVM and VFIO interaction, it was added in KVM-VFIO
device.

This patch adds pviommu_attach operation which attaches a pvIOMMU
to a VM.

pviommu_attach would use the KVM VM for this device and return a
fd for the guest pvIOMMU.

It also adds the attribute KVM_DEV_VFIO_PVIOMMU_GET_INFO to probe info about
the IOMMU for a VFIO device fd; for now, only the number of SIDs is returned.
The VMM can then use this to configure the virtual SID (vSID) for
the IOMMU IDs.

Bug: 357781595
Bug: 348382247
Bug: 236685427
Change-Id: I030f04e84fa8fd1c54c2d035f5c09f021f648d22
Signed-off-by: Mostafa Saleh <smostafa@google.com>
2025-04-16 13:39:54 +00:00
Mostafa Saleh 89f90d9773 ANDROID: KVM: Add arch function for device assignment
Add arch specific functions to notify KVM about device/group assignment
to KVM.
pKVM would use this to assign devices at once, as pVM device passthrough
requires that all devices be donated before the VM uses them.
That allows the hypervisor to properly reset the devices and prevents
situations where a device is not fully owned by the pVM.

Bug: 357781595
Bug: 348382247
Change-Id: Icedc6b72eff713702361d424c702eb6c395267dd
Signed-off-by: Mostafa Saleh <smostafa@google.com>
2025-02-06 05:16:47 -08:00
Greg Kroah-Hartman 144a77d178 Merge 6.12.2 into android16-6.12
GKI (arm64) relevant 159 out of 815 changes, affecting 226 files +2434/-822
  a11e7e3d2a arm64: probes: Disable kprobes/uprobes on MOPS instructions [2 files, +6/-2]
  78de864e8a block/fs: Pass an iocb to generic_atomic_write_valid() [3 files, +7/-7]
  cfe3e04e9a fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() [3 files, +19/-14]
  63dfd728b3 brd: defer automatic disk creation until module initialization succeeds [1 file, +44/-22]
  26cc5063e3 ext4: avoid remount errors with 'abort' mount option [1 file, +8/-3]
  fb83b093f7 initramfs: avoid filename buffer overrun [1 file, +15/-0]
  de56fa09a0 arm64: Expose ID_AA64ISAR1_EL1.XS to sanitised feature consumers [1 file, +1/-0]
  582d9ed999 nvme-pci: fix freeing of the HMB descriptor table [1 file, +9/-7]
  8ae5e37357 loop: fix type of block size [1 file, +3/-3]
  7044259018 block: take chunk_sectors into account in bio_split_write_zeroes [1 file, +23/-12]
  731d5bdc74 block: fix bio_split_rw_at to take zone_write_granularity into account [1 file, +9/-1]
  61832ee7fa ext4: fix race in buffer_head read fault injection [10 files, +29/-29]
  9abae59243 nvme-pci: reverse request order in nvme_queue_rqs [1 file, +17/-22]
  475404eac5 virtio_blk: reverse request order in virtio_queue_rqs [1 file, +21/-25]
  d01d9005fb thermal: core: Initialize thermal zones before registering them [1 file, +1/-1]
  933ef9360a thermal: core: Rearrange PM notification code [1 file, +46/-42]
  f52dc3c757 thermal: core: Represent suspend-related thermal zone flags as bits [2 files, +12/-10]
  39a5ad6b63 thermal: core: Mark thermal zones as initializing to start with [2 files, +14/-3]
  79cb9952e1 thermal: core: Fix race between zone registration and system suspend [1 file, +16/-2]
  5ced426d97 rcu/kvfree: Fix data-race in __mod_timer / kvfree_call_rcu [1 file, +12/-2]
  94f4e7b0eb cleanup: Remove address space of returned pointer [1 file, +2/-2]
  5c3a9f6f7f time: Partially revert cleanup on msecs_to_jiffies() documentation [1 file, +1/-1]
  b62a8825d3 time: Fix references to _msecs_to_jiffies() handling of values [2 files, +2/-2]
  1dfa6c5200 timers: Add missing READ_ONCE() in __run_timer_base() [1 file, +2/-1]
  8dbd7603e4 kcsan, seqlock: Fix incorrect assumption in read_seqbegin() [1 file, +1/-11]
  0916201308 sched/ext: Remove sched_fork() hack [2 files, +1/-7]
  351bb7f9ec soc: qcom: geni-se: fix array underflow in geni_se_clk_tbl_get() [1 file, +2/-1]
  c365c1456e efi/libstub: fix efi_parse_options() ignoring the default command line [1 file, +1/-1]
  30e42ac0bd tpm: fix signed/unsigned bug when checking event logs [1 file, +9/-8]
  dd6ade970d Revert "cgroup: Fix memory leak caused by missing cgroup_bpf_offline" [1 file, +1/-3]
  f390525e49 cgroup/bpf: only cgroup v2 can be attached by bpf programs [1 file, +11/-6]
  a5b7bc2747 regmap: irq: Set lockdep class for hierarchical IRQ domains [1 file, +4/-0]
  9beaff47bc firmware: arm_scpi: Check the DVFS OPP count returned by the firmware [1 file, +3/-0]
  e369246067 pwm: Assume a disabled PWM to emit a constant inactive output [1 file, +7/-3]
  f9aaa841ac drm/mm: Mark drm_mm_interval_tree*() functions with __maybe_unused [1 file, +1/-1]
  4d8621151b bpf, arm64: Remove garbage frame for struct_ops trampoline [1 file, +31/-16]
  aa51be3fa9 bpf: Tighten tail call checks for lingering locks, RCU, preempt_disable [1 file, +15/-0]
  3634d4a310 bpf: Mark raw_tp arguments with PTR_MAYBE_NULL [4 files, +87/-9]
  e883c475c4 drm: use ATOMIC64_INIT() for atomic64_t [1 file, +1/-1]
  bc23584768 netlink: typographical error in nlmsg_type constants definition [1 file, +1/-1]
  ce06c450ac bpf, sockmap: Several fixes to bpf_msg_push_data [1 file, +33/-20]
  275a9f3ef8 bpf, sockmap: Several fixes to bpf_msg_pop_data [1 file, +9/-6]
  08baa3f0a1 bpf, sockmap: Fix sk_msg_reset_curr [1 file, +9/-11]
  0e4c6faaef ipv6: release nexthop on device removal [1 file, +3/-3]
  36ede57f0c bpf: Allow return values 0 and 1 for kprobe session [1 file, +9/-0]
  9ff1b95ccd bpf: Force uprobe bpf program to always return 0 [1 file, +2/-3]
  34a949e7a0 ipv6: Fix soft lockups in fib6_select_path under high next hop churn [4 files, +297/-19]
  9c44c06123 bpf: Use function pointers count as struct_ops links count [1 file, +25/-10]
  449b1a7178 bpf: Add kernel symbol for struct_ops trampoline [4 files, +89/-5]
  3397001cfa Bluetooth: btbcm: fix missing of_node_put() in btbcm_get_board_name() [1 file, +1/-3]
  a58d0f5dac Bluetooth: ISO: Use kref to track lifetime of iso_conn [1 file, +71/-17]
  67ead8f86a Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pending [4 files, +139/-40]
  91d19383b7 Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending [5 files, +125/-16]
  1360e5b6ce Bluetooth: ISO: Send BIG Create Sync via hci_sync [2 files, +25/-1]
  7b277bd569 Bluetooth: fix use-after-free in device_for_each_child() [1 file, +4/-11]
  d5d346deb6 xsk: Free skb when TX metadata options are invalid [1 file, +6/-5]
  5036f2f024 erofs: fix file-backed mounts over FUSE [2 files, +7/-4]
  679d8537e5 erofs: fix blksize < PAGE_SIZE for file-backed mounts [1 file, +5/-1]
  daaf68fef4 erofs: handle NONHEAD !delta[1] lclusters gracefully [1 file, +9/-8]
  6b8b9d9b06 netpoll: Use rcu_access_pointer() in netpoll_poll_lock [1 file, +1/-1]
  f84c5ef6ca bpf: fix recursive lock when verdict program return SK_PASS [1 file, +2/-2]
  89933f8ab3 unicode: Fix utf8_load() error path [1 file, +1/-1]
  b746a3afa2 trace/trace_event_perf: remove duplicate samples on the first tracepoint event [1 file, +6/-0]
  e691826a3d kasan: move checks to do_strncpy_from_user [1 file, +3/-2]
  843d366ff1 zram: fix NULL pointer in comp_algorithm_show() [1 file, +2/-3]
  fe6fae61f3 PCI: Fix reset_method_store() memory leak [1 file, +3/-2]
  a26ace3c6e f2fs: compress: fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks [1 file, +1/-1]
  9e11b1d5fd f2fs: fix null-ptr-deref in f2fs_submit_page_bio() [1 file, +6/-6]
  2bc07714dc virtiofs: use pages instead of pointer for kernel direct IO [3 files, +50/-19]
  c31c7b81c1 i3c: master: Remove i3c_dev_disable_ibi_locked(olddev) on device hotjoin [1 file, +9/-4]
  9b57a3f7e3 f2fs: fix the wrong f2fs_bug_on condition in f2fs_do_replace_block [1 file, +1/-1]
  abfd2c13ba f2fs: check curseg->inited before write_sum_page in change_curseg [1 file, +2/-1]
  5e69964251 f2fs: Fix not used variable 'index' [1 file, +2/-2]
  801092a2c9 f2fs: fix to avoid potential deadlock in f2fs_record_stop_reason() [3 files, +9/-9]
  a16e6f7966 PCI: qcom: Enable MSI interrupts together with Link up if 'Global IRQ' is supported [1 file, +3/-1]
  c631207897 f2fs: fix race in concurrent f2fs_stop_gc_thread [1 file, +6/-3]
  a10cc033b2 f2fs: fix to map blocks correctly for direct write [1 file, +2/-1]
  bbe48e47b9 f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode [1 file, +5/-1]
  45438da843 f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow [2 files, +4/-4]
  f7f7f22dbb power: supply: core: Remove might_sleep() from power_supply_put() [1 file, +0/-2]
  28af028a71 netlink: fix false positive warning in extack during dumps [1 file, +11/-10]
  ad6e5bdca3 exfat: fix file being changed by unaligned direct write [1 file, +10/-0]
  a487cc8986 net/l2tp: fix warning in l2tp_exit_net found by syzbot [1 file, +19/-3]
  cb74207ef9 net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged [1 file, +29/-12]
  87819234aa Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync [1 file, +9/-2]
  cac34e4428 Bluetooth: MGMT: Fix possible deadlocks [1 file, +18/-9]
  6d84502860 tcp: Fix use-after-free of nreq in reqsk_timer_handler(). [1 file, +1/-1]
  ead7c1263e ip6mr: fix tables suspicious RCU usage [1 file, +27/-11]
  14367e7170 usb: gadget: uvc: wake pump everytime we update the free list [1 file, +4/-0]
  f380f895db firmware_loader: Fix possible resource leak in fw_log_firmware_info() [1 file, +2/-3]
  98b8725630 f2fs: fix fiemap failure issue when page size is 16KB [1 file, +1/-21]
  d716851d0a net_sched: sch_fq: don't follow the fast path if Tx is behind now [1 file, +6/-0]
  b521b53ac6 ALSA: usb-audio: Fix potential out-of-bound accesses for Extigy and Mbox devices [1 file, +21/-6]
  096bb5b43e ALSA: usb-audio: Fix out of bounds reads when finding clock sources [1 file, +23/-1]
  ec56ada623 ext4: supress data-race warnings in ext4_free_inodes_{count,set}() [1 file, +4/-4]
  ad34d9c738 ext4: fix FS_IOC_GETFSMAP handling [3 files, +68/-5]
  f594a5683a KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR [1 file, +6/-1]
  8b6916f4cf KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status [1 file, +0/-1]
  d0571c3add KVM: arm64: Don't retire aborted MMIO instruction [1 file, +30/-2]
  eabd7ef140 KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE [1 file, +5/-1]
  fe425d5239 KVM: arm64: Get rid of userspace_irqchip_in_use [3 files, +4/-19]
  46018a04c5 KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* [2 files, +31/-12]
  1059f1e5f5 KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device [1 file, +4/-2]
  69d2ceac11 PCI: Fix use-after-free of slot->bus on hot remove [1 file, +3/-1]
  a6b283526b fsnotify: fix sending inotify event with unexpected filename [1 file, +13/-10]
  83af1cfa10 fsnotify: Fix ordering of iput() and watched_objects decrement [1 file, +9/-3]
  53bbfa6896 tty: ldsic: fix tty_ldisc_autoload sysctl's proc_handler [1 file, +1/-1]
  88232a223a locking/lockdep: Avoid creating new name string literals in lockdep_set_subclass() [1 file, +1/-1]
  a4fc6966d8 fcntl: make F_DUPFD_QUERY associative [1 file, +3/-0]
  c500b0cca2 exfat: fix uninit-value in __exfat_get_dentry_set [1 file, +1/-0]
  3ddd1cb2b4 exfat: fix out-of-bounds access of directory entries [1 file, +16/-4]
  9258c9ed32 xhci: Fix control transfer error on Etron xHCI host [1 file, +14/-0]
  218796e95a xhci: Combine two if statements for Etron xHCI host [1 file, +2/-6]
  a92cd42097 xhci: Don't perform Soft Retry for Etron xHCI host [1 file, +1/-0]
  827f963a0b xhci: Don't issue Reset Device command to Etron xHCI host [3 files, +21/-0]
  33209e6f29 Bluetooth: Fix type of len in rfcomm_sock_getsockopt{,_old}() [1 file, +6/-4]
  2637e8c5fb usb: xhci: Limit Stop Endpoint retries [3 files, +27/-4]
  61bce2d8ff usb: xhci: Fix TD invalidation under pending Set TR Dequeue [1 file, +13/-5]
  6debdd82c3 usb: xhci: Avoid queuing redundant Stop Endpoint commands [3 files, +29/-4]
  13111945c2 Revert "fs: don't block i_writecount during exec" [5 files, +49/-14]
  0586f12896 Revert "f2fs: remove unreachable lazytime mount option parsing" [1 file, +10/-0]
  ca82b37c47 Revert "usb: gadget: composite: fix OS descriptors w_value logic" [1 file, +15/-3]
  d222fd21dd io_uring: fix corner case forgetting to vunmap [1 file, +3/-1]
  aaa90844af io_uring: check for overflows in io_pin_pages [1 file, +6/-1]
  37c2ca4e89 blk-settings: round down io_opt to physical_block_size [1 file, +7/-0]
  97b68bda72 spi: Fix acpi deferred irq probe [1 file, +10/-3]
  e881c8f0a7 serial: amba-pl011: Fix RX stall when DMA is used [1 file, +5/-0]
  1fae444a61 serial: amba-pl011: fix build regression [1 file, +2/-0]
  7baf942326 Revert "block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()" [2 files, +5/-2]
  43375e9bb7 block: Prevent potential deadlock in blk_revalidate_disk_zones() [1 file, +10/-4]
  4ef8b6f7c4 ublk: fix ublk_ch_mmap() for 64K page size [1 file, +12/-3]
  e4d1f38bc0 arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled [1 file, +1/-1]
  8b25c0a165 block: fix missing dispatching request when queue is started or unquiesced [1 file, +2/-0]
  2094bd1b52 block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding [1 file, +34/-13]
  4170346747 block: fix ordering between checking BLK_MQ_S_STOPPED request adding [2 files, +19/-0]
  aeb420ebdf blk-mq: Make blk_mq_quiesce_tagset() hold the tag list mutex less long [1 file, +2/-1]
  28d4191e19 HID: wacom: Interpret tilt data from Intuos Pro BT as signed values [1 file, +2/-2]
  343e3e903c netdev-genl: Hold rcu_read_lock in napi_get [1 file, +2/-0]
  df225df839 ALSA: rawmidi: Fix kvfree() call in spinlock [1 file, +3/-1]
  0c4c9bf5ea ALSA: pcm: Add sanity NULL check for the default mmap fault handler [1 file, +4/-2]
  068aab9564 usb: dwc3: ep0: Don't clear ep0 DWC3_EP_TRANSFER_STARTED [1 file, +1/-1]
  2f6c3acece usb: dwc3: gadget: Fix checking for number of TRBs left [1 file, +6/-3]
  70777a23a5 usb: dwc3: gadget: Fix looping of queued SG entries [1 file, +3/-3]
  31e45c09a8 ublk: fix error code for unsupported command [1 file, +1/-1]
  9517bc76ff lib: string_helpers: silence snprintf() output truncation warning [1 file, +1/-1]
  0a5c8b3fbf f2fs: fix to do sanity check on node blkaddr in truncate_node() [1 file, +10/-0]
  e77bce0a8c rtc: check if __rtc_read_time was successful in rtc_timer_do_work() [1 file, +6/-1]
  4eaa19c620 nvme/multipath: Fix RCU list traversal to use SRCU primitive [1 file, +14/-7]
  9c3d53f113 blk-mq: add non_owner variant of start_freeze/unfreeze queue APIs [2 files, +22/-0]
  a6fc2ba1c7 block: model freeze & enter queue as lock for supporting lockdep [5 files, +81/-13]
  61092568f2 block: fix uaf for flush rq while iterating tags [2 files, +5/-10]
  5e15cc7a1d block: return unsigned int from bdev_io_min [1 file, +1/-1]
  5416b76a81 nvme-fabrics: fix kernel crash while shutting down controller [1 file, +5/-0]
  d5457349e5 block: Don't allow an atomic write be truncated in blkdev_write_iter() [1 file, +4/-1]
  e70c21daad modpost: remove incorrect code in do_eisa_entry() [1 file, +1/-4]
  01a853faae block, bfq: fix bfqq uaf in bfq_limit_depth() [1 file, +24/-13]
  fbc342372a brd: decrease the number of allocated pages which discarded [1 file, +3/-1]
  b12cfcae8a block: always verify unfreeze lock on the owner task [4 files, +61/-10]
  6cea47849d block: don't verify IO lock for freeze/unfreeze in elevator_init_mq() [1 file, +8/-2]

Changes in 6.12.2
	MAINTAINERS: appoint myself the XFS maintainer for 6.12 LTS
	drm/amd/display: Skip Invalid Streams from DSC Policy
	drm/amd/display: Fix incorrect DSC recompute trigger
	s390/facilities: Fix warning about shadow of global variable
	s390/virtio_ccw: Fix dma_parm pointer not set up
	efs: fix the efs new mount api implementation
	arm64: probes: Disable kprobes/uprobes on MOPS instructions
	kselftest/arm64: hwcap: fix f8dp2 cpuinfo name
	kselftest/arm64: mte: fix printf type warnings about __u64
	kselftest/arm64: mte: fix printf type warnings about longs
	block/fs: Pass an iocb to generic_atomic_write_valid()
	fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
	s390/cio: Do not unregister the subchannel based on DNV
	s390/pageattr: Implement missing kernel_page_present()
	x86/pvh: Call C code via the kernel virtual mapping
	brd: defer automatic disk creation until module initialization succeeds
	ext4: avoid remount errors with 'abort' mount option
	mips: asm: fix warning when disabling MIPS_FP_SUPPORT
	s390/cpum_sf: Fix and protect memory allocation of SDBs with mutex
	initramfs: avoid filename buffer overrun
	arm64: Expose ID_AA64ISAR1_EL1.XS to sanitised feature consumers
	kselftest/arm64: Fix encoding for SVE B16B16 test
	nvme-pci: fix freeing of the HMB descriptor table
	m68k: mvme147: Fix SCSI controller IRQ numbers
	m68k: mvme147: Reinstate early console
	arm64: fix .data.rel.ro size assertion when CONFIG_LTO_CLANG
	acpi/arm64: Adjust error handling procedure in gtdt_parse_timer_block()
	loop: fix type of block size
	cachefiles: Fix incorrect length return value in cachefiles_ondemand_fd_write_iter()
	cachefiles: Fix missing pos updates in cachefiles_ondemand_fd_write_iter()
	cachefiles: Fix NULL pointer dereference in object->file
	netfs/fscache: Add a memory barrier for FSCACHE_VOLUME_CREATING
	block: take chunk_sectors into account in bio_split_write_zeroes
	block: fix bio_split_rw_at to take zone_write_granularity into account
	s390/syscalls: Avoid creation of arch/arch/ directory
	hfsplus: don't query the device logical block size multiple times
	ext4: fix race in buffer_head read fault injection
	nvme-pci: reverse request order in nvme_queue_rqs
	virtio_blk: reverse request order in virtio_queue_rqs
	crypto: mxs-dcp - Fix AES-CBC with hardware-bound keys
	crypto: caam - Fix the pointer passed to caam_qi_shutdown()
	crypto: qat - remove check after debugfs_create_dir()
	crypto: qat/qat_420xx - fix off by one in uof_get_name()
	crypto: qat/qat_4xxx - fix off by one in uof_get_name()
	firmware: google: Unregister driver_info on failure
	EDAC/bluefield: Fix potential integer overflow
	crypto: qat - remove faulty arbiter config reset
	thermal: core: Initialize thermal zones before registering them
	thermal: core: Rearrange PM notification code
	thermal: core: Represent suspend-related thermal zone flags as bits
	thermal: core: Mark thermal zones as initializing to start with
	thermal: core: Fix race between zone registration and system suspend
	EDAC/fsl_ddr: Fix bad bit shift operations
	EDAC/skx_common: Differentiate memory error sources
	EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicator
	crypto: pcrypt - Call crypto layer directly when padata_do_parallel() return -EBUSY
	crypto: cavium - Fix the if condition to exit loop after timeout
	cpufreq/amd-pstate: Don't update CPPC request in amd_pstate_cpu_boost_update()
	amd-pstate: Set min_perf to nominal_perf for active mode performance gov
	crypto: hisilicon/qm - disable same error report before resetting
	EDAC/igen6: Avoid segmentation fault on module unload
	crypto: qat - Fix missing destroy_workqueue in adf_init_aer()
	crypto: inside-secure - Fix the return value of safexcel_xcbcmac_cra_init()
	sched/cpufreq: Ensure sd is rebuilt for EAS check
	doc: rcu: update printed dynticks counter bits
	rcu/srcutiny: don't return before reenabling preemption
	rcu/kvfree: Fix data-race in __mod_timer / kvfree_call_rcu
	rcu/nocb: Fix missed RCU barrier on deoffloading
	hwmon: (pmbus/core) clear faults after setting smbalert mask
	hwmon: (nct6775-core) Fix overflows seen when writing limit attributes
	ACPI: CPPC: Fix _CPC register setting issue
	thermal: testing: Use DEFINE_FREE() and __free() to simplify code
	thermal: testing: Initialize some variables annoteded with _free()
	crypto: caam - add error check to caam_rsa_set_priv_key_form
	crypto: bcm - add error check in the ahash_hmac_init function
	crypto: cavium - Fix an error handling path in cpt_ucode_load_fw()
	rcuscale: Do a proper cleanup if kfree_scale_init() fails
	tools/lib/thermal: Make more generic the command encoding function
	thermal/lib: Fix memory leak on error in thermal_genl_auto()
	x86/unwind/orc: Fix unwind for newly forked tasks
	Revert "scripts/faddr2line: Check only two symbols when calculating symbol size"
	cleanup: Remove address space of returned pointer
	time: Partially revert cleanup on msecs_to_jiffies() documentation
	time: Fix references to _msecs_to_jiffies() handling of values
	timers: Add missing READ_ONCE() in __run_timer_base()
	locking/atomic/x86: Use ALT_OUTPUT_SP() for __alternative_atomic64()
	locking/atomic/x86: Use ALT_OUTPUT_SP() for __arch_{,try_}cmpxchg64_emu()
	kcsan, seqlock: Support seqcount_latch_t
	kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
	sched/ext: Remove sched_fork() hack
	locking/rt: Add sparse annotation PREEMPT_RT's sleeping locks.
	rust: helpers: Avoid raw_spin_lock initialization for PREEMPT_RT
	clocksource/drivers:sp804: Make user selectable
	clocksource/drivers/timer-ti-dm: Fix child node refcount handling
	irqchip/riscv-aplic: Prevent crash when MSI domain is missing
	regulator: qcom-smd: make smd_vreg_rpm static
	spi: spi-fsl-lpspi: Use IRQF_NO_AUTOEN flag in request_irq()
	arm64: dts: qcom: qcs6390-rb3gen2: use modem.mbn for modem DSP
	ARM: dts: renesas: genmai: Fix partition size for QSPI NOR Flash
	drivers: soc: xilinx: add the missing kfree in xlnx_add_cb_for_suspend()
	microblaze: Export xmb_manager functions
	arm64: dts: mediatek: mt8188: Fix wrong clock provider in MFG1 power domain
	arm64: dts: mediatek: mt8395-genio-1200-evk: Fix dtbs_check error for phy
	arm64: dts: mt8195: Fix dtbs_check error for mutex node
	arm64: dts: mt8195: Fix dtbs_check error for infracfg_ao node
	arm64: dts: mediatek: mt8183-kukui: Disable DPI display interface
	arm64: dts: mt8183: Add port node to dpi node
	soc: ti: smartreflex: Use IRQF_NO_AUTOEN flag in request_irq()
	soc: qcom: geni-se: fix array underflow in geni_se_clk_tbl_get()
	arm64: dts: qcom: sm6350: Fix GPU frequencies missing on some speedbins
	arm64: dts: qcom: sda660-ifc6560: fix l10a voltage ranges
	ARM: dts: microchip: sam9x60: Add missing property atmel,usart-mode
	mmc: mmc_spi: drop buggy snprintf()
	scripts/kernel-doc: Do not track section counter across processed files
	arm64: dts: qcom: x1e80100-slim7x: Drop orientation-switch from USB SS[0-1] QMP PHYs
	arm64: dts: qcom: x1e80100-vivobook-s15: Drop orientation-switch from USB SS[0-1] QMP PHYs
	openrisc: Implement fixmap to fix earlycon
	efi/libstub: fix efi_parse_options() ignoring the default command line
	tpm: fix signed/unsigned bug when checking event logs
	media: i2c: max96717: clean up on error in max96717_subdev_init()
	media: i2c: vgxy61: Fix an error handling path in vgxy61_detect()
	media: i2c: ds90ub960: Fix missing return check on ub960_rxport_read call
	arm64: dts: mt8183: krane: Fix the address of eeprom at i2c4
	arm64: dts: mt8183: kukui: Fix the address of eeprom at i2c4
	arm64: dts: qcom: x1e80100: Resize GIC Redistributor register region
	kernel-doc: allow object-like macros in ReST output
	arm64: dts: ti: k3-am62x-phyboard-lyra: Drop unnecessary McASP AFIFOs
	gpio: sloppy-logic-analyzer remove reference to rcu_momentary_dyntick_idle()
	arm64: dts: mediatek: mt8173-elm-hana: Add vdd-supply to second source trackpad
	arm64: dts: mediatek: mt8188: Fix USB3 PHY port default status
	arm64: dts: mediatek: mt8195-cherry: Use correct audio codec DAI
	Revert "cgroup: Fix memory leak caused by missing cgroup_bpf_offline"
	cgroup/bpf: only cgroup v2 can be attached by bpf programs
	regulator: rk808: Restrict DVS GPIOs to the RK808 variant only
	power: sequencing: make the QCom PMU pwrseq driver depend on CONFIG_OF
	arm64: tegra: p2180: Add mandatory compatible for WiFi node
	arm64: dts: rockchip: Remove 'enable-active-low' from two boards
	arm64: dts: mt8183: fennel: add i2c2's i2c-scl-internal-delay-ns
	arm64: dts: mt8183: burnet: add i2c2's i2c-scl-internal-delay-ns
	arm64: dts: mt8183: cozmo: add i2c2's i2c-scl-internal-delay-ns
	arm64: dts: mt8183: Damu: add i2c2's i2c-scl-internal-delay-ns
	pwm: imx27: Workaround of the pwm output bug when decrease the duty cycle
	ARM: dts: cubieboard4: Fix DCDC5 regulator constraints
	arm64: dts: ti: k3-j7200: Fix register map for main domain pmx
	arm64: dts: ti: k3-j7200: Fix clock ids for MCSPI instances
	arm64: dts: ti: k3-j721e: Fix clock IDs for MCSPI instances
	arm64: dts: ti: k3-j721s2: Fix clock IDs for MCSPI instances
	watchdog: Add HAS_IOPORT dependency for SBC8360 and SBC7240
	arm64: dts: qcom: x1e80100: Update C4/C5 residency/exit numbers
	dt-bindings: cache: qcom,llcc: Fix X1E80100 reg entries
	of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
	pmdomain: ti-sci: Add missing of_node_put() for args.np
	spi: tegra210-quad: Avoid shift-out-of-bounds
	spi: zynqmp-gqspi: Undo runtime PM changes at driver exit time
	regmap: irq: Set lockdep class for hierarchical IRQ domains
	arm64: dts: renesas: hihope: Drop #sound-dai-cells
	arm64: dts: imx8mn-tqma8mqnl-mba8mx-usbot: fix coexistence of output-low and output-high in GPIO
	arm64: dts: mediatek: mt6358: fix dtbs_check error
	arm64: dts: mediatek: mt8183-kukui-jacuzzi: Fix DP bridge supply names
	arm64: dts: mediatek: mt8183-kukui-jacuzzi: Add supplies for fixed regulators
	selftests/resctrl: Print accurate buffer size as part of MBM results
	selftests/resctrl: Fix memory overflow due to unhandled wraparound
	selftests/resctrl: Protect against array overrun during iMC config parsing
	firmware: arm_scpi: Check the DVFS OPP count returned by the firmware
	media: ipu6: Fix DMA and physical address debugging messages for 32-bit
	media: ipu6: not override the dma_ops of device in driver
	media: ipu6: remove architecture DMA ops dependency in Kconfig
	pwm: Assume a disabled PWM to emit a constant inactive output
	media: atomisp: Add check for rgby_data memory allocation failure
	arm64: dts: rockchip: correct analog audio name on Indiedroid Nova
	sched_ext: scx_bpf_dispatch_from_dsq_set_*() are allowed from unlocked context
	HID: hyperv: streamline driver probe to avoid devres issues
	platform/x86: asus-wmi: Fix inconsistent use of thermal policies
	platform/x86/intel/pmt: allow user offset for PMT callbacks
	platform/x86: panasonic-laptop: Return errno correctly in show callback
	drm/imagination: Convert to use time_before macro
	drm/imagination: Use pvr_vm_context_get()
	drm/mm: Mark drm_mm_interval_tree*() functions with __maybe_unused
	drm/vc4: hvs: Don't write gamma luts on 2711
	drm/vc4: hdmi: Avoid hang with debug registers when suspended
	drm/vc4: hvs: Fix dlist debug not resetting the next entry pointer
	drm/vc4: hvs: Remove incorrect limit from hvs_dlist debugfs function
	drm/vc4: hvs: Correct logic on stopping an HVS channel
	wifi: ath9k: add range check for conn_rsp_epid in htc_connect_service()
	drm/omap: Fix possible NULL dereference
	drm/omap: Fix locking in omap_gem_new_dmabuf()
	drm/v3d: Appease lockdep while updating GPU stats
	wifi: p54: Use IRQF_NO_AUTOEN flag in request_irq()
	wifi: mwifiex: Use IRQF_NO_AUTOEN flag in request_irq()
	udmabuf: change folios array from kmalloc to kvmalloc
	udmabuf: fix vmap_udmabuf error page set
	drm/imx/dcss: Use IRQF_NO_AUTOEN flag in request_irq()
	drm/imx/ipuv3: Use IRQF_NO_AUTOEN flag in request_irq()
	drm/panel: nt35510: Make new commands optional
	drm/v3d: Address race-condition in MMU flush
	drm/v3d: Flush the MMU before we supply more memory to the binner
	drm/amdgpu: Fix JPEG v4.0.3 register write
	wifi: ath10k: fix invalid VHT parameters in supported_vht_mcs_rate_nss1
	wifi: ath10k: fix invalid VHT parameters in supported_vht_mcs_rate_nss2
	wifi: ath12k: Skip Rx TID cleanup for self peer
	dt-bindings: vendor-prefixes: Add NeoFidelity, Inc
	ASoC: fsl_micfil: fix regmap_write_bits usage
	ASoC: dt-bindings: mt6359: Update generic node name and dmic-mode
	drm/amdgpu/gfx9: Add Cleaner Shader Deinitialization in gfx_v9_0 Module
	ASoC: fsl-asoc-card: Add missing handling of {hp,mic}-dt-gpios
	drm/bridge: anx7625: Drop EDID cache on bridge power off
	drm/bridge: it6505: Drop EDID cache on bridge power off
	libbpf: Fix expected_attach_type set handling in program load callback
	libbpf: Fix output .symtab byte-order during linking
	selftests/bpf: Fix uprobe_multi compilation error
	dlm: fix swapped args sb_flags vs sb_status
	wifi: rtl8xxxu: Perform update_beacon_work when beaconing is enabled
	ASoC: amd: acp: fix for inconsistent indenting
	ASoC: amd: acp: fix for cpu dai index logic
	drm/amd/display: fix a memleak issue when driver is removed
	wifi: ath12k: fix use-after-free in ath12k_dp_cc_cleanup()
	wifi: ath12k: fix one more memcpy size error
	libbpf: Add missing per-arch include path
	selftests: bpf: Add missing per-arch include path
	bpf: Fix the xdp_adjust_tail sample prog issue
	selftests/bpf: Fix backtrace printing for selftests crashes
	wifi: ath11k: Fix CE offset address calculation for WCN6750 in SSR
	selftests/bpf: add missing header include for htons
	wifi: cfg80211: check radio iface combination for multi radio per wiphy
	ice: consistently use q_idx in ice_vc_cfg_qs_msg()
	drm/vc4: hdmi: Increase audio MAI fifo dreq threshold
	drm/vc4: Introduce generation number enum
	drm/vc4: Match drm_dev_enter and exit calls in vc4_hvs_lut_load
	drm/vc4: Match drm_dev_enter and exit calls in vc4_hvs_atomic_flush
	drm/vc4: Correct generation check in vc4_hvs_lut_load
	libbpf: fix sym_is_subprog() logic for weak global subprogs
	accel/ivpu: Prevent recovery invocation during probe and resume
	ASoC: rt722-sdca: Remove logically deadcode in rt722-sdca.c
	libbpf: never interpret subprogs in .text as entry programs
	netdevsim: copy addresses for both in and out paths
	drm/bridge: tc358767: Fix link properties discovery
	drm/panic: Select ZLIB_DEFLATE for DRM_PANIC_SCREEN_QR_CODE
	selftests/bpf: Fix msg_verify_data in test_sockmap
	selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap
	wifi: mwifiex: add missing locking for cfg80211 calls
	wifi: wilc1000: Set MAC after operation mode
	wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_config_scan()
	drm: fsl-dcu: enable PIXCLK on LS1021A
	drm: panel: nv3052c: correct spi_device_id for RG35XX panel
	drm/msm/dpu: on SDM845 move DSPP_3 to LM_5 block
	drm/msm/dpu: drop LM_3 / LM_4 on SDM845
	drm/msm/dpu: drop LM_3 / LM_4 on MSM8998
	octeontx2-pf: handle otx2_mbox_get_rsp errors in otx2_common.c
	octeontx2-pf: handle otx2_mbox_get_rsp errors in otx2_ethtool.c
	octeontx2-pf: handle otx2_mbox_get_rsp errors in otx2_flows.c
	octeontx2-pf: handle otx2_mbox_get_rsp errors in cn10k.c
	octeontx2-pf: handle otx2_mbox_get_rsp errors in otx2_dmac_flt.c
	octeontx2-pf: handle otx2_mbox_get_rsp errors in otx2_dcbnl.c
	selftests/bpf: fix test_spin_lock_fail.c's global vars usage
	libbpf: move global data mmap()'ing into bpf_object__load()
	wifi: rtw89: rename rtw89_vif to rtw89_vif_link ahead for MLO
	wifi: rtw89: rename rtw89_sta to rtw89_sta_link ahead for MLO
	wifi: rtw89: read bss_conf corresponding to the link
	wifi: rtw89: read link_sta corresponding to the link
	wifi: rtw89: refactor VIF related func ahead for MLO
	wifi: rtw89: refactor STA related func ahead for MLO
	wifi: rtw89: tweak driver architecture for impending MLO support
	wifi: rtw89: Fix TX fail with A2DP after scanning
	wifi: rtw89: unlock on error path in rtw89_ops_unassign_vif_chanctx()
	drm/panfrost: Remove unused id_mask from struct panfrost_model
	bpf, arm64: Remove garbage frame for struct_ops trampoline
	drm/msm/adreno: Use IRQF_NO_AUTOEN flag in request_irq()
	drm/msm/gpu: Check the status of registration to PM QoS
	drm/xe/hdcp: Fix gsc structure check in fw check status
	drm/etnaviv: Request pages from DMA32 zone on addressing_limited
	drm/etnaviv: hold GPU lock across perfmon sampling
	drm/amd/display: Increase idle worker HPD detection time
	drm/amd/display: Reduce HPD Detection Interval for IPS
	drm/nouveau/gr/gf100: Fix missing unlock in gf100_gr_chan_new()
	drm: zynqmp_kms: Unplug DRM device before removal
	drm: xlnx: zynqmp_disp: layer may be null while releasing
	wifi: wfx: Fix error handling in wfx_core_init()
	wifi: cw1200: Fix potential NULL dereference
	drm/msm/dpu: cast crtc_clk calculation to u64 in _dpu_core_perf_calc_clk()
	bpf, bpftool: Fix incorrect disasm pc
	bpf: Tighten tail call checks for lingering locks, RCU, preempt_disable
	drm/vkms: Drop unnecessary call to drm_crtc_cleanup()
	drm/amdgpu: Fix the memory allocation issue in amdgpu_discovery_get_nps_info()
	drm/amdkfd: Use dynamic allocation for CU occupancy array in 'kfd_get_cu_occupancy()'
	bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
	drm: use ATOMIC64_INIT() for atomic64_t
	netfilter: nf_tables: avoid false-positive lockdep splat on rule deletion
	netfilter: nf_tables: must hold rcu read lock while iterating expression type list
	netfilter: nf_tables: must hold rcu read lock while iterating object type list
	netlink: typographical error in nlmsg_type constants definition
	wifi: rtw89: coex: check NULL return of kmalloc in btc_fw_set_monreg()
	drm/panfrost: Add missing OPP table refcnt decremental
	drm/panthor: introduce job cycle and timestamp accounting
	drm/panthor: record current and maximum device clock frequencies
	drm/panthor: Fix OPP refcnt leaks in devfreq initialisation
	isofs: avoid memory leak in iocharset
	selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap
	selftests/bpf: Fix SENDPAGE data logic in test_sockmap
	selftests/bpf: Fix total_bytes in msg_loop_rx in test_sockmap
	selftests/bpf: Add push/pop checking for msg_verify_data in test_sockmap
	bpf, sockmap: Several fixes to bpf_msg_push_data
	bpf, sockmap: Several fixes to bpf_msg_pop_data
	bpf, sockmap: Fix sk_msg_reset_curr
	ipv6: release nexthop on device removal
	selftests: net: really check for bg process completion
	wifi: cfg80211: Remove the Medium Synchronization Delay validity check
	wifi: iwlwifi: allow fast resume on ax200
	wifi: iwlwifi: mvm: tell iwlmei when we finished suspending
	drm/amdgpu: fix ACA bank count boundary check error
	drm/amdgpu: Fix map/unmap queue logic
	drm/amdkfd: Fix wrong usage of INIT_WORK()
	bpf: Allow return values 0 and 1 for kprobe session
	bpf: Force uprobe bpf program to always return 0
	selftests/bpf: skip the timer_lockup test for single-CPU nodes
	ipv6: Fix soft lockups in fib6_select_path under high next hop churn
	net: rfkill: gpio: Add check for clk_enable()
	Revert "wifi: iwlegacy: do not skip frames with bad FCS"
	bpf: Use function pointers count as struct_ops links count
	bpf: Add kernel symbol for struct_ops trampoline
	ALSA: usx2y: Use snd_card_free_when_closed() at disconnection
	ALSA: us122l: Use snd_card_free_when_closed() at disconnection
	ALSA: caiaq: Use snd_card_free_when_closed() at disconnection
	ALSA: 6fire: Release resources at card release
	i2c: dev: Fix memory leak when underlying adapter does not support I2C
	selftests: netfilter: Fix missing return values in conntrack_dump_flush
	Bluetooth: btintel_pcie: Add handshake between driver and firmware
	Bluetooth: btintel: Do no pass vendor events to stack
	Bluetooth: btmtk: adjust the position to init iso data anchor
	Bluetooth: btbcm: fix missing of_node_put() in btbcm_get_board_name()
	Bluetooth: ISO: Use kref to track lifetime of iso_conn
	Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pending
	Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending
	Bluetooth: ISO: Send BIG Create Sync via hci_sync
	Bluetooth: fix use-after-free in device_for_each_child()
	xsk: Free skb when TX metadata options are invalid
	erofs: fix file-backed mounts over FUSE
	erofs: fix blksize < PAGE_SIZE for file-backed mounts
	erofs: handle NONHEAD !delta[1] lclusters gracefully
	dlm: fix dlm_recover_members refcount on error
	eth: fbnic: don't disable the PCI device twice
	net: txgbe: remove GPIO interrupt controller
	net: txgbe: fix null pointer to pcs
	netpoll: Use rcu_access_pointer() in netpoll_poll_lock
	wireguard: selftests: load nf_conntrack if not present
	bpf: fix recursive lock when verdict program return SK_PASS
	unicode: Fix utf8_load() error path
	cppc_cpufreq: Use desired perf if feedback ctrs are 0 or unchanged
	RDMA/core: Provide rdma_user_mmap_disassociate() to disassociate mmap pages
	RDMA/hns: Disassociate mmap pages for all uctx when HW is being reset
	pinctrl: renesas: rzg2l: Fix missing return in rzg2l_pinctrl_register()
	clk: mediatek: drop two dead config options
	trace/trace_event_perf: remove duplicate samples on the first tracepoint event
	pinctrl: zynqmp: drop excess struct member description
	pinctrl: renesas: Select PINCTRL_RZG2L for RZ/V2H(P) SoC
	clk: qcom: videocc-sm8550: depend on either gcc-sm8550 or gcc-sm8650
	iommu/s390: Implement blocking domain
	scsi: hisi_sas: Enable all PHYs that are not disabled by user during controller reset
	powerpc/vdso: Flag VDSO64 entry points as functions
	mfd: tps65010: Use IRQF_NO_AUTOEN flag in request_irq() to fix race
	mfd: da9052-spi: Change read-mask to write-mask
	mfd: intel_soc_pmic_bxtwc: Use IRQ domain for USB Type-C device
	mfd: intel_soc_pmic_bxtwc: Use IRQ domain for TMU device
	mfd: intel_soc_pmic_bxtwc: Use IRQ domain for PMIC devices
	mfd: intel_soc_pmic_bxtwc: Fix IRQ domain names duplication
	cpufreq: loongson2: Unregister platform_driver on failure
	powerpc/fadump: Refactor and prepare fadump_cma_init for late init
	powerpc/fadump: Move fadump_cma_init to setup_arch() after initmem_init()
	mtd: hyperbus: rpc-if: Add missing MODULE_DEVICE_TABLE
	mtd: rawnand: atmel: Fix possible memory leak
	clk: Allow kunit tests to run without OF_OVERLAY enabled
	powerpc/mm/fault: Fix kfence page fault reporting
	iommu/tegra241-cmdqv: Staticize cmdqv_debugfs_dir
	clk: sophgo: avoid integer overflow in sg2042_pll_recalc_rate()
	mtd: spi-nor: spansion: Use nor->addr_nbytes in octal DTR mode in RD_ANY_REG_OP
	powerpc/pseries: Fix dtl_access_lock to be a rw_semaphore
	cpufreq: CPPC: Fix possible null-ptr-deref for cpufreq_cpu_get_raw()
	cpufreq: CPPC: Fix possible null-ptr-deref for cppc_get_cpu_cost()
	iommu/amd/pgtbl_v2: Take protection domain lock before invalidating TLB
	RDMA/hns: Fix an AEQE overflow error caused by untimely update of eq_db_ci
	RDMA/hns: Fix flush cqe error when racing with destroy qp
	RDMA/hns: Modify debugfs name
	RDMA/hns: Use dev_* printings in hem code instead of ibdev_*
	RDMA/hns: Fix cpu stuck caused by printings during reset
	RDMA/rxe: Fix the qp flush warnings in req
	RDMA/bnxt_re: Check cqe flags to know imm_data vs inv_irkey
	clk: sunxi-ng: d1: Fix PLL_AUDIO0 preset
	clk: renesas: rzg2l: Fix FOUTPOSTDIV clk
	RDMA/rxe: Set queue pair cur_qp_state when being queried
	RDMA/mlx5: Call dev_put() after the blocking notifier
	RDMA/core: Implement RoCE GID port rescan and export delete function
	RDMA/mlx5: Ensure active slave attachment to the bond IB device
	RISC-V: KVM: Fix APLIC in_clrip and clripnum write emulation
	riscv: kvm: Fix out-of-bounds array access
	clk: imx: lpcg-scu: SW workaround for errata (e10858)
	clk: imx: fracn-gppll: correct PLL initialization flow
	clk: imx: fracn-gppll: fix pll power up
	clk: imx: clk-scu: fix clk enable state save and restore
	clk: imx: imx8-acm: Fix return value check in clk_imx_acm_attach_pm_domains()
	iommu/vt-d: Fix checks and print in dmar_fault_dump_ptes()
	iommu/vt-d: Fix checks and print in pgtable_walk()
	checkpatch: always parse orig_commit in fixes tag
	mfd: rt5033: Fix missing regmap_del_irq_chip()
	leds: max5970: Fix unreleased fwnode_handle in probe function
	leds: ktd2692: Set missing timing properties
	fs/proc/kcore.c: fix coccinelle reported ERROR instances
	scsi: target: Fix incorrect function name in pscsi_create_type_disk()
	scsi: bfa: Fix use-after-free in bfad_im_module_exit()
	scsi: fusion: Remove unused variable 'rc'
	scsi: qedf: Fix a possible memory leak in qedf_alloc_and_init_sb()
	scsi: qedi: Fix a possible memory leak in qedi_alloc_and_init_sb()
	scsi: sg: Enable runtime power management
	x86/tdx: Introduce wrappers to read and write TD metadata
	x86/tdx: Rename tdx_parse_tdinfo() to tdx_setup()
	x86/tdx: Dynamically disable SEPT violations from causing #VEs
	powerpc/fadump: allocate memory for additional parameters early
	fadump: reserve param area if below boot_mem_top
	RDMA/hns: Fix out-of-order issue of requester when setting FENCE
	RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg()
	cpufreq: loongson3: Check for error code from devm_mutex_init() call
	cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_cost()
	cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_power()
	kasan: move checks to do_strncpy_from_user
	kunit: skb: use "gfp" variable instead of hardcoding GFP_KERNEL
	ocfs2: fix uninitialized value in ocfs2_file_read_iter()
	zram: ZRAM_DEF_COMP should depend on ZRAM
	iommu/tegra241-cmdqv: Fix alignment failure at max_n_shift
	dax: delete a stale directory pmem
	KVM: PPC: Book3S HV: Stop using vc->dpdes for nested KVM guests
	KVM: PPC: Book3S HV: Avoid returning to nested hypervisor on pending doorbells
	powerpc/sstep: make emulate_vsx_load and emulate_vsx_store static
	RDMA/hns: Fix different dgids mapping to the same dip_idx
	KVM: PPC: Book3S HV: Fix kmv -> kvm typo
	powerpc/kexec: Fix return of uninitialized variable
	fbdev: sh7760fb: Fix a possible memory leak in sh7760fb_alloc_mem()
	RDMA/mlx5: Move events notifier registration to be after device registration
	clk: clk-apple-nco: Add NULL check in applnco_probe
	clk: ralink: mtmips: fix clock plan for Ralink SoC RT3883
	clk: ralink: mtmips: fix clocks probe order in oldest ralink SoCs
	clk: en7523: remove REG_PCIE*_{MEM,MEM_MASK} configuration
	clk: en7523: move clock_register in hw_init callback
	clk: en7523: introduce chip_scu regmap
	clk: en7523: fix estimation of fixed rate for EN7581
	dt-bindings: clock: axi-clkgen: include AXI clk
	clk: clk-axi-clkgen: make sure to enable the AXI bus clock
	zram: permit only one post-processing operation at a time
	zram: fix NULL pointer in comp_algorithm_show()
	RDMA/bnxt_re: Correct the sequence of device suspend
	arm64: dts: qcom: sc8180x: Add a SoC-specific compatible to cpufreq-hw
	pinctrl: k210: Undef K210_PC_DEFAULT
	rtla/timerlat: Do not set params->user_workload with -U
	smb: cached directories can be more than root file handle
	mailbox: mtk-cmdq: fix wrong use of sizeof in cmdq_get_clocks()
	mailbox: arm_mhuv2: clean up loop in get_irq_chan_comb()
	x86: fix off-by-one in access_ok()
	perf cs-etm: Don't flush when packet_queue fills up
	gfs2: Rename GLF_VERIFY_EVICT to GLF_VERIFY_DELETE
	gfs2: Allow immediate GLF_VERIFY_DELETE work
	gfs2: Fix unlinked inode cleanup
	perf mem: Fix printing PERF_MEM_LVLNUM_{L2_MHB|MSC}
	dt-bindings: PCI: mediatek-gen3: Allow exact number of clocks only
	PCI: Fix reset_method_store() memory leak
	perf jevents: Don't stop at the first matched pmu when searching a events table
	perf stat: Close cork_fd when create_perf_stat_counter() failed
	perf stat: Fix affinity memory leaks on error path
	perf trace: Keep exited threads for summary
	perf test attr: Add back missing topdown events
	rust: rbtree: fix `SAFETY` comments that should be `# Safety` sections
	f2fs: compress: fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks
	f2fs: fix null-ptr-deref in f2fs_submit_page_bio()
	mailbox, remoteproc: k3-m4+: fix compile testing
	f2fs: fix to account dirty data in __get_secs_required()
	perf dso: Fix symtab_type for kmod compression
	perf disasm: Fix capstone memory leak
	perf probe: Fix libdw memory leak
	perf probe: Correct demangled symbols in C++ program
	rust: kernel: fix THIS_MODULE header path in ThisModule doc comment
	rust: macros: fix documentation of the paste! macro
	PCI: cpqphp: Fix PCIBIOS_* return value confusion
	rust: block: fix formatting of `kernel::block::mq::request` module
	perf disasm: Use disasm_line__free() to properly free disasm_line
	perf disasm: Fix not cleaning up disasm_line in symbol__disassemble_raw()
	virtiofs: use pages instead of pointer for kernel direct IO
	perf ftrace latency: Fix unit on histogram first entry when using --use-nsec
	i3c: master: Remove i3c_dev_disable_ibi_locked(olddev) on device hotjoin
	f2fs: fix the wrong f2fs_bug_on condition in f2fs_do_replace_block
	f2fs: check curseg->inited before write_sum_page in change_curseg
	f2fs: Fix not used variable 'index'
	f2fs: fix to avoid potential deadlock in f2fs_record_stop_reason()
	f2fs: fix to avoid use GC_AT when setting gc_mode as GC_URGENT_LOW or GC_URGENT_MID
	PCI: qcom: Enable MSI interrupts together with Link up if 'Global IRQ' is supported
	PCI: qcom-ep: Move controller cleanups to qcom_pcie_perst_deassert()
	PCI: tegra194: Move controller cleanups to pex_ep_event_pex_rst_deassert()
	PCI: j721e: Deassert PERST# after a delay of PCIE_T_PVPERL_MS milliseconds
	perf build: Add missing cflags when building with custom libtraceevent
	f2fs: fix race in concurrent f2fs_stop_gc_thread
	f2fs: fix to map blocks correctly for direct write
	f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode
	perf trace: avoid garbage when not printing a trace event's arguments
	m68k: mcfgpio: Fix incorrect register offset for CONFIG_M5441x
	m68k: coldfire/device.c: only build FEC when HW macros are defined
	svcrdma: Address an integer overflow
	nfsd: drop inode parameter from nfsd4_change_attribute()
	perf list: Fix topic and pmu_name argument order
	perf trace: Fix tracing itself, creating feedback loops
	perf trace: Do not lose last events in a race
	perf trace: Avoid garbage when not printing a syscall's arguments
	remoteproc: qcom: pas: Remove subdevs on the error path of adsp_probe()
	remoteproc: qcom: adsp: Remove subdevs on the error path of adsp_probe()
	remoteproc: qcom: pas: add minidump_id to SM8350 resources
	rpmsg: glink: use only lower 16-bits of param2 for CMD_OPEN name length
	remoteproc: qcom_q6v5_mss: Re-order writes to the IMEM region
	PCI: endpoint: epf-mhi: Avoid NULL dereference if DT lacks 'mmio'
	NFSD: Prevent NULL dereference in nfsd4_process_cb_update()
	NFSD: Cap the number of bytes copied by nfs4_reset_recoverydir()
	nfsd: release svc_expkey/svc_export with rcu_work
	svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init()
	NFSD: Fix nfsd4_shutdown_copy()
	nfs_common: must not hold RCU while calling nfsd_file_put_local
	f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow
	perf bpf-filter: Return -ENOMEM directly when pfi allocation fails
	hwmon: (tps23861) Fix reporting of negative temperatures
	hwmon: (aquacomputer_d5next) Fix length of speed_input array
	phy: airoha: Fix REG_CSR_2L_PLL_CMN_RESERVE0 config in airoha_pcie_phy_init_clk_out()
	phy: airoha: Fix REG_PCIE_PMA_TX_RESET config in airoha_pcie_phy_init_csr_2l()
	phy: airoha: Fix REG_CSR_2L_JCPLL_SDM_HREN config in airoha_pcie_phy_init_ssc_jcpll()
	phy: airoha: Fix REG_CSR_2L_RX{0,1}_REV0 definitions
	vdpa/mlx5: Fix suboptimal range on iotlb iteration
	vfio/mlx5: Fix an unwind issue in mlx5vf_add_migration_pages()
	vfio/mlx5: Fix unwind flows in mlx5vf_pci_save/resume_device_data()
	selftests/mount_setattr: Fix failures on 64K PAGE_SIZE kernels
	gpio: zevio: Add missed label initialisation
	vfio/pci: Properly hide first-in-list PCIe extended capability
	fs_parser: update mount_api doc to match function signature
	LoongArch: Fix build failure with GCC 15 (-std=gnu23)
	LoongArch: BPF: Sign-extend return values
	power: supply: core: Remove might_sleep() from power_supply_put()
	power: supply: bq27xxx: Fix registers of bq27426
	power: supply: rt9471: Fix wrong WDT function regfield declaration
	power: supply: rt9471: Use IC status regfield to report real charger status
	fs/ntfs3: Equivalent transition from page to folio
	power: reset: ep93xx: add AUXILIARY_BUS dependency
	net: usb: lan78xx: Fix double free issue with interrupt buffer allocation
	net: usb: lan78xx: Fix memory leak on device unplug by freeing PHY device
	tg3: Set coherent DMA mask bits to 31 for BCM57766 chipsets
	net: usb: lan78xx: Fix refcounting and autosuspend on invalid WoL configuration
	net: microchip: vcap: Add typegroup table terminators in kunit tests
	netlink: fix false positive warning in extack during dumps
	exfat: fix file being changed by unaligned direct write
	net/l2tp: fix warning in l2tp_exit_net found by syzbot
	s390/iucv: MSG_PEEK causes memory leak in iucv_sock_destruct()
	rtase: Refactor the rtase_check_mac_version_valid() function
	rtase: Correct the speed for RTL907XD-V1
	rtase: Corrects error handling of the rtase_check_mac_version_valid()
	net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged
	net: mdio-ipq4019: add missing error check
	marvell: pxa168_eth: fix call balance of pep->clk handling routines
	net: stmmac: dwmac-socfpga: Set RX watchdog interrupt as broken
	octeontx2-af: RPM: Fix mismatch in lmac type
	octeontx2-af: RPM: Fix low network performance
	octeontx2-af: RPM: fix stale RSFEC counters
	octeontx2-af: RPM: fix stale FCFEC counters
	octeontx2-af: Quiesce traffic before NIX block reset
	spi: atmel-quadspi: Fix register name in verbose logging function
	net: hsr: fix hsr_init_sk() vs network/transport headers.
	bnxt_en: Reserve rings after PCIe AER recovery if NIC interface is down
	bnxt_en: Set backplane link modes correctly for ethtool
	bnxt_en: Fix queue start to update vnic RSS table
	bnxt_en: Fix receive ring space parameters when XDP is active
	bnxt_en: Refactor bnxt_ptp_init()
	bnxt_en: Unregister PTP during PCI shutdown and suspend
	Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
	Bluetooth: MGMT: Fix possible deadlocks
	llc: Improve setsockopt() handling of malformed user input
	rxrpc: Improve setsockopt() handling of malformed user input
	tcp: Fix use-after-free of nreq in reqsk_timer_handler().
	ip6mr: fix tables suspicious RCU usage
	ipmr: fix tables suspicious RCU usage
	iio: light: al3010: Fix an error handling path in al3010_probe()
	usb: using mutex lock and supporting O_NONBLOCK flag in iowarrior_read()
	usb: yurex: make waiting on yurex_write interruptible
	USB: chaoskey: fail open after removal
	USB: chaoskey: Fix possible deadlock chaoskey_list_lock
	misc: apds990x: Fix missing pm_runtime_disable()
	devres: Fix page faults when tracing devres from unloaded modules
	usb: gadget: uvc: wake pump everytime we update the free list
	interconnect: qcom: icc-rpmh: probe defer incase of missing QoS clock dependency
	iio: backend: fix wrong pointer passed to IS_ERR()
	iio: adc: ad4000: fix reading unsigned data
	iio: adc: ad4000: Check for error code from devm_mutex_init() call
	iio: adc: pac1921: Check for error code from devm_mutex_init() call
	iio: accel: adxl380: fix raw sample read
	phy: realtek: usb: fix NULL deref in rtk_usb2phy_probe
	phy: realtek: usb: fix NULL deref in rtk_usb3phy_probe
	counter: stm32-timer-cnt: Add check for clk_enable()
	counter: ti-ecap-capture: Add check for clk_enable()
	bus: mhi: host: Switch trace_mhi_gen_tre fields to native endian
	usb: typec: fix potential array underflow in ucsi_ccg_sync_control()
	firmware_loader: Fix possible resource leak in fw_log_firmware_info()
	ALSA: hda/realtek: Update ALC256 depop procedure
	drm/radeon: Fix spurious unplug event on radeon HDMI
	drm/amd/display: Fix null check for pipe_ctx->plane_state in dcn20_program_pipe
	drm/amd/display: Fix null check for pipe_ctx->plane_state in hwss_setup_dpp
	ASoC: imx-audmix: Add NULL check in imx_audmix_probe
	drm/xe/ufence: Wake up waiters after setting ufence->signalled
	apparmor: fix 'Do simple duplicate message elimination'
	ALSA: core: Fix possible NULL dereference caused by kunit_kzalloc()
	ASoC: amd: yc: Fix for enabling DMIC on acp6x via _DSD entry
	ASoC: mediatek: Check num_codecs is not zero to avoid panic during probe
	s390/pci: Fix potential double remove of hotplug slot
	f2fs: fix fiemap failure issue when page size is 16KB
	net_sched: sch_fq: don't follow the fast path if Tx is behind now
	xen: Fix the issue of resource not being properly released in xenbus_dev_probe()
	ALSA: usb-audio: Fix potential out-of-bound accesses for Extigy and Mbox devices
	ALSA: usb-audio: Fix out of bounds reads when finding clock sources
	usb: ehci-spear: fix call balance of sehci clk handling routines
	usb: typec: ucsi: glink: fix off-by-one in connector_status
	xfs: fix simplify extent lookup in xfs_can_free_eofblocks
	ext4: supress data-race warnings in ext4_free_inodes_{count,set}()
	ext4: fix FS_IOC_GETFSMAP handling
	MAINTAINERS: update location of media main tree
	docs: media: update location of the media patches
	jfs: xattr: check invalid xattr size more strictly
	ASoC: amd: yc: Add a quirk for microfone on Lenovo ThinkPad P14s Gen 5 21MES00B00
	ASoC: codecs: Fix atomicity violation in snd_soc_component_get_drvdata()
	ASoC: da7213: Populate max_register to regmap_config
	perf/x86/intel/pt: Fix buffer full but size is 0 case
	crypto: x86/aegis128 - access 32-bit arguments as 32-bit
	KVM: x86: switch hugepage recovery thread to vhost_task
	KVM: x86/mmu: Skip the "try unsync" path iff the old SPTE was a leaf SPTE
	KVM: x86: add back X86_LOCAL_APIC dependency
	KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD
	powerpc/pseries: Fix KVM guest detection for disabling hardlockup detector
	KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR
	KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status
	Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()"
	KVM: arm64: Don't retire aborted MMIO instruction
	KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE
	KVM: arm64: Get rid of userspace_irqchip_in_use
	KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*
	KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device
	Compiler Attributes: disable __counted_by for clang < 19.1.3
	PCI: Fix use-after-free of slot->bus on hot remove
	LoongArch: Explicitly specify code model in Makefile
	clk: clk-loongson2: Fix memory corruption bug in struct loongson2_clk_provider
	clk: clk-loongson2: Fix potential buffer overflow in flexible-array member access
	fsnotify: fix sending inotify event with unexpected filename
	fsnotify: Fix ordering of iput() and watched_objects decrement
	comedi: Flush partial mappings in error case
	apparmor: test: Fix memory leak for aa_unpack_strdup()
	iio: dac: adi-axi-dac: fix wrong register bitfield
	tty: ldsic: fix tty_ldisc_autoload sysctl's proc_handler
	locking/lockdep: Avoid creating new name string literals in lockdep_set_subclass()
	tools/nolibc: s390: include std.h
	fcntl: make F_DUPFD_QUERY associative
	pinctrl: qcom: spmi: fix debugfs drive strength
	dt-bindings: pinctrl: samsung: Fix interrupt constraint for variants with fallbacks
	dt-bindings: iio: dac: ad3552r: fix maximum spi speed
	exfat: fix uninit-value in __exfat_get_dentry_set
	exfat: fix out-of-bounds access of directory entries
	xhci: Fix control transfer error on Etron xHCI host
	xhci: Combine two if statements for Etron xHCI host
	xhci: Don't perform Soft Retry for Etron xHCI host
	xhci: Don't issue Reset Device command to Etron xHCI host
	Bluetooth: Fix type of len in rfcomm_sock_getsockopt{,_old}()
	usb: xhci: Limit Stop Endpoint retries
	usb: xhci: Fix TD invalidation under pending Set TR Dequeue
	usb: xhci: Avoid queuing redundant Stop Endpoint commands
	ARM: dts: omap36xx: declare 1GHz OPP as turbo again
	wifi: ath12k: fix warning when unbinding
	wifi: rtlwifi: Drastically reduce the attempts to read efuse in case of failures
	wifi: nl80211: fix bounds checker error in nl80211_parse_sched_scan
	wifi: ath12k: fix crash when unbinding
	wifi: brcmfmac: release 'root' node in all execution paths
	Revert "fs: don't block i_writecount during exec"
	Revert "f2fs: remove unreachable lazytime mount option parsing"
	Revert "usb: gadget: composite: fix OS descriptors w_value logic"
	serial: sh-sci: Clean sci_ports[0] after at earlycon exit
	Revert "serial: sh-sci: Clean sci_ports[0] after at earlycon exit"
	io_uring: fix corner case forgetting to vunmap
	io_uring: check for overflows in io_pin_pages
	blk-settings: round down io_opt to physical_block_size
	gpio: exar: set value when external pull-up or pull-down is present
	netfilter: ipset: add missing range check in bitmap_ip_uadt
	spi: Fix acpi deferred irq probe
	mtd: spi-nor: core: replace dummy buswidth from addr to data
	cpufreq: mediatek-hw: Fix wrong return value in mtk_cpufreq_get_cpu_power()
	cifs: support mounting with alternate password to allow password rotation
	parisc/ftrace: Fix function graph tracing disablement
	RISC-V: Scalar unaligned access emulated on hotplug CPUs
	RISC-V: Check scalar unaligned access on all CPUs
	ksmbd: fix use-after-free in SMB request handling
	smb: client: fix NULL ptr deref in crypto_aead_setkey()
	platform/chrome: cros_ec_typec: fix missing fwnode reference decrement
	irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
	x86/CPU/AMD: Terminate the erratum_1386_microcode array
	ubi: wl: Put source PEB into correct list if trying locking LEB failed
	um: ubd: Do not use drvdata in release
	um: net: Do not use drvdata in release
	dt-bindings: serial: rs485: Fix rs485-rts-delay property
	serial: 8250_fintek: Add support for F81216E
	serial: 8250: omap: Move pm_runtime_get_sync
	serial: amba-pl011: Fix RX stall when DMA is used
	serial: amba-pl011: fix build regression
	Revert "block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()"
	mtd: ubi: fix unreleased fwnode_handle in find_volume_fwnode()
	block: Prevent potential deadlock in blk_revalidate_disk_zones()
	um: vector: Do not use drvdata in release
	sh: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
	iio: gts: Fix uninitialized symbol 'ret'
	ublk: fix ublk_ch_mmap() for 64K page size
	arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
	block: fix missing dispatching request when queue is started or unquiesced
	block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding
	block: fix ordering between checking BLK_MQ_S_STOPPED request adding
	blk-mq: Make blk_mq_quiesce_tagset() hold the tag list mutex less long
	gve: Flow steering trigger reset only for timeout error
	HID: wacom: Interpret tilt data from Intuos Pro BT as signed values
	i40e: Fix handling changed priv flags
	media: wl128x: Fix atomicity violation in fmc_send_cmd()
	media: intel/ipu6: do not handle interrupts when device is disabled
	arm64: dts: mediatek: mt8186-corsola-voltorb: Merge speaker codec nodes
	netdev-genl: Hold rcu_read_lock in napi_get
	soc: fsl: cpm1: qmc: Set the ret error code on platform_get_irq() failure
	soc: fsl: rcpm: fix missing of_node_put() in copy_ippdexpcr1_setting()
	media: v4l2-core: v4l2-dv-timings: check cvt/gtf result
	x86/mm: Carve out INVLPG inline asm for use by others
	x86/microcode/AMD: Flush patch buffer mapping after application
	ALSA: rawmidi: Fix kvfree() call in spinlock
	ALSA: ump: Fix evaluation of MIDI 1.0 FB info
	ALSA: pcm: Add sanity NULL check for the default mmap fault handler
	ALSA: hda/realtek: Update ALC225 depop procedure
	ALSA: hda/realtek: Enable speaker pins for Medion E15443 platform
	ALSA: hda/realtek: Set PCBeep to default value for ALC274
	ALSA: hda/realtek: Fix Internal Speaker and Mic boost of Infinix Y4 Max
	ALSA: hda/realtek: fix mute/micmute LEDs don't work for EliteBook X G1i
	ALSA: hda/realtek: Apply quirk for Medion E15433
	fs/smb/client: implement chmod() for SMB3 POSIX Extensions
	smb: client: fix use-after-free of signing key
	smb3: request handle caching when caching directories
	smb: client: handle max length for SMB symlinks
	smb: Don't leak cfid when reconnect races with open_cached_dir
	smb: prevent use-after-free due to open_cached_dir error paths
	smb: During unmount, ensure all cached dir instances drop their dentry
	usb: misc: ljca: set small runtime autosuspend delay
	usb: misc: ljca: move usb_autopm_put_interface() after wait for response
	usb: dwc3: ep0: Don't clear ep0 DWC3_EP_TRANSFER_STARTED
	usb: musb: Fix hardware lockup on first Rx endpoint request
	usb: dwc3: gadget: Add missing check for single port RAM in TxFIFO resizing logic
	usb: dwc3: gadget: Fix checking for number of TRBs left
	usb: dwc3: gadget: Fix looping of queued SG entries
	staging: vchiq_arm: Fix missing refcount decrement in error path for fw_node
	counter: stm32-timer-cnt: fix device_node handling in probe_encoder()
	ublk: fix error code for unsupported command
	lib: string_helpers: silence snprintf() output truncation warning
	f2fs: fix to do sanity check on node blkaddr in truncate_node()
	ipc: fix memleak if msg_init_ns failed in create_ipc_ns
	Input: cs40l50 - fix wrong usage of INIT_WORK()
	NFSD: Prevent a potential integer overflow
	SUNRPC: make sure cache entry active before cache_show
	um: Fix potential integer overflow during physmem setup
	um: Fix the return value of elf_core_copy_task_fpregs
	kfifo: don't include dma-mapping.h in kfifo.h
	um: ubd: Initialize ubd's disk pointer in ubd_add
	um: Always dump trace for specified task in show_stack
	NFSv4.0: Fix a use-after-free problem in the asynchronous open()
	nfs/localio: must clear res.replen in nfs_local_read_done
	rtc: st-lpc: Use IRQF_NO_AUTOEN flag in request_irq()
	rtc: abx80x: Fix WDT bit position of the status register
	rtc: check if __rtc_read_time was successful in rtc_timer_do_work()
	ubi: fastmap: wl: Schedule fm_work if wear-leveling pool is empty
	ubifs: Correct the total block count by deducting journal reservation
	ubi: fastmap: Fix duplicate slab cache names while attaching
	ubifs: authentication: Fix use-after-free in ubifs_tnc_end_commit
	jffs2: fix use of uninitialized variable
	hostfs: Fix the NULL vs IS_ERR() bug for __filemap_get_folio()
	net/9p/usbg: fix handling of the failed kzalloc() memory allocation
	rtc: rzn1: fix BCD to rtc_time conversion errors
	Revert "nfs: don't reuse partially completed requests in nfs_lock_and_join_requests"
	nvme/multipath: Fix RCU list traversal to use SRCU primitive
	blk-mq: add non_owner variant of start_freeze/unfreeze queue APIs
	block: model freeze & enter queue as lock for supporting lockdep
	block: fix uaf for flush rq while iterating tags
	block: return unsigned int from bdev_io_min
	nvme-fabrics: fix kernel crash while shutting down controller
	9p/xen: fix init sequence
	9p/xen: fix release of IRQ
	perf/arm-smmuv3: Fix lockdep assert in ->event_init()
	perf/arm-cmn: Ensure port and device id bits are set properly
	smb: client: disable directory caching when dir_cache_timeout is zero
	x86/Documentation: Update algo in init_size description of boot protocol
	cifs: Fix parsing native symlinks relative to the export
	cifs: Fix parsing reparse point with native symlink in SMB1 non-UNICODE session
	rtc: ab-eoz9: don't fail temperature reads on undervoltage notification
	Rename .data.unlikely to .data..unlikely
	Rename .data.once to .data..once to fix resetting WARN*_ONCE
	kbuild: deb-pkg: Don't fail if modules.order is missing
	smb: Initialize cfid->tcon before performing network ops
	block: Don't allow an atomic write be truncated in blkdev_write_iter()
	modpost: remove incorrect code in do_eisa_entry()
	cifs: during remount, make sure passwords are in sync
	cifs: unlock on error in smb3_reconfigure()
	nfs: ignore SB_RDONLY when mounting nfs
	sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport
	SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT
	sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket
	nfs/blocklayout: Don't attempt unregister for invalid block device
	nfs/blocklayout: Limit repeat device registration on failure
	block, bfq: fix bfqq uaf in bfq_limit_depth()
	brd: decrease the number of allocated pages which discarded
	sh: intc: Fix use-after-free bug in register_intc_controller()
	tools/power turbostat: Fix trailing '\n' parsing
	tools/power turbostat: Fix child's argument forwarding
	block: always verify unfreeze lock on the owner task
	block: don't verify IO lock for freeze/unfreeze in elevator_init_mq()
	Linux 6.12.2

Change-Id: Ifebddb35b5a6a6ff2a65eb795a912633639aca9a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2024-12-13 10:11:11 -08:00
Paolo Bonzini 91248a2e41 KVM: x86: switch hugepage recovery thread to vhost_task
commit d96c77bd4eeba469bddbbb14323d2191684da82a upstream.

kvm_vm_create_worker_thread() is meant to be used for kthreads that
can consume significant amounts of CPU time on behalf of a VM or in
response to how the VM behaves (for example how it accesses its memory).
Therefore it wants to charge the CPU time consumed by that work to
the VM's container.

However, because of these threads, cgroups which have kvm instances
inside never complete freezing.  This can be trivially reproduced:

  root@test ~# mkdir /sys/fs/cgroup/test
  root@test ~# echo $$ > /sys/fs/cgroup/test/cgroup.procs
  root@test ~# qemu-system-x86_64 -nographic -enable-kvm

and in another terminal:

  root@test ~# echo 1 > /sys/fs/cgroup/test/cgroup.freeze
  root@test ~# cat /sys/fs/cgroup/test/cgroup.events
  populated 1
  frozen 0

The cgroup freezing happens in the signal delivery path but
kvm_nx_huge_page_recovery_worker, while joining non-root cgroups, never
calls into the signal delivery path and thus never gets frozen. Because
the cgroup freezer determines whether a given cgroup is frozen by
comparing the number of frozen threads to the total number of threads
in the cgroup, the cgroup never becomes frozen and users waiting for
the state transition may hang indefinitely.
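
As a rough illustration of the counting rule described above, here is a simplified sketch. It is not the kernel's freezer code, and the struct, field, and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a cgroup's freezer bookkeeping. */
struct cgroup_model {
	int nr_tasks;        /* total threads in the cgroup */
	int nr_frozen_tasks; /* threads that entered the signal path and froze */
};

/* The cgroup counts as frozen only once every thread has frozen. */
static bool cgroup_model_is_frozen(const struct cgroup_model *cg)
{
	return cg->nr_frozen_tasks == cg->nr_tasks;
}
```

Under this model, a single kthread that never reaches the signal delivery path keeps nr_frozen_tasks below nr_tasks, so the check stays false no matter how long the user waits.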

Since the worker kthread is tied to a user process, it's better if
it behaves similarly to user tasks as much as possible, including
being able to send SIGSTOP and SIGCONT.  In fact, vhost_task is all
that kvm_vm_create_worker_thread() wanted to be and more: not only does it
inherit the userspace process's cgroups, it has other niceties like
being parented properly in the process tree.  Use it instead of the
homegrown alternative.

Incidentally, the new code is also better behaved when you flip recovery
back and forth to disabled and back to enabled.  If your recovery period
is 1 minute, it will run the next recovery after 1 minute independent
of how many times you flipped the parameter.

(Commit message based on emails from Tejun).

Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Luca Boccassi <bluca@debian.org>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Luca Boccassi <bluca@debian.org>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05 14:02:43 +01:00
Fuad Tabba fe6737c5ba ANDROID: KVM: arm64: Introduce gfn_to_memslot_prot()
Returns the memslot and whether it's writable without requiring a
userspace address at the host.

The userspace address isn't needed to get this information.
Future patches, where the userspace address might not be known,
would need access to the memslot and whether it's writeable.

No functional change intended.

Bug: 357781595
Change-Id: I490c6d41b19825e0a8d05e3a5af660ce31a894d4
Signed-off-by: Fuad Tabba <tabba@google.com>
2024-11-14 15:18:52 +00:00
Linus Torvalds d129377639 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix the guest view of the ID registers, making the relevant fields
     writable from userspace (affecting ID_AA64DFR0_EL1 and
     ID_AA64PFR1_EL1)

   - Correctly expose S1PIE to guests, fixing a regression introduced in
     6.12-rc1 with the S1POE support

   - Fix the recycling of stage-2 shadow MMUs by tracking the context
     (are we allowed to block or not) as well as the recycling state

   - Address a couple of issues with the vgic when userspace
     misconfigures the emulation, resulting in various splats. Headaches
     courtesy of our Syzkaller friends

   - Stop wasting space in the HYP idmap, as we are dangerously close to
     the 4kB limit, and this has already exploded in -next

   - Fix another race in vgic_init()

   - Fix a UBSAN error when faking the cache topology with MTE enabled

  RISCV:

   - RISCV: KVM: use raw_spinlock for critical section in imsic

  x86:

   - A bandaid for lack of XCR0 setup in selftests, which causes trouble
     if the compiler is configured to have x86-64-v3 (with AVX) as the
     default ISA. Proper XCR0 setup will come in the next merge window.

   - Fix an issue where KVM would not ignore low bits of the nested CR3
     and potentially leak up to 31 bytes out of the guest memory's
     bounds

   - Fix case in which an out-of-date cached value for the segments
     could be returned by KVM_GET_SREGS.

   - More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL

   - Override MTRR state for KVM confidential guests, making it WB by
     default as is already the case for Hyper-V guests.

  Generic:

   - Remove a couple of unused functions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
  RISCV: KVM: use raw_spinlock for critical section in imsic
  KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
  KVM: selftests: x86: Avoid using SSE/AVX instructions
  KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
  KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
  KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
  KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
  KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
  x86/kvm: Override default caching mode for SEV-SNP and TDX
  KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
  KVM: Remove unused kvm_vcpu_gfn_to_pfn
  KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
  KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
  KVM: arm64: Fix shift-out-of-bounds bug
  KVM: arm64: Shave a few bytes from the EL2 idmap code
  KVM: arm64: Don't eagerly teardown the vgic on init error
  KVM: arm64: Expose S1PIE to guests
  KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
  KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
  KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
  ...
2024-10-21 11:22:04 -07:00
Dr. David Alan Gilbert bc07eea2f3 KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit
1bbc60d0c7 ("KVM: x86/mmu: Remove MMU auditing")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Message-ID: <20241001141354.18009-3-linux@treblig.org>
[Adjust Documentation/virt/kvm/locking.rst. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20 07:05:51 -04:00
Dr. David Alan Gilbert 88a387cf9e KVM: Remove unused kvm_vcpu_gfn_to_pfn
The last use of kvm_vcpu_gfn_to_pfn was removed by commit
b1624f99aa ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Message-ID: <20241001141354.18009-2-linux@treblig.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20 07:04:52 -04:00
Peter Zijlstra cd9626e9eb sched/fair: Fix external p->on_rq users
Sean noted that ever since commit 152e11f6df ("sched/fair: Implement
delayed dequeue") KVM's preemption notifiers have started
mis-classifying preemption vs blocking.

Notably p->on_rq is no longer sufficient to determine if a task is
runnable or blocked -- the aforementioned commit introduces tasks that
remain on the runqueue even though they will not run again, and
should be considered blocked for many cases.

Add the task_is_runnable() helper to classify things and audit all
external users of the p->on_rq state. Also add a few comments.
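
A minimal sketch of the classification such a helper performs, assuming the delayed-dequeue state is tracked by a single flag. The struct below is a hypothetical stand-in, not the kernel's task_struct:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler state of a task. */
struct task_model {
	int on_rq;          /* nonzero while the task sits on a runqueue */
	bool sched_delayed; /* dequeue was delayed: still queued, will not run */
};

/* Being on the runqueue is no longer enough; delayed tasks count as blocked. */
static bool task_model_is_runnable(const struct task_model *p)
{
	return p->on_rq && !p->sched_delayed;
}
```

A caller that previously tested on_rq alone would treat a delayed-dequeue task as preempted; with the extra condition it is classified as blocked.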

Fixes: 152e11f6df ("sched/fair: Implement delayed dequeue")
Reported-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20241010091843.GK33184@noisy.programming.kicks-ass.net
2024-10-14 09:14:35 +02:00
Linus Torvalds 3efc57369a Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm updates from Paolo Bonzini:
 "x86:

   - KVM currently invalidates the entirety of the page tables, not just
     those for the memslot being touched, when a memslot is moved or
     deleted.

     This does not traditionally have particularly noticeable overhead,
     but Intel's TDX will require the guest to re-accept private pages
     if they are dropped from the secure EPT, which is a non starter.

     Actually, the only reason why this is not already being done is a
     bug which was never fully investigated and caused VM instability
     with assigned GeForce GPUs, so allow userspace to opt into the new
     behavior.

   - Advertise AVX10.1 to userspace (effectively prep work for the
     "real" AVX10 functionality that is on the horizon)

   - Rework common MSR handling code to suppress errors on userspace
     accesses to unsupported-but-advertised MSRs

     This will allow removing (almost?) all of KVM's exemptions for
     userspace access to MSRs that shouldn't exist based on the vCPU
     model (the actual cleanup is non-trivial future work)

   - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
     splits the 64-bit value into the legacy ICR and ICR2 storage,
     whereas Intel (APICv) stores the entire 64-bit value at the ICR
     offset

   - Fix a bug where KVM would fail to exit to userspace if one was
     triggered by a fastpath exit handler

   - Add fastpath handling of HLT VM-Exit to expedite re-entering the
     guest when there's already a pending wake event at the time of the
     exit

   - Fix a WARN caused by RSM entering a nested guest from SMM with
     invalid guest state, by forcing the vCPU out of guest mode prior to
     signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
     nested guest)

   - Overhaul the "unprotect and retry" logic to more precisely identify
     cases where retrying is actually helpful, and to harden all retry
     paths against putting the guest into an infinite retry loop

   - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
     rmaps in the shadow MMU

   - Refactor pieces of the shadow MMU related to aging SPTEs in
     preparation for adding multi generation LRU support in KVM

   - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
     enabled, i.e. when the CPU has already flushed the RSB

   - Trace the per-CPU host save area as a VMCB pointer to improve
     readability and cleanup the retrieval of the SEV-ES host save area

   - Remove unnecessary accounting of temporary nested VMCB related
     allocations

   - Set FINAL/PAGE in the page fault error code for EPT violations if
     and only if the GVA is valid. If the GVA is NOT valid, there is no
     guest-side page table walk and so stuffing paging related metadata
     is nonsensical

   - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
     instead of emulating posted interrupt delivery to L2

   - Add a lockdep assertion to detect unsafe accesses of vmcs12
     structures

   - Harden eVMCS loading against an impossible NULL pointer deref
     (really truly should be impossible)

   - Minor SGX fix and a cleanup

   - Misc cleanups

  Generic:

   - Register KVM's cpuhp and syscore callbacks when enabling
     virtualization in hardware, as the sole purpose of said callbacks
     is to disable and re-enable virtualization as needed

   - Enable virtualization when KVM is loaded, not right before the
     first VM is created

     Together with the previous change, this greatly simplifies the logic
     of the callbacks, because their very existence implies
     virtualization is enabled

   - Fix a bug that results in KVM prematurely exiting to userspace for
     coalesced MMIO/PIO in many cases, clean up the related code, and
     add a testcase

   - Fix a bug in kvm_clear_guest() where it would trigger a buffer
     overflow _if_ the gpa+len crosses a page boundary, which thankfully
     is guaranteed to not happen in the current code base. Add WARNs in
     more helpers that read/write guest memory to detect similar bugs

  Selftests:

   - Fix a goof that caused some Hyper-V tests to be skipped when run on
     bare metal, i.e. NOT in a VM

   - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
     guest

   - Explicitly include one-off assets in .gitignore. Past Sean was
     completely wrong about not being able to detect missing .gitignore
     entries

   - Verify userspace single-stepping works when KVM happens to handle a
     VM-Exit in its fastpath

   - Misc cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  Documentation: KVM: fix warning in "make htmldocs"
  s390: Enable KVM_S390_UCONTROL config in debug_defconfig
  selftests: kvm: s390: Add VM run test case
  KVM: SVM: let alternatives handle the cases when RSB filling is required
  KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
  KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
  KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
  KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
  KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
  KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
  KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
  KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
  KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
  KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
  KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
  KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
  KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
  KVM: x86: Update retry protection fields when forcing retry on emulation failure
  KVM: x86: Apply retry protection to "unprotect on failure" path
  KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
  ...
2024-09-28 09:20:14 -07:00
Al Viro cb787f4ac0 [tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\<no_llseek\>/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27 08:18:43 -07:00
Linus Torvalds f8ffbc365f Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 'struct fd' updates from Al Viro:
 "Just the 'struct fd' layout change, with conversion to accessor
  helpers"

* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  add struct fd constructors, get rid of __to_fd()
  struct fd: representation change
  introduce fd_file(), convert all accessors to it.
2024-09-23 09:35:36 -07:00
Paolo Bonzini 7056c4e2a1 Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD
KVM generic changes for 6.12:

 - Fix a bug that results in KVM prematurely exiting to userspace for coalesced
   MMIO/PIO in many cases, clean up the related code, and add a testcase.

 - Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
   the gpa+len crosses a page boundary, which thankfully is guaranteed to not
   happen in the current code base.  Add WARNs in more helpers that read/write
   guest memory to detect similar bugs.
2024-09-17 11:38:22 -04:00
Peter Xu 5731aacd54 KVM: use follow_pfnmap API
Use the new pfnmap API to allow huge MMIO mappings for VMs.  The rest of
the work is already handled on the other side (host_pfn_mapping_level()).

Link: https://lkml.kernel.org/r/20240826204353.2228736-11-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17 01:06:59 -07:00
Sean Christopherson 025dde582b KVM: Harden guest memory APIs against out-of-bounds accesses
When reading or writing a guest page, WARN and bail if offset+len would
result in a read to a different page so that KVM bugs are more likely to
be detected, and so that any such bugs are less likely to escalate to an
out-of-bounds access.  E.g. if userspace isn't using guard pages and the
target page is at the end of a memslot.

Note, KVM already hardens itself in similar APIs, e.g. in the "cached"
variants, it's just the vanilla APIs that are playing with fire.
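
A minimal userspace sketch of the kind of check described above, assuming a
4 KiB page.  The helper name read_page_checked() and the -1 return value are
hypothetical stand-ins for illustration, not KVM's actual code, which WARNs
and bails inside the kernel.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Hypothetical stand-in for a per-page guest read: copy len bytes starting
 * at offset within a single page, rejecting any request that would spill
 * into the following page.  Returns 0 on success, -1 on rejection.
 */
int read_page_checked(const unsigned char *page, void *data,
		      size_t offset, size_t len)
{
	/* Compare against PAGE_SIZE - offset so offset + len cannot wrap. */
	if (offset > PAGE_SIZE || len > PAGE_SIZE - offset)
		return -1;

	memcpy(data, page + offset, len);
	return 0;
}
```

A request that ends exactly at the page boundary is accepted; one byte more
is refused instead of reading whatever happens to follow the page.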

Link: https://lore.kernel.org/r/20240829191413.900740-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09 20:15:34 -07:00
Sean Christopherson ec495f2ab1 KVM: Write the per-page "segment" when clearing (part of) a guest page
Pass "seg" instead of "len" when writing guest memory in kvm_clear_guest(),
as "seg" holds the number of bytes to write for the current page, while
"len" holds the total bytes remaining.

Luckily, all users of kvm_clear_guest() are guaranteed to not cross a page
boundary, and so the bug is unhittable in the current code base.
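
The distinction between "seg" and "len" can be sketched in userspace as
follows.  This is an illustration under assumptions, not the kernel's
implementation: guest memory is modelled as a flat array of 4 KiB pages, and
clear_page_range() and clear_range() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical per-page helper: zero len bytes of one page at offset. */
int clear_page_range(unsigned char *page, size_t offset, size_t len)
{
	if (offset > PAGE_SIZE || len > PAGE_SIZE - offset)
		return -1;
	memset(page + offset, 0, len);
	return 0;
}

/* Zero len bytes starting at byte address addr in a buffer of npages pages. */
int clear_range(unsigned char *mem, size_t npages, size_t addr, size_t len)
{
	size_t pfn = addr / PAGE_SIZE;
	size_t offset = addr % PAGE_SIZE;

	while (len > 0) {
		/* seg: bytes to write in the current page only. */
		size_t seg = PAGE_SIZE - offset;

		if (seg > len)
			seg = len;
		if (pfn >= npages)
			return -1;
		/* Passing len (total remaining) here would overrun the page. */
		if (clear_page_range(mem + pfn * PAGE_SIZE, offset, seg))
			return -1;
		offset = 0;
		len -= seg;
		pfn++;
	}
	return 0;
}
```

With the per-page bounds check in place, passing "len" instead of "seg"
would be rejected by the helper as soon as a request crossed a page
boundary, rather than silently writing past the page.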

Fixes: 2f5414423e ("KVM: remove kvm_clear_guest_page")
Reported-by: zyr_ms@outlook.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219104
Link: https://lore.kernel.org/r/20240829191413.900740-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09 20:15:34 -07:00
Sean Christopherson b67107a251 KVM: Add arch hooks for enabling/disabling virtualization
Add arch hooks that are invoked when KVM enables/disables virtualization.
x86 will use the hooks to register an "emergency disable" callback, which
is essentially an x86-specific shutdown notifier that is used when the
kernel is doing an emergency reboot/shutdown/kexec.

Add comments for the declarations to help arch code understand exactly
when the callbacks are invoked.  Alternatively, the APIs themselves could
communicate most of the same info, but kvm_arch_pre_enable_virtualization()
and kvm_arch_post_disable_virtualization() are a bit cumbersome, and make
it a bit less obvious that they are intended to be implemented as a pair.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-04 11:02:33 -04:00
Sean Christopherson b4886fab6f KVM: Add a module param to allow enabling virtualization when KVM is loaded
Add an on-by-default module param, enable_virt_at_load, to let userspace
force virtualization to be enabled in hardware when KVM is initialized,
i.e. just before /dev/kvm is exposed to userspace.  Enabling virtualization
during KVM initialization allows userspace to avoid the additional latency
when creating/destroying the first/last VM (or more specifically, on the
0=>1 and 1=>0 edges of creation/destruction).

Now that KVM uses the cpuhp framework to do per-CPU enabling, the latency
could be non-trivial as the cpuhp bringup/teardown is serialized across
CPUs, e.g. the latency could be problematic for use cases that need to spin
up VMs quickly.

Prior to commit 10474ae894 ("KVM: Activate Virtualization On Demand"),
KVM _unconditionally_ enabled virtualization during load, i.e. there's no
fundamental reason KVM needs to dynamically toggle virtualization.  These
days, the only known argument for not enabling virtualization is to allow
KVM to be autoloaded without blocking other out-of-tree hypervisors, and
such use cases can simply change the module param, e.g. via command line.

Note, the aforementioned commit also mentioned that enabling SVM (AMD's
virtualization extensions) can result in "using invalid TLB entries".
It's not clear whether the changelog was referring to a KVM bug, a CPU
bug, or something else entirely.  Regardless, leaving virtualization off
by default is not a robust "fix", as any protection provided is lost the
instant userspace creates the first VM.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-04 11:02:33 -04:00
Sean Christopherson 071f24ad28 KVM: Rename arch hooks related to per-CPU virtualization enabling
Rename the per-CPU hooks used to enable virtualization in hardware to
align with the KVM-wide helpers in kvm_main.c, and to better capture that
the callbacks are invoked on every online CPU.

No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240830043600.127750-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-04 11:02:33 -04:00
Sean Christopherson 70c0194337 KVM: Rename symbols related to enabling virtualization hardware
Rename the various functions (and a variable) that enable virtualization
to prepare for upcoming changes, and to clean up artifacts of KVM's
previous behavior, which required manually juggling locks around
kvm_usage_count.

Drop the "nolock" qualifier from per-CPU functions now that there are no
"nolock" implementations of the "all" variants, i.e. now that calling a
non-nolock function from a nolock function isn't confusing (unlike this
sentence).

Drop "all" from the outer helpers as they no longer manually iterate
over all CPUs, and because it might not be obvious what "all" refers to.

In lieu of the above qualifiers, append "_cpu" to the end of the functions
that are per-CPU helpers for the outer APIs.

Opportunistically prepend "kvm" to all functions to help make it clear
that they are KVM helpers, but mostly because there's no reason not to.

Lastly, use "virtualization" instead of "hardware", because while the
functions do enable virtualization in hardware, there are a _lot_ of
things that KVM enables in hardware.

Defer renaming the arch hooks to future patches, purely to reduce the
amount of churn in a single commit.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-04 11:02:33 -04:00
Sean Christopherson 9a798b1337 KVM: Register cpuhp and syscore callbacks when enabling hardware
Register KVM's cpuhp and syscore callbacks when enabling virtualization
in hardware instead of registering the callbacks during initialization,
and let the CPU up/down framework invoke the inner enable/disable
functions.  Registering the callbacks during initialization makes things
more complex than they need to be, as KVM needs to be very careful about
handling races between CPUs being onlined/offlined and hardware
being enabled/disabled.

Intel TDX support will require KVM to enable virtualization during KVM
initialization, i.e. will add another wrinkle to things, at which point
sorting out the potential races with kvm_usage_count would become even
more complex.

Note, using the cpuhp framework has a subtle behavioral change: enabling
will be done serially across all CPUs, whereas KVM currently sends an IPI
to all CPUs in parallel.  While serializing virtualization enabling could
create undesirable latency, the issue is limited to the 0=>1 transition of
VM creation.  And even that can be mitigated, e.g. by letting userspace
force virtualization to be enabled when KVM is initialized.

Cc: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-04 11:02:33 -04:00
Sean Christopherson 44d1745962 KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
on x86 due to a chain of locks and SRCU synchronizations.  Translating the
below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
fairness of r/w semaphores).

    CPU0                     CPU1                     CPU2
1   lock(&kvm->slots_lock);
2                                                     lock(&vcpu->mutex);
3                                                     lock(&kvm->srcu);
4                            lock(cpu_hotplug_lock);
5                            lock(kvm_lock);
6                            lock(&kvm->slots_lock);
7                                                     lock(cpu_hotplug_lock);
8   sync(&kvm->srcu);

Note, there are likely more potential deadlocks in KVM x86, e.g. the same
pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
__kvmclock_cpufreq_notifier():

  cpuhp_cpufreq_online()
  |
  -> cpufreq_online()
     |
     -> cpufreq_gov_performance_limits()
        |
        -> __cpufreq_driver_target()
           |
           -> __target_index()
              |
              -> cpufreq_freq_transition_begin()
                 |
                 -> cpufreq_notify_transition()
                    |
                    -> ... __kvmclock_cpufreq_notifier()

But, actually triggering such deadlocks is beyond rare due to the
combination of dependencies and timings involved.  E.g. the cpufreq
notifier is only used on older CPUs without a constant TSC, mucking with
the NX hugepage mitigation while VMs are running is very uncommon, and
doing so while also onlining/offlining a CPU (necessary to generate
contention on cpu_hotplug_lock) would be even more unusual.

The most robust solution to the general cpu_hotplug_lock issue is likely
to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq
notifier doesn't need to take kvm_lock.  For now, settle for fixing the
most blatant deadlock, as switching to an RCU-protected list is a much more
involved change, but add a comment in locking.rst to call out that care
needs to be taken when holding kvm_lock and walking vm_list.

  ======================================================
  WARNING: possible circular locking dependency detected
  6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S         O
  ------------------------------------------------------
  tee/35048 is trying to acquire lock:
  ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]

  but task is already holding lock:
  ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]

  which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

  -> #3 (kvm_lock){+.+.}-{3:3}:
         __mutex_lock+0x6a/0xb40
         mutex_lock_nested+0x1f/0x30
         kvm_dev_ioctl+0x4fb/0xe50 [kvm]
         __se_sys_ioctl+0x7b/0xd0
         __x64_sys_ioctl+0x21/0x30
         x64_sys_call+0x15d0/0x2e60
         do_syscall_64+0x83/0x160
         entry_SYSCALL_64_after_hwframe+0x76/0x7e

  -> #2 (cpu_hotplug_lock){++++}-{0:0}:
         cpus_read_lock+0x2e/0xb0
         static_key_slow_inc+0x16/0x30
         kvm_lapic_set_base+0x6a/0x1c0 [kvm]
         kvm_set_apic_base+0x8f/0xe0 [kvm]
         kvm_set_msr_common+0x9ae/0xf80 [kvm]
         vmx_set_msr+0xa54/0xbe0 [kvm_intel]
         __kvm_set_msr+0xb6/0x1a0 [kvm]
         kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm]
         kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
         __se_sys_ioctl+0x7b/0xd0
         __x64_sys_ioctl+0x21/0x30
         x64_sys_call+0x15d0/0x2e60
         do_syscall_64+0x83/0x160
         entry_SYSCALL_64_after_hwframe+0x76/0x7e

  -> #1 (&kvm->srcu){.+.+}-{0:0}:
         __synchronize_srcu+0x44/0x1a0
         synchronize_srcu_expedited+0x21/0x30
         kvm_swap_active_memslots+0x110/0x1c0 [kvm]
         kvm_set_memslot+0x360/0x620 [kvm]
         __kvm_set_memory_region+0x27b/0x300 [kvm]
         kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm]
         kvm_vm_ioctl+0x295/0x650 [kvm]
         __se_sys_ioctl+0x7b/0xd0
         __x64_sys_ioctl+0x21/0x30
         x64_sys_call+0x15d0/0x2e60
         do_syscall_64+0x83/0x160
         entry_SYSCALL_64_after_hwframe+0x76/0x7e

  -> #0 (&kvm->slots_lock){+.+.}-{3:3}:
         __lock_acquire+0x15ef/0x2e30
         lock_acquire+0xe0/0x260
         __mutex_lock+0x6a/0xb40
         mutex_lock_nested+0x1f/0x30
         set_nx_huge_pages+0x179/0x1e0 [kvm]
         param_attr_store+0x93/0x100
         module_attr_store+0x22/0x40
         sysfs_kf_write+0x81/0xb0
         kernfs_fop_write_iter+0x133/0x1d0
         vfs_write+0x28d/0x380
         ksys_write+0x70/0xe0
         __x64_sys_write+0x1f/0x30
         x64_sys_call+0x281b/0x2e60
         do_syscall_64+0x83/0x160
         entry_SYSCALL_64_after_hwframe+0x76/0x7e

Cc: Chao Gao <chao.gao@intel.com>
Fixes: 0bf50497f0 ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
Cc: stable@vger.kernel.org
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-04 11:02:33 -04:00
Sean Christopherson e027ba1b83 KVM: Clean up coalesced MMIO ring full check
Fold coalesced_mmio_has_room() into its sole caller, coalesced_mmio_write(),
as it's really just a single line of code, has a goofy return value, and
is unnecessarily brittle.

E.g. if coalesced_mmio_has_room() were to check ring->last directly, or
the caller failed to use READ_ONCE(), KVM would be susceptible to TOCTOU
attacks from userspace.

Opportunistically add a comment explaining why on earth KVM leaves one
entry free, which may not be obvious to readers that aren't familiar with
ring buffers.

No functional change intended.

Reviewed-by: Ilias Stamatis <ilstam@amazon.com>
Cc: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240828181446.652474-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 19:38:33 -07:00
Ilias Stamatis 92f6d41304 KVM: Fix coalesced_mmio_has_room() to avoid premature userspace exit
The following calculation used in coalesced_mmio_has_room() to check
whether the ring buffer is full is wrong and results in premature exits if
the start of the valid entries is in the first half of the ring buffer.

  avail = (ring->first - last - 1) % KVM_COALESCED_MMIO_MAX;
  if (avail == 0)
	  /* full */

Because negative values are handled using two's complement, and KVM
computes the result as an unsigned value, the above will get a false
positive if "first < last" and the ring is half-full.

The above might have worked as expected in python for example:
  >>> (-86) % 170
  84

However it doesn't work the same way in C.

  printf("avail: %d\n", (-86) % 170);
  printf("avail: %u\n", (-86) % 170);
  printf("avail: %u\n", (-86u) % 170u);

Using gcc-11 these print:

  avail: -86
  avail: 4294967210
  avail: 0

For illustration purposes, given a 4-bit integer and a ring size of 0xA
(unsigned), 0xA == 0b1010 == -6, and thus (-6u % 0xA) == 0.

Fix the calculation and allow all but one entry in the buffer to be
used as originally intended.
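
The difference between the two checks can be sketched as below.  This is a
standalone illustration, not the kernel code: RING_MAX reuses the value 170
from the example above, 32-bit unsigned indices are assumed, and the
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_MAX 170u	/* ring size taken from the example above */

/*
 * Buggy form: the unsigned subtraction wraps modulo 2^32 before the
 * remainder is taken, so the result is not a count of free entries.
 */
bool ring_full_buggy(uint32_t first, uint32_t last)
{
	uint32_t avail = (first - last - 1) % RING_MAX;

	return avail == 0;
}

/*
 * Fixed form: the ring is full when advancing last by one entry would
 * make it collide with first, which leaves exactly one entry unused.
 */
bool ring_full_fixed(uint32_t first, uint32_t last)
{
	return (last + 1) % RING_MAX == first;
}
```

With first == 0 and last == 85, the buggy form computes
4294967210 % 170 == 0 and reports a full ring, while the fixed form
correctly reports that there is still room.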

Note, KVM's behavior is self-healing to some extent, as KVM will allow the
entire buffer to be used if ring->first is beyond the halfway point.  In
other words, in the unlikely scenario that a use case benefits from being
able to coalesce more than 86 entries at once, KVM will still provide such
behavior, sometimes.

Note #2, the % operator in C is not the modulo operator but the remainder
operator. Modulo and remainder operators differ with respect to negative
values.  But, the relevant values in KVM are all unsigned, so it's a moot
point in this case anyway.

Note #3, this is almost a pure revert of the buggy commit, plus a
READ_ONCE() to provide additional safety.  The buggy commit justified the
change with "it paves the way for making this function lockless", but it's
not at all clear what was intended, nor is there any evidence that the
buggy code was somehow safer.  (a) the fields in question were already
accessed locklessly, from the perspective that they could be modified by
userspace at any time, and (b) the lock guarding the ring itself was
changed, but never dropped, i.e. whatever lockless scheme (SRCU?) was
planned never landed.

Fixes: 105f8d40a7 ("KVM: Calculate available entries in coalesced mmio ring")
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240718193543.624039-2-ilstam@amazon.com
[sean: rework changelog to clarify behavior, call out weirdness of buggy commit]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-23 09:50:36 -07:00
Sean Christopherson 66155de93b KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
directly emulate instructions for ES/SNP, and instead the guest must
explicitly request emulation.  Unless the guest explicitly requests
emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
SPTE, with the subsequent #NPF being reflected into the guest as a #VC.

But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
because except for ES/SNP, doing so requires setting reserved bits in the
SPTE, i.e. the SPTE can't be readable while also generating a #VC on
writes.  Because KVM never creates MMIO SPTEs and jumps directly to
emulation, the guest never gets a #VC.  And since KVM simply resumes the
guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
into an infinite #NPF loop if the vCPU attempts to write read-only memory.

Disallow read-only memory for all VMs with protected state, i.e. for
upcoming TDX VMs as well as ES/SNP VMs.  For TDX, it's actually possible
to support read-only memory, as TDX uses EPT Violation #VE to reflect the
fault into the guest, e.g. KVM could configure read-only SPTEs with RX
protections and SUPPRESS_VE=0.  But there is no strong use case for
supporting read-only memslots on TDX, e.g. the main historical usage is
to emulate option ROMs, but TDX disallows executing from shared memory.
And if someone comes along with a legitimate, strong use case, the
restriction can always be lifted for TDX.

Don't bother trying to retroactively apply the restriction to SEV-ES
VMs that are created as type KVM_X86_DEFAULT_VM.  Read-only memslots can't
possibly work for SEV-ES, i.e. disallowing such memslots really just
means reporting an error to userspace instead of silently hanging vCPUs.
Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
isn't worth the marginal benefit it would provide userspace.

Fixes: 26c44aa9e0 ("KVM: SEV: define VM types for SEV and SEV-ES")
Fixes: 1dfe571c12 ("KVM: SEV: Add initial SEV-SNP support")
Cc: Peter Gonda <pgonda@google.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240809190319.1710470-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-14 12:28:24 -04:00
Li RongQing c9b35a6f4e KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
When hot-unplugging a device that has many queues, the guest CPU
experiences huge jitter and the unplug is very slow.

It turns out that synchronize_srcu() in irqfd_shutdown() causes the guest
jitter and the unplug latency.  Replace synchronize_srcu() with
synchronize_srcu_expedited() to speed up unplugging and reduce guest OS
jitter; this also speeds up VM reboot.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Message-ID: <20240711121130.38917-1-lirongqing@baidu.com>
[Call it just once in irqfd_resampler_shutdown. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13 12:09:35 -04:00
Al Viro 1da91ea87a introduce fd_file(), convert all accessors to it.
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
	Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
	This commit converts (almost) all of f.file to
fd_file(f).  It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).

	NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commits
that convert from fdget...() to CLASS(...).

[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-08-12 22:00:43 -04:00
Paolo Bonzini 66a644c09f KVM: guest_memfd: abstract how prepared folios are recorded
Right now, large folios are not supported in guest_memfd, and therefore the order
used by kvm_gmem_populate() is always 0.  In this scenario, using the up-to-date
bit to track prepared-ness is nice and easy because we have one bit available
per page.

In the future, however, we might have large pages that are partially populated;
for example, in the case of SEV-SNP, if a large page has both shared and private
areas inside, it is necessary to populate it at a granularity that is smaller
than that of the guest_memfd's backing store.  In that case we will have
to track preparedness at a 4K level, probably as a bitmap.

In preparation for that, do not use explicitly folio_test_uptodate() and
folio_mark_uptodate().  Return the state of the page directly from
__kvm_gmem_get_pfn(), so that it is expected to apply to 2^N pages
with N=*max_order.  The function to mark a range as prepared for now
takes just a folio, but is expected to take also an index and order
(or something like that) when large pages are introduced.

Thanks to Michael Roth for pointing out the issue with large pages.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:15 -04:00
Paolo Bonzini e4ee544792 KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns
This check is currently performed by sev_gmem_post_populate(), but it
applies to all callers of kvm_gmem_populate(): the point of the function
is that the memory is being encrypted and some work has to be done
on all the gfns in order to encrypt them.

Therefore, check the KVM_MEMORY_ATTRIBUTE_PRIVATE attribute prior
to invoking the callback, and stop the operation if a shared page
is encountered.  Because CONFIG_KVM_PRIVATE_MEM in principle does
not require attributes, this makes kvm_gmem_populate() depend on
CONFIG_KVM_GENERIC_PRIVATE_MEM (which does require them).

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:15 -04:00
Paolo Bonzini 4b5f67120a KVM: extend kvm_range_has_memory_attributes() to check subset of attributes
While currently there is no other attribute than KVM_MEMORY_ATTRIBUTE_PRIVATE,
KVM code such as kvm_mem_is_private() is written to expect their existence.
Allow using kvm_range_has_memory_attributes() as a multi-page version of
kvm_mem_is_private(), without it breaking later when more attributes are
introduced.

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:15 -04:00
Paolo Bonzini e300614f10 KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()
Use a guard to simplify early returns, and add two more easy
shortcuts.  If the requested attributes are invalid, the attributes
xarray will never show them as set.  And if testing a single page,
kvm_get_memory_attributes() is more efficient.

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini de80252414 KVM: guest_memfd: move check for already-populated page to common code
Do not allow populating the same page twice with startup data.  In the
case of SEV-SNP, for example, the firmware does not allow it anyway,
since the launch-update operation is only possible on pages that are
still shared in the RMP.

Even if it worked, kvm_gmem_populate()'s callback is meant to have side
effects such as updating launch measurements, and updating the same
page twice is unlikely to have the desired results.

Races between calls to the ioctl are not possible because
kvm_gmem_populate() holds slots_lock and the VM should not be running.
But again, even if this worked on other confidential computing technology,
it doesn't matter to guest_memfd.c whether this is something fishy
such as missing synchronization in userspace, or rather something
intentional.  One of the racers wins, and the page is initialized by
either kvm_gmem_prepare_folio() or kvm_gmem_populate().

Anyway, out of paranoia, adjust sev_gmem_post_populate() anyway to use
the same errno that kvm_gmem_populate() is using.

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini 7239ed7467 KVM: remove kvm_arch_gmem_prepare_needed()
It is enough to return 0 if a guest need not do any preparation.
This is in fact how sev_gmem_prepare() works for non-SNP guests,
and it extends naturally to Intel hosts: the x86 callback for
gmem_prepare is optional and returns 0 if not defined.

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini 6dd761d92f KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm
This is now possible because preparation is done by kvm_gmem_get_pfn()
instead of fallocate().  In practice this is not a limitation, because
even though guest_memfd can be bound to multiple struct kvm, for
hardware implementations of confidential computing only one guest
(identified by an ASID on SEV-SNP, or an HKID on TDX) will be able
to access it.

In the case of intra-host migration (not implemented yet for SEV-SNP,
but we can use SEV-ES as an idea of how it will work), the new struct
kvm inherits the same ASID and preparation need not be repeated.

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini b85524314a KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest
Initializing the contents of the folio on fallocate() is unnecessarily
restrictive.  It means that the page is registered with the firmware and
then it cannot be touched anymore.  In particular, this loses the
possibility of using fallocate() to pre-allocate the page for SEV-SNP
guests, because kvm_arch_gmem_prepare() then fails.

It's only when the guest actually accesses the page (and therefore
kvm_gmem_get_pfn() is called) that the page must be cleared from any
stale host data and registered with the firmware.  The up-to-date flag
is clear if this has to be done (i.e. it is the first access and
kvm_gmem_populate() has not been called).

All in all, there are enough differences between kvm_gmem_get_pfn() and
kvm_gmem_populate(), that it's better to separate the two flows completely.
Extract the bulk of kvm_gmem_get_folio(), which take a folio and end up
setting its up-to-date flag, to a new function kvm_gmem_prepare_folio();
these are now done only by the non-__-prefixed kvm_gmem_get_pfn().
As a bonus, __kvm_gmem_get_pfn() loses its ugly "bool prepare" argument.

One difference is that fallocate(PUNCH_HOLE) can now race with a
page fault.  Potentially this causes a page to be prepared and added to
the filemap even after fallocate(PUNCH_HOLE).  This is harmless, as it can be
fixed by another hole punching operation, and can be avoided by clearing
the private-page attribute prior to invoking fallocate(PUNCH_HOLE).
This way, the page fault will cause an exit to user space.

The previous semantics, where fallocate() could be used to prepare
the pages in advance of running the guest, can be accessed with
KVM_PRE_FAULT_MEMORY.

For now, accessing a page in one VM will attempt to call
kvm_arch_gmem_prepare() in all of those that have bound the guest_memfd.
Cleaning this up is left to a separate patch.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini 78c4293372 KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn
Allow testing the up-to-date flag in the caller without taking the
lock again.

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini 564429a6bd KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*
Add "ARCH" to the symbols; shortly, the "prepare" phase will include both
the arch-independent step to clear out contents left in the page by the
host, and the arch-dependent step enabled by CONFIG_HAVE_KVM_GMEM_PREPARE.
For consistency do the same for CONFIG_HAVE_KVM_GMEM_INVALIDATE as well.

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini 7fbdda31b0 KVM: guest_memfd: do not go through struct page
We have a perfectly usable folio, use it to retrieve the pfn and order.
All that's needed is a version of folio_file_page that returns a pfn.

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini d04c77d231 KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation
The up-to-date flag as it is now is not too useful; it tells guest_memfd not
to overwrite the contents of a folio, but it doesn't say that the page
is ready to be mapped into the guest.  For encrypted guests, mapping
a private page requires that the "preparation" phase has succeeded,
and at the same time the same page cannot be prepared twice.

So, ensure that folio_mark_uptodate() is only called on a prepared page.  If
kvm_gmem_prepare_folio() or the post_populate callback fail, the folio
will not be marked up-to-date; it's not a problem to call clear_highpage()
again on such a page prior to the next preparation attempt.
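
The ordering described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: struct model_folio,
model_get_folio() and the two model_prepare_*() callbacks are hypothetical
stand-ins invented for illustration, not the kernel's struct folio or the
guest_memfd code, and memset() stands in for clear_highpage().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a folio; NOT the kernel's struct folio. */
struct model_folio {
	unsigned char data[64];
	bool uptodate;		/* models the up-to-date flag */
	int prepare_calls;	/* counts preparation attempts */
};

/* Hypothetical preparation hook: 0 on success, negative on failure. */
typedef int (*model_prepare_fn)(struct model_folio *folio);

static int model_prepare_fail(struct model_folio *folio)
{
	folio->prepare_calls++;
	return -1;
}

static int model_prepare_ok(struct model_folio *folio)
{
	folio->prepare_calls++;
	return 0;
}

/*
 * Clear, prepare, and only then set the flag.  A failed preparation
 * leaves the flag clear, so the next call clears and prepares again;
 * a folio whose flag is set is never prepared a second time.
 */
static int model_get_folio(struct model_folio *folio, model_prepare_fn prepare)
{
	int ret;

	assert(folio != NULL && prepare != NULL);

	if (folio->uptodate)
		return 0;

	/* Stand-in for clear_highpage(); harmless to repeat after a failure. */
	memset(folio->data, 0, sizeof(folio->data));

	ret = prepare(folio);
	if (ret)
		return ret;

	folio->uptodate = true;
	return 0;
}
```

In this model, calling model_get_folio() with model_prepare_fail leaves
uptodate false, a later call with model_prepare_ok sets it, and any further
call returns early without invoking the hook again.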

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini d0d87226f5 KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()
Right now this is simply more consistent and avoids use of pfn_to_page()
and put_page().  It will be put to more use in upcoming patches, to
ensure that the up-to-date flag is set at the very end of both the
kvm_gmem_get_pfn() and kvm_gmem_populate() flows.

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26 14:46:14 -04:00
Paolo Bonzini 86014c1e20 Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEAD
KVM generic changes for 6.11

 - Enable halt poll shrinking by default, as Intel found it to be a clear win.

 - Set up empty IRQ routing when creating a VM to avoid having to synchronize
   SRCU when creating a split IRQCHIP on x86.

 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
   that arch code can use for hooking both sched_in() and sched_out().

 - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
   truncating a bogus value from userspace, e.g. to help userspace detect bugs.

 - Mark a vCPU as preempted if and only if it's scheduled out while in the
   KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
   memory when retrieving guest state during live migration blackout.

 - A few minor cleanups
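
The vCPU @id item above concerns integer narrowing. The following
user-space sketch shows the general effect, assuming a 64-bit ioctl
argument; MODEL_MAX_VCPU_ID and both helper functions are hypothetical
names chosen for illustration and are not KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit, chosen only for this illustration. */
#define MODEL_MAX_VCPU_ID 1024u

/*
 * Narrowing model: the 64-bit argument is converted to 32 bits before
 * the range check, so the upper bits are silently discarded.
 */
static bool model_id_ok_u32(uint64_t arg)
{
	uint32_t id = (uint32_t)arg;

	return id < MODEL_MAX_VCPU_ID;
}

/* Wide model: the full value is checked, so a bogus ID is rejected. */
static bool model_id_ok_wide(uint64_t arg)
{
	return arg < MODEL_MAX_VCPU_ID;
}

/* Small demonstration of the difference between the two checks. */
void model_vcpu_id_demo(void)
{
	uint64_t bogus = 0x100000001ULL;	/* low 32 bits are 1 */

	assert(model_id_ok_u32(bogus));		/* truncated to 1: accepted */
	assert(!model_id_ok_wide(bogus));	/* full value: rejected */
}
```

With the narrowing check, the bogus value is treated as ID 1 and the
caller never learns that it passed garbage; the wide check reports the
error instead.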
2024-07-16 09:51:36 -04:00
Paolo Bonzini f4501e8bc8 Merge tag 'kvm-x86-fixes-6.10-11' of https://github.com/kvm-x86/linux into HEAD
KVM Xen:

Fix a bug where KVM fails to check the validity of an incoming userspace
virtual address and tries to activate a gfn_to_pfn_cache with a kernel address.
2024-07-16 09:51:14 -04:00
Paolo Bonzini c8b8b8190a Merge tag 'loongarch-kvm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.11

1. Add ParaVirt steal time support.
2. Add some VM migration enhancements.
3. Add perf kvm-stat support for LoongArch.
2024-07-12 11:24:12 -04:00
Paolo Bonzini f3996d4d79 Merge branch 'kvm-prefault' into HEAD
Pre-population has been requested several times to mitigate KVM page faults
during guest boot or after live migration.  It is also required by TDX
before filling in the initial guest memory with measured contents.
Introduce it as a generic API.
2024-07-12 11:18:45 -04:00