Commit Graph

1482 Commits

Author SHA1 Message Date
Kalesh Singh
15b48eb602 Revert "ANDROID: 16K: Use vma_area slab cache for pad VMA"
This reverts aosp/I24c5f5d0eb3b06acf506f18f5eb57cd497b13d6d.

Bug: 440210631
Bug: 432564748
Change-Id: I936ae92313fa32fed80efe1bb35c9b4da0afd8d2
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2025-08-21 15:43:26 -07:00
Greg Kroah-Hartman
1741b1e583 Merge android16-6.12 into android16-6.12-lts
This merges the android16-6.12 branch into the -lts branch, catching
it up with the latest changes in there.

It contains the following commits:

* 21ed84930c UPSTREAM: Revert "usb: xhci: Implement xhci_handshake_check_state() helper"
* 5b3ae3bcbe BACKPORT: usb: xhci: Skip xhci_reset in xhci_resume if xhci is being removed
* 5c72e9faba ANDROID: rust_binder: adjust errors from death notifications
* 9e02edea7f ANDROID: rust_binder: use u64 for death cookie
* 4317f0aeff ANDROID: f2fs: fixup ABI break due to reserved_pin_section
* 25bdb4a624 Revert "ANDROID: ABI: update symbol list for honor"
* a76eb2b67b ANDROID: GKI: Update oplus symbol list
* 6222007a04 ANDROID: mm/readahead: add for bypass high order allocation
* 659d7bb454 ANDROID: ABI: Update symbol list for exynos
* 26937a37f5 ANDROID: MODVERSIONS: hide type definition in drivers/usb/core/driver.c
* 8760b6e4f5 ANDROID: usb: Add vendor hook for usb suspend and resume
* da662aecc8 FROMLIST: KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()
* 4be05c6524 FROMLIST: KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
* d6045efc66 FROMLIST: KVM: arm64: vgic-init: Remove vgic_ready() macro
* f06dd0cd35 ANDROID: rust_binder: release threads before refs
* 5bbd30a60b ANDROID: ABI: Update pixel symbol list
* bafbebf2ab ANDROID: GKI: Update symbol list for xiaomi
* b7b130b7cc ANDROID: export folio_deactivate() for GKI purpose.
* 41f730f9c4 ANDROID: GKI: update exynos symbol list
* 766ecae19f UPSTREAM: xhci: dbctty: disable ECHO flag by default
* 8ea40f5243 ANDROID: GKI: Update xiaomi symbol list.
* 5594b4731d ANDROID: vendor_hooks: export tracepoint symbols
* 0d4cc1daff ANDROID: KVM: arm64: Don't update IOMMU under memory pressure
* 672185e575 ANDROID: iommu/iommu: Handle multi-page deferred sg mappings
* 740d42d181 ANDROID: vendor_hooks: Add vendor_hook in futex to fix the OEM scheduling priority bug
* 6eb6f346ac ANDROID: ABI: Update symbol list for mtk
* c302079179 ANDROID: vendor_hooks: Add vendor hook for GenieZone demand paging
* 5c1cddc983 ANDROID: vendor_hooks: Add vendor hook for GenieZone para-virtualization
* d893caf112 ANDROID: ashmem_rust: Add support for retrieving an ashmem area's vmfile
* 0be74214c0 ANDROID: ashmem_rust: Add support for querying the size of an ashmem region
* eb50f663c4 ANDROID: ashmem_rust: Add support for providing an ashmem region's name
* 6bdbae6ea9 ANDROID: ashmem_rust: Add is_ashmem_file()
* 0d890f867e ANDROID: ABI: update symbol list for honor
* 12727f8a4b FROMGIT: f2fs: introduce reserved_pin_section sysfs entry
* 286cd9d628 ANDROID: GKI: Update RTK STB KMI symbol list
* 7b4f7682b5 ANDROID: GKI: Update symbol list for Amlogic
* 862ce4b2c4 ANDROID: KVM: arm64: iommu: Fix power tracking
* 61184996a8 ANDROID: drivers/iommu: Fix return value in iommu_map_sg
* acad0cd51d ANDROID: ABI: update symbol list for galaxy
* 393dbad32c ANDROID: vendor_hook: add condition to call for freezing fail
* b62fe47ba2 ANDROID: fix ashmem_rust return EINVAL bug in ashmem_rust.rs
* a7e1300b95 ANDROID: Revert "cpufreq: Avoid using inconsistent policy->min and policy->max"
* 15d2fe0544 ANDROID: qcom: Update the ABI symbol list
* f6ca783ba2 UPSTREAM: scsi: ufs: qcom: Check gear against max gear in vop freq_to_gear()
* 237708e9d3 ANDROID: GKI: Update symbols list file for honor White list the vm_normal_folio_pmd
* f18e354aa9 ANDROID: mm: export vm_normal_folio_pmd to allow vendors to implement simplified smaps
* c181c478b0 ANDROID: vendor_hooks: add hook to record slab free
* d2e452e197 ANDROID: Build fixups with PROXY_EXEC v18 + !CONFIG_SMP
* 4f9e4406e4 ANDROID: Update proxy-exec logic from v14 to v18
* 3fa8dabe1a ANDROID: GKI: update asr symbols list
* 94310b3f77 ANDROID: Add the dma header to aarch64 allowlist
* 880d6538c5 UPSTREAM: usb: gadget: u_serial: Fix race condition in TTY wakeup
* b115bf2302 ANDROID: ABI: Update symbol list for mtk
* e87018c5f9 FROMGIT: sched/deadline: Fix dl_server runtime calculation formula
* e2bf362ee2 FROMGIT: sched/core: Fix migrate_swap() vs. hotplug
* 06ca12d7d2 ANDROID: GKI: update the ABI symbol list
* 55972ed83a ANDROID: Fixup init_user_ns CRC change
* 4e873ad607 ANDROID: user: Add vendor hook to user for GKI purpose
* a097cd9c30 ANDROID: export find_user() for GKI purpose.
* 85b8233f7e ANDROID: rust_binder: use euid from the task
* 969c904869 ANDROID: ashmem: rename VmAreaNew->VmaNew
* 2ab3e5f283 ANDROID: rust_binder: rename VmAreaNew->VmaNew
* 2ef75ab83a ANDROID: rust_binder: use tgid_nr_ns for getting pid
* 6a2be11026 UPSTREAM: task: rust: rework how current is accessed
* 602e2300de UPSTREAM: rust: add PidNamespace
* 12dfc1d9cb UPSTREAM: rust: miscdevice: add mmap support
* 8e67cb756f UPSTREAM: mm: rust: add VmaNew for f_ops->mmap()
* bd140ddf75 UPSTREAM: mm: rust: add mmput_async support
* 0c50773076 UPSTREAM: mm: rust: add lock_vma_under_rcu
* 0b5465bb31 UPSTREAM: mm: rust: add vm_insert_page
* d7f52612c5 UPSTREAM: mm: rust: add vm_area_struct methods that require read access
* f03d4f7490 UPSTREAM: mm: rust: add abstraction for struct mm_struct
* 2ef6dbc73e BACKPORT: rust: miscdevice: change how f_ops vtable is constructed
* 1acd3b312f Revert "FROMLIST: mm: rust: add abstraction for struct mm_struct"
* a012c15566 Revert "FROMLIST: mm: rust: add vm_area_struct methods that require read access"
* 3be00a9bf8 Revert "FROMLIST: mm: rust: add vm_insert_page"
* 3aed88205e Revert "FROMLIST: mm: rust: add lock_vma_under_rcu"
* a121b6e72f Revert "FROMLIST: mm: rust: add mmput_async support"
* 9248564a81 Revert "FROMLIST: mm: rust: add VmAreaNew for f_ops->mmap()"
* 6de3ace5b5 Revert "FROMLIST: rust: miscdevice: add mmap support"
* b7f54dd23b Revert "BACKPORT: FROMLIST: task: rust: rework how current is accessed"
* 5913c80b22 ANDROID: iommu/arm-smmu-v3-kvm: Fix idmap free_leaf
* c40c54e669 UPSTREAM: erofs: impersonate the opener's credentials when accessing backing file
* 4d0200d0a9 BACKPORT: erofs: add 'fsoffset' mount option to specify filesystem offset
* 399deda7b5 ANDROID: scsi: ufs: add UFSHCD_ANDROID_QUIRK_NO_IS_READ_ON_H8
* f6b1ab83f6 ANDROID: rust_binder: remove binder_logs/procs/pid immediately
* dd35623c83 ANDROID: ABI: update symbol list for mtktv
* 58beebb30f FROMLIST: fuse: give wakeup hints to the scheduler
* 0f917e4066 ANDROID: virt: gunyah: Replace arm_smccc_1_1_smc with arm_smccc_1_1_invoke
* 33429dd323 UPSTREAM: posix-cpu-timers: fix race between handle_posix_cpu_timers() and posix_cpu_timer_del()
* 6483832947 ANDROID: GKI: Update symbol list file for xiaomi
* 668635cd34 UPSTREAM: usb: gadget: uvc: dont call usb_composite_setup_continue when not streaming

Change-Id: I64074144d1a6da9fdd3b4dd5f8314ccea4f9d9e8
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2025-07-13 12:17:44 +00:00
John Stultz
4f9e4406e4 ANDROID: Update proxy-exec logic from v14 to v18
This updates the proxy-exec logic in android16-6.12
which was added at v14, to be synced with the v18
series of the patchset.

v14 series:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v14-6.12

v18 series:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v18-6.12

Changes since v14:
* Improved naming consistency and used the guard macro where
  appropriate
* Improved comments
* Build fixes for !CONFIG_SMP
* Fixes for when sched_proxy_exec() is disabled
* Renamed update_curr_se to update_se_times, as suggested by
  Steven Rostedt.
* Use put_prev_set_next_task as suggested by K Prateek Nayak
* Tried to rework find_proxy_task() locking to use guard and
  proxy_deactivate_task() in the way Peter suggested.
* Simplified changes to enqueue_task_rt to match deadline's
  logic, as pointed out by Peter
* Got rid of the preserve_need_resched flag and reworked per
  Peter's suggestion
* Reworked find_proxy_task() to use guard to clean up the exit
  gotos, as Peter suggested.
* Reworked the forced return-migration from find_proxy_task to
  use Peter’s dequeue+wakeup approach, which helps resolve the
  cpuhotplug issues I had also seen, caused by the manual return
  migration sending tasks to offline cpus.
* A number of improvements to the commit messages and comments
  suggested by Juri Lelli and Peter Zijlstra
* Added missing logic to put_prev_task_dl as pointed out by
  K Prateek Nayak
* Add lockdep_assert_held_once and drop the READ_ONCE in
  __get_task_blocked_on(), as suggested by Juri Lelli
* Moved update_curr_task logic into update_curr_se to simplify
  things
* Renamed update_se_times to update_se, as suggested by Peter
* Reworked logic to fix an issue Peter pointed out with thread
  group accounting being done on the donor, rather than the
  running execution context.
* Fixed typos caught by Metin Kaya
* Suleiman Souhlal noticed an inefficiency: we evaluated whether
  the lock owner’s task_cpu() is the current cpu before checking
  whether the lock owner is on_rq at all. With v17 this could
  result in us proxy-migrating a donor to a remote cpu, only to
  then realize the owner wasn’t even on the runqueue and fall
  back to sleeping owner enqueuing. Suleiman suggested instead
  that we evaluate on_rq first, so we can immediately do
  sleeping owner enqueuing. Only if the owner is on a runqueue
  do we then proxy-migrate the donor (which requires the more
  costly lock juggling). While not a huge logical change, it
  did uncover other problems, which needed to be resolved.
* One issue found was a race: if do_activate_blocked_waiter()
  from the sleeping owner wakeup was delayed and the task had
  already been woken up elsewhere, that task might be running
  and call into schedule() to block. It would then be dequeued
  from the runqueue, but before we switched to the new task,
  do_activate_blocked_waiter() might try to activate it on a
  different cpu. Clearly do_activate_blocked_waiter() needed
  to check the task’s on_cpu value as well.
* I found that we can still hit wakeups that end up skipping the
  BO_WAKING -> BO_RUNNABLE transition (causing find_proxy_task()
  to end up spinning, waiting for that transition), so I re-added
  the logic to handle doing return migrations from
  find_proxy_task() if we hit that case.
* Hupu suggested a tweak in ttwu_runnable() to evaluate
  proxy_needs_return() slightly earlier.
* Kuyo Chang reported and isolated a fix for a problem with
  __task_is_pushable() in the !sched_proxy_exec case, which was
  folded into the “sched: Fix rt/dl load balancing via chain
  level balance” patch
* Reworked some of the logic around releasing the rq->donor
  reference on migrations, using rq->idle directly.
* Suleiman also pointed out that some newly added task_struct
  fields were not being initialized in the init_task code path,
  so that has been fixed.

Bug: 427820735
Change-Id: I20ce778e474124a917dbf51378dc1301535ac858
Signed-off-by: John Stultz <jstultz@google.com>
2025-07-07 12:27:42 -07:00
Greg Kroah-Hartman
69f799168c Merge 6.12.31 into android16-6.12-lts
GKI (arm64) relevant 137 out of 624 changes, affecting 192 files +1647/-1035
  a4f865ecdb nvmem: core: fix bit offsets of more than one byte [1 file, +17/-7]
  4327479e55 nvmem: core: verify cell's raw_len [1 file, +12/-0]
  410f8b72e0 nvmem: core: update raw_len if the bit reading is required [1 file, +3/-1]
  7aea1517fb scsi: ufs: Introduce quirk to extend PA_HIBERN8TIME for UFS devices [2 files, +35/-0]
  b730cb1096 virtio_ring: Fix data race by tagging event_triggered as racy for KCSAN [1 file, +1/-1]
  2998813177 dma/mapping.c: dev_dbg support for dma_addressing_limited [1 file, +10/-1]
  3eec42a17a dma-mapping: avoid potential unused data compilation warning [1 file, +8/-4]
  97edaa0ec6 cgroup: Fix compilation issue due to cgroup_mutex not being exported [1 file, +1/-1]
  f93675793b vhost_task: fix vhost_task_create() documentation [1 file, +1/-1]
  e22034cbee dma-mapping: Fix warning reported for missing prototype [1 file, +8/-8]
  4f5553a08f fs/buffer: split locking for pagecache lookups [1 file, +25/-16]
  e138fc2316 fs/buffer: introduce sleeping flavors for pagecache lookups [2 files, +17/-0]
  a49a4a87ce fs/buffer: use sleeping version of __find_get_block() [1 file, +9/-2]
  f1c5aa614b fs/jbd2: use sleeping version of __find_get_block() [1 file, +9/-6]
  9ece099e95 fs/ext4: use sleeping version of sb_find_get_block() [1 file, +2/-1]
  64f505b08e block: fix race between set_blocksize and read paths [4 files, +43/-1]
  218c838d03 io_uring: don't duplicate flushing in io_req_post_cqe [1 file, +8/-3]
  8014d3e56e bpf: fix possible endless loop in BPF map iteration [1 file, +1/-1]
  d40ca27602 fuse: Return EPERM rather than ENOSYS from link() [1 file, +2/-0]
  bab0bd1389 exfat: call bh_read in get_block only when necessary [1 file, +77/-82]
  01677e7ee1 io_uring/msg: initialise msg request opcode [1 file, +1/-0]
  e506751b7d arm64: Add support for HIP09 Spectre-BHB mitigation [2 files, +3/-0]
  4f427ca9ed tracing: Mark binary printing functions with __printf() attribute [4 files, +18/-21]
  15787ab82a mailbox: use error ret code of of_parse_phandle_with_args() [1 file, +4/-3]
  f48ee562c0 Bluetooth: Disable SCO support if READ_VOICE_SETTING is unsupported/broken [1 file, +3/-0]
  44b79041c4 dql: Fix dql->limit value when reset. [1 file, +1/-1]
  ac30595154 lockdep: Fix wait context check on softirq for PREEMPT_RT [1 file, +18/-0]
  e63b634806 PCI: dwc: ep: Ensure proper iteration over outbound map windows [1 file, +1/-1]
  37ac2434aa ext4: on a remount, only log the ro or r/w state when it has changed [1 file, +4/-3]
  1d1e1efad1 libnvdimm/labels: Fix divide error in nd_label_data_init() [1 file, +2/-1]
  123bcd8f42 pidfs: improve multi-threaded exec and premature thread-group leader exit polling [3 files, +9/-9]
  8f82cf305e cgroup/rstat: avoid disabling irqs for O(num_cpu) [1 file, +5/-7]
  a5a507fa5f blk-cgroup: improve policy registration error handling [1 file, +12/-10]
  94c3cbc69a ext4: reorder capability check last [1 file, +2/-2]
  e658f2d94a bpf: Return prog btf_id without capable check [1 file, +2/-2]
  e2520cc19b PCI: dwc: Use resource start as ioremap() input in dw_pcie_pme_turn_off() [1 file, +1/-1]
  50452704ec jbd2: do not try to recover wiped journal [1 file, +6/-5]
  dab35f4921 tcp: reorganize tcp_in_ack_event() and tcp_count_delivered() [1 file, +32/-24]
  555c0b713c bpf: Allow pre-ordering for bpf cgroup progs [5 files, +30/-9]
  572ed3fb99 kconfig: do not clear SYMBOL_VALID when reading include/config/auto.conf [1 file, +12/-7]
  174dedce64 dm: restrict dm device size to 2^63-512 bytes [1 file, +4/-0]
  2f5f326214 ext4: reject the 'data_err=abort' option in nojournal mode [1 file, +12/-0]
  d0dc233fe2 posix-timers: Add cond_resched() to posix_timer_add() search loop [1 file, +1/-0]
  ae22452d15 posix-timers: Ensure that timer initialization is fully visible [1 file, +14/-7]
  3fb9ee05ec timer_list: Don't use %pK through printk() [1 file, +2/-2]
  21153e0974 netfilter: conntrack: Bound nf_conntrack sysctl writes [1 file, +9/-3]
  236a87e9d2 PNP: Expand length of fixup id string [1 file, +1/-1]
  6215143ad3 arm64/mm: Check pmd_table() in pmd_trans_huge() [1 file, +12/-12]
  8ad58a7eba arm64/mm: Check PUD_TYPE_TABLE in pud_bad() [1 file, +2/-1]
  28306c58da mmc: sdhci: Disable SD card clock before changing parameters [1 file, +7/-2]
  3a75fe58a1 usb: xhci: Don't change the status of stalled TDs on failed Stop EP [1 file, +11/-1]
  101a3b9920 printk: Check CON_SUSPEND when unblanking a console [1 file, +12/-2]
  faba68a86a wifi: cfg80211: allow IR in 20 MHz configurations [5 files, +46/-25]
  c1502fc84d ipv6: save dontfrag in cork [2 files, +6/-4]
  75ae2a3553 badblocks: Fix a nonsense WARN_ON() which checks whether a u64 variable < 0 [1 file, +3/-2]
  7caad075ac crypto: lzo - Fix compression buffer overrun [6 files, +106/-28]
  73d01bcbf2 tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() [1 file, +26/-11]
  1c17190880 usb: xhci: set page size to the xHCI-supported size [2 files, +22/-20]
  93f581d763 drm/gem: Test for imported GEM buffers with helper [2 files, +16/-2]
  c4525b513d net: phylink: use pl->link_interface in phylink_expects_phy() [1 file, +1/-1]
  f29c876d72 perf/core: Clean up perf_try_init_event() [1 file, +38/-27]
  af73c8fd73 ublk: enforce ublks_max only for unprivileged devices [1 file, +27/-15]
  592ba27580 perf/hw_breakpoint: Return EOPNOTSUPP for unsupported breakpoint type [1 file, +3/-2]
  3de322a98b scsi: logging: Fix scsi_logging_level bounds [1 file, +3/-1]
  f33b310eac ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config(). [2 files, +16/-24]
  564f03a797 block: mark bounce buffering as incompatible with integrity [2 files, +5/-2]
  82209faa87 ublk: complete command synchronously on error [1 file, +6/-5]
  b98aad5e5e media: uvcvideo: Add sanity check to uvc_ioctl_xu_ctrl_map [1 file, +6/-0]
  2d6231d5ce media: uvcvideo: Handle uvc menu translation inside uvc_get_le_value [1 file, +32/-45]
  e359d62886 perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters [1 file, +2/-2]
  673dde8d3c bpf: Search and add kfuncs in struct_ops prologue and epilogue [1 file, +24/-1]
  083383aba0 cpuidle: menu: Avoid discarding useful information [1 file, +12/-1]
  20a53c3689 loop: check in LO_FLAGS_DIRECT_IO in loop_default_blocksize [1 file, +1/-1]
  b55a97d1bd dm: fix unconditional IO throttle caused by REQ_PREFLUSH [1 file, +6/-2]
  9f27b38771 crypto: ahash - Set default reqsize from ahash_alg [2 files, +7/-0]
  897c98fb32 crypto: skcipher - Zap type in crypto_alloc_sync_skcipher [1 file, +1/-0]
  4d9fa2ebc0 net: ipv6: Init tunnel link-netns before registering dev [4 files, +9/-7]
  53f42776e4 genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie [2 files, +25/-36]
  2b129e89b8 bpf: don't do clean_live_states when state->loop_entry->branches > 0 [1 file, +4/-0]
  46ba5757a7 bpf: copy_verifier_state() should copy 'loop_entry' field [1 file, +3/-0]
  82b54455b6 PCI: Fix old_size lower bound in calculate_iosize() too [1 file, +2/-4]
  dc5f5c9d2b hrtimers: Replace hrtimer_clock_to_base_table with switch-case [1 file, +12/-17]
  000dd6e344 ASoC: ops: Enforce platform maximum on initial value [1 file, +28/-1]
  c4260bf83b ASoC: soc-dai: check return value at snd_soc_dai_set_tdm_slot() [1 file, +5/-3]
  5b1b4cb46d pinctrl: devicetree: do not goto err when probing hogs in pinctrl_dt_to_map [1 file, +8/-2]
  69689d1138 media: v4l: Memset argument to 0 before calling get_mbus_config pad op [2 files, +5/-1]
  e6e31b0182 sched: Reduce the default slice to avoid tasks getting an extra tick [1 file, +3/-3]
  ef31dc41cf phy: core: don't require set_mode() callback for phy_get_mode() to work [1 file, +4/-3]
  06daedb443 xfrm: prevent high SEQ input in non-ESN mode [1 file, +12/-0]
  9f2911868a ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure(). [2 files, +4/-4]
  7fea5a9140 r8152: add vendor/device ID pair for Dell Alienware AW1022z [2 files, +2/-0]
  16ddd67bb5 pstore: Change kmsg_bytes storage size to u32 [3 files, +9/-8]
  73733c2fdb ext4: don't write back data before punch hole in nojournal mode [1 file, +5/-13]
  1d15319323 f2fs: introduce f2fs_base_attr for global sysfs entries [1 file, +52/-22]
  ded26f9e4c ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only [1 file, +10/-6]
  76e56dbe50 net: flush_backlog() small changes [1 file, +8/-4]
  58cdd1ee65 bridge: mdb: Allow replace of a host-joined group [2 files, +2/-2]
  fcabb69674 rcu: handle unstable rdp in rcu_read_unlock_strict() [2 files, +11/-2]
  d402437cde rcu: fix header guard for rcu_all_qs() [1 file, +1/-1]
  887e39ac47 perf: Avoid the read if the count is already updated [3 files, +24/-18]
  c80b2d159c bpf: Use kallsyms to find the function name of a struct_ops's stub function [1 file, +44/-54]
  46f1c2b508 firmware: arm_scmi: Relax duplicate name constraint across protocol ids [1 file, +6/-13]
  1351052877 drm/atomic: clarify the rules around drm_atomic_state->allow_modeset [1 file, +21/-2]
  9fddd1f154 drm: Add valid clones check [1 file, +28/-0]
  ff214b079d nvme-pci: add quirks for device 126f:1001 [1 file, +3/-0]
  6d196cae4b nvme-pci: add quirks for WDC Blue SN550 15b7:5009 [1 file, +3/-0]
  6a09b6bad0 ALSA: usb-audio: Fix duplicated name in MIDI substream names [1 file, +12/-4]
  ad3e83a6c8 io_uring/fdinfo: annotate racy sq/cq head/tail reads [1 file, +2/-2]
  7f7c8c03fe btrfs: correct the order of prelim_ref arguments in btrfs__prelim_ref [1 file, +1/-1]
  8cafd7266f __legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock [1 file, +1/-5]
  28756f22de espintcp: fix skb leaks [3 files, +9/-3]
  9cbca30102 espintcp: remove encap socket caching to avoid reference leak [4 files, +8/-94]
  b1a687eb15 xfrm: Fix UDP GRO handling for some corner cases [2 files, +20/-16]
  447c8f0c06 kernel/fork: only call untrack_pfn_clear() on VMAs duplicated for fork() [1 file, +5/-4]
  252f78a931 xfrm: Sanitize marks before insert [2 files, +6/-0]
  7207effe47 driver core: Split devres APIs to device/devres.h [2 files, +125/-118]
  1e8b7e96f7 Bluetooth: L2CAP: Fix not checking l2cap_chan security level [1 file, +8/-7]
  cd7f022296 loop: don't require ->write_iter for writable files in loop_configure [1 file, +0/-3]
  873ebaf3c1 io_uring: fix overflow resched cqe reordering [1 file, +1/-0]
  689a205cd9 net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done [1 file, +5/-0]
  adb05149a9 can: slcan: allow reception of short error messages [1 file, +20/-6]
  cc55dd28c2 can: bcm: add locking for bcm_op runtime updates [1 file, +45/-21]
  63567ecd99 can: bcm: add missing rcu read protection for procfs content [1 file, +9/-4]
  bf85e49aaf ALSA: pcm: Fix race of buffer access at PCM OSS layer [3 files, +14/-2]
  e78908caf1 pmdomain: core: Fix error checking in genpd_dev_pm_attach_by_id() [1 file, +1/-1]
  dc9bdfb9b0 drm/edid: fixed the bug that hdr metadata was not reset [1 file, +1/-0]
  cb9a1019a6 Input: xpad - add more controllers [1 file, +3/-0]
  9b8263cae6 highmem: add folio_test_partial_kmap() [2 files, +12/-5]
  314bf771cb memcg: always call cond_resched() after fn() [1 file, +2/-4]
  9da33ce114 mm/page_alloc.c: avoid infinite retries caused by cpuset race [1 file, +8/-0]
  9f9517f156 mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled [1 file, +2/-0]
  94efb0d656 mm: vmalloc: actually use the in-place vrealloc region [1 file, +1/-0]
  483ac74183 mm: vmalloc: only zero-init on vrealloc shrink [1 file, +7/-5]
  1d45e0170c spi: use container_of_cont() for to_spi_device() [1 file, +1/-4]
  d28b0305f7 err.h: move IOMEM_ERR_PTR() to err.h [2 files, +3/-2]
  80eb73778d bpf: abort verification if env->cur_state->loop_entry != NULL [1 file, +4/-2]
  85fb1edd05 drm/gem: Internally test import_attach for imported objects [1 file, +1/-2]

Changes in 6.12.31
	drm/amd/display: Configure DTBCLK_P with OPTC only for dcn401
	drm/amd/display: Do not enable replay when vtotal update is pending.
	drm/amd/display: Correct timing_adjust_pending flag setting.
	drm/amd/display: Defer BW-optimization-blocked DRR adjustments
	i2c: designware: Use temporary variable for struct device
	i2c: designware: Fix an error handling path in i2c_dw_pci_probe()
	phy: renesas: rcar-gen3-usb2: Move IRQ request in probe
	phy: renesas: rcar-gen3-usb2: Lock around hardware registers and driver data
	phy: renesas: rcar-gen3-usb2: Assert PLL reset on PHY power off
	cpufreq: Add SM8650 to cpufreq-dt-platdev blocklist
	nvmem: rockchip-otp: Move read-offset into variant-data
	nvmem: rockchip-otp: add rk3576 variant data
	nvmem: core: fix bit offsets of more than one byte
	nvmem: core: verify cell's raw_len
	nvmem: core: update raw_len if the bit reading is required
	nvmem: qfprom: switch to 4-byte aligned reads
	scsi: target: iscsi: Fix timeout on deleted connection
	scsi: ufs: Introduce quirk to extend PA_HIBERN8TIME for UFS devices
	virtio_ring: Fix data race by tagging event_triggered as racy for KCSAN
	dma/mapping.c: dev_dbg support for dma_addressing_limited
	intel_th: avoid using deprecated page->mapping, index fields
	mei: vsc: Use struct vsc_tp_packet as vsc-tp tx_buf and rx_buf type
	dma-mapping: avoid potential unused data compilation warning
	cgroup: Fix compilation issue due to cgroup_mutex not being exported
	vhost_task: fix vhost_task_create() documentation
	vhost-scsi: protect vq->log_used with vq->mutex
	scsi: mpi3mr: Add level check to control event logging
	net: enetc: refactor bulk flipping of RX buffers to separate function
	dma-mapping: Fix warning reported for missing prototype
	ima: process_measurement() needlessly takes inode_lock() on MAY_READ
	fs/buffer: split locking for pagecache lookups
	fs/buffer: introduce sleeping flavors for pagecache lookups
	fs/buffer: use sleeping version of __find_get_block()
	fs/ocfs2: use sleeping version of __find_get_block()
	fs/jbd2: use sleeping version of __find_get_block()
	fs/ext4: use sleeping version of sb_find_get_block()
	drm/amd/display: Enable urgent latency adjustment on DCN35
	drm/amdgpu: Allow P2P access through XGMI
	selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure
	block: fix race between set_blocksize and read paths
	io_uring: don't duplicate flushing in io_req_post_cqe
	bpf: fix possible endless loop in BPF map iteration
	samples/bpf: Fix compilation failure for samples/bpf on LoongArch Fedora
	kconfig: merge_config: use an empty file as initfile
	x86/fred: Fix system hang during S4 resume with FRED enabled
	s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log
	cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES
	cifs: Fix querying and creating MF symlinks over SMB1
	cifs: Fix negotiate retry functionality
	smb: client: Store original IO parameters and prevent zero IO sizes
	fuse: Return EPERM rather than ENOSYS from link()
	exfat: call bh_read in get_block only when necessary
	io_uring/msg: initialise msg request opcode
	NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
	NFS: Don't allow waiting for exiting tasks
	SUNRPC: Don't allow waiting for exiting tasks
	arm64: Add support for HIP09 Spectre-BHB mitigation
	iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
	tracing: Mark binary printing functions with __printf() attribute
	ACPI: PNP: Add Intel OC Watchdog IDs to non-PNP device list
	tpm: Convert warn to dbg in tpm2_start_auth_session()
	mailbox: pcc: Use acpi_os_ioremap() instead of ioremap()
	mailbox: use error ret code of of_parse_phandle_with_args()
	riscv: Allow NOMMU kernels to access all of RAM
	fbdev: fsl-diu-fb: add missing device_remove_file()
	fbcon: Use correct erase colour for clearing in fbcon
	fbdev: core: tileblit: Implement missing margin clearing for tileblit
	cifs: Set default Netbios RFC1001 server name to hostname in UNC
	cifs: add validation check for the fields in smb_aces
	cifs: Fix establishing NetBIOS session for SMB2+ connection
	NFSv4: Treat ENETUNREACH errors as fatal for state recovery
	SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
	SUNRPC: rpcbind should never reset the port to the value '0'
	spi-rockchip: Fix register out of bounds access
	ASoC: codecs: wsa884x: Correct VI sense channel mask
	ASoC: codecs: wsa883x: Correct VI sense channel mask
	mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
	net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
	net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
	thermal/drivers/mediatek/lvts: Start sensor interrupts disabled
	thermal/drivers/qoriq: Power down TMU on system suspend
	Bluetooth: btmtksdio: Prevent enabling interrupts after IRQ handler removal
	Bluetooth: Disable SCO support if READ_VOICE_SETTING is unsupported/broken
	dql: Fix dql->limit value when reset.
	lockdep: Fix wait context check on softirq for PREEMPT_RT
	objtool: Properly disable uaccess validation
	PCI: dwc: ep: Ensure proper iteration over outbound map windows
	r8169: disable RTL8126 ZRX-DC timeout
	tools/build: Don't pass test log files to linker
	pNFS/flexfiles: Report ENETDOWN as a connection error
	drm/amdgpu/discovery: check ip_discovery fw file available
	drm/amdkfd: set precise mem ops caps to disabled for gfx 11 and 12
	PCI: vmd: Disable MSI remapping bypass under Xen
	xen/pci: Do not register devices with segments >= 0x10000
	ext4: on a remount, only log the ro or r/w state when it has changed
	libnvdimm/labels: Fix divide error in nd_label_data_init()
	pidfs: improve multi-threaded exec and premature thread-group leader exit polling
	staging: vchiq_arm: Create keep-alive thread during probe
	mmc: host: Wait for Vdd to settle on card power off
	drm/amdgpu: Skip pcie_replay_count sysfs creation for VF
	cgroup/rstat: avoid disabling irqs for O(num_cpu)
	wifi: mt76: only mark tx-status-failed frames as ACKed on mt76x0/2
	wifi: mt76: mt7996: fix SER reset trigger on WED reset
	wifi: mt76: mt7996: revise TXS size
	wifi: mt76: mt7925: load the appropriate CLC data based on hardware type
	wifi: mt76: mt7925: fix fails to enter low power mode in suspend state
	x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
	x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
	x86/smpboot: Fix INIT delay assignment for extended Intel Families
	x86/microcode: Update the Intel processor flag scan check
	x86/mm: Check return value from memblock_phys_alloc_range()
	i2c: qup: Vote for interconnect bandwidth to DRAM
	i2c: pxa: fix call balance of i2c->clk handling routines
	btrfs: make btrfs_discard_workfn() block_group ref explicit
	btrfs: avoid linker error in btrfs_find_create_tree_block()
	btrfs: run btrfs_error_commit_super() early
	btrfs: fix non-empty delayed iputs list on unmount due to async workers
	btrfs: get zone unusable bytes while holding lock at btrfs_reclaim_bgs_work()
	btrfs: send: return -ENAMETOOLONG when attempting a path that is too long
	blk-cgroup: improve policy registration error handling
	drm/amdgpu: release xcp_mgr on exit
	drm/amd/display: Guard against setting dispclk low for dcn31x
	drm/amdgpu: adjust drm_firmware_drivers_only() handling
	i3c: master: svc: Fix missing STOP for master request
	s390/tlb: Use mm_has_pgste() instead of mm_alloc_pgste()
	dlm: make tcp still work in multi-link env
	clocksource/drivers/timer-riscv: Stop stimecmp when cpu hotplug
	um: Store full CSGSFS and SS register from mcontext
	um: Update min_low_pfn to match changes in uml_reserved
	wifi: mwifiex: Fix HT40 bandwidth issue.
	bnxt_en: Query FW parameters when the CAPS_CHANGE bit is set
	riscv: Call secondary mmu notifier when flushing the tlb
	ext4: reorder capability check last
	hypfs_create_cpu_files(): add missing check for hypfs_mkdir() failure
	scsi: st: Tighten the page format heuristics with MODE SELECT
	scsi: st: ERASE does not change tape location
	vfio/pci: Handle INTx IRQ_NOTCONNECTED
	bpf: Return prog btf_id without capable check
	PCI: dwc: Use resource start as ioremap() input in dw_pcie_pme_turn_off()
	jbd2: do not try to recover wiped journal
	tcp: reorganize tcp_in_ack_event() and tcp_count_delivered()
	rtc: rv3032: fix EERD location
	objtool: Fix error handling inconsistencies in check()
	thunderbolt: Do not add non-active NVM if NVM upgrade is disabled for retimer
	erofs: initialize decompression early
	spi: spi-mux: Fix coverity issue, unchecked return value
	ASoC: pcm6240: Drop bogus code handling IRQ as GPIO
	ASoC: mediatek: mt6359: Add stub for mt6359_accdet_enable_jack_detect
	bpf: Allow pre-ordering for bpf cgroup progs
	kbuild: fix argument parsing in scripts/config
	kconfig: do not clear SYMBOL_VALID when reading include/config/auto.conf
	crypto: octeontx2 - suppress auth failure screaming due to negative tests
	dm: restrict dm device size to 2^63-512 bytes
	net/smc: use the correct ndev to find pnetid by pnetid table
	xen: Add support for XenServer 6.1 platform device
	pinctrl-tegra: Restore SFSEL bit when freeing pins
	mfd: tps65219: Remove TPS65219_REG_TI_DEV_ID check
	drm/amdgpu/gfx12: don't read registers in mqd init
	drm/amdgpu/gfx11: don't read registers in mqd init
	drm/amdgpu: Update SRIOV video codec caps
	ASoC: sun4i-codec: support hp-det-gpios property
	clk: qcom: lpassaudiocc-sc7280: Add support for LPASS resets for QCM6490
	ext4: reject the 'data_err=abort' option in nojournal mode
	ext4: do not convert the unwritten extents if data writeback fails
	RDMA/uverbs: Propagate errors from rdma_lookup_get_uobject()
	posix-timers: Add cond_resched() to posix_timer_add() search loop
	posix-timers: Ensure that timer initialization is fully visible
	net: stmmac: dwmac-rk: Validate GRF and peripheral GRF during probe
	net: hsr: Fix PRP duplicate detection
	timer_list: Don't use %pK through printk()
	wifi: rtw89: set force HE TB mode when connecting to 11ax AP
	netfilter: conntrack: Bound nf_conntrack sysctl writes
	PNP: Expand length of fixup id string
	phy: rockchip: usbdp: Only verify link rates/lanes/voltage when the corresponding set flags are set
	arm64/mm: Check pmd_table() in pmd_trans_huge()
	arm64/mm: Check PUD_TYPE_TABLE in pud_bad()
	mmc: dw_mmc: add exynos7870 DW MMC support
	mmc: sdhci: Disable SD card clock before changing parameters
	usb: xhci: Don't change the status of stalled TDs on failed Stop EP
	wifi: iwlwifi: mvm: fix setting the TK when associated
	hwmon: (dell-smm) Increment the number of fans
	iommu: Keep dev->iommu state consistent
	printk: Check CON_SUSPEND when unblanking a console
	wifi: iwlwifi: don't warn when if there is a FW error
	wifi: iwlwifi: w/a FW SMPS mode selection
	wifi: iwlwifi: fix debug actions order
	wifi: iwlwifi: mark Br device not integrated
	wifi: iwlwifi: fix the ECKV UEFI variable name
	wifi: mac80211: fix warning on disconnect during failed ML reconf
	wifi: mac80211_hwsim: Fix MLD address translation
	wifi: cfg80211: allow IR in 20 MHz configurations
	ipv6: save dontfrag in cork
	drm/amd/display: remove minimum Dispclk and apply oem panel timing.
	drm/amd/display: calculate the remain segments for all pipes
	drm/amd/display: not abort link train when bw is low
	drm/amd/display: Fix incorrect DPCD configs while Replay/PSR switch
	gfs2: Check for empty queue in run_queue
	auxdisplay: charlcd: Partially revert "Move hwidth and bwidth to struct hd44780_common"
	ASoC: qcom: sm8250: explicitly set format in sm8250_be_hw_params_fixup()
	badblocks: Fix a nonsense WARN_ON() which checks whether a u64 variable < 0
	coresight-etb10: change etb_drvdata spinlock's type to raw_spinlock_t
	iommu/amd/pgtbl_v2: Improve error handling
	cpufreq: tegra186: Share policy per cluster
	watchdog: aspeed: Update bootstatus handling
	PCI: endpoint: pci-epf-test: Fix double free that causes kernel to oops
	misc: pci_endpoint_test: Give disabled BARs a distinct error code
	crypto: lzo - Fix compression buffer overrun
	crypto: mxs-dcp - Only set OTP_KEY bit for OTP key
	drm/amdkfd: Set per-process flags only once for gfx9/10/11/12
	drm/amdkfd: Set per-process flags only once cik/vi
	drm/amdgpu: Fix missing drain retry fault the last entry
	arm64: tegra: p2597: Fix gpio for vdd-1v8-dis regulator
	arm64: tegra: Resize aperture for the IGX PCIe C5 slot
	powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7
	ALSA: seq: Improve data consistency at polling
	tcp: bring back NUMA dispersion in inet_ehash_locks_alloc()
	rtc: ds1307: stop disabling alarms on probe
	ieee802154: ca8210: Use proper setters and getters for bitwise types
	drm/xe: Nuke VM's mapping upon close
	drm/xe: Retry BO allocation
	soc: samsung: include linux/array_size.h where needed
	ARM: tegra: Switch DSI-B clock parent to PLLD on Tegra114
	media: c8sectpfe: Call of_node_put(i2c_bus) only once in c8sectpfe_probe()
	usb: xhci: set page size to the xHCI-supported size
	dm cache: prevent BUG_ON by blocking retries on failed device resumes
	soc: mediatek: mtk-mutex: Add DPI1 SOF/EOF to MT8188 mutex tables
	orangefs: Do not truncate file size
	drm/gem: Test for imported GEM buffers with helper
	net: phylink: use pl->link_interface in phylink_expects_phy()
	blk-throttle: don't take carryover for prioritized processing of metadata
	remoteproc: qcom_wcnss: Handle platforms with only single power domain
	drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c
	drm/amd/display: Ensure DMCUB idle before reset on DCN31/DCN35
	drm/amd/display: Skip checking FRL_MODE bit for PCON BW determination
	drm/amd/display: Fix DMUB reset sequence for DCN401
	drm/amd/display: Fix p-state type when p-state is unsupported
	drm/amd/display: Request HW cursor on DCN3.2 with SubVP
	perf/core: Clean up perf_try_init_event()
	media: cx231xx: set device_caps for 417
	pinctrl: bcm281xx: Use "unsigned int" instead of bare "unsigned"
	rcu: Fix get_state_synchronize_rcu_full() GP-start detection
	net: ethernet: ti: cpsw_new: populate netdev of_node
	net: phy: nxp-c45-tja11xx: add match_phy_device to TJA1103/TJA1104
	dpll: Add an assertion to check freq_supported_num
	ublk: enforce ublks_max only for unprivileged devices
	iommufd: Disallow allocating nested parent domain with fault ID
	media: imx335: Set vblank immediately
	net: pktgen: fix mpls maximum labels list parsing
	perf/hw_breakpoint: Return EOPNOTSUPP for unsupported breakpoint type
	ALSA: hda/realtek: Enable PC beep passthrough for HP EliteBook 855 G7
	scsi: logging: Fix scsi_logging_level bounds
	ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config().
	drm/rockchip: vop2: Add uv swap for cluster window
	block: mark bounce buffering as incompatible with integrity
	ublk: complete command synchronously on error
	media: uvcvideo: Add sanity check to uvc_ioctl_xu_ctrl_map
	media: uvcvideo: Handle uvc menu translation inside uvc_get_le_value
	clk: imx8mp: inform CCF of maximum frequency of clocks
	x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2
	hwmon: (gpio-fan) Add missing mutex locks
	ARM: at91: pm: fix at91_suspend_finish for ZQ calibration
	drm/mediatek: mtk_dpi: Add checks for reg_h_fre_con existence
	fpga: altera-cvp: Increase credit timeout
	perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters
	soc: apple: rtkit: Use high prio work queue
	soc: apple: rtkit: Implement OSLog buffers properly
	wifi: ath12k: Report proper tx completion status to mac80211
	PCI: brcmstb: Expand inbound window size up to 64GB
	PCI: brcmstb: Add a softdep to MIP MSI-X driver
	firmware: arm_ffa: Set dma_mask for ffa devices
	drm/xe/vf: Retry sending MMIO request to GUC on timeout error
	drm/xe/pf: Create a link between PF and VF devices
	net/mlx5: Avoid report two health errors on same syndrome
	selftests/net: have `gro.sh -t` return a correct exit code
	pinctrl: sophgo: avoid to modify untouched bit when setting cv1800 pinconf
	drm/amdkfd: KFD release_work possible circular locking
	drm/xe: xe_gen_wa_oob: replace program_invocation_short_name
	leds: pwm-multicolor: Add check for fwnode_property_read_u32
	net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only
	net: xgene-v2: remove incorrect ACPI_PTR annotation
	bonding: report duplicate MAC address in all situations
	wifi: ath12k: Improve BSS discovery with hidden SSID in 6 GHz band
	soc: ti: k3-socinfo: Do not use syscon helper to build regmap
	bpf: Search and add kfuncs in struct_ops prologue and epilogue
	Octeontx2-af: RPM: Register driver with PCI subsys IDs
	x86/build: Fix broken copy command in genimage.sh when making isoimage
	drm/amd/display: handle max_downscale_src_width fail check
	drm/amd/display: fix dcn4x init failed
	drm/amd/display: Fix mismatch type comparison
	ASoC: mediatek: mt8188: Treat DMIC_GAINx_CUR as non-volatile
	ASoC: mediatek: mt8188: Add reference for dmic clocks
	x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
	vhost-scsi: Return queue full for page alloc failures during copy
	vdpa/mlx5: Fix mlx5_vdpa_get_config() endianness on big-endian machines
	cpuidle: menu: Avoid discarding useful information
	media: adv7180: Disable test-pattern control on adv7180
	media: tc358746: improve calculation of the D-PHY timing registers
	net/mlx5e: Add correct match to check IPSec syndromes for switchdev mode
	scsi: mpi3mr: Update timestamp only for supervisor IOCs
	loop: check in LO_FLAGS_DIRECT_IO in loop_default_blocksize
	libbpf: Fix out-of-bound read
	dm: fix unconditional IO throttle caused by REQ_PREFLUSH
	scsi: scsi_debug: First fixes for tapes
	net/mlx5: Change POOL_NEXT_SIZE define value and make it global
	x86/kaslr: Reduce KASLR entropy on most x86 systems
	crypto: ahash - Set default reqsize from ahash_alg
	crypto: skcipher - Zap type in crypto_alloc_sync_skcipher
	net: ipv6: Init tunnel link-netns before registering dev
	drm/xe/oa: Ensure that polled read returns latest data
	MIPS: Use arch specific syscall name match function
	drm/amdgpu: remove all KFD fences from the BO on release
	x86/locking: Use ALT_OUTPUT_SP() for percpu_{,try_}cmpxchg{64,128}_op()
	genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie
	MIPS: pm-cps: Use per-CPU variables as per-CPU, not per-core
	clocksource: mips-gic-timer: Enable counter when CPUs start
	PCI: epf-mhi: Update device ID for SA8775P
	scsi: mpt3sas: Send a diag reset if target reset fails
	wifi: rtw88: Fix rtw_init_vht_cap() for RTL8814AU
	wifi: rtw88: Fix rtw_init_ht_cap() for RTL8814AU
	wifi: rtw88: Fix rtw_desc_to_mcsrate() to handle MCS16-31
	wifi: rtw89: fw: propagate error code from rtw89_h2c_tx()
	wifi: rtw89: fw: get sb_sel_ver via get_unaligned_le32()
	wifi: rtw89: fw: add blacklist to avoid obsolete secure firmware
	wifi: rtw89: 8922a: fix incorrect STA-ID in EHT MU PPDU
	net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
	power: supply: axp20x_battery: Update temp sensor for AXP717 from device tree
	EDAC/ie31200: work around false positive build warning
	i3c: master: svc: Flush FIFO before sending Dynamic Address Assignment(DAA)
	mfd: axp20x: AXP717: Add AXP717_TS_PIN_CFG to writeable regs
	eeprom: ee1004: Check chip before probing
	irqchip/riscv-imsic: Separate next and previous pointers in IMSIC vector
	drm/amd/pm: Fetch current power limit from PMFW
	drm/amd/display: Add support for disconnected eDP streams
	drm/amd/display: Guard against setting dispclk low when active
	drm/amd/display: Fix BT2020 YCbCr limited/full range input
	drm/amd/display: Read LTTPR ALPM caps during link cap retrieval
	Revert "drm/amd/display: Request HW cursor on DCN3.2 with SubVP"
	drm/amd/display: Don't treat wb connector as physical in create_validate_stream_for_sink
	serial: mctrl_gpio: split disable_ms into sync and no_sync APIs
	RDMA/core: Fix best page size finding when it can cross SG entries
	pmdomain: imx: gpcv2: use proper helper for property detection
	can: c_can: Use of_property_present() to test existence of DT property
	bpf: don't do clean_live_states when state->loop_entry->branches > 0
	bpf: copy_verifier_state() should copy 'loop_entry' field
	eth: mlx4: don't try to complete XDP frames in netpoll
	PCI: Fix old_size lower bound in calculate_iosize() too
	ACPI: HED: Always initialize before evged
	vxlan: Join / leave MC group after remote changes
	hrtimers: Replace hrtimer_clock_to_base_table with switch-case
	irqchip/riscv-imsic: Set irq_set_affinity() for IMSIC base
	media: test-drivers: vivid: don't call schedule in loop
	net/mlx5: Modify LSB bitmask in temperature event to include only the first bit
	net/mlx5: Apply rate-limiting to high temperature warning
	firmware: arm_ffa: Reject higher major version as incompatible
	firmware: arm_ffa: Handle the presence of host partition in the partition info
	firmware: xilinx: Dont send linux address to get fpga config get status
	ASoC: ops: Enforce platform maximum on initial value
	ASoC: tas2764: Add reg defaults for TAS2764_INT_CLK_CFG
	ASoC: tas2764: Mark SW_RESET as volatile
	ASoC: tas2764: Power up/down amp on mute ops
	ASoC: soc-dai: check return value at snd_soc_dai_set_tdm_slot()
	pinctrl: devicetree: do not goto err when probing hogs in pinctrl_dt_to_map
	smack: recognize ipv4 CIPSO w/o categories
	smack: Revert "smackfs: Added check catlen"
	kunit: tool: Use qboot on QEMU x86_64
	media: i2c: imx219: Correct the minimum vblanking value
	media: v4l: Memset argument to 0 before calling get_mbus_config pad op
	net/mlx4_core: Avoid impossible mlx4_db_alloc() order value
	drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init()
	drm/xe: Fix xe_tile_init_noalloc() error propagation
	clk: qcom: ipq5018: allow it to be bulid on arm32
	clk: qcom: clk-alpha-pll: Do not use random stack value for recalc rate
	drm/xe/debugfs: fixed the return value of wedged_mode_set
	drm/xe/debugfs: Add missing xe_pm_runtime_put in wedge_mode_set
	x86/ibt: Handle FineIBT in handle_cfi_failure()
	x86/traps: Cleanup and robustify decode_bug()
	sched: Reduce the default slice to avoid tasks getting an extra tick
	serial: sh-sci: Update the suspend/resume support
	pinctrl: renesas: rzg2l: Add suspend/resume support for pull up/down
	phy: phy-rockchip-samsung-hdptx: Swap the definitions of LCPLL_REF and ROPLL_REF
	phy: core: don't require set_mode() callback for phy_get_mode() to work
	phy: exynos5-usbdrd: fix EDS distribution tuning (gs101)
	soundwire: amd: change the soundwire wake enable/disable sequence
	soundwire: cadence_master: set frame shape and divider based on actual clk freq
	net: stmmac: dwmac-loongson: Set correct {tx,rx}_fifo_size
	drm/amdgpu/mes11: fix set_hw_resources_1 calculation
	drm/amdkfd: fix missing L2 cache info in topology
	drm/amdgpu: Set snoop bit for SDMA for MI series
	drm/amd/display: pass calculated dram_speed_mts to dml2
	drm/amd/display: Don't try AUX transactions on disconnected link
	drm/amdgpu: reset psp->cmd to NULL after releasing the buffer
	drm/amd/pm: Skip P2S load for SMU v13.0.12
	drm/amd/display: Support multiple options during psr entry.
	Revert "drm/amd/display: Exit idle optimizations before attempt to access PHY"
	drm/amd/display: Update CR AUX RD interval interpretation
	drm/amd/display: Initial psr_version with correct setting
	drm/amd/display: Increase block_sequence array size
	drm/amd/display: Use Nominal vBlank If Provided Instead Of Capping It
	drm/amd/display: Populate register address for dentist for dcn401
	drm/amdgpu: Use active umc info from discovery
	drm/amdgpu: enlarge the VBIOS binary size limit
	drm/amd/display/dm: drop hw_support check in amdgpu_dm_i2c_xfer()
	scsi: target: spc: Fix loop traversal in spc_rsoc_get_descr()
	net/mlx5: XDP, Enable TX side XDP multi-buffer support
	net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB
	net/mlx5e: set the tx_queue_len for pfifo_fast
	net/mlx5e: reduce rep rxq depth to 256 for ECPF
	net/mlx5e: reduce the max log mpwrq sz for ECPF and reps
	drm/v3d: Add clock handling
	xfrm: prevent high SEQ input in non-ESN mode
	wifi: ath12k: fix the ampdu id fetch in the HAL_RX_MPDU_START TLV
	mptcp: pm: userspace: flags: clearer msg if no remote addr
	wifi: iwlwifi: use correct IMR dump variable
	wifi: iwlwifi: don't warn during reprobe
	wifi: mac80211: don't unconditionally call drv_mgd_complete_tx()
	wifi: mac80211: remove misplaced drv_mgd_complete_tx() call
	wifi: mac80211: set ieee80211_prep_tx_info::link_id upon Auth Rx
	net: fec: Refactor MAC reset to function
	powerpc/pseries/iommu: memory notifier incorrectly adds TCEs for pmemory
	powerpc/pseries/iommu: create DDW for devices with DMA mask less than 64-bits
	arch/powerpc/perf: Check the instruction type before creating sample with perf_mem_data_src
	ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure().
	r8152: add vendor/device ID pair for Dell Alienware AW1022z
	iio: adc: ad7944: don't use storagebits for sizing
	pstore: Change kmsg_bytes storage size to u32
	leds: trigger: netdev: Configure LED blink interval for HW offload
	ext4: don't write back data before punch hole in nojournal mode
	ext4: remove writable userspace mappings before truncating page cache
	wifi: rtw88: Fix download_firmware_validate() for RTL8814AU
	wifi: rtw88: Fix __rtw_download_firmware() for RTL8814AU
	wifi: rtw89: coex: Assign value over than 0 to avoid firmware timer hang
	wifi: rtw89: fw: validate multi-firmware header before getting its size
	wifi: rtw89: fw: validate multi-firmware header before accessing
	wifi: rtw89: call power_on ahead before selecting firmware
	clk: qcom: camcc-sm8250: Use clk_rcg2_shared_ops for some RCGs
	net: page_pool: avoid false positive warning if NAPI was never added
	tools/power turbostat: Clustered Uncore MHz counters should honor show/hide options
	hwmon: (xgene-hwmon) use appropriate type for the latency value
	f2fs: introduce f2fs_base_attr for global sysfs entries
	media: qcom: camss: csid: Only add TPG v4l2 ctrl if TPG hardware is available
	media: qcom: camss: Add default case in vfe_src_pad_code
	drm/rockchip: vop2: Improve display modes handling on RK3588 HDMI0
	eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs
	tools: ynl-gen: don't output external constants
	net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled
	cpufreq: amd-pstate: Remove unnecessary driver_lock in set_boost
	vxlan: Annotate FDB data races
	ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only
	r8169: don't scan PHY addresses > 0
	net: flush_backlog() small changes
	bridge: mdb: Allow replace of a host-joined group
	ice: init flow director before RDMA
	ice: treat dyn_allowed only as suggestion
	rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
	rcu: handle unstable rdp in rcu_read_unlock_strict()
	rcu: fix header guard for rcu_all_qs()
	perf: Avoid the read if the count is already updated
	ice: count combined queues using Rx/Tx count
	drm/xe/relay: Don't use GFP_KERNEL for new transactions
	net/mana: fix warning in the writer of client oob
	scsi: lpfc: Handle duplicate D_IDs in ndlp search-by D_ID routine
	scsi: lpfc: Ignore ndlp rport mismatch in dev_loss_tmo callbk
	scsi: lpfc: Free phba irq in lpfc_sli4_enable_msi() when pci_irq_vector() fails
	scsi: st: Restore some drive settings after reset
	wifi: ath12k: Avoid napi_sync() before napi_enable()
	HID: usbkbd: Fix the bit shift number for LED_KANA
	arm64: zynqmp: add clock-output-names property in clock nodes
	ASoC: codecs: pcm3168a: Allow for 24-bit in provider mode
	ASoC: rt722-sdca: Add some missing readable registers
	irqchip/riscv-aplic: Add support for hart indexes
	dm vdo indexer: prevent unterminated string warning
	dm vdo: use a short static string for thread name prefix
	drm/ast: Find VBIOS mode from regular display size
	bpf: Use kallsyms to find the function name of a struct_ops's stub function
	bpftool: Fix readlink usage in get_fd_type
	firmware: arm_scmi: Relax duplicate name constraint across protocol ids
	perf/amd/ibs: Fix perf_ibs_op.cnt_mask for CurCnt
	perf/amd/ibs: Fix ->config to sample period calculation for OP PMU
	clk: renesas: rzg2l-cpg: Refactor Runtime PM clock validation
	wifi: rtl8xxxu: retry firmware download on error
	wifi: rtw88: Don't use static local variable in rtw8822b_set_tx_power_index_by_rate
	wifi: rtw89: add wiphy_lock() to work that isn't held wiphy_lock() yet
	spi: zynqmp-gqspi: Always acknowledge interrupts
	regulator: ad5398: Add device tree support
	wifi: ath12k: fix ath12k_hal_tx_cmd_ext_desc_setup() info1 override
	accel/qaic: Mask out SR-IOV PCI resources
	drm/xe/pf: Reset GuC VF config when unprovisioning critical resource
	wifi: ath9k: return by of_get_mac_address
	wifi: ath12k: Fetch regdb.bin file from board-2.bin
	wifi: ath12k: Fix end offset bit definition in monitor ring descriptor
	drm: bridge: adv7511: fill stream capabilities
	drm/nouveau: fix the broken marco GSP_MSG_MAX_SIZE
	wifi: ath11k: Use dma_alloc_noncoherent for rx_tid buffer allocation
	drm/xe: Move suballocator init to after display init
	drm/xe: Do not attempt to bootstrap VF in execlists mode
	wifi: rtw89: coex: Separated Wi-Fi connecting event from Wi-Fi scan event
	drm/xe/sa: Always call drm_suballoc_manager_fini()
	drm/xe: Reject BO eviction if BO is bound to current VM
	drm/atomic: clarify the rules around drm_atomic_state->allow_modeset
	drm/buddy: fix issue that force_merge cannot free all roots
	drm/panel-edp: Add Starry 116KHD024006
	drm: Add valid clones check
	ASoC: imx-card: Adjust over allocation of memory in imx_card_parse_of()
	book3s64/radix: Fix compile errors when CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP=n
	pinctrl: meson: define the pull up/down resistor value as 60 kOhm
	smb: server: smb2pdu: check return value of xa_store()
	platform/x86/intel: hid: Add Pantherlake support
	platform/x86: asus-wmi: Disable OOBE state after resume from hibernation
	platform/x86: ideapad-laptop: add support for some new buttons
	ASoC: cs42l43: Disable headphone clamps during type detection
	ASoC: Intel: bytcr_rt5640: Add DMI quirk for Acer Aspire SW3-013
	ALSA: hda/realtek: Add quirk for HP Spectre x360 15-df1xxx
	nvme-pci: add quirks for device 126f:1001
	nvme-pci: add quirks for WDC Blue SN550 15b7:5009
	ALSA: usb-audio: Fix duplicated name in MIDI substream names
	nvmet-tcp: don't restore null sk_state_change
	io_uring/fdinfo: annotate racy sq/cq head/tail reads
	cifs: Fix and improve cifs_query_path_info() and cifs_query_file_info()
	cifs: Fix changing times and read-only attr over SMB1 smb_set_file_info() function
	ASoC: intel/sdw_utils: Add volume limit to cs42l43 speakers
	btrfs: compression: adjust cb->compressed_folios allocation type
	btrfs: correct the order of prelim_ref arguments in btrfs__prelim_ref
	btrfs: handle empty eb->folios in num_extent_folios()
	btrfs: avoid NULL pointer dereference if no valid csum tree
	tools: ynl-gen: validate 0 len strings from kernel
	block: only update request sector if needed
	wifi: iwlwifi: add support for Killer on MTL
	x86/Kconfig: make CFI_AUTO_DEFAULT depend on !RUST or Rust >= 1.88
	xenbus: Allow PVH dom0 a non-local xenstore
	drm/amd/display: Call FP Protect Before Mode Programming/Mode Support
	__legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock
	soundwire: bus: Fix race on the creation of the IRQ domain
	espintcp: fix skb leaks
	espintcp: remove encap socket caching to avoid reference leak
	xfrm: Fix UDP GRO handling for some corner cases
	dmaengine: idxd: Fix allowing write() from different address spaces
	x86/sev: Fix operator precedence in GHCB_MSR_VMPL_REQ_LEVEL macro
	kernel/fork: only call untrack_pfn_clear() on VMAs duplicated for fork()
	remoteproc: qcom_wcnss: Fix on platforms without fallback regulators
	clk: sunxi-ng: d1: Add missing divider for MMC mod clocks
	xfrm: Sanitize marks before insert
	dmaengine: idxd: Fix ->poll() return value
	dmaengine: fsl-edma: Fix return code for unhandled interrupts
	driver core: Split devres APIs to device/devres.h
	devres: Introduce devm_kmemdup_array()
	ASoC: SOF: Intel: hda: Fix UAF when reloading module
	irqchip/riscv-imsic: Start local sync timer on correct CPU
	perf/x86/intel: Fix segfault with PEBS-via-PT with sample_freq
	Bluetooth: L2CAP: Fix not checking l2cap_chan security level
	Bluetooth: btusb: use skb_pull to avoid unsafe access in QCA dump handling
	ptp: ocp: Limit signal/freq counts in summary output functions
	bridge: netfilter: Fix forwarding of fragmented packets
	ice: fix vf->num_mac count with port representors
	ice: Fix LACP bonds without SRIOV environment
	idpf: fix null-ptr-deref in idpf_features_check
	loop: don't require ->write_iter for writable files in loop_configure
	pinctrl: qcom: switch to devm_register_sys_off_handler()
	net: dwmac-sun8i: Use parsed internal PHY address instead of 1
	net: lan743x: Restore SGMII CTRL register on resume
	io_uring: fix overflow resched cqe reordering
	idpf: fix idpf_vport_splitq_napi_poll()
	sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
	octeontx2-pf: Add AF_XDP non-zero copy support
	net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done
	octeontx2-af: Set LMT_ENA bit for APR table entries
	octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG
	clk: s2mps11: initialise clk_hw_onecell_data::num before accessing ::hws[] in probe()
	crypto: algif_hash - fix double free in hash_accept
	padata: do not leak refcount in reorder_work
	can: slcan: allow reception of short error messages
	can: bcm: add locking for bcm_op runtime updates
	can: bcm: add missing rcu read protection for procfs content
	ASoC: SOF: ipc4-control: Use SOF_CTRL_CMD_BINARY as numid for bytes_ext
	ASoC: SOF: Intel: hda-bus: Use PIO mode on ACE2+ platforms
	ASoc: SOF: topology: connect DAI to a single DAI link
	ASoC: SOF: ipc4-pcm: Delay reporting is only supported for playback direction
	ALSA: pcm: Fix race of buffer access at PCM OSS layer
	ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14ASP10
	llc: fix data loss when reading from a socket in llc_ui_recvmsg()
	can: kvaser_pciefd: Continue parsing DMA buf after dropped RX
	can: kvaser_pciefd: Fix echo_skb race
	net: dsa: microchip: linearize skb for tail-tagging switches
	vmxnet3: update MTU after device quiesce
	pmdomain: renesas: rcar: Remove obsolete nullify checks
	pmdomain: core: Fix error checking in genpd_dev_pm_attach_by_id()
	platform/x86: dell-wmi-sysman: Avoid buffer overflow in current_password_store()
	thermal: intel: x86_pkg_temp_thermal: Fix bogus trip temperature
	drm/edid: fixed the bug that hdr metadata was not reset
	smb: client: Fix use-after-free in cifs_fill_dirent
	arm64: dts: marvell: uDPU: define pinctrl state for alarm LEDs
	smb: client: Reset all search buffer pointers when releasing buffer
	Revert "drm/amd: Keep display off while going into S4"
	Input: xpad - add more controllers
	highmem: add folio_test_partial_kmap()
	memcg: always call cond_resched() after fn()
	mm/page_alloc.c: avoid infinite retries caused by cpuset race
	mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled
	mm: vmalloc: actually use the in-place vrealloc region
	mm: vmalloc: only zero-init on vrealloc shrink
	nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs()
	Bluetooth: btmtksdio: Check function enabled before doing close
	Bluetooth: btmtksdio: Do close if SDIO card removed without close
	Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection"
	ksmbd: fix stream write failure
	platform/x86: think-lmi: Fix attribute name usage for non-compliant items
	spi: use container_of_cont() for to_spi_device()
	spi: spi-fsl-dspi: restrict register range for regmap access
	spi: spi-fsl-dspi: Halt the module after a new message transfer
	spi: spi-fsl-dspi: Reset SR flags before sending a new message
	err.h: move IOMEM_ERR_PTR() to err.h
	gcc-15: make 'unterminated string initialization' just a warning
	gcc-15: disable '-Wunterminated-string-initialization' entirely for now
	Fix mis-uses of 'cc-option' for warning disablement
	kbuild: Properly disable -Wunterminated-string-initialization for clang
	drm/amd/display: Exit idle optimizations before accessing PHY
	bpf: abort verification if env->cur_state->loop_entry != NULL
	serial: sh-sci: Save and restore more registers
	drm/amdkfd: Correct F8_MODE for gfx950
	watchdog: aspeed: fix 64-bit division
	pinctrl: tegra: Fix off by one in tegra_pinctrl_get_group()
	i3c: master: svc: Fix implicit fallthrough in svc_i3c_master_ibi_work()
	x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers bounce buffers
	drm/gem: Internally test import_attach for imported objects
	Linux 6.12.31

Change-Id: I017795966fb764f9320a6a0df1571d19e5e631fe
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2025-07-03 07:19:01 +00:00
Kalesh Singh
030e00a2d7 ANDROID: 16K: Use vma_area slab cache for pad VMA
Allocate the padding VMA from the vma slab cache; this makes slab leaks
easier to debug than when the VMA comes from the kmalloc slabs.

Bug: 427145188
Change-Id: I24c5f5d0eb3b06acf506f18f5eb57cd497b13d6d
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2025-06-24 16:42:50 -07:00
David Hildenbrand
447c8f0c06 kernel/fork: only call untrack_pfn_clear() on VMAs duplicated for fork()
[ Upstream commit e9f180d7cfde23b9f8eebd60272465176373ab2c ]

Not intuitive, but vm_area_dup() located in kernel/fork.c is not only used
for duplicating VMAs during fork(), but also for duplicating VMAs when
splitting VMAs or when mremap()'ing them.

VM_PFNMAP mappings can at least get ordinarily mremap()'ed (no change in
size) and apparently also shrunk during mremap(), which implies
duplicating the VMA in __split_vma() first.

In case of ordinary mremap() (no change in size), we first duplicate the
VMA in copy_vma_and_data()->copy_vma() to then call untrack_pfn_clear() on
the old VMA: we effectively move the VM_PAT reservation.  So the
untrack_pfn_clear() call on the new VMA duplicating is wrong in that
context.

Splitting of VMAs seems problematic, because we don't duplicate/adjust the
reservation when splitting the VMA.  Instead, in memtype_erase() -- called
during zapping/munmap -- we shrink a reservation in case only the end
address matches: Assume we split a VMA into A and B, both would share a
reservation until B is unmapped.

So when unmapping B, the reservation would be updated to cover only A.
When unmapping A, we would properly remove the now-shrunk reservation.
That scenario describes the mremap() shrinking (old_size > new_size),
where we split + unmap B, and the untrack_pfn_clear() call on the new VMA
is wrong.

What if we manage to split a VM_PFNMAP VMA into A and B and unmap A first?
It would be broken because we would never free the reservation.  Likely,
there are ways to trigger such a VMA split outside of mremap().

Affecting other VMA duplication was not intended; vm_area_dup() being used
outside of kernel/fork.c was an oversight.  So let's fix that for now; how to
handle VMA splits better should be investigated separately.

With a simple reproducer that uses mprotect() to split such a VMA I can
trigger

x86/PAT: pat_mremap:26448 freeing invalid memtype [mem 0x00000000-0x00000fff]

Link: https://lkml.kernel.org/r/20250422144942.2871395-1-david@redhat.com
Fixes: dc84bc2aba85 ("x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29 11:03:14 +02:00
xiaofeng
2a5729e149 ANDROID: vendor_hooks:vendor hook for mmput
Add a vendor hook in mmput() for when mm_users drops to 0.

Bug: 238821038
Change-Id: I42a717cbeeb3176bac14b4b2391fdb2366c972d3
Signed-off-by: xiaofeng <xiaofeng5@xiaomi.com>
2025-04-28 12:00:31 -07:00
Greg Kroah-Hartman
0946c695bb Merge 7d8dfc27d9 ("smb: client: Fix netns refcount imbalance causing leaks and use-after-free") into android16-6.12
Steps on the way to 6.12.23

Change-Id: I071040c57ea134f0a618ecc9e25db4a302dff4a8
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2025-04-24 08:30:10 -07:00
David Hildenbrand
8d6373f83f x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
[ Upstream commit dc84bc2aba85a1508f04a936f9f9a15f64ebfb31 ]

If track_pfn_copy() fails, we have already added the dst VMA to the maple
tree. As fork() fails, we'll clean up the maple tree and stumble over
the dst VMA for which we neither performed any reservation nor copied
any page tables.

Consequently untrack_pfn() will see VM_PAT and try obtaining the
PAT information from the page table -- which fails because the page
table was not copied.

The easiest fix would be to simply clear the VM_PAT flag of the dst VMA
if track_pfn_copy() fails. However, the whole thing about "simply"
clearing the VM_PAT flag is shaky as well: if we passed track_pfn_copy()
and performed a reservation, but copying the page tables fails, we'll
simply clear the VM_PAT flag without properly undoing the reservation,
which is also wrong.

So let's fix it properly: set the VM_PAT flag only if the reservation
succeeded (leaving it clear initially), and undo the reservation if
anything goes wrong while copying the page tables: clearing the VM_PAT
flag after undoing the reservation.

Note that any copied page table entries will get zapped when the VMA gets
removed later, after copy_page_range() succeeded; as VM_PAT is not set
then, we won't try cleaning VM_PAT up once more and untrack_pfn() will be
happy. Note that leaving these page tables in place without a reservation
is not a problem, as we are aborting fork(); this process will never run.

A reproducer can trigger this usually at the first try:

  https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/pat_fork.c

  WARNING: CPU: 26 PID: 11650 at arch/x86/mm/pat/memtype.c:983 get_pat_info+0xf6/0x110
  Modules linked in: ...
  CPU: 26 UID: 0 PID: 11650 Comm: repro3 Not tainted 6.12.0-rc5+ #92
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
  RIP: 0010:get_pat_info+0xf6/0x110
  ...
  Call Trace:
   <TASK>
   ...
   untrack_pfn+0x52/0x110
   unmap_single_vma+0xa6/0xe0
   unmap_vmas+0x105/0x1f0
   exit_mmap+0xf6/0x460
   __mmput+0x4b/0x120
   copy_process+0x1bf6/0x2aa0
   kernel_clone+0xab/0x440
   __do_sys_clone+0x66/0x90
   do_syscall_64+0x95/0x180

Likely this case was missed in:

  d155df53f3 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")

... and instead of undoing the reservation we simply cleared the VM_PAT flag.

Keep the documentation of these functions in include/linux/pgtable.h,
one place is more than sufficient -- we should clean that up for the other
functions like track_pfn_remap/untrack_pfn separately.

Fixes: d155df53f3 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")
Fixes: 2ab640379a ("x86: PAT: hooks in generic vm code to help archs to track pfnmap regions - v3")
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yuxin wang <wang1315768607@163.com>
Reported-by: Marius Fleischer <fleischermarius@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20250321112323.153741-1-david@redhat.com
Closes: https://lore.kernel.org/lkml/CABOYnLx_dnqzpCW99G81DmOr+2UzdmZMk=T3uxwNxwz+R1RAwg@mail.gmail.com/
Closes: https://lore.kernel.org/lkml/CAJg=8jwijTP5fre8woS4JVJQ8iUA6v+iNcsOgtj9Zfpc3obDOQ@mail.gmail.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10 14:39:18 +02:00
chenweitao
1dc69ebe20 ANDROID: vendor_hooks: Add hook for trace_android_vh_copy_process
Add a hook for trace_android_vh_copy_process, which gives the vendor a chance to monitor the total thread count of the system and the thread count within a particular process.

Bug: 325765508
Change-Id: Ibeb8aa571d44997ac10623321cd00d1686bde033
Signed-off-by: chenweitao <chenweitao@oppo.com>
2025-03-11 11:26:45 +08:00
Suren Baghdasaryan
3e74468f1e FROMGIT: mm: make vma cache SLAB_TYPESAFE_BY_RCU
To enable SLAB_TYPESAFE_BY_RCU for vma cache we need to ensure that
object reuse before RCU grace period is over will be detected by
lock_vma_under_rcu().

Current checks are sufficient as long as vma is detached before it is
freed.  The only place this is not currently happening is in exit_mmap().
Add the missing vma_mark_detached() in exit_mmap().

Another issue which might trick lock_vma_under_rcu() during vma reuse is
vm_area_dup(), which copies the entire content of the vma into a new one,
overwriting the new vma's vm_refcnt and temporarily making it appear as
attached.  This might trick a racing lock_vma_under_rcu() into operating on a
reused vma if it found the vma before it got reused.  To prevent this
situation, we should ensure that vm_refcnt stays in the detached state (0)
when it is copied and advances to the attached state only after it is added
into the vma tree.  Introduce vm_area_init_from() which preserves the new
vma's vm_refcnt and use it in vm_area_dup().  Since all vmas are in
detached state with no current readers when they are freed,
lock_vma_under_rcu() will not be able to take vm_refcnt after the vma got
detached, even if the vma is reused.  vma_mark_attached() is modified to
include a release fence to ensure all stores to the vma happen before
vm_refcnt gets initialized.

Finally, make vm_area_cachep SLAB_TYPESAFE_BY_RCU. This will facilitate
vm_area_struct reuse and will minimize the number of call_rcu() calls.

Link: https://lkml.kernel.org/r/20250213224655.1680278-18-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit f56ae9bc0002a2ff7bf3cdd27ed847fe6e9d686a
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 322132947
Change-Id: I410c6fbce2e0d87ed5f7c19dc1f8806b2556837a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2025-02-28 02:29:43 -08:00
Suren Baghdasaryan
540df3e90d BACKPORT: FROMGIT: mm: replace vm_lock and detached flag with a reference count
rw_semaphore is a sizable structure of 40 bytes and consumes considerable
space for each vm_area_struct.  However, vma_lock has two important
properties which can be used to replace rw_semaphore with a simpler
structure:

1. Readers never wait.  They try to take the vma_lock and fall back to
   mmap_lock if that fails.

2. Only one writer at a time will ever try to write-lock a vma_lock
   because writers first take mmap_lock in write mode.  Because of these
   requirements, full rw_semaphore functionality is not needed and we can
   replace rw_semaphore and the vma->detached flag with a refcount
   (vm_refcnt).

When vma is in detached state, vm_refcnt is 0 and only a call to
vma_mark_attached() can take it out of this state.  Note that, unlike
before, we now require both vma_mark_attached() and vma_mark_detached() to
be called only after the vma has been write-locked.  vma_mark_attached() changes
vm_refcnt to 1 to indicate that it has been attached to the vma tree.
When a reader takes read lock, it increments vm_refcnt, unless the top
usable bit of vm_refcnt (0x40000000) is set, indicating presence of a
writer.  When writer takes write lock, it sets the top usable bit to
indicate its presence.  If there are readers, writer will wait using newly
introduced mm->vma_writer_wait.  Since all writers take mmap_lock in write
mode first, there can be only one writer at a time.  The last reader to
release the lock will signal the writer to wake up.  refcount might
overflow if there are many competing readers, in which case read-locking
will fail.  Readers are expected to handle such failures.

In summary:
1. all readers increment the vm_refcnt;
2. writer sets top usable (writer) bit of vm_refcnt;
3. readers cannot increment the vm_refcnt if the writer bit is set;
4. in the presence of readers, writer must wait for the vm_refcnt to drop
   to 1 (plus the VMA_LOCK_OFFSET writer bit), indicating an attached vma
   with no readers;
5. vm_refcnt overflow is handled by the readers.

While this vm_lock replacement does not yet result in a smaller
vm_area_struct (it stays at 256 bytes due to cacheline alignment), it
allows for further size optimization by structure member regrouping to
bring the size of vm_area_struct below 192 bytes.

Link: https://lkml.kernel.org/r/20250213224655.1680278-13-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 810c1edd93f29baa10142aa430f8d6c2909fcc25
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: trivial merge conflicts in mm.h and vma_internal.h]
Bug: 322132947
Change-Id: I4ef39de83b6b44b30c5bd2ff0cd34c0a84d10632
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2025-02-28 02:29:43 -08:00
Suren Baghdasaryan
5fcab29750 FROMGIT: mm: move mmap_init_lock() out of the header file
mmap_init_lock() is used only from mm_init() in fork.c, therefore it does
not have to reside in the header file.  This move lets us avoid including
additional headers in mmap_lock.h later, when mmap_init_lock() needs to
initialize an rcuwait object.

Link: https://lkml.kernel.org/r/20250213224655.1680278-9-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 9ab68ea874f31ea5b633d14095f7ec001495b11e
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 322132947
Change-Id: I69aeecdd917bae33a429aa872643c3a11dfa0e32
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2025-02-28 02:29:43 -08:00
Suren Baghdasaryan
74cc099459 BACKPORT: FROMGIT: mm: mark vma as detached until it's added into vma tree
The current implementation does not set the detached flag when a VMA is first
allocated.  This does not represent the real state of the VMA, which is
detached until it is added into mm's VMA tree.  Fix this by marking new
VMAs as detached and resetting detached flag only after VMA is added into
a tree.

Introduce vma_mark_attached() to make the API more readable and to
simplify possible future cleanup when vma->vm_mm might be used to indicate
detached vma and vma_mark_attached() will need an additional mm parameter.

Link: https://lkml.kernel.org/r/20250213224655.1680278-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 286750a6443552abad64c66ac96e629c4516bb3b
[surenb: resolved conflict due to the reattach_vmas() being moved from
vma.h to vma.c]
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 322132947
Change-Id: I7361060f5e3ef392848f835db4c0c0f74de12ea7
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2025-02-28 02:29:43 -08:00
Suren Baghdasaryan
e1e4842c07 FROMGIT: mm: move per-vma lock into vm_area_struct
Back when per-vma locks were introduced, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing.  Recent investigation [2] revealed that the
regression is limited to a rather old Broadwell microarchitecture and
even there it can be mitigated by disabling adjacent cacheline
prefetching, see [3].

Splitting a single logical structure into multiple ones leads to more
complicated management, extra pointer dereferences and overall less
maintainable code.  When that split-away part is a lock, it complicates
things even further.  With no performance benefits, there are no reasons
for this split.  Merging the vm_lock back into vm_area_struct also allows
vm_area_struct to use SLAB_TYPESAFE_BY_RCU later in this patchset.  Move
vm_lock back into vm_area_struct, aligning it at the cacheline boundary
and changing the cache to be cacheline-aligned as well.  With kernel
compiled using defconfig, this causes VMA memory consumption to grow from
160 (vm_area_struct) + 40 (vm_lock) bytes to 256 bytes:

    slabinfo before:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vma_lock         ...     40  102    1 : ...
     vm_area_struct   ...    160   51    2 : ...

    slabinfo after moving vm_lock:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vm_area_struct   ...    256   32    2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages,
which is 5.5MB per 100000 VMAs.  Note that the size of this structure is
dependent on the kernel configuration and typically the original size is
higher than 160 bytes.  Therefore these calculations are close to the
worst case scenario.  A more realistic vm_area_struct usage before this
change is:

     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vma_lock         ...     40  102    1 : ...
     vm_area_struct   ...    176   46    2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 54 to 64 pages,
which is 3.9MB per 100000 VMAs.  This memory consumption growth can be
addressed later by optimizing the vm_lock.

[1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/
[2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
[3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/

Link: https://lkml.kernel.org/r/20250213224655.1680278-3-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit ad8786318a05a4c59fa9bc03a0e69d0b6b2170f9
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 322132947
Change-Id: Iefd3e6cfcd7a003d994eaa24b4a72593045e48b4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2025-02-28 02:29:43 -08:00
Suren Baghdasaryan
90996df30f UPSTREAM: mm: convert mm_lock_seq to a proper seqcount
Convert mm_lock_seq to be seqcount_t and change all mmap_write_lock
variants to increment it, in-line with the usual seqcount usage pattern.
This lets us check whether the mmap_lock is write-locked by checking
mm_lock_seq.sequence counter (odd=locked, even=unlocked). This will be
used when implementing mmap_lock speculation functions.
As a result vm_lock_seq is also change to be unsigned to match the type
of mm_lock_seq.sequence.

Link: https://lkml.kernel.org/r/20241122174416.1367052-2-surenb@google.com
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit e5e7fb278e5924f29ceab42bbbb891cde528f7cc)
Bug: 322132947
Change-Id: I515a62599fa971935471bf61d314b0365c3e2926
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2025-02-28 02:29:43 -08:00
Sooyong Suk
7b7404ab99 ANDROID: mm: export symbol for vendor module
Export symbols for vendor modules implementing custom madvise behavior:
- mm_access, pidfd_get_pid, swp_swapcount

Bug: 351175506
Change-Id: I55a48d09fa61b74a00eba32723eca16153d309ec
Signed-off-by: Sooyong Suk <s.suk@samsung.corp-partner.google.com>
2025-02-11 14:44:05 -08:00
Liujie Xie
ad17f45365 ANDROID: vendor_hooks: Export the tracepoints task_rename
Export the tracepoint task_rename to identify specific new tasks,
in order to customize a task's util for power and performance, or to
optimize task scheduling parameters.

Bug: 189985971

Change-Id: I3bb71eae316e3096d361e7b47012ba46ea4be509
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit ed1e87e42cc2c4ed61ad6bc9d242e7e7a70c5b99)
2025-01-22 13:27:34 -08:00
Greg Kroah-Hartman
bba787badd Merge 6.12.8 into android16-6.12
GKI (arm64) relevant 24 out of 115 changes, affecting 34 files +169/-94
  f4ab7d7424 bpf: Fix bpf_get_smp_processor_id() on !CONFIG_SMP [1 file, +5/-1]
  8cdfb06569 fork: avoid inappropriate uprobe access to invalid mm [1 file, +6/-7]
  2175b66c7f mm/vmstat: fix a W=1 clang compiler warning [1 file, +1/-1]
  35727f4506 tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress() [2 files, +9/-3]
  4aa5dcb389 tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection [3 files, +16/-5]
  997cf2d8c2 bpf: Check negative offsets in __bpf_skb_min_len() [1 file, +15/-6]
  a817e938a0 phy: core: Fix an OF node refcount leakage in _of_phy_get() [1 file, +5/-2]
  479b6c2a5f phy: core: Fix an OF node refcount leakage in of_phy_provider_lookup() [1 file, +3/-1]
  09f17bfb36 phy: core: Fix that API devm_phy_put() fails to release the phy [1 file, +1/-1]
  f797151e84 phy: core: Fix that API devm_of_phy_provider_unregister() fails to unregister the phy provider [1 file, +3/-3]
  7e7c8ffc01 phy: core: Fix that API devm_phy_destroy() fails to destroy the phy [1 file, +1/-1]
  c180c3f42d ALSA: memalloc: prefer dma_mapping_error() over explicit address checking [1 file, +1/-1]
  a39ff5bf23 stddef: make __struct_group() UAPI C++-friendly [2 files, +21/-7]
  68662d78af tracing/kprobe: Make trace_kprobe's module callback called after jump_label update [1 file, +1/-1]
  ca5995f805 regmap: Use correct format specifier for logging range errors [1 file, +2/-2]
  fdaaf92943 bpf: Zero index arg error string for dynptr and iter [6 files, +29/-29]
  92d5139b91 virtio-blk: don't keep queue frozen during system suspend [1 file, +5/-2]
  16b54ee81d blk-mq: register cpuhp callback after hctx is added to xarray table [1 file, +7/-8]
  7d680f2f76 ublk: detach gendisk from ublk device if add_disk() fails [1 file, +17/-9]
  79a47fd0f1 freezer, sched: Report frozen tasks as 'D' instead of 'R' [1 file, +2/-1]
  a744146969 tracing: Constify string literal data member in struct trace_event_call [1 file, +1/-1]
  1cca920af1 tracing: Prevent bad count for tracing_cpumask_write [1 file, +3/-0]
  8e8494c83c io_uring/sqpoll: fix sqpoll error handling races [1 file, +6/-0]
  aed157301c PCI/MSI: Handle lack of irqdomain gracefully [2 files, +9/-2]

Changes in 6.12.8
	media: dvb-frontends: dib3000mb: fix uninit-value in dib3000_write_reg
	ceph: allocate sparse_ext map only for sparse reads
	arm64: dts: broadcom: Fix L2 linesize for Raspberry Pi 5
	bpf: Fix bpf_get_smp_processor_id() on !CONFIG_SMP
	fork: avoid inappropriate uprobe access to invalid mm
	mm/vmstat: fix a W=1 clang compiler warning
	selftests/bpf: Fix compilation error in get_uprobe_offset()
	smb: client: Deduplicate "select NETFS_SUPPORT" in Kconfig
	smb: fix bytes written value in /proc/fs/cifs/Stats
	tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()
	tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection
	bpf: Check negative offsets in __bpf_skb_min_len()
	nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"
	nfsd: restore callback functionality for NFSv4.0
	mtd: diskonchip: Cast an operand to prevent potential overflow
	mtd: rawnand: arasan: Fix double assertion of chip-select
	mtd: rawnand: arasan: Fix missing de-registration of NAND
	phy: qcom-qmp: Fix register name in RX Lane config of SC8280XP
	phy: core: Fix an OF node refcount leakage in _of_phy_get()
	phy: core: Fix an OF node refcount leakage in of_phy_provider_lookup()
	phy: core: Fix that API devm_phy_put() fails to release the phy
	phy: core: Fix that API devm_of_phy_provider_unregister() fails to unregister the phy provider
	phy: core: Fix that API devm_phy_destroy() fails to destroy the phy
	phy: usb: Toggle the PHY power during init
	phy: rockchip: samsung-hdptx: Set drvdata before enabling runtime PM
	phy: rockchip: naneng-combphy: fix phy reset
	ALSA: memalloc: prefer dma_mapping_error() over explicit address checking
	dmaengine: mv_xor: fix child node refcount handling in early exit
	dmaengine: dw: Select only supported masters for ACPI devices
	dmaengine: tegra: Return correct DMA status when paused
	dmaengine: amd: qdma: Remove using the private get and set dma_ops APIs
	dmaengine: fsl-edma: implement the cleanup path of fsl_edma3_attach_pd()
	dmaengine: apple-admac: Avoid accessing registers in probe
	dmaengine: at_xdmac: avoid null_prt_deref in at_xdmac_prep_dma_memset
	ASoC: SOF: Intel: hda-dai: Do not release the link DMA on STOP
	platform/chrome: cros_ec_lpc: fix product identity for early Framework Laptops
	mtd: rawnand: fix double free in atmel_pmecc_create_user()
	ASoC: amd: ps: Fix for enabling DMIC on acp63 platform via _DSD entry
	ASoC: Intel: sof_sdw: Fix DMI match for Lenovo 21QA and 21QB
	ASoC: dt-bindings: realtek,rt5645: Fix CPVDD voltage comment
	ASoC: Intel: sof_sdw: Fix DMI match for Lenovo 21Q6 and 21Q7
	powerpc/pseries/vas: Add close() callback in vas_vm_ops struct
	power: supply: bq24190: Fix BQ24296 Vbus regulator support
	stddef: make __struct_group() UAPI C++-friendly
	tracing/kprobe: Make trace_kprobe's module callback called after jump_label update
	watchdog: it87_wdt: add PWRGD enable quirk for Qotom QCML04
	watchdog: rzg2l_wdt: Power on the watchdog domain in the restart handler
	Revert "watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs"
	watchdog: mediatek: Add support for MT6735 TOPRGU/WDT
	scsi: qla1280: Fix hw revision numbering for ISP1020/1040
	scsi: megaraid_sas: Fix for a potential deadlock
	udf: Skip parent dir link count update if corrupted
	udf: Verify inode link counts before performing rename
	ALSA: ump: Don't open legacy substream for an inactive group
	ALSA: ump: Indicate the inactive group in legacy substream names
	ALSA: ump: Update legacy substream names upon FB info update
	ALSA: hda/conexant: fix Z60MR100 startup pop issue
	ALSA: sh: Use standard helper for buffer accesses
	smb: server: Fix building with GCC 15
	regmap: Use correct format specifier for logging range errors
	LoongArch: Fix reserving screen info memory for above-4G firmware
	LoongArch: BPF: Adjust the parameter of emit_jirl()
	platform/x86: asus-nb-wmi: Ignore unknown event 0xCF
	bpf: Zero index arg error string for dynptr and iter
	spi: intel: Add Panther Lake SPI controller support
	scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time
	scsi: mpi3mr: Synchronize access to ioctl data buffer
	scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfs
	scsi: mpi3mr: Start controller indexing from 0
	scsi: mpi3mr: Handling of fault code for insufficient power
	scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
	ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A
	spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()
	drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()
	virtio-blk: don't keep queue frozen during system suspend
	blk-mq: register cpuhp callback after hctx is added to xarray table
	wifi: iwlwifi: be less noisy if the NIC is dead in S3
	ublk: detach gendisk from ublk device if add_disk() fails
	drm/xe: Take PM ref in delayed snapshot capture worker
	drm/xe: Move the coredump registration to the worker thread
	objtool: Add bch2_trans_unlocked_error() to bcachefs noreturns
	freezer, sched: Report frozen tasks as 'D' instead of 'R'
	dmaengine: loongson2-apb: Change GENMASK to GENMASK_ULL
	perf/x86/intel/uncore: Add Clearwater Forest support
	tracing: Constify string literal data member in struct trace_event_call
	tracing: Prevent bad count for tracing_cpumask_write
	rtla/timerlat: Fix histogram ALL for zero samples
	io_uring/sqpoll: fix sqpoll error handling races
	i2c: microchip-core: actually use repeated sends
	x86/fred: Clear WFE in missing-ENDBRANCH #CPs
	virt: tdx-guest: Just leak decrypted memory on unrecoverable errors
	PCI/MSI: Handle lack of irqdomain gracefully
	perf/x86/intel: Fix bitmask of OCR and FRONTEND events for LNC
	i2c: imx: add imx7d compatible string for applying erratum ERR007805
	i2c: microchip-core: fix "ghost" detections
	perf/x86/intel/ds: Add PEBS format 6
	power: supply: cros_charge-control: add mutex for driver data
	power: supply: cros_charge-control: allow start_threshold == end_threshold
	power: supply: cros_charge-control: hide start threshold on v2 cmd
	power: supply: gpio-charger: Fix set charge current limits
	btrfs: fix race with memory mapped writes when activating swap file
	btrfs: avoid monopolizing a core when activating a swap file
	btrfs: fix swap file activation failure due to extents that used to be shared
	btrfs: fix transaction atomicity bug when enabling simple quotas
	btrfs: sysfs: fix direct super block member reads
	btrfs: fix use-after-free when COWing tree bock and tracing is enabled
	btrfs: check folio mapping after unlock in put_file_data()
	btrfs: check folio mapping after unlock in relocate_one_folio()
	Bluetooth: btusb: mediatek: move Bluetooth power off command position
	Bluetooth: btusb: mediatek: add callback function in btusb_disconnect
	Bluetooth: btusb: mediatek: add intf release flow when usb disconnect
	Bluetooth: btusb: mediatek: change the conditions for ISO interface
	ALSA: ump: Shut up truncated string warning
	ALSA: sh: Fix wrong argument order for copy_from_iter()
	Linux 6.12.8

Change-Id: I2f5b46453984dde6ed8c381109655261a6bc3596
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2025-01-03 07:44:08 +00:00
Lorenzo Stoakes
8cdfb06569 fork: avoid inappropriate uprobe access to invalid mm
[ Upstream commit 8ac662f5da19f5873fdd94c48a5cdb45b2e1b58f ]

If dup_mmap() encounters an issue, currently uprobe is able to access the
relevant mm via the reverse mapping (in build_map_info()), and if we are
very unlucky with a race window, observe invalid XA_ZERO_ENTRY state which
we establish as part of the fork error path.

This occurs because uprobe_write_opcode() invokes anon_vma_prepare() which
in turn invokes find_mergeable_anon_vma() that uses a VMA iterator,
invoking vma_iter_load() which uses the advanced maple tree API and thus
is able to observe XA_ZERO_ENTRY entries added to dup_mmap() in commit
d240629148 ("fork: use __mt_dup() to duplicate maple tree in
dup_mmap()").

This change was made on the assumption that only process tear-down code
would actually observe (and make use of) these values.  However this very
unlikely but still possible edge case with uprobes exists and
unfortunately does make these observable.

The uprobe operation prevents races against the dup_mmap() operation via
the dup_mmap_sem semaphore, which is acquired via uprobe_start_dup_mmap()
and dropped via uprobe_end_dup_mmap(), and held across
register_for_each_vma() prior to invoking build_map_info() which does the
reverse mapping lookup.

Currently these are acquired and dropped within dup_mmap(), which exposes
the race window prior to error handling in the invoking dup_mm() which
tears down the mm.

We can avoid all this by just moving the invocation of
uprobe_start_dup_mmap() and uprobe_end_dup_mmap() up a level to dup_mm()
and only release this lock once the dup_mmap() operation succeeds or clean
up is done.

This means that the uprobe code can never observe an incompletely
constructed mm and resolves the issue in this case.
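A minimal userspace model of the lock-scope change follows. All names are hypothetical: struct model, model_dup_mm(), and a plain reader counter that stands in for the dup_mmap_sem read side. It only illustrates why holding the read side across both dup_mmap() and the error-path teardown hides the half-built mm from the reverse-mapping walker. It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: a reader count stands in for the
 * read side of dup_mmap_sem, and model_build_map_info() stands in for the
 * writer (register_for_each_vma() -> build_map_info()).
 */
enum mm_state { MM_NONE, MM_PARTIAL, MM_READY, MM_TORN_DOWN };

struct model {
	int dup_mmap_readers;
	enum mm_state state;
};

static void model_start_dup_mmap(struct model *m) { m->dup_mmap_readers++; }
static void model_end_dup_mmap(struct model *m)   { m->dup_mmap_readers--; }

/* Writer side: may only inspect the mm while no fork holds the read side. */
static bool model_build_map_info(const struct model *m, enum mm_state *seen)
{
	if (m->dup_mmap_readers > 0)
		return false;	/* a real writer would sleep here instead */
	*seen = m->state;
	return true;
}

static int model_dup_mmap(struct model *m, bool fail)
{
	m->state = MM_PARTIAL;	/* e.g. XA_ZERO_ENTRY markers still present */
	if (fail)
		return -1;
	m->state = MM_READY;
	return 0;
}

/*
 * Lock taken and dropped in the caller, so the old race window (after
 * model_dup_mmap() fails, before teardown) is covered.  *window_view
 * reports what an observer could see there, or -1 if it was excluded.
 */
static int model_dup_mm(struct model *m, bool fail, int *window_view)
{
	enum mm_state seen;
	int err;

	model_start_dup_mmap(m);
	err = model_dup_mmap(m, fail);
	*window_view = model_build_map_info(m, &seen) ? (int)seen : -1;
	if (err)
		m->state = MM_TORN_DOWN;	/* teardown under the lock */
	model_end_dup_mmap(m);
	return err;
}
```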

Link: https://lkml.kernel.org/r/20241210172412.52995-1-lorenzo.stoakes@oracle.com
Fixes: d240629148 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: syzbot+2d788f4f7cb660dac4b7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02 10:34:10 +01:00
kuyo chang
7b8d3e27a3 ANDROID: GKI: Add initial dynamically task vendor size flow
UBSAN reported loading an invalid value when CONFIG_PAGE_POISONING=y.
The static vendor data is already initialized by android_init_vendor_data().
Add an initialization step that zeroes the memory before vendors use it.

Bug: 383246978

Change-Id: Ic4351dfeda5b9d49cfddeaf0464f9250bed80ffe
Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
Signed-off-by: kuyo chang <kuyo.chang@mediatek.corp-partner.google.com>
[jstultz: Minor cleanup to avoid ifdefs]
Signed-off-by: John Stultz <jstultz@google.com>
2024-12-17 10:15:29 +08:00
Peter Zijlstra
f86b854c98 ANDROID: sched: Add deactivated (sleeping) owner handling to find_proxy_task()
If the blocked_on chain resolves to a sleeping owner, deactivate
the donor task, and enqueue it on the sleeping owner task.
Then re-activate it later when the owner is woken up.
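The parking and re-activation described above might look roughly like the following userspace sketch. It also includes the tree structure and the non-recursive walk from the changelog notes. struct task, enqueue_on_owner() and activate_blocked() are hypothetical simplifications. The real code uses list_head nodes, rq locks and task->pi_lock, and none of these are modeled here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: a donor parked on a sleeping owner is linked into the
 * owner's blocked_head list; parked donors can have donors of their own,
 * forming a tree.  work_next is an intrusive worklist link, so waking the
 * tree needs neither recursion nor memory allocation.
 */
struct task {
	int on_rq;			/* 1 if on a runqueue */
	struct task *blocked_head;	/* first donor parked on this task */
	struct task *blocked_next;	/* next sibling in the owner's list */
	struct task *work_next;		/* pending-activation worklist link */
};

/* Take the donor off the runqueue and park it on the sleeping owner. */
static void enqueue_on_owner(struct task *owner, struct task *donor)
{
	assert(donor != owner);
	donor->on_rq = 0;
	donor->blocked_next = owner->blocked_head;
	owner->blocked_head = donor;
}

/*
 * Re-activate every task in the subtree parked under @woken.  Each list is
 * detached from its owner before it is walked.  Returns the number of
 * tasks activated.
 */
static int activate_blocked(struct task *woken)
{
	struct task *work = woken;
	int activated = 0;

	woken->work_next = NULL;
	while (work) {
		struct task *t = work;
		struct task *d = t->blocked_head;

		work = t->work_next;
		t->blocked_head = NULL;
		while (d) {
			struct task *next = d->blocked_next;

			d->blocked_next = NULL;
			d->on_rq = 1;
			activated++;
			d->work_next = work;	/* visit its own donors later */
			work = d;
			d = next;
		}
	}
	return activated;
}
```

Only the descendants of the woken owner are touched. A donor parked on some other sleeping owner stays parked.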

NOTE: This has been particularly challenging to get working
properly, and some of the locking is particularly awkward. I'd
very much appreciate review and feedback for ways to simplify
this.

Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: kernel-team@android.com
Change-Id: Ib7e9a793c13465be06a60dbdaff7e97133091e44
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Connor O'Brien <connoro@google.com>
[jstultz: This was broken out from the larger proxy() patch]
Signed-off-by: John Stultz <jstultz@google.com>
Bug: 306081722
---
v5:
* Split out from larger proxy patch
v6:
* Major rework, replacing the single list head per task with
  per-task list head and nodes, creating a tree structure so
  we only wake up descendants of the task woken.
* Reworked the locking to take the task->pi_lock, so we can
  avoid mid-chain wakeup races from try_to_wake_up() called by
  the ww_mutex logic.
v7:
* Drop unnecessary __nested lock annotation, as we already drop
  the lock prior.
* Add comments on #else & #endif lines, and clearer function
  names, and commit message tweaks as suggested by Metin Kaya
* Move activate_blocked_entities() call from ttwu_queue to
  try_to_wake_up() to simplify locking. Thanks to questions from
  Metin Kaya
* Fix irqsave/irqrestore usage now we call this outside where
  the pi_lock is held
* Fix activate_blocked_entities not preserving wake_cpu
* Fix for UP builds
v8:
* Minor checkpatch fixup
* Drop proxy_deactivate and cleanups suggested by Metin
v9:
* Fix bug causing possibly uninitialized cpu value to be used with
  activate_blocked_entities()
* Improved comment around preserving wake_cpu suggested by Metin
* Add additional lockdep asserts, suggested by Metin
* Tweaked placement of lockdep assert, suggested by Metin
* Fixed comment referring to structure entry name
* Fix to call proxy_resched_idle() _prior_ to calling
  proxy_enqueue_on_owner() where we deactivate the task, this
  avoids stale references to rq_selected() when the task may
  have been migrated to another rq.
* Fix to remove the blocked_head list at the start of
  activate_blocked_entities() so we only do a finite amount
  of work, avoiding a potential livelock of two cpus removing
  and adding tasks to the list at the same time if the owner
  went back to sleep while blocked entities were being woken.
v11:
* Big rework to get rid of recursion. Had to add another list
  item to the task_struct to do this as we are in atomic context
  and cannot allocate memory while activating blocked entities.
  Will need to watch carefully for bugs, as switching to a
  list_head in the task_struct instead of a pointer on the
  stack opens up the potential for races on the shared state,
  but I think I've got the locking sorted.
* Moved proxy_set_task_cpu helper to earlier in the series
* Minor rework for try_to_deactivate_task changes
* Minor variable name cleanups suggested by Metin
v13:
* Switch to use donor from next for proxy_enqueue_on_owner
* Switch to using block_task instead of deactivate_task
v14:
* Ensure we call block_task() last in proxy_enqueue_on_owner
  and not touch it again to avoid races where it might be
  activated on another cpu
* Make sure we activate blocked_entities when we exit from ttwu
* Fix to enqueue the last task in the chain (p) on the blocked
  owner instead of donor, so that we preserve the chain
  structure so mid-chain wakeups propagate properly
* Rework of sleeping_owner handling so that we properly deal
  with delayed-dequeued (sched_delayed) tasks (also removes
  now unused proxy_deactivate() logic)
2024-12-13 10:01:53 -08:00
John Stultz
95c9e8505a ANDROID: sched: Migrate whole chain in proxy_migrate_task()
Instead of migrating one task each time through find_proxy_task(),
we can walk up the blocked_donor ptrs and migrate the entire
current chain in one go.
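The following is a rough sketch of walking the blocked_donor pointers and moving every task reached in one pass, rather than one task per trip through find_proxy_task(). struct task and migrate_chain() are hypothetical. The real helper also has to handle runqueue locking and deactivation, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: blocked_donor points from a task to the task it is
 * currently proxying for.  Walking those links from a starting task visits
 * the whole chain, so every member can be moved in one pass.
 */
struct task {
	int cpu;
	struct task *blocked_donor;
};

/* Move every task on the chain starting at @start; returns how many moved. */
static int migrate_chain(struct task *start, int target_cpu)
{
	int moved = 0;

	for (struct task *t = start; t; t = t->blocked_donor) {
		t->cpu = target_cpu;
		moved++;
	}
	return moved;
}
```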

This was broken out of earlier patches and held back while the
series was being stabilized, but I wanted to re-introduce it.

Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: kernel-team@android.com
Change-Id: Ia920b2d4161b47b10b5d0774fb1e3283e92bbf0f
Signed-off-by: John Stultz <jstultz@google.com>
Bug: 306081722
---
v12:
* Earlier this was re-using blocked_node, but I hit
  a race with activating blocked entities, and to
  avoid it introduced a new migration_node listhead
2024-12-13 10:01:53 -08:00
Peter Zijlstra
465f85fe91 ANDROID: sched: Add blocked_donor link to task for smarter mutex handoffs
Add a link to the task this task is proxying for, and use it so
the mutex owner can do an intelligent hand-off of the mutex to
the task that the owner is running on behalf of.
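The hand-off decision can be sketched as follows. On unlock, prefer the owner's blocked_donor if that task is still waiting on this same mutex. Otherwise, fall back to the normal first waiter. The structures and pick_next_owner()/unlock_handoff() are hypothetical simplifications, not the kernel's mutex implementation.

```c
#include <assert.h>
#include <stddef.h>

struct mutex;

/* Hypothetical, heavily simplified task and mutex. */
struct task {
	struct mutex *blocked_on;	/* mutex this task waits for */
	struct task *blocked_donor;	/* task we are running on behalf of */
};

struct mutex {
	struct task *owner;
	struct task *first_waiter;	/* FIFO head only, for illustration */
};

/* Prefer the donor if it waits on this very lock, else the first waiter. */
static struct task *pick_next_owner(const struct mutex *lock)
{
	struct task *donor = lock->owner ? lock->owner->blocked_donor : NULL;

	if (donor && donor->blocked_on == lock)
		return donor;
	return lock->first_waiter;
}

static void unlock_handoff(struct mutex *lock)
{
	struct task *next = pick_next_owner(lock);

	if (next)
		next->blocked_on = NULL;	/* it now holds the lock */
	lock->owner = next;
}
```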

Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: kernel-team@android.com
Change-Id: Iad6f775f928b9e90e22d1d831aff26f60f37e773
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Connor O'Brien <connoro@google.com>
[jstultz: This patch was split out from larger proxy patch]
Signed-off-by: John Stultz <jstultz@google.com>
Bug: 306081722
---
v5:
* Split out from larger proxy patch
v6:
* Moved proxied value from earlier patch to this one where it
  is actually used
* Rework logic to check sched_proxy_exec() instead of using ifdefs
* Moved comment change to this patch where it makes sense
v7:
* Use a more descriptive term than "us" in comments, as suggested
  by Metin Kaya.
* Minor typo fixup from Metin Kaya
* Reworked proxied variable to prev_not_proxied to simplify usage
v8:
* Use helper for donor blocked_on_state transition
v9:
* Re-add mutex lock handoff in the unlock path, but only when we
  have a blocked donor
* Slight reword of commit message suggested by Metin
2024-12-13 10:01:53 -08:00
Peter Zijlstra
484044f3c6 FROMLIST: locking/mutex: Rework task_struct::blocked_on
Track the blocked-on relation for mutexes, to allow following this
relation at schedule time.

   task
     | blocked-on
     v
   mutex
     | owner
     v
   task
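To follow the relation shown above at schedule time, alternate between a task's blocked_on mutex and that mutex's owner until you reach a task that is not blocked. The sketch below is hypothetical: struct task, struct mutex and find_chain_end() are not kernel types. It adds a hop limit so a cyclic chain cannot loop forever.

```c
#include <assert.h>
#include <stddef.h>

struct mutex;

struct task {
	const char *name;
	struct mutex *blocked_on;	/* NULL when not blocked on a mutex */
};

struct mutex {
	struct task *owner;		/* NULL when unlocked */
};

/*
 * Follow task -> blocked_on -> owner until reaching a task that is not
 * blocked (or whose lock has no owner), giving up after @max_hops.
 */
static struct task *find_chain_end(struct task *t, int max_hops)
{
	while (t->blocked_on && t->blocked_on->owner && max_hops-- > 0)
		t = t->blocked_on->owner;
	return t;
}
```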

Also add a blocked_on_state value so we can distinguish when a
task is blocked_on a mutex, but is either blocked, waking up, or
runnable (such that it can try to acquire the lock it is blocked
on).

This avoids some of the subtle & racy games where the blocked_on
state gets cleared, only to have it re-added by the
mutex_lock_slowpath call when it tries to acquire the lock on
wakeup.
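The tri-state can be pictured as a small state machine. The wakeup path moves BO_BLOCKED to BO_WAKING without clearing blocked_on, so nothing has to re-add it later. The state names come from the changelog for this commit. The struct and the bo_*() helpers are hypothetical, and the transition rules are an assumed simplification.

```c
#include <assert.h>
#include <stddef.h>

/* State names from the changelog; helpers and rules are hypothetical. */
enum blocked_on_state { BO_RUNNABLE, BO_BLOCKED, BO_WAKING };

struct task {
	void *blocked_on;		/* lock being waited for, or NULL */
	enum blocked_on_state bo_state;
};

/* Entering the wait: record the lock and mark the task blocked. */
static void bo_set_blocked(struct task *t, void *lock)
{
	assert(lock != NULL);
	t->blocked_on = lock;
	t->bo_state = BO_BLOCKED;
}

/* Wakeup path: keep blocked_on, only note that the task is waking. */
static int bo_set_waking(struct task *t)
{
	if (t->bo_state != BO_BLOCKED)
		return 0;
	t->bo_state = BO_WAKING;
	return 1;
}

/* The woken task may now retry the lock; blocked_on is still recorded. */
static void bo_set_runnable(struct task *t)
{
	if (t->bo_state == BO_WAKING)
		t->bo_state = BO_RUNNABLE;
}

/* Lock acquired (or wait aborted): drop the relation entirely. */
static void bo_clear(struct task *t)
{
	t->blocked_on = NULL;
	t->bo_state = BO_RUNNABLE;
}
```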

Also add blocked_lock to the task_struct so we can safely
serialize the blocked-on state.

Finally add wrappers that are useful to provide correctness
checks. Folded in from a patch by:
  Valentin Schneider <valentin.schneider@arm.com>

This all will be used for tracking blocked-task/mutex chains
with the proxy-execution patch in a similar fashion to how
priority inheritance is done with rt_mutexes.

Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: kernel-team@android.com
Change-Id: I3c88f64c5defe46b7f5ac468048d88dbbd2deb5e
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[minor changes while rebasing]
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Connor O'Brien <connoro@google.com>
[jstultz: Fix blocked_on tracking in __mutex_lock_common in error paths]
Signed-off-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/lkml/20241125195204.2374458-3-jstultz@google.com/
Bug: 306081722
---
v2:
* Fixed blocked_on tracking in error paths that was causing crashes
v4:
* Ensure we clear blocked_on when waking ww_mutexes to die or wound.
  This is critical so we don't get circular blocked_on relationships
  that can't be resolved.
v5:
* Fix potential bug where the skip_wait path might clear blocked_on
  when that path never set it
* Slight tweaks to where we set blocked_on to make it consistent,
  along with extra WARN_ON correctness checking
* Minor comment changes
v7:
* Minor commit message change suggested by Metin Kaya
* Fix WARN_ON conditionals in unlock path (as blocked_on might already
  be cleared), found while looking at issue Metin Kaya raised.
* Minor tweaks to be consistent in what we do under the
  blocked_on lock, also tweaked variable name to avoid confusion
  with label, and comment typos, as suggested by Metin Kaya
* Minor tweak for CONFIG_SCHED_PROXY_EXEC name change
* Moved unused block of code to later in the series, as suggested
  by Metin Kaya
* Switch to a tri-state to be able to distinguish from waking and
  runnable so we can later safely do return migration from ttwu
* Folded together with related blocked_on changes
v8:
* Fix issue leaving task BO_BLOCKED when calling into optimistic
  spinning path.
* Include helper to better handle BO_BLOCKED->BO_WAKING transitions
v9:
* Typo fixup pointed out by Metin
* Cleanup BO_WAKING->BO_RUNNABLE transitions for the !proxy case
* Many cleanups and simplifications suggested by Metin
v11:
* Whitespace fixup pointed out by Metin
v13:
* Refactor set_blocked_on helpers to clean things up a bit
v14:
* Small build fixup with PREEMPT_RT
2024-12-13 10:01:53 -08:00
Christian Brauner
13111945c2 Revert "fs: don't block i_writecount during exec"
commit 3b832035387ff508fdcf0fba66701afc78f79e3d upstream.

This reverts commit 2a010c4128.

Rui Ueyama <rui314@gmail.com> writes:

> I'm the creator and the maintainer of the mold linker
> (https://github.com/rui314/mold). Recently, we discovered that mold
> started causing process crashes in certain situations due to a change
> in the Linux kernel. Here are the details:
>
> - In general, overwriting an existing file is much faster than
> creating an empty file and writing to it on Linux, so mold attempts to
> reuse an existing executable file if it exists.
>
> - If a program is running, opening the executable file for writing
> previously failed with ETXTBSY. If that happens, mold falls back to
> creating a new file.
>
> - However, the Linux kernel recently changed the behavior so that
> writing to an executable file is now always permitted
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a010c412853).
>
> That caused mold to write to an executable file even if there's a
> process running that file. Since changes to mmap'ed files are
> immediately visible to other processes, any processes running that
> file would almost certainly crash in a very mysterious way.
> Identifying the cause of these random crashes took us a few days.
>
> Rejecting writes to an executable file that is currently running is a
> well-known behavior, and Linux had operated that way for a very long
> time. So, I don’t believe relying on this behavior was our mistake;
> rather, I see this as a regression in the Linux kernel.
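The fallback the quote describes can be sketched in POSIX C as shown below. The program tries to reuse the file, and if open(2) fails with ETXTBSY, it creates a new file instead. This is an assumed illustration, not mold's actual code. write_output() and the ".tmp" suffix are hypothetical. Renaming the new file over the old path is one common way to "create a new file" that lets running processes keep the old inode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite @path in place when allowed; if the file
 * is being executed (ETXTBSY), write a sibling file and rename() it over
 * @path so processes running the old file keep their original inode.
 */
static int write_output(const char *path, const void *buf, size_t len)
{
	char tmp[4096];
	const char *target = path;
	int fd = open(path, O_WRONLY | O_TRUNC);

	if (fd < 0 && errno == ETXTBSY) {
		if (snprintf(tmp, sizeof(tmp), "%s.tmp", path) >= (int)sizeof(tmp))
			return -1;
		target = tmp;
		fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0755);
	} else if (fd < 0 && errno == ENOENT) {
		fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
	}
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	if (close(fd) != 0)
		return -1;
	if (target != path && rename(target, path) != 0)
		return -1;
	return 0;
}
```

This fallback only works if the kernel actually reports ETXTBSY. When writes to a running executable silently succeed, the in-place branch is taken and the running process sees its mapped text change underneath it.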

Quoting myself from commit 2a010c4128 ("fs: don't block i_writecount during exec")

> Yes, someone in userspace could potentially be relying on this. It's not
> completely out of the realm of possibility but let's find out if that's
> actually the case and not guess.

It seems we found out that someone is relying on this obscure behavior.
So revert the change.

Link: https://github.com/rui314/mold/issues/1361
Link: https://lore.kernel.org/r/4a2bc207-76be-4715-8e12-7fc45a76a125@leemhuis.info
Cc: <stable@vger.kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05 14:02:50 +01:00
Matthias Maennich
7fc0276001 Merge 'v6.12-rc6' into android-mainline
Change-Id: I0c3f47fe0cae2b79dc90050b15d424ac8a56d089
Signed-off-by: Matthias Maennich <maennich@google.com>
2024-11-05 00:24:26 +00:00
Linus Torvalds
b019b4a670 Merge tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single fix for posix CPU timers.

  When a thread is cloned, the posix CPU timers are not inherited.

  If the parent has a CPU timer armed, the corresponding tick dependency
  in the task's tick_dep_mask is set and copied to the new thread, which
  means the new thread and all descendants will prevent the system from
  going into full NOHZ operation.

  Clear the tick dependency mask in copy_process() to fix this"

* tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
2024-11-03 08:22:21 -10:00
Liangliang Li
16151a687e ANDROID: vendor_hooks: Add hooks to dup_task_struct
Add hook to dup_task_struct for vendor data fields initialisation.

Bug: 188004638

Change-Id: I4b58604ee822fb8d1e0cc37bec72e820e7318427
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
(cherry picked from commit f66d96b14aab5051fdf6b5054d87362c17a7b365)
(cherry picked from commit bafafe0ec46160573bef46d3d0f5d6c65fadaa3b)
2024-10-30 00:42:30 +00:00
Lorenzo Stoakes
985da552a9 fork: only invoke khugepaged, ksm hooks if no error
There is no reason to invoke these hooks early against an mm that is in an
incomplete state.

The change in commit d240629148 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.

Their placement early in dup_mmap() only appears to have been meaningful
for early error checking, and since functionally it'd require a very small
allocation to fail (in practice 'too small to fail') that'd only occur in
the most dire circumstances, meaning the fork would fail or be OOM'd in
any case.

Since both khugepaged and KSM tracking are there to provide optimisations
to memory performance rather than critical functionality, it doesn't
really matter all that much if, under such dire memory pressure, we fail
to register an mm with these.

As a result, we follow the example of commit d2081b2bf8 ("mm:
khugepaged: make khugepaged_enter() void function") and make ksm_fork() a
void function also.

We only expose the mm to these functions once we are done with them and
only if no error occurred in the fork operation.
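The ordering change means the optional hooks are called only on the success path, after the mm is fully built. In this sketch, struct mm, hook_khugepaged(), hook_ksm() and model_dup_mmap() are hypothetical stand-ins. The real khugepaged_fork() and ksm_fork() take both the new and the old mm.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real hooks take (mm, oldmm). */
struct mm {
	bool khugepaged_registered;
	bool ksm_registered;
};

static void hook_khugepaged(struct mm *mm) { mm->khugepaged_registered = true; }
static void hook_ksm(struct mm *mm)        { mm->ksm_registered = true; }

static int model_dup_mmap(struct mm *mm, bool copy_fails)
{
	int retval = copy_fails ? -ENOMEM : 0;

	/* ... VMAs would be copied here; mm may be inconsistent until done ... */

	/* Expose the mm to the optional hooks only once it is complete. */
	if (!retval) {
		hook_khugepaged(mm);
		hook_ksm(mm);
	}
	return retval;
}
```

Because both hooks are void, a failed registration cannot turn into a fork failure. This matches the reasoning that they are optimisations rather than critical functionality.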

Link: https://lkml.kernel.org/r/e0cb8b840c9d1d5a6e84d4f8eff5f3f2022aa10c.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d240629148 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28 21:40:39 -07:00
Lorenzo Stoakes
f64e67e5d3 fork: do not invoke uffd on fork if error occurs
Patch series "fork: do not expose incomplete mm on fork".

During fork we may place the virtual memory address space into an
inconsistent state before the fork operation is complete.

In addition, we may encounter an error during the fork operation that
indicates that the virtual memory address space is invalidated.

As a result, we should not be exposing it in any way to external machinery
that might interact with the mm or VMAs, machinery that is not designed to
deal with incomplete state.

We specifically update the fork logic to defer khugepaged and ksm to the
end of the operation and only to be invoked if no error arose, and
disallow uffd from observing fork events should an error have occurred.


This patch (of 2):

Currently on fork we expose the virtual address space of a process to
userland unconditionally if uffd is registered in VMAs, regardless of
whether an error arose in the fork.

This is performed in dup_userfaultfd_complete() which is invoked
unconditionally, and performs two duties - invoking registered handlers
for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down
userfaultfd_fork_ctx objects established in dup_userfaultfd().

This is problematic, because the virtual address space may not yet be
correctly initialised if an error arose.

The change in commit d240629148 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.

We address this by, on fork error, ensuring that we roll back state that
we would otherwise expect to clean up through the event being handled by
userland and perform the memory freeing duty otherwise performed by
dup_userfaultfd_complete().

We do this by implementing a new function, dup_userfaultfd_fail(), which
performs the same loop, only decrementing reference counts.

Note that we perform mmgrab() on the parent and child mm's, however
userfaultfd_ctx_put() will mmdrop() this once the reference count drops to
zero, so we will avoid memory leaks correctly here.
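The rollback can be modeled as walking the pending fork contexts and dropping the two references each one holds. The last put releases the grabbed mm. Nothing is delivered as UFFD_EVENT_FORK to userland. The types and helpers below (struct uffd_ctx, struct fork_ctx, ctx_put(), rollback_fork_ctxs()) are hypothetical simplifications of userfaultfd_ctx_put() and dup_userfaultfd_fail().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simplifications of userfaultfd_ctx and the fork ctx list. */
struct uffd_ctx {
	int refcount;
	int mm_grabbed;		/* 1 while the ctx pins its mm (mmgrab()) */
};

struct fork_ctx {
	struct uffd_ctx *orig;		/* parent's context */
	struct uffd_ctx *new_ctx;	/* child's context */
	struct fork_ctx *next;
};

/* Drop one reference; the final put releases the mm (models mmdrop()). */
static void ctx_put(struct uffd_ctx *ctx)
{
	assert(ctx->refcount > 0);
	if (--ctx->refcount == 0)
		ctx->mm_grabbed = 0;
}

/* Fork failed: undo each pending context without notifying userland. */
static void rollback_fork_ctxs(struct fork_ctx *list)
{
	while (list) {
		struct fork_ctx *next = list->next;

		ctx_put(list->orig);
		ctx_put(list->new_ctx);
		free(list);
		list = next;
	}
}
```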

Link: https://lkml.kernel.org/r/cover.1729014377.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d240629148 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28 21:40:38 -07:00
Benjamin Segall
b5413156ba posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
When cloning a new thread, its posix_cputimers are not inherited, and
are cleared by posix_cputimers_init(). However, this does not clear the
tick dependency it creates in tsk->tick_dep_mask, and the handler does
not reach the code to clear the dependency if there were no timers to
begin with.

Thus if a thread has a cputimer running before clone/fork, all
descendants will prevent nohz_full unless they create a cputimer of
their own.

Fix this by entirely clearing the tick_dep_mask in copy_process().
(There is currently no inherited state that needs a tick dependency.)

Process-wide timers do not have this problem because fork does not copy
signal_struct as a baseline, it creates one from scratch.
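In copy_process() terms, the fix zeroes the inherited mask at the same point where the per-thread timers are reset. This is a hypothetical userspace model. struct task and copy_process_model() are illustrative, and the bit number is assumed for the example.

```c
#include <assert.h>

/* Bit position assumed for illustration (see include/linux/tick.h). */
enum { DEP_BIT_POSIX_TIMER = 0 };

/* Hypothetical model of the fields involved, not the real task_struct. */
struct task {
	unsigned long tick_dep_mask;	/* copied wholesale with the struct */
	int nr_cpu_timers;		/* per-thread timers, never inherited */
};

static struct task copy_process_model(const struct task *parent)
{
	struct task child = *parent;	/* the struct copy inherits the mask */

	child.nr_cpu_timers = 0;	/* like posix_cputimers_init() */
	child.tick_dep_mask = 0;	/* the fix: no stale tick dependency */
	return child;
}
```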

Fixes: b78783000d ("posix-cpu-timers: Migrate to use new tick dependency mask model")
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/xm26o737bq8o.fsf@google.com
2024-10-27 10:36:04 +01:00
Sai Harshini Nimmala
de863f65b8 ANDROID: GKI: Guard dynamic task_struct size feature with config option
Ensure that the dynamic task_struct size feature is enabled only for GKI
platforms. This fixes the build issues that non-GKI platforms hit due to
the previously incorrect configuration.

Bug: 233921394
Fixes: 5e9a8cb714 ("ANDROID: GKI: Add to task_struct size via cmdline")
Change-Id: Ice341f4826baf8d20a3c846d55db5ea870753c7d
Signed-off-by: Sai Harshini Nimmala <quic_snimmala@quicinc.com>
2024-10-17 15:05:31 -07:00
Sai Harshini Nimmala
5e9a8cb714 ANDROID: GKI: Add to task_struct size via cmdline
To reduce the vendor data allocated in task_struct from 512 bytes to a
significantly smaller 48 bytes, task_struct is being made dynamically
sized.
As part of this effort, provide a means for vendors to pass a size value
via the kernel cmdline. The passed value is added to the task_struct size
to accommodate vendor data.
The cmdline parameter to be used is 'android_task_struct_vendor_size'.
For example, vendors can add the following to the bootargs section of
their devicetree to add an extra 512 bytes to the task_struct:
"android_task_struct_vendor_size=512"
To access this additional memory, use the provided
android_task_vendor_data() function.
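The layout described above places the core structure first, followed by a vendor area whose size comes from the cmdline. It can be sketched in userspace as below. struct task_core, parse_vendor_size(), alloc_task() and task_vendor_data() are hypothetical. The sketch zeroes the vendor area with calloc(), which matches the later requirement that vendor memory be zeroed before use. It does not reproduce the kernel's slab setup or the real android_task_vendor_data() signature.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: vendor data sits directly after the core struct. */
struct task_core {
	int pid;
	long state;
};

/* Models the value of "android_task_struct_vendor_size=" on the cmdline. */
static size_t vendor_size;

static int parse_vendor_size(const char *arg)
{
	char *end;
	unsigned long v = strtoul(arg, &end, 0);

	if (end == arg || *end != '\0')
		return -1;	/* reject empty input or trailing garbage */
	vendor_size = v;
	return 0;
}

/* Allocate core + vendor area; calloc() zeroes both. */
static struct task_core *alloc_task(void)
{
	return calloc(1, sizeof(struct task_core) + vendor_size);
}

/* Start of the vendor area, immediately past the core fields. */
static void *task_vendor_data(struct task_core *t)
{
	return (char *)t + sizeof(*t);
}
```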

Bug: 233921394
Change-Id: I6d5ab92080b82f29bbe9735d40f7d0b1e5bb5913
Signed-off-by: Sai Harshini Nimmala <quic_snimmala@quicinc.com>
2024-10-15 01:09:54 +00:00
Matthias Maennich
32fec317a6 Merge 8cf0b93919 ("Linux 6.12-rc2") into android-mainline
Bug: 367265496
Change-Id: I5fec4dbf7e9cd941e3fcd8adca6e0d26ba6adbfe
Signed-off-by: Matthias Maennich <maennich@google.com>
2024-10-07 17:20:05 +00:00
Matthias Maennich
0e65cf24a0 Merge aa486552a1 ("Merge tag 'memblock-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock") into android-mainline
Steps on the way to 6.12-rc1

Bug: 367265496
Change-Id: I4a4b6fec7b7f189f30a2ce5c650c73d3dda6945d
Signed-off-by: Matthias Maennich <maennich@google.com>
2024-10-03 20:41:35 +00:00
Matthias Maennich
662100c8e6 Merge 88264981f2 ("Merge tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext") into android-mainline
Steps on the way to 6.12-rc1

Bug: 367265496
Change-Id: If7725ee337ef04be805a9677090bbc38b9dc3358
Signed-off-by: Matthias Maennich <maennich@google.com>
2024-09-30 20:27:29 +00:00
Matthias Maennich
e9d92621d7 Merge 7856a56541 ("Merge tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm") into android-mainline
Steps on the way to 6.12-rc1

Bug: 367265496
Change-Id: Ia778d96b2e701765c170e2f4e920e850ceedec0e
Signed-off-by: Matthias Maennich <maennich@google.com>
2024-09-30 16:20:19 +00:00
Al Viro
678379e1d4 close_range(): fix the logics in descriptor table trimming
Cloning a descriptor table picks the size that would cover all currently
opened files.  That's fine for clone() and unshare(), but for close_range()
there's an additional twist - we clone before we close, and it would be
a shame to have
	close_range(3, ~0U, CLOSE_RANGE_UNSHARE)
leave us with a huge descriptor table when we are not going to keep
anything past stderr, just because some large file descriptor used to
be open before our call has taken it out.

Unfortunately, it had been dealt with in an inherently racy way -
sane_fdtable_size() gets a "don't copy anything past that" argument
(passed via unshare_fd() and dup_fd()), close_range() decides how much
should be trimmed and passes that to unshare_fd().

The problem is, a range that used to extend to the end of descriptor
table back when close_range() had looked at it might very well have stuff
grown after it by the time dup_fd() has allocated a new files_struct
and started to figure out the capacity of fdtable to be attached to that.

That leads to interesting pathological cases; at the very least it's a
QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes
a snapshot of descriptor table one might have observed at some point.
Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination
of unshare(CLONE_FILES) with plain close_range(), ending up with a
weird state that would never occur with unshare(2) is confusing, to put
it mildly.

It's not hard to get rid of - all it takes is passing both ends of the
range down to sane_fdtable_size().  There we are under ->files_lock,
so the race is trivially avoided.

So we do the following:
	* switch close_range() from calling unshare_fd() to calling
dup_fd().
	* undo the calling convention change done to unshare_fd() in
60997c3d45 "close_range: add CLOSE_RANGE_UNSHARE"
	* introduce struct fd_range, pass a pointer to that to dup_fd()
and sane_fdtable_size() instead of "trim everything past that point"
they are currently getting.  NULL means "we are not going to be punching
any holes"; NR_OPEN_MAX is gone.
	* make sane_fdtable_size() use find_last_bit() instead of
open-coding it; it's easier to follow that way.
	* while we are at it, have dup_fd() report errors by returning
ERR_PTR(), no need to use a separate int *errorp argument.
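The new trimming rule can be modeled in userspace in three steps. First, find the highest open descriptor. If the punched range covers it, look instead for the highest open descriptor below the start of the range. Finally, round up to whole bitmap words. last_set_bit() and fdtable_size_for() are hypothetical stand-ins for find_last_bit() and sane_fdtable_size(). They operate on a bare bitmap rather than struct fdtable.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define NR_OPEN_DEFAULT BITS_PER_LONG

struct fd_range {
	unsigned int from, to;
};

/* Index of the highest set bit below @size, or @size if none is set. */
static unsigned int last_set_bit(const unsigned long *map, unsigned int size)
{
	for (unsigned int i = size; i-- > 0; )
		if (map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			return i;
	return size;
}

/*
 * Size for the new table: cover the highest open fd, unless the punched
 * range covers it, in which case only cover what lies below the range.
 */
static unsigned int fdtable_size_for(const unsigned long *open_fds,
				     unsigned int max_fds,
				     const struct fd_range *punch_hole)
{
	unsigned int last = last_set_bit(open_fds, max_fds);

	if (last == max_fds)
		return NR_OPEN_DEFAULT;	/* nothing open */
	if (punch_hole && punch_hole->to >= last && punch_hole->from <= last) {
		last = last_set_bit(open_fds, punch_hole->from);
		if (last == punch_hole->from)
			return NR_OPEN_DEFAULT;	/* nothing open below it */
	}
	/* Round last + 1 up to a whole number of bitmap words. */
	return (last + BITS_PER_LONG) / BITS_PER_LONG * BITS_PER_LONG;
}
```

Consider the case where descriptors 0, 1, 2 and 100 are open. A punch of [3, ~0U], as for close_range(3, ~0U, CLOSE_RANGE_UNSHARE), sizes the new table for a single word instead of covering fd 100. Both ends of the range are evaluated at the same moment the bitmap is read, which is why the race disappears.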

Fixes: 60997c3d45 "close_range: add CLOSE_RANGE_UNSHARE"
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-09-29 21:52:29 -04:00
Matthias Maennich
df2ebc4bcb Merge efdfcd40ad ("Merge tag 'lkmm.2024.09.14b' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu") into android-mainline
Steps on the way to 6.12-rc1

Bug: 367265496
Change-Id: I0a0d83175270f57ba857b91e7c1c403e939fa34f
Signed-off-by: Matthias Maennich <maennich@google.com>
2024-09-27 01:47:34 +00:00
Linus Torvalds
aa486552a1 Merge tag 'memblock-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:

 - new memblock_estimated_nr_free_pages() helper to replace
   totalram_pages(), which is less accurate when
   CONFIG_DEFERRED_STRUCT_PAGE_INIT is set

 - fixes for memblock tests

* tag 'memblock-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  s390/mm: get estimated free pages by memblock api
  kernel/fork.c: get estimated free pages by memblock api
  mm/memblock: introduce a new helper memblock_estimated_nr_free_pages()
  memblock test: fix implicit declaration of function 'strscpy'
  memblock test: fix implicit declaration of function 'isspace'
  memblock test: fix implicit declaration of function 'memparse'
  memblock test: add the definition of __setup()
  memblock test: fix implicit declaration of function 'virt_to_phys'
  tools/testing: abstract two init.h into common include directory
  memblock tests: include export.h in linkage.h as kernel does
  memblock tests: include memory_hotplug.h in mmzone.h as kernel does
2024-09-25 11:35:19 -07:00
Matthias Maennich
b5aeebd6f1 Merge c903327d32 ("Merge tag 'printk-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux") into android-mainline
Steps on the way to 6.12-rc1

Bug: 367265496
Change-Id: I0d94aa9be16f183bf187f91dc4916add32722775
Signed-off-by: Matthias Maennich <maennich@google.com>
2024-09-25 08:51:49 +00:00
Linus Torvalds
88264981f2 Merge tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext support from Tejun Heo:
 "This implements a new scheduler class called ‘ext_sched_class’, or
  sched_ext, which allows scheduling policies to be implemented as BPF
  programs.

  The goals of this are:

   - Ease of experimentation and exploration: Enabling rapid iteration
     of new scheduling policies.

   - Customization: Building application-specific schedulers which
     implement policies that are not applicable to general-purpose
     schedulers.

   - Rapid scheduler deployments: Non-disruptive swap outs of scheduling
     policies in production environments"

See individual commits for more documentation, but also the cover letter
for the latest series:

Link: https://lore.kernel.org/all/20240618212056.2833381-1-tj@kernel.org/

* tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (110 commits)
  sched: Move update_other_load_avgs() to kernel/sched/pelt.c
  sched_ext: Don't trigger ops.quiescent/runnable() on migrations
  sched_ext: Synchronize bypass state changes with rq lock
  scx_qmap: Implement highpri boosting
  sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()
  sched_ext: Compact struct bpf_iter_scx_dsq_kern
  sched_ext: Replace consume_local_task() with move_local_task_to_local_dsq()
  sched_ext: Move consume_local_task() upward
  sched_ext: Move sanity check and dsq_mod_nr() into task_unlink_from_dsq()
  sched_ext: Reorder args for consume_local/remote_task()
  sched_ext: Restructure dispatch_to_local_dsq()
  sched_ext: Fix process_ddsp_deferred_locals() by unifying DTL_INVALID handling
  sched_ext: Make find_dsq_for_dispatch() handle SCX_DSQ_LOCAL_ON
  sched_ext: Refactor consume_remote_task()
  sched_ext: Rename scx_kfunc_set_sleepable to unlocked and relocate
  sched_ext: Add missing static to scx_dump_data
  sched_ext: Add missing static to scx_has_op[]
  sched_ext: Temporarily work around pick_task_scx() being called without balance_scx()
  sched_ext: Add a cgroup scheduler which uses flattened hierarchy
  sched_ext: Add cgroup support
  ...
2024-09-21 09:44:57 -07:00
Linus Torvalds
617a814f14 Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
 "Along with the usual shower of singleton patches, notable patch series
  in this pull request are:

   - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
     consistency to the APIs and behaviour of these two core allocation
     functions. This also simplifies/enables Rustification.

   - "Some cleanups for shmem" from Baolin Wang. No functional changes -
     more code reuse, better function naming, logic simplifications.

   - "mm: some small page fault cleanups" from Josef Bacik. No
     functional changes - code cleanups only.

   - "Various memory tiering fixes" from Zi Yan. A small fix and a
     little cleanup.

   - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
     simplifications and .text shrinkage.

   - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
     Butt. This is a feature, it adds new fields to /proc/vmstat such as

       $ grep kstack /proc/vmstat
       kstack_1k 3
       kstack_2k 188
       kstack_4k 11391
       kstack_8k 243
       kstack_16k 0

     which tells us that 11391 processes used 4k of stack while none at
     all used 16k. Useful for some system tuning things, but
     particularly useful for "the dynamic kernel stack project".

   - "kmemleak: support for percpu memory leak detect" from Pavel
     Tikhomirov. Teaches kmemleak to detect leakage of percpu memory.

   - "mm: memcg: page counters optimizations" from Roman Gushchin. "3
     independent small optimizations of page counters".

   - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
     David Hildenbrand. Improves PTE/PMD splitlock detection, makes
     powerpc/8xx work correctly by design rather than by accident.

   - "mm: remove arch_make_page_accessible()" from David Hildenbrand.
     Some folio conversions which make arch_make_page_accessible()
     unneeded.

   - "mm, memcg: cg2 memory{.swap,}.peak write handlers" from David
     Finkel. Cleans up and fixes our handling of the resetting of the
     cgroup/process peak-memory-use detector.

   - "Make core VMA operations internal and testable" from Lorenzo
     Stoakes. Rationalization and encapsulation of the VMA manipulation
     APIs. With a view to better enable testing of the VMA functions,
     even from a userspace-only harness.

   - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
     issues in the zswap global shrinker, resulting in improved
     performance.

   - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
     in some missing info in /proc/zoneinfo.

   - "mm: replace follow_page() by folio_walk" from David Hildenbrand.
     Code cleanups and rationalizations (conversion to folio_walk())
     resulting in the removal of follow_page().

   - "improving dynamic zswap shrinker protection scheme" from Nhat
     Pham. Some tuning to improve zswap's dynamic shrinker. Significant
     reductions in swapin and improvements in performance are shown.

   - "mm: Fix several issues with unaccepted memory" from Kirill
     Shutemov. Improvements to the new unaccepted memory feature.

   - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
     DAX PUDs. This was missing, although nobody seems to have noticed
     yet.

   - "Introduce a store type enum for the Maple tree" from Sidhartha
     Kumar. Cleanups and modest performance improvements for the maple
     tree library code.

   - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
     more cgroup v1 remnants away from the v2 memcg code.

   - "memcg: initiate deprecation of v1 features" from Shakeel Butt.
     Adds various warnings telling users that memcg v1 features are
     deprecated.

   - "mm: swap: mTHP swap allocator base on swap cluster order" from
     Chris Li. Greatly improves the success rate of the mTHP swap
     allocation.

   - "mm: introduce numa_memblks" from Mike Rapoport. Moves various
     disparate per-arch implementations of numa_memblk code into generic
     code.

   - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
     improves the performance of munmap() of swap-filled ptes.

   - "support large folio swap-out and swap-in for shmem" from Baolin
     Wang. With this series we no longer split shmem large folios into
     single-page folios when swapping out shmem.

   - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
     performance improvements and code reductions for gigantic folios.

   - "support shmem mTHP collapse" from Baolin Wang. Adds support for
     khugepaged's collapsing of shmem mTHP folios.

   - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
     performance regression due to the addition of mseal().

   - "Increase the number of bits available in page_type" from Matthew
     Wilcox. Increases the number of bits available in page_type!

   - "Simplify the page flags a little" from Matthew Wilcox. Many legacy
     page flags are now folio flags, so the page-based flags and their
     accessors/mutators can be removed.

   - "mm: store zero pages to be swapped out in a bitmap" from Usama
     Arif. An optimization which permits us to avoid writing/reading
     zero-filled zswap pages to backing store.

   - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
     window which occurs when a MAP_FIXED operation is occurring during
     an unrelated vma tree walk.

   - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
     the vma_merge() functionality, making it cleaner, more testable and
     better tested.

   - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
     Minor fixups of DAMON selftests and kunit tests.

   - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
     Code cleanups and folio conversions.

   - "Shmem mTHP controls and stats improvements" from Ryan Roberts.
     Cleanups for shmem controls and stats.

   - "mm: count the number of anonymous THPs per size" from Barry Song.
     Expose additional anon THP stats to userspace for improved tuning.

   - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
     folio conversions and removal of now-unused page-based APIs.

   - "replace per-quota region priorities histogram buffer with
     per-context one" from SeongJae Park. DAMON histogram
     rationalization.

   - "Docs/damon: update GitHub repo URLs and maintainer-profile" from
     SeongJae Park. DAMON documentation updates.

   - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
     improve related doc and warn" from Jason Wang: fixes usage of page
     allocator __GFP_NOFAIL and GFP_ATOMIC flags.

   - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
     This was overprovisioning THPs in sparsely accessed memory areas.

   - "zram: introduce custom comp backends API" from Sergey Senozhatsky.
     Add support for zram run-time compression algorithm tuning.

   - "mm: Care about shadow stack guard gap when getting an unmapped
     area" from Mark Brown. Fix up the various arch_get_unmapped_area()
     implementations to better respect guard areas.

   - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
     of mem_cgroup_iter() and various code cleanups.

   - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
     pfnmap support.

   - "resource: Fix region_intersects() vs add_memory_driver_managed()"
     from Huang Ying. Fix a bug in region_intersects() for systems with
     CXL memory.

   - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
     a couple more code paths to correctly recover from the encountering
     of poisoned memory.

   - "mm: enable large folios swap-in support" from Barry Song. Support
     the swapin of mTHP memory into appropriately-sized folios, rather
     than into single-page folios"

* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
  zram: free secondary algorithms names
  uprobes: turn xol_area->pages[2] into xol_area->page
  uprobes: introduce the global struct vm_special_mapping xol_mapping
  Revert "uprobes: use vm_special_mapping close() functionality"
  mm: support large folios swap-in for sync io devices
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
  mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
  mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
  set_memory: add __must_check to generic stubs
  mm/vma: return the exact errno in vms_gather_munmap_vmas()
  memcg: cleanup with !CONFIG_MEMCG_V1
  mm/show_mem.c: report alloc tags in human readable units
  mm: support poison recovery from copy_present_page()
  mm: support poison recovery from do_cow_fault()
  resource, kunit: add test case for region_intersects()
  resource: make alloc_free_mem_region() works for iomem_resource
  mm: z3fold: deprecate CONFIG_Z3FOLD
  vfio/pci: implement huge_fault support
  mm/arm64: support large pfn mappings
  mm/x86: support large pfn mappings
  ...
2024-09-21 07:29:05 -07:00
Linus Torvalds
78567e2bc7 Merge tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - cpuset isolation improvements

 - cpuset cgroup1 support is split into its own file behind the new
   config option CONFIG_CPUSETS_V1. This makes it the second controller
   which makes cgroup1 support optional after memcg

 - Handling of unavailable v1 controllers improved during
   cgroup1 mount operations

 - union_find applied to cpuset. It makes code simpler and more
   efficient

 - Reduce spurious events in pids.events

 - Cleanups and other misc changes

 - Contains a merge of cgroup/for-6.11-fixes to receive cpuset fixes
   that further changes build upon

* tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (34 commits)
  cgroup: Do not report unavailable v1 controllers in /proc/cgroups
  cgroup: Disallow mounting v1 hierarchies without controller implementation
  cgroup/cpuset: Expose cpuset filesystem with cpuset v1 only
  cgroup/cpuset: Move cpu.h include to cpuset-internal.h
  cgroup/cpuset: add selftest for cpuset v1
  cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1
  cgroup/cpuset: rename functions shared between v1 and v2
  cgroup/cpuset: move v1 interfaces to cpuset-v1.c
  cgroup/cpuset: move validate_change_legacy to cpuset-v1.c
  cgroup/cpuset: move legacy hotplug update to cpuset-v1.c
  cgroup/cpuset: add callback_lock helper
  cgroup/cpuset: move memory_spread to cpuset-v1.c
  cgroup/cpuset: move relax_domain_level to cpuset-v1.c
  cgroup/cpuset: move memory_pressure to cpuset-v1.c
  cgroup/cpuset: move common code to cpuset-internal.h
  cgroup/cpuset: introduce cpuset-v1.c
  selftest/cgroup: Make test_cpuset_prs.sh deal with pre-isolated CPUs
  cgroup/cpuset: Account for boot time isolated CPUs
  cgroup/cpuset: remove use_parent_ecpus of cpuset
  cgroup/cpuset: remove fetch_xcpus
  ...
2024-09-18 06:39:03 +02:00
Oleg Nesterov
ed8d5b0ce1 Revert "uprobes: use vm_special_mapping close() functionality"
This reverts commit 08e28de116.

A malicious application can munmap() its "[uprobes]" vma and in this case
xol_mapping.close == uprobe_clear_state() will free the memory which can
be used by another thread, or the same thread when it hits the uprobe bp
afterwards.

Link: https://lkml.kernel.org/r/20240911131320.GA3448@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17 01:07:01 -07:00
Sven Schnelle
08e28de116 uprobes: use vm_special_mapping close() functionality
The following KASAN splat was shown:

[   44.505448] ==================================================================
[   44.505455] BUG: KASAN: slab-use-after-free in special_mapping_close+0x9c/0xc8
[   44.505471] Read of size 8 at addr 00000000868dac48 by task sh/1384
[   44.505479]
[   44.505486] CPU: 51 UID: 0 PID: 1384 Comm: sh Not tainted 6.11.0-rc6-next-20240902-dirty #1496
[   44.505503] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
[   44.505508] Call Trace:
[   44.505511]  [<000b0324d2f78080>] dump_stack_lvl+0xd0/0x108
[   44.505521]  [<000b0324d2f5435c>] print_address_description.constprop.0+0x34/0x2e0
[   44.505529]  [<000b0324d2f5464c>] print_report+0x44/0x138
[   44.505536]  [<000b0324d1383192>] kasan_report+0xc2/0x140
[   44.505543]  [<000b0324d2f52904>] special_mapping_close+0x9c/0xc8
[   44.505550]  [<000b0324d12c7978>] remove_vma+0x78/0x120
[   44.505557]  [<000b0324d128a2c6>] exit_mmap+0x326/0x750
[   44.505563]  [<000b0324d0ba655a>] __mmput+0x9a/0x370
[   44.505570]  [<000b0324d0bbfbe0>] exit_mm+0x240/0x340
[   44.505575]  [<000b0324d0bc0228>] do_exit+0x548/0xd70
[   44.505580]  [<000b0324d0bc1102>] do_group_exit+0x132/0x390
[   44.505586]  [<000b0324d0bc13b6>] __s390x_sys_exit_group+0x56/0x60
[   44.505592]  [<000b0324d0adcbd6>] do_syscall+0x2f6/0x430
[   44.505599]  [<000b0324d2f78434>] __do_syscall+0xa4/0x170
[   44.505606]  [<000b0324d2f9454c>] system_call+0x74/0x98
[   44.505614]
[   44.505616] Allocated by task 1384:
[   44.505621]  kasan_save_stack+0x40/0x70
[   44.505630]  kasan_save_track+0x28/0x40
[   44.505636]  __kasan_kmalloc+0xa0/0xc0
[   44.505642]  __create_xol_area+0xfa/0x410
[   44.505648]  get_xol_area+0xb0/0xf0
[   44.505652]  uprobe_notify_resume+0x27a/0x470
[   44.505657]  irqentry_exit_to_user_mode+0x15e/0x1d0
[   44.505664]  pgm_check_handler+0x122/0x170
[   44.505670]
[   44.505672] Freed by task 1384:
[   44.505676]  kasan_save_stack+0x40/0x70
[   44.505682]  kasan_save_track+0x28/0x40
[   44.505687]  kasan_save_free_info+0x4a/0x70
[   44.505693]  __kasan_slab_free+0x5a/0x70
[   44.505698]  kfree+0xe8/0x3f0
[   44.505704]  __mmput+0x20/0x370
[   44.505709]  exit_mm+0x240/0x340
[   44.505713]  do_exit+0x548/0xd70
[   44.505718]  do_group_exit+0x132/0x390
[   44.505722]  __s390x_sys_exit_group+0x56/0x60
[   44.505727]  do_syscall+0x2f6/0x430
[   44.505732]  __do_syscall+0xa4/0x170
[   44.505738]  system_call+0x74/0x98

The problem is that uprobe_clear_state() kfree()s struct xol_area, which
contains struct vm_special_mapping *xol_mapping. This one is passed to
_install_special_mapping() in xol_add_vma().
__mmput() reads:

static inline void __mmput(struct mm_struct *mm)
{
        VM_BUG_ON(atomic_read(&mm->mm_users));

        uprobe_clear_state(mm);
        exit_aio(mm);
        ksm_exit(mm);
        khugepaged_exit(mm); /* must run before exit_mmap */
        exit_mmap(mm);
        ...
}

So uprobe_clear_state() frees the memory area containing the
vm_special_mapping data at the very beginning, but exit_mmap() uses this
address later via vma->vm_private_data (which was set in
_install_special_mapping()).

Fix this by moving uprobe_clear_state() to uprobes.c and using it as the
close() callback.

[usama.anjum@collabora.com: remove unneeded condition]
  Link: https://lkml.kernel.org/r/20240906101825.177490-1-usama.anjum@collabora.com
Link: https://lkml.kernel.org/r/20240903073629.2442754-1-svens@linux.ibm.com
Fixes: 223febc6e5 ("mm: add optional close() to struct vm_special_mapping")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:14 -07:00
Lee Jones
6bf9b9c6e9 Merge tag 'v6.11-rc6' into android-mainline
Linux 6.11-rc6

Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I321f364a91703f6814332ef96c1d9ae3747625af
2024-09-05 10:16:50 +00:00
Tejun Heo
649e980dad Merge branch 'bpf/master' into for-6.12
Pull bpf/master to receive baebe9aaba ("bpf: allow passing struct
bpf_iter_<type> as kfunc arguments") and related changes in preparation for
the DSQ iterator patchset.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-04 11:41:32 -10:00
Lee Jones
8e0dce3251 Merge tag 'v6.11-rc4' into android-mainline
Linux 6.11-rc4

Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Icd84f7f6bed0651850e3f9c98898d8ab444271da
2024-09-03 07:16:47 +00:00