In preparation for fixing device mapper zone write handling, introduce
the inline helper function bio_needs_zone_write_plugging() to test if a
BIO requires handling through zone write plugging using the function
blk_zone_plug_bio(). This function returns true for any write
(op_is_write(bio) == true) operation directed at a zoned block device
using zone write plugging, that is, a block device with a disk that has
a zone write plug hash table.
This helper allows simplifying the check on entry to blk_zone_plug_bio()
and is used to guard calls to it for blk-mq devices and DM devices.
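As a rough userspace sketch of the logic described above (struct layouts and the REQ_OP_* encoding are simplified stand-ins, not the kernel's definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures: a zoned disk using
 * zone write plugging has a non-NULL zone write plug hash table. */
struct gendisk {
	void *zone_wplugs_hash;
};

struct bio {
	unsigned int opf;	/* operation and flags; bit 0 set for writes */
	struct gendisk *disk;
};

#define REQ_OP_READ	0u
#define REQ_OP_WRITE	1u

static bool op_is_write(unsigned int opf)
{
	return opf & 1;
}

/* The helper: true only for a write operation directed at a disk that
 * has a zone write plug hash table. */
static bool bio_needs_zone_write_plugging(struct bio *bio)
{
	return op_is_write(bio->opf) && bio->disk->zone_wplugs_hash != NULL;
}
```

Callers can then guard blk_zone_plug_bio() with this single test instead of repeating both conditions at every call site.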
Fixes: f211268ed1 ("dm: Use the block layer zone append emulation")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250625093327.548866-3-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 417517944
Change-Id: I9628b14d4fe0e1f964d4036178fbc6ee49b3be78
(cherry picked from commit bf7a8b5cbbb2d531f3336be2186af0c5590d157c git://git.kernel.dk/linux-block for-next)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Checking if a given sector is aligned to a zone is a common operation
that is performed for zoned devices. Add a bdev_is_zone_start() helper
to check for this instead of open-coding it everywhere.
Convert the calculations on zone size to be generic instead of relying on
power-of-2(po2) based arithmetic in the block layer using the helpers
wherever possible.
The only hot path affected by this change for zoned devices with a po2
zone size is in blk_check_zone_append(), where the bdev_is_zone_start()
helper is used to keep the calculation optimized for po2 zone sizes.
Finally, allow zoned devices with non po2 zone sizes provided that their
zone capacity and zone size are equal. The main motivation to allow
zoned devices with non po2 zone size is to remove the unmapped LBA
between zone capacity and zone size for devices that cannot have a po2
zone capacity.
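The generic zone-start check, with the fast-path mask optimization for po2 zone sizes alluded to above, can be sketched in plain C (names are illustrative, not the exact kernel helpers):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;

static bool is_power_of_2(sector_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Generic zone-start check: works for any zone size. For po2 zone
 * sizes the modulo reduces to a cheap mask in the hot path. */
static bool bdev_is_zone_start(sector_t zone_sectors, sector_t sector)
{
	if (is_power_of_2(zone_sectors))
		return (sector & (zone_sectors - 1)) == 0;
	return sector % zone_sectors == 0;
}
```

With a non-po2 zone size (say 96 sectors, purely for illustration), sector 192 is a zone start while sector 100 is not; the bitmask shortcut would give wrong answers there, hence the modulo in the generic path.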
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Bug: 269471019
Bug: 415836627
Link: https://lore.kernel.org/linux-block/20220923173618.6899-4-p.raghav@samsung.com/
Change-Id: I2ecc186d7b14f5508b6abfe9821526d39a21d7e4
[ bvanassche: ported this patch to kernel 6.12 ]
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Now we only verify the outermost freeze and unfreeze in the current
context when !q->mq_freeze_depth, so it is reliable to save the queue's
freeze-locking state when we want to lock the freeze queue, since that
state is now a per-task variable.
Change-Id: Ic11e09d92c00c4b5080fbe4cd7cfa50e808096f7
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241127135133.3952153-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 415836627
(cherry picked from commit f6661b1d0525f3764596a1b65eeed9e75aecafa7)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Replace the semi-open-coded request list helpers with a proper rq_list
type that mirrors bio_list and has head and tail pointers. Besides
better type safety, this actually allows inserting at the tail of the
list, which will be useful soon.
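A minimal model of such a head/tail list (illustrative struct layouts, not the kernel's exact rq_list definition):

```c
#include <assert.h>
#include <stddef.h>

struct request {
	struct request *rq_next;
	int tag;
};

/* Like bio_list: head for FIFO removal, tail for O(1) appends. */
struct rq_list {
	struct request *head;
	struct request *tail;
};

static void rq_list_add_tail(struct rq_list *l, struct request *rq)
{
	rq->rq_next = NULL;
	if (l->tail)
		l->tail->rq_next = rq;
	else
		l->head = rq;
	l->tail = rq;
}

static struct request *rq_list_pop(struct rq_list *l)
{
	struct request *rq = l->head;

	if (rq) {
		l->head = rq->rq_next;
		if (!l->head)
			l->tail = NULL;
	}
	return rq;
}
```

Tracking the tail pointer is what makes tail insertion O(1); a single head pointer, as in the semi-open-coded helpers, would need a full list walk to append.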
Change-Id: Ia470736d0468c265f5b61cb9d8a0e5544b6b7b0d
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241113152050.157179-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 415836627
(cherry picked from commit a3396b99990d8b4e5797e7b16fdeb64c15ae97bb)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Add the initial set of ABI padding fields in android16-6.12 based on what
is in the android15-6.6 branch.
Bug: 151154716
Change-Id: Icdb394863b2911389bfdced0fd1ea20236ca4ce1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
To alleviate the priority inversion caused by lock_page(), add
oem_data to struct gendisk to store a pointer to its struct
block_device. This will allow us to check its priority through a
customized scheduler hook when locking a folio.
Bug: 338959088
Bug: 407947260
Change-Id: I118ef11cb89a3fad9a15a2c3b8383d42be0fded4
Signed-off-by: Wang Jianzheng <11134417@vivo.corp-partner.google.com>
(cherry picked from commit feb92ccf10bce90739b5f51cc33d1bd6f16d7fab)
Signed-off-by: ying zuxin <11154159@vivo.com>
Using PAGE_SIZE as the minimum expected DMA segment size breaks
devices that have a max DMA segment size of < 64k when used on 64k
PAGE_SIZE systems: such devices, for example eMMC and the Exynos UFS
controller [0] [1], fail to probe as follows:
WARNING: CPU: 2 PID: 397 at block/blk-settings.c:339 blk_validate_limits+0x364/0x3c0
Ensure we use min(max_seg_size, seg_boundary_mask + 1) as the new
minimum segment size when the max segment size is < PAGE_SIZE, for 16k
and 64k base page size systems.
If anyone needs to backport this patch, the following commits are
dependencies:
commit 6aeb4f836480 ("block: remove bio_add_pc_page")
commit 02ee5d69e3ba ("block: remove blk_rq_bio_prep")
commit b7175e24d6ac ("block: add a dma mapping iterator")
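The clamping rule above can be modeled in userspace C (the function name and the explicit page_size parameter are illustrative; the kernel's check lives around blk_validate_limits()):

```c
#include <assert.h>

/* Sketch of the clamped minimum-segment-size computation described
 * above. Historically the minimum was PAGE_SIZE; when the device's max
 * segment size is smaller (e.g. a 4K-segment device on a 64K-page
 * kernel), clamp to min(max_seg_size, seg_boundary_mask + 1) instead. */
static unsigned long min_segment_size(unsigned long max_seg_size,
				      unsigned long seg_boundary_mask,
				      unsigned long page_size)
{
	if (max_seg_size < page_size) {
		unsigned long boundary_len = seg_boundary_mask + 1;

		return max_seg_size < boundary_len ? max_seg_size
						   : boundary_len;
	}
	return page_size;
}
```

This keeps the minimum at PAGE_SIZE for devices that can handle it, while letting small-segment controllers pass blk_validate_limits() instead of tripping the WARNING quoted above.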
Bug: 399192075
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Link: https://lore.kernel.org/linux-block/20230612203314.17820-1-bvanassche@acm.org/ # [0]
Link: https://lore.kernel.org/linux-block/1d55e942-5150-de4c-3a02-c3d066f87028@acm.org/ # [1]
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Keith Busch <kbusch@kernel.org>
Tested-by: Paul Bunyan <pbunyan@redhat.com>
Reviewed-by: Daniel Gomez <da.gomez@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250225022141.2154581-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 889c57066ceee5e9172232da0608a8ac053bb6e5)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
[dhavale: resolved minor conflict in block/blk.h]
Change-Id: I5fe54dd8c73621259cbd9720b77253d8a2af29c7
commit a6aa36e957a1bfb5341986dec32d013d23228fe1 upstream.
For devices that natively support zone append operations,
REQ_OP_ZONE_APPEND BIOs are not processed through zone write plugging
and are immediately issued to the zoned device. This means that there is
no write pointer offset tracking done for these operations and that a
zone write plug is not necessary.
However, when receiving a zone append BIO, we may already have a zone
write plug for the target zone if that zone was previously partially
written using regular write operations. In such case, since the write
pointer offset of the zone write plug is not incremented by the amount
of sectors appended to the zone, 2 issues arise:
1) we risk leaving the plug in the disk hash table if the zone is fully
written using zone append or regular write operations, because the
write pointer offset will never reach the "zone full" state.
2) Regular write operations that are issued after zone append operations
will always be failed by blk_zone_wplug_prepare_bio() as the write
pointer alignment check will fail, even if the user correctly
accounted for the zone append operations and issued the regular
writes with a correct sector.
Avoid these issues by immediately removing the zone write plug of zones
that are the target of zone append operations when blk_zone_plug_bio()
is called. The new function blk_zone_wplug_handle_native_zone_append()
implements this for devices that natively support zone append. The
removal of the zone write plug using disk_remove_zone_wplug() requires
aborting all plugged regular writes using disk_zone_wplug_abort() as
otherwise the plugged write BIOs would never be executed (with the plug
removed, the completion path would never see the zone write plug again
as disk_get_zone_wplug() will return NULL). Rate-limited warnings are
added
to blk_zone_wplug_handle_native_zone_append() and to
disk_zone_wplug_abort() to signal this.
Since blk_zone_wplug_handle_native_zone_append() is called in the hot
path for operations that will not be plugged, disk_get_zone_wplug() is
optimized under the assumption that a user issuing zone append
operations is not at the same time issuing regular writes and that there
are no hashed zone write plugs. The struct gendisk atomic counter
nr_zone_wplugs is added to check this, with this counter incremented in
disk_insert_zone_wplug() and decremented in disk_remove_zone_wplug().
To be consistent with this fix, we do not need to fill the zone write
plug hash table with zone write plugs for zones that are partially
written for a device that supports native zone append operations.
So modify blk_revalidate_seq_zone() to return early to avoid allocating
and inserting a zone write plug for partially written sequential zones
if the device natively supports zone append.
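A heavily simplified model of the fast path and plug removal described above (plain ints instead of atomics, a single plug standing in for the hash table; purely illustrative):

```c
#include <assert.h>
#include <stddef.h>

struct zone_wplug {
	int nr_plugged;		/* plugged regular write BIOs */
};

struct gendisk {
	int nr_zone_wplugs;	/* atomic_t in the kernel */
	struct zone_wplug *plug;	/* stand-in for the hash table */
};

/* On a native zone append: if no zone write plugs are hashed, return
 * immediately; otherwise abort the plugged writes and remove the plug
 * so that write pointer tracking never goes stale. */
static void handle_native_zone_append(struct gendisk *disk)
{
	if (!disk->nr_zone_wplugs)
		return;		/* hot path: no hash lookup needed */
	if (disk->plug) {
		disk->plug->nr_plugged = 0;	/* abort plugged writes */
		disk->plug = NULL;		/* remove from the table */
		disk->nr_zone_wplugs--;
	}
}
```

The counter check up front is the whole point of the optimization: a user issuing only zone append operations keeps the counter at zero and never pays for a hash table lookup.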
Reported-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
Fixes: 9b1ce7f0c6 ("block: Implement zone append emulation")
Cc: stable@vger.kernel.org
Change-Id: If7a37be9828e0d59ff68c7b7db4f30a9a10ede89
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
Link: https://lore.kernel.org/r/20250214041434.82564-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2f572c42bb)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 2f572c42bb which is
commit a6aa36e957a1bfb5341986dec32d013d23228fe1 upstream.
It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.
Bug: 161946584
Change-Id: I48f47a48084edfbca1f6e07fdde108f9c164aacf
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
GKI (arm64) relevant 37 out of 149 changes, affecting 60 files +390/-338
659bfea591 scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out() [1 file, +4/-4]
3594aad97e ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_up [1 file, +1/-1]
a3ae6a60ba SUNRPC: Prevent looping due to rpc_signal_task() races [3 files, +2/-6]
b5038504da scsi: core: Clear driver private data when retrying request [1 file, +7/-7]
465a814323 scsi: ufs: core: Set default runtime/system PM levels before ufshcd_hba_init() [1 file, +15/-15]
ee5d6cb5cc ALSA: usb-audio: Avoid dropping MIDI events at closing multiple ports [1 file, +1/-1]
5c9921f1da Bluetooth: L2CAP: Fix L2CAP_ECRED_CONN_RSP response [1 file, +7/-2]
f22df335b2 net: loopback: Avoid sending IP packets without an Ethernet header [1 file, +14/-0]
915d64a78f net: set the minimum for net_hotdata.netdev_budget_usecs [1 file, +2/-1]
db8b2a613d ipv4: Convert icmp_route_lookup() to dscp_t. [1 file, +9/-10]
97c455c3c2 ipv4: Convert ip_route_input() to dscp_t. [6 files, +18/-9]
8ffd0390fc ipvs: Always clear ipvs_property flag in skb_scrub_packet() [1 file, +1/-1]
c417b1e4d8 tcp: devmem: don't write truncated dmabuf CMSGs to userspace [3 files, +22/-16]
33d782e38d tcp: Defer ts_recent changes until req is owned [1 file, +4/-6]
902d576296 net: Clear old fragment checksum value in napi_reuse_skb [1 file, +1/-0]
806437d047 thermal: gov_power_allocator: Fix incorrect calculation in divvy_up_power() [1 file, +1/-1]
7d582eb6e4 perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list [1 file, +9/-2]
13cca2b73e uprobes: Reject the shared zeropage in uprobe_write_opcode() [1 file, +5/-0]
07a82c78d8 thermal: of: Simplify thermal_of_should_bind with scoped for each OF child [1 file, +2/-3]
e11df3bffd thermal/of: Fix cdev lookup in thermal_of_should_bind() [1 file, +29/-21]
19cd2dc4d4 thermal: core: Move lists of thermal instances to trip descriptors [7 files, +62/-64]
27a144c3be thermal: gov_power_allocator: Update total_weight on bind and cdev updates [1 file, +22/-8]
546c19eb69 io_uring/net: save msg_control for compat [1 file, +3/-1]
8cc451444c unreachable: Unify [2 files, +7/-15]
2cfd0e5084 objtool: Remove annotate_{,un}reachable() [2 files, +2/-68]
a00e900c9b objtool: Fix C jump table annotations for Clang [3 files, +6/-5]
435d2964af tracing: Fix bad hist from corrupting named_triggers list [1 file, +15/-15]
8e31d9fb2f ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2 [1 file, +1/-0]
b9de147b2c KVM: arm64: Ensure a VMID is allocated before programming VTTBR_EL2 [3 files, +14/-21]
a2475ccad6 perf/core: Add RCU read lock protection to perf_iterate_ctx() [1 file, +2/-1]
322cb23e24 perf/core: Fix low freq setting via IOC_PERIOD [1 file, +9/-8]
8f6369c3cd arm64/mm: Fix Boot panic on Ampere Altra [1 file, +1/-6]
2f572c42bb block: Remove zone write plugs when handling native zone append writes [2 files, +73/-10]
29b6d5ad3e rcuref: Plug slowpath race in rcuref_put() [2 files, +8/-6]
0362847c52 sched/core: Prevent rescheduling when interrupts are disabled [1 file, +1/-1]
59455f968c scsi: ufs: core: bsg: Fix crash when arpmb command fails [1 file, +4/-2]
72cbaf8b41 thermal: gov_power_allocator: Add missing NULL pointer check [1 file, +6/-1]
Changes in 6.12.18
RDMA/mlx5: Fix the recovery flow of the UMR QP
IB/mlx5: Set and get correct qp_num for a DCT QP
RDMA/mlx5: Fix a race for DMABUF MR which can lead to CQE with error
RDMA/mlx5: Fix a WARN during dereg_mr for DM type
RDMA/mana_ib: Allocate PAGE aligned doorbell index
RDMA/hns: Fix mbox timing out by adding retry mechanism
RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved
RDMA/bnxt_re: Refactor NQ allocation
RDMA/bnxt_re: Cache MSIx info to a local structure
RDMA/bnxt_re: Add sanity checks on rdev validity
RDMA/bnxt_re: Allocate dev_attr information dynamically
RDMA/bnxt_re: Fix the statistics for Gen P7 VF
landlock: Fix non-TCP sockets restriction
scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out()
ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_up
NFS: O_DIRECT writes must check and adjust the file length
NFS: Adjust delegated timestamps for O_DIRECT reads and writes
SUNRPC: Prevent looping due to rpc_signal_task() races
NFSv4: Fix a deadlock when recovering state on a sillyrenamed file
SUNRPC: Handle -ETIMEDOUT return from tlshd
RDMA/mlx5: Fix implicit ODP hang on parent deregistration
RDMA/mlx5: Fix AH static rate parsing
scsi: core: Clear driver private data when retrying request
scsi: ufs: core: Set default runtime/system PM levels before ufshcd_hba_init()
RDMA/mlx5: Fix bind QP error cleanup flow
RDMA/bnxt_re: Fix the page details for the srq created by kernel consumers
sunrpc: suppress warnings for unused procfs functions
ALSA: usb-audio: Avoid dropping MIDI events at closing multiple ports
Bluetooth: L2CAP: Fix L2CAP_ECRED_CONN_RSP response
rxrpc: rxperf: Fix missing decoding of terminal magic cookie
afs: Fix the server_list to unuse a displaced server rather than putting it
afs: Give an afs_server object a ref on the afs_cell object it points to
net: loopback: Avoid sending IP packets without an Ethernet header
net: set the minimum for net_hotdata.netdev_budget_usecs
ipv4: Convert icmp_route_lookup() to dscp_t.
ipv4: Convert ip_route_input() to dscp_t.
ipvlan: Prepare ipvlan_process_v4_outbound() to future .flowi4_tos conversion.
ipvlan: ensure network headers are in skb linear part
net: cadence: macb: Synchronize stats calculations
net: dsa: rtl8366rb: Fix compilation problem
ASoC: es8328: fix route from DAC to output
ASoC: fsl: Rename stream name of SAI DAI driver
ipvs: Always clear ipvs_property flag in skb_scrub_packet()
drm/xe/oa: Signal output fences
drm/xe/oa: Move functions up so they can be reused for config ioctl
drm/xe/oa: Add syncs support to OA config ioctl
drm/xe/oa: Allow only certain property changes from config
drm/xe/oa: Allow oa_exponent value of 0
firmware: cs_dsp: Remove async regmap writes
ASoC: cs35l56: Prevent races when soft-resetting using SPI control
ALSA: hda/realtek: Fix wrong mic setup for ASUS VivoBook 15
net: ethernet: ti: am65-cpsw: select PAGE_POOL
tcp: devmem: don't write truncated dmabuf CMSGs to userspace
ice: add E830 HW VF mailbox message limit support
ice: Fix deinitializing VF in error path
ice: Avoid setting default Rx VSI twice in switchdev setup
tcp: Defer ts_recent changes until req is owned
net: Clear old fragment checksum value in napi_reuse_skb
net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination.
net/mlx5: IRQ, Fix null string in debug print
net: ipv6: fix dst ref loop on input in seg6 lwt
net: ipv6: fix dst ref loop on input in rpl lwt
selftests: drv-net: Check if combined-count exists
idpf: fix checksums set in idpf_rx_rsc()
net: ti: icss-iep: Reject perout generation request
thermal: gov_power_allocator: Fix incorrect calculation in divvy_up_power()
perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list
uprobes: Reject the shared zeropage in uprobe_write_opcode()
thermal: of: Simplify thermal_of_should_bind with scoped for each OF child
thermal/of: Fix cdev lookup in thermal_of_should_bind()
thermal: core: Move lists of thermal instances to trip descriptors
thermal: gov_power_allocator: Update total_weight on bind and cdev updates
io_uring/net: save msg_control for compat
unreachable: Unify
objtool: Remove annotate_{,un}reachable()
objtool: Fix C jump table annotations for Clang
x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
phy: rockchip: fix Kconfig dependency more
phy: rockchip: naneng-combphy: compatible reset with old DT
riscv: KVM: Fix hart suspend status check
riscv: KVM: Fix hart suspend_type use
riscv: KVM: Fix SBI IPI error generation
riscv: KVM: Fix SBI TIME error generation
tracing: Fix bad hist from corrupting named_triggers list
ftrace: Avoid potential division by zero in function_stat_show()
ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2
ALSA: hda/realtek: Fix microphone regression on ASUS N705UD
KVM: arm64: Ensure a VMID is allocated before programming VTTBR_EL2
perf/core: Add RCU read lock protection to perf_iterate_ctx()
perf/x86: Fix low freqency setting issue
perf/core: Fix low freq setting via IOC_PERIOD
drm/xe/regs: remove a duplicate definition for RING_CTL_SIZE(size)
drm/xe/userptr: restore invalidation list on error
drm/xe/userptr: fix EFAULT handling
drm/amdkfd: Preserve cp_hqd_pq_control on update_mqd
drm/amdgpu: disable BAR resize on Dell G5 SE
drm/amdgpu: init return value in amdgpu_ttm_clear_buffer
drm/amd/display: Disable PSR-SU on eDP panels
drm/amd/display: add a quirk to enable eDP0 on DP1
drm/amd/display: Fix HPD after gpu reset
arm64/mm: Fix Boot panic on Ampere Altra
block: Remove zone write plugs when handling native zone append writes
i2c: npcm: disable interrupt enable bit before devm_request_irq
i2c: ls2x: Fix frequency division register access
usbnet: gl620a: fix endpoint checking in genelink_bind()
net: stmmac: dwmac-loongson: Add fix_soc_reset() callback
net: phy: qcom: qca807x fix condition for DAC_DSP_BIAS_CURRENT
net: enetc: fix the off-by-one issue in enetc_map_tx_buffs()
net: enetc: keep track of correct Tx BD count in enetc_map_tx_tso_buffs()
net: enetc: VFs do not support HWTSTAMP_TX_ONESTEP_SYNC
net: enetc: update UDP checksum when updating originTimestamp field
net: enetc: correct the xdp_tx statistics
net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs()
phy: tegra: xusb: reset VBUS & ID OVERRIDE
phy: exynos5-usbdrd: fix MPLL_MULTIPLIER and SSC_REFCLKSEL masks in refclk
phy: exynos5-usbdrd: gs101: ensure power is gated to SS phy in phy_exit()
iommu/vt-d: Remove device comparison in context_setup_pass_through_cb
iommu/vt-d: Fix suspicious RCU usage
intel_idle: Handle older CPUs, which stop the TSC in deeper C states, correctly
mptcp: always handle address removal under msk socket lock
mptcp: reset when MPTCP opts are dropped after join
selftests/landlock: Test that MPTCP actions are not restricted
vmlinux.lds: Ensure that const vars with relocations are mapped R/O
rcuref: Plug slowpath race in rcuref_put()
sched/core: Prevent rescheduling when interrupts are disabled
sched_ext: Fix pick_task_scx() picking non-queued tasks when it's called without balance()
selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP
dm-integrity: Avoid divide by zero in table status in Inline mode
dm vdo: add missing spin_lock_init
ima: Reset IMA_NONACTION_RULE_FLAGS after post_setattr
scsi: ufs: core: bsg: Fix crash when arpmb command fails
rseq/selftests: Fix riscv rseq_offset_deref_addv inline asm
riscv/futex: sign extend compare value in atomic cmpxchg
riscv: signal: fix signal frame size
riscv: cacheinfo: Use of_property_present() for non-boolean properties
riscv: signal: fix signal_minsigstksz
riscv: cpufeature: use bitmap_equal() instead of memcmp()
efi: Don't map the entire mokvar table to determine its size
amdgpu/pm/legacy: fix suspend/resume issues
x86/microcode/AMD: Return bool from find_blobs_in_containers()
x86/microcode/AMD: Have __apply_microcode_amd() return bool
x86/microcode/AMD: Remove ugly linebreak in __verify_patch_section() signature
x86/microcode/AMD: Remove unused save_microcode_in_initrd_amd() declarations
x86/microcode/AMD: Merge early_apply_microcode() into its single callsite
x86/microcode/AMD: Get rid of the _load_microcode_amd() forward declaration
x86/microcode/AMD: Add get_patch_level()
x86/microcode/AMD: Load only SHA256-checksummed patches
thermal: gov_power_allocator: Add missing NULL pointer check
Linux 6.12.18
Change-Id: Id06a9c751e3315bfd1a6e642b2c0f276edb46319
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit a6aa36e957a1bfb5341986dec32d013d23228fe1 upstream.
For devices that natively support zone append operations,
REQ_OP_ZONE_APPEND BIOs are not processed through zone write plugging
and are immediately issued to the zoned device. This means that there is
no write pointer offset tracking done for these operations and that a
zone write plug is not necessary.
However, when receiving a zone append BIO, we may already have a zone
write plug for the target zone if that zone was previously partially
written using regular write operations. In such case, since the write
pointer offset of the zone write plug is not incremented by the amount
of sectors appended to the zone, 2 issues arise:
1) we risk leaving the plug in the disk hash table if the zone is fully
written using zone append or regular write operations, because the
write pointer offset will never reach the "zone full" state.
2) Regular write operations that are issued after zone append operations
will always be failed by blk_zone_wplug_prepare_bio() as the write
pointer alignment check will fail, even if the user correctly
accounted for the zone append operations and issued the regular
writes with a correct sector.
Avoid these issues by immediately removing the zone write plug of zones
that are the target of zone append operations when blk_zone_plug_bio()
is called. The new function blk_zone_wplug_handle_native_zone_append()
implements this for devices that natively support zone append. The
removal of the zone write plug using disk_remove_zone_wplug() requires
aborting all plugged regular writes using disk_zone_wplug_abort() as
otherwise the plugged write BIOs would never be executed (with the plug
removed, the completion path would never see the zone write plug again
as disk_get_zone_wplug() will return NULL). Rate-limited warnings are
added
to blk_zone_wplug_handle_native_zone_append() and to
disk_zone_wplug_abort() to signal this.
Since blk_zone_wplug_handle_native_zone_append() is called in the hot
path for operations that will not be plugged, disk_get_zone_wplug() is
optimized under the assumption that a user issuing zone append
operations is not at the same time issuing regular writes and that there
are no hashed zone write plugs. The struct gendisk atomic counter
nr_zone_wplugs is added to check this, with this counter incremented in
disk_insert_zone_wplug() and decremented in disk_remove_zone_wplug().
To be consistent with this fix, we do not need to fill the zone write
plug hash table with zone write plugs for zones that are partially
written for a device that supports native zone append operations.
So modify blk_revalidate_seq_zone() to return early to avoid allocating
and inserting a zone write plug for partially written sequential zones
if the device natively supports zone append.
Reported-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
Fixes: 9b1ce7f0c6 ("block: Implement zone append emulation")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
Link: https://lore.kernel.org/r/20250214041434.82564-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Allow block drivers to configure the following:
* Maximum number of hardware sectors values smaller than
PAGE_SIZE >> SECTOR_SHIFT. For PAGE_SIZE = 4096 this means that values
below 8 become supported.
* A maximum segment size below the page size. This is most useful
for page sizes above 4096 bytes.
The blk_sub_page_segments static branch will be used in later patches to
prevent that performance of block drivers that support segments >=
PAGE_SIZE and max_hw_sectors >= PAGE_SIZE >> SECTOR_SHIFT would be affected.
This patch may change the behavior of existing block drivers from not
working into working. An attempt to configure a limit below what the
block layer supports causes the block layer to select a larger value.
If that value is not supported by the block driver, this may cause data
other than what was requested to be transferred, a kernel crash, or
other undesirable behavior.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Bug: 346870006
Link: https://lore.kernel.org/all/20230612203314.17820-4-bvanassche@acm.org/
[dhavale: the current patch is based on the FROMLIST patch sent to the
kernel mailing list. As the queue config functions were removed, the
logic has been adapted in the analogous function blk_validate_limits().
Block maintainers have rejected all our previous attempts to land
patches which support sub-page segment sizes, but we have decided that
these patches are necessary to make 16KB page size kernels work with
hardware which supports a maximum 4KB segment size.
]
Change-Id: I3faa20be1e83d1501d0f25f549b40301443d0df4
Add ANDROID_OEM_DATA(1) in struct request_queue to support more
request queue's status for extend copy feature.
Bug: 283021230
Change-Id: Ic946fd08dcebed708f03749557d9289ddb3696b8
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Weichao Guo <guoweichao@oppo.corp-partner.google.com>
(cherry picked from commit d7b3d8d1e527dc41fe8faeb68cef879290db379c)
(cherry picked from commit b169eba61f7301995db4e5753b4bd9806c0afab5)
commit fe0418eb9bd69a19a948b297c8de815e05f3cde1 upstream.
Zone write plugging for handling writes to zones of a zoned block
device always execute a zone report whenever a write BIO to a zone
fails. The intent of this is to ensure that the tracking of a zone write
pointer is always correct to ensure that the alignment to a zone write
pointer of write BIOs can be checked on submission and that we can
always correctly emulate zone append operations using regular write
BIOs.
However, this error recovery scheme introduces a potential deadlock if a
device queue freeze is initiated while BIOs are still plugged in a zone
write plug and one of these write operation fails. In such case, the
disk zone write plug error recovery work is scheduled and executes a
report zone. This in turn can result in a request allocation in the
underlying driver to issue the report zones command to the device. But
with the device queue freeze already started, this allocation will
block, preventing the report zone execution and the continuation of the
processing of the plugged BIOs. As plugged BIOs hold a queue usage
reference, the queue freeze itself will never complete, resulting in a
deadlock.
Avoid this problem by completely removing from the zone write plugging
code the use of report zones operations after a failed write operation,
instead relying on the device user to either execute a report zones,
reset the zone, finish the zone, or give up writing to the device (which
is a fairly common pattern for file systems which degrade to read-only
after write failures). This is not an unreasonnable requirement as all
well-behaved applications, FSes and device mapper already use report
zones to recover from write errors whenever possible by comparing the
current position of a zone write pointer with what their assumption
about the position is.
The changes to remove the automatic error recovery are as follows:
- Completely remove the error recovery work and its associated
resources (zone write plug list head, disk error list, and disk
zone_wplugs_work work struct). This also removes the functions
disk_zone_wplug_set_error() and disk_zone_wplug_clear_error().
- Change the BLK_ZONE_WPLUG_ERROR zone write plug flag into
BLK_ZONE_WPLUG_NEED_WP_UPDATE. This new flag is set for a zone write
plug whenever a write operation targeting the zone of the zone write
plug fails. This flag indicates that the zone write pointer offset is
not reliable and that it must be updated when the next report zone,
reset zone, finish zone or disk revalidation is executed.
- Modify blk_zone_write_plug_bio_endio() to set the
BLK_ZONE_WPLUG_NEED_WP_UPDATE flag for the target zone of a failed
write BIO.
- Modify the function disk_zone_wplug_set_wp_offset() to clear this
new flag, thus implementing recovery of a correct write pointer
offset with the reset (all) zone and finish zone operations.
- Modify blkdev_report_zones() to always use the disk_report_zones_cb()
callback so that disk_zone_wplug_sync_wp_offset() can be called for
any zone marked with the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag.
This implements recovery of a correct write pointer offset for zone
write plugs marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE and within
the range of the report zones operation executed by the user.
- Modify blk_revalidate_seq_zone() to call
disk_zone_wplug_sync_wp_offset() for all sequential write required
zones when a zoned block device is revalidated, thus always resolving
any inconsistency between the write pointer offset of zone write
plugs and the actual write pointer position of sequential zones.
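The new flag's lifecycle summarized in the list above can be sketched as follows (flag value and function names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

#define WPLUG_NEED_WP_UPDATE 0x1	/* stand-in for the kernel flag */

struct zone_wplug {
	unsigned int flags;
	unsigned long long wp_offset;	/* tracked write pointer offset */
};

/* A failed write marks the plug's write pointer offset as stale. */
static void write_bio_endio(struct zone_wplug *zwplug, bool failed)
{
	if (failed)
		zwplug->flags |= WPLUG_NEED_WP_UPDATE;
}

/* The next report zones, zone reset/finish, or disk revalidation
 * resynchronizes the offset and clears the flag. */
static void sync_wp_offset(struct zone_wplug *zwplug,
			   unsigned long long reported_wp)
{
	if (zwplug->flags & WPLUG_NEED_WP_UPDATE) {
		zwplug->wp_offset = reported_wp;
		zwplug->flags &= ~WPLUG_NEED_WP_UPDATE;
	}
}
```

The key property is that recovery is driven by operations the device user issues anyway, so no work item ever allocates a request behind a queue freeze.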
Fixes: dd291d77cc ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20241209122357.47838-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b76b840fd93374240b59825f1ab8e2f5c9907acb upstream.
The zone reclaim processing of the dm-zoned device mapper uses
blkdev_issue_zeroout() to align the write pointer of a zone being used
for reclaiming another zone, to write the valid data blocks from the
zone being reclaimed at the same position relative to the zone start in
the reclaim target zone.
The first call to blkdev_issue_zeroout() will try to use hardware
offload using a REQ_OP_WRITE_ZEROES operation if the device reports a
non-zero max_write_zeroes_sectors queue limit. If this operation fails
because of the lack of hardware support, blkdev_issue_zeroout() falls
back to using a regular write operation with the zero-page as buffer.
Currently, such REQ_OP_WRITE_ZEROES failure is automatically handled by
the block layer zone write plugging code which will execute a report
zones operation to ensure that the write pointer of the target zone of
the failed operation has not changed and to "rewind" the zone write
pointer offset of the target zone as it was advanced when the write zero
operation was submitted. So the REQ_OP_WRITE_ZEROES failure does not
cause any issue and blkdev_issue_zeroout() works as expected.
However, since the automatic recovery of zone write pointers by the zone
write plugging code can potentially cause deadlocks with queue freeze
operations, a different recovery must be implemented in preparation for
the removal of zone write plugging report zones based recovery.
Do this by introducing the new function blk_zone_issue_zeroout(). This
function first calls blkdev_issue_zeroout() with the flag
BLKDEV_ZERO_NOFALLBACK to intercept failures of the first execution,
which attempts to use the device hardware offload with the
REQ_OP_WRITE_ZEROES operation. If this attempt fails, a report zones
operation is issued to restore the zone write pointer offset of the
target zone to the correct position and blkdev_issue_zeroout() is called
again without the BLKDEV_ZERO_NOFALLBACK flag. The report zones
operation performing this recovery is implemented using the helper
function disk_zone_sync_wp_offset() which calls the gendisk report_zones
file operation with the callback disk_report_zones_cb(). This callback
updates the write pointer offset of the target zone using the new
function disk_zone_wplug_sync_wp_offset().
dmz_reclaim_align_wp() is modified to change its call to
blkdev_issue_zeroout() to a call to blk_zone_issue_zeroout(), with no
other change needed as the two functions are functionally equivalent.
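The two-step flow above can be sketched as a small userspace model. The stubs and function names below are illustrative assumptions, not the kernel implementation; only the control flow (try the hardware offload first, resync the write pointer on failure, then retry with the fallback allowed) mirrors the description:

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace sketch of the two-step zeroout flow; names and stubs are
 * illustrative, not the kernel implementation. */
typedef int (*zeroout_fn)(bool nofallback);

static int wp_sync_calls;

/* Stands in for disk_zone_sync_wp_offset(): a report zones pass that
 * restores the zone write plug's write pointer offset. */
static void sync_wp_offset_stub(void)
{
	wp_sync_calls++;
}

/* Device without REQ_OP_WRITE_ZEROES support: the NOFALLBACK attempt
 * fails, the zero-page write fallback succeeds. */
static int no_offload_dev(bool nofallback)
{
	return nofallback ? -EOPNOTSUPP : 0;
}

static int model_blk_zone_issue_zeroout(zeroout_fn zeroout)
{
	/* First attempt: hardware offload only (BLKDEV_ZERO_NOFALLBACK). */
	int ret = zeroout(true);

	if (ret != -EOPNOTSUPP)
		return ret;
	/* Offload failed: restore the zone write pointer offset, then
	 * retry allowing the regular fallback. */
	sync_wp_offset_stub();
	return zeroout(false);
}
```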
Fixes: dd291d77cc ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20241209122357.47838-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d7cb6d7414ea1b33536fa6d11805cb8dceec1f97 ]
Ensure that a disk revalidation changing the conventional zones bitmap
of a disk does not cause invalid memory references when using the
disk_zone_is_conv() helper by RCU protecting the disk->conv_zones_bitmap
pointer.
disk_zone_is_conv() is modified to operate under the RCU read lock and
the function disk_set_conv_zones_bitmap() is added to update a disk
conv_zones_bitmap pointer using rcu_replace_pointer() with the disk
zone_wplugs_lock spinlock held.
disk_free_zone_resources() is modified to call
disk_update_zone_resources() with a NULL bitmap pointer to free the disk
conv_zones_bitmap. disk_set_conv_zones_bitmap() is also used in
disk_update_zone_resources() to set the new (revalidated) bitmap and
free the old one.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20241107064300.227731-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 6a78699838a0ddeed3620ddf50c1521f1fe1e811 upstream.
commit f1be1788a32e ("block: model freeze & enter queue as lock for
supporting lockdep") tries to apply lockdep for verifying freeze &
unfreeze. However, the verification is only done for the outermost freeze
and unfreeze. This is actually not correct because q->mq_freeze_depth
may still drop to zero on another task instead of the freeze owner task.
Fix this issue by always verifying the last unfreeze lock on the owner
task context, and make sure both the outermost freeze & unfreeze are
verified in the current task.
Fixes: f1be1788a32e ("block: model freeze & enter queue as lock for supporting lockdep")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241031133723.303835-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f1be1788a32e8fa63416ad4518bbd1a85a825c9d ]
Recently we got several deadlock report[1][2][3] caused by
blk_mq_freeze_queue and blk_enter_queue().
Turns out the two are just like acquiring read/write lock, so model them
as read/write lock for supporting lockdep:
1) model q->q_usage_counter as two locks (io and queue lock)
- queue lock covers sync with blk_enter_queue()
- io lock covers sync with bio_enter_queue()
2) make the lockdep class/key as per-queue:
- different subsystem has very different lock use pattern, shared lock
class causes false positive easily
- freeze_queue degrades to no lock in case that disk state becomes DEAD
because bio_enter_queue() won't be blocked any more
- freeze_queue degrades to no lock in case that request queue becomes dying
because blk_enter_queue() won't be blocked any more
3) model blk_mq_freeze_queue() as acquire_exclusive & try_lock
- it is exclusive lock, so dependency with blk_enter_queue() is covered
- it is trylock because blk_mq_freeze_queue() are allowed to run
concurrently
4) model blk_enter_queue() & bio_enter_queue() as acquire_read()
- nested blk_enter_queue() are allowed
- dependency with blk_mq_freeze_queue() is covered
- blk_queue_exit() is often called from other contexts (such as irq), and
it can't be annotated as lock_release(), so simply do it in
blk_enter_queue(), this way still covered cases as many as possible
With lockdep support, such kind of reports may be reported asap and
needn't wait until the real deadlock is triggered.
For example, lockdep report can be triggered in the report[3] with this
patch applied.
[1] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
https://bugzilla.kernel.org/show_bug.cgi?id=219166
[2] del_gendisk() vs blk_queue_enter() race condition
https://lore.kernel.org/linux-block/20241003085610.GK11458@google.com/
[3] queue_freeze & queue_enter deadlock in scsi
https://lore.kernel.org/linux-block/ZxG38G9BuFdBpBHZ@fedora/T/#u
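The read/write-lock model described in points 1) to 4) can be sketched as a toy counter-based model: blk_enter_queue() behaves like a shared (read) lock that nests, and blk_mq_freeze_queue() like an exclusive trylock that must wait while any reader is inside. This illustrates the lockdep model only; all names are illustrative:

```c
#include <stdbool.h>

/* Toy model of the q_usage_counter locking rules: shared, nestable
 * "enter" vs. exclusive "freeze". Not the kernel code. */
static int model_readers;
static bool model_frozen;

static bool model_enter_queue(void)
{
	if (model_frozen)
		return false;	/* would block until unfreeze */
	model_readers++;	/* nested enters are allowed */
	return true;
}

static void model_exit_queue(void)
{
	model_readers--;
}

static bool model_try_freeze_queue(void)
{
	if (model_readers)
		return false;	/* exclusive: must wait for all readers */
	model_frozen = true;
	return true;
}
```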
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241025003722.3630252-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 3802f73bd807 ("block: fix uaf for flush rq while iterating tags")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Merge in 6.11 final to get the fix for preventing deadlocks on an
elevator switch, as there's a fixup for that patch.
* tag 'v6.11': (1788 commits)
Linux 6.11
Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
pinctrl: pinctrl-cy8c95x0: Fix regcache
cifs: Fix signature miscalculation
mm: avoid leaving partial pfn mappings around in error case
drm/xe/client: add missing bo locking in show_meminfo()
drm/xe/client: fix deadlock in show_meminfo()
drm/xe/oa: Enable Xe2+ PES disaggregation
drm/xe/display: fix compat IS_DISPLAY_STEP() range end
drm/xe: Fix access_ok check in user_fence_create
drm/xe: Fix possible UAF in guc_exec_queue_process_msg
drm/xe: Remove fence check from send_tlb_invalidation
drm/xe/gt: Remove double include
net: netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()
PCI: Fix potential deadlock in pcim_intx()
workqueue: Clear worker->pool in the worker thread context
net: tighten bad gso csum offset check in virtio_net_hdr
netlink: specs: mptcp: fix port endianness
net: dpaa: Pad packets to ETH_ZLEN
mptcp: pm: Fix uaf in __timer_delete_sync
...
Pull more block updates from Jens Axboe:
- MD fixes via Song:
- md-cluster fixes (Heming Zhao)
- raid1 fix (Mateusz Jończyk)
- s390/dasd module description (Jeff)
- Series cleaning up and hardening the blk-mq debugfs flag handling
(John, Christoph)
- blk-cgroup cleanup (Xiu)
- Error polled IO attempts if backend doesn't support it (hexue)
- Fix for an sbitmap hang (Yang)
* tag 'for-6.11/block-20240722' of git://git.kernel.dk/linux: (23 commits)
blk-cgroup: move congestion_count to struct blkcg
sbitmap: fix io hung due to race on sbitmap_word::cleared
block: avoid polling configuration errors
block: Catch possible entries missing from rqf_name[]
block: Simplify definition of RQF_NAME()
block: Use enum to define RQF_x bit indexes
block: Catch possible entries missing from cmd_flag_name[]
block: Catch possible entries missing from alloc_policy_name[]
block: Catch possible entries missing from hctx_flag_name[]
block: Catch possible entries missing from hctx_state_name[]
block: Catch possible entries missing from blk_queue_flag_name[]
block: Make QUEUE_FLAG_x as an enum
block: Relocate BLK_MQ_MAX_DEPTH
block: Relocate BLK_MQ_CPU_WORK_BATCH
block: remove QUEUE_FLAG_STOPPED
block: Add missing entry to hctx_flag_name[]
block: Add zone write plugging entry to rqf_name[]
block: Add missing entries from cmd_flag_name[]
s390/dasd: fix error checks in dasd_copy_pair_store()
s390/dasd: add missing MODULE_DESCRIPTION() macros
...
Pull block updates from Jens Axboe:
- NVMe updates via Keith:
- Device initialization memory leak fixes (Keith)
- More constants defined (Weiwen)
- Target debugfs support (Hannes)
- PCIe subsystem reset enhancements (Keith)
- Queue-depth multipath policy (Redhat and PureStorage)
- Implement get_unique_id (Christoph)
- Authentication error fixes (Gaosheng)
- MD updates via Song
- sync_action fix and refactoring (Yu Kuai)
- Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu
Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li)
- Fix loop detach/open race (Gulam)
- Fix lower control limit for blk-throttle (Yu)
- Add module descriptions to various drivers (Jeff)
- Add support for atomic writes for block devices, and statx reporting
for same. Includes SCSI and NVMe (John, Prasad, Alan)
- Add IO priority information to block trace points (Dongliang)
- Various zone improvements and tweaks (Damien)
- mq-deadline tag reservation improvements (Bart)
- Ignore direct reclaim swap writes in writeback throttling (Baokun)
- Block integrity improvements and fixes (Anuj)
- Add basic support for rust based block drivers. Has a dummy null_blk
variant for now (Andreas)
- Series converting driver settings to queue limits, and cleanups and
fixes related to that (Christoph)
- Cleanup for poking too deeply into the bvec internals, in preparation
for DMA mapping API changes (Christoph)
- Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas,
Ming, Zhu, Damien, Christophe, Chaitanya)
* tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits)
floppy: add missing MODULE_DESCRIPTION() macro
loop: add missing MODULE_DESCRIPTION() macro
ublk_drv: add missing MODULE_DESCRIPTION() macro
xen/blkback: add missing MODULE_DESCRIPTION() macro
block/rnbd: Constify struct kobj_type
block: take offset into account in blk_bvec_map_sg again
block: fix get_max_segment_size() warning
loop: Don't bother validating blocksize
virtio_blk: Don't bother validating blocksize
null_blk: Don't bother validating blocksize
block: Validate logical block size in blk_validate_limits()
virtio_blk: Fix default logical block size fallback
nvmet-auth: fix nvmet_auth hash error handling
nvme: implement ->get_unique_id
block: pass a phys_addr_t to get_max_segment_size
block: add a bvec_phys helper
blk-lib: check for kill signal in ioctl BLKZEROOUT
block: limit the Write Zeroes to manually writing zeroes fallback
block: refactor blkdev_issue_zeroout
block: move read-only and supported checks into (__)blkdev_issue_zeroout
...
Some drivers validate their own logical block size. There is no harm in
always doing this, so validate it in blk_validate_limits().
This allows us to remove the validation in most of those drivers.
Add a comment to blk_validate_block_size() to inform users that self-
validation of LBS is usually unnecessary.
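The centralized check can be sketched as follows: a logical block size must be a power-of-2 between 512 bytes and the page size, mirroring what blk_validate_block_size() enforces. PAGE_SIZE is assumed to be 4096 here for illustration:

```c
#include <stdbool.h>

/* Sketch of the logical block size validation centralized in
 * blk_validate_limits(); PAGE_SIZE is assumed 4096 in this model. */
#define MODEL_PAGE_SIZE 4096u

static bool model_valid_logical_block_size(unsigned int bsize)
{
	return bsize >= 512 && bsize <= MODEL_PAGE_SIZE &&
	       (bsize & (bsize - 1)) == 0;	/* power-of-2 check */
}
```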
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240708091651.177447-3-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zeroout can access a significant capacity and take longer than the user
expected. A user may change their mind about wanting to run that
command and attempt to kill the process and do something else with their
device. But since the task is uninterruptible, they have to wait for it
to finish, which could be many hours.
Add a new BLKDEV_ZERO_KILLABLE flag for blkdev_issue_zeroout that checks
for a fatal signal at each iteration so the user doesn't have to wait for
their regretted operation to complete naturally.
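The per-iteration check can be sketched as below; the signal predicate (fatal_signal_pending() in the kernel) and the chunk submission are stubbed for illustration, and the error code is an assumption:

```c
#include <errno.h>
#include <stdbool.h>

/* Sketch: bail out of the zeroout loop as soon as a fatal signal is
 * pending, instead of completing the whole range. */
static int model_zeroout(unsigned long nr_chunks,
			 bool (*fatal_signal_pending)(void))
{
	for (unsigned long i = 0; i < nr_chunks; i++) {
		if (fatal_signal_pending())
			return -EINTR;
		/* submit one chunk of zeroes here */
	}
	return 0;
}

static bool no_signal(void)   { return false; }
static bool signal_now(void)  { return true; }
```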
Heavily based on an earlier patch from Keith Busch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240701165219.1571322-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
dma_pad_mask is a queue_limits by all ways of looking at it, so move it
there and set it through the atomic queue limits APIs.
Add a little helper that takes the alignment and pad into account to
simplify the code that is touched a bit.
Note that there never was any need for the > check in
blk_queue_update_dma_pad; this probably was just copy and paste from
blk_queue_update_dma_alignment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240626142637.300624-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no need to conditionally define on CONFIG_BLK_DEV_ZONED the
inline helper functions bdev_nr_zones(), bdev_max_open_zones(),
bdev_max_active_zones() and disk_zone_no(), as these functions will return
the correct value in all cases (zoned device or not, including when
CONFIG_BLK_DEV_ZONED is not set). Furthermore, the disk_nr_zones()
definition can be simplified as disk->nr_zones is always 0 for regular
block devices.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240621031506.759397-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no need for bdev_nr_zones() to be an exported function
calculating the number of zones of a block device. Instead, given that
all callers use this helper with a fully initialized block device that
has a gendisk, we can redefine this function as an inline helper in
blkdev.h.
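What such an inline helper computes from a fully initialized disk can be modeled in userspace as below; the struct fields and helper name are illustrative assumptions, not the blkdev.h definition:

```c
/* Userspace model of a bdev_nr_zones()-style helper: the number of
 * zones of a device, rounding up so a partial last zone counts. */
struct model_disk {
	unsigned long long nr_sectors;
	unsigned int zone_sectors;	/* 0 for a regular device */
};

static unsigned int model_bdev_nr_zones(const struct model_disk *d)
{
	if (!d->zone_sectors)
		return 0;	/* not a zoned device */
	return (d->nr_sectors + d->zone_sectors - 1) / d->zone_sectors;
}
```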
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240621031506.759397-3-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add atomic write support, as follows:
- add helper functions to get request_queue atomic write limits
- report request_queue atomic write support limits to sysfs and update Doc
- support to safely merge atomic writes
- deal with splitting atomic writes
- misc helper functions
- add a per-request atomic write flag
New request_queue limits are added, as follows:
- atomic_write_hw_max is set by the block driver and is the maximum length
of an atomic write which the device may support. It is not
necessarily a power-of-2.
- atomic_write_max_sectors is derived from atomic_write_hw_max_sectors and
max_hw_sectors. It is always a power-of-2. Atomic writes may be merged,
and atomic_write_max_sectors would be the limit on a merged atomic write
request size. This value is not capped at max_sectors, as the value in
max_sectors can be controlled from userspace, and it would only cause
trouble if userspace could limit atomic_write_unit_max_bytes and the
other atomic write limits.
- atomic_write_hw_unit_{min,max} are set by the block driver and are the
min/max length of an atomic write unit which the device may support. They
both must be a power-of-2. Typically atomic_write_hw_unit_max will hold
the same value as atomic_write_hw_max.
- atomic_write_unit_{min,max} are derived from
atomic_write_hw_unit_{min,max}, max_hw_sectors, and block core limits.
Both min and max values must be a power-of-2.
- atomic_write_hw_boundary is set by the block driver. If non-zero, it
indicates an LBA space boundary; an atomic write that straddles this
boundary is no longer executed atomically by the disk. The value must be
a power-of-2. Note that it would be acceptable to enforce a rule that
atomic_write_hw_boundary_sectors is a multiple of
atomic_write_hw_unit_max, but the resultant code would be more
complicated.
All atomic write limits are set to 0 by default to indicate no atomic write
support. Even though it is assumed by Linux that a logical block can always
be atomically written, we ignore this as it is not of particular interest.
Stacked devices are just not supported either for now.
An atomic write must always be submitted to the block driver as part of a
single request. As such, only a single BIO must be submitted to the block
layer for an atomic write. When a single atomic write BIO is submitted, it
cannot be split. As such, atomic_write_unit_{max, min}_bytes are limited
by the maximum guaranteed BIO size which will not be required to be split.
This max size is calculated by request_queue max segments and the number
of bvecs a BIO can fit, BIO_MAX_VECS. Currently we rely on userspace
issuing a write with iovcnt=1 for pwritev2() - as such, we can rely on each
segment containing PAGE_SIZE of data, apart from the first+last, which each
can fit logical block size of data. The first+last will be LBS
length/aligned as we rely on direct IO alignment rules also.
New sysfs files are added to report the following atomic write limits:
- atomic_write_unit_max_bytes - same as atomic_write_unit_max_sectors in
bytes
- atomic_write_unit_min_bytes - same as atomic_write_unit_min_sectors in
bytes
- atomic_write_boundary_bytes - same as atomic_write_hw_boundary_sectors in
bytes
- atomic_write_max_bytes - same as atomic_write_max_sectors in bytes
Atomic writes may only be merged with other atomic writes and only under
the following conditions:
- total resultant request length <= atomic_write_max_bytes
- the merged write does not straddle a boundary
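These two merge conditions can be sketched as a pure byte-offset check; the parameter names model the queue limits (max bytes and a power-of-2 boundary) rather than kernel structs:

```c
#include <stdbool.h>

/* Sketch of the atomic-write merge rules: the merged request must fit
 * in atomic_write_max_bytes and must not straddle the hardware
 * boundary (when one is set). Offsets and lengths are in bytes. */
static bool model_atomic_merge_ok(unsigned long long start,
				  unsigned long long len,
				  unsigned long long max_bytes,
				  unsigned long long boundary)
{
	if (len == 0 || len > max_bytes)
		return false;
	/* first and last byte must fall in the same boundary window */
	if (boundary &&
	    (start / boundary) != ((start + len - 1) / boundary))
		return false;
	return true;
}
```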
Helper function bdev_can_atomic_write() is added to indicate whether
atomic writes may be issued to a bdev. If a bdev is a partition, the
partition start must be aligned with both atomic_write_unit_min_sectors
and atomic_write_hw_boundary_sectors.
FSes will rely on the block layer to validate that an atomic write BIO
submitted will be of valid size, so add blk_validate_atomic_write_op_size()
for this purpose. Userspace expects an atomic write which is of invalid
size to be rejected with -EINVAL, so add BLK_STS_INVAL for this. Also use
BLK_STS_INVAL for when a BIO needs to be split, as this should mean an
invalid size BIO.
Flag REQ_ATOMIC is used for indicating an atomic write.
Co-developed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20240620125359.2684798-6-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge in queue limits cleanups.
* for-6.11/block-limits:
block: move the raid_partial_stripes_expensive flag into the features field
block: remove the discard_alignment flag
block: move the misaligned flag into the features field
block: renumber and rename the cache disabled flag
block: fix spelling and grammar for in writeback_cache_control.rst
block: remove the unused blk_bounce enum