From e2768b798a197318736f00c506633cb78ff77012 Mon Sep 17 00:00:00 2001 From: Ryan Roberts Date: Mon, 27 Nov 2023 11:17:26 +0000 Subject: [PATCH 01/87] arm64/mm: Modify range-based tlbi to decrement scale In preparation for adding support for LPA2 to the tlb invalidation routines, modify the algorithm used by range-based tlbi to start at the highest 'scale' and decrement instead of starting at the lowest 'scale' and incrementing. This new approach makes it possible to maintain 64K alignment as we work through the range, until the last op (at scale=0). This is required when LPA2 is enabled. (This part will be added in a subsequent commit). This change is separated into its own patch because it will also impact non-LPA2 systems, and I want to make it easy to bisect in case it leads to performance regression (see below for benchmarks that suggest this should not be a problem). The original commit (d1d3aa98 "arm64: tlb: Use the TLBI RANGE feature in arm64") stated this as the reason for _incrementing_ scale: However, in most scenarios, the pages = 1 when flush_tlb_range() is called. Start from scale = 3 or other proper value (such as scale =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0 to maximum. But pages=1 is already special cased by the non-range invalidation path, which will take care of it the first time through the loop (both in the original commit and in my change), so I don't think switching to decrement scale should have any extra performance impact after all. Indeed benchmarking kernel compilation, a TLBI-heavy workload, suggests that this new approach actually _improves_ performance slightly (using a virtual machine on Apple M2): Table shows time to execute kernel compilation workload with 8 jobs, relative to baseline without this patch (more negative number is bigger speedup). Repeated 9 times across 3 system reboots: | counter | mean | stdev | |:----------|-----------:|----------:| | real-time | -0.6% | 0.0% | | kern-time | -1.6% | 0.5% | | user-time | -0.4% | 0.1% | Reviewed-by: Oliver Upton Signed-off-by: Ryan Roberts Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20231127111737.1897081-2-ryan.roberts@arm.com --- arch/arm64/include/asm/tlbflush.h | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index bb2c2833a987..36acdb3d16a5 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -350,14 +350,14 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) * entries one by one at the granularity of 'stride'. If the TLB * range ops are supported, then: * - * 1. If 'pages' is odd, flush the first page through non-range - * operations; + * 1. The minimum range granularity is decided by 'scale', so multiple range + * TLBI operations may be required. Start from scale = 3, flush the largest + * possible number of pages ((num+1)*2^(5*scale+1)) that fit into the + * requested range, then decrement scale and continue until one or zero pages + * are left. * - * 2. For remaining pages: the minimum range granularity is decided - * by 'scale', so multiple range TLBI operations may be required. - * Start from scale = 0, flush the corresponding number of pages - * ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it - * until no pages left. + * 2. If there is 1 page remaining, flush it through non-range operations. Range + * operations can only span an even number of pages. * * Note that certain ranges can be represented by either num = 31 and * scale or num = 0 and scale + 1. The loop below favours the latter @@ -367,12 +367,12 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) asid, tlb_level, tlbi_user) \ do { \ int num = 0; \ - int scale = 0; \ + int scale = 3; \ unsigned long addr; \ \ while (pages > 0) { \ if (!system_supports_tlb_range() || \ - pages % 2 == 1) { \ + pages == 1) { \ addr = __TLBI_VADDR(start, asid); \ __tlbi_level(op, addr, tlb_level); \ if (tlbi_user) \ @@ -392,7 +392,7 @@ do { \ start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \ pages -= __TLBI_RANGE_PAGES(num, scale); \ } \ - scale++; \ + scale--; \ } \ } while (0) From 936a4ec28141fa9369b6af4d6401f0be9f7e304c Mon Sep 17 00:00:00 2001 From: Ryan Roberts Date: Mon, 27 Nov 2023 11:17:27 +0000 Subject: [PATCH 02/87] arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs Add stub functions which is initially always return false. These provide the hooks that we need to update the range-based TLBI routines, whose operands are encoded differently depending on whether lpa2 is enabled or not. The kernel and kvm will enable the use of lpa2 asynchronously in future, and part of that enablement will involve fleshing out their respective hook to advertise when it is using lpa2. Since the kernel's decision to use lpa2 relies on more than just whether the HW supports the feature, it can't just use the same static key as kvm. This is another reason to use separate functions. lpa2_is_enabled() is already implemented as part of Ard's kernel lpa2 series. Since kvm will make its decision solely based on HW support, kvm_lpa2_is_enabled() will be defined as system_supports_lpa2() once kvm starts using lpa2. Reviewed-by: Oliver Upton Signed-off-by: Ryan Roberts Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20231127111737.1897081-3-ryan.roberts@arm.com --- arch/arm64/include/asm/kvm_pgtable.h | 2 ++ arch/arm64/include/asm/pgtable-prot.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index d3e354bb8351..10068500d601 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -25,6 +25,8 @@ #define KVM_PGTABLE_MIN_BLOCK_LEVEL 2U #endif +#define kvm_lpa2_is_enabled() false + static inline u64 kvm_get_parange(u64 mmfr0) { u64 parange = cpuid_feature_extract_unsigned_field(mmfr0, diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index e9624f6326dd..483dbfa39c4c 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -71,6 +71,8 @@ extern bool arm64_use_ng_mappings; #define PTE_MAYBE_NG (arm64_use_ng_mappings ? PTE_NG : 0) #define PMD_MAYBE_NG (arm64_use_ng_mappings ? PMD_SECT_NG : 0) +#define lpa2_is_enabled() false + /* * If we have userspace only BTI we don't want to mark kernel pages * guarded even if the system does support BTI. From c910f2b65518538b5072cb51760c8ef749e455d0 Mon Sep 17 00:00:00 2001 From: Ryan Roberts Date: Mon, 27 Nov 2023 11:17:28 +0000 Subject: [PATCH 03/87] arm64/mm: Update tlb invalidation routines for FEAT_LPA2 FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in the non-range tlbi instructions can now validly take a 0 value as a level hint for the 4KB granule (this is due to the extra level of translation) - previously TTL=0b0100 meant no hint and was treated as 0b0000. Secondly, The BADDR field of the range-based tlbi instructions is specified in 64KB units when LPA2 is in use (TCR.DS=1), whereas it is in page units otherwise. Changes are required for tlbi to continue to operate correctly when LPA2 is in use. Solve the first problem by always adding the level hint if the level is between [0, 3] (previously anything other than 0 was hinted, which breaks in the new level -1 case from kvm). When running on non-LPA2 HW, 0 is still safe to hint as the HW will fall back to non-hinted. While we are at it, we replace the notion of 0 being the non-hinted sentinel with a macro, TLBI_TTL_UNKNOWN. This means callers won't need updating if/when translation depth increases in future. The second issue is more complex: When LPA2 is in use, use the non-range tlbi instructions to forward align to a 64KB boundary first, then use range-based tlbi from there on, until we have either invalidated all pages or we have a single page remaining. If the latter, that is done with non-range tlbi. We determine whether LPA2 is in use based on lpa2_is_enabled() (for kernel calls) or kvm_lpa2_is_enabled() (for kvm calls). Reviewed-by: Catalin Marinas Reviewed-by: Oliver Upton Signed-off-by: Ryan Roberts Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20231127111737.1897081-4-ryan.roberts@arm.com --- arch/arm64/include/asm/tlb.h | 15 ++++-- arch/arm64/include/asm/tlbflush.h | 90 ++++++++++++++++++++----------- 2 files changed, 68 insertions(+), 37 deletions(-) diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h index 846c563689a8..0150deb332af 100644 --- a/arch/arm64/include/asm/tlb.h +++ b/arch/arm64/include/asm/tlb.h @@ -22,15 +22,15 @@ static void tlb_flush(struct mmu_gather *tlb); #include /* - * get the tlbi levels in arm64. Default value is 0 if more than one - * of cleared_* is set or neither is set. - * Arm64 doesn't support p4ds now. + * get the tlbi levels in arm64. Default value is TLBI_TTL_UNKNOWN if more than + * one of cleared_* is set or neither is set - this elides the level hinting to + * the hardware. */ static inline int tlb_get_level(struct mmu_gather *tlb) { /* The TTL field is only valid for the leaf entry. */ if (tlb->freed_tables) - return 0; + return TLBI_TTL_UNKNOWN; if (tlb->cleared_ptes && !(tlb->cleared_pmds || tlb->cleared_puds || @@ -47,7 +47,12 @@ static inline int tlb_get_level(struct mmu_gather *tlb) tlb->cleared_p4ds)) return 1; - return 0; + if (tlb->cleared_p4ds && !(tlb->cleared_ptes || + tlb->cleared_pmds || + tlb->cleared_puds)) + return 0; + + return TLBI_TTL_UNKNOWN; } static inline void tlb_flush(struct mmu_gather *tlb) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index 36acdb3d16a5..1deb5d789c2e 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -94,19 +94,22 @@ static inline unsigned long get_trans_granule(void) * When ARMv8.4-TTL exists, TLBI operations take an additional hint for * the level at which the invalidation must take place. If the level is * wrong, no invalidation may take place. In the case where the level - * cannot be easily determined, a 0 value for the level parameter will - * perform a non-hinted invalidation. + * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform + * a non-hinted invalidation. Any provided level outside the hint range + * will also cause fall-back to non-hinted invalidation. * * For Stage-2 invalidation, use the level values provided to that effect * in asm/stage2_pgtable.h. */ #define TLBI_TTL_MASK GENMASK_ULL(47, 44) +#define TLBI_TTL_UNKNOWN INT_MAX + #define __tlbi_level(op, addr, level) do { \ u64 arg = addr; \ \ if (alternative_has_cap_unlikely(ARM64_HAS_ARMv8_4_TTL) && \ - level) { \ + level >= 0 && level <= 3) { \ u64 ttl = level & 3; \ ttl |= get_trans_granule() << 2; \ arg &= ~TLBI_TTL_MASK; \ @@ -122,28 +125,34 @@ static inline unsigned long get_trans_granule(void) } while (0) /* - * This macro creates a properly formatted VA operand for the TLB RANGE. - * The value bit assignments are: + * This macro creates a properly formatted VA operand for the TLB RANGE. The + * value bit assignments are: * * +----------+------+-------+-------+-------+----------------------+ * | ASID | TG | SCALE | NUM | TTL | BADDR | * +-----------------+-------+-------+-------+----------------------+ * |63 48|47 46|45 44|43 39|38 37|36 0| * - * The address range is determined by below formula: - * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE) + * The address range is determined by below formula: [BADDR, BADDR + (NUM + 1) * + * 2^(5*SCALE + 1) * PAGESIZE) + * + * Note that the first argument, baddr, is pre-shifted; If LPA2 is in use, BADDR + * holds addr[52:16]. Else BADDR holds page number. See for example ARM DDI + * 0487J.a section C5.5.60 "TLBI VAE1IS, TLBI VAE1ISNXS, TLB Invalidate by VA, + * EL1, Inner Shareable". * */ -#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl) \ - ({ \ - unsigned long __ta = (addr) >> PAGE_SHIFT; \ - __ta &= GENMASK_ULL(36, 0); \ - __ta |= (unsigned long)(ttl) << 37; \ - __ta |= (unsigned long)(num) << 39; \ - __ta |= (unsigned long)(scale) << 44; \ - __ta |= get_trans_granule() << 46; \ - __ta |= (unsigned long)(asid) << 48; \ - __ta; \ +#define __TLBI_VADDR_RANGE(baddr, asid, scale, num, ttl) \ + ({ \ + unsigned long __ta = (baddr); \ + unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0; \ + __ta &= GENMASK_ULL(36, 0); \ + __ta |= __ttl << 37; \ + __ta |= (unsigned long)(num) << 39; \ + __ta |= (unsigned long)(scale) << 44; \ + __ta |= get_trans_granule() << 46; \ + __ta |= (unsigned long)(asid) << 48; \ + __ta; \ }) /* These macros are used by the TLBI RANGE feature. */ @@ -216,12 +225,16 @@ static inline unsigned long get_trans_granule(void) * CPUs, ensuring that any walk-cache entries associated with the * translation are also invalidated. * - * __flush_tlb_range(vma, start, end, stride, last_level) + * __flush_tlb_range(vma, start, end, stride, last_level, tlb_level) * Invalidate the virtual-address range '[start, end)' on all * CPUs for the user address space corresponding to 'vma->mm'. * The invalidation operations are issued at a granularity * determined by 'stride' and only affect any walk-cache entries - * if 'last_level' is equal to false. + * if 'last_level' is equal to false. tlb_level is the level at + * which the invalidation must take place. If the level is wrong, + * no invalidation may take place. In the case where the level + * cannot be easily determined, the value TLBI_TTL_UNKNOWN will + * perform a non-hinted invalidation. * * * Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented @@ -345,34 +358,44 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) * @tlb_level: Translation Table level hint, if known * @tlbi_user: If 'true', call an additional __tlbi_user() * (typically for user ASIDs). 'flase' for IPA instructions + * @lpa2: If 'true', the lpa2 scheme is used as set out below * * When the CPU does not support TLB range operations, flush the TLB * entries one by one at the granularity of 'stride'. If the TLB * range ops are supported, then: * - * 1. The minimum range granularity is decided by 'scale', so multiple range + * 1. If FEAT_LPA2 is in use, the start address of a range operation must be + * 64KB aligned, so flush pages one by one until the alignment is reached + * using the non-range operations. This step is skipped if LPA2 is not in + * use. + * + * 2. The minimum range granularity is decided by 'scale', so multiple range * TLBI operations may be required. Start from scale = 3, flush the largest * possible number of pages ((num+1)*2^(5*scale+1)) that fit into the * requested range, then decrement scale and continue until one or zero pages - * are left. + * are left. We must start from highest scale to ensure 64KB start alignment + * is maintained in the LPA2 case. * - * 2. If there is 1 page remaining, flush it through non-range operations. Range - * operations can only span an even number of pages. + * 3. If there is 1 page remaining, flush it through non-range operations. Range + * operations can only span an even number of pages. We save this for last to + * ensure 64KB start alignment is maintained for the LPA2 case. * * Note that certain ranges can be represented by either num = 31 and * scale or num = 0 and scale + 1. The loop below favours the latter * since num is limited to 30 by the __TLBI_RANGE_NUM() macro. */ #define __flush_tlb_range_op(op, start, pages, stride, \ - asid, tlb_level, tlbi_user) \ + asid, tlb_level, tlbi_user, lpa2) \ do { \ int num = 0; \ int scale = 3; \ + int shift = lpa2 ? 16 : PAGE_SHIFT; \ unsigned long addr; \ \ while (pages > 0) { \ if (!system_supports_tlb_range() || \ - pages == 1) { \ + pages == 1 || \ + (lpa2 && start != ALIGN(start, SZ_64K))) { \ addr = __TLBI_VADDR(start, asid); \ __tlbi_level(op, addr, tlb_level); \ if (tlbi_user) \ @@ -384,8 +407,8 @@ do { \ \ num = __TLBI_RANGE_NUM(pages, scale); \ if (num >= 0) { \ - addr = __TLBI_VADDR_RANGE(start, asid, scale, \ - num, tlb_level); \ + addr = __TLBI_VADDR_RANGE(start >> shift, asid, \ + scale, num, tlb_level); \ __tlbi(r##op, addr); \ if (tlbi_user) \ __tlbi_user(r##op, addr); \ @@ -397,7 +420,7 @@ do { \ } while (0) #define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \ - __flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false) + __flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false, kvm_lpa2_is_enabled()); static inline void __flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end, @@ -427,9 +450,11 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma, asid = ASID(vma->vm_mm); if (last_level) - __flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true); + __flush_tlb_range_op(vale1is, start, pages, stride, asid, + tlb_level, true, lpa2_is_enabled()); else - __flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true); + __flush_tlb_range_op(vae1is, start, pages, stride, asid, + tlb_level, true, lpa2_is_enabled()); dsb(ish); mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end); @@ -441,9 +466,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, /* * We cannot use leaf-only invalidation here, since we may be invalidating * table entries as part of collapsing hugepages or moving page tables. - * Set the tlb_level to 0 because we can not get enough information here. + * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough + * information here. */ - __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0); + __flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN); } static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end) From e477c8c483913de92c9cc00b34459dc4d695529b Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 27 Nov 2023 11:17:29 +0000 Subject: [PATCH 04/87] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] PAGE_SIZE support is tested against possible minimum and maximum values for its respective ID_AA64MMFR0.TGRAN field, depending on whether it is signed or unsigned. But then FEAT_LPA2 implementation needs to be validated for 4K and 16K page sizes via feature specific ID_AA64MMFR0.TGRAN values. Hence it adds FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] values per ARM ARM (0487G.A). Acked-by: Catalin Marinas Signed-off-by: Anshuman Khandual Reviewed-by: Oliver Upton Signed-off-by: Ryan Roberts Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20231127111737.1897081-5-ryan.roberts@arm.com --- arch/arm64/include/asm/sysreg.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 5e65f51c10d2..48181cf6cc40 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -871,10 +871,12 @@ /* id_aa64mmfr0 */ #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN 0x0 +#define ID_AA64MMFR0_EL1_TGRAN4_LPA2 ID_AA64MMFR0_EL1_TGRAN4_52_BIT #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX 0x7 #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MIN 0x0 #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MAX 0x7 #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN 0x1 +#define ID_AA64MMFR0_EL1_TGRAN16_LPA2 ID_AA64MMFR0_EL1_TGRAN16_52_BIT #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX 0xf #define ARM64_MIN_PARANGE_BITS 32 @@ -882,6 +884,7 @@ #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_DEFAULT 0x0 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_NONE 0x1 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MIN 0x2 +#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2 0x3 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MAX 0x7 #ifdef CONFIG_ARM64_PA_BITS_52 @@ -892,11 +895,13 @@ #if defined(CONFIG_ARM64_4K_PAGES) #define ID_AA64MMFR0_EL1_TGRAN_SHIFT ID_AA64MMFR0_EL1_TGRAN4_SHIFT +#define ID_AA64MMFR0_EL1_TGRAN_LPA2 ID_AA64MMFR0_EL1_TGRAN4_52_BIT #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT #elif defined(CONFIG_ARM64_16K_PAGES) #define ID_AA64MMFR0_EL1_TGRAN_SHIFT ID_AA64MMFR0_EL1_TGRAN16_SHIFT +#define ID_AA64MMFR0_EL1_TGRAN_LPA2 ID_AA64MMFR0_EL1_TGRAN16_52_BIT #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT ID_AA64MMFR0_EL1_TGRAN16_2_SHIFT From b1366d21daaebb8e474e4169c5e557fbb37bfdc0 Mon Sep 17 00:00:00 2001 From: Ryan Roberts Date: Mon, 27 Nov 2023 11:17:30 +0000 Subject: [PATCH 05/87] arm64: Add ARM64_HAS_LPA2 CPU capability Expose FEAT_LPA2 as a capability so that we can take advantage of alternatives patching in the hypervisor. Although FEAT_LPA2 presence is advertised separately for stage1 and stage2, the expectation is that in practice both stages will either support or not support it. Therefore, we combine both into a single capability, allowing us to simplify the implementation. KVM requires support in both stages in order to use LPA2 since the same library is used for hyp stage 1 and guest stage 2 pgtables. Reviewed-by: Oliver Upton Signed-off-by: Ryan Roberts Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20231127111737.1897081-6-ryan.roberts@arm.com --- arch/arm64/include/asm/cpufeature.h | 5 ++++ arch/arm64/kernel/cpufeature.c | 39 +++++++++++++++++++++++++++++ arch/arm64/tools/cpucaps | 1 + 3 files changed, 45 insertions(+) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index f6d416fe49b0..acf109581ac0 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -819,6 +819,11 @@ static inline bool system_supports_tlb_range(void) return alternative_has_cap_unlikely(ARM64_HAS_TLB_RANGE); } +static inline bool system_supports_lpa2(void) +{ + return cpus_have_final_cap(ARM64_HAS_LPA2); +} + int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt); bool try_emulate_mrs(struct pt_regs *regs, u32 isn); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 646591c67e7a..7900ba7e157e 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1768,6 +1768,39 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry, return !meltdown_safe; } +#if defined(ID_AA64MMFR0_EL1_TGRAN_LPA2) && defined(ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2) +static bool has_lpa2_at_stage1(u64 mmfr0) +{ + unsigned int tgran; + + tgran = cpuid_feature_extract_unsigned_field(mmfr0, + ID_AA64MMFR0_EL1_TGRAN_SHIFT); + return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2; +} + +static bool has_lpa2_at_stage2(u64 mmfr0) +{ + unsigned int tgran; + + tgran = cpuid_feature_extract_unsigned_field(mmfr0, + ID_AA64MMFR0_EL1_TGRAN_2_SHIFT); + return tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2; +} + +static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope) +{ + u64 mmfr0; + + mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1); + return has_lpa2_at_stage1(mmfr0) && has_lpa2_at_stage2(mmfr0); +} +#else +static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope) +{ + return false; +} +#endif + #ifdef CONFIG_UNMAP_KERNEL_AT_EL0 #define KPTI_NG_TEMP_VA (-(1UL << PMD_SHIFT)) @@ -2731,6 +2764,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP) }, + { + .desc = "52-bit Virtual Addressing for KVM (LPA2)", + .capability = ARM64_HAS_LPA2, + .type = ARM64_CPUCAP_SYSTEM_FEATURE, + .matches = has_lpa2, + }, {}, }; diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps index b98c38288a9d..919eceb0b3da 100644 --- a/arch/arm64/tools/cpucaps +++ b/arch/arm64/tools/cpucaps @@ -37,6 +37,7 @@ HAS_GIC_PRIO_MASKING HAS_GIC_PRIO_RELAXED_SYNC HAS_HCX HAS_LDAPR +HAS_LPA2 HAS_LSE_ATOMICS HAS_MOPS HAS_NESTED_VIRT From ced242ba9d7cb3571f6e0f165f643cb832d52148 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Mon, 4 Dec 2023 14:36:04 +0000 Subject: [PATCH 06/87] KVM: arm64: Remove VPIPT I-cache handling We have some special handling for VPIPT I-cache in critical parts of the cache and TLB maintenance. Remove it. Reviewed-by: Zenghui Yu Reviewed-by: Anshuman Khandual Signed-off-by: Marc Zyngier Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20231204143606.1806432-2-maz@kernel.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/kvm_mmu.h | 7 ---- arch/arm64/kvm/hyp/nvhe/pkvm.c | 2 +- arch/arm64/kvm/hyp/nvhe/tlb.c | 61 -------------------------------- arch/arm64/kvm/hyp/vhe/tlb.c | 13 ------- 4 files changed, 1 insertion(+), 82 deletions(-) diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 49e0d4b36bd0..e3e793d0ec30 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -243,13 +243,6 @@ static inline size_t __invalidate_icache_max_range(void) static inline void __invalidate_icache_guest_page(void *va, size_t size) { - /* - * VPIPT I-cache maintenance must be done from EL2. See comment in the - * nVHE flavor of __kvm_tlb_flush_vmid_ipa(). - */ - if (icache_is_vpipt() && read_sysreg(CurrentEL) != CurrentEL_EL2) - return; - /* * Blow the whole I-cache if it is aliasing (i.e. VIPT) or the * invalidation range exceeds our arbitrary limit on invadations by diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c index 9d23a51d7f75..b29f15418c0a 100644 --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c @@ -12,7 +12,7 @@ #include #include -/* Used by icache_is_vpipt(). */ +/* Used by icache_is_aliasing(). */ unsigned long __icache_flags; /* Used by kvm_get_vttbr(). */ diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c index 1b265713d6be..a60fb13e2192 100644 --- a/arch/arm64/kvm/hyp/nvhe/tlb.c +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c @@ -105,28 +105,6 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, dsb(ish); isb(); - /* - * If the host is running at EL1 and we have a VPIPT I-cache, - * then we must perform I-cache maintenance at EL2 in order for - * it to have an effect on the guest. Since the guest cannot hit - * I-cache lines allocated with a different VMID, we don't need - * to worry about junk out of guest reset (we nuke the I-cache on - * VMID rollover), but we do need to be careful when remapping - * executable pages for the same guest. This can happen when KSM - * takes a CoW fault on an executable page, copies the page into - * a page that was previously mapped in the guest and then needs - * to invalidate the guest view of the I-cache for that page - * from EL1. To solve this, we invalidate the entire I-cache when - * unmapping a page from a guest if we have a VPIPT I-cache but - * the host is running at EL1. As above, we could do better if - * we had the VA. - * - * The moral of this story is: if you have a VPIPT I-cache, then - * you should be running with VHE enabled. - */ - if (icache_is_vpipt()) - icache_inval_all_pou(); - __tlb_switch_to_host(&cxt); } @@ -157,28 +135,6 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu, dsb(nsh); isb(); - /* - * If the host is running at EL1 and we have a VPIPT I-cache, - * then we must perform I-cache maintenance at EL2 in order for - * it to have an effect on the guest. Since the guest cannot hit - * I-cache lines allocated with a different VMID, we don't need - * to worry about junk out of guest reset (we nuke the I-cache on - * VMID rollover), but we do need to be careful when remapping - * executable pages for the same guest. This can happen when KSM - * takes a CoW fault on an executable page, copies the page into - * a page that was previously mapped in the guest and then needs - * to invalidate the guest view of the I-cache for that page - * from EL1. To solve this, we invalidate the entire I-cache when - * unmapping a page from a guest if we have a VPIPT I-cache but - * the host is running at EL1. As above, we could do better if - * we had the VA. - * - * The moral of this story is: if you have a VPIPT I-cache, then - * you should be running with VHE enabled. - */ - if (icache_is_vpipt()) - icache_inval_all_pou(); - __tlb_switch_to_host(&cxt); } @@ -205,10 +161,6 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu, dsb(ish); isb(); - /* See the comment in __kvm_tlb_flush_vmid_ipa() */ - if (icache_is_vpipt()) - icache_inval_all_pou(); - __tlb_switch_to_host(&cxt); } @@ -246,18 +198,5 @@ void __kvm_flush_vm_context(void) /* Same remark as in __tlb_switch_to_guest() */ dsb(ish); __tlbi(alle1is); - - /* - * VIPT and PIPT caches are not affected by VMID, so no maintenance - * is necessary across a VMID rollover. - * - * VPIPT caches constrain lookup and maintenance to the active VMID, - * so we need to invalidate lines with a stale VMID to avoid an ABA - * race after multiple rollovers. - * - */ - if (icache_is_vpipt()) - asm volatile("ic ialluis"); - dsb(ish); } diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c index b636b4111dbf..b32e2940df7d 100644 --- a/arch/arm64/kvm/hyp/vhe/tlb.c +++ b/arch/arm64/kvm/hyp/vhe/tlb.c @@ -216,18 +216,5 @@ void __kvm_flush_vm_context(void) { dsb(ishst); __tlbi(alle1is); - - /* - * VIPT and PIPT caches are not affected by VMID, so no maintenance - * is necessary across a VMID rollover. - * - * VPIPT caches constrain lookup and maintenance to the active VMID, - * so we need to invalidate lines with a stale VMID to avoid an ABA - * race after multiple rollovers. - * - */ - if (icache_is_vpipt()) - asm volatile("ic ialluis"); - dsb(ish); } From d8e12a0d3715fbcc26fb2baac979bd07ba4c08d0 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Mon, 4 Dec 2023 14:36:05 +0000 Subject: [PATCH 07/87] arm64: Kill detection of VPIPT i-cache policy Since the kernel will never run on a system with the VPIPT i-cache policy, drop the detection code altogether. Reviewed-by: Zenghui Yu Reviewed-by: Anshuman Khandual Signed-off-by: Marc Zyngier Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20231204143606.1806432-3-maz@kernel.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/cache.h | 6 ------ arch/arm64/kernel/cpuinfo.c | 5 ----- 2 files changed, 11 deletions(-) diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h index ceb368d33bf4..06a4670bdb0b 100644 --- a/arch/arm64/include/asm/cache.h +++ b/arch/arm64/include/asm/cache.h @@ -58,7 +58,6 @@ static inline unsigned int arch_slab_minalign(void) #define CTR_L1IP(ctr) SYS_FIELD_GET(CTR_EL0, L1Ip, ctr) #define ICACHEF_ALIASING 0 -#define ICACHEF_VPIPT 1 extern unsigned long __icache_flags; /* @@ -70,11 +69,6 @@ static inline int icache_is_aliasing(void) return test_bit(ICACHEF_ALIASING, &__icache_flags); } -static __always_inline int icache_is_vpipt(void) -{ - return test_bit(ICACHEF_VPIPT, &__icache_flags); -} - static inline u32 cache_type_cwg(void) { return SYS_FIELD_GET(CTR_EL0, CWG, read_cpuid_cachetype()); diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c index a257da7b56fe..47043c0d95ec 100644 --- a/arch/arm64/kernel/cpuinfo.c +++ b/arch/arm64/kernel/cpuinfo.c @@ -36,8 +36,6 @@ static struct cpuinfo_arm64 boot_cpu_data; static inline const char *icache_policy_str(int l1ip) { switch (l1ip) { - case CTR_EL0_L1Ip_VPIPT: - return "VPIPT"; case CTR_EL0_L1Ip_VIPT: return "VIPT"; case CTR_EL0_L1Ip_PIPT: @@ -388,9 +386,6 @@ static void cpuinfo_detect_icache_policy(struct cpuinfo_arm64 *info) switch (l1ip) { case CTR_EL0_L1Ip_PIPT: break; - case CTR_EL0_L1Ip_VPIPT: - set_bit(ICACHEF_VPIPT, &__icache_flags); - break; case CTR_EL0_L1Ip_VIPT: default: /* Assume aliasing */ From f35c32ca6839f5777862dbe2138d02bf50b3dfa7 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Mon, 4 Dec 2023 14:36:06 +0000 Subject: [PATCH 08/87] arm64: Rename reserved values for CTR_EL0.L1Ip We now have *two* values for CTR_EL0.L1Ip that are reserved. Which makes things a bit awkward. In order to lift the ambiguity, rename RESERVED (0b01) to RESERVED_AIVIVT, and VPIPT (0b00) to RESERVED_VPIPT. This makes it clear which of these meant what, and I'm sure archeologists will find it useful... Reviewed-by: Zenghui Yu Signed-off-by: Marc Zyngier Reviewed-by: Anshuman Khandual Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20231204143606.1806432-4-maz@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 96cbeeab4eec..c5af75b23187 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -2004,9 +2004,10 @@ Field 27:24 CWG Field 23:20 ERG Field 19:16 DminLine Enum 15:14 L1Ip - 0b00 VPIPT + # This was named as VPIPT in the ARM but now documented as reserved + 0b00 RESERVED_VPIPT # This is named as AIVIVT in the ARM but documented as reserved - 0b01 RESERVED + 0b01 RESERVED_AIVIVT 0b10 VIPT 0b11 PIPT EndEnum From a099bec7a81052a324d07c9be7c24dc92fbd8ad1 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Fri, 17 Nov 2023 21:56:19 +0900 Subject: [PATCH 09/87] arm64: vdso32: rename 32-bit debug vdso to vdso32.so.dbg 'make vdso_install' renames arch/arm64/kernel/vdso32/vdso.so.dbg to vdso32.so during installation, which allows 64-bit and 32-bit vdso files to be installed in the same directory. However, arm64 is the only architecture that requires this renaming. To simplify the vdso_install logic, rename the in-tree vdso file so its base name matches the installed file name. Signed-off-by: Masahiro Yamada Link: https://lore.kernel.org/r/20231117125620.1058300-1-masahiroy@kernel.org Signed-off-by: Will Deacon --- arch/arm64/Makefile | 2 +- arch/arm64/kernel/vdso32/Makefile | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 9a2d3723cd0f..47ecc4cff9d2 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -200,7 +200,7 @@ endif endif vdso-install-y += arch/arm64/kernel/vdso/vdso.so.dbg -vdso-install-$(CONFIG_COMPAT_VDSO) += arch/arm64/kernel/vdso32/vdso.so.dbg:vdso32.so +vdso-install-$(CONFIG_COMPAT_VDSO) += arch/arm64/kernel/vdso32/vdso32.so.dbg include $(srctree)/scripts/Makefile.defconf diff --git a/arch/arm64/kernel/vdso32/Makefile b/arch/arm64/kernel/vdso32/Makefile index 1f911a76c5af..2266fcdff78a 100644 --- a/arch/arm64/kernel/vdso32/Makefile +++ b/arch/arm64/kernel/vdso32/Makefile @@ -118,7 +118,7 @@ endif VDSO_CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os # Build rules -targets := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso) vdso.so vdso.so.dbg vdso.so.raw +targets := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso) vdso.so vdso32.so.dbg vdso.so.raw c-obj-vdso := $(addprefix $(obj)/, $(c-obj-vdso)) c-obj-vdso-gettimeofday := $(addprefix $(obj)/, $(c-obj-vdso-gettimeofday)) asm-obj-vdso := $(addprefix $(obj)/, $(asm-obj-vdso)) @@ -127,15 +127,15 @@ obj-vdso := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso) targets += vdso.lds CPPFLAGS_vdso.lds += -P -C -U$(ARCH) -include/generated/vdso32-offsets.h: $(obj)/vdso.so.dbg FORCE +include/generated/vdso32-offsets.h: $(obj)/vdso32.so.dbg FORCE $(call if_changed,vdsosym) # Strip rule for vdso.so $(obj)/vdso.so: OBJCOPYFLAGS := -S -$(obj)/vdso.so: $(obj)/vdso.so.dbg FORCE +$(obj)/vdso.so: $(obj)/vdso32.so.dbg FORCE $(call if_changed,objcopy) -$(obj)/vdso.so.dbg: $(obj)/vdso.so.raw $(obj)/$(munge) FORCE +$(obj)/vdso32.so.dbg: $(obj)/vdso.so.raw $(obj)/$(munge) FORCE $(call if_changed,vdsomunge) # Link rule for the .so file, .lds has to be first From 103423ad7e56d6c756738823c332c414b07899e6 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Wed, 22 Nov 2023 13:37:54 +0000 Subject: [PATCH 10/87] arm64: Get rid of ARM64_HAS_NO_HW_PREFETCH Back in 2016, it was argued that implementations lacking a HW prefetcher could be helped by sprinkling a number of PRFM instructions in strategic locations. In 2023, the one platform that presumably needed this hack is no longer in active use (let alone maintained), and an quick experiment shows dropping this hack only leads to a 0.4% drop on a full kernel compilation (tested on a MT30-GS0 48 CPU system). Given that this is pretty much in the noise department and that it may give odd ideas to other implementers, drop the hack for good. Suggested-by: Will Deacon Suggested-by: Mark Rutland Signed-off-by: Marc Zyngier Acked-by: Catalin Marinas Link: https://lore.kernel.org/r/20231122133754.1240687-1-maz@kernel.org Signed-off-by: Will Deacon --- arch/arm64/kernel/cpufeature.c | 16 ---------------- arch/arm64/lib/copy_page.S | 11 ----------- arch/arm64/tools/cpucaps | 1 - 3 files changed, 28 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 646591c67e7a..b335da126e86 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1584,16 +1584,6 @@ static bool has_useable_gicv3_cpuif(const struct arm64_cpu_capabilities *entry, return has_sre; } -static bool has_no_hw_prefetch(const struct arm64_cpu_capabilities *entry, int __unused) -{ - u32 midr = read_cpuid_id(); - - /* Cavium ThunderX pass 1.x and 2.x */ - return midr_is_cpu_model_range(midr, MIDR_THUNDERX, - MIDR_CPU_VAR_REV(0, 0), - MIDR_CPU_VAR_REV(1, MIDR_REVISION_MASK)); -} - static bool has_cache_idc(const struct arm64_cpu_capabilities *entry, int scope) { @@ -2321,12 +2311,6 @@ static const struct arm64_cpu_capabilities arm64_features[] = { ARM64_CPUID_FIELDS(ID_AA64ISAR0_EL1, ATOMIC, IMP) }, #endif /* CONFIG_ARM64_LSE_ATOMICS */ - { - .desc = "Software prefetching using PRFM", - .capability = ARM64_HAS_NO_HW_PREFETCH, - .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE, - .matches = has_no_hw_prefetch, - }, { .desc = "Virtualization Host Extensions", .capability = ARM64_HAS_VIRT_HOST_EXTN, diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S index c336d2ffdec5..6a56d7cf309d 100644 --- a/arch/arm64/lib/copy_page.S +++ b/arch/arm64/lib/copy_page.S @@ -18,13 +18,6 @@ * x1 - src */ SYM_FUNC_START(__pi_copy_page) -alternative_if ARM64_HAS_NO_HW_PREFETCH - // Prefetch three cache lines ahead. - prfm pldl1strm, [x1, #128] - prfm pldl1strm, [x1, #256] - prfm pldl1strm, [x1, #384] -alternative_else_nop_endif - ldp x2, x3, [x1] ldp x4, x5, [x1, #16] ldp x6, x7, [x1, #32] @@ -39,10 +32,6 @@ alternative_else_nop_endif 1: tst x0, #(PAGE_SIZE - 1) -alternative_if ARM64_HAS_NO_HW_PREFETCH - prfm pldl1strm, [x1, #384] -alternative_else_nop_endif - stnp x2, x3, [x0, #-256] ldp x2, x3, [x1] stnp x4, x5, [x0, #16 - 256] diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps index b98c38288a9d..0eb2a2d2f783 100644 --- a/arch/arm64/tools/cpucaps +++ b/arch/arm64/tools/cpucaps @@ -40,7 +40,6 @@ HAS_LDAPR HAS_LSE_ATOMICS HAS_MOPS HAS_NESTED_VIRT -HAS_NO_HW_PREFETCH HAS_PAN HAS_S1PIE HAS_RAS_EXTN From 590f23b092401f29e410fd4ca67128fcc45192fc Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Thu, 23 Nov 2023 11:58:13 +0000 Subject: [PATCH 11/87] perf/arm-cmn: Fix HN-F class_occup_id events A subtle copy-paste error managed to slip through the reorganisation of these patches in development, and not only give some HN-F events the wrong type, but use that wrong type before the subsequent patch defined it. Too late to fix history, but we can at least fix the bug. Fixes: b1b7dc38e482 ("perf/arm-cmn: Refactor HN-F event selector macros") Reported-by: Jing Zhang Signed-off-by: Robin Murphy Link: https://lore.kernel.org/r/5a22439de84ff188ef76674798052448eb03a3e1.1700740693.git.robin.murphy@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm-cmn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index 014010d03588..86d970e74129 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -811,7 +811,7 @@ static umode_t arm_cmn_event_attr_is_visible(struct kobject *kobj, #define CMN_EVENT_HNF_OCC(_model, _name, _event) \ CMN_EVENT_HN_OCC(_model, hnf_##_name, CMN_TYPE_HNF, _event) #define CMN_EVENT_HNF_CLS(_model, _name, _event) \ - CMN_EVENT_HN_CLS(_model, hnf_##_name, CMN_TYPE_HNS, _event) + CMN_EVENT_HN_CLS(_model, hnf_##_name, CMN_TYPE_HNF, _event) #define CMN_EVENT_HNF_SNT(_model, _name, _event) \ CMN_EVENT_HN_SNT(_model, hnf_##_name, CMN_TYPE_HNF, _event) From 877806b9b41ea371defaaf58d815320f1a76384f Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Tue, 14 Nov 2023 11:46:56 +0530 Subject: [PATCH 12/87] drivers: perf: arm_pmuv3: Add new macro PMUV3_INIT_MAP_EVENT() This further compacts all remaining PMU init procedures requiring specific map_event functions via a new macro PMUV3_INIT_MAP_EVENT(). While here, it also changes generated init function names to match to those generated via the other macro PMUV3_INIT_SIMPLE(). This does not cause functional change. Cc: Will Deacon Cc: Mark Rutland Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: James Clark Signed-off-by: Anshuman Khandual Link: https://lore.kernel.org/r/20231114061656.337231-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_pmuv3.c | 61 +++++++++++++--------------------------- 1 file changed, 20 insertions(+), 41 deletions(-) diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 6ca7be05229c..c136a6529014 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -1221,6 +1221,12 @@ static int name##_pmu_init(struct arm_pmu *cpu_pmu) \ return armv8_pmu_init(cpu_pmu, #name, armv8_pmuv3_map_event); \ } +#define PMUV3_INIT_MAP_EVENT(name, map_event) \ +static int name##_pmu_init(struct arm_pmu *cpu_pmu) \ +{ \ + return armv8_pmu_init(cpu_pmu, #name, map_event); \ +} + PMUV3_INIT_SIMPLE(armv8_pmuv3) PMUV3_INIT_SIMPLE(armv8_cortex_a34) @@ -1247,51 +1253,24 @@ PMUV3_INIT_SIMPLE(armv8_neoverse_v1) PMUV3_INIT_SIMPLE(armv8_nvidia_carmel) PMUV3_INIT_SIMPLE(armv8_nvidia_denver) -static int armv8_a35_pmu_init(struct arm_pmu *cpu_pmu) -{ - return armv8_pmu_init(cpu_pmu, "armv8_cortex_a35", armv8_a53_map_event); -} - -static int armv8_a53_pmu_init(struct arm_pmu *cpu_pmu) -{ - return armv8_pmu_init(cpu_pmu, "armv8_cortex_a53", armv8_a53_map_event); -} - -static int armv8_a57_pmu_init(struct arm_pmu *cpu_pmu) -{ - return armv8_pmu_init(cpu_pmu, "armv8_cortex_a57", armv8_a57_map_event); -} - -static int armv8_a72_pmu_init(struct arm_pmu *cpu_pmu) -{ - return armv8_pmu_init(cpu_pmu, "armv8_cortex_a72", armv8_a57_map_event); -} - -static int armv8_a73_pmu_init(struct arm_pmu *cpu_pmu) -{ - return armv8_pmu_init(cpu_pmu, "armv8_cortex_a73", armv8_a73_map_event); -} - -static int armv8_thunder_pmu_init(struct arm_pmu *cpu_pmu) -{ - return armv8_pmu_init(cpu_pmu, "armv8_cavium_thunder", armv8_thunder_map_event); -} - -static int armv8_vulcan_pmu_init(struct arm_pmu *cpu_pmu) -{ - return armv8_pmu_init(cpu_pmu, "armv8_brcm_vulcan", armv8_vulcan_map_event); -} +PMUV3_INIT_MAP_EVENT(armv8_cortex_a35, armv8_a53_map_event) +PMUV3_INIT_MAP_EVENT(armv8_cortex_a53, armv8_a53_map_event) +PMUV3_INIT_MAP_EVENT(armv8_cortex_a57, armv8_a57_map_event) +PMUV3_INIT_MAP_EVENT(armv8_cortex_a72, armv8_a57_map_event) +PMUV3_INIT_MAP_EVENT(armv8_cortex_a73, armv8_a73_map_event) +PMUV3_INIT_MAP_EVENT(armv8_cavium_thunder, armv8_thunder_map_event) +PMUV3_INIT_MAP_EVENT(armv8_brcm_vulcan, armv8_vulcan_map_event) static const struct of_device_id armv8_pmu_of_device_ids[] = { {.compatible = "arm,armv8-pmuv3", .data = armv8_pmuv3_pmu_init}, {.compatible = "arm,cortex-a34-pmu", .data = armv8_cortex_a34_pmu_init}, - {.compatible = "arm,cortex-a35-pmu", .data = armv8_a35_pmu_init}, - {.compatible = "arm,cortex-a53-pmu", .data = armv8_a53_pmu_init}, + {.compatible = "arm,cortex-a35-pmu", .data = armv8_cortex_a35_pmu_init}, + {.compatible = "arm,cortex-a53-pmu", .data = armv8_cortex_a53_pmu_init}, {.compatible = "arm,cortex-a55-pmu", .data = armv8_cortex_a55_pmu_init}, - {.compatible = "arm,cortex-a57-pmu", .data = armv8_a57_pmu_init}, + {.compatible = "arm,cortex-a57-pmu", .data = armv8_cortex_a57_pmu_init}, {.compatible = "arm,cortex-a65-pmu", .data = armv8_cortex_a65_pmu_init}, - {.compatible = "arm,cortex-a72-pmu", .data = armv8_a72_pmu_init}, - {.compatible = "arm,cortex-a73-pmu", .data = armv8_a73_pmu_init}, + {.compatible = "arm,cortex-a72-pmu", .data = armv8_cortex_a72_pmu_init}, + {.compatible = "arm,cortex-a73-pmu", .data = armv8_cortex_a73_pmu_init}, {.compatible = "arm,cortex-a75-pmu", .data = armv8_cortex_a75_pmu_init}, {.compatible = "arm,cortex-a76-pmu", .data = armv8_cortex_a76_pmu_init}, {.compatible = "arm,cortex-a77-pmu", .data = armv8_cortex_a77_pmu_init}, @@ -1309,8 +1288,8 @@ static const struct of_device_id armv8_pmu_of_device_ids[] = { {.compatible = "arm,neoverse-n1-pmu", .data = armv8_neoverse_n1_pmu_init}, {.compatible = "arm,neoverse-n2-pmu", .data = armv9_neoverse_n2_pmu_init}, {.compatible = "arm,neoverse-v1-pmu", .data = armv8_neoverse_v1_pmu_init}, - {.compatible = "cavium,thunder-pmu", .data = armv8_thunder_pmu_init}, - {.compatible = "brcm,vulcan-pmu", .data = armv8_vulcan_pmu_init}, + {.compatible = "cavium,thunder-pmu", .data = armv8_cavium_thunder_pmu_init}, + {.compatible = "brcm,vulcan-pmu", .data = armv8_brcm_vulcan_pmu_init}, {.compatible = "nvidia,carmel-pmu", .data = armv8_nvidia_carmel_pmu_init}, {.compatible = "nvidia,denver-pmu", .data = armv8_nvidia_denver_pmu_init}, {}, From ca6f537e459e2da4b331fe8928d1a0b0f9301f42 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 4 Dec 2023 11:58:47 +0000 Subject: [PATCH 13/87] drivers/perf: pmuv3: don't expose SW_INCR event in sysfs The SW_INCR event is somewhat unusual, and depends on the specific HW counter that it is programmed into. When programmed into PMEVCNTR, SW_INCR will count any writes to PMSWINC_EL0 with bit n set, ignoring writes to SW_INCR with bit n clear. Event rotation means that there's no fixed relationship between perf_events and HW counters, so this isn't all that useful. Further, we program PMUSERENR.{SW,EN}=={0,0}, which causes EL0 writes to PMSWINC_EL0 to be trapped and handled as UNDEFINED, resulting in a SIGILL to userspace. Given that, it's not a good idea to expose SW_INCR in sysfs. Hide it as we did for CHAIN back in commit: 4ba2578fa7b55701 ("arm64: perf: don't expose CHAIN event in sysfs") Signed-off-by: Mark Rutland Cc: Will Deacon Link: https://lore.kernel.org/r/20231204115847.2993026-1-mark.rutland@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_pmuv3.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index c136a6529014..bbde60f1b08c 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -169,7 +169,11 @@ armv8pmu_events_sysfs_show(struct device *dev, PMU_EVENT_ATTR_ID(name, armv8pmu_events_sysfs_show, config) static struct attribute *armv8_pmuv3_event_attrs[] = { - ARMV8_EVENT_ATTR(sw_incr, ARMV8_PMUV3_PERFCTR_SW_INCR), + /* + * Don't expose the sw_incr event in /sys. It's not usable as writes to + * PMSWINC_EL0 will trap as PMUSERENR.{SW,EN}=={0,0} and event rotation + * means we don't have a fixed event<->counter relationship regardless. + */ ARMV8_EVENT_ATTR(l1i_cache_refill, ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL), ARMV8_EVENT_ATTR(l1i_tlb_refill, ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL), ARMV8_EVENT_ATTR(l1d_cache_refill, ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL), From 38bbef7240b8c5f2dc4493eec356e2efbf2da5f4 Mon Sep 17 00:00:00 2001 From: Junhao He Date: Mon, 4 Dec 2023 19:04:25 +0800 Subject: [PATCH 14/87] drivers/perf: hisi: Fix some event id for HiSilicon UC pmu Some event id of HiSilicon uncore UC PMU driver is incorrect, fix them. Fixes: 312eca95e28d ("drivers/perf: hisi: Add support for HiSilicon UC PMU driver") Signed-off-by: Junhao He Reviewed-by: Yicong Yang Link: https://lore.kernel.org/r/20231204110425.20354-1-hejunhao3@huawei.com Signed-off-by: Will Deacon --- drivers/perf/hisilicon/hisi_uncore_uc_pmu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/perf/hisilicon/hisi_uncore_uc_pmu.c b/drivers/perf/hisilicon/hisi_uncore_uc_pmu.c index 63da05e5831c..636fb79647c8 100644 --- a/drivers/perf/hisilicon/hisi_uncore_uc_pmu.c +++ b/drivers/perf/hisilicon/hisi_uncore_uc_pmu.c @@ -383,8 +383,8 @@ static struct attribute *hisi_uc_pmu_events_attr[] = { HISI_PMU_EVENT_ATTR(cpu_rd, 0x10), HISI_PMU_EVENT_ATTR(cpu_rd64, 0x17), HISI_PMU_EVENT_ATTR(cpu_rs64, 0x19), - HISI_PMU_EVENT_ATTR(cpu_mru, 0x1a), - HISI_PMU_EVENT_ATTR(cycles, 0x9c), + HISI_PMU_EVENT_ATTR(cpu_mru, 0x1c), + HISI_PMU_EVENT_ATTR(cycles, 0x95), HISI_PMU_EVENT_ATTR(spipe_hit, 0xb3), HISI_PMU_EVENT_ATTR(hpipe_hit, 0xdb), HISI_PMU_EVENT_ATTR(cring_rxdat_cnt, 0xfa), From 5cd7da19cb9714adbf56054e0a0bd044f49e2a8e Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Wed, 15 Nov 2023 14:58:04 +0530 Subject: [PATCH 15/87] arm: perf: Remove PMU locking Currently the 32-bit arm PMU drivers use the pmu_hw_events::lock spinlock in their arm_pmu::{start,stop,enable,disable}() callbacks to protect hardware state and event data. This locking is not necessary as the perf core code already provides mutual exclusion, disabling interrupts to serialize against the IRQ handler, and using perf_event_context::lock to protect against concurrent modifications of events cross-cpu. The locking was removed from the arm64 (now PMUv3) PMU driver in commit: 2a0e2a02e4b7 ("arm64: perf: Remove PMU locking") ... and the same reasoning applies to all the 32-bit PMU drivers. Remove the locking from the 32-bit PMU drivers. Cc: Mark Rutland Cc: Will Deacon Cc: Russell King Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Anshuman Khandual Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20231115092805.737822-2-anshuman.khandual@arm.com Signed-off-by: Will Deacon --- arch/arm/kernel/perf_event_v6.c | 28 ++++-------------- arch/arm/kernel/perf_event_v7.c | 44 ----------------------------- arch/arm/kernel/perf_event_xscale.c | 44 ++++++----------------------- 3 files changed, 13 insertions(+), 103 deletions(-) diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c index 1ae99deeec54..8fc080c9e4fb 100644 --- a/arch/arm/kernel/perf_event_v6.c +++ b/arch/arm/kernel/perf_event_v6.c @@ -268,10 +268,8 @@ static inline void armv6pmu_write_counter(struct perf_event *event, u64 value) static void armv6pmu_enable_event(struct perf_event *event) { - unsigned long val, mask, evt, flags; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + unsigned long val, mask, evt; struct hw_perf_event *hwc = &event->hw; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); int idx = hwc->idx; if (ARMV6_CYCLE_COUNTER == idx) { @@ -294,12 +292,10 @@ static void armv6pmu_enable_event(struct perf_event *event) * Mask out the current event and set the counter to count the event * that we're interested in. */ - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = armv6_pmcr_read(); val &= ~mask; val |= evt; armv6_pmcr_write(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static irqreturn_t @@ -362,26 +358,20 @@ armv6pmu_handle_irq(struct arm_pmu *cpu_pmu) static void armv6pmu_start(struct arm_pmu *cpu_pmu) { - unsigned long flags, val; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); + unsigned long val; - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = armv6_pmcr_read(); val |= ARMV6_PMCR_ENABLE; armv6_pmcr_write(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void armv6pmu_stop(struct arm_pmu *cpu_pmu) { - unsigned long flags, val; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); + unsigned long val; - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = armv6_pmcr_read(); val &= ~ARMV6_PMCR_ENABLE; armv6_pmcr_write(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static int @@ -419,10 +409,8 @@ static void armv6pmu_clear_event_idx(struct pmu_hw_events *cpuc, static void armv6pmu_disable_event(struct perf_event *event) { - unsigned long val, mask, evt, flags; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + unsigned long val, mask, evt; struct hw_perf_event *hwc = &event->hw; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); int idx = hwc->idx; if (ARMV6_CYCLE_COUNTER == idx) { @@ -444,20 +432,16 @@ static void armv6pmu_disable_event(struct perf_event *event) * of ETM bus signal assertion cycles. The external reporting should * be disabled and so this should never increment. */ - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = armv6_pmcr_read(); val &= ~mask; val |= evt; armv6_pmcr_write(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void armv6mpcore_pmu_disable_event(struct perf_event *event) { - unsigned long val, mask, flags, evt = 0; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + unsigned long val, mask, evt = 0; struct hw_perf_event *hwc = &event->hw; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); int idx = hwc->idx; if (ARMV6_CYCLE_COUNTER == idx) { @@ -475,12 +459,10 @@ static void armv6mpcore_pmu_disable_event(struct perf_event *event) * Unlike UP ARMv6, we don't have a way of stopping the counters. We * simply disable the interrupt reporting. */ - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = armv6_pmcr_read(); val &= ~mask; val |= evt; armv6_pmcr_write(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static int armv6_map_event(struct perf_event *event) diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c index eb2190477da1..c890354b04e9 100644 --- a/arch/arm/kernel/perf_event_v7.c +++ b/arch/arm/kernel/perf_event_v7.c @@ -870,10 +870,8 @@ static void armv7_pmnc_dump_regs(struct arm_pmu *cpu_pmu) static void armv7pmu_enable_event(struct perf_event *event) { - unsigned long flags; struct hw_perf_event *hwc = &event->hw; struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); int idx = hwc->idx; if (!armv7_pmnc_counter_valid(cpu_pmu, idx)) { @@ -886,7 +884,6 @@ static void armv7pmu_enable_event(struct perf_event *event) * Enable counter and interrupt, and set the counter to count * the event that we're interested in. */ - raw_spin_lock_irqsave(&events->pmu_lock, flags); /* * Disable counter @@ -910,16 +907,12 @@ static void armv7pmu_enable_event(struct perf_event *event) * Enable counter */ armv7_pmnc_enable_counter(idx); - - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void armv7pmu_disable_event(struct perf_event *event) { - unsigned long flags; struct hw_perf_event *hwc = &event->hw; struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); int idx = hwc->idx; if (!armv7_pmnc_counter_valid(cpu_pmu, idx)) { @@ -931,7 +924,6 @@ static void armv7pmu_disable_event(struct perf_event *event) /* * Disable counter and interrupt */ - raw_spin_lock_irqsave(&events->pmu_lock, flags); /* * Disable counter @@ -942,8 +934,6 @@ static void armv7pmu_disable_event(struct perf_event *event) * Disable interrupt for this counter */ armv7_pmnc_disable_intens(idx); - - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu) @@ -1009,24 +999,14 @@ static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu) static void armv7pmu_start(struct arm_pmu *cpu_pmu) { - unsigned long flags; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); - - raw_spin_lock_irqsave(&events->pmu_lock, flags); /* Enable all counters */ armv7_pmnc_write(armv7_pmnc_read() | ARMV7_PMNC_E); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void armv7pmu_stop(struct arm_pmu *cpu_pmu) { - unsigned long flags; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); - - raw_spin_lock_irqsave(&events->pmu_lock, flags); /* Disable all counters */ armv7_pmnc_write(armv7_pmnc_read() & ~ARMV7_PMNC_E); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc, @@ -1492,14 +1472,10 @@ static void krait_clearpmu(u32 config_base) static void krait_pmu_disable_event(struct perf_event *event) { - unsigned long flags; struct hw_perf_event *hwc = &event->hw; int idx = hwc->idx; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); /* Disable counter and interrupt */ - raw_spin_lock_irqsave(&events->pmu_lock, flags); /* Disable counter */ armv7_pmnc_disable_counter(idx); @@ -1512,23 +1488,17 @@ static void krait_pmu_disable_event(struct perf_event *event) /* Disable interrupt for this counter */ armv7_pmnc_disable_intens(idx); - - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void krait_pmu_enable_event(struct perf_event *event) { - unsigned long flags; struct hw_perf_event *hwc = &event->hw; int idx = hwc->idx; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); /* * Enable counter and interrupt, and set the counter to count * the event that we're interested in. */ - raw_spin_lock_irqsave(&events->pmu_lock, flags); /* Disable counter */ armv7_pmnc_disable_counter(idx); @@ -1548,8 +1518,6 @@ static void krait_pmu_enable_event(struct perf_event *event) /* Enable counter */ armv7_pmnc_enable_counter(idx); - - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void krait_pmu_reset(void *info) @@ -1825,14 +1793,10 @@ static void scorpion_clearpmu(u32 config_base) static void scorpion_pmu_disable_event(struct perf_event *event) { - unsigned long flags; struct hw_perf_event *hwc = &event->hw; int idx = hwc->idx; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); /* Disable counter and interrupt */ - raw_spin_lock_irqsave(&events->pmu_lock, flags); /* Disable counter */ armv7_pmnc_disable_counter(idx); @@ -1845,23 +1809,17 @@ static void scorpion_pmu_disable_event(struct perf_event *event) /* Disable interrupt for this counter */ armv7_pmnc_disable_intens(idx); - - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void scorpion_pmu_enable_event(struct perf_event *event) { - unsigned long flags; struct hw_perf_event *hwc = &event->hw; int idx = hwc->idx; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); /* * Enable counter and interrupt, and set the counter to count * the event that we're interested in. */ - raw_spin_lock_irqsave(&events->pmu_lock, flags); /* Disable counter */ armv7_pmnc_disable_counter(idx); @@ -1881,8 +1839,6 @@ static void scorpion_pmu_enable_event(struct perf_event *event) /* Enable counter */ armv7_pmnc_enable_counter(idx); - - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void scorpion_pmu_reset(void *info) diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c index f6cdcacfb96d..7a2ba1c689a7 100644 --- a/arch/arm/kernel/perf_event_xscale.c +++ b/arch/arm/kernel/perf_event_xscale.c @@ -203,10 +203,8 @@ xscale1pmu_handle_irq(struct arm_pmu *cpu_pmu) static void xscale1pmu_enable_event(struct perf_event *event) { - unsigned long val, mask, evt, flags; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + unsigned long val, mask, evt; struct hw_perf_event *hwc = &event->hw; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); int idx = hwc->idx; switch (idx) { @@ -229,20 +227,16 @@ static void xscale1pmu_enable_event(struct perf_event *event) return; } - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = xscale1pmu_read_pmnc(); val &= ~mask; val |= evt; xscale1pmu_write_pmnc(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void xscale1pmu_disable_event(struct perf_event *event) { - unsigned long val, mask, evt, flags; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + unsigned long val, mask, evt; struct hw_perf_event *hwc = &event->hw; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); int idx = hwc->idx; switch (idx) { @@ -263,12 +257,10 @@ static void xscale1pmu_disable_event(struct perf_event *event) return; } - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = xscale1pmu_read_pmnc(); val &= ~mask; val |= evt; xscale1pmu_write_pmnc(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static int @@ -300,26 +292,20 @@ static void xscalepmu_clear_event_idx(struct pmu_hw_events *cpuc, static void xscale1pmu_start(struct arm_pmu *cpu_pmu) { - unsigned long flags, val; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); + unsigned long val; - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = xscale1pmu_read_pmnc(); val |= XSCALE_PMU_ENABLE; xscale1pmu_write_pmnc(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void xscale1pmu_stop(struct arm_pmu *cpu_pmu) { - unsigned long flags, val; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); + unsigned long val; - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = xscale1pmu_read_pmnc(); val &= ~XSCALE_PMU_ENABLE; xscale1pmu_write_pmnc(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static inline u64 xscale1pmu_read_counter(struct perf_event *event) @@ -549,10 +535,8 @@ xscale2pmu_handle_irq(struct arm_pmu *cpu_pmu) static void xscale2pmu_enable_event(struct perf_event *event) { - unsigned long flags, ien, evtsel; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + unsigned long ien, evtsel; struct hw_perf_event *hwc = &event->hw; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); int idx = hwc->idx; ien = xscale2pmu_read_int_enable(); @@ -587,18 +571,14 @@ static void xscale2pmu_enable_event(struct perf_event *event) return; } - raw_spin_lock_irqsave(&events->pmu_lock, flags); xscale2pmu_write_event_select(evtsel); xscale2pmu_write_int_enable(ien); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void xscale2pmu_disable_event(struct perf_event *event) { - unsigned long flags, ien, evtsel, of_flags; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + unsigned long ien, evtsel, of_flags; struct hw_perf_event *hwc = &event->hw; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); int idx = hwc->idx; ien = xscale2pmu_read_int_enable(); @@ -638,11 +618,9 @@ static void xscale2pmu_disable_event(struct perf_event *event) return; } - raw_spin_lock_irqsave(&events->pmu_lock, flags); xscale2pmu_write_event_select(evtsel); xscale2pmu_write_int_enable(ien); xscale2pmu_write_overflow_flags(of_flags); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static int @@ -663,26 +641,20 @@ out: static void xscale2pmu_start(struct arm_pmu *cpu_pmu) { - unsigned long flags, val; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); + unsigned long val; - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = xscale2pmu_read_pmnc() & ~XSCALE_PMU_CNT64; val |= XSCALE_PMU_ENABLE; xscale2pmu_write_pmnc(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static void xscale2pmu_stop(struct arm_pmu *cpu_pmu) { - unsigned long flags, val; - struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events); + unsigned long val; - raw_spin_lock_irqsave(&events->pmu_lock, flags); val = xscale2pmu_read_pmnc(); val &= ~XSCALE_PMU_ENABLE; xscale2pmu_write_pmnc(val); - raw_spin_unlock_irqrestore(&events->pmu_lock, flags); } static inline u64 xscale2pmu_read_counter(struct perf_event *event) From 118eb89b1e7f6807776c012cffc5c9b07fd26164 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Wed, 15 Nov 2023 14:58:05 +0530 Subject: [PATCH 16/87] drivers: perf: arm_pmu: Drop 'pmu_lock' element from 'struct pmu_hw_events' As 'pmu_lock' element is not being used in any ARM PMU implementation, just drop this from 'struct pmu_hw_events'. Cc: Will Deacon Cc: Mark Rutland Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20231115092805.737822-3-anshuman.khandual@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_pmu.c | 1 - include/linux/perf/arm_pmu.h | 6 ------ 2 files changed, 7 deletions(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index d712a19e47ac..379479b50bdd 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -893,7 +893,6 @@ struct arm_pmu *armpmu_alloc(void) struct pmu_hw_events *events; events = per_cpu_ptr(pmu->hw_events, cpu); - raw_spin_lock_init(&events->pmu_lock); events->percpu_pmu = pmu; } diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 143fbc10ecfe..e2503d48ddee 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -59,12 +59,6 @@ struct pmu_hw_events { */ DECLARE_BITMAP(used_mask, ARMPMU_MAX_HWEVENTS); - /* - * Hardware lock to serialize accesses to PMU registers. Needed for the - * read/modify/write sequences. - */ - raw_spinlock_t pmu_lock; - /* * When using percpu IRQs, we need a percpu dev_id. Place it here as we * already have to allocate this struct per cpu. From afd83967e7bb9b41ee71153b868dcf94e43c2d7a Mon Sep 17 00:00:00 2001 From: Xu Yang Date: Mon, 20 Nov 2023 17:33:13 +0800 Subject: [PATCH 17/87] perf: fsl_imx8_ddr: Add AXI ID PORT CHANNEL filter support This is the extension of AXI ID filter. Filter is defined with 2 configuration registers per counter 1-3 (counter 0 is not used for filtering and lacks these registers). * Counter N MASK COMP register - AXI_ID and AXI_MASKING. * Counter N MUX CNTL register - AXI CHANNEL and AXI PORT. -- 0: address channel -- 1: data channel This filter is exposed to userspace as an additional (channel, port) pair. The definition of axi_channel is inverted in userspace, and it will be reverted in driver automatically. AXI filter of Perf Monitor in DDR Subsystem, only a single port0 exist, so axi_port is reserved which should be 0. e.g. perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd Signed-off-by: Xu Yang Reviewed-by: Frank Li Link: https://lore.kernel.org/r/20231120093317.2652866-1-xu.yang_2@nxp.com Signed-off-by: Will Deacon --- drivers/perf/fsl_imx8_ddr_perf.c | 39 ++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c index 92611c98120f..d0eae2d7e64b 100644 --- a/drivers/perf/fsl_imx8_ddr_perf.c +++ b/drivers/perf/fsl_imx8_ddr_perf.c @@ -19,6 +19,8 @@ #define COUNTER_READ 0x20 #define COUNTER_DPCR1 0x30 +#define COUNTER_MUX_CNTL 0x50 +#define COUNTER_MASK_COMP 0x54 #define CNTL_OVER 0x1 #define CNTL_CLEAR 0x2 @@ -32,6 +34,13 @@ #define CNTL_CSV_SHIFT 24 #define CNTL_CSV_MASK (0xFFU << CNTL_CSV_SHIFT) +#define READ_PORT_SHIFT 0 +#define READ_PORT_MASK (0x7 << READ_PORT_SHIFT) +#define READ_CHANNEL_REVERT 0x00000008 /* bit 3 for read channel select */ +#define WRITE_PORT_SHIFT 8 +#define WRITE_PORT_MASK (0x7 << WRITE_PORT_SHIFT) +#define WRITE_CHANNEL_REVERT 0x00000800 /* bit 11 for write channel select */ + #define EVENT_CYCLES_ID 0 #define EVENT_CYCLES_COUNTER 0 #define NUM_COUNTERS 4 @@ -50,6 +59,7 @@ static DEFINE_IDA(ddr_ida); /* DDR Perf hardware feature */ #define DDR_CAP_AXI_ID_FILTER 0x1 /* support AXI ID filter */ #define DDR_CAP_AXI_ID_FILTER_ENHANCED 0x3 /* support enhanced AXI ID filter */ +#define DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER 0x4 /* support AXI ID PORT CHANNEL filter */ struct fsl_ddr_devtype_data { unsigned int quirks; /* quirks needed for different DDR Perf core */ @@ -144,6 +154,7 @@ static const struct attribute_group ddr_perf_identifier_attr_group = { enum ddr_perf_filter_capabilities { PERF_CAP_AXI_ID_FILTER = 0, PERF_CAP_AXI_ID_FILTER_ENHANCED, + PERF_CAP_AXI_ID_PORT_CHANNEL_FILTER, PERF_CAP_AXI_ID_FEAT_MAX, }; @@ -157,6 +168,8 @@ static u32 ddr_perf_filter_cap_get(struct ddr_pmu *pmu, int cap) case PERF_CAP_AXI_ID_FILTER_ENHANCED: quirks &= DDR_CAP_AXI_ID_FILTER_ENHANCED; return quirks == DDR_CAP_AXI_ID_FILTER_ENHANCED; + case PERF_CAP_AXI_ID_PORT_CHANNEL_FILTER: + return !!(quirks & DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER); default: WARN(1, "unknown filter cap %d\n", cap); } @@ -187,6 +200,7 @@ static ssize_t ddr_perf_filter_cap_show(struct device *dev, static struct attribute *ddr_perf_filter_cap_attr[] = { PERF_FILTER_EXT_ATTR_ENTRY(filter, PERF_CAP_AXI_ID_FILTER), PERF_FILTER_EXT_ATTR_ENTRY(enhanced_filter, PERF_CAP_AXI_ID_FILTER_ENHANCED), + PERF_FILTER_EXT_ATTR_ENTRY(super_filter, PERF_CAP_AXI_ID_PORT_CHANNEL_FILTER), NULL, }; @@ -272,11 +286,15 @@ static const struct attribute_group ddr_perf_events_attr_group = { PMU_FORMAT_ATTR(event, "config:0-7"); PMU_FORMAT_ATTR(axi_id, "config1:0-15"); PMU_FORMAT_ATTR(axi_mask, "config1:16-31"); +PMU_FORMAT_ATTR(axi_port, "config2:0-2"); +PMU_FORMAT_ATTR(axi_channel, "config2:3-3"); static struct attribute *ddr_perf_format_attrs[] = { &format_attr_event.attr, &format_attr_axi_id.attr, &format_attr_axi_mask.attr, + &format_attr_axi_port.attr, + &format_attr_axi_channel.attr, NULL, }; @@ -530,6 +548,7 @@ static int ddr_perf_event_add(struct perf_event *event, int flags) int counter; int cfg = event->attr.config; int cfg1 = event->attr.config1; + int cfg2 = event->attr.config2; if (pmu->devtype_data->quirks & DDR_CAP_AXI_ID_FILTER) { int i; @@ -553,6 +572,26 @@ static int ddr_perf_event_add(struct perf_event *event, int flags) return -EOPNOTSUPP; } + if (pmu->devtype_data->quirks & DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER) { + if (ddr_perf_is_filtered(event)) { + /* revert axi id masking(axi_mask) value */ + cfg1 ^= AXI_MASKING_REVERT; + writel(cfg1, pmu->base + COUNTER_MASK_COMP + ((counter - 1) << 4)); + + if (cfg == 0x41) { + /* revert axi read channel(axi_channel) value */ + cfg2 ^= READ_CHANNEL_REVERT; + cfg2 |= FIELD_PREP(READ_PORT_MASK, cfg2); + } else { + /* revert axi write channel(axi_channel) value */ + cfg2 ^= WRITE_CHANNEL_REVERT; + cfg2 |= FIELD_PREP(WRITE_PORT_MASK, cfg2); + } + + writel(cfg2, pmu->base + COUNTER_MUX_CNTL + ((counter - 1) << 4)); + } + } + pmu->events[counter] = event; hwc->idx = counter; From 9745295358f44cc5da4145b2a5ca3e2921d70841 Mon Sep 17 00:00:00 2001 From: Xu Yang Date: Mon, 20 Nov 2023 17:33:14 +0800 Subject: [PATCH 18/87] docs/perf: Add explanation for DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER quirk Add explanation for DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER quirk. Signed-off-by: Xu Yang Link: https://lore.kernel.org/r/20231120093317.2652866-2-xu.yang_2@nxp.com Signed-off-by: Will Deacon --- Documentation/admin-guide/perf/imx-ddr.rst | 45 ++++++++++++++++++---- 1 file changed, 37 insertions(+), 8 deletions(-) diff --git a/Documentation/admin-guide/perf/imx-ddr.rst b/Documentation/admin-guide/perf/imx-ddr.rst index 90926d0fb8ec..77418ae5a290 100644 --- a/Documentation/admin-guide/perf/imx-ddr.rst +++ b/Documentation/admin-guide/perf/imx-ddr.rst @@ -13,8 +13,8 @@ is one register for each counter. Counter 0 is special in that it always counts interrupt is raised. If any other counter overflows, it continues counting, and no interrupt is raised. -The "format" directory describes format of the config (event ID) and config1 -(AXI filtering) fields of the perf_event_attr structure, see /sys/bus/event_source/ +The "format" directory describes format of the config (event ID) and config1/2 +(AXI filter setting) fields of the perf_event_attr structure, see /sys/bus/event_source/ devices/imx8_ddr0/format/. The "events" directory describes the events types hardware supported that can be used with perf tool, see /sys/bus/event_source/ devices/imx8_ddr0/events/. The "caps" directory describes filter features implemented @@ -28,12 +28,11 @@ in DDR PMU, see /sys/bus/events_source/devices/imx8_ddr0/caps/. AXI filtering is only used by CSV modes 0x41 (axid-read) and 0x42 (axid-write) to count reading or writing matches filter setting. Filter setting is various from different DRAM controller implementations, which is distinguished by quirks -in the driver. You also can dump info from userspace, filter in "caps" directory -indicates whether PMU supports AXI ID filter or not; enhanced_filter indicates -whether PMU supports enhanced AXI ID filter or not. Value 0 for un-supported, and -value 1 for supported. +in the driver. You also can dump info from userspace, "caps" directory show the +type of AXI filter (filter, enhanced_filter and super_filter). Value 0 for +un-supported, and value 1 for supported. -* With DDR_CAP_AXI_ID_FILTER quirk(filter: 1, enhanced_filter: 0). +* With DDR_CAP_AXI_ID_FILTER quirk(filter: 1, enhanced_filter: 0, super_filter: 0). Filter is defined with two configuration parts: --AXI_ID defines AxID matching value. --AXI_MASKING defines which bits of AxID are meaningful for the matching. @@ -65,7 +64,37 @@ value 1 for supported. perf stat -a -e imx8_ddr0/axid-read,axi_id=0x12/ cmd, which will monitor ARID=0x12 -* With DDR_CAP_AXI_ID_FILTER_ENHANCED quirk(filter: 1, enhanced_filter: 1). +* With DDR_CAP_AXI_ID_FILTER_ENHANCED quirk(filter: 1, enhanced_filter: 1, super_filter: 0). This is an extension to the DDR_CAP_AXI_ID_FILTER quirk which permits counting the number of bytes (as opposed to the number of bursts) from DDR read and write transactions concurrently with another set of data counters. + +* With DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER quirk(filter: 0, enhanced_filter: 0, super_filter: 1). + There is a limitation in previous AXI filter, it cannot filter different IDs + at the same time as the filter is shared between counters. This quirk is the + extension of AXI ID filter. One improvement is that counter 1-3 has their own + filter, means that it supports concurrently filter various IDs. Another + improvement is that counter 1-3 supports AXI PORT and CHANNEL selection. Support + selecting address channel or data channel. + + Filter is defined with 2 configuration registers per counter 1-3. + --Counter N MASK COMP register - including AXI_ID and AXI_MASKING. + --Counter N MUX CNTL register - including AXI CHANNEL and AXI PORT. + + - 0: address channel + - 1: data channel + + PMU in DDR subsystem, only one single port0 exists, so axi_port is reserved + which should be 0. + + .. code-block:: bash + + perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd + perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd + + .. note:: + + axi_channel is inverted in userspace, and it will be reverted in driver + automatically. So that users do not need specify axi_channel if want to + monitor data channel from DDR transactions, since data channel is more + meaningful. From 2fe44e7dcb86601f0e12735621ed1113536eb392 Mon Sep 17 00:00:00 2001 From: Xu Yang Date: Mon, 20 Nov 2023 17:33:15 +0800 Subject: [PATCH 19/87] dt-bindings: perf: fsl-imx-ddr: Add i.MX8DXL compatible Add a compatible for i.MX8DXL which is compatile with "fsl,imx8-ddr-pmu". Acked-by: Krzysztof Kozlowski Signed-off-by: Xu Yang Link: https://lore.kernel.org/r/20231120093317.2652866-3-xu.yang_2@nxp.com Signed-off-by: Will Deacon --- Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml b/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml index e9fad4b3de68..6c96a4204e5d 100644 --- a/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml +++ b/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml @@ -27,6 +27,9 @@ properties: - fsl,imx8mq-ddr-pmu - fsl,imx8mp-ddr-pmu - const: fsl,imx8m-ddr-pmu + - items: + - const: fsl,imx8dxl-ddr-pmu + - const: fsl,imx8-ddr-pmu reg: maxItems: 1 From 46fe448ec3b7a10253fcca7fe7e0d8f01a2b38e4 Mon Sep 17 00:00:00 2001 From: Xu Yang Date: Mon, 20 Nov 2023 17:33:16 +0800 Subject: [PATCH 20/87] perf: fsl_imx8_ddr: Add driver support for i.MX8DXL DDR Perf Add driver support for i.MX8DXL DDR Perf, which supports AXI ID PORT CHANNEL filter. Signed-off-by: Xu Yang Reviewed-by: Frank Li Link: https://lore.kernel.org/r/20231120093317.2652866-4-xu.yang_2@nxp.com Signed-off-by: Will Deacon --- drivers/perf/fsl_imx8_ddr_perf.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c index d0eae2d7e64b..7dbfaee372c7 100644 --- a/drivers/perf/fsl_imx8_ddr_perf.c +++ b/drivers/perf/fsl_imx8_ddr_perf.c @@ -92,6 +92,11 @@ static const struct fsl_ddr_devtype_data imx8mp_devtype_data = { .identifier = "i.MX8MP", }; +static const struct fsl_ddr_devtype_data imx8dxl_devtype_data = { + .quirks = DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER, + .identifier = "i.MX8DXL", +}; + static const struct of_device_id imx_ddr_pmu_dt_ids[] = { { .compatible = "fsl,imx8-ddr-pmu", .data = &imx8_devtype_data}, { .compatible = "fsl,imx8m-ddr-pmu", .data = &imx8m_devtype_data}, @@ -99,6 +104,7 @@ static const struct of_device_id imx_ddr_pmu_dt_ids[] = { { .compatible = "fsl,imx8mm-ddr-pmu", .data = &imx8mm_devtype_data}, { .compatible = "fsl,imx8mn-ddr-pmu", .data = &imx8mn_devtype_data}, { .compatible = "fsl,imx8mp-ddr-pmu", .data = &imx8mp_devtype_data}, + { .compatible = "fsl,imx8dxl-ddr-pmu", .data = &imx8dxl_devtype_data}, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, imx_ddr_pmu_dt_ids); From 8fd7588fd4eefe2e474b433f930eba99415ece9e Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Mon, 27 Nov 2023 00:10:45 +0900 Subject: [PATCH 21/87] arm64: replace with Commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost") deprecated , which is now a wrapper of . Replace #include with #include . Signed-off-by: Masahiro Yamada Link: https://lore.kernel.org/r/20231126151045.1556686-1-masahiroy@kernel.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/assembler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 376a980f2bad..7b1975bf4b90 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -12,7 +12,7 @@ #ifndef __ASM_ASSEMBLER_H #define __ASM_ASSEMBLER_H -#include +#include #include #include From 75b5e0bf90bffaca4b1f19114065dc59f5cc161f Mon Sep 17 00:00:00 2001 From: Huang Shijie Date: Fri, 24 Nov 2023 11:15:13 +0800 Subject: [PATCH 22/87] arm64: irq: set the correct node for VMAP stack In current code, init_irq_stacks() will call cpu_to_node(). The cpu_to_node() depends on percpu "numa_node" which is initialized in: arch_call_rest_init() --> rest_init() -- kernel_init() --> kernel_init_freeable() --> smp_prepare_cpus() But init_irq_stacks() is called in init_IRQ() which is before arch_call_rest_init(). So in init_irq_stacks(), the cpu_to_node() does not work, it always return 0. In NUMA, it makes the node 1 cpu accesses the IRQ stack which is in the node 0. This patch fixes it by: 1.) export the early_cpu_to_node(), and use it in the init_irq_stacks(). 2.) change init_irq_stacks() to __init function. Reviewed-by: Catalin Marinas Signed-off-by: Huang Shijie Link: https://lore.kernel.org/r/20231124031513.81548-1-shijie@os.amperecomputing.com Signed-off-by: Will Deacon --- arch/arm64/kernel/irq.c | 5 +++-- drivers/base/arch_numa.c | 2 +- include/asm-generic/numa.h | 2 ++ 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c index 6ad5c6ef5329..9f253d8efe90 100644 --- a/arch/arm64/kernel/irq.c +++ b/arch/arm64/kernel/irq.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -51,13 +52,13 @@ static void init_irq_scs(void) } #ifdef CONFIG_VMAP_STACK -static void init_irq_stacks(void) +static void __init init_irq_stacks(void) { int cpu; unsigned long *p; for_each_possible_cpu(cpu) { - p = arch_alloc_vmap_stack(IRQ_STACK_SIZE, cpu_to_node(cpu)); + p = arch_alloc_vmap_stack(IRQ_STACK_SIZE, early_cpu_to_node(cpu)); per_cpu(irq_stack_ptr, cpu) = p; } } diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c index eaa31e567d1e..5b59d133b6af 100644 --- a/drivers/base/arch_numa.c +++ b/drivers/base/arch_numa.c @@ -144,7 +144,7 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid) unsigned long __per_cpu_offset[NR_CPUS] __read_mostly; EXPORT_SYMBOL(__per_cpu_offset); -static int __init early_cpu_to_node(int cpu) +int __init early_cpu_to_node(int cpu) { return cpu_to_node_map[cpu]; } diff --git a/include/asm-generic/numa.h b/include/asm-generic/numa.h index 1a3ad6d29833..c32e0cf23c90 100644 --- a/include/asm-generic/numa.h +++ b/include/asm-generic/numa.h @@ -35,6 +35,7 @@ int __init numa_add_memblk(int nodeid, u64 start, u64 end); void __init numa_set_distance(int from, int to, int distance); void __init numa_free_distance(void); void __init early_map_cpu_to_node(unsigned int cpu, int nid); +int __init early_cpu_to_node(int cpu); void numa_store_cpu_info(unsigned int cpu); void numa_add_cpu(unsigned int cpu); void numa_remove_cpu(unsigned int cpu); @@ -46,6 +47,7 @@ static inline void numa_add_cpu(unsigned int cpu) { } static inline void numa_remove_cpu(unsigned int cpu) { } static inline void arch_numa_init(void) { } static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { } +static inline int early_cpu_to_node(int cpu) { return 0; } #endif /* CONFIG_NUMA */ From 365b1900c93a6a4860fce8c429e2d8b6e154a107 Mon Sep 17 00:00:00 2001 From: Tsung-Han Lin Date: Sun, 3 Dec 2023 09:18:04 +0800 Subject: [PATCH 23/87] Documentation/arch/arm64: Fix typo Should be 'if' here. Signed-off-by: Tsung-Han Lin Reviewed-by: Bagas Sanjaya Link: https://lore.kernel.org/r/20231203011804.27694-1-tsunghan.tw@gmail.com Signed-off-by: Will Deacon --- Documentation/arch/arm64/arm-acpi.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/arch/arm64/arm-acpi.rst b/Documentation/arch/arm64/arm-acpi.rst index a46c34fa9604..e59e4505d0d9 100644 --- a/Documentation/arch/arm64/arm-acpi.rst +++ b/Documentation/arch/arm64/arm-acpi.rst @@ -130,7 +130,7 @@ When an Arm system boots, it can either have DT information, ACPI tables, or in some very unusual cases, both. If no command line parameters are used, the kernel will try to use DT for device enumeration; if there is no DT present, the kernel will try to use ACPI tables, but only if they are present. -In neither is available, the kernel will not boot. If acpi=force is used +If neither is available, the kernel will not boot. If acpi=force is used on the command line, the kernel will attempt to use ACPI tables first, but fall back to DT if there are no ACPI tables present. The basic idea is that the kernel will not fail to boot unless it absolutely has no other choice. From 8885c7398fe56c49f14be6ce0ac202385f3cd818 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 27 Nov 2023 13:00:52 +0100 Subject: [PATCH 24/87] arm64: mm: Only map KPTI trampoline if it is going to be used Avoid creating the fixmap entries for the KPTI trampoline if KPTI is not in use. Signed-off-by: Ard Biesheuvel Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20231127120049.2258650-7-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/mm/mmu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 15f6347d23b6..6c8078916f5e 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -674,6 +674,9 @@ static int __init map_entry_trampoline(void) { int i; + if (!arm64_kernel_unmapped_at_el0()) + return 0; + pgprot_t prot = kernel_exec_prot(); phys_addr_t pa_start = __pa_symbol(__entry_tramp_text_start); From 7540f70df98f5c46feb5fe2257f93f543e5821e5 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 27 Nov 2023 13:00:53 +0100 Subject: [PATCH 25/87] arm64: Kconfig: drop KAISER reference from KPTI option description KAISER is a reference to the KASLR hardening technique that already existed before Meltdown happened, and by now, it is sufficiently obscure that mentioning it does not actually clarify anything. So remove this reference, and replace it with KPTI. Signed-off-by: Ard Biesheuvel Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20231127120049.2258650-8-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b071a00425d..b67e6934316f 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1549,7 +1549,7 @@ config ARCH_FORCE_MAX_ORDER Don't change if unsure. config UNMAP_KERNEL_AT_EL0 - bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT + bool "Unmap kernel when running in userspace (KPTI)" if EXPERT default y help Speculation attacks against some high-performance processors can From 1beef60e7d6b90be826a73f626204f55a4cd7640 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 24 Nov 2023 11:05:10 +0000 Subject: [PATCH 26/87] arm64: stacktrace: factor out kernel unwind state On arm64 we share some unwinding code between the regular kernel unwinder and the KVM hyp unwinder. Some of this common code only matters to the regular unwinder, e.g. the `kr_cur` and `task` fields in the common struct unwind_state. We're likely to add more state which only matters for regular kernel unwinding (or only for hyp unwinding). In preparation for such changes, this patch factors out the kernel-specific state into a new struct kunwind_state, and updates the kernel unwind code accordingly. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland Cc: Catalin Marinas Cc: Kalesh Singh Cc: Madhavan T. Venkataraman Cc: Mark Brown Cc: Puranjay Mohan Cc: Will Deacon Reviewed-by: Puranjay Mohan Reviewed-by: Kalesh Singh Reviewed-by: Madhavan T. Venkataraman Reviewed-by: Mark Brown Link: https://lore.kernel.org/r/20231124110511.2795958-2-mark.rutland@arm.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/stacktrace/common.h | 19 +--- arch/arm64/include/asm/stacktrace/nvhe.h | 2 +- arch/arm64/kernel/stacktrace.c | 113 +++++++++++++-------- 3 files changed, 74 insertions(+), 60 deletions(-) diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h index 508f734de46e..f63dc654e545 100644 --- a/arch/arm64/include/asm/stacktrace/common.h +++ b/arch/arm64/include/asm/stacktrace/common.h @@ -9,7 +9,6 @@ #ifndef __ASM_STACKTRACE_COMMON_H #define __ASM_STACKTRACE_COMMON_H -#include #include struct stack_info { @@ -23,12 +22,6 @@ struct stack_info { * @fp: The fp value in the frame record (or the real fp) * @pc: The lr value in the frame record (or the real lr) * - * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance - * associated with the most recently encountered replacement lr - * value. - * - * @task: The task being unwound. - * * @stack: The stack currently being unwound. * @stacks: An array of stacks which can be unwound. * @nr_stacks: The number of stacks in @stacks. @@ -36,10 +29,6 @@ struct stack_info { struct unwind_state { unsigned long fp; unsigned long pc; -#ifdef CONFIG_KRETPROBES - struct llist_node *kr_cur; -#endif - struct task_struct *task; struct stack_info stack; struct stack_info *stacks; @@ -66,14 +55,8 @@ static inline bool stackinfo_on_stack(const struct stack_info *info, return true; } -static inline void unwind_init_common(struct unwind_state *state, - struct task_struct *task) +static inline void unwind_init_common(struct unwind_state *state) { - state->task = task; -#ifdef CONFIG_KRETPROBES - state->kr_cur = NULL; -#endif - state->stack = stackinfo_get_unknown(); } diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h index 25ab83a315a7..44759281d0d4 100644 --- a/arch/arm64/include/asm/stacktrace/nvhe.h +++ b/arch/arm64/include/asm/stacktrace/nvhe.h @@ -31,7 +31,7 @@ static inline void kvm_nvhe_unwind_init(struct unwind_state *state, unsigned long fp, unsigned long pc) { - unwind_init_common(state, NULL); + unwind_init_common(state); state->fp = fp; state->pc = pc; diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index 17f66a74c745..aafc89192787 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -18,6 +19,31 @@ #include #include +/* + * Kernel unwind state + * + * @common: Common unwind state. + * @task: The task being unwound. + * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance + * associated with the most recently encountered replacement lr + * value. + */ +struct kunwind_state { + struct unwind_state common; + struct task_struct *task; +#ifdef CONFIG_KRETPROBES + struct llist_node *kr_cur; +#endif +}; + +static __always_inline void +kunwind_init(struct kunwind_state *state, + struct task_struct *task) +{ + unwind_init_common(&state->common); + state->task = task; +} + /* * Start an unwind from a pt_regs. * @@ -26,13 +52,13 @@ * The regs must be on a stack currently owned by the calling task. */ static __always_inline void -unwind_init_from_regs(struct unwind_state *state, - struct pt_regs *regs) +kunwind_init_from_regs(struct kunwind_state *state, + struct pt_regs *regs) { - unwind_init_common(state, current); + kunwind_init(state, current); - state->fp = regs->regs[29]; - state->pc = regs->pc; + state->common.fp = regs->regs[29]; + state->common.pc = regs->pc; } /* @@ -44,12 +70,12 @@ unwind_init_from_regs(struct unwind_state *state, * The function which invokes this must be noinline. */ static __always_inline void -unwind_init_from_caller(struct unwind_state *state) +kunwind_init_from_caller(struct kunwind_state *state) { - unwind_init_common(state, current); + kunwind_init(state, current); - state->fp = (unsigned long)__builtin_frame_address(1); - state->pc = (unsigned long)__builtin_return_address(0); + state->common.fp = (unsigned long)__builtin_frame_address(1); + state->common.pc = (unsigned long)__builtin_return_address(0); } /* @@ -63,35 +89,38 @@ unwind_init_from_caller(struct unwind_state *state) * call this for the current task. */ static __always_inline void -unwind_init_from_task(struct unwind_state *state, - struct task_struct *task) +kunwind_init_from_task(struct kunwind_state *state, + struct task_struct *task) { - unwind_init_common(state, task); + kunwind_init(state, task); - state->fp = thread_saved_fp(task); - state->pc = thread_saved_pc(task); + state->common.fp = thread_saved_fp(task); + state->common.pc = thread_saved_pc(task); } static __always_inline int -unwind_recover_return_address(struct unwind_state *state) +kunwind_recover_return_address(struct kunwind_state *state) { #ifdef CONFIG_FUNCTION_GRAPH_TRACER if (state->task->ret_stack && - (state->pc == (unsigned long)return_to_handler)) { + (state->common.pc == (unsigned long)return_to_handler)) { unsigned long orig_pc; - orig_pc = ftrace_graph_ret_addr(state->task, NULL, state->pc, - (void *)state->fp); - if (WARN_ON_ONCE(state->pc == orig_pc)) + orig_pc = ftrace_graph_ret_addr(state->task, NULL, + state->common.pc, + (void *)state->common.fp); + if (WARN_ON_ONCE(state->common.pc == orig_pc)) return -EINVAL; - state->pc = orig_pc; + state->common.pc = orig_pc; } #endif /* CONFIG_FUNCTION_GRAPH_TRACER */ #ifdef CONFIG_KRETPROBES - if (is_kretprobe_trampoline(state->pc)) { - state->pc = kretprobe_find_ret_addr(state->task, - (void *)state->fp, - &state->kr_cur); + if (is_kretprobe_trampoline(state->common.pc)) { + unsigned long orig_pc; + orig_pc = kretprobe_find_ret_addr(state->task, + (void *)state->common.fp, + &state->kr_cur); + state->common.pc = orig_pc; } #endif /* CONFIG_KRETPROBES */ @@ -106,38 +135,38 @@ unwind_recover_return_address(struct unwind_state *state) * and the location (but not the fp value) of B. */ static __always_inline int -unwind_next(struct unwind_state *state) +kunwind_next(struct kunwind_state *state) { struct task_struct *tsk = state->task; - unsigned long fp = state->fp; + unsigned long fp = state->common.fp; int err; /* Final frame; nothing to unwind */ if (fp == (unsigned long)task_pt_regs(tsk)->stackframe) return -ENOENT; - err = unwind_next_frame_record(state); + err = unwind_next_frame_record(&state->common); if (err) return err; - state->pc = ptrauth_strip_kernel_insn_pac(state->pc); + state->common.pc = ptrauth_strip_kernel_insn_pac(state->common.pc); - return unwind_recover_return_address(state); + return kunwind_recover_return_address(state); } static __always_inline void -unwind(struct unwind_state *state, stack_trace_consume_fn consume_entry, - void *cookie) +do_kunwind(struct kunwind_state *state, stack_trace_consume_fn consume_entry, + void *cookie) { - if (unwind_recover_return_address(state)) + if (kunwind_recover_return_address(state)) return; while (1) { int ret; - if (!consume_entry(cookie, state->pc)) + if (!consume_entry(cookie, state->common.pc)) break; - ret = unwind_next(state); + ret = kunwind_next(state); if (ret < 0) break; } @@ -190,22 +219,24 @@ noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry, STACKINFO_EFI, #endif }; - struct unwind_state state = { - .stacks = stacks, - .nr_stacks = ARRAY_SIZE(stacks), + struct kunwind_state state = { + .common = { + .stacks = stacks, + .nr_stacks = ARRAY_SIZE(stacks), + }, }; if (regs) { if (task != current) return; - unwind_init_from_regs(&state, regs); + kunwind_init_from_regs(&state, regs); } else if (task == current) { - unwind_init_from_caller(&state); + kunwind_init_from_caller(&state); } else { - unwind_init_from_task(&state, task); + kunwind_init_from_task(&state, task); } - unwind(&state, consume_entry, cookie); + do_kunwind(&state, consume_entry, cookie); } static bool dump_backtrace_entry(void *arg, unsigned long where) From 1aba06e7b2b49c1a0e905fcaf9c56d70164fbd8d Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 24 Nov 2023 11:05:11 +0000 Subject: [PATCH 27/87] arm64: stacktrace: factor out kunwind_stack_walk() Currently arm64 uses the generic arch_stack_walk() interface for all stack walking code. This only passes a PC value and cookie to the unwind callback, whereas we'd like to pass some additional information in some cases. For example, the BPF exception unwinder wants the FP, for reliable stacktrace we'll want to perform additional checks on other portions of unwind state, and we'd like to expand the information printed by dump_backtrace() to include provenance and reliability information. As preparation for all of the above, this patch factors the core unwind logic out of arch_stack_walk() and into a new kunwind_stack_walk() function which provides all of the unwind state to a callback function. The existing arch_stack_walk() interface is implemented atop this. The kunwind_stack_walk() function is intended to be a private implementation detail of unwinders in stacktrace.c, and not something to be exported generally to kernel code. It is __always_inline'd into its caller so that neither it or its caller appear in stactraces (which is the existing/required behavior for arch_stack_walk() and friends) and so that the compiler can optimize away some of the indirection. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland Cc: Catalin Marinas Cc: Kalesh Singh Cc: Madhavan T. Venkataraman Cc: Mark Brown Cc: Puranjay Mohan Cc: Will Deacon Reviewed-by: Kalesh Singh Reviewed-by: Puranjay Mohan Reviewed-by: Madhavan T. Venkataraman Reviewed-by: Mark Brown Link: https://lore.kernel.org/r/20231124110511.2795958-3-mark.rutland@arm.com Signed-off-by: Will Deacon --- arch/arm64/kernel/stacktrace.c | 39 ++++++++++++++++++++++++++++------ 1 file changed, 33 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index aafc89192787..7f88028a00c0 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -154,8 +154,10 @@ kunwind_next(struct kunwind_state *state) return kunwind_recover_return_address(state); } +typedef bool (*kunwind_consume_fn)(const struct kunwind_state *state, void *cookie); + static __always_inline void -do_kunwind(struct kunwind_state *state, stack_trace_consume_fn consume_entry, +do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state, void *cookie) { if (kunwind_recover_return_address(state)) @@ -164,7 +166,7 @@ do_kunwind(struct kunwind_state *state, stack_trace_consume_fn consume_entry, while (1) { int ret; - if (!consume_entry(cookie, state->common.pc)) + if (!consume_state(state, cookie)) break; ret = kunwind_next(state); if (ret < 0) @@ -201,9 +203,10 @@ do_kunwind(struct kunwind_state *state, stack_trace_consume_fn consume_entry, : stackinfo_get_unknown(); \ }) -noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry, - void *cookie, struct task_struct *task, - struct pt_regs *regs) +static __always_inline void +kunwind_stack_walk(kunwind_consume_fn consume_state, + void *cookie, struct task_struct *task, + struct pt_regs *regs) { struct stack_info stacks[] = { stackinfo_get_task(task), @@ -236,7 +239,31 @@ noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry, kunwind_init_from_task(&state, task); } - do_kunwind(&state, consume_entry, cookie); + do_kunwind(&state, consume_state, cookie); +} + +struct kunwind_consume_entry_data { + stack_trace_consume_fn consume_entry; + void *cookie; +}; + +static bool +arch_kunwind_consume_entry(const struct kunwind_state *state, void *cookie) +{ + struct kunwind_consume_entry_data *data = cookie; + return data->consume_entry(data->cookie, state->common.pc); +} + +noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry, + void *cookie, struct task_struct *task, + struct pt_regs *regs) +{ + struct kunwind_consume_entry_data data = { + .consume_entry = consume_entry, + .cookie = cookie, + }; + + kunwind_stack_walk(arch_kunwind_consume_entry, &data, task, regs); } static bool dump_backtrace_entry(void *arg, unsigned long where) From 33c1a7785a41216ec44d0fadc1890103e2db88b0 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Fri, 24 Nov 2023 15:22:21 +0000 Subject: [PATCH 28/87] kselftest/arm64: Improve output for skipped TPIDR2 ABI test When TPIDR2 is not supported the tpidr2 ABI test prints the same message for each skipped test: ok 1 skipped, TPIDR2 not supported which isn't ideal for test automation software since it tracks kselftest results based on the string used to describe the test. This is also not standard KTAP output, the expected format is: ok 1 # SKIP default_value Updated the program to generate this, using the same set of test names that we would run if the test actually executed. Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20231124-kselftest-arm64-tpidr2-skip-v1-1-e05d0ccef101@kernel.org Signed-off-by: Will Deacon --- tools/testing/selftests/arm64/abi/tpidr2.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/arm64/abi/tpidr2.c b/tools/testing/selftests/arm64/abi/tpidr2.c index 351a098b503a..02ee3a91b780 100644 --- a/tools/testing/selftests/arm64/abi/tpidr2.c +++ b/tools/testing/selftests/arm64/abi/tpidr2.c @@ -254,6 +254,12 @@ static int write_clone_read(void) putnum(++tests_run); \ putstr(" " #name "\n"); +#define skip_test(name) \ + tests_skipped++; \ + putstr("ok "); \ + putnum(++tests_run); \ + putstr(" # SKIP " #name "\n"); + int main(int argc, char **argv) { int ret, i; @@ -283,13 +289,11 @@ int main(int argc, char **argv) } else { putstr("# SME support not present\n"); - for (i = 0; i < EXPECTED_TESTS; i++) { - putstr("ok "); - putnum(i); - putstr(" skipped, TPIDR2 not supported\n"); - } - - tests_skipped += EXPECTED_TESTS; + skip_test(default_value); + skip_test(write_read); + skip_test(write_sleep_read); + skip_test(write_fork_read); + skip_test(write_clone_read); } print_summary(); From 48f7ab21f731f5a02ec34a4b83ae88c4a41d6a10 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Tue, 5 Dec 2023 14:24:44 +0000 Subject: [PATCH 29/87] kselftest/arm64: Log SVCR when the SME tests barf On failure we log the actual and expected value of the register we detect a mismatch in. For SME one obvious potential source of corruption would be if we had corrupted SVCR since changes in streaming mode will reset the register values, log the value to aid in understanding issues. Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20231205-arm64-kselftest-log-svcr-v1-1-b77abd9ee7f3@kernel.org Signed-off-by: Will Deacon --- tools/testing/selftests/arm64/fp/sve-test.S | 10 ++++++++++ tools/testing/selftests/arm64/fp/za-test.S | 6 ++++++ tools/testing/selftests/arm64/fp/zt-test.S | 5 +++++ 3 files changed, 21 insertions(+) diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S index 547d077e3517..fff60e2a25ad 100644 --- a/tools/testing/selftests/arm64/fp/sve-test.S +++ b/tools/testing/selftests/arm64/fp/sve-test.S @@ -515,6 +515,10 @@ function barf mov x11, x1 // actual data mov x12, x2 // data size +#ifdef SSVE + mrs x13, S3_3_C4_C2_2 +#endif + puts "Mismatch: PID=" mov x0, x20 bl putdec @@ -534,6 +538,12 @@ function barf bl dumphex puts "]\n" +#ifdef SSVE + puts "\tSVCR: " + mov x0, x13 + bl putdecn +#endif + mov x8, #__NR_getpid svc #0 // fpsimd.c acitivty log dump hack diff --git a/tools/testing/selftests/arm64/fp/za-test.S b/tools/testing/selftests/arm64/fp/za-test.S index 9dcd70911397..095b45531640 100644 --- a/tools/testing/selftests/arm64/fp/za-test.S +++ b/tools/testing/selftests/arm64/fp/za-test.S @@ -333,6 +333,9 @@ function barf // mov w8, #__NR_exit // svc #0 // end hack + + mrs x13, S3_3_C4_C2_2 + smstop mov x10, x0 // expected data mov x11, x1 // actual data @@ -356,6 +359,9 @@ function barf mov x1, x12 bl dumphex puts "]\n" + puts "\tSVCR: " + mov x0, x13 + bl putdecn mov x8, #__NR_getpid svc #0 diff --git a/tools/testing/selftests/arm64/fp/zt-test.S b/tools/testing/selftests/arm64/fp/zt-test.S index d63286397638..b5c81e81a379 100644 --- a/tools/testing/selftests/arm64/fp/zt-test.S +++ b/tools/testing/selftests/arm64/fp/zt-test.S @@ -267,6 +267,8 @@ function barf // mov w8, #__NR_exit // svc #0 // end hack + + mrs x13, S3_3_C4_C2_2 smstop mov x10, x0 // expected data mov x11, x1 // actual data @@ -287,6 +289,9 @@ function barf mov x1, x12 bl dumphex puts "]\n" + puts "\tSVCR: " + mov x0, x13 + bl putdecn mov x8, #__NR_getpid svc #0 From 86d1921c9d5a25ff057284f1208e731145e24508 Mon Sep 17 00:00:00 2001 From: Zenghui Yu Date: Wed, 6 Dec 2023 00:01:40 +0800 Subject: [PATCH 30/87] arm64: Delete the zero_za macro zero_za was introduced in commit ca8a4ebcff44 ("arm64/sme: Manually encode SME instructions") but doesn't appear to have any in kernel user. Drop it. Signed-off-by: Zenghui Yu Acked-by: Mark Brown Link: https://lore.kernel.org/r/20231205160140.1438-1-yuzenghui@huawei.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/fpsimdmacros.h | 8 -------- 1 file changed, 8 deletions(-) diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h index cdf6a35e3994..cda81d009c9b 100644 --- a/arch/arm64/include/asm/fpsimdmacros.h +++ b/arch/arm64/include/asm/fpsimdmacros.h @@ -242,14 +242,6 @@ | (\nx << 5) .endm -/* - * Zero the entire ZA array - * ZERO ZA - */ -.macro zero_za - .inst 0xc00800ff -.endm - .macro __for from:req, to:req .if (\from) == (\to) _for__body %\from From 256f442895ed9846bddf020d95c112de830c336c Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Sat, 9 Dec 2023 01:02:47 +0000 Subject: [PATCH 31/87] arm64/sysreg: Update HFGITR_EL2 definiton to DDI0601 2023-09 The 2023-09 release of the architecture XML (DDI0601) adds a new field ATS1E1A to HFGITR_EL2, update our definition of the register to match. This was extracted from Faud Tabba's patch "KVM: arm64: Add latest HFGITR_EL2 FGT entries to nested virt" [Extracted the sysreg definition from Faud's original patch and reword subject to match -- broonie] Signed-off-by: Fuad Tabba Message-Id: <20231206100503.564090-4-tabba@google.com> Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-1-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 96cbeeab4eec..8faeab1ee024 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -2102,7 +2102,9 @@ Fields HFGxTR_EL2 EndSysreg Sysreg HFGITR_EL2 3 4 1 1 6 -Res0 63:61 +Res0 63 +Field 62 ATS1E1A +Res0 61 Field 60 COSPRCTX Field 59 nGCSEPP Field 58 nGCSSTR_EL1 From 41bb68fbd016f0735798348dee2034f35cc06a17 Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Sat, 9 Dec 2023 01:02:48 +0000 Subject: [PATCH 32/87] arm64/sysreg: Add definition for HAFGRTR_EL2 Add a definition of HAFGRTR_EL2 (fine grained trap control for the AMU) as per DDI0601 2023-09. This was extracted from Fuad Tabba's patch "KVM: arm64: Handle HAFGRTR_EL2 trapping in nested virt". Signed-off-by: Fuad Tabba Link: https://lore.kernel.org/r/20231206100503.564090-6-tabba@google.com [Extract sysreg update and rewrite commit message -- broonie] Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-2-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 43 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 8faeab1ee024..145b33f75a96 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -2297,6 +2297,49 @@ Field 1 DBGBVRn_EL1 Field 0 DBGBCRn_EL1 EndSysreg +Sysreg HAFGRTR_EL2 3 4 3 1 6 +Res0 63:50 +Field 49 AMEVTYPER115_EL0 +Field 48 AMEVCNTR115_EL0 +Field 47 AMEVTYPER114_EL0 +Field 46 AMEVCNTR114_EL0 +Field 45 AMEVTYPER113_EL0 +Field 44 AMEVCNTR113_EL0 +Field 43 AMEVTYPER112_EL0 +Field 42 AMEVCNTR112_EL0 +Field 41 AMEVTYPER111_EL0 +Field 40 AMEVCNTR111_EL0 +Field 39 AMEVTYPER110_EL0 +Field 38 AMEVCNTR110_EL0 +Field 37 AMEVTYPER19_EL0 +Field 36 AMEVCNTR19_EL0 +Field 35 AMEVTYPER18_EL0 +Field 34 AMEVCNTR18_EL0 +Field 33 AMEVTYPER17_EL0 +Field 32 AMEVCNTR17_EL0 +Field 31 AMEVTYPER16_EL0 +Field 30 AMEVCNTR16_EL0 +Field 29 AMEVTYPER15_EL0 +Field 28 AMEVCNTR15_EL0 +Field 27 AMEVTYPER14_EL0 +Field 26 AMEVCNTR14_EL0 +Field 25 AMEVTYPER13_EL0 +Field 24 AMEVCNTR13_EL0 +Field 23 AMEVTYPER12_EL0 +Field 22 AMEVCNTR12_EL0 +Field 21 AMEVTYPER11_EL0 +Field 20 AMEVCNTR11_EL0 +Field 19 AMEVTYPER10_EL0 +Field 18 AMEVCNTR10_EL0 +Field 17 AMCNTEN1 +Res0 16:5 +Field 4 AMEVCNTR03_EL0 +Field 3 AMEVCNTR02_EL0 +Field 2 AMEVCNTR01_EL0 +Field 1 AMEVCNTR00_EL0 +Field 0 AMCNTEN0 +EndSysreg + Sysreg ZCR_EL2 3 4 1 2 0 Fields ZCR_ELx EndSysreg From c0c5a8ea96b877e903b40ed2345e73f83b0ed612 Mon Sep 17 00:00:00 2001 From: Joey Gouly Date: Sat, 9 Dec 2023 01:02:49 +0000 Subject: [PATCH 33/87] arm64/sysreg: add system register POR_EL{0,1} Add POR_EL{0,1} according to DDI0601 2023-03. Signed-off-by: Joey Gouly Reviewed-by: Mark Brown Acked-by: Catalin Marinas Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-3-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/sysreg.h | 13 +++++++++++++ arch/arm64/tools/sysreg | 12 ++++++++++++ 2 files changed, 25 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 5e65f51c10d2..9c2caf0efdc7 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -1039,6 +1039,19 @@ #define PIRx_ELx_PERM(idx, perm) ((perm) << ((idx) * 4)) +/* + * Permission Overlay Extension (POE) permission encodings. + */ +#define POE_NONE UL(0x0) +#define POE_R UL(0x1) +#define POE_X UL(0x2) +#define POE_RX UL(0x3) +#define POE_W UL(0x4) +#define POE_RW UL(0x5) +#define POE_XW UL(0x6) +#define POE_RXW UL(0x7) +#define POE_MASK UL(0xf) + #define ARM64_FEATURE_FIELD_BITS 4 /* Defined for compatibility only, do not add new users. */ diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 145b33f75a96..1d371a24da6e 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -2555,6 +2555,18 @@ Sysreg PIR_EL2 3 4 10 2 3 Fields PIRx_ELx EndSysreg +Sysreg POR_EL0 3 3 10 2 4 +Fields PIRx_ELx +EndSysreg + +Sysreg POR_EL1 3 0 10 2 4 +Fields PIRx_ELx +EndSysreg + +Sysreg POR_EL12 3 5 10 2 4 +Fields PIRx_ELx +EndSysreg + Sysreg LORSA_EL1 3 0 10 4 0 Res0 63:52 Field 51:16 SA From 35768b23d830302c9b818a213a9c1e5efb618218 Mon Sep 17 00:00:00 2001 From: Joey Gouly Date: Sat, 9 Dec 2023 01:02:50 +0000 Subject: [PATCH 34/87] arm64/sysreg: update CPACR_EL1 register Add E0POE bit that traps accesses to POR_EL0 from EL0. Updated according to DDI0601 2023-03. Signed-off-by: Joey Gouly Reviewed-by: Mark Brown Acked-by: Catalin Marinas Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-4-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 1d371a24da6e..c2dbbaa22620 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1747,7 +1747,8 @@ Field 0 M EndSysreg SysregFields CPACR_ELx -Res0 63:29 +Res0 63:30 +Field 29 E0POE Field 28 TTA Res0 27:26 Field 25:24 SMEN From 9fb5dc53a1176402905a7dde6cd812bc01ce6831 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sat, 9 Dec 2023 01:02:51 +0000 Subject: [PATCH 35/87] arm64/sysreg: Add definition for ID_AA64PFR2_EL1 DDI0601 2023-09 defines a new system register ID_AA64PFR2_EL1 which enumerates FPMR and some new MTE features. Add a definition of this register. Signed-off-by: Mark Brown Reviewed-by: Fuad Tabba Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-5-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index c2dbbaa22620..c48a3b8d00ad 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1002,6 +1002,27 @@ UnsignedEnum 3:0 BT EndEnum EndSysreg +Sysreg ID_AA64PFR2_EL1 3 0 0 4 2 +Res0 63:36 +UnsignedEnum 35:32 FPMR + 0b0000 NI + 0b0001 IMP +EndEnum +Res0 31:12 +UnsignedEnum 11:8 MTEFAR + 0b0000 NI + 0b0001 IMP +EndEnum +UnsignedEnum 7:4 MTESTOREONLY + 0b0000 NI + 0b0001 IMP +EndEnum +UnsignedEnum 3:0 MTEPERM + 0b0000 NI + 0b0001 IMP +EndEnum +EndSysreg + Sysreg ID_AA64ZFR0_EL1 3 0 0 4 4 Res0 63:60 UnsignedEnum 59:56 F64MM From 6e3dcfd139755d95f2c0d1f865f2e093d2b35c91 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sat, 9 Dec 2023 01:02:52 +0000 Subject: [PATCH 36/87] arm64/sysreg: Update ID_AA64ISAR2_EL1 defintion for DDI0601 2023-09 DDI0601 2023-09 defines some new fields in previously RES0 space in ID_AA64ISAR2_EL1, together with one new enum value. Update the system register definition to reflect this. Signed-off-by: Mark Brown Reviewed-by: Fuad Tabba Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-6-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index c48a3b8d00ad..7af081c52ce7 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1365,7 +1365,14 @@ EndEnum EndSysreg Sysreg ID_AA64ISAR2_EL1 3 0 0 6 2 -Res0 63:56 +UnsignedEnum 63:60 ATS1A + 0b0000 NI + 0b0001 IMP +EndEnum +UnsignedEnum 59:56 LUT + 0b0000 NI + 0b0001 IMP +EndEnum UnsignedEnum 55:52 CSSC 0b0000 NI 0b0001 IMP @@ -1374,7 +1381,19 @@ UnsignedEnum 51:48 RPRFM 0b0000 NI 0b0001 IMP EndEnum -Res0 47:32 +Res0 47:44 +UnsignedEnum 43:40 PRFMSLC + 0b0000 NI + 0b0001 IMP +EndEnum +UnsignedEnum 39:36 SYSINSTR_128 + 0b0000 NI + 0b0001 IMP +EndEnum +UnsignedEnum 35:32 SYSREG_128 + 0b0000 NI + 0b0001 IMP +EndEnum UnsignedEnum 31:28 CLRBHB 0b0000 NI 0b0001 IMP @@ -1398,6 +1417,7 @@ UnsignedEnum 15:12 APA3 0b0011 PAuth2 0b0100 FPAC 0b0101 FPACCOMBINE + 0b0110 PAuth_LR EndEnum UnsignedEnum 11:8 GPA3 0b0000 NI From b5aefb668701c2b019011f6f8fe29814b8529ecd Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sat, 9 Dec 2023 01:02:53 +0000 Subject: [PATCH 37/87] arm64/sysreg: Add definition for ID_AA64ISAR3_EL1 DDI0601 2023-09 adds a new system register ID_AA64ISAR3_EL1 enumerating new floating point and TLB invalidation features. Add a defintion for it. Signed-off-by: Mark Brown Reviewed-by: Fuad Tabba Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-7-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 7af081c52ce7..7dc7c76ee4ce 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1433,6 +1433,23 @@ UnsignedEnum 3:0 WFxT EndEnum EndSysreg +Sysreg ID_AA64ISAR3_EL1 3 0 0 6 3 +Res0 63:12 +UnsignedEnum 11:8 TLBIW + 0b0000 NI + 0b0001 IMP +EndEnum +UnsignedEnum 7:4 FAMINMAX + 0b0000 NI + 0b0001 IMP +EndEnum +UnsignedEnum 3:0 CPA + 0b0000 NI + 0b0001 IMP + 0b0010 CPA2 +EndEnum +EndSysreg + Sysreg ID_AA64MMFR0_EL1 3 0 0 7 0 UnsignedEnum 63:60 ECV 0b0000 NI From 9e4f409b07df14443ae4840f17f07e5025435e3d Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sat, 9 Dec 2023 01:02:54 +0000 Subject: [PATCH 38/87] arm64/sysreg: Add definition for ID_AA64FPFR0_EL1 DDI0601 2023-09 defines a new feature register ID_AA64FPFR0_EL1 which enumerates a number of FP8 related features. Add a definition for it. Signed-off-by: Mark Brown Reviewed-by: Fuad Tabba Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-8-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 7dc7c76ee4ce..167906adae40 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1131,6 +1131,35 @@ EndEnum Res0 31:0 EndSysreg +Sysreg ID_AA64FPFR0_EL1 3 0 0 4 7 +Res0 63:32 +UnsignedEnum 31 F8CVT + 0b0 NI + 0b1 IMP +EndEnum +UnsignedEnum 30 F8FMA + 0b0 NI + 0b1 IMP +EndEnum +UnsignedEnum 29 F8DP4 + 0b0 NI + 0b1 IMP +EndEnum +UnsignedEnum 28 F8DP2 + 0b0 NI + 0b1 IMP +EndEnum +Res0 27:2 +UnsignedEnum 1 F8E4M3 + 0b0 NI + 0b1 IMP +EndEnum +UnsignedEnum 0 F8E5M2 + 0b0 NI + 0b1 IMP +EndEnum +EndSysreg + Sysreg ID_AA64DFR0_EL1 3 0 0 5 0 Enum 63:60 HPMN0 0b0000 UNPREDICTABLE From 8afe582d77000ad244b66ed278aedc4ab5ee1634 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sat, 9 Dec 2023 01:02:55 +0000 Subject: [PATCH 39/87] arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09 The 2023-09 release of DDI0601 defines a number of new feature enumeration fields in ID_AA64SMFR0_EL1. Add these fields. Signed-off-by: Mark Brown Reviewed-by: Fuad Tabba Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-9-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 167906adae40..191b59db79d8 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1079,7 +1079,11 @@ UnsignedEnum 63 FA64 0b0 NI 0b1 IMP EndEnum -Res0 62:60 +Res0 62:61 +UnsignedEnum 60 LUTv2 + 0b0 NI + 0b1 IMP +EndEnum UnsignedEnum 59:56 SMEver 0b0000 SME 0b0001 SME2 @@ -1107,7 +1111,14 @@ UnsignedEnum 42 F16F16 0b0 NI 0b1 IMP EndEnum -Res0 41:40 +UnsignedEnum 41 F8F16 + 0b0 NI + 0b1 IMP +EndEnum +UnsignedEnum 40 F8F32 + 0b0 NI + 0b1 IMP +EndEnum UnsignedEnum 39:36 I8I32 0b0000 NI 0b1111 IMP @@ -1128,7 +1139,20 @@ UnsignedEnum 32 F32F32 0b0 NI 0b1 IMP EndEnum -Res0 31:0 +Res0 31 +UnsignedEnum 30 SF8FMA + 0b0 NI + 0b1 IMP +EndEnum +UnsignedEnum 29 SF8DP4 + 0b0 NI + 0b1 IMP +EndEnum +UnsignedEnum 28 SF8DP2 + 0b0 NI + 0b1 IMP +EndEnum +Res0 27:0 EndSysreg Sysreg ID_AA64FPFR0_EL1 3 0 0 4 7 From a6052284a9f9bcfb982edfe00044ecdfdf72eaa7 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sat, 9 Dec 2023 01:02:56 +0000 Subject: [PATCH 40/87] arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09 DDI0601 2023-09 defines some new fields in SCTLR_EL1 controlling new MTE and floating point features. Update our sysreg definition to reflect these. Signed-off-by: Mark Brown Reviewed-by: Fuad Tabba Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-10-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 191b59db79d8..48df1ffafe21 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1791,7 +1791,8 @@ Field 63 TIDCP Field 62 SPINTMASK Field 61 NMI Field 60 EnTP2 -Res0 59:58 +Field 59 TCSO +Field 58 TCSO0 Field 57 EPAN Field 56 EnALS Field 55 EnAS0 @@ -1820,7 +1821,7 @@ EndEnum Field 37 ITFSB Field 36 BT1 Field 35 BT0 -Res0 34 +Field 34 EnFPM Field 33 MSCEn Field 32 CMOW Field 31 EnIA From 126cb3a60d35cc2ce7db090b087e00ff85b12cfc Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sat, 9 Dec 2023 01:02:57 +0000 Subject: [PATCH 41/87] arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09 DDI0601 2023-09 defines new fields in HCRX_EL2 controlling access to new system registers, update our definition of HCRX_EL2 to reflect this. Signed-off-by: Mark Brown Reviewed-by: Fuad Tabba Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-11-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 48df1ffafe21..9b405c999adf 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -2458,7 +2458,9 @@ Fields ZCR_ELx EndSysreg Sysreg HCRX_EL2 3 4 1 2 2 -Res0 63:23 +Res0 63:25 +Field 24 PACMEn +Field 23 EnFPM Field 22 GCSEn Field 21 EnIDCP128 Field 20 EnSDERR From e3a649ecf8b9253cb1d05ceb085544472b06446f Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sat, 9 Dec 2023 01:02:58 +0000 Subject: [PATCH 42/87] arm64/sysreg: Add definition for FPMR DDI0601 2023-09 defines a new sysrem register FPMR (Floating Point Mode Register) which configures the new FP8 features. Add a definition of this register. Signed-off-by: Mark Brown Reviewed-by: Fuad Tabba Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-12-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 9b405c999adf..2698dcd49765 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -2139,6 +2139,29 @@ Field 1 ZA Field 0 SM EndSysreg +Sysreg FPMR 3 3 4 4 2 +Res0 63:38 +Field 37:32 LSCALE2 +Field 31:24 NSCALE +Res0 23 +Field 22:16 LSCALE +Field 15 OSC +Field 14 OSM +Res0 13:9 +UnsignedEnum 8:6 F8D + 0b000 E5M2 + 0b001 E4M3 +EndEnum +UnsignedEnum 5:3 F8S2 + 0b000 E5M2 + 0b001 E4M3 +EndEnum +UnsignedEnum 2:0 F8S1 + 0b000 E5M2 + 0b001 E4M3 +EndEnum +EndSysreg + SysregFields HFGxTR_EL2 Field 63 nAMAIR2_EL1 Field 62 nMAIR2_EL1 From e94e06d8a7960fd840ea92021ca1bf1362ea67f8 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sat, 9 Dec 2023 01:02:59 +0000 Subject: [PATCH 43/87] arm64/sysreg: Add new system registers for GCS FEAT_GCS introduces a number of new system registers. Add the registers available up to EL2 to sysreg as per DDI0601 2022-12. Signed-off-by: Mark Brown Reviewed-by: Fuad Tabba Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-13-45284e538474@kernel.org Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 55 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 2698dcd49765..2c4b6665c5bf 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1903,6 +1903,41 @@ Sysreg SMCR_EL1 3 0 1 2 6 Fields SMCR_ELx EndSysreg +SysregFields GCSCR_ELx +Res0 63:10 +Field 9 STREn +Field 8 PUSHMEn +Res0 7 +Field 6 EXLOCKEN +Field 5 RVCHKEN +Res0 4:1 +Field 0 PCRSEL +EndSysregFields + +Sysreg GCSCR_EL1 3 0 2 5 0 +Fields GCSCR_ELx +EndSysreg + +SysregFields GCSPR_ELx +Field 63:3 PTR +Res0 2:0 +EndSysregFields + +Sysreg GCSPR_EL1 3 0 2 5 1 +Fields GCSPR_ELx +EndSysreg + +Sysreg GCSCRE0_EL1 3 0 2 5 2 +Res0 63:11 +Field 10 nTR +Field 9 STREn +Field 8 PUSHMEn +Res0 7:6 +Field 5 RVCHKEN +Res0 4:1 +Field 0 PCRSEL +EndSysreg + Sysreg ALLINT 3 0 4 3 0 Res0 63:14 Field 13 ALLINT @@ -2133,6 +2168,10 @@ Field 4 DZP Field 3:0 BS EndSysreg +Sysreg GCSPR_EL0 3 3 2 5 1 +Fields GCSPR_ELx +EndSysreg + Sysreg SVCR 3 3 4 2 2 Res0 63:2 Field 1 ZA @@ -2531,6 +2570,14 @@ Sysreg SMCR_EL2 3 4 1 2 6 Fields SMCR_ELx EndSysreg +Sysreg GCSCR_EL2 3 4 2 5 0 +Fields GCSCR_ELx +EndSysreg + +Sysreg GCSPR_EL2 3 4 2 5 1 +Fields GCSPR_ELx +EndSysreg + Sysreg DACR32_EL2 3 4 3 0 0 Res0 63:32 Field 31:30 D15 @@ -2590,6 +2637,14 @@ Sysreg SMCR_EL12 3 5 1 2 6 Fields SMCR_ELx EndSysreg +Sysreg GCSCR_EL12 3 5 2 5 0 +Fields GCSCR_ELx +EndSysreg + +Sysreg GCSPR_EL12 3 5 2 5 1 +Fields GCSPR_ELx +EndSysreg + Sysreg FAR_EL12 3 5 6 0 0 Field 63:0 ADDR EndSysreg From 79c03ed4b896513d226abdd83214b8436efdc22c Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 10 Dec 2023 18:48:18 +0100 Subject: [PATCH 44/87] drivers/perf: Remove usage of the deprecated ida_simple_xx() API ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Signed-off-by: Christophe JAILLET Link: https://lore.kernel.org/r/85b0b73a1b2f743dd5db15d4765c7685100de27f.1702230488.git.christophe.jaillet@wanadoo.fr Signed-off-by: Will Deacon --- drivers/perf/fsl_imx9_ddr_perf.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/perf/fsl_imx9_ddr_perf.c b/drivers/perf/fsl_imx9_ddr_perf.c index 5cf770a1bc31..9685645bfe04 100644 --- a/drivers/perf/fsl_imx9_ddr_perf.c +++ b/drivers/perf/fsl_imx9_ddr_perf.c @@ -617,7 +617,7 @@ static int ddr_perf_probe(struct platform_device *pdev) platform_set_drvdata(pdev, pmu); - pmu->id = ida_simple_get(&ddr_ida, 0, 0, GFP_KERNEL); + pmu->id = ida_alloc(&ddr_ida, GFP_KERNEL); name = devm_kasprintf(&pdev->dev, GFP_KERNEL, DDR_PERF_DEV_NAME "%d", pmu->id); if (!name) { ret = -ENOMEM; @@ -674,7 +674,7 @@ cpuhp_instance_err: cpuhp_remove_multi_state(pmu->cpuhp_state); cpuhp_state_err: format_string_err: - ida_simple_remove(&ddr_ida, pmu->id); + ida_free(&ddr_ida, pmu->id); dev_warn(&pdev->dev, "i.MX9 DDR Perf PMU failed (%d), disabled\n", ret); return ret; } @@ -688,7 +688,7 @@ static int ddr_perf_remove(struct platform_device *pdev) perf_pmu_unregister(&pmu->pmu); - ida_simple_remove(&ddr_ida, pmu->id); + ida_free(&ddr_ida, pmu->id); return 0; } From 5ca8ab55084de7b92a5979a2d9fa233158cb1ac2 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Tue, 12 Dec 2023 09:26:38 +0000 Subject: [PATCH 45/87] drivers/perf: arm_dsu_pmu: Remove kerneldoc-style comment syntax For some reason, the Arm DSU PMU driver uses kerneldoc-style comment syntax (i.e. /** ) for non-kerneldoc comments. This makes the robots very angry indeed, so just revert these to normal comments to stop the noise. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202312092000.8ltwotjt-lkp@intel.com/ Signed-off-by: Will Deacon --- drivers/perf/arm_dsu_pmu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c index 8223c49bd082..7ec4498e312f 100644 --- a/drivers/perf/arm_dsu_pmu.c +++ b/drivers/perf/arm_dsu_pmu.c @@ -371,7 +371,7 @@ static inline u32 dsu_pmu_get_reset_overflow(void) return __dsu_pmu_get_reset_overflow(); } -/** +/* * dsu_pmu_set_event_period: Set the period for the counter. * * All DSU PMU event counters, except the cycle counter are 32bit @@ -602,7 +602,7 @@ static struct dsu_pmu *dsu_pmu_alloc(struct platform_device *pdev) return dsu_pmu; } -/** +/* * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster * from device tree. */ @@ -632,7 +632,7 @@ static int dsu_pmu_dt_get_cpus(struct device *dev, cpumask_t *mask) return 0; } -/** +/* * dsu_pmu_acpi_get_cpus: Get the list of CPUs in the cluster * from ACPI. */ From 9343c790e6de7edd2bab17572832f4f21c748dc2 Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:13 +0000 Subject: [PATCH 46/87] arm: perf: Remove inlines from arm_pmuv3.c These are all static and in one compilation unit so the inline has no effect on the binary. Except if FTRACE is enabled, then 3 functions which were already not inlined now get the nops added which allows them to be traced. Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-2-james.clark@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_pmuv3.c | 46 ++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index bbde60f1b08c..09478e2b825e 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -304,12 +304,12 @@ PMU_FORMAT_ATTR(rdpmc, "config1:1"); static int sysctl_perf_user_access __read_mostly; -static inline bool armv8pmu_event_is_64bit(struct perf_event *event) +static bool armv8pmu_event_is_64bit(struct perf_event *event) { return event->attr.config1 & 0x1; } -static inline bool armv8pmu_event_want_user_access(struct perf_event *event) +static bool armv8pmu_event_want_user_access(struct perf_event *event) { return event->attr.config1 & 0x2; } @@ -401,7 +401,7 @@ static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu) return (IS_ENABLED(CONFIG_ARM64) && is_pmuv3p5(cpu_pmu->pmuver)); } -static inline bool armv8pmu_event_has_user_read(struct perf_event *event) +static bool armv8pmu_event_has_user_read(struct perf_event *event) { return event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT; } @@ -411,7 +411,7 @@ static inline bool armv8pmu_event_has_user_read(struct perf_event *event) * except when we have allocated the 64bit cycle counter (for CPU * cycles event) or when user space counter access is enabled. */ -static inline bool armv8pmu_event_is_chained(struct perf_event *event) +static bool armv8pmu_event_is_chained(struct perf_event *event) { int idx = event->hw.idx; struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); @@ -432,36 +432,36 @@ static inline bool armv8pmu_event_is_chained(struct perf_event *event) #define ARMV8_IDX_TO_COUNTER(x) \ (((x) - ARMV8_IDX_COUNTER0) & ARMV8_PMU_COUNTER_MASK) -static inline u64 armv8pmu_pmcr_read(void) +static u64 armv8pmu_pmcr_read(void) { return read_pmcr(); } -static inline void armv8pmu_pmcr_write(u64 val) +static void armv8pmu_pmcr_write(u64 val) { val &= ARMV8_PMU_PMCR_MASK; isb(); write_pmcr(val); } -static inline int armv8pmu_has_overflowed(u32 pmovsr) +static int armv8pmu_has_overflowed(u32 pmovsr) { return pmovsr & ARMV8_PMU_OVERFLOWED_MASK; } -static inline int armv8pmu_counter_has_overflowed(u32 pmnc, int idx) +static int armv8pmu_counter_has_overflowed(u32 pmnc, int idx) { return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx)); } -static inline u64 armv8pmu_read_evcntr(int idx) +static u64 armv8pmu_read_evcntr(int idx) { u32 counter = ARMV8_IDX_TO_COUNTER(idx); return read_pmevcntrn(counter); } -static inline u64 armv8pmu_read_hw_counter(struct perf_event *event) +static u64 armv8pmu_read_hw_counter(struct perf_event *event) { int idx = event->hw.idx; u64 val = armv8pmu_read_evcntr(idx); @@ -523,14 +523,14 @@ static u64 armv8pmu_read_counter(struct perf_event *event) return armv8pmu_unbias_long_counter(event, value); } -static inline void armv8pmu_write_evcntr(int idx, u64 value) +static void armv8pmu_write_evcntr(int idx, u64 value) { u32 counter = ARMV8_IDX_TO_COUNTER(idx); write_pmevcntrn(counter, value); } -static inline void armv8pmu_write_hw_counter(struct perf_event *event, +static void armv8pmu_write_hw_counter(struct perf_event *event, u64 value) { int idx = event->hw.idx; @@ -556,7 +556,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value) armv8pmu_write_hw_counter(event, value); } -static inline void armv8pmu_write_evtype(int idx, u32 val) +static void armv8pmu_write_evtype(int idx, u32 val) { u32 counter = ARMV8_IDX_TO_COUNTER(idx); @@ -564,7 +564,7 @@ static inline void armv8pmu_write_evtype(int idx, u32 val) write_pmevtypern(counter, val); } -static inline void armv8pmu_write_event_type(struct perf_event *event) +static void armv8pmu_write_event_type(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; int idx = hwc->idx; @@ -598,7 +598,7 @@ static u32 armv8pmu_event_cnten_mask(struct perf_event *event) return mask; } -static inline void armv8pmu_enable_counter(u32 mask) +static void armv8pmu_enable_counter(u32 mask) { /* * Make sure event configuration register writes are visible before we @@ -608,7 +608,7 @@ static inline void armv8pmu_enable_counter(u32 mask) write_pmcntenset(mask); } -static inline void armv8pmu_enable_event_counter(struct perf_event *event) +static void armv8pmu_enable_event_counter(struct perf_event *event) { struct perf_event_attr *attr = &event->attr; u32 mask = armv8pmu_event_cnten_mask(event); @@ -620,7 +620,7 @@ static inline void armv8pmu_enable_event_counter(struct perf_event *event) armv8pmu_enable_counter(mask); } -static inline void armv8pmu_disable_counter(u32 mask) +static void armv8pmu_disable_counter(u32 mask) { write_pmcntenclr(mask); /* @@ -630,7 +630,7 @@ static inline void armv8pmu_disable_counter(u32 mask) isb(); } -static inline void armv8pmu_disable_event_counter(struct perf_event *event) +static void armv8pmu_disable_event_counter(struct perf_event *event) { struct perf_event_attr *attr = &event->attr; u32 mask = armv8pmu_event_cnten_mask(event); @@ -642,18 +642,18 @@ static inline void armv8pmu_disable_event_counter(struct perf_event *event) armv8pmu_disable_counter(mask); } -static inline void armv8pmu_enable_intens(u32 mask) +static void armv8pmu_enable_intens(u32 mask) { write_pmintenset(mask); } -static inline void armv8pmu_enable_event_irq(struct perf_event *event) +static void armv8pmu_enable_event_irq(struct perf_event *event) { u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx); armv8pmu_enable_intens(BIT(counter)); } -static inline void armv8pmu_disable_intens(u32 mask) +static void armv8pmu_disable_intens(u32 mask) { write_pmintenclr(mask); isb(); @@ -662,13 +662,13 @@ static inline void armv8pmu_disable_intens(u32 mask) isb(); } -static inline void armv8pmu_disable_event_irq(struct perf_event *event) +static void armv8pmu_disable_event_irq(struct perf_event *event) { u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx); armv8pmu_disable_intens(BIT(counter)); } -static inline u32 armv8pmu_getreset_flags(void) +static u32 armv8pmu_getreset_flags(void) { u32 value; From 62e1f212e5fe7624249212813ee96202e0c31430 Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:14 +0000 Subject: [PATCH 47/87] arm: perf/kvm: Use GENMASK for ARMV8_PMU_PMCR_N This is so that FIELD_GET and FIELD_PREP can be used and that the fields are in a consistent format to arm64/tools/sysreg Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-3-james.clark@arm.com Signed-off-by: Will Deacon --- arch/arm64/kvm/pmu-emul.c | 8 +++----- arch/arm64/kvm/sys_regs.c | 4 ++-- drivers/perf/arm_pmuv3.c | 4 ++-- include/linux/perf/arm_pmuv3.h | 3 +-- 4 files changed, 8 insertions(+), 11 deletions(-) diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index fe99b3dab6ce..3d9467ff73bc 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -267,9 +267,8 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu) { - u64 val = kvm_vcpu_read_pmcr(vcpu) >> ARMV8_PMU_PMCR_N_SHIFT; + u64 val = FIELD_GET(ARMV8_PMU_PMCR_N, kvm_vcpu_read_pmcr(vcpu)); - val &= ARMV8_PMU_PMCR_N_MASK; if (val == 0) return BIT(ARMV8_PMU_CYCLE_IDX); else @@ -1136,8 +1135,7 @@ u8 kvm_arm_pmu_get_pmuver_limit(void) */ u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu) { - u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0) & - ~(ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT); + u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0); - return pmcr | ((u64)vcpu->kvm->arch.pmcr_n << ARMV8_PMU_PMCR_N_SHIFT); + return u64_replace_bits(pmcr, vcpu->kvm->arch.pmcr_n, ARMV8_PMU_PMCR_N); } diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 4735e1b37fb3..ff45d688bd7d 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -877,7 +877,7 @@ static bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx) u64 pmcr, val; pmcr = kvm_vcpu_read_pmcr(vcpu); - val = (pmcr >> ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK; + val = FIELD_GET(ARMV8_PMU_PMCR_N, pmcr); if (idx >= val && idx != ARMV8_PMU_CYCLE_IDX) { kvm_inject_undefined(vcpu); return false; @@ -1143,7 +1143,7 @@ static int get_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, u64 val) { - u8 new_n = (val >> ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK; + u8 new_n = FIELD_GET(ARMV8_PMU_PMCR_N, val); struct kvm *kvm = vcpu->kvm; mutex_lock(&kvm->arch.config_lock); diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 09478e2b825e..374e973a4e42 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -15,6 +15,7 @@ #include #include +#include #include #include #include @@ -1111,8 +1112,7 @@ static void __armv8pmu_probe_pmu(void *info) probe->present = true; /* Read the nb of CNTx counters supported from PMNC */ - cpu_pmu->num_events = (armv8pmu_pmcr_read() >> ARMV8_PMU_PMCR_N_SHIFT) - & ARMV8_PMU_PMCR_N_MASK; + cpu_pmu->num_events = FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()); /* Add the CPU cycles counter */ cpu_pmu->num_events += 1; diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index 9c226adf938a..ed62bd75cec7 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -215,8 +215,7 @@ #define ARMV8_PMU_PMCR_DP (1 << 5) /* Disable CCNT if non-invasive debug*/ #define ARMV8_PMU_PMCR_LC (1 << 6) /* Overflow on 64 bit cycle counter */ #define ARMV8_PMU_PMCR_LP (1 << 7) /* Long event counter enable */ -#define ARMV8_PMU_PMCR_N_SHIFT 11 /* Number of counters supported */ -#define ARMV8_PMU_PMCR_N_MASK 0x1f +#define ARMV8_PMU_PMCR_N GENMASK(15, 11) /* Number of counters supported */ #define ARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */ /* From 2f6a00f30600417ee2737f2b1229c75663f1e3c9 Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:15 +0000 Subject: [PATCH 48/87] arm: perf: Use GENMASK for PMMIR fields This is so that FIELD_GET and FIELD_PREP can be used and that the fields are in a consistent format to arm64/tools/sysreg Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-4-james.clark@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_pmuv3.c | 8 +++----- include/linux/perf/arm_pmuv3.h | 9 +++------ 2 files changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 374e973a4e42..36bc00494f56 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -332,7 +332,7 @@ static ssize_t slots_show(struct device *dev, struct device_attribute *attr, { struct pmu *pmu = dev_get_drvdata(dev); struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); - u32 slots = cpu_pmu->reg_pmmir & ARMV8_PMU_SLOTS_MASK; + u32 slots = FIELD_GET(ARMV8_PMU_SLOTS, cpu_pmu->reg_pmmir); return sysfs_emit(page, "0x%08x\n", slots); } @@ -344,8 +344,7 @@ static ssize_t bus_slots_show(struct device *dev, struct device_attribute *attr, { struct pmu *pmu = dev_get_drvdata(dev); struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); - u32 bus_slots = (cpu_pmu->reg_pmmir >> ARMV8_PMU_BUS_SLOTS_SHIFT) - & ARMV8_PMU_BUS_SLOTS_MASK; + u32 bus_slots = FIELD_GET(ARMV8_PMU_BUS_SLOTS, cpu_pmu->reg_pmmir); return sysfs_emit(page, "0x%08x\n", bus_slots); } @@ -357,8 +356,7 @@ static ssize_t bus_width_show(struct device *dev, struct device_attribute *attr, { struct pmu *pmu = dev_get_drvdata(dev); struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); - u32 bus_width = (cpu_pmu->reg_pmmir >> ARMV8_PMU_BUS_WIDTH_SHIFT) - & ARMV8_PMU_BUS_WIDTH_MASK; + u32 bus_width = FIELD_GET(ARMV8_PMU_BUS_WIDTH, cpu_pmu->reg_pmmir); u32 val = 0; /* Encoded as Log2(number of bytes), plus one */ diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index ed62bd75cec7..1bc7678c10d4 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -250,12 +250,9 @@ #define ARMV8_PMU_USERENR_ER (1 << 3) /* Event counter can be read at EL0 */ /* PMMIR_EL1.SLOTS mask */ -#define ARMV8_PMU_SLOTS_MASK 0xff - -#define ARMV8_PMU_BUS_SLOTS_SHIFT 8 -#define ARMV8_PMU_BUS_SLOTS_MASK 0xff -#define ARMV8_PMU_BUS_WIDTH_SHIFT 16 -#define ARMV8_PMU_BUS_WIDTH_MASK 0xf +#define ARMV8_PMU_SLOTS GENMASK(7, 0) +#define ARMV8_PMU_BUS_SLOTS GENMASK(15, 8) +#define ARMV8_PMU_BUS_WIDTH GENMASK(19, 16) /* * This code is really good From d30f09b6d7de5d159dbb537f9d67dceb67409420 Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:16 +0000 Subject: [PATCH 49/87] arm: perf: Convert remaining fields to use GENMASK Convert the remaining fields to use either GENMASK or be built from other fields. These all already started at bit 0 so don't need a code change for the lack of _SHIFT. Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-5-james.clark@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_pmuv3.c | 2 +- include/linux/perf/arm_pmuv3.h | 18 +++++++++++++----- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 36bc00494f56..a93b4cf88562 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -675,7 +675,7 @@ static u32 armv8pmu_getreset_flags(void) value = read_pmovsclr(); /* Write to clear flags */ - value &= ARMV8_PMU_OVSR_MASK; + value &= ARMV8_PMU_OVERFLOWED_MASK; write_pmovsclr(value); return value; diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index 1bc7678c10d4..daa63542242d 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -216,19 +216,25 @@ #define ARMV8_PMU_PMCR_LC (1 << 6) /* Overflow on 64 bit cycle counter */ #define ARMV8_PMU_PMCR_LP (1 << 7) /* Long event counter enable */ #define ARMV8_PMU_PMCR_N GENMASK(15, 11) /* Number of counters supported */ -#define ARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */ +/* Mask for writable bits */ +#define ARMV8_PMU_PMCR_MASK (ARMV8_PMU_PMCR_E | ARMV8_PMU_PMCR_P | \ + ARMV8_PMU_PMCR_C | ARMV8_PMU_PMCR_D | \ + ARMV8_PMU_PMCR_X | ARMV8_PMU_PMCR_DP | \ + ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP) /* * PMOVSR: counters overflow flag status reg */ -#define ARMV8_PMU_OVSR_MASK 0xffffffff /* Mask for writable bits */ -#define ARMV8_PMU_OVERFLOWED_MASK ARMV8_PMU_OVSR_MASK +#define ARMV8_PMU_OVSR_P GENMASK(30, 0) +#define ARMV8_PMU_OVSR_C BIT(31) +/* Mask for writable bits is both P and C fields */ +#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C) /* * PMXEVTYPER: Event selection reg */ #define ARMV8_PMU_EVTYPE_MASK 0xc800ffff /* Mask for writable bits */ -#define ARMV8_PMU_EVTYPE_EVENT 0xffff /* Mask for EVENT bits */ +#define ARMV8_PMU_EVTYPE_EVENT GENMASK(15, 0) /* Mask for EVENT bits */ /* * Event filters for PMUv3 @@ -243,11 +249,13 @@ /* * PMUSERENR: user enable reg */ -#define ARMV8_PMU_USERENR_MASK 0xf /* Mask for writable bits */ #define ARMV8_PMU_USERENR_EN (1 << 0) /* PMU regs can be accessed at EL0 */ #define ARMV8_PMU_USERENR_SW (1 << 1) /* PMSWINC can be written at EL0 */ #define ARMV8_PMU_USERENR_CR (1 << 2) /* Cycle counter can be read at EL0 */ #define ARMV8_PMU_USERENR_ER (1 << 3) /* Event counter can be read at EL0 */ +/* Mask for writable bits */ +#define ARMV8_PMU_USERENR_MASK (ARMV8_PMU_USERENR_EN | ARMV8_PMU_USERENR_SW | \ + ARMV8_PMU_USERENR_CR | ARMV8_PMU_USERENR_ER) /* PMMIR_EL1.SLOTS mask */ #define ARMV8_PMU_SLOTS GENMASK(7, 0) From 3115ee021bfb04efde2e96507bfcc1330261a6a1 Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:17 +0000 Subject: [PATCH 50/87] arm64: perf: Include threshold control fields in PMEVTYPER mask FEAT_PMUv3_TH (Armv8.8) adds two new fields to PMEVTYPER, so include them in the mask. These aren't writable on 32 bit kernels as they are in the high part of the register, so only include them for arm64. It would be difficult to do this statically in the asm header files for each platform without resulting in circular includes or #ifdefs inline in the code. For that reason the ARMV8_PMU_EVTYPE_MASK definition has been removed and the mask is constructed programmatically. Reviewed-by: Suzuki K Poulose Reviewed-by: Anshuman Khandual Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-6-james.clark@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_pmuv3.c | 9 ++++++++- include/linux/perf/arm_pmuv3.h | 3 ++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index a93b4cf88562..441bf73ee3d5 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -558,8 +558,15 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value) static void armv8pmu_write_evtype(int idx, u32 val) { u32 counter = ARMV8_IDX_TO_COUNTER(idx); + unsigned long mask = ARMV8_PMU_EVTYPE_EVENT | + ARMV8_PMU_INCLUDE_EL2 | + ARMV8_PMU_EXCLUDE_EL0 | + ARMV8_PMU_EXCLUDE_EL1; - val &= ARMV8_PMU_EVTYPE_MASK; + if (IS_ENABLED(CONFIG_ARM64)) + mask |= ARMV8_PMU_EVTYPE_TC | ARMV8_PMU_EVTYPE_TH; + + val &= mask; write_pmevtypern(counter, val); } diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index daa63542242d..91957b3468e9 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -233,8 +233,9 @@ /* * PMXEVTYPER: Event selection reg */ -#define ARMV8_PMU_EVTYPE_MASK 0xc800ffff /* Mask for writable bits */ #define ARMV8_PMU_EVTYPE_EVENT GENMASK(15, 0) /* Mask for EVENT bits */ +#define ARMV8_PMU_EVTYPE_TH GENMASK(43, 32) +#define ARMV8_PMU_EVTYPE_TC GENMASK(63, 61) /* * Event filters for PMUv3 From f6da86969a3c284466ab6080764b2ed91689f262 Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:18 +0000 Subject: [PATCH 51/87] arm: pmu: Share user ABI format mechanism with SPE This mechanism makes it much easier to define and read new attributes so move it to the arm_pmu.h header so that it can be shared. At the same time update the existing format attributes to use it. GENMASK has to be changed to GENMASK_ULL because the config fields are 64 bits even on arm32 where this will also be used now. Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-7-james.clark@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_pmuv3.c | 21 ++++++++++++++++----- drivers/perf/arm_spe_pmu.c | 22 ---------------------- include/linux/perf/arm_pmu.h | 22 ++++++++++++++++++++++ 3 files changed, 38 insertions(+), 27 deletions(-) diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 441bf73ee3d5..8a573db81da1 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -299,20 +299,31 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = { .is_visible = armv8pmu_event_attr_is_visible, }; -PMU_FORMAT_ATTR(event, "config:0-15"); -PMU_FORMAT_ATTR(long, "config1:0"); -PMU_FORMAT_ATTR(rdpmc, "config1:1"); +/* User ABI */ +#define ATTR_CFG_FLD_event_CFG config +#define ATTR_CFG_FLD_event_LO 0 +#define ATTR_CFG_FLD_event_HI 15 +#define ATTR_CFG_FLD_long_CFG config1 +#define ATTR_CFG_FLD_long_LO 0 +#define ATTR_CFG_FLD_long_HI 0 +#define ATTR_CFG_FLD_rdpmc_CFG config1 +#define ATTR_CFG_FLD_rdpmc_LO 1 +#define ATTR_CFG_FLD_rdpmc_HI 1 + +GEN_PMU_FORMAT_ATTR(event); +GEN_PMU_FORMAT_ATTR(long); +GEN_PMU_FORMAT_ATTR(rdpmc); static int sysctl_perf_user_access __read_mostly; static bool armv8pmu_event_is_64bit(struct perf_event *event) { - return event->attr.config1 & 0x1; + return ATTR_CFG_GET_FLD(&event->attr, long); } static bool armv8pmu_event_want_user_access(struct perf_event *event) { - return event->attr.config1 & 0x2; + return ATTR_CFG_GET_FLD(&event->attr, rdpmc); } static struct attribute *armv8_pmuv3_format_attrs[] = { diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index d2b0cbf0e0c4..b622d75d8c9e 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -206,28 +206,6 @@ static const struct attribute_group arm_spe_pmu_cap_group = { #define ATTR_CFG_FLD_inv_event_filter_LO 0 #define ATTR_CFG_FLD_inv_event_filter_HI 63 -/* Why does everything I do descend into this? */ -#define __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \ - (lo) == (hi) ? #cfg ":" #lo "\n" : #cfg ":" #lo "-" #hi - -#define _GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \ - __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) - -#define GEN_PMU_FORMAT_ATTR(name) \ - PMU_FORMAT_ATTR(name, \ - _GEN_PMU_FORMAT_ATTR(ATTR_CFG_FLD_##name##_CFG, \ - ATTR_CFG_FLD_##name##_LO, \ - ATTR_CFG_FLD_##name##_HI)) - -#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi) \ - ((((attr)->cfg) >> lo) & GENMASK(hi - lo, 0)) - -#define ATTR_CFG_GET_FLD(attr, name) \ - _ATTR_CFG_GET_FLD(attr, \ - ATTR_CFG_FLD_##name##_CFG, \ - ATTR_CFG_FLD_##name##_LO, \ - ATTR_CFG_FLD_##name##_HI) - GEN_PMU_FORMAT_ATTR(ts_enable); GEN_PMU_FORMAT_ATTR(pa_enable); GEN_PMU_FORMAT_ATTR(pct_enable); diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index e2503d48ddee..b3b34f6670cf 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -183,4 +183,26 @@ void armpmu_free_irq(int irq, int cpu); #define ARMV8_SPE_PDEV_NAME "arm,spe-v1" #define ARMV8_TRBE_PDEV_NAME "arm,trbe" +/* Why does everything I do descend into this? */ +#define __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \ + (lo) == (hi) ? #cfg ":" #lo "\n" : #cfg ":" #lo "-" #hi + +#define _GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \ + __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) + +#define GEN_PMU_FORMAT_ATTR(name) \ + PMU_FORMAT_ATTR(name, \ + _GEN_PMU_FORMAT_ATTR(ATTR_CFG_FLD_##name##_CFG, \ + ATTR_CFG_FLD_##name##_LO, \ + ATTR_CFG_FLD_##name##_HI)) + +#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi) \ + ((((attr)->cfg) >> lo) & GENMASK_ULL(hi - lo, 0)) + +#define ATTR_CFG_GET_FLD(attr, name) \ + _ATTR_CFG_GET_FLD(attr, \ + ATTR_CFG_FLD_##name##_CFG, \ + ATTR_CFG_FLD_##name##_LO, \ + ATTR_CFG_FLD_##name##_HI) + #endif /* __ARM_PMU_H__ */ From a5f4ca68f348ac059efd6a3d7ad4040aed1c0818 Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:19 +0000 Subject: [PATCH 52/87] perf/arm_dmc620: Remove duplicate format attribute #defines These were copied from the SPE driver, but now they're in the arm_pmu.h header so delete them and use the header instead. Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-8-james.clark@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_dmc620_pmu.c | 22 +--------------------- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c index 30cea6859574..9de9dc8db8db 100644 --- a/drivers/perf/arm_dmc620_pmu.c +++ b/drivers/perf/arm_dmc620_pmu.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -189,27 +190,6 @@ static const struct attribute_group dmc620_pmu_events_attr_group = { #define ATTR_CFG_FLD_clkdiv2_LO 9 #define ATTR_CFG_FLD_clkdiv2_HI 9 -#define __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \ - (lo) == (hi) ? #cfg ":" #lo "\n" : #cfg ":" #lo "-" #hi - -#define _GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \ - __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) - -#define GEN_PMU_FORMAT_ATTR(name) \ - PMU_FORMAT_ATTR(name, \ - _GEN_PMU_FORMAT_ATTR(ATTR_CFG_FLD_##name##_CFG, \ - ATTR_CFG_FLD_##name##_LO, \ - ATTR_CFG_FLD_##name##_HI)) - -#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi) \ - ((((attr)->cfg) >> lo) & GENMASK_ULL(hi - lo, 0)) - -#define ATTR_CFG_GET_FLD(attr, name) \ - _ATTR_CFG_GET_FLD(attr, \ - ATTR_CFG_FLD_##name##_CFG, \ - ATTR_CFG_FLD_##name##_LO, \ - ATTR_CFG_FLD_##name##_HI) - GEN_PMU_FORMAT_ATTR(mask); GEN_PMU_FORMAT_ATTR(match); GEN_PMU_FORMAT_ATTR(invert); From c7b98bf0fc79bd2d91f6ef84e07b5f648d43c13e Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:20 +0000 Subject: [PATCH 53/87] KVM: selftests: aarch64: Update tools copy of arm_pmuv3.h Now that ARMV8_PMU_PMCR_N is made with GENMASK, update usages to treat it as a pre-shifted mask. Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-9-james.clark@arm.com Signed-off-by: Will Deacon --- tools/include/perf/arm_pmuv3.h | 43 +++++++++++-------- .../kvm/aarch64/vpmu_counter_access.c | 5 +-- 2 files changed, 28 insertions(+), 20 deletions(-) diff --git a/tools/include/perf/arm_pmuv3.h b/tools/include/perf/arm_pmuv3.h index e822d49fb5b8..1e397d55384e 100644 --- a/tools/include/perf/arm_pmuv3.h +++ b/tools/include/perf/arm_pmuv3.h @@ -218,45 +218,54 @@ #define ARMV8_PMU_PMCR_DP (1 << 5) /* Disable CCNT if non-invasive debug*/ #define ARMV8_PMU_PMCR_LC (1 << 6) /* Overflow on 64 bit cycle counter */ #define ARMV8_PMU_PMCR_LP (1 << 7) /* Long event counter enable */ -#define ARMV8_PMU_PMCR_N_SHIFT 11 /* Number of counters supported */ -#define ARMV8_PMU_PMCR_N_MASK 0x1f -#define ARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */ +#define ARMV8_PMU_PMCR_N GENMASK(15, 11) /* Number of counters supported */ +/* Mask for writable bits */ +#define ARMV8_PMU_PMCR_MASK (ARMV8_PMU_PMCR_E | ARMV8_PMU_PMCR_P | \ + ARMV8_PMU_PMCR_C | ARMV8_PMU_PMCR_D | \ + ARMV8_PMU_PMCR_X | ARMV8_PMU_PMCR_DP | \ + ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP) /* * PMOVSR: counters overflow flag status reg */ -#define ARMV8_PMU_OVSR_MASK 0xffffffff /* Mask for writable bits */ -#define ARMV8_PMU_OVERFLOWED_MASK ARMV8_PMU_OVSR_MASK +#define ARMV8_PMU_OVSR_P GENMASK(30, 0) +#define ARMV8_PMU_OVSR_C BIT(31) +/* Mask for writable bits is both P and C fields */ +#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C) /* * PMXEVTYPER: Event selection reg */ -#define ARMV8_PMU_EVTYPE_MASK 0xc800ffff /* Mask for writable bits */ -#define ARMV8_PMU_EVTYPE_EVENT 0xffff /* Mask for EVENT bits */ +#define ARMV8_PMU_EVTYPE_EVENT GENMASK(15, 0) /* Mask for EVENT bits */ +#define ARMV8_PMU_EVTYPE_TH GENMASK(43, 32) +#define ARMV8_PMU_EVTYPE_TC GENMASK(63, 61) /* * Event filters for PMUv3 */ -#define ARMV8_PMU_EXCLUDE_EL1 (1U << 31) -#define ARMV8_PMU_EXCLUDE_EL0 (1U << 30) -#define ARMV8_PMU_INCLUDE_EL2 (1U << 27) +#define ARMV8_PMU_EXCLUDE_EL1 (1U << 31) +#define ARMV8_PMU_EXCLUDE_EL0 (1U << 30) +#define ARMV8_PMU_EXCLUDE_NS_EL1 (1U << 29) +#define ARMV8_PMU_EXCLUDE_NS_EL0 (1U << 28) +#define ARMV8_PMU_INCLUDE_EL2 (1U << 27) +#define ARMV8_PMU_EXCLUDE_EL3 (1U << 26) /* * PMUSERENR: user enable reg */ -#define ARMV8_PMU_USERENR_MASK 0xf /* Mask for writable bits */ #define ARMV8_PMU_USERENR_EN (1 << 0) /* PMU regs can be accessed at EL0 */ #define ARMV8_PMU_USERENR_SW (1 << 1) /* PMSWINC can be written at EL0 */ #define ARMV8_PMU_USERENR_CR (1 << 2) /* Cycle counter can be read at EL0 */ #define ARMV8_PMU_USERENR_ER (1 << 3) /* Event counter can be read at EL0 */ +/* Mask for writable bits */ +#define ARMV8_PMU_USERENR_MASK (ARMV8_PMU_USERENR_EN | ARMV8_PMU_USERENR_SW | \ + ARMV8_PMU_USERENR_CR | ARMV8_PMU_USERENR_ER) /* PMMIR_EL1.SLOTS mask */ -#define ARMV8_PMU_SLOTS_MASK 0xff - -#define ARMV8_PMU_BUS_SLOTS_SHIFT 8 -#define ARMV8_PMU_BUS_SLOTS_MASK 0xff -#define ARMV8_PMU_BUS_WIDTH_SHIFT 16 -#define ARMV8_PMU_BUS_WIDTH_MASK 0xf +#define ARMV8_PMU_SLOTS GENMASK(7, 0) +#define ARMV8_PMU_BUS_SLOTS GENMASK(15, 8) +#define ARMV8_PMU_BUS_WIDTH GENMASK(19, 16) +#define ARMV8_PMU_THWIDTH GENMASK(23, 20) /* * This code is really good diff --git a/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c b/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c index 5ea78986e665..9d51b5691349 100644 --- a/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c +++ b/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c @@ -42,13 +42,12 @@ struct pmreg_sets { static uint64_t get_pmcr_n(uint64_t pmcr) { - return (pmcr >> ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK; + return FIELD_GET(ARMV8_PMU_PMCR_N, pmcr); } static void set_pmcr_n(uint64_t *pmcr, uint64_t pmcr_n) { - *pmcr = *pmcr & ~(ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT); - *pmcr |= (pmcr_n << ARMV8_PMU_PMCR_N_SHIFT); + u64p_replace_bits((__u64 *) pmcr, pmcr_n, ARMV8_PMU_PMCR_N); } static uint64_t get_counters_mask(uint64_t n) From 186c91aaf54989a9c74869dcc6ba031313d8e2b8 Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:21 +0000 Subject: [PATCH 54/87] arm: pmu: Move error message and -EOPNOTSUPP to individual PMUs -EPERM or -EINVAL always get converted to -EOPNOTSUPP, so replace them. This will allow __hw_perf_event_init() to return a different code or not print that particular message for a different error in the next commit. Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-10-james.clark@arm.com Signed-off-by: Will Deacon --- arch/arm/kernel/perf_event_v7.c | 6 ++++-- drivers/perf/apple_m1_cpu_pmu.c | 6 ++++-- drivers/perf/arm_pmu.c | 11 +++++------ drivers/perf/arm_pmuv3.c | 6 ++++-- 4 files changed, 17 insertions(+), 12 deletions(-) diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c index c890354b04e9..a3322e2b3ea4 100644 --- a/arch/arm/kernel/perf_event_v7.c +++ b/arch/arm/kernel/perf_event_v7.c @@ -1052,8 +1052,10 @@ static int armv7pmu_set_event_filter(struct hw_perf_event *event, { unsigned long config_base = 0; - if (attr->exclude_idle) - return -EPERM; + if (attr->exclude_idle) { + pr_debug("ARM performance counters do not support mode exclusion\n"); + return -EOPNOTSUPP; + } if (attr->exclude_user) config_base |= ARMV7_EXCLUDE_USER; if (attr->exclude_kernel) diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c index cd2de44b61b9..f322e5ca1114 100644 --- a/drivers/perf/apple_m1_cpu_pmu.c +++ b/drivers/perf/apple_m1_cpu_pmu.c @@ -524,8 +524,10 @@ static int m1_pmu_set_event_filter(struct hw_perf_event *event, { unsigned long config_base = 0; - if (!attr->exclude_guest) - return -EINVAL; + if (!attr->exclude_guest) { + pr_debug("ARM performance counters do not support mode exclusion\n"); + return -EOPNOTSUPP; + } if (!attr->exclude_kernel) config_base |= M1_PMU_CFG_COUNT_KERNEL; if (!attr->exclude_user) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 379479b50bdd..8458fe2cebb4 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -445,7 +445,7 @@ __hw_perf_event_init(struct perf_event *event) { struct arm_pmu *armpmu = to_arm_pmu(event->pmu); struct hw_perf_event *hwc = &event->hw; - int mapping; + int mapping, ret; hwc->flags = 0; mapping = armpmu->map_event(event); @@ -470,11 +470,10 @@ __hw_perf_event_init(struct perf_event *event) /* * Check whether we need to exclude the counter from certain modes. */ - if (armpmu->set_event_filter && - armpmu->set_event_filter(hwc, &event->attr)) { - pr_debug("ARM performance counters do not support " - "mode exclusion\n"); - return -EOPNOTSUPP; + if (armpmu->set_event_filter) { + ret = armpmu->set_event_filter(hwc, &event->attr); + if (ret) + return ret; } /* diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 8a573db81da1..2ba215f74d92 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -936,8 +936,10 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, { unsigned long config_base = 0; - if (attr->exclude_idle) - return -EPERM; + if (attr->exclude_idle) { + pr_debug("ARM performance counters do not support mode exclusion\n"); + return -EOPNOTSUPP; + } /* * If we're running in hyp mode, then we *are* the hypervisor. From 816c26754447e8b28d6c604e1f5b1d205b2586ee Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:22 +0000 Subject: [PATCH 55/87] arm64: perf: Add support for event counting threshold FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on events whose count meets a specified threshold condition. For example if PMEVTYPERn.TC (Threshold Control) is set to 0b101 (Greater than or equal, count), and the threshold is set to 2, then the PMU counter will now only increment by 1 when an event would have previously incremented the PMU counter by 2 or more on a single processor cycle. Three new Perf event config fields, 'threshold', 'threshold_compare' and 'threshold_count' have been added to control the feature. threshold_compare maps to the upper two bits of PMEVTYPERn.TC and threshold_count maps to the first bit of TC. These separate attributes have been picked rather than enumerating all the possible combinations of the TC field as in the Arm ARM. The attributes would be used on a Perf command line like this: $ perf stat -e stall_slot/threshold=2,threshold_compare=2/ A new capability for reading out the maximum supported threshold value has also been added: $ cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max 0x000000ff If a threshold higher than threshold_max is provided, then an error is generated. If FEAT_PMUv3_TH isn't implemented or a 32 bit kernel is running, then threshold_max reads zero, and attempting to set a threshold value will also result in an error. The threshold is per PMU counter, and there are potentially different threshold_max values per PMU type on heterogeneous systems. Bits higher than 32 now need to be written into PMEVTYPER, so armv8pmu_write_evtype() has to be updated to take an unsigned long value rather than u32 which gives the correct behavior on both aarch32 and 64. Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-11-james.clark@arm.com Signed-off-by: Will Deacon --- drivers/perf/arm_pmuv3.c | 79 +++++++++++++++++++++++++++++++++- include/linux/perf/arm_pmuv3.h | 1 + 2 files changed, 79 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 2ba215f74d92..23fa6c5da82c 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -309,10 +309,22 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = { #define ATTR_CFG_FLD_rdpmc_CFG config1 #define ATTR_CFG_FLD_rdpmc_LO 1 #define ATTR_CFG_FLD_rdpmc_HI 1 +#define ATTR_CFG_FLD_threshold_count_CFG config1 /* PMEVTYPER.TC[0] */ +#define ATTR_CFG_FLD_threshold_count_LO 2 +#define ATTR_CFG_FLD_threshold_count_HI 2 +#define ATTR_CFG_FLD_threshold_compare_CFG config1 /* PMEVTYPER.TC[2:1] */ +#define ATTR_CFG_FLD_threshold_compare_LO 3 +#define ATTR_CFG_FLD_threshold_compare_HI 4 +#define ATTR_CFG_FLD_threshold_CFG config1 /* PMEVTYPER.TH */ +#define ATTR_CFG_FLD_threshold_LO 5 +#define ATTR_CFG_FLD_threshold_HI 16 GEN_PMU_FORMAT_ATTR(event); GEN_PMU_FORMAT_ATTR(long); GEN_PMU_FORMAT_ATTR(rdpmc); +GEN_PMU_FORMAT_ATTR(threshold_count); +GEN_PMU_FORMAT_ATTR(threshold_compare); +GEN_PMU_FORMAT_ATTR(threshold); static int sysctl_perf_user_access __read_mostly; @@ -326,10 +338,27 @@ static bool armv8pmu_event_want_user_access(struct perf_event *event) return ATTR_CFG_GET_FLD(&event->attr, rdpmc); } +static u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr) +{ + u8 th_compare = ATTR_CFG_GET_FLD(attr, threshold_compare); + u8 th_count = ATTR_CFG_GET_FLD(attr, threshold_count); + + /* + * The count bit is always the bottom bit of the full control field, and + * the comparison is the upper two bits, but it's not explicitly + * labelled in the Arm ARM. For the Perf interface we split it into two + * fields, so reconstruct it here. + */ + return (th_compare << 1) | th_count; +} + static struct attribute *armv8_pmuv3_format_attrs[] = { &format_attr_event.attr, &format_attr_long.attr, &format_attr_rdpmc.attr, + &format_attr_threshold.attr, + &format_attr_threshold_compare.attr, + &format_attr_threshold_count.attr, NULL, }; @@ -379,10 +408,38 @@ static ssize_t bus_width_show(struct device *dev, struct device_attribute *attr, static DEVICE_ATTR_RO(bus_width); +static u32 threshold_max(struct arm_pmu *cpu_pmu) +{ + /* + * PMMIR.THWIDTH is readable and non-zero on aarch32, but it would be + * impossible to write the threshold in the upper 32 bits of PMEVTYPER. + */ + if (IS_ENABLED(CONFIG_ARM)) + return 0; + + /* + * The largest value that can be written to PMEVTYPER_EL0.TH is + * (2 ^ PMMIR.THWIDTH) - 1. + */ + return (1 << FIELD_GET(ARMV8_PMU_THWIDTH, cpu_pmu->reg_pmmir)) - 1; +} + +static ssize_t threshold_max_show(struct device *dev, + struct device_attribute *attr, char *page) +{ + struct pmu *pmu = dev_get_drvdata(dev); + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); + + return sysfs_emit(page, "0x%08x\n", threshold_max(cpu_pmu)); +} + +static DEVICE_ATTR_RO(threshold_max); + static struct attribute *armv8_pmuv3_caps_attrs[] = { &dev_attr_slots.attr, &dev_attr_bus_slots.attr, &dev_attr_bus_width.attr, + &dev_attr_threshold_max.attr, NULL, }; @@ -566,7 +623,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value) armv8pmu_write_hw_counter(event, value); } -static void armv8pmu_write_evtype(int idx, u32 val) +static void armv8pmu_write_evtype(int idx, unsigned long val) { u32 counter = ARMV8_IDX_TO_COUNTER(idx); unsigned long mask = ARMV8_PMU_EVTYPE_EVENT | @@ -935,6 +992,10 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, struct perf_event_attr *attr) { unsigned long config_base = 0; + struct perf_event *perf_event = container_of(attr, struct perf_event, + attr); + struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu); + u32 th; if (attr->exclude_idle) { pr_debug("ARM performance counters do not support mode exclusion\n"); @@ -968,6 +1029,22 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, if (attr->exclude_user) config_base |= ARMV8_PMU_EXCLUDE_EL0; + /* + * If FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will + * be 0 and will also trigger this check, preventing it from being used. + */ + th = ATTR_CFG_GET_FLD(attr, threshold); + if (th > threshold_max(cpu_pmu)) { + pr_debug("PMU event threshold exceeds max value\n"); + return -EINVAL; + } + + if (IS_ENABLED(CONFIG_ARM64) && th) { + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th); + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC, + armv8pmu_event_threshold_control(attr)); + } + /* * Install the filter into config_base as this is used to * construct the event type. diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index 91957b3468e9..0f4d62ef3a9a 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -262,6 +262,7 @@ #define ARMV8_PMU_SLOTS GENMASK(7, 0) #define ARMV8_PMU_BUS_SLOTS GENMASK(15, 8) #define ARMV8_PMU_BUS_WIDTH GENMASK(19, 16) +#define ARMV8_PMU_THWIDTH GENMASK(23, 20) /* * This code is really good From bd690638e2c27dcea1d56376aa4bf3995d82ccfc Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 11 Dec 2023 16:13:23 +0000 Subject: [PATCH 56/87] Documentation: arm64: Document the PMU event counting threshold feature Add documentation for the new Perf event open parameters and the threshold_max capability file. Reviewed-by: Anshuman Khandual Reviewed-by: Suzuki K Poulose Acked-by: Namhyung Kim Signed-off-by: James Clark Link: https://lore.kernel.org/r/20231211161331.1277825-12-james.clark@arm.com Signed-off-by: Will Deacon --- Documentation/arch/arm64/perf.rst | 72 +++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) diff --git a/Documentation/arch/arm64/perf.rst b/Documentation/arch/arm64/perf.rst index 1f87b57c2332..997fd716b82f 100644 --- a/Documentation/arch/arm64/perf.rst +++ b/Documentation/arch/arm64/perf.rst @@ -164,3 +164,75 @@ and should be used to mask the upper bits as needed. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c .. _tools/lib/perf/tests/test-evsel.c: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c + +Event Counting Threshold +========================================== + +Overview +-------- + +FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on +events whose count meets a specified threshold condition. For example if +threshold_compare is set to 2 ('Greater than or equal'), and the +threshold is set to 2, then the PMU counter will now only increment by +when an event would have previously incremented the PMU counter by 2 or +more on a single processor cycle. + +To increment by 1 after passing the threshold condition instead of the +number of events on that cycle, add the 'threshold_count' option to the +commandline. + +How-to +------ + +These are the parameters for controlling the feature: + +.. list-table:: + :header-rows: 1 + + * - Parameter + - Description + * - threshold + - Value to threshold the event by. A value of 0 means that + thresholding is disabled and the other parameters have no effect. + * - threshold_compare + - | Comparison function to use, with the following values supported: + | + | 0: Not-equal + | 1: Equals + | 2: Greater-than-or-equal + | 3: Less-than + * - threshold_count + - If this is set, count by 1 after passing the threshold condition + instead of the value of the event on this cycle. + +The threshold, threshold_compare and threshold_count values can be +provided per event, for example: + +.. code-block:: sh + + perf stat -e stall_slot/threshold=2,threshold_compare=2/ \ + -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/ + +In this example the stall_slot event will count by 2 or more on every +cycle where 2 or more stalls happen. And dtlb_walk will count by 1 on +every cycle where the number of dtlb walks were less than 10. + +The maximum supported threshold value can be read from the caps of each +PMU, for example: + +.. code-block:: sh + + cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max + + 0x000000ff + +If a value higher than this is given, then opening the event will result +in an error. The highest possible maximum is 4095, as the config field +for threshold is limited to 12 bits, and the Perf tool will refuse to +parse higher values. + +If the PMU doesn't support FEAT_PMUv3_TH, then threshold_max will read +0, and attempting to set a threshold value will also result in an error. +threshold_max will also read as 0 on aarch32 guests, even if the host +is running on hardware with the feature. From 3dfdc2750c6cdc6a5ebf5effb07f92db761de35d Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:15:57 +0100 Subject: [PATCH 57/87] arm64: kernel: Disable latent_entropy GCC plugin in early C runtime In subsequent patches, mark portions of the early C code will be marked as __init. Unfortunarely, __init implies __latent_entropy, and this would result in the early C code being instrumented in an unsafe manner. Disable the latent entropy plugin for the early C code. Acked-by: Mark Rutland Signed-off-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20231129111555.3594833-44-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/kernel/pi/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile index 4c0ea3cd4ea4..c844a0546d7f 100644 --- a/arch/arm64/kernel/pi/Makefile +++ b/arch/arm64/kernel/pi/Makefile @@ -3,6 +3,7 @@ KBUILD_CFLAGS := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) -fpie \ -Os -DDISABLE_BRANCH_PROFILING $(DISABLE_STACKLEAK_PLUGIN) \ + $(DISABLE_LATENT_ENTROPY_PLUGIN) \ $(call cc-option,-mbranch-protection=none) \ -I$(srctree)/scripts/dtc/libfdt -fno-stack-protector \ -include $(srctree)/include/linux/hidden.h \ From a22fc8e102dc475e91dc13e6e1e395f4d95ae684 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:15:58 +0100 Subject: [PATCH 58/87] arm64: mm: Take potential load offset into account when KASLR is off We enable CONFIG_RELOCATABLE even when CONFIG_RANDOMIZE_BASE is disabled, and this permits the loader (i.e., EFI) to place the kernel anywhere in physical memory as long as the base address is 64k aligned. This means that the 'KASLR' case described in the header that defines the size of the statically allocated page tables could take effect even when CONFIG_RANDMIZE_BASE=n. So check for CONFIG_RELOCATABLE instead. Signed-off-by: Ard Biesheuvel Reviewed-by: Anshuman Khandual Link: https://lore.kernel.org/r/20231129111555.3594833-45-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/kernel-pgtable.h | 27 ++++++------------------- 1 file changed, 6 insertions(+), 21 deletions(-) diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h index 85d26143faa5..83ddb14b95a5 100644 --- a/arch/arm64/include/asm/kernel-pgtable.h +++ b/arch/arm64/include/asm/kernel-pgtable.h @@ -37,27 +37,12 @@ /* - * If KASLR is enabled, then an offset K is added to the kernel address - * space. The bottom 21 bits of this offset are zero to guarantee 2MB - * alignment for PA and VA. - * - * For each pagetable level of the swapper, we know that the shift will - * be larger than 21 (for the 4KB granule case we use section maps thus - * the smallest shift is actually 30) thus there is the possibility that - * KASLR can increase the number of pagetable entries by 1, so we make - * room for this extra entry. - * - * Note KASLR cannot increase the number of required entries for a level - * by more than one because it increments both the virtual start and end - * addresses equally (the extra entry comes from the case where the end - * address is just pushed over a boundary and the start address isn't). + * A relocatable kernel may execute from an address that differs from the one at + * which it was linked. In the worst case, its runtime placement may intersect + * with two adjacent PGDIR entries, which means that an additional page table + * may be needed at each subordinate level. */ - -#ifdef CONFIG_RANDOMIZE_BASE -#define EARLY_KASLR (1) -#else -#define EARLY_KASLR (0) -#endif +#define EXTRA_PAGE __is_defined(CONFIG_RELOCATABLE) #define SPAN_NR_ENTRIES(vstart, vend, shift) \ ((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1) @@ -83,7 +68,7 @@ + EARLY_PGDS((vstart), (vend), add) /* each PGDIR needs a next level page table */ \ + EARLY_PUDS((vstart), (vend), add) /* each PUD needs a next level page table */ \ + EARLY_PMDS((vstart), (vend), add)) /* each PMD needs a next level page table */ -#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR)) +#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE)) /* the initial ID map may need two extra pages if it needs to be extended */ #if VA_BITS < 48 From 376f5a3bd7e21337dc41f7abb56c2c74ac63038a Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:15:59 +0100 Subject: [PATCH 59/87] arm64: mm: get rid of kimage_vaddr global variable We store the address of _text in kimage_vaddr, but since commit 09e3c22a86f6889d ("arm64: Use a variable to store non-global mappings decision"), we no longer reference this variable from modules so we no longer need to export it. In fact, we don't need it at all so let's just get rid of it. Acked-by: Mark Rutland Signed-off-by: Ard Biesheuvel Reviewed-by: Anshuman Khandual Link: https://lore.kernel.org/r/20231129111555.3594833-46-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/memory.h | 6 ++---- arch/arm64/kernel/head.S | 2 +- arch/arm64/mm/mmu.c | 3 --- 3 files changed, 3 insertions(+), 8 deletions(-) diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index fde4186cc387..b8d726f951ae 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -182,6 +182,7 @@ #include #include #include +#include #if VA_BITS > 48 extern u64 vabits_actual; @@ -193,15 +194,12 @@ extern s64 memstart_addr; /* PHYS_OFFSET - the physical address of the start of memory. */ #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; }) -/* the virtual base of the kernel image */ -extern u64 kimage_vaddr; - /* the offset between the kernel virtual and physical mappings */ extern u64 kimage_voffset; static inline unsigned long kaslr_offset(void) { - return kimage_vaddr - KIMAGE_VADDR; + return (u64)&_text - KIMAGE_VADDR; } #ifdef CONFIG_RANDOMIZE_BASE diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 7b236994f0e1..cab7f91949d8 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -482,7 +482,7 @@ SYM_FUNC_START_LOCAL(__primary_switched) str_l x21, __fdt_pointer, x5 // Save FDT pointer - ldr_l x4, kimage_vaddr // Save the offset between + adrp x4, _text // Save the offset between sub x4, x4, x0 // the kernel virtual and str_l x4, kimage_voffset, x5 // physical mappings diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 15f6347d23b6..03c73e9197ac 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -52,9 +52,6 @@ u64 vabits_actual __ro_after_init = VA_BITS_MIN; EXPORT_SYMBOL(vabits_actual); #endif -u64 kimage_vaddr __ro_after_init = (u64)&_text; -EXPORT_SYMBOL(kimage_vaddr); - u64 kimage_voffset __ro_after_init; EXPORT_SYMBOL(kimage_voffset); From cbc59c9a4e5785796ccac9a975a94cb52c87feb1 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:16:10 +0100 Subject: [PATCH 60/87] arm64: idreg-override: Omit non-NULL checks for override pointer Now that override pointers are always set, we can drop the various non-NULL checks that we have in the code. Signed-off-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20231129111555.3594833-57-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/kernel/idreg-override.c | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 3addc09f8746..536bc33859bc 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -216,9 +216,6 @@ static void __init match_options(const char *cmdline) for (i = 0; i < ARRAY_SIZE(regs); i++) { int f; - if (!regs[i]->override) - continue; - for (f = 0; strlen(regs[i]->fields[f].name); f++) { u64 shift = regs[i]->fields[f].shift; u64 width = regs[i]->fields[f].width ?: 4; @@ -319,10 +316,8 @@ asmlinkage void __init init_feature_override(u64 boot_status) int i; for (i = 0; i < ARRAY_SIZE(regs); i++) { - if (regs[i]->override) { - regs[i]->override->val = 0; - regs[i]->override->mask = 0; - } + regs[i]->override->val = 0; + regs[i]->override->mask = 0; } __boot_status = boot_status; @@ -330,9 +325,8 @@ asmlinkage void __init init_feature_override(u64 boot_status) parse_cmdline(); for (i = 0; i < ARRAY_SIZE(regs); i++) { - if (regs[i]->override) - dcache_clean_inval_poc((unsigned long)regs[i]->override, - (unsigned long)regs[i]->override + - sizeof(*regs[i]->override)); + dcache_clean_inval_poc((unsigned long)regs[i]->override, + (unsigned long)regs[i]->override + + sizeof(*regs[i]->override)); } } From 01fd29092a35833ef87bd13c0a025e726550d646 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:16:11 +0100 Subject: [PATCH 61/87] arm64: idreg-override: Prepare for place relative reloc patching The ID reg override handling code uses a rather elaborate data structure that relies on statically initialized absolute address values in pointer fields. This means that this code cannot run until relocation fixups have been applied, and this is unfortunate, because it means we cannot discover overrides for KASLR or LVA/LPA without creating the kernel mapping and performing the relocations first. This can be solved by switching to place-relative relocations, which can be applied by the linker at build time. This means some additional arithmetic is required when dereferencing these pointers, as we can no longer dereference the pointer members directly. So let's implement this for idreg-override.c in a preliminary way, i.e., convert all the references in code to use a special accessor that produces the correct absolute value at runtime. Signed-off-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20231129111555.3594833-58-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/kernel/idreg-override.c | 89 +++++++++++++++++++----------- 1 file changed, 56 insertions(+), 33 deletions(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 536bc33859bc..ca1b8d2dbe99 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -21,14 +21,25 @@ static u64 __boot_status __initdata; +// temporary __prel64 related definitions +// to be removed when this code is moved under pi/ + +#define __prel64_initconst __initconst + +#define PREL64(type, name) union { type *name; } + +#define prel64_pointer(__d) (__d) + +typedef bool filter_t(u64 val); + struct ftr_set_desc { char name[FTR_DESC_NAME_LEN]; - struct arm64_ftr_override *override; + PREL64(struct arm64_ftr_override, override); struct { char name[FTR_DESC_FIELD_LEN]; u8 shift; u8 width; - bool (*filter)(u64 val); + PREL64(filter_t, filter); } fields[]; }; @@ -46,7 +57,7 @@ static bool __init mmfr1_vh_filter(u64 val) val == 0); } -static const struct ftr_set_desc mmfr1 __initconst = { +static const struct ftr_set_desc mmfr1 __prel64_initconst = { .name = "id_aa64mmfr1", .override = &id_aa64mmfr1_override, .fields = { @@ -70,7 +81,7 @@ static bool __init pfr0_sve_filter(u64 val) return true; } -static const struct ftr_set_desc pfr0 __initconst = { +static const struct ftr_set_desc pfr0 __prel64_initconst = { .name = "id_aa64pfr0", .override = &id_aa64pfr0_override, .fields = { @@ -94,7 +105,7 @@ static bool __init pfr1_sme_filter(u64 val) return true; } -static const struct ftr_set_desc pfr1 __initconst = { +static const struct ftr_set_desc pfr1 __prel64_initconst = { .name = "id_aa64pfr1", .override = &id_aa64pfr1_override, .fields = { @@ -105,7 +116,7 @@ static const struct ftr_set_desc pfr1 __initconst = { }, }; -static const struct ftr_set_desc isar1 __initconst = { +static const struct ftr_set_desc isar1 __prel64_initconst = { .name = "id_aa64isar1", .override = &id_aa64isar1_override, .fields = { @@ -117,7 +128,7 @@ static const struct ftr_set_desc isar1 __initconst = { }, }; -static const struct ftr_set_desc isar2 __initconst = { +static const struct ftr_set_desc isar2 __prel64_initconst = { .name = "id_aa64isar2", .override = &id_aa64isar2_override, .fields = { @@ -128,7 +139,7 @@ static const struct ftr_set_desc isar2 __initconst = { }, }; -static const struct ftr_set_desc smfr0 __initconst = { +static const struct ftr_set_desc smfr0 __prel64_initconst = { .name = "id_aa64smfr0", .override = &id_aa64smfr0_override, .fields = { @@ -149,7 +160,7 @@ static bool __init hvhe_filter(u64 val) ID_AA64MMFR1_EL1_VH_SHIFT)); } -static const struct ftr_set_desc sw_features __initconst = { +static const struct ftr_set_desc sw_features __prel64_initconst = { .name = "arm64_sw", .override = &arm64_sw_feature_override, .fields = { @@ -159,14 +170,15 @@ static const struct ftr_set_desc sw_features __initconst = { }, }; -static const struct ftr_set_desc * const regs[] __initconst = { - &mmfr1, - &pfr0, - &pfr1, - &isar1, - &isar2, - &smfr0, - &sw_features, +static const +PREL64(const struct ftr_set_desc, reg) regs[] __prel64_initconst = { + { &mmfr1 }, + { &pfr0 }, + { &pfr1 }, + { &isar1 }, + { &isar2 }, + { &smfr0 }, + { &sw_features }, }; static const struct { @@ -214,15 +226,20 @@ static void __init match_options(const char *cmdline) int i; for (i = 0; i < ARRAY_SIZE(regs); i++) { + const struct ftr_set_desc *reg = prel64_pointer(regs[i].reg); + struct arm64_ftr_override *override; int f; - for (f = 0; strlen(regs[i]->fields[f].name); f++) { - u64 shift = regs[i]->fields[f].shift; - u64 width = regs[i]->fields[f].width ?: 4; + override = prel64_pointer(reg->override); + + for (f = 0; strlen(reg->fields[f].name); f++) { + u64 shift = reg->fields[f].shift; + u64 width = reg->fields[f].width ?: 4; u64 mask = GENMASK_ULL(shift + width - 1, shift); + bool (*filter)(u64 val); u64 v; - if (find_field(cmdline, regs[i], f, &v)) + if (find_field(cmdline, reg, f, &v)) continue; /* @@ -230,16 +247,16 @@ static void __init match_options(const char *cmdline) * it by setting the value to the all-ones while * clearing the mask... Yes, this is fragile. */ - if (regs[i]->fields[f].filter && - !regs[i]->fields[f].filter(v)) { - regs[i]->override->val |= mask; - regs[i]->override->mask &= ~mask; + filter = prel64_pointer(reg->fields[f].filter); + if (filter && !filter(v)) { + override->val |= mask; + override->mask &= ~mask; continue; } - regs[i]->override->val &= ~mask; - regs[i]->override->val |= (v << shift) & mask; - regs[i]->override->mask |= mask; + override->val &= ~mask; + override->val |= (v << shift) & mask; + override->mask |= mask; return; } @@ -313,11 +330,16 @@ void init_feature_override(u64 boot_status); asmlinkage void __init init_feature_override(u64 boot_status) { + struct arm64_ftr_override *override; + const struct ftr_set_desc *reg; int i; for (i = 0; i < ARRAY_SIZE(regs); i++) { - regs[i]->override->val = 0; - regs[i]->override->mask = 0; + reg = prel64_pointer(regs[i].reg); + override = prel64_pointer(reg->override); + + override->val = 0; + override->mask = 0; } __boot_status = boot_status; @@ -325,8 +347,9 @@ asmlinkage void __init init_feature_override(u64 boot_status) parse_cmdline(); for (i = 0; i < ARRAY_SIZE(regs); i++) { - dcache_clean_inval_poc((unsigned long)regs[i]->override, - (unsigned long)regs[i]->override + - sizeof(*regs[i]->override)); + reg = prel64_pointer(regs[i].reg); + override = prel64_pointer(reg->override); + dcache_clean_inval_poc((unsigned long)override, + (unsigned long)(override + 1)); } } From dc3f5aae06381b43bc9d0d416bd15ee1682940e9 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:16:12 +0100 Subject: [PATCH 62/87] arm64: idreg-override: Avoid parameq() and parameqn() The only way parameq() and parameqn() deviate from the ordinary string and memory routines is that they ignore the difference between dashes and underscores. Since we copy each command line argument into a buffer before passing it to parameq() and parameqn() numerous times, let's just convert all dashes to underscores just once, and update the alias array accordingly. This also helps reduce the dependency on kernel APIs that are no longer available once we move this code into the early mini C runtime. Signed-off-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20231129111555.3594833-59-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/kernel/idreg-override.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index ca1b8d2dbe99..1eca93446345 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -185,8 +185,8 @@ static const struct { char alias[FTR_ALIAS_NAME_LEN]; char feature[FTR_ALIAS_OPTION_LEN]; } aliases[] __initconst = { - { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, - { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, + { "kvm_arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, + { "kvm_arm.mode=protected", "id_aa64mmfr1.vh=0" }, { "arm64.nosve", "id_aa64pfr0.sve=0" }, { "arm64.nosme", "id_aa64pfr1.sme=0" }, { "arm64.nobti", "id_aa64pfr1.bt=0" }, @@ -215,7 +215,7 @@ static int __init find_field(const char *cmdline, len = snprintf(opt, ARRAY_SIZE(opt), "%s.%s=", reg->name, reg->fields[f].name); - if (!parameqn(cmdline, opt, len)) + if (memcmp(cmdline, opt, len)) return -1; return kstrtou64(cmdline + len, 0, v); @@ -272,23 +272,29 @@ static __init void __parse_cmdline(const char *cmdline, bool parse_aliases) cmdline = skip_spaces(cmdline); - for (len = 0; cmdline[len] && !isspace(cmdline[len]); len++); + /* terminate on "--" appearing on the command line by itself */ + if (cmdline[0] == '-' && cmdline[1] == '-' && isspace(cmdline[2])) + return; + + for (len = 0; cmdline[len] && !isspace(cmdline[len]); len++) { + if (len >= sizeof(buf) - 1) + break; + if (cmdline[len] == '-') + buf[len] = '_'; + else + buf[len] = cmdline[len]; + } if (!len) return; - len = min(len, ARRAY_SIZE(buf) - 1); - memcpy(buf, cmdline, len); - buf[len] = '\0'; - - if (strcmp(buf, "--") == 0) - return; + buf[len] = 0; cmdline += len; match_options(buf); for (i = 0; parse_aliases && i < ARRAY_SIZE(aliases); i++) - if (parameq(buf, aliases[i].alias)) + if (!memcmp(buf, aliases[i].alias, len + 1)) __parse_cmdline(aliases[i].feature, false); } while (1); } From bcf1eed3f8a0b83824b41da82d226cf36320be76 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:16:13 +0100 Subject: [PATCH 63/87] arm64: idreg-override: avoid strlen() to check for empty strings strlen() is a costly way to decide whether a string is empty, as in that case, the first character will be NUL so we can check for that directly. Signed-off-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20231129111555.3594833-60-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/kernel/idreg-override.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 1eca93446345..8b22ca523186 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -232,7 +232,7 @@ static void __init match_options(const char *cmdline) override = prel64_pointer(reg->override); - for (f = 0; strlen(reg->fields[f].name); f++) { + for (f = 0; reg->fields[f].name[0] != '\0'; f++) { u64 shift = reg->fields[f].shift; u64 width = reg->fields[f].width ?: 4; u64 mask = GENMASK_ULL(shift + width - 1, shift); From 060260a6be47ae163b0c71a6d0902d065b68f3d2 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:16:14 +0100 Subject: [PATCH 64/87] arm64: idreg-override: Avoid sprintf() for simple string concatenation Instead of using sprintf() with the "%s.%s=" format, where the first string argument is always the same in the inner loop of match_options(), use simple memcpy() for string concatenation, and move the first copy to the outer loop. This removes the dependency on sprintf(), which will be difficult to fulfil when we move this code into the early mini C runtime. Signed-off-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20231129111555.3594833-61-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/kernel/idreg-override.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 8b22ca523186..cf1df5f6fbc1 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -206,14 +206,15 @@ static int __init parse_nokaslr(char *unused) } early_param("nokaslr", parse_nokaslr); -static int __init find_field(const char *cmdline, +static int __init find_field(const char *cmdline, char *opt, int len, const struct ftr_set_desc *reg, int f, u64 *v) { - char opt[FTR_DESC_NAME_LEN + FTR_DESC_FIELD_LEN + 2]; - int len; + int flen = strlen(reg->fields[f].name); - len = snprintf(opt, ARRAY_SIZE(opt), "%s.%s=", - reg->name, reg->fields[f].name); + // append '=' to obtain '.=' + memcpy(opt + len, reg->fields[f].name, flen); + len += flen; + opt[len++] = '='; if (memcmp(cmdline, opt, len)) return -1; @@ -223,15 +224,21 @@ static int __init find_field(const char *cmdline, static void __init match_options(const char *cmdline) { + char opt[FTR_DESC_NAME_LEN + FTR_DESC_FIELD_LEN + 2]; int i; for (i = 0; i < ARRAY_SIZE(regs); i++) { const struct ftr_set_desc *reg = prel64_pointer(regs[i].reg); struct arm64_ftr_override *override; + int len = strlen(reg->name); int f; override = prel64_pointer(reg->override); + // set opt[] to '.' + memcpy(opt, reg->name, len); + opt[len++] = '.'; + for (f = 0; reg->fields[f].name[0] != '\0'; f++) { u64 shift = reg->fields[f].shift; u64 width = reg->fields[f].width ?: 4; @@ -239,7 +246,7 @@ static void __init match_options(const char *cmdline) bool (*filter)(u64 val); u64 v; - if (find_field(cmdline, reg, f, &v)) + if (find_field(cmdline, opt, len, reg, f, &v)) continue; /* From ea48626f8f0efc697555d17bb5853df92461e280 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:16:15 +0100 Subject: [PATCH 65/87] arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit All ID register value overrides are =0 with the exception of the nokaslr pseudo feature which uses =1. In order to remove the dependency on kstrtou64(), which is part of the core kernel and no longer usable once we move idreg-override into the early mini C runtime, let's just parse a single hex digit (with optional leading 0x) and set the output value accordingly. Signed-off-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20231129111555.3594833-62-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/kernel/idreg-override.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index cf1df5f6fbc1..9646f94094ed 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -206,6 +206,20 @@ static int __init parse_nokaslr(char *unused) } early_param("nokaslr", parse_nokaslr); +static int __init parse_hexdigit(const char *p, u64 *v) +{ + // skip "0x" if it comes next + if (p[0] == '0' && tolower(p[1]) == 'x') + p += 2; + + // check whether the RHS is a single hex digit + if (!isxdigit(p[0]) || (p[1] && !isspace(p[1]))) + return -EINVAL; + + *v = tolower(*p) - (isdigit(*p) ? '0' : 'a' - 10); + return 0; +} + static int __init find_field(const char *cmdline, char *opt, int len, const struct ftr_set_desc *reg, int f, u64 *v) { @@ -219,7 +233,7 @@ static int __init find_field(const char *cmdline, char *opt, int len, if (memcmp(cmdline, opt, len)) return -1; - return kstrtou64(cmdline + len, 0, v); + return parse_hexdigit(cmdline + len, v); } static void __init match_options(const char *cmdline) From 50f176175e96ea5d7cbd8536c0dd774de796ef63 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 29 Nov 2023 12:16:16 +0100 Subject: [PATCH 66/87] arm64/kernel: Move 'nokaslr' parsing out of early idreg code Parsing and ignoring 'nokaslr' can be done from anywhere, except from the code that runs very early and is therefore built with limitations on the kind of relocations it is permitted to use. So move it to a source file that is part of the ordinary kernel build. Signed-off-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20231129111555.3594833-63-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/kernel/idreg-override.c | 7 ------- arch/arm64/kernel/kaslr.c | 7 +++++++ 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 9646f94094ed..e30fd9e32ef3 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -199,13 +199,6 @@ static const struct { { "nokaslr", "arm64_sw.nokaslr=1" }, }; -static int __init parse_nokaslr(char *unused) -{ - /* nokaslr param handling is done by early cpufeature code */ - return 0; -} -early_param("nokaslr", parse_nokaslr); - static int __init parse_hexdigit(const char *p, u64 *v) { // skip "0x" if it comes next diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c index 94a269cd1f07..12c7f3c8ba76 100644 --- a/arch/arm64/kernel/kaslr.c +++ b/arch/arm64/kernel/kaslr.c @@ -36,3 +36,10 @@ void __init kaslr_init(void) pr_info("KASLR enabled\n"); __kaslr_is_enabled = true; } + +static int __init parse_nokaslr(char *unused) +{ + /* nokaslr param handling is done by early cpufeature code */ + return 0; +} +early_param("nokaslr", parse_nokaslr); From 9b19700e623f96222c69ecb2adecb1a3e3664cc0 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 8 Dec 2023 12:32:20 +0100 Subject: [PATCH 67/87] arm64: fpsimd: Drop unneeded 'busy' flag Kernel mode NEON will preserve the user mode FPSIMD state by saving it into the task struct before clobbering the registers. In order to avoid the need for preserving kernel mode state too, we disallow nested use of kernel mode NEON, i..e, use in softirq context while the interrupted task context was using kernel mode NEON too. Originally, this policy was implemented using a per-CPU flag which was exposed via may_use_simd(), requiring the users of the kernel mode NEON to deal with the possibility that it might return false, and having NEON and non-NEON code paths. This policy was changed by commit 13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs disabled"), and now, softirq processing is disabled entirely instead, and so may_use_simd() can never fail when called from task or softirq context. This means we can drop the fpsimd_context_busy flag entirely, and instead, ensure that we disable softirq processing in places where we formerly relied on the flag for preventing races in the FPSIMD preserve routines. Signed-off-by: Ard Biesheuvel Reviewed-by: Mark Brown Tested-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20231208113218.3001940-7-ardb@google.com [will: Folded in fix from CAMj1kXFhzbJRyWHELCivQW1yJaF=p07LLtbuyXYX3G1WtsdyQg@mail.gmail.com] Signed-off-by: Will Deacon --- arch/arm64/include/asm/simd.h | 11 +------ arch/arm64/kernel/fpsimd.c | 55 +++++++++-------------------------- 2 files changed, 15 insertions(+), 51 deletions(-) diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h index 6a75d7ecdcaa..8e86c9e70e48 100644 --- a/arch/arm64/include/asm/simd.h +++ b/arch/arm64/include/asm/simd.h @@ -12,8 +12,6 @@ #include #include -DECLARE_PER_CPU(bool, fpsimd_context_busy); - #ifdef CONFIG_KERNEL_MODE_NEON /* @@ -28,17 +26,10 @@ static __must_check inline bool may_use_simd(void) /* * We must make sure that the SVE has been initialized properly * before using the SIMD in kernel. - * fpsimd_context_busy is only set while preemption is disabled, - * and is clear whenever preemption is enabled. Since - * this_cpu_read() is atomic w.r.t. preemption, fpsimd_context_busy - * cannot change under our feet -- if it's set we cannot be - * migrated, and if it's clear we cannot be migrated to a CPU - * where it is set. */ return !WARN_ON(!system_capabilities_finalized()) && system_supports_fpsimd() && - !in_hardirq() && !irqs_disabled() && !in_nmi() && - !this_cpu_read(fpsimd_context_busy); + !in_hardirq() && !irqs_disabled() && !in_nmi(); } #else /* ! CONFIG_KERNEL_MODE_NEON */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 1559c706d32d..37ee261b117d 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -85,13 +85,13 @@ * softirq kicks in. Upon vcpu_put(), KVM will save the vcpu FP state and * flag the register state as invalid. * - * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may - * save the task's FPSIMD context back to task_struct from softirq context. - * To prevent this from racing with the manipulation of the task's FPSIMD state - * from task context and thereby corrupting the state, it is necessary to - * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE - * flag with {, __}get_cpu_fpsimd_context(). This will still allow softirqs to - * run but prevent them to use FPSIMD. + * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may be + * called from softirq context, which will save the task's FPSIMD context back + * to task_struct. To prevent this from racing with the manipulation of the + * task's FPSIMD state from task context and thereby corrupting the state, it + * is necessary to protect any manipulation of a task's fpsimd_state or + * TIF_FOREIGN_FPSTATE flag with get_cpu_fpsimd_context(), which will suspend + * softirq servicing entirely until put_cpu_fpsimd_context() is called. * * For a certain task, the sequence may look something like this: * - the task gets scheduled in; if both the task's fpsimd_cpu field @@ -209,27 +209,14 @@ static inline void sme_free(struct task_struct *t) { } #endif -DEFINE_PER_CPU(bool, fpsimd_context_busy); -EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy); - static void fpsimd_bind_task_to_cpu(void); -static void __get_cpu_fpsimd_context(void) -{ - bool busy = __this_cpu_xchg(fpsimd_context_busy, true); - - WARN_ON(busy); -} - /* * Claim ownership of the CPU FPSIMD context for use by the calling context. * * The caller may freely manipulate the FPSIMD context metadata until * put_cpu_fpsimd_context() is called. * - * The double-underscore version must only be called if you know the task - * can't be preempted. - * * On RT kernels local_bh_disable() is not sufficient because it only * serializes soft interrupt related sections via a local lock, but stays * preemptible. Disabling preemption is the right choice here as bottom @@ -242,14 +229,6 @@ static void get_cpu_fpsimd_context(void) local_bh_disable(); else preempt_disable(); - __get_cpu_fpsimd_context(); -} - -static void __put_cpu_fpsimd_context(void) -{ - bool busy = __this_cpu_xchg(fpsimd_context_busy, false); - - WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */ } /* @@ -261,18 +240,12 @@ static void __put_cpu_fpsimd_context(void) */ static void put_cpu_fpsimd_context(void) { - __put_cpu_fpsimd_context(); if (!IS_ENABLED(CONFIG_PREEMPT_RT)) local_bh_enable(); else preempt_enable(); } -static bool have_cpu_fpsimd_context(void) -{ - return !preemptible() && __this_cpu_read(fpsimd_context_busy); -} - unsigned int task_get_vl(const struct task_struct *task, enum vec_type type) { return task->thread.vl[type]; @@ -383,7 +356,7 @@ static void task_fpsimd_load(void) bool restore_ffr; WARN_ON(!system_supports_fpsimd()); - WARN_ON(!have_cpu_fpsimd_context()); + WARN_ON(preemptible()); if (system_supports_sve() || system_supports_sme()) { switch (current->thread.fp_type) { @@ -467,7 +440,7 @@ static void fpsimd_save(void) unsigned int vl; WARN_ON(!system_supports_fpsimd()); - WARN_ON(!have_cpu_fpsimd_context()); + WARN_ON(preemptible()); if (test_thread_flag(TIF_FOREIGN_FPSTATE)) return; @@ -1507,7 +1480,7 @@ void fpsimd_thread_switch(struct task_struct *next) if (!system_supports_fpsimd()) return; - __get_cpu_fpsimd_context(); + WARN_ON_ONCE(!irqs_disabled()); /* Save unsaved fpsimd state, if any: */ fpsimd_save(); @@ -1523,8 +1496,6 @@ void fpsimd_thread_switch(struct task_struct *next) update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE, wrong_task || wrong_cpu); - - __put_cpu_fpsimd_context(); } static void fpsimd_flush_thread_vl(enum vec_type type) @@ -1826,13 +1797,15 @@ static void fpsimd_flush_cpu_state(void) */ void fpsimd_save_and_flush_cpu_state(void) { + unsigned long flags; + if (!system_supports_fpsimd()) return; WARN_ON(preemptible()); - __get_cpu_fpsimd_context(); + local_irq_save(flags); fpsimd_save(); fpsimd_flush_cpu_state(); - __put_cpu_fpsimd_context(); + local_irq_restore(flags); } #ifdef CONFIG_KERNEL_MODE_NEON From aefbab8e77eb16b56e18f24b85a09ebf4dc60e93 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 8 Dec 2023 12:32:21 +0100 Subject: [PATCH 68/87] arm64: fpsimd: Preserve/restore kernel mode NEON at context switch Currently, the FPSIMD register file is not preserved and restored along with the general registers on exception entry/exit or context switch. For this reason, we disable preemption when enabling FPSIMD for kernel mode use in task context, and suspend the processing of softirqs so that there are no concurrent uses in the kernel. (Kernel mode FPSIMD may not be used at all in other contexts). Disabling preemption while doing CPU intensive work on inputs of potentially unbounded size is bad for real-time performance, which is why we try and ensure that SIMD crypto code does not operate on more than ~4k at a time, which is an arbitrary limit and requires assembler code to implement efficiently. We can avoid the need for disabling preemption if we can ensure that any in-kernel users of the NEON will not lose the FPSIMD register state across a context switch. And given that disabling softirqs implicitly disables preemption as well, we will also have to ensure that a softirq that runs code using FPSIMD can safely interrupt an in-kernel user. So introduce a thread_info flag TIF_KERNEL_FPSTATE, and modify the context switch hook for FPSIMD to preserve and restore the kernel mode FPSIMD to/from struct thread_struct when it is set. This avoids any scheduling blackouts due to prolonged use of FPSIMD in kernel mode, without the need for manual yielding. In order to support softirq processing while FPSIMD is being used in kernel task context, use the same flag to decide whether the kernel mode FPSIMD state needs to be preserved and restored before allowing FPSIMD to be used in softirq context. Signed-off-by: Ard Biesheuvel Reviewed-by: Mark Brown Reviewed-by: Mark Rutland Link: https://lore.kernel.org/r/20231208113218.3001940-8-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/processor.h | 2 + arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/kernel/fpsimd.c | 92 ++++++++++++++++++++++------ 3 files changed, 77 insertions(+), 18 deletions(-) diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index e5bc54522e71..ce6eebd6c08b 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -167,6 +167,8 @@ struct thread_struct { unsigned long fault_address; /* fault info */ unsigned long fault_code; /* ESR_EL1 value */ struct debug_info debug; /* debugging */ + + struct user_fpsimd_state kernel_fpsimd_state; #ifdef CONFIG_ARM64_PTR_AUTH struct ptrauth_keys_user keys_user; #ifdef CONFIG_ARM64_PTR_AUTH_KERNEL diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index 553d1bc559c6..e72a3bf9e563 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -80,6 +80,7 @@ void arch_setup_new_exec(void); #define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */ #define TIF_SME 27 /* SME in use */ #define TIF_SME_VL_INHERIT 28 /* Inherit SME vl_onexec across exec */ +#define TIF_KERNEL_FPSTATE 29 /* Task is in a kernel mode FPSIMD section */ #define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 37ee261b117d..e4595a8d731f 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -357,6 +357,7 @@ static void task_fpsimd_load(void) WARN_ON(!system_supports_fpsimd()); WARN_ON(preemptible()); + WARN_ON(test_thread_flag(TIF_KERNEL_FPSTATE)); if (system_supports_sve() || system_supports_sme()) { switch (current->thread.fp_type) { @@ -379,7 +380,7 @@ static void task_fpsimd_load(void) default: /* * This indicates either a bug in - * fpsimd_save() or memory corruption, we + * fpsimd_save_user_state() or memory corruption, we * should always record an explicit format * when we save. We always at least have the * memory allocated for FPSMID registers so @@ -430,7 +431,7 @@ static void task_fpsimd_load(void) * than via current, if we are saving KVM state then it will have * ensured that the type of registers to save is set in last->to_save. */ -static void fpsimd_save(void) +static void fpsimd_save_user_state(void) { struct cpu_fp_state const *last = this_cpu_ptr(&fpsimd_last_state); @@ -861,7 +862,7 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type, if (task == current) { get_cpu_fpsimd_context(); - fpsimd_save(); + fpsimd_save_user_state(); } fpsimd_flush_task_state(task); @@ -1473,6 +1474,16 @@ void do_fpsimd_exc(unsigned long esr, struct pt_regs *regs) current); } +static void fpsimd_load_kernel_state(struct task_struct *task) +{ + fpsimd_load_state(&task->thread.kernel_fpsimd_state); +} + +static void fpsimd_save_kernel_state(struct task_struct *task) +{ + fpsimd_save_state(&task->thread.kernel_fpsimd_state); +} + void fpsimd_thread_switch(struct task_struct *next) { bool wrong_task, wrong_cpu; @@ -1483,19 +1494,28 @@ void fpsimd_thread_switch(struct task_struct *next) WARN_ON_ONCE(!irqs_disabled()); /* Save unsaved fpsimd state, if any: */ - fpsimd_save(); + if (test_thread_flag(TIF_KERNEL_FPSTATE)) + fpsimd_save_kernel_state(current); + else + fpsimd_save_user_state(); - /* - * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's - * state. For kernel threads, FPSIMD registers are never loaded - * and wrong_task and wrong_cpu will always be true. - */ - wrong_task = __this_cpu_read(fpsimd_last_state.st) != - &next->thread.uw.fpsimd_state; - wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id(); + if (test_tsk_thread_flag(next, TIF_KERNEL_FPSTATE)) { + fpsimd_load_kernel_state(next); + set_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE); + } else { + /* + * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's + * state. For kernel threads, FPSIMD registers are never + * loaded with user mode FPSIMD state and so wrong_task and + * wrong_cpu will always be true. + */ + wrong_task = __this_cpu_read(fpsimd_last_state.st) != + &next->thread.uw.fpsimd_state; + wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id(); - update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE, - wrong_task || wrong_cpu); + update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE, + wrong_task || wrong_cpu); + } } static void fpsimd_flush_thread_vl(enum vec_type type) @@ -1585,7 +1605,7 @@ void fpsimd_preserve_current_state(void) return; get_cpu_fpsimd_context(); - fpsimd_save(); + fpsimd_save_user_state(); put_cpu_fpsimd_context(); } @@ -1803,7 +1823,7 @@ void fpsimd_save_and_flush_cpu_state(void) return; WARN_ON(preemptible()); local_irq_save(flags); - fpsimd_save(); + fpsimd_save_user_state(); fpsimd_flush_cpu_state(); local_irq_restore(flags); } @@ -1837,10 +1857,37 @@ void kernel_neon_begin(void) get_cpu_fpsimd_context(); /* Save unsaved fpsimd state, if any: */ - fpsimd_save(); + if (test_thread_flag(TIF_KERNEL_FPSTATE)) { + BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq()); + fpsimd_save_kernel_state(current); + } else { + fpsimd_save_user_state(); + + /* + * Set the thread flag so that the kernel mode FPSIMD state + * will be context switched along with the rest of the task + * state. + * + * On non-PREEMPT_RT, softirqs may interrupt task level kernel + * mode FPSIMD, but the task will not be preemptible so setting + * TIF_KERNEL_FPSTATE for those would be both wrong (as it + * would mark the task context FPSIMD state as requiring a + * context switch) and unnecessary. + * + * On PREEMPT_RT, softirqs are serviced from a separate thread, + * which is scheduled as usual, and this guarantees that these + * softirqs are not interrupting use of the FPSIMD in kernel + * mode in task context. So in this case, setting the flag here + * is always appropriate. + */ + if (IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq()) + set_thread_flag(TIF_KERNEL_FPSTATE); + } /* Invalidate any task state remaining in the fpsimd regs: */ fpsimd_flush_cpu_state(); + + put_cpu_fpsimd_context(); } EXPORT_SYMBOL_GPL(kernel_neon_begin); @@ -1858,7 +1905,16 @@ void kernel_neon_end(void) if (!system_supports_fpsimd()) return; - put_cpu_fpsimd_context(); + /* + * If we are returning from a nested use of kernel mode FPSIMD, restore + * the task context kernel mode FPSIMD state. This can only happen when + * running in softirq context on non-PREEMPT_RT. + */ + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && in_serving_softirq() && + test_thread_flag(TIF_KERNEL_FPSTATE)) + fpsimd_load_kernel_state(current); + else + clear_thread_flag(TIF_KERNEL_FPSTATE); } EXPORT_SYMBOL_GPL(kernel_neon_end); From 2632e25217696712681dd1f3ecc0d71624ea3b23 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 8 Dec 2023 12:32:22 +0100 Subject: [PATCH 69/87] arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD Now that kernel mode FPSIMD state is context switched along with other task state, we can enable the existing logic that keeps track of which task's FPSIMD state the CPU is holding in its registers. If it is the context of the task that we are switching to, we can elide the reload of the FPSIMD state from memory. Note that we also need to check whether the FPSIMD state on this CPU is the most recent: if a task gets migrated away and back again, the state in memory may be more recent than the state in the CPU. So add another CPU id field to task_struct to keep track of this. (We could reuse the existing CPU id field used for user mode context, but that might result in user state to be discarded unnecessarily, given that two distinct CPUs could be holding the most recent user mode state and the most recent kernel mode state) Signed-off-by: Ard Biesheuvel Reviewed-by: Mark Brown Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20231208113218.3001940-9-ardb@google.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/processor.h | 1 + arch/arm64/kernel/fpsimd.c | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+) diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index ce6eebd6c08b..5b0a04810b23 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -169,6 +169,7 @@ struct thread_struct { struct debug_info debug; /* debugging */ struct user_fpsimd_state kernel_fpsimd_state; + unsigned int kernel_fpsimd_cpu; #ifdef CONFIG_ARM64_PTR_AUTH struct ptrauth_keys_user keys_user; #ifdef CONFIG_ARM64_PTR_AUTH_KERNEL diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index e4595a8d731f..e714d0cd5d1e 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1476,12 +1476,30 @@ void do_fpsimd_exc(unsigned long esr, struct pt_regs *regs) static void fpsimd_load_kernel_state(struct task_struct *task) { + struct cpu_fp_state *last = this_cpu_ptr(&fpsimd_last_state); + + /* + * Elide the load if this CPU holds the most recent kernel mode + * FPSIMD context of the current task. + */ + if (last->st == &task->thread.kernel_fpsimd_state && + task->thread.kernel_fpsimd_cpu == smp_processor_id()) + return; + fpsimd_load_state(&task->thread.kernel_fpsimd_state); } static void fpsimd_save_kernel_state(struct task_struct *task) { + struct cpu_fp_state cpu_fp_state = { + .st = &task->thread.kernel_fpsimd_state, + .to_save = FP_STATE_FPSIMD, + }; + fpsimd_save_state(&task->thread.kernel_fpsimd_state); + fpsimd_bind_state_to_cpu(&cpu_fp_state); + + task->thread.kernel_fpsimd_cpu = smp_processor_id(); } void fpsimd_thread_switch(struct task_struct *next) From eb183b2cd0a6549992eca3c4ada0b1bc1d9340f5 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 13 Dec 2023 09:47:52 +0000 Subject: [PATCH 70/87] Revert "perf/arm_dmc620: Remove duplicate format attribute #defines" This reverts commit a5f4ca68f348ac059efd6a3d7ad4040aed1c0818. Pulling in the Arm-specific 'linux/perf/arm_pmu.h' header breaks the allmodconfig build for x86: > In file included from drivers/perf/arm_dmc620_pmu.c:26: > include/linux/perf/arm_pmu.h:15:10: fatal error: asm/cputype.h: No such file or directory > 15 | #include > | ^~~~~~~~~~~~~~~ Just put things back like they were so that the driver can continue to be compile-tested on a variety of architectures. Link: https://lore.kernel.org/r/20231213100931.12d9d85e@canb.auug.org.au Reported-by: Stephen Rothwell Signed-off-by: Will Deacon --- drivers/perf/arm_dmc620_pmu.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c index 9de9dc8db8db..30cea6859574 100644 --- a/drivers/perf/arm_dmc620_pmu.c +++ b/drivers/perf/arm_dmc620_pmu.c @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include @@ -190,6 +189,27 @@ static const struct attribute_group dmc620_pmu_events_attr_group = { #define ATTR_CFG_FLD_clkdiv2_LO 9 #define ATTR_CFG_FLD_clkdiv2_HI 9 +#define __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \ + (lo) == (hi) ? #cfg ":" #lo "\n" : #cfg ":" #lo "-" #hi + +#define _GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \ + __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) + +#define GEN_PMU_FORMAT_ATTR(name) \ + PMU_FORMAT_ATTR(name, \ + _GEN_PMU_FORMAT_ATTR(ATTR_CFG_FLD_##name##_CFG, \ + ATTR_CFG_FLD_##name##_LO, \ + ATTR_CFG_FLD_##name##_HI)) + +#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi) \ + ((((attr)->cfg) >> lo) & GENMASK_ULL(hi - lo, 0)) + +#define ATTR_CFG_GET_FLD(attr, name) \ + _ATTR_CFG_GET_FLD(attr, \ + ATTR_CFG_FLD_##name##_CFG, \ + ATTR_CFG_FLD_##name##_LO, \ + ATTR_CFG_FLD_##name##_HI) + GEN_PMU_FORMAT_ATTR(mask); GEN_PMU_FORMAT_ATTR(match); GEN_PMU_FORMAT_ATTR(invert); From 7b1a09e44dc64f4f5930659b6d14a27183c00705 Mon Sep 17 00:00:00 2001 From: Huang Shijie Date: Wed, 13 Dec 2023 09:20:46 +0800 Subject: [PATCH 71/87] arm64: irq: set the correct node for shadow call stack The init_irq_stacks() has been changed to use the correct node: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?id=75b5e0bf90bf The init_irq_scs() has the same issue with init_irq_stacks(): cpu_to_node() is not initialized yet, it does not work. This patch uses early_cpu_to_node() to set the init_irq_scs() with the correct node. Signed-off-by: Huang Shijie Reviewed-by: Catalin Marinas Link: https://lore.kernel.org/r/20231213012046.12014-1-shijie@os.amperecomputing.com Signed-off-by: Will Deacon --- arch/arm64/kernel/irq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c index 9f253d8efe90..85087e2df564 100644 --- a/arch/arm64/kernel/irq.c +++ b/arch/arm64/kernel/irq.c @@ -48,7 +48,7 @@ static void init_irq_scs(void) for_each_possible_cpu(cpu) per_cpu(irq_shadow_call_stack_ptr, cpu) = - scs_alloc(cpu_to_node(cpu)); + scs_alloc(early_cpu_to_node(cpu)); } #ifdef CONFIG_VMAP_STACK From cae40614cdd61a6601dc87c6e07c06bf642a125b Mon Sep 17 00:00:00 2001 From: Shuai Xue Date: Fri, 8 Dec 2023 10:56:48 +0800 Subject: [PATCH 72/87] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver Alibaba's T-Head Yitan 710 SoC includes Synopsys' DesignWare Core PCIe controller which implements PMU for performance and functional debugging to facilitate system maintenance. Document it to provide guidance on how to use it. Signed-off-by: Shuai Xue Reviewed-by: Baolin Wang Reviewed-by: Jonathan Cameron Reviewed-by: Yicong Yang Tested-by: Ilkka Koskinen Link: https://lore.kernel.org/r/20231208025652.87192-2-xueshuai@linux.alibaba.com Signed-off-by: Will Deacon --- .../admin-guide/perf/dwc_pcie_pmu.rst | 94 +++++++++++++++++++ Documentation/admin-guide/perf/index.rst | 1 + 2 files changed, 95 insertions(+) create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst new file mode 100644 index 000000000000..d47cd229d710 --- /dev/null +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst @@ -0,0 +1,94 @@ +====================================================================== +Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU) +====================================================================== + +DesignWare Cores (DWC) PCIe PMU +=============================== + +The PMU is a PCIe configuration space register block provided by each PCIe Root +Port in a Vendor-Specific Extended Capability named RAS D.E.S (Debug, Error +injection, and Statistics). + +As the name indicates, the RAS DES capability supports system level +debugging, AER error injection, and collection of statistics. To facilitate +collection of statistics, Synopsys DesignWare Cores PCIe controller +provides the following two features: + +- one 64-bit counter for Time Based Analysis (RX/TX data throughput and + time spent in each low-power LTSSM state) and +- one 32-bit counter for Event Counting (error and non-error events for + a specified lane) + +Note: There is no interrupt for counter overflow. + +Time Based Analysis +------------------- + +Using this feature you can obtain information regarding RX/TX data +throughput and time spent in each low-power LTSSM state by the controller. +The PMU measures data in two categories: + +- Group#0: Percentage of time the controller stays in LTSSM states. +- Group#1: Amount of data processed (Units of 16 bytes). + +Lane Event counters +------------------- + +Using this feature you can obtain Error and Non-Error information in +specific lane by the controller. The PMU event is selected by all of: + +- Group i +- Event j within the Group i +- Lane k + +Some of the events only exist for specific configurations. + +DesignWare Cores (DWC) PCIe PMU Driver +======================================= + +This driver adds PMU devices for each PCIe Root Port named based on the BDF of +the Root Port. For example, + + 30:03.0 PCI bridge: Device 1ded:8000 (rev 01) + +the PMU device name for this Root Port is dwc_rootport_3018. + +The DWC PCIe PMU driver registers a perf PMU driver, which provides +description of available events and configuration options in sysfs, see +/sys/bus/event_source/devices/dwc_rootport_{bdf}. + +The "format" directory describes format of the config fields of the +perf_event_attr structure. The "events" directory provides configuration +templates for all documented events. For example, +"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1". + +The "perf list" command shall list the available events from sysfs, e.g.:: + + $# perf list | grep dwc_rootport + <...> + dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event] + <...> + dwc_rootport_3018/rx_memory_read,lane=?/ [Kernel PMU event] + +Time Based Analysis Event Usage +------------------------------- + +Example usage of counting PCIe RX TLP data payload (Units of bytes):: + + $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ + +The average RX/TX bandwidth can be calculated using the following formula: + + PCIe RX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window + PCIe TX Bandwidth = Tx_PCIe_TLP_Data_Payload / Measure_Time_Window + +Lane Event Usage +------------------------------- + +Each lane has the same event set and to avoid generating a list of hundreds +of events, the user need to specify the lane ID explicitly, e.g.:: + + $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/ + +The driver does not support sampling, therefore "perf record" will not +work. Per-task (without "-a") perf sessions are not supported. diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst index a2e6f2c81146..f4a4513c526f 100644 --- a/Documentation/admin-guide/perf/index.rst +++ b/Documentation/admin-guide/perf/index.rst @@ -19,6 +19,7 @@ Performance monitor support arm_dsu_pmu thunderx2-pmu alibaba_pmu + dwc_pcie_pmu nvidia-pmu meson-ddr-pmu cxl From ad6534c626fedd818718d76c36d69c7d8e7b61cc Mon Sep 17 00:00:00 2001 From: Shuai Xue Date: Fri, 8 Dec 2023 10:56:49 +0800 Subject: [PATCH 73/87] PCI: Add Alibaba Vendor ID to linux/pci_ids.h The Alibaba Vendor ID (0x1ded) is now used by Alibaba elasticRDMA ("erdma") and will be shared with the upcoming PCIe PMU ("dwc_pcie_pmu"). Move the Vendor ID to linux/pci_ids.h so that it can shared by several drivers later. Signed-off-by: Shuai Xue Acked-by: Bjorn Helgaas # pci_ids.h Tested-by: Ilkka Koskinen Link: https://lore.kernel.org/r/20231208025652.87192-3-xueshuai@linux.alibaba.com Signed-off-by: Will Deacon --- drivers/infiniband/hw/erdma/erdma_hw.h | 2 -- include/linux/pci_ids.h | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/erdma/erdma_hw.h b/drivers/infiniband/hw/erdma/erdma_hw.h index 9d316fdc6f9a..a155519a862f 100644 --- a/drivers/infiniband/hw/erdma/erdma_hw.h +++ b/drivers/infiniband/hw/erdma/erdma_hw.h @@ -11,8 +11,6 @@ #include /* PCIe device related definition. */ -#define PCI_VENDOR_ID_ALIBABA 0x1ded - #define ERDMA_PCI_WIDTH 64 #define ERDMA_FUNC_BAR 0 #define ERDMA_MISX_BAR 2 diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 275799b5f535..844ffdac8d7d 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2605,6 +2605,8 @@ #define PCI_VENDOR_ID_TEKRAM 0x1de1 #define PCI_DEVICE_ID_TEKRAM_DC290 0xdc29 +#define PCI_VENDOR_ID_ALIBABA 0x1ded + #define PCI_VENDOR_ID_TEHUTI 0x1fc9 #define PCI_DEVICE_ID_TEHUTI_3009 0x3009 #define PCI_DEVICE_ID_TEHUTI_3010 0x3010 From ac16087134b837d42b75bb1c741070b6c142f258 Mon Sep 17 00:00:00 2001 From: Shuai Xue Date: Fri, 8 Dec 2023 10:56:50 +0800 Subject: [PATCH 74/87] PCI: Move pci_clear_and_set_dword() helper to PCI header The clear and set pattern is commonly used for accessing PCI config, move the helper pci_clear_and_set_dword() from aspm.c into PCI header. In addition, rename to pci_clear_and_set_config_dword() to retain the "config" information and match the other accessors. No functional change intended. Signed-off-by: Shuai Xue Acked-by: Bjorn Helgaas Tested-by: Ilkka Koskinen Link: https://lore.kernel.org/r/20231208025652.87192-4-xueshuai@linux.alibaba.com Signed-off-by: Will Deacon --- drivers/pci/access.c | 12 ++++++++ drivers/pci/pcie/aspm.c | 65 +++++++++++++++++++---------------------- include/linux/pci.h | 2 ++ 3 files changed, 44 insertions(+), 35 deletions(-) diff --git a/drivers/pci/access.c b/drivers/pci/access.c index 6554a2e89d36..6449056b57dd 100644 --- a/drivers/pci/access.c +++ b/drivers/pci/access.c @@ -598,3 +598,15 @@ int pci_write_config_dword(const struct pci_dev *dev, int where, return pci_bus_write_config_dword(dev->bus, dev->devfn, where, val); } EXPORT_SYMBOL(pci_write_config_dword); + +void pci_clear_and_set_config_dword(const struct pci_dev *dev, int pos, + u32 clear, u32 set) +{ + u32 val; + + pci_read_config_dword(dev, pos, &val); + val &= ~clear; + val |= set; + pci_write_config_dword(dev, pos, val); +} +EXPORT_SYMBOL(pci_clear_and_set_config_dword); diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index 50b04ae5c394..a720a8e3ced2 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -426,17 +426,6 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint) } } -static void pci_clear_and_set_dword(struct pci_dev *pdev, int pos, - u32 clear, u32 set) -{ - u32 val; - - pci_read_config_dword(pdev, pos, &val); - val &= ~clear; - val |= set; - pci_write_config_dword(pdev, pos, val); -} - /* Calculate L1.2 PM substate timing parameters */ static void aspm_calc_l12_info(struct pcie_link_state *link, u32 parent_l1ss_cap, u32 child_l1ss_cap) @@ -501,10 +490,12 @@ static void aspm_calc_l12_info(struct pcie_link_state *link, cl1_2_enables = cctl1 & PCI_L1SS_CTL1_L1_2_MASK; if (pl1_2_enables || cl1_2_enables) { - pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1, - PCI_L1SS_CTL1_L1_2_MASK, 0); - pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1, - PCI_L1SS_CTL1_L1_2_MASK, 0); + pci_clear_and_set_config_dword(child, + child->l1ss + PCI_L1SS_CTL1, + PCI_L1SS_CTL1_L1_2_MASK, 0); + pci_clear_and_set_config_dword(parent, + parent->l1ss + PCI_L1SS_CTL1, + PCI_L1SS_CTL1_L1_2_MASK, 0); } /* Program T_POWER_ON times in both ports */ @@ -512,22 +503,26 @@ static void aspm_calc_l12_info(struct pcie_link_state *link, pci_write_config_dword(child, child->l1ss + PCI_L1SS_CTL2, ctl2); /* Program Common_Mode_Restore_Time in upstream device */ - pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1, - PCI_L1SS_CTL1_CM_RESTORE_TIME, ctl1); + pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1, + PCI_L1SS_CTL1_CM_RESTORE_TIME, ctl1); /* Program LTR_L1.2_THRESHOLD time in both ports */ - pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1, - PCI_L1SS_CTL1_LTR_L12_TH_VALUE | - PCI_L1SS_CTL1_LTR_L12_TH_SCALE, ctl1); - pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1, - PCI_L1SS_CTL1_LTR_L12_TH_VALUE | - PCI_L1SS_CTL1_LTR_L12_TH_SCALE, ctl1); + pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1, + PCI_L1SS_CTL1_LTR_L12_TH_VALUE | + PCI_L1SS_CTL1_LTR_L12_TH_SCALE, + ctl1); + pci_clear_and_set_config_dword(child, child->l1ss + PCI_L1SS_CTL1, + PCI_L1SS_CTL1_LTR_L12_TH_VALUE | + PCI_L1SS_CTL1_LTR_L12_TH_SCALE, + ctl1); if (pl1_2_enables || cl1_2_enables) { - pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1, 0, - pl1_2_enables); - pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1, 0, - cl1_2_enables); + pci_clear_and_set_config_dword(parent, + parent->l1ss + PCI_L1SS_CTL1, 0, + pl1_2_enables); + pci_clear_and_set_config_dword(child, + child->l1ss + PCI_L1SS_CTL1, 0, + cl1_2_enables); } } @@ -687,10 +682,10 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state) */ /* Disable all L1 substates */ - pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1, - PCI_L1SS_CTL1_L1SS_MASK, 0); - pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1, - PCI_L1SS_CTL1_L1SS_MASK, 0); + pci_clear_and_set_config_dword(child, child->l1ss + PCI_L1SS_CTL1, + PCI_L1SS_CTL1_L1SS_MASK, 0); + pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1, + PCI_L1SS_CTL1_L1SS_MASK, 0); /* * If needed, disable L1, and it gets enabled later * in pcie_config_aspm_link(). @@ -713,10 +708,10 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state) val |= PCI_L1SS_CTL1_PCIPM_L1_2; /* Enable what we need to enable */ - pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1, - PCI_L1SS_CTL1_L1SS_MASK, val); - pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1, - PCI_L1SS_CTL1_L1SS_MASK, val); + pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1, + PCI_L1SS_CTL1_L1SS_MASK, val); + pci_clear_and_set_config_dword(child, child->l1ss + PCI_L1SS_CTL1, + PCI_L1SS_CTL1_L1SS_MASK, val); } static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val) diff --git a/include/linux/pci.h b/include/linux/pci.h index 60ca768bc867..268c4bd98ef3 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1239,6 +1239,8 @@ int pci_read_config_dword(const struct pci_dev *dev, int where, u32 *val); int pci_write_config_byte(const struct pci_dev *dev, int where, u8 val); int pci_write_config_word(const struct pci_dev *dev, int where, u16 val); int pci_write_config_dword(const struct pci_dev *dev, int where, u32 val); +void pci_clear_and_set_config_dword(const struct pci_dev *dev, int pos, + u32 clear, u32 set); int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val); int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val); From af9597adc2f1e3609c67c9792a2469bb64e43ae9 Mon Sep 17 00:00:00 2001 From: Shuai Xue Date: Fri, 8 Dec 2023 10:56:51 +0800 Subject: [PATCH 75/87] drivers/perf: add DesignWare PCIe PMU driver This commit adds the PCIe Performance Monitoring Unit (PMU) driver support for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express Core controller IP which provides statistics feature. The PMU is a PCIe configuration space register block provided by each PCIe Root Port in a Vendor-Specific Extended Capability named RAS D.E.S (Debug, Error injection, and Statistics). To facilitate collection of statistics the controller provides the following two features for each Root Port: - one 64-bit counter for Time Based Analysis (RX/TX data throughput and time spent in each low-power LTSSM state) and - one 32-bit counter for Event Counting (error and non-error events for a specified lane) Note: There is no interrupt for counter overflow. This driver adds PMU devices for each PCIe Root Port. And the PMU device is named based the BDF of Root Port. For example, 30:03.0 PCI bridge: Device 1ded:8000 (rev 01) the PMU device name for this Root Port is dwc_rootport_3018. Example usage of counting PCIe RX TLP data payload (Units of bytes):: $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ average RX bandwidth can be calculated like this: PCIe TX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window Signed-off-by: Shuai Xue Reviewed-by: Baolin Wang Reviewed-by: Jonathan Cameron Reviewed-by: Yicong Yang Reviewed-and-tested-by: Ilkka Koskinen Link: https://lore.kernel.org/r/20231208025652.87192-5-xueshuai@linux.alibaba.com [will: Fix sparse error due to use of uninitialised 'vsec' symbol in dwc_pcie_match_des_cap()] Signed-off-by: Will Deacon --- drivers/perf/Kconfig | 7 + drivers/perf/Makefile | 1 + drivers/perf/dwc_pcie_pmu.c | 792 ++++++++++++++++++++++++++++++++++++ 3 files changed, 800 insertions(+) create mode 100644 drivers/perf/dwc_pcie_pmu.c diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig index 273d67ecf6d2..ec6e0d9194a1 100644 --- a/drivers/perf/Kconfig +++ b/drivers/perf/Kconfig @@ -217,6 +217,13 @@ config MARVELL_CN10K_DDR_PMU Enable perf support for Marvell DDR Performance monitoring event on CN10K platform. +config DWC_PCIE_PMU + tristate "Synopsys DesignWare PCIe PMU" + depends on PCI + help + Enable perf support for Synopsys DesignWare PCIe PMU Performance + monitoring event on platform including the Alibaba Yitian 710. + source "drivers/perf/arm_cspmu/Kconfig" source "drivers/perf/amlogic/Kconfig" diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile index 16b3ec4db916..a06338e3401c 100644 --- a/drivers/perf/Makefile +++ b/drivers/perf/Makefile @@ -23,6 +23,7 @@ obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o obj-$(CONFIG_ALIBABA_UNCORE_DRW_PMU) += alibaba_uncore_drw_pmu.o +obj-$(CONFIG_DWC_PCIE_PMU) += dwc_pcie_pmu.o obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/ obj-$(CONFIG_MESON_DDR_PMU) += amlogic/ obj-$(CONFIG_CXL_PMU) += cxl_pmu.o diff --git a/drivers/perf/dwc_pcie_pmu.c b/drivers/perf/dwc_pcie_pmu.c new file mode 100644 index 000000000000..957058ad0099 --- /dev/null +++ b/drivers/perf/dwc_pcie_pmu.c @@ -0,0 +1,792 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Synopsys DesignWare PCIe PMU driver + * + * Copyright (C) 2021-2023 Alibaba Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define DWC_PCIE_VSEC_RAS_DES_ID 0x02 +#define DWC_PCIE_EVENT_CNT_CTL 0x8 + +/* + * Event Counter Data Select includes two parts: + * - 27-24: Group number(4-bit: 0..0x7) + * - 23-16: Event number(8-bit: 0..0x13) within the Group + * + * Put them together as in TRM. + */ +#define DWC_PCIE_CNT_EVENT_SEL GENMASK(27, 16) +#define DWC_PCIE_CNT_LANE_SEL GENMASK(11, 8) +#define DWC_PCIE_CNT_STATUS BIT(7) +#define DWC_PCIE_CNT_ENABLE GENMASK(4, 2) +#define DWC_PCIE_PER_EVENT_OFF 0x1 +#define DWC_PCIE_PER_EVENT_ON 0x3 +#define DWC_PCIE_EVENT_CLEAR GENMASK(1, 0) +#define DWC_PCIE_EVENT_PER_CLEAR 0x1 + +#define DWC_PCIE_EVENT_CNT_DATA 0xC + +#define DWC_PCIE_TIME_BASED_ANAL_CTL 0x10 +#define DWC_PCIE_TIME_BASED_REPORT_SEL GENMASK(31, 24) +#define DWC_PCIE_TIME_BASED_DURATION_SEL GENMASK(15, 8) +#define DWC_PCIE_DURATION_MANUAL_CTL 0x0 +#define DWC_PCIE_DURATION_1MS 0x1 +#define DWC_PCIE_DURATION_10MS 0x2 +#define DWC_PCIE_DURATION_100MS 0x3 +#define DWC_PCIE_DURATION_1S 0x4 +#define DWC_PCIE_DURATION_2S 0x5 +#define DWC_PCIE_DURATION_4S 0x6 +#define DWC_PCIE_DURATION_4US 0xFF +#define DWC_PCIE_TIME_BASED_TIMER_START BIT(0) +#define DWC_PCIE_TIME_BASED_CNT_ENABLE 0x1 + +#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW 0x14 +#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH 0x18 + +/* Event attributes */ +#define DWC_PCIE_CONFIG_EVENTID GENMASK(15, 0) +#define DWC_PCIE_CONFIG_TYPE GENMASK(19, 16) +#define DWC_PCIE_CONFIG_LANE GENMASK(27, 20) + +#define DWC_PCIE_EVENT_ID(event) FIELD_GET(DWC_PCIE_CONFIG_EVENTID, (event)->attr.config) +#define DWC_PCIE_EVENT_TYPE(event) FIELD_GET(DWC_PCIE_CONFIG_TYPE, (event)->attr.config) +#define DWC_PCIE_EVENT_LANE(event) FIELD_GET(DWC_PCIE_CONFIG_LANE, (event)->attr.config) + +enum dwc_pcie_event_type { + DWC_PCIE_TIME_BASE_EVENT, + DWC_PCIE_LANE_EVENT, + DWC_PCIE_EVENT_TYPE_MAX, +}; + +#define DWC_PCIE_LANE_EVENT_MAX_PERIOD GENMASK_ULL(31, 0) +#define DWC_PCIE_MAX_PERIOD GENMASK_ULL(63, 0) + +struct dwc_pcie_pmu { + struct pmu pmu; + struct pci_dev *pdev; /* Root Port device */ + u16 ras_des_offset; + u32 nr_lanes; + + struct list_head pmu_node; + struct hlist_node cpuhp_node; + struct perf_event *event[DWC_PCIE_EVENT_TYPE_MAX]; + int on_cpu; +}; + +#define to_dwc_pcie_pmu(p) (container_of(p, struct dwc_pcie_pmu, pmu)) + +static int dwc_pcie_pmu_hp_state; +static struct list_head dwc_pcie_dev_info_head = + LIST_HEAD_INIT(dwc_pcie_dev_info_head); +static bool notify; + +struct dwc_pcie_dev_info { + struct platform_device *plat_dev; + struct pci_dev *pdev; + struct list_head dev_node; +}; + +struct dwc_pcie_vendor_id { + int vendor_id; +}; + +static const struct dwc_pcie_vendor_id dwc_pcie_vendor_ids[] = { + {.vendor_id = PCI_VENDOR_ID_ALIBABA }, + {} /* terminator */ +}; + +static ssize_t cpumask_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(dev_get_drvdata(dev)); + + return cpumap_print_to_pagebuf(true, buf, cpumask_of(pcie_pmu->on_cpu)); +} +static DEVICE_ATTR_RO(cpumask); + +static struct attribute *dwc_pcie_pmu_cpumask_attrs[] = { + &dev_attr_cpumask.attr, + NULL +}; + +static struct attribute_group dwc_pcie_cpumask_attr_group = { + .attrs = dwc_pcie_pmu_cpumask_attrs, +}; + +struct dwc_pcie_format_attr { + struct device_attribute attr; + u64 field; + int config; +}; + +PMU_FORMAT_ATTR(eventid, "config:0-15"); +PMU_FORMAT_ATTR(type, "config:16-19"); +PMU_FORMAT_ATTR(lane, "config:20-27"); + +static struct attribute *dwc_pcie_format_attrs[] = { + &format_attr_type.attr, + &format_attr_eventid.attr, + &format_attr_lane.attr, + NULL, +}; + +static struct attribute_group dwc_pcie_format_attrs_group = { + .name = "format", + .attrs = dwc_pcie_format_attrs, +}; + +struct dwc_pcie_event_attr { + struct device_attribute attr; + enum dwc_pcie_event_type type; + u16 eventid; + u8 lane; +}; + +static ssize_t dwc_pcie_event_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dwc_pcie_event_attr *eattr; + + eattr = container_of(attr, typeof(*eattr), attr); + + if (eattr->type == DWC_PCIE_LANE_EVENT) + return sysfs_emit(buf, "eventid=0x%x,type=0x%x,lane=?\n", + eattr->eventid, eattr->type); + else if (eattr->type == DWC_PCIE_TIME_BASE_EVENT) + return sysfs_emit(buf, "eventid=0x%x,type=0x%x\n", + eattr->eventid, eattr->type); + + return 0; +} + +#define DWC_PCIE_EVENT_ATTR(_name, _type, _eventid, _lane) \ + (&((struct dwc_pcie_event_attr[]) {{ \ + .attr = __ATTR(_name, 0444, dwc_pcie_event_show, NULL), \ + .type = _type, \ + .eventid = _eventid, \ + .lane = _lane, \ + }})[0].attr.attr) + +#define DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(_name, _eventid) \ + DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_TIME_BASE_EVENT, _eventid, 0) +#define DWC_PCIE_PMU_LANE_EVENT_ATTR(_name, _eventid) \ + DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_LANE_EVENT, _eventid, 0) + +static struct attribute *dwc_pcie_pmu_time_event_attrs[] = { + /* Group #0 */ + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(one_cycle, 0x00), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_L0S, 0x01), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(RX_L0S, 0x02), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L0, 0x03), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1, 0x04), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_1, 0x05), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_2, 0x06), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(CFG_RCVRY, 0x07), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_RX_L0S, 0x08), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_AUX, 0x09), + + /* Group #1 */ + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_PCIe_TLP_Data_Payload, 0x20), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_PCIe_TLP_Data_Payload, 0x21), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_CCIX_TLP_Data_Payload, 0x22), + DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_CCIX_TLP_Data_Payload, 0x23), + + /* + * Leave it to the user to specify the lane ID to avoid generating + * a list of hundreds of events. + */ + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ack_dllp, 0x600), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_update_fc_dllp, 0x601), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ack_dllp, 0x602), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_update_fc_dllp, 0x603), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_nulified_tlp, 0x604), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_nulified_tlp, 0x605), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_duplicate_tl, 0x606), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_write, 0x700), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_read, 0x701), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_write, 0x702), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_read, 0x703), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_write, 0x704), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_read, 0x705), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_without_data, 0x706), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_with_data, 0x707), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_message_tlp, 0x708), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_atomic, 0x709), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_tlp_with_prefix, 0x70A), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_write, 0x70B), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_read, 0x70C), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_write, 0x70F), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_read, 0x710), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_without_data, 0x711), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_with_data, 0x712), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_message_tlp, 0x713), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_atomic, 0x714), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_tlp_with_prefix, 0x715), + DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ccix_tlp, 0x716), + DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ccix_tlp, 0x717), + NULL +}; + +static const struct attribute_group dwc_pcie_event_attrs_group = { + .name = "events", + .attrs = dwc_pcie_pmu_time_event_attrs, +}; + +static const struct attribute_group *dwc_pcie_attr_groups[] = { + &dwc_pcie_event_attrs_group, + &dwc_pcie_format_attrs_group, + &dwc_pcie_cpumask_attr_group, + NULL +}; + +static void dwc_pcie_pmu_lane_event_enable(struct dwc_pcie_pmu *pcie_pmu, + bool enable) +{ + struct pci_dev *pdev = pcie_pmu->pdev; + u16 ras_des_offset = pcie_pmu->ras_des_offset; + + if (enable) + pci_clear_and_set_config_dword(pdev, + ras_des_offset + DWC_PCIE_EVENT_CNT_CTL, + DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_ON); + else + pci_clear_and_set_config_dword(pdev, + ras_des_offset + DWC_PCIE_EVENT_CNT_CTL, + DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF); +} + +static void dwc_pcie_pmu_time_based_event_enable(struct dwc_pcie_pmu *pcie_pmu, + bool enable) +{ + struct pci_dev *pdev = pcie_pmu->pdev; + u16 ras_des_offset = pcie_pmu->ras_des_offset; + + pci_clear_and_set_config_dword(pdev, + ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_CTL, + DWC_PCIE_TIME_BASED_TIMER_START, enable); +} + +static u64 dwc_pcie_pmu_read_lane_event_counter(struct perf_event *event) +{ + struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu); + struct pci_dev *pdev = pcie_pmu->pdev; + u16 ras_des_offset = pcie_pmu->ras_des_offset; + u32 val; + + pci_read_config_dword(pdev, ras_des_offset + DWC_PCIE_EVENT_CNT_DATA, &val); + + return val; +} + +static u64 dwc_pcie_pmu_read_time_based_counter(struct perf_event *event) +{ + struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu); + struct pci_dev *pdev = pcie_pmu->pdev; + int event_id = DWC_PCIE_EVENT_ID(event); + u16 ras_des_offset = pcie_pmu->ras_des_offset; + u32 lo, hi, ss; + u64 val; + + /* + * The 64-bit value of the data counter is spread across two + * registers that are not synchronized. In order to read them + * atomically, ensure that the high 32 bits match before and after + * reading the low 32 bits. + */ + pci_read_config_dword(pdev, + ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH, &hi); + do { + /* snapshot the high 32 bits */ + ss = hi; + + pci_read_config_dword( + pdev, ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW, + &lo); + pci_read_config_dword( + pdev, ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH, + &hi); + } while (hi != ss); + + val = ((u64)hi << 32) | lo; + /* + * The Group#1 event measures the amount of data processed in 16-byte + * units. Simplify the end-user interface by multiplying the counter + * at the point of read. + */ + if (event_id >= 0x20 && event_id <= 0x23) + val *= 16; + + return val; +} + +static void dwc_pcie_pmu_event_update(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event); + u64 delta, prev, now = 0; + + do { + prev = local64_read(&hwc->prev_count); + + if (type == DWC_PCIE_LANE_EVENT) + now = dwc_pcie_pmu_read_lane_event_counter(event); + else if (type == DWC_PCIE_TIME_BASE_EVENT) + now = dwc_pcie_pmu_read_time_based_counter(event); + + } while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev); + + delta = (now - prev) & DWC_PCIE_MAX_PERIOD; + /* 32-bit counter for Lane Event Counting */ + if (type == DWC_PCIE_LANE_EVENT) + delta &= DWC_PCIE_LANE_EVENT_MAX_PERIOD; + + local64_add(delta, &event->count); +} + +static int dwc_pcie_pmu_event_init(struct perf_event *event) +{ + struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu); + enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event); + struct perf_event *sibling; + u32 lane; + + if (event->attr.type != event->pmu->type) + return -ENOENT; + + /* We don't support sampling */ + if (is_sampling_event(event)) + return -EINVAL; + + /* We cannot support task bound events */ + if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) + return -EINVAL; + + if (event->group_leader != event && + !is_software_event(event->group_leader)) + return -EINVAL; + + for_each_sibling_event(sibling, event->group_leader) { + if (sibling->pmu != event->pmu && !is_software_event(sibling)) + return -EINVAL; + } + + if (type < 0 || type >= DWC_PCIE_EVENT_TYPE_MAX) + return -EINVAL; + + if (type == DWC_PCIE_LANE_EVENT) { + lane = DWC_PCIE_EVENT_LANE(event); + if (lane < 0 || lane >= pcie_pmu->nr_lanes) + return -EINVAL; + } + + event->cpu = pcie_pmu->on_cpu; + + return 0; +} + +static void dwc_pcie_pmu_event_start(struct perf_event *event, int flags) +{ + struct hw_perf_event *hwc = &event->hw; + struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu); + enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event); + + hwc->state = 0; + local64_set(&hwc->prev_count, 0); + + if (type == DWC_PCIE_LANE_EVENT) + dwc_pcie_pmu_lane_event_enable(pcie_pmu, true); + else if (type == DWC_PCIE_TIME_BASE_EVENT) + dwc_pcie_pmu_time_based_event_enable(pcie_pmu, true); +} + +static void dwc_pcie_pmu_event_stop(struct perf_event *event, int flags) +{ + struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu); + enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event); + struct hw_perf_event *hwc = &event->hw; + + if (event->hw.state & PERF_HES_STOPPED) + return; + + if (type == DWC_PCIE_LANE_EVENT) + dwc_pcie_pmu_lane_event_enable(pcie_pmu, false); + else if (type == DWC_PCIE_TIME_BASE_EVENT) + dwc_pcie_pmu_time_based_event_enable(pcie_pmu, false); + + dwc_pcie_pmu_event_update(event); + hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE; +} + +static int dwc_pcie_pmu_event_add(struct perf_event *event, int flags) +{ + struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu); + struct pci_dev *pdev = pcie_pmu->pdev; + struct hw_perf_event *hwc = &event->hw; + enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event); + int event_id = DWC_PCIE_EVENT_ID(event); + int lane = DWC_PCIE_EVENT_LANE(event); + u16 ras_des_offset = pcie_pmu->ras_des_offset; + u32 ctrl; + + /* one counter for each type and it is in use */ + if (pcie_pmu->event[type]) + return -ENOSPC; + + pcie_pmu->event[type] = event; + hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE; + + if (type == DWC_PCIE_LANE_EVENT) { + /* EVENT_COUNTER_DATA_REG needs clear manually */ + ctrl = FIELD_PREP(DWC_PCIE_CNT_EVENT_SEL, event_id) | + FIELD_PREP(DWC_PCIE_CNT_LANE_SEL, lane) | + FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF) | + FIELD_PREP(DWC_PCIE_EVENT_CLEAR, DWC_PCIE_EVENT_PER_CLEAR); + pci_write_config_dword(pdev, ras_des_offset + DWC_PCIE_EVENT_CNT_CTL, + ctrl); + } else if (type == DWC_PCIE_TIME_BASE_EVENT) { + /* + * TIME_BASED_ANAL_DATA_REG is a 64 bit register, we can safely + * use it with any manually controlled duration. And it is + * cleared when next measurement starts. + */ + ctrl = FIELD_PREP(DWC_PCIE_TIME_BASED_REPORT_SEL, event_id) | + FIELD_PREP(DWC_PCIE_TIME_BASED_DURATION_SEL, + DWC_PCIE_DURATION_MANUAL_CTL) | + DWC_PCIE_TIME_BASED_CNT_ENABLE; + pci_write_config_dword( + pdev, ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_CTL, ctrl); + } + + if (flags & PERF_EF_START) + dwc_pcie_pmu_event_start(event, PERF_EF_RELOAD); + + perf_event_update_userpage(event); + + return 0; +} + +static void dwc_pcie_pmu_event_del(struct perf_event *event, int flags) +{ + struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu); + enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event); + + dwc_pcie_pmu_event_stop(event, flags | PERF_EF_UPDATE); + perf_event_update_userpage(event); + pcie_pmu->event[type] = NULL; +} + +static void dwc_pcie_pmu_remove_cpuhp_instance(void *hotplug_node) +{ + cpuhp_state_remove_instance_nocalls(dwc_pcie_pmu_hp_state, hotplug_node); +} + +/* + * Find the binded DES capability device info of a PCI device. + * @pdev: The PCI device. + */ +static struct dwc_pcie_dev_info *dwc_pcie_find_dev_info(struct pci_dev *pdev) +{ + struct dwc_pcie_dev_info *dev_info; + + list_for_each_entry(dev_info, &dwc_pcie_dev_info_head, dev_node) + if (dev_info->pdev == pdev) + return dev_info; + + return NULL; +} + +static void dwc_pcie_unregister_pmu(void *data) +{ + struct dwc_pcie_pmu *pcie_pmu = data; + + perf_pmu_unregister(&pcie_pmu->pmu); +} + +static bool dwc_pcie_match_des_cap(struct pci_dev *pdev) +{ + const struct dwc_pcie_vendor_id *vid; + u16 vsec = 0; + u32 val; + + if (!pci_is_pcie(pdev) || !(pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)) + return false; + + for (vid = dwc_pcie_vendor_ids; vid->vendor_id; vid++) { + vsec = pci_find_vsec_capability(pdev, vid->vendor_id, + DWC_PCIE_VSEC_RAS_DES_ID); + if (vsec) + break; + } + if (!vsec) + return false; + + pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val); + if (PCI_VNDR_HEADER_REV(val) != 0x04) + return false; + + pci_dbg(pdev, + "Detected PCIe Vendor-Specific Extended Capability RAS DES\n"); + return true; +} + +static void dwc_pcie_unregister_dev(struct dwc_pcie_dev_info *dev_info) +{ + platform_device_unregister(dev_info->plat_dev); + list_del(&dev_info->dev_node); + kfree(dev_info); +} + +static int dwc_pcie_register_dev(struct pci_dev *pdev) +{ + struct platform_device *plat_dev; + struct dwc_pcie_dev_info *dev_info; + u32 bdf; + + bdf = PCI_DEVID(pdev->bus->number, pdev->devfn); + plat_dev = platform_device_register_data(NULL, "dwc_pcie_pmu", bdf, + pdev, sizeof(*pdev)); + + if (IS_ERR(plat_dev)) + return PTR_ERR(plat_dev); + + dev_info = kzalloc(sizeof(*dev_info), GFP_KERNEL); + if (!dev_info) + return -ENOMEM; + + /* Cache platform device to handle pci device hotplug */ + dev_info->plat_dev = plat_dev; + dev_info->pdev = pdev; + list_add(&dev_info->dev_node, &dwc_pcie_dev_info_head); + + return 0; +} + +static int dwc_pcie_pmu_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct device *dev = data; + struct pci_dev *pdev = to_pci_dev(dev); + struct dwc_pcie_dev_info *dev_info; + + switch (action) { + case BUS_NOTIFY_ADD_DEVICE: + if (!dwc_pcie_match_des_cap(pdev)) + return NOTIFY_DONE; + if (dwc_pcie_register_dev(pdev)) + return NOTIFY_BAD; + break; + case BUS_NOTIFY_DEL_DEVICE: + dev_info = dwc_pcie_find_dev_info(pdev); + if (!dev_info) + return NOTIFY_DONE; + dwc_pcie_unregister_dev(dev_info); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block dwc_pcie_pmu_nb = { + .notifier_call = dwc_pcie_pmu_notifier, +}; + +static int dwc_pcie_pmu_probe(struct platform_device *plat_dev) +{ + struct pci_dev *pdev = plat_dev->dev.platform_data; + struct dwc_pcie_pmu *pcie_pmu; + char *name; + u32 bdf, val; + u16 vsec; + int ret; + + vsec = pci_find_vsec_capability(pdev, pdev->vendor, + DWC_PCIE_VSEC_RAS_DES_ID); + pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val); + bdf = PCI_DEVID(pdev->bus->number, pdev->devfn); + name = devm_kasprintf(&plat_dev->dev, GFP_KERNEL, "dwc_rootport_%x", bdf); + if (!name) + return -ENOMEM; + + pcie_pmu = devm_kzalloc(&plat_dev->dev, sizeof(*pcie_pmu), GFP_KERNEL); + if (!pcie_pmu) + return -ENOMEM; + + pcie_pmu->pdev = pdev; + pcie_pmu->ras_des_offset = vsec; + pcie_pmu->nr_lanes = pcie_get_width_cap(pdev); + pcie_pmu->on_cpu = -1; + pcie_pmu->pmu = (struct pmu){ + .name = name, + .parent = &pdev->dev, + .module = THIS_MODULE, + .attr_groups = dwc_pcie_attr_groups, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, + .task_ctx_nr = perf_invalid_context, + .event_init = dwc_pcie_pmu_event_init, + .add = dwc_pcie_pmu_event_add, + .del = dwc_pcie_pmu_event_del, + .start = dwc_pcie_pmu_event_start, + .stop = dwc_pcie_pmu_event_stop, + .read = dwc_pcie_pmu_event_update, + }; + + /* Add this instance to the list used by the offline callback */ + ret = cpuhp_state_add_instance(dwc_pcie_pmu_hp_state, + &pcie_pmu->cpuhp_node); + if (ret) { + pci_err(pdev, "Error %d registering hotplug @%x\n", ret, bdf); + return ret; + } + + /* Unwind when platform driver removes */ + ret = devm_add_action_or_reset(&plat_dev->dev, + dwc_pcie_pmu_remove_cpuhp_instance, + &pcie_pmu->cpuhp_node); + if (ret) + return ret; + + ret = perf_pmu_register(&pcie_pmu->pmu, name, -1); + if (ret) { + pci_err(pdev, "Error %d registering PMU @%x\n", ret, bdf); + return ret; + } + ret = devm_add_action_or_reset(&plat_dev->dev, dwc_pcie_unregister_pmu, + pcie_pmu); + if (ret) + return ret; + + return 0; +} + +static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node) +{ + struct dwc_pcie_pmu *pcie_pmu; + + pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node); + if (pcie_pmu->on_cpu == -1) + pcie_pmu->on_cpu = cpumask_local_spread( + 0, dev_to_node(&pcie_pmu->pdev->dev)); + + return 0; +} + +static int dwc_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node) +{ + struct dwc_pcie_pmu *pcie_pmu; + struct pci_dev *pdev; + int node; + cpumask_t mask; + unsigned int target; + + pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node); + /* Nothing to do if this CPU doesn't own the PMU */ + if (cpu != pcie_pmu->on_cpu) + return 0; + + pcie_pmu->on_cpu = -1; + pdev = pcie_pmu->pdev; + node = dev_to_node(&pdev->dev); + if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) && + cpumask_andnot(&mask, &mask, cpumask_of(cpu))) + target = cpumask_any(&mask); + else + target = cpumask_any_but(cpu_online_mask, cpu); + + if (target >= nr_cpu_ids) { + pci_err(pdev, "There is no CPU to set\n"); + return 0; + } + + /* This PMU does NOT support interrupt, just migrate context. */ + perf_pmu_migrate_context(&pcie_pmu->pmu, cpu, target); + pcie_pmu->on_cpu = target; + + return 0; +} + +static struct platform_driver dwc_pcie_pmu_driver = { + .probe = dwc_pcie_pmu_probe, + .driver = {.name = "dwc_pcie_pmu",}, +}; + +static int __init dwc_pcie_pmu_init(void) +{ + struct pci_dev *pdev = NULL; + bool found = false; + int ret; + + for_each_pci_dev(pdev) { + if (!dwc_pcie_match_des_cap(pdev)) + continue; + + ret = dwc_pcie_register_dev(pdev); + if (ret) { + pci_dev_put(pdev); + return ret; + } + + found = true; + } + if (!found) + return -ENODEV; + + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, + "perf/dwc_pcie_pmu:online", + dwc_pcie_pmu_online_cpu, + dwc_pcie_pmu_offline_cpu); + if (ret < 0) + return ret; + + dwc_pcie_pmu_hp_state = ret; + + ret = platform_driver_register(&dwc_pcie_pmu_driver); + if (ret) + goto platform_driver_register_err; + + ret = bus_register_notifier(&pci_bus_type, &dwc_pcie_pmu_nb); + if (ret) + goto platform_driver_register_err; + notify = true; + + return 0; + +platform_driver_register_err: + cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state); + + return ret; +} + +static void __exit dwc_pcie_pmu_exit(void) +{ + struct dwc_pcie_dev_info *dev_info, *tmp; + + if (notify) + bus_unregister_notifier(&pci_bus_type, &dwc_pcie_pmu_nb); + list_for_each_entry_safe(dev_info, tmp, &dwc_pcie_dev_info_head, dev_node) + dwc_pcie_unregister_dev(dev_info); + platform_driver_unregister(&dwc_pcie_pmu_driver); + cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state); +} + +module_init(dwc_pcie_pmu_init); +module_exit(dwc_pcie_pmu_exit); + +MODULE_DESCRIPTION("PMU driver for DesignWare Cores PCI Express Controller"); +MODULE_AUTHOR("Shuai Xue "); +MODULE_LICENSE("GPL v2"); From f56bb3de66bc7db90cd6df0a9866167e8a79317e Mon Sep 17 00:00:00 2001 From: Shuai Xue Date: Fri, 8 Dec 2023 10:56:52 +0800 Subject: [PATCH 76/87] MAINTAINERS: add maintainers for DesignWare PCIe PMU driver Add maintainers for Synopsys DesignWare PCIe PMU driver and driver document. Signed-off-by: Shuai Xue Link: https://lore.kernel.org/r/20231208025652.87192-6-xueshuai@linux.alibaba.com Signed-off-by: Will Deacon --- MAINTAINERS | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 012df8ccf34e..5b28ec113c7d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21090,6 +21090,13 @@ L: linux-mmc@vger.kernel.org S: Maintained F: drivers/mmc/host/dw_mmc* +SYNOPSYS DESIGNWARE PCIE PMU DRIVER +M: Shuai Xue +M: Jing Zhang +S: Supported +F: Documentation/admin-guide/perf/dwc_pcie_pmu.rst +F: drivers/perf/dwc_pcie_pmu.c + SYNOPSYS HSDK RESET CONTROLLER DRIVER M: Eugeniy Paltsev S: Supported From 63a2d92e1461396cd83742cbaf20af240b7a94e9 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Tue, 12 Dec 2023 17:09:09 +0000 Subject: [PATCH 77/87] arm64: Cleanup system cpucap handling Recent changes to remove cpus_have_const_cap() introduced new users of cpus_have_cap() in the period between detecting system cpucaps and patching alternatives. It would be preferable to defer these until after the relevant cpucaps have been patched so that these can use the usual feature check helper functions, which is clearer and has less risk of accidental usage of code relying upon an alternative which has not yet been patched. This patch reworks the system-wide cpucap detection and patching to minimize this transient period: * The detection, enablement, and patching of system cpucaps is moved into a new setup_system_capabilities() function so that these can be grouped together more clearly, with no other functions called in the period between detection and patching. This is called from setup_system_features() before the subsequent checks that depend on the cpucaps. The logging of TTBR0 PAN and cpucaps with a mask is also moved here to keep these as close as possible to update_cpu_capabilities(). At the same time, comments are corrected and improved to make the intent clearer. * As hyp_mode_check() only tests system register values (not hwcaps) and must be called prior to patching, the call to hyp_mode_check() is moved before the call to setup_system_features(). * In setup_system_features(), the use of system_uses_ttbr0_pan() is restored, now that this occurs after alternatives are patched. This is a partial revert of commit: 53d62e995d9eaed1 ("arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN") * In sve_setup() and sme_setup(), the use of system_supports_sve() and system_supports_sme() respectively are restored, now that these occur after alternatives are patched. This is a partial revert of commit: a76521d160284a1e ("arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}") Signed-off-by: Mark Rutland Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: Will Deacon Link: https://lore.kernel.org/r/20231212170910.3745497-2-mark.rutland@arm.com Signed-off-by: Will Deacon --- arch/arm64/kernel/cpufeature.c | 46 ++++++++++++++++++++-------------- arch/arm64/kernel/fpsimd.c | 4 +-- arch/arm64/kernel/smp.c | 3 +-- 3 files changed, 30 insertions(+), 23 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index b335da126e86..5f1320c38aa2 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -3318,23 +3318,40 @@ unsigned long cpu_get_elf_hwcap2(void) return elf_hwcap[1]; } -void __init setup_system_features(void) +static void __init setup_system_capabilities(void) { - int i; /* - * The system-wide safe feature feature register values have been - * finalized. Finalize and log the available system capabilities. + * The system-wide safe feature register values have been finalized. + * Detect, enable, and patch alternatives for the available system + * cpucaps. */ update_cpu_capabilities(SCOPE_SYSTEM); - if (IS_ENABLED(CONFIG_ARM64_SW_TTBR0_PAN) && - !cpus_have_cap(ARM64_HAS_PAN)) - pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n"); + enable_cpu_capabilities(SCOPE_ALL & ~SCOPE_BOOT_CPU); + apply_alternatives_all(); /* - * Enable all the available capabilities which have not been enabled - * already. + * Log any cpucaps with a cpumask as these aren't logged by + * update_cpu_capabilities(). */ - enable_cpu_capabilities(SCOPE_ALL & ~SCOPE_BOOT_CPU); + for (int i = 0; i < ARM64_NCAPS; i++) { + const struct arm64_cpu_capabilities *caps = cpucap_ptrs[i]; + + if (caps && caps->cpus && caps->desc && + cpumask_any(caps->cpus) < nr_cpu_ids) + pr_info("detected: %s on CPU%*pbl\n", + caps->desc, cpumask_pr_args(caps->cpus)); + } + + /* + * TTBR0 PAN doesn't have its own cpucap, so log it manually. + */ + if (system_uses_ttbr0_pan()) + pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n"); +} + +void __init setup_system_features(void) +{ + setup_system_capabilities(); kpti_install_ng_mappings(); @@ -3347,15 +3364,6 @@ void __init setup_system_features(void) if (!cache_type_cwg()) pr_warn("No Cache Writeback Granule information, assuming %d\n", ARCH_DMA_MINALIGN); - - for (i = 0; i < ARM64_NCAPS; i++) { - const struct arm64_cpu_capabilities *caps = cpucap_ptrs[i]; - - if (caps && caps->cpus && caps->desc && - cpumask_any(caps->cpus) < nr_cpu_ids) - pr_info("detected: %s on CPU%*pbl\n", - caps->desc, cpumask_pr_args(caps->cpus)); - } } void __init setup_user_features(void) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 1559c706d32d..bc9384517db3 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1171,7 +1171,7 @@ void __init sve_setup(void) unsigned long b; int max_bit; - if (!cpus_have_cap(ARM64_SVE)) + if (!system_supports_sve()) return; /* @@ -1301,7 +1301,7 @@ void __init sme_setup(void) struct vl_info *info = &vl_info[ARM64_VEC_SME]; int min_bit, max_bit; - if (!cpus_have_cap(ARM64_SME)) + if (!system_supports_sme()) return; /* diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index defbab84e9e5..85384dc9a89d 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -439,9 +439,8 @@ static void __init hyp_mode_check(void) void __init smp_cpus_done(unsigned int max_cpus) { pr_info("SMP: Total of %d processors activated.\n", num_online_cpus()); - setup_system_features(); hyp_mode_check(); - apply_alternatives_all(); + setup_system_features(); setup_user_features(); mark_linear_text_alias_ro(); } From eb15d707c252c3fb56e3ea57a185976c67e5a07f Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Tue, 12 Dec 2023 17:09:10 +0000 Subject: [PATCH 78/87] arm64: Align boot cpucap handling with system cpucap handling Currently the detection+enablement of boot cpucaps is separate from the patching of boot cpucap alternatives, which means there's a period where cpus_have_cap($CAP) and alternative_has_cap($CAP) may be mismatched. It would be preferable to manage the boot cpucaps in the same way as the system cpucaps, both for clarity and to minimize the risk of accidental usage of code relying upon an alternative which has not yet been patched. This patch aligns the handling of boot cpucaps with the handling of system cpucaps: * The existing setup_boot_cpu_capabilities() function is moved to be closer to the setup_system_capabilities() and setup_system_features() functions so that they're more clearly related and more likely to be updated together in future. * The patching of boot cpucap alternatives is moved into setup_boot_cpu_capabilities(), immediately after boot cpucaps are detected and enabled. * A new setup_boot_cpu_features() function is added to mirror setup_system_features(); this handles initialization of cpucap data structures and calls setup_boot_cpu_capabilities(). This makes init_cpu_features() a closer mirror to update_cpu_features(), and makes smp_prepare_boot_cpu() a closer mirror to smp_cpus_done(). Importantly, while these changes alter the structure of the code, they retain the existing order of calls to: init_cpu_features(); // prefix initializing feature regs init_cpucap_indirect_list(); detect_system_supports_pseudo_nmi(); update_cpu_capabilities(SCOPE_BOOT_CPU | SCOPE_LOCAL_CPU); enable_cpu_capabilities(SCOPE_BOOT_CPU); apply_boot_alternatives(); ... and hence there should be no functional change as a result of this patch; this is purely a structural cleanup. Signed-off-by: Mark Rutland Cc: Catalin Marinas Cc: Will Deacon Link: https://lore.kernel.org/r/20231212170910.3745497-3-mark.rutland@arm.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/cpufeature.h | 1 + arch/arm64/kernel/cpufeature.c | 57 +++++++++++++++-------------- arch/arm64/kernel/smp.c | 9 +---- 3 files changed, 33 insertions(+), 34 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index f6d416fe49b0..2ea24c5cb900 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -617,6 +617,7 @@ static inline bool id_aa64pfr1_mte(u64 pfr1) return val >= ID_AA64PFR1_EL1_MTE_MTE2; } +void __init setup_boot_cpu_features(void); void __init setup_system_features(void); void __init setup_user_features(void); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 5f1320c38aa2..e976538c11ac 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1081,25 +1081,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info) if (id_aa64pfr1_mte(info->reg_id_aa64pfr1)) init_cpu_ftr_reg(SYS_GMID_EL1, info->reg_gmid); - - /* - * Initialize the indirect array of CPU capabilities pointers before we - * handle the boot CPU below. - */ - init_cpucap_indirect_list(); - - /* - * Detect broken pseudo-NMI. Must be called _before_ the call to - * setup_boot_cpu_capabilities() since it interacts with - * can_use_gic_priorities(). - */ - detect_system_supports_pseudo_nmi(); - - /* - * Detect and enable early CPU capabilities based on the boot CPU, - * after we have initialised the CPU feature infrastructure. - */ - setup_boot_cpu_capabilities(); } static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new) @@ -3255,14 +3236,6 @@ void check_local_cpu_capabilities(void) verify_local_cpu_capabilities(); } -static void __init setup_boot_cpu_capabilities(void) -{ - /* Detect capabilities with either SCOPE_BOOT_CPU or SCOPE_LOCAL_CPU */ - update_cpu_capabilities(SCOPE_BOOT_CPU | SCOPE_LOCAL_CPU); - /* Enable the SCOPE_BOOT_CPU capabilities alone right away */ - enable_cpu_capabilities(SCOPE_BOOT_CPU); -} - bool this_cpu_has_cap(unsigned int n) { if (!WARN_ON(preemptible()) && n < ARM64_NCAPS) { @@ -3318,6 +3291,36 @@ unsigned long cpu_get_elf_hwcap2(void) return elf_hwcap[1]; } +static void __init setup_boot_cpu_capabilities(void) +{ + /* + * The boot CPU's feature register values have been recorded. Detect + * boot cpucaps and local cpucaps for the boot CPU, then enable and + * patch alternatives for the available boot cpucaps. + */ + update_cpu_capabilities(SCOPE_BOOT_CPU | SCOPE_LOCAL_CPU); + enable_cpu_capabilities(SCOPE_BOOT_CPU); + apply_boot_alternatives(); +} + +void __init setup_boot_cpu_features(void) +{ + /* + * Initialize the indirect array of CPU capabilities pointers before we + * handle the boot CPU. + */ + init_cpucap_indirect_list(); + + /* + * Detect broken pseudo-NMI. Must be called _before_ the call to + * setup_boot_cpu_capabilities() since it interacts with + * can_use_gic_priorities(). + */ + detect_system_supports_pseudo_nmi(); + + setup_boot_cpu_capabilities(); +} + static void __init setup_system_capabilities(void) { /* diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 85384dc9a89d..4ced34f62dab 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -453,14 +453,9 @@ void __init smp_prepare_boot_cpu(void) * freed shortly, so we must move over to the runtime per-cpu area. */ set_my_cpu_offset(per_cpu_offset(smp_processor_id())); - cpuinfo_store_boot_cpu(); - /* - * We now know enough about the boot CPU to apply the - * alternatives that cannot wait until interrupt handling - * and/or scheduling is enabled. - */ - apply_boot_alternatives(); + cpuinfo_store_boot_cpu(); + setup_boot_cpu_features(); /* Conditionally switch to GIC PMR for interrupt masking */ if (system_uses_irq_prio_masking()) From bb339db4d363c84e0a8d70827df591397ccd7312 Mon Sep 17 00:00:00 2001 From: James Clark Date: Fri, 15 Dec 2023 17:56:48 +0000 Subject: [PATCH 79/87] arm: perf: Fix ARCH=arm build with GCC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit LLVM ignores everything inside the if statement and doesn't generate errors, but GCC doesn't ignore it, resulting in the following error: drivers/perf/arm_pmuv3.c: In function ‘armv8pmu_write_evtype’: include/linux/bits.h:34:29: error: left shift count >= width of type [-Werror=shift-count-overflow] 34 | (((~UL(0)) - (UL(1) << (l)) + 1) & \ Fix it by using GENMASK_ULL which doesn't overflow on arm32 (even though the value is never used there). Fixes: 3115ee021bfb ("arm64: perf: Include threshold control fields in PMEVTYPER mask") Reported-by: Uwe Kleine-König Closes: https://lore.kernel.org/linux-arm-kernel/20231215120817.h2f3akgv72zhrtqo@pengutronix.de/ Signed-off-by: James Clark Acked-by: Mark Rutland Reviewed-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20231215175648.3397170-2-james.clark@arm.com Signed-off-by: Will Deacon --- include/linux/perf/arm_pmuv3.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index 0f4d62ef3a9a..46377e134d67 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -234,8 +234,8 @@ * PMXEVTYPER: Event selection reg */ #define ARMV8_PMU_EVTYPE_EVENT GENMASK(15, 0) /* Mask for EVENT bits */ -#define ARMV8_PMU_EVTYPE_TH GENMASK(43, 32) -#define ARMV8_PMU_EVTYPE_TC GENMASK(63, 61) +#define ARMV8_PMU_EVTYPE_TH GENMASK_ULL(43, 32) /* arm64 only */ +#define ARMV8_PMU_EVTYPE_TC GENMASK_ULL(63, 61) /* arm64 only */ /* * Event filters for PMUv3 From 5cc5ed7a668dcfb62c02f380c7cca9c808242ed0 Mon Sep 17 00:00:00 2001 From: Wang Jinchao Date: Fri, 15 Dec 2023 14:40:08 +0800 Subject: [PATCH 80/87] arm64: memory: remove duplicated include remove duplicated include Signed-off-by: Wang Jinchao Link: https://lore.kernel.org/r/202312151439+0800-wangjinchao@xfusion.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/memory.h | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index fde4186cc387..27fc302a6d70 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -407,6 +407,5 @@ void dump_mem_limit(void); #define INIT_MEMBLOCK_MEMORY_REGIONS (INIT_MEMBLOCK_REGIONS * 8) #endif -#include #endif /* __ASM_MEMORY_H */ From 3b077ad8cb25c936ff55780c517dbd8ea36fb018 Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Thu, 14 Dec 2023 10:01:41 +0000 Subject: [PATCH 81/87] arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1 Add the Pauth_LR field definitions to ID_AA64ISAR1_EL1, based on DDI0601 2023-09. These fields aren't used yet. Adding them for completeness and consistency (definition already exists for ID_AA64ISAR2_EL1). Signed-off-by: Fuad Tabba Reviewed-by: Mark Brown Link: https://lore.kernel.org/r/20231214100158.2305400-2-tabba@google.com Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 2c4b6665c5bf..d596be2599d1 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1401,6 +1401,7 @@ UnsignedEnum 11:8 API 0b0011 PAuth2 0b0100 FPAC 0b0101 FPACCOMBINE + 0b0110 PAuth_LR EndEnum UnsignedEnum 7:4 APA 0b0000 NI @@ -1409,6 +1410,7 @@ UnsignedEnum 7:4 APA 0b0011 PAuth2 0b0100 FPAC 0b0101 FPACCOMBINE + 0b0110 PAuth_LR EndEnum UnsignedEnum 3:0 DPB 0b0000 NI From 4f101cdcb578638454eeff3e1d6c2cb2495d8005 Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Thu, 14 Dec 2023 10:01:42 +0000 Subject: [PATCH 82/87] arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1 Add the ExtTrcBuff field definitions to ID_AA64DFR0_EL1 from DDI0601 2023-09. This field isn't used yet. Adding it for completeness and because it will be used in future patches. Signed-off-by: Fuad Tabba Reviewed-by: Mark Brown Link: https://lore.kernel.org/r/20231214100158.2305400-3-tabba@google.com Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index d596be2599d1..a8e36640c027 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1189,7 +1189,10 @@ Enum 63:60 HPMN0 0b0000 UNPREDICTABLE 0b0001 DEF EndEnum -Res0 59:56 +UnsignedEnum 59:56 ExtTrcBuff + 0b0000 NI + 0b0001 IMP +EndEnum UnsignedEnum 55:52 BRBE 0b0000 NI 0b0001 IMP From 885c6d8e2885915451dd4f4a90ddd1bb82ba5a4f Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Thu, 14 Dec 2023 10:01:43 +0000 Subject: [PATCH 83/87] arm64/sysreg: Add missing system register definitions for FGT Add the definitions of missing system registers that are trappable by fine grain traps. The definitions are based on DDI0601 2023-09. Signed-off-by: Fuad Tabba Reviewed-by: Mark Brown Link: https://lore.kernel.org/r/20231214100158.2305400-4-tabba@google.com Signed-off-by: Will Deacon --- arch/arm64/tools/sysreg | 43 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index a8e36640c027..5ceaa1d3630e 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -2086,10 +2086,18 @@ Sysreg CONTEXTIDR_EL1 3 0 13 0 1 Fields CONTEXTIDR_ELx EndSysreg +Sysreg RCWSMASK_EL1 3 0 13 0 3 +Field 63:0 RCWSMASK +EndSysreg + Sysreg TPIDR_EL1 3 0 13 0 4 Field 63:0 ThreadID EndSysreg +Sysreg RCWMASK_EL1 3 0 13 0 6 +Field 63:0 RCWMASK +EndSysreg + Sysreg SCXTNUM_EL1 3 0 13 0 7 Field 63:0 SoftwareContextNumber EndSysreg @@ -2714,6 +2722,33 @@ Field 1 PIE Field 0 PnCH EndSysreg +SysregFields MAIR2_ELx +Field 63:56 Attr7 +Field 55:48 Attr6 +Field 47:40 Attr5 +Field 39:32 Attr4 +Field 31:24 Attr3 +Field 23:16 Attr2 +Field 15:8 Attr1 +Field 7:0 Attr0 +EndSysregFields + +Sysreg MAIR2_EL1 3 0 10 2 1 +Fields MAIR2_ELx +EndSysreg + +Sysreg MAIR2_EL2 3 4 10 1 1 +Fields MAIR2_ELx +EndSysreg + +Sysreg AMAIR2_EL1 3 0 10 3 1 +Field 63:0 ImpDef +EndSysreg + +Sysreg AMAIR2_EL2 3 4 10 3 1 +Field 63:0 ImpDef +EndSysreg + SysregFields PIRx_ELx Field 63:60 Perm15 Field 59:56 Perm14 @@ -2765,6 +2800,14 @@ Sysreg POR_EL12 3 5 10 2 4 Fields PIRx_ELx EndSysreg +Sysreg S2POR_EL1 3 0 10 2 5 +Fields PIRx_ELx +EndSysreg + +Sysreg S2PIR_EL2 3 4 10 2 5 +Fields PIRx_ELx +EndSysreg + Sysreg LORSA_EL1 3 0 10 4 0 Res0 63:52 Field 51:16 SA From 4ebee8cebdf6d661dfe7272cf74d378108160a3e Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Thu, 14 Dec 2023 10:01:44 +0000 Subject: [PATCH 84/87] arm64/sysreg: Add missing system instruction definitions for FGT Add the definitions of missing system instructions that are trappable by fine grain traps. The definitions are based on DDI0602 2023-09. Signed-off-by: Fuad Tabba Link: https://lore.kernel.org/r/20231214100158.2305400-5-tabba@google.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/sysreg.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 9c2caf0efdc7..b320fb0de56b 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -645,6 +645,7 @@ #define OP_AT_S1E0W sys_insn(AT_Op0, 0, AT_CRn, 8, 3) #define OP_AT_S1E1RP sys_insn(AT_Op0, 0, AT_CRn, 9, 0) #define OP_AT_S1E1WP sys_insn(AT_Op0, 0, AT_CRn, 9, 1) +#define OP_AT_S1E1A sys_insn(AT_Op0, 0, AT_CRn, 9, 2) #define OP_AT_S1E2R sys_insn(AT_Op0, 4, AT_CRn, 8, 0) #define OP_AT_S1E2W sys_insn(AT_Op0, 4, AT_CRn, 8, 1) #define OP_AT_S12E1R sys_insn(AT_Op0, 4, AT_CRn, 8, 4) @@ -781,10 +782,16 @@ #define OP_TLBI_VMALLS12E1NXS sys_insn(1, 4, 9, 7, 6) /* Misc instructions */ +#define OP_GCSPUSHX sys_insn(1, 0, 7, 7, 4) +#define OP_GCSPOPCX sys_insn(1, 0, 7, 7, 5) +#define OP_GCSPOPX sys_insn(1, 0, 7, 7, 6) +#define OP_GCSPUSHM sys_insn(1, 3, 7, 7, 0) + #define OP_BRB_IALL sys_insn(1, 1, 7, 2, 4) #define OP_BRB_INJ sys_insn(1, 1, 7, 2, 5) #define OP_CFP_RCTX sys_insn(1, 3, 7, 3, 4) #define OP_DVP_RCTX sys_insn(1, 3, 7, 3, 5) +#define OP_COSP_RCTX sys_insn(1, 3, 7, 3, 6) #define OP_CPP_RCTX sys_insn(1, 3, 7, 3, 7) /* Common SCTLR_ELx flags. */ From 7b21ed7d119dc06b0ed2ba3e406a02cafe3a8d03 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Thu, 14 Dec 2023 11:18:50 -0500 Subject: [PATCH 85/87] arm64: properly install vmlinuz.efi If you select CONFIG_EFI_ZBOOT, we will generate vmlinuz.efi, and then when we go to install the kernel we'll install the vmlinux instead because install.sh only recognizes Image.gz as wanting the compressed install image. With CONFIG_EFI_ZBOOT we don't get the proper kernel installed, which means it doesn't boot, which makes for a very confused and subsequently angry kernel developer. Fix this by properly installing our compressed kernel if we've enabled CONFIG_EFI_ZBOOT. Signed-off-by: Josef Bacik Cc: # 6.1.x Fixes: c37b830fef13 ("arm64: efi: enable generic EFI compressed boot") Reviewed-by: Simon Glass Link: https://lore.kernel.org/r/6edb1402769c2c14c4fbef8f7eaedb3167558789.1702570674.git.josef@toxicpanda.com Signed-off-by: Will Deacon --- arch/arm64/boot/install.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm64/boot/install.sh b/arch/arm64/boot/install.sh index 7399d706967a..9b7a09808a3d 100755 --- a/arch/arm64/boot/install.sh +++ b/arch/arm64/boot/install.sh @@ -17,7 +17,8 @@ # $3 - kernel map file # $4 - default install path (blank if root directory) -if [ "$(basename $2)" = "Image.gz" ]; then +if [ "$(basename $2)" = "Image.gz" ] || [ "$(basename $2)" = "vmlinuz.efi" ] +then # Compressed install echo "Installing compressed kernel" base=vmlinuz From 97ba4416d6dd53c4202038ee7d86dfb29774e00f Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Mon, 18 Dec 2023 17:01:27 +0900 Subject: [PATCH 86/87] efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad You do not need to use $(shell ...) in recipe lines, as they are already executed in a shell. An alternative solution is $$(...), which is an escaped sequence of the shell's command substituion, $(...). For this case, there is a reason to avoid $(shell ...). Kbuild detects command changes by using the if_changed macro, which compares the previous command recorded in .*.cmd with the current command from Makefile. If they differ, Kbuild re-runs the build rule. To diff the commands, Make must expand $(shell ...) first. It means that hexdump is executed every time, even when nothing needs rebuilding. If Kbuild determines that vmlinux.bin needs rebuilding, hexdump will be executed again to evaluate the 'cmd' macro, one more time to really build vmlinux.bin, and finally yet again to record the expanded command into .*.cmd. Replace $(shell ...) with $$(...) to avoid multiple, unnecessay shell evaluations. Since Make is agnostic about the shell code, $(...), the if_changed macro compares the string "$(hexdump -s16 -n4 ...)" verbatim, so hexdump is run only for building vmlinux.bin. For the same reason, $(shell ...) in EFI_ZBOOT_OBJCOPY_FLAGS should be eliminated. While I was here, I replaced '&&' with ';' because a command for if_changed is executed with 'set -e'. Signed-off-by: Masahiro Yamada Reviewed-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20231218080127.907460-1-masahiroy@kernel.org Signed-off-by: Will Deacon --- arch/arm64/boot/Makefile | 2 +- drivers/firmware/efi/libstub/Makefile.zboot | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/boot/Makefile b/arch/arm64/boot/Makefile index 1761f5972443..a5a787371117 100644 --- a/arch/arm64/boot/Makefile +++ b/arch/arm64/boot/Makefile @@ -44,7 +44,7 @@ EFI_ZBOOT_BFD_TARGET := elf64-littleaarch64 EFI_ZBOOT_MACH_TYPE := ARM64 EFI_ZBOOT_FORWARD_CFI := $(CONFIG_ARM64_BTI_KERNEL) -EFI_ZBOOT_OBJCOPY_FLAGS = --add-symbol zboot_code_size=0x$(shell \ +EFI_ZBOOT_OBJCOPY_FLAGS = --add-symbol zboot_code_size=0x$$( \ $(NM) vmlinux|grep _kernel_codesize|cut -d' ' -f1) include $(srctree)/drivers/firmware/efi/libstub/Makefile.zboot diff --git a/drivers/firmware/efi/libstub/Makefile.zboot b/drivers/firmware/efi/libstub/Makefile.zboot index 2c489627a807..65ffd0b760b2 100644 --- a/drivers/firmware/efi/libstub/Makefile.zboot +++ b/drivers/firmware/efi/libstub/Makefile.zboot @@ -5,8 +5,8 @@ # EFI_ZBOOT_FORWARD_CFI quiet_cmd_copy_and_pad = PAD $@ - cmd_copy_and_pad = cp $< $@ && \ - truncate -s $(shell hexdump -s16 -n4 -e '"%u"' $<) $@ + cmd_copy_and_pad = cp $< $@; \ + truncate -s $$(hexdump -s16 -n4 -e '"%u"' $<) $@ # Pad the file to the size of the uncompressed image in memory, including BSS $(obj)/vmlinux.bin: $(obj)/$(EFI_ZBOOT_PAYLOAD) FORCE From 9a802ddb2123e5adec394d35cd539cc0b15bc830 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Mon, 18 Dec 2023 23:39:32 +0000 Subject: [PATCH 87/87] kselftest/arm64: Don't probe the current VL for unsupported vector types The vec-syscfg selftest verifies that setting the VL of the currently tested vector type does not disrupt the VL of the other vector type. To do this it records the current vector length for each type but neglects to guard this with a check for that vector type actually being supported. Add one, using a helper function which we also update all the other instances of this pattern. Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20231218-kselftest-arm64-vec-syscfg-rdvl-v1-1-0ac22d47e81f@kernel.org Signed-off-by: Will Deacon --- tools/testing/selftests/arm64/fp/vec-syscfg.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/arm64/fp/vec-syscfg.c b/tools/testing/selftests/arm64/fp/vec-syscfg.c index 5f648b97a06f..ea9c7d47790f 100644 --- a/tools/testing/selftests/arm64/fp/vec-syscfg.c +++ b/tools/testing/selftests/arm64/fp/vec-syscfg.c @@ -66,6 +66,11 @@ static struct vec_data vec_data[] = { }, }; +static bool vec_type_supported(struct vec_data *data) +{ + return getauxval(data->hwcap_type) & data->hwcap; +} + static int stdio_read_integer(FILE *f, const char *what, int *val) { int n = 0; @@ -564,8 +569,11 @@ static void prctl_set_all_vqs(struct vec_data *data) return; } - for (i = 0; i < ARRAY_SIZE(vec_data); i++) + for (i = 0; i < ARRAY_SIZE(vec_data); i++) { + if (!vec_type_supported(&vec_data[i])) + continue; orig_vls[i] = vec_data[i].rdvl(); + } for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) { vl = sve_vl_from_vq(vq); @@ -594,7 +602,7 @@ static void prctl_set_all_vqs(struct vec_data *data) if (&vec_data[i] == data) continue; - if (!(getauxval(vec_data[i].hwcap_type) & vec_data[i].hwcap)) + if (!vec_type_supported(&vec_data[i])) continue; if (vec_data[i].rdvl() != orig_vls[i]) { @@ -765,7 +773,7 @@ int main(void) struct vec_data *data = &vec_data[i]; unsigned long supported; - supported = getauxval(data->hwcap_type) & data->hwcap; + supported = vec_type_supported(data); if (!supported) all_supported = false;