crypto: x86/aes-gcm - rewrite the AES-NI optimized AES-GCM
Rewrite the AES-NI implementations of AES-GCM, taking advantage of
things I learned while writing the VAES-AVX10 implementations. This is
a complete rewrite that reduces the AES-NI GCM source code size by about
70% and the binary code size by about 95%, while not regressing
performance and in fact improving it significantly in many cases.
The following summarizes the state before this patch:
- The aesni-intel module registered algorithms "generic-gcm-aesni" and
"rfc4106-gcm-aesni" with the crypto API that actually delegated to one
of three underlying implementations according to the CPU capabilities
detected at runtime: AES-NI, AES-NI + AVX, or AES-NI + AVX2.
- The AES-NI + AVX and AES-NI + AVX2 assembly code was in
aesni-intel_avx-x86_64.S and consisted of 2804 lines of source and
257 KB of binary. This massive binary size was not really
appropriate, and depending on the kconfig it could take up over 1% of
the size of the entire vmlinux. The main loops did 8 blocks per
iteration. The AVX code minimized the use of carryless multiplication
whereas the AVX2 code did not. The "AVX2" code did not actually use
AVX2; the check for AVX2 was really a check for Intel Haswell or later
to detect support for fast carryless multiplication. The long source
length was caused by factors such as significant code duplication.
- The AES-NI only assembly code was in aesni-intel_asm.S and consisted
of 1501 lines of source and 15 KB of binary. The main loops did 4
blocks per iteration and minimized the use of carryless multiplication
by using Karatsuba multiplication and a multiplication-less reduction.
- The assembly code was contributed in 2010-2013. Maintenance has been
sporadic and most design choices haven't been revisited.
- The assembly function prototypes and the corresponding glue code were
separate from and were not consistent with the new VAES-AVX10 code I
recently added. The older code had several issues such as not
precomputing the GHASH key powers, which hurt performance.
This rewrite achieves the following goals:
- Much shorter source and binary sizes. The assembly source shrinks
from 4300 lines to 1130 lines, and it produces about 9 KB of binary
instead of 272 KB. This is achieved via a better designed AES-GCM
implementation that doesn't excessively unroll the code and instead
prioritizes the parts that really matter. Sharing the C glue code
with the VAES-AVX10 implementations also saves 250 lines of C source.
- Improve performance on most (possibly all) CPUs on which this code
runs, for most (possibly all) message lengths. Benchmark results are
given in Tables 1 and 2 below.
- Use the same function prototypes and glue code as the new VAES-AVX10
algorithms. This fixes some issues with the integration of the
assembly and results in some significant performance improvements,
primarily on short messages. Also, the AVX and non-AVX
implementations are now registered as separate algorithms with the
crypto API, which makes them both testable by the self-tests.
- Keep support for AES-NI without AVX (for Westmere, Silvermont,
Goldmont, and Tremont), but unify the source code with AES-NI + AVX.
Since 256-bit vectors cannot be used without VAES anyway, this is made
feasible by just using the non-VEX coded form of most instructions.
- Use a unified approach where the main loop does 8 blocks per iteration
and uses Karatsuba multiplication to save one pclmulqdq per block but
does not use the multiplication-less reduction. This strikes a good
balance across the range of CPUs on which this code runs.
- Don't spam the kernel log with an informational message on every boot.
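To illustrate the Karatsuba point in the list above, here is a minimal
userspace sketch in C. It is not the assembly from this patch: clmul64()
is a slow bit-by-bit stand-in for one pclmulqdq, and the types u128 and
u256 and all function names are hypothetical. It only shows why three
carryless multiplications suffice for the unreduced 128x128-bit product
when the XOR of the two key halves is precomputed. The reduction step and
GHASH's bit ordering are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types: a 128-bit value and an unreduced 256-bit product. */
struct u128 { uint64_t lo, hi; };
struct u256 { uint64_t w[4]; };	/* least significant word first */

/* Portable stand-in for one pclmulqdq: carryless 64x64 -> 128-bit multiply. */
static struct u128 clmul64(uint64_t a, uint64_t b)
{
	struct u128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}

/* Schoolbook method: four carryless multiplications. */
static struct u256 clmul128_schoolbook(struct u128 a, struct u128 h)
{
	struct u128 ll = clmul64(a.lo, h.lo);
	struct u128 hh = clmul64(a.hi, h.hi);
	struct u128 lh = clmul64(a.lo, h.hi);
	struct u128 hl = clmul64(a.hi, h.lo);
	struct u256 r = { { ll.lo, ll.hi ^ lh.lo ^ hl.lo,
			    hh.lo ^ lh.hi ^ hl.hi, hh.hi } };

	return r;
}

/*
 * Karatsuba method: three carryless multiplications.  h_xored is
 * h.lo ^ h.hi, which can be computed once per key.  In GF(2)[x],
 * (a.lo ^ a.hi) * (h.lo ^ h.hi) ^ ll ^ hh equals the middle term lh ^ hl.
 */
static struct u256 clmul128_karatsuba(struct u128 a, struct u128 h,
				      uint64_t h_xored)
{
	struct u128 ll = clmul64(a.lo, h.lo);
	struct u128 hh = clmul64(a.hi, h.hi);
	struct u128 mid = clmul64(a.lo ^ a.hi, h_xored);
	struct u256 r;

	assert(h_xored == (h.lo ^ h.hi));
	mid.lo ^= ll.lo ^ hh.lo;
	mid.hi ^= ll.hi ^ hh.hi;
	r.w[0] = ll.lo;
	r.w[1] = ll.hi ^ mid.lo;
	r.w[2] = hh.lo ^ mid.hi;
	r.w[3] = hh.hi;
	return r;
}
```

The trade-off is that Karatsuba replaces one multiplication with a few
extra XORs, which is why it pays off mainly on CPUs where pclmulqdq is
comparatively slow.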
The following tables summarize the improvement in AES-GCM throughput on
various CPU microarchitectures as a result of this patch:
Table 1: AES-256-GCM encryption throughput improvement,
         CPU microarchitecture vs. message length in bytes:

                   | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
-------------------+-------+-------+-------+-------+-------+-------+
Intel Broadwell    |    2% |    8% |   11% |   18% |   31% |   26% |
Intel Skylake      |    1% |    4% |    7% |   12% |   26% |   19% |
Intel Cascade Lake |    3% |    8% |   10% |   18% |   33% |   24% |
AMD Zen 1          |    6% |   12% |    6% |   15% |   27% |   24% |
AMD Zen 2          |    8% |   13% |   13% |   19% |   26% |   28% |
AMD Zen 3          |    8% |   14% |   13% |   19% |   26% |   25% |

                   |   300 |   200 |    64 |    63 |    16 |
-------------------+-------+-------+-------+-------+-------+
Intel Broadwell    |   35% |   29% |   45% |   55% |   54% |
Intel Skylake      |   25% |   19% |   28% |   33% |   27% |
Intel Cascade Lake |   36% |   28% |   39% |   49% |   54% |
AMD Zen 1          |   27% |   22% |   23% |   29% |   26% |
AMD Zen 2          |   32% |   24% |   22% |   25% |   31% |
AMD Zen 3          |   30% |   24% |   22% |   23% |   26% |
Table 2: AES-256-GCM decryption throughput improvement,
         CPU microarchitecture vs. message length in bytes:

                   | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
-------------------+-------+-------+-------+-------+-------+-------+
Intel Broadwell    |    3% |    8% |   11% |   19% |   32% |   28% |
Intel Skylake      |    3% |    4% |    7% |   13% |   28% |   27% |
Intel Cascade Lake |    3% |    9% |   11% |   19% |   33% |   28% |
AMD Zen 1          |   15% |   18% |   14% |   20% |   36% |   33% |
AMD Zen 2          |    9% |   16% |   13% |   21% |   26% |   27% |
AMD Zen 3          |    8% |   15% |   12% |   18% |   23% |   23% |

                   |   300 |   200 |    64 |    63 |    16 |
-------------------+-------+-------+-------+-------+-------+
Intel Broadwell    |   36% |   31% |   40% |   51% |   53% |
Intel Skylake      |   28% |   21% |   23% |   30% |   30% |
Intel Cascade Lake |   36% |   29% |   36% |   47% |   53% |
AMD Zen 1          |   35% |   31% |   32% |   35% |   36% |
AMD Zen 2          |   31% |   30% |   27% |   38% |   30% |
AMD Zen 3          |   27% |   23% |   24% |   32% |   26% |
The above numbers are percentage improvements in single-thread
throughput, so e.g. an increase from 3000 MB/s to 3300 MB/s would be
listed as 10%. They were collected by directly measuring the Linux
crypto API performance using a custom kernel module. Note that indirect
benchmarks (e.g. 'cryptsetup benchmark' or benchmarking dm-crypt I/O)
include more overhead and won't see quite as much of a difference. All
these benchmarks used an associated data length of 16 bytes. Note that
AES-GCM is almost always used with short associated data lengths.
I didn't test Intel CPUs before Broadwell, AMD CPUs before Zen 1, or
Intel low-power CPUs, as these weren't readily available to me.
However, based on the design of the new code and the available
information about these other CPU microarchitectures, I wouldn't expect
any significant regressions, and there's a good chance performance is
improved just as it is above.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
@@ -48,8 +48,9 @@ chacha-x86_64-$(CONFIG_AS_AVX512) += chacha-avx512vl-x86_64.o
 
 obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
-aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o \
-	aes_ctrby8_avx-x86_64.o aes-xts-avx-x86_64.o
+aesni-intel-$(CONFIG_64BIT) += aes_ctrby8_avx-x86_64.o \
+			       aes-gcm-aesni-x86_64.o \
+			       aes-xts-avx-x86_64.o
 ifeq ($(CONFIG_AS_VAES)$(CONFIG_AS_VPCLMULQDQ),yy)
 aesni-intel-$(CONFIG_64BIT) += aes-gcm-avx10-x86_64.o
 endif
File diff suppressed because it is too large
+258
-510
@@ -46,41 +46,11 @@
 #define CRYPTO_AES_CTX_SIZE	(sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
 #define XTS_AES_CTX_SIZE	(sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
 
-/* This data is stored at the end of the crypto_tfm struct.
- * It's a type of per "session" data storage location.
- * This needs to be 16 byte aligned.
- */
-struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-	u8 nonce[4];
-};
-
-struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
 struct aesni_xts_ctx {
 	struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
 	struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
 };
 
-#define GCM_BLOCK_LEN 16
-
-struct gcm_context_data {
-	/* init, update and finalize context data */
-	u8 aad_hash[GCM_BLOCK_LEN];
-	u64 aad_length;
-	u64 in_length;
-	u8 partial_block_enc_key[GCM_BLOCK_LEN];
-	u8 orig_IV[GCM_BLOCK_LEN];
-	u8 current_counter[GCM_BLOCK_LEN];
-	u64 partial_block_len;
-	u64 unused;
-	u8 hash_keys[GCM_BLOCK_LEN * 16];
-};
-
 static inline void *aes_align_addr(void *addr)
 {
 	if (crypto_tfm_ctx_alignment() >= AESNI_ALIGN)
@@ -105,9 +75,6 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
-#define AVX_GEN2_OPTSIZE 640
-#define AVX_GEN4_OPTSIZE 4096
-
 asmlinkage void aesni_xts_enc(const struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len, u8 *iv);
 
@@ -120,23 +87,6 @@ asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len, u8 *iv);
 DEFINE_STATIC_CALL(aesni_ctr_enc_tfm, aesni_ctr_enc);
 
-/* Scatter / Gather routines, with args similar to above */
-asmlinkage void aesni_gcm_init(void *ctx,
-			       struct gcm_context_data *gdata,
-			       u8 *iv,
-			       u8 *hash_subkey, const u8 *aad,
-			       unsigned long aad_len);
-asmlinkage void aesni_gcm_enc_update(void *ctx,
-				     struct gcm_context_data *gdata, u8 *out,
-				     const u8 *in, unsigned long plaintext_len);
-asmlinkage void aesni_gcm_dec_update(void *ctx,
-				     struct gcm_context_data *gdata, u8 *out,
-				     const u8 *in,
-				     unsigned long ciphertext_len);
-asmlinkage void aesni_gcm_finalize(void *ctx,
-				   struct gcm_context_data *gdata,
-				   u8 *auth_tag, unsigned long auth_tag_len);
-
 asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv,
 		void *keys, u8 *out, unsigned int num_bytes);
 asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv,
@@ -156,67 +106,6 @@ asmlinkage void aes_xctr_enc_192_avx_by8(const u8 *in, const u8 *iv,
 asmlinkage void aes_xctr_enc_256_avx_by8(const u8 *in, const u8 *iv,
 	const void *keys, u8 *out, unsigned int num_bytes,
 	unsigned int byte_ctr);
-
-/*
- * asmlinkage void aesni_gcm_init_avx_gen2()
- * gcm_data *my_ctx_data, context data
- * u8 *hash_subkey, the Hash sub key input. Data starts on a 16-byte boundary.
- */
-asmlinkage void aesni_gcm_init_avx_gen2(void *my_ctx_data,
-					struct gcm_context_data *gdata,
-					u8 *iv,
-					u8 *hash_subkey,
-					const u8 *aad,
-					unsigned long aad_len);
-
-asmlinkage void aesni_gcm_enc_update_avx_gen2(void *ctx,
-				     struct gcm_context_data *gdata, u8 *out,
-				     const u8 *in, unsigned long plaintext_len);
-asmlinkage void aesni_gcm_dec_update_avx_gen2(void *ctx,
-				     struct gcm_context_data *gdata, u8 *out,
-				     const u8 *in,
-				     unsigned long ciphertext_len);
-asmlinkage void aesni_gcm_finalize_avx_gen2(void *ctx,
-				   struct gcm_context_data *gdata,
-				   u8 *auth_tag, unsigned long auth_tag_len);
-
-/*
- * asmlinkage void aesni_gcm_init_avx_gen4()
- * gcm_data *my_ctx_data, context data
- * u8 *hash_subkey, the Hash sub key input. Data starts on a 16-byte boundary.
- */
-asmlinkage void aesni_gcm_init_avx_gen4(void *my_ctx_data,
-					struct gcm_context_data *gdata,
-					u8 *iv,
-					u8 *hash_subkey,
-					const u8 *aad,
-					unsigned long aad_len);
-
-asmlinkage void aesni_gcm_enc_update_avx_gen4(void *ctx,
-				     struct gcm_context_data *gdata, u8 *out,
-				     const u8 *in, unsigned long plaintext_len);
-asmlinkage void aesni_gcm_dec_update_avx_gen4(void *ctx,
-				     struct gcm_context_data *gdata, u8 *out,
-				     const u8 *in,
-				     unsigned long ciphertext_len);
-asmlinkage void aesni_gcm_finalize_avx_gen4(void *ctx,
-				   struct gcm_context_data *gdata,
-				   u8 *auth_tag, unsigned long auth_tag_len);
-
-static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx);
-static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
-
-static inline struct
-aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
-{
-	return aes_align_addr(crypto_aead_ctx(tfm));
-}
-
-static inline struct
-generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
-{
-	return aes_align_addr(crypto_aead_ctx(tfm));
-}
 #endif
 
 static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
@@ -590,280 +479,6 @@ static int xctr_crypt(struct skcipher_request *req)
 	}
 	return err;
 }
-
-static int aes_gcm_derive_hash_subkey(const struct crypto_aes_ctx *aes_key,
-				      u8 hash_subkey[AES_BLOCK_SIZE])
-{
-	static const u8 zeroes[AES_BLOCK_SIZE];
-
-	aes_encrypt(aes_key, hash_subkey, zeroes);
-	return 0;
-}
-
-static int common_rfc4106_set_key(struct crypto_aead *aead, const u8 *key,
-				  unsigned int key_len)
-{
-	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(aead);
-
-	if (key_len < 4)
-		return -EINVAL;
-
-	/*Account for 4 byte nonce at the end.*/
-	key_len -= 4;
-
-	memcpy(ctx->nonce, key + key_len, sizeof(ctx->nonce));
-
-	return aes_set_key_common(&ctx->aes_key_expanded, key, key_len) ?:
-	       aes_gcm_derive_hash_subkey(&ctx->aes_key_expanded,
-					  ctx->hash_subkey);
-}
-
-/* This is the Integrity Check Value (aka the authentication tag) length and can
- * be 8, 12 or 16 bytes long. */
-static int common_rfc4106_set_authsize(struct crypto_aead *aead,
-				       unsigned int authsize)
-{
-	switch (authsize) {
-	case 8:
-	case 12:
-	case 16:
-		break;
-	default:
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int generic_gcmaes_set_authsize(struct crypto_aead *tfm,
-				       unsigned int authsize)
-{
-	switch (authsize) {
-	case 4:
-	case 8:
-	case 12:
-	case 13:
-	case 14:
-	case 15:
-	case 16:
-		break;
-	default:
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
-			      unsigned int assoclen, u8 *hash_subkey,
-			      u8 *iv, void *aes_ctx, u8 *auth_tag,
-			      unsigned long auth_tag_len)
-{
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
-	unsigned long left = req->cryptlen;
-	struct scatter_walk assoc_sg_walk;
-	struct skcipher_walk walk;
-	bool do_avx, do_avx2;
-	u8 *assocmem = NULL;
-	u8 *assoc;
-	int err;
-
-	if (!enc)
-		left -= auth_tag_len;
-
-	do_avx = (left >= AVX_GEN2_OPTSIZE);
-	do_avx2 = (left >= AVX_GEN4_OPTSIZE);
-
-	/* Linearize assoc, if not already linear */
-	if (req->src->length >= assoclen && req->src->length) {
-		scatterwalk_start(&assoc_sg_walk, req->src);
-		assoc = scatterwalk_map(&assoc_sg_walk);
-	} else {
-		gfp_t flags = (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
-			      GFP_KERNEL : GFP_ATOMIC;
-
-		/* assoc can be any length, so must be on heap */
-		assocmem = kmalloc(assoclen, flags);
-		if (unlikely(!assocmem))
-			return -ENOMEM;
-		assoc = assocmem;
-
-		scatterwalk_map_and_copy(assoc, req->src, 0, assoclen, 0);
-	}
-
-	kernel_fpu_begin();
-	if (static_branch_likely(&gcm_use_avx2) && do_avx2)
-		aesni_gcm_init_avx_gen4(aes_ctx, data, iv, hash_subkey, assoc,
-					assoclen);
-	else if (static_branch_likely(&gcm_use_avx) && do_avx)
-		aesni_gcm_init_avx_gen2(aes_ctx, data, iv, hash_subkey, assoc,
-					assoclen);
-	else
-		aesni_gcm_init(aes_ctx, data, iv, hash_subkey, assoc, assoclen);
-	kernel_fpu_end();
-
-	if (!assocmem)
-		scatterwalk_unmap(assoc);
-	else
-		kfree(assocmem);
-
-	err = enc ? skcipher_walk_aead_encrypt(&walk, req, false)
-		  : skcipher_walk_aead_decrypt(&walk, req, false);
-
-	while (walk.nbytes > 0) {
-		kernel_fpu_begin();
-		if (static_branch_likely(&gcm_use_avx2) && do_avx2) {
-			if (enc)
-				aesni_gcm_enc_update_avx_gen4(aes_ctx, data,
-							      walk.dst.virt.addr,
-							      walk.src.virt.addr,
-							      walk.nbytes);
-			else
-				aesni_gcm_dec_update_avx_gen4(aes_ctx, data,
-							      walk.dst.virt.addr,
-							      walk.src.virt.addr,
-							      walk.nbytes);
-		} else if (static_branch_likely(&gcm_use_avx) && do_avx) {
-			if (enc)
-				aesni_gcm_enc_update_avx_gen2(aes_ctx, data,
-							      walk.dst.virt.addr,
-							      walk.src.virt.addr,
-							      walk.nbytes);
-			else
-				aesni_gcm_dec_update_avx_gen2(aes_ctx, data,
-							      walk.dst.virt.addr,
-							      walk.src.virt.addr,
-							      walk.nbytes);
-		} else if (enc) {
-			aesni_gcm_enc_update(aes_ctx, data, walk.dst.virt.addr,
-					     walk.src.virt.addr, walk.nbytes);
-		} else {
-			aesni_gcm_dec_update(aes_ctx, data, walk.dst.virt.addr,
-					     walk.src.virt.addr, walk.nbytes);
-		}
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-
-	if (err)
-		return err;
-
-	kernel_fpu_begin();
-	if (static_branch_likely(&gcm_use_avx2) && do_avx2)
-		aesni_gcm_finalize_avx_gen4(aes_ctx, data, auth_tag,
-					    auth_tag_len);
-	else if (static_branch_likely(&gcm_use_avx) && do_avx)
-		aesni_gcm_finalize_avx_gen2(aes_ctx, data, auth_tag,
-					    auth_tag_len);
-	else
-		aesni_gcm_finalize(aes_ctx, data, auth_tag, auth_tag_len);
-	kernel_fpu_end();
-
-	return 0;
-}
-
-static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
-			  u8 *hash_subkey, u8 *iv, void *aes_ctx)
-{
-	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
-	unsigned long auth_tag_len = crypto_aead_authsize(tfm);
-	u8 auth_tag[16];
-	int err;
-
-	err = gcmaes_crypt_by_sg(true, req, assoclen, hash_subkey, iv, aes_ctx,
-				 auth_tag, auth_tag_len);
-	if (err)
-		return err;
-
-	scatterwalk_map_and_copy(auth_tag, req->dst,
-				 req->assoclen + req->cryptlen,
-				 auth_tag_len, 1);
-	return 0;
-}
-
-static int gcmaes_decrypt(struct aead_request *req, unsigned int assoclen,
-			  u8 *hash_subkey, u8 *iv, void *aes_ctx)
-{
-	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
-	unsigned long auth_tag_len = crypto_aead_authsize(tfm);
-	u8 auth_tag_msg[16];
-	u8 auth_tag[16];
-	int err;
-
-	err = gcmaes_crypt_by_sg(false, req, assoclen, hash_subkey, iv, aes_ctx,
-				 auth_tag, auth_tag_len);
-	if (err)
-		return err;
-
-	/* Copy out original auth_tag */
-	scatterwalk_map_and_copy(auth_tag_msg, req->src,
-				 req->assoclen + req->cryptlen - auth_tag_len,
-				 auth_tag_len, 0);
-
-	/* Compare generated tag with passed in tag. */
-	if (crypto_memneq(auth_tag_msg, auth_tag, auth_tag_len)) {
-		memzero_explicit(auth_tag, sizeof(auth_tag));
-		return -EBADMSG;
-	}
-	return 0;
-}
-
-static int helper_rfc4106_encrypt(struct aead_request *req)
-{
-	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
-	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
-	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
-	unsigned int i;
-	__be32 counter = cpu_to_be32(1);
-
-	/* Assuming we are supporting rfc4106 64-bit extended */
-	/* sequence numbers We need to have the AAD length equal */
-	/* to 16 or 20 bytes */
-	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
-		return -EINVAL;
-
-	/* IV below built */
-	for (i = 0; i < 4; i++)
-		*(iv+i) = ctx->nonce[i];
-	for (i = 0; i < 8; i++)
-		*(iv+4+i) = req->iv[i];
-	*((__be32 *)(iv+12)) = counter;
-
-	return gcmaes_encrypt(req, req->assoclen - 8, ctx->hash_subkey, iv,
-			      aes_ctx);
-}
-
-static int helper_rfc4106_decrypt(struct aead_request *req)
-{
-	__be32 counter = cpu_to_be32(1);
-	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
-	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
-	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
-	unsigned int i;
-
-	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
-		return -EINVAL;
-
-	/* Assuming we are supporting rfc4106 64-bit extended */
-	/* sequence numbers We need to have the AAD length */
-	/* equal to 16 or 20 bytes */
-
-	/* IV below built */
-	for (i = 0; i < 4; i++)
-		*(iv+i) = ctx->nonce[i];
-	for (i = 0; i < 8; i++)
-		*(iv+4+i) = req->iv[i];
-	*((__be32 *)(iv+12)) = counter;
-
-	return gcmaes_decrypt(req, req->assoclen - 8, ctx->hash_subkey, iv,
-			      aes_ctx);
-}
 #endif
 
 static int xts_setkey_aesni(struct crypto_skcipher *tfm, const u8 *key,
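For readers following the removed helper_rfc4106_encrypt() and
helper_rfc4106_decrypt() above, this is a minimal userspace sketch in C of
the two things they did before calling into the GCM code: build the
16-byte initial counter block from the 4-byte nonce, the 8-byte
per-request IV and a big-endian counter of 1, and derive the real AAD
length by subtracting the 8 IV bytes from assoclen. The function names
rfc4106_build_iv() and rfc4106_aad_len() are hypothetical and exist only
for this illustration.

```c
#include <stdint.h>
#include <string.h>

/*
 * Lay out the initial counter block the way the removed helpers did:
 * bytes 0..3 = nonce, bytes 4..11 = per-request IV, bytes 12..15 = 1 as a
 * big-endian 32-bit integer.
 */
static void rfc4106_build_iv(const uint8_t nonce[4], const uint8_t req_iv[8],
			     uint8_t out[16])
{
	memcpy(out, nonce, 4);
	memcpy(out + 4, req_iv, 8);
	out[12] = 0;
	out[13] = 0;
	out[14] = 0;
	out[15] = 1;
}

/*
 * The removed helpers accepted only assoclen values of 16 or 20 and passed
 * assoclen - 8 on as the AAD length.  Returns -1 for any other assoclen
 * (the kernel code returned -EINVAL).
 */
static int rfc4106_aad_len(unsigned int assoclen)
{
	if (assoclen != 16 && assoclen != 20)
		return -1;
	return (int)assoclen - 8;
}
```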
@@ -1216,6 +831,7 @@ DEFINE_XTS_ALG(aesni_avx, "xts-aes-aesni-avx", 500);
 DEFINE_XTS_ALG(vaes_avx2, "xts-aes-vaes-avx2", 600);
 DEFINE_XTS_ALG(vaes_avx10_256, "xts-aes-vaes-avx10_256", 700);
 DEFINE_XTS_ALG(vaes_avx10_512, "xts-aes-vaes-avx10_512", 800);
+#endif
 
 /* The common part of the x86_64 AES-GCM key struct */
 struct aes_gcm_key {
@@ -1226,6 +842,40 @@ struct aes_gcm_key {
 	u32 rfc4106_nonce;
 };
 
+/* Key struct used by the AES-NI implementations of AES-GCM */
+struct aes_gcm_key_aesni {
+	/*
+	 * Common part of the key. The assembly code requires 16-byte alignment
+	 * for the round keys; we get this by them being located at the start of
+	 * the struct and the whole struct being 16-byte aligned.
+	 */
+	struct aes_gcm_key base;
+
+	/*
+	 * Powers of the hash key H^8 through H^1. These are 128-bit values.
+	 * They all have an extra factor of x^-1 and are byte-reversed. 16-byte
+	 * alignment is required by the assembly code.
+	 */
+	u64 h_powers[8][2] __aligned(16);
+
+	/*
+	 * h_powers_xored[i] contains the two 64-bit halves of h_powers[i] XOR'd
+	 * together. It's used for Karatsuba multiplication. 16-byte alignment
+	 * is required by the assembly code.
+	 */
+	u64 h_powers_xored[8] __aligned(16);
+
+	/*
+	 * H^1 times x^64 (and also the usual extra factor of x^-1). 16-byte
+	 * alignment is required by the assembly code.
+	 */
+	u64 h_times_x64[2] __aligned(16);
+};
+#define AES_GCM_KEY_AESNI(key)	\
+	container_of((key), struct aes_gcm_key_aesni, base)
+#define AES_GCM_KEY_AESNI_SIZE	\
+	(sizeof(struct aes_gcm_key_aesni) + (15 & ~(CRYPTO_MINALIGN - 1)))
+
 /* Key struct used by the VAES + AVX10 implementations of AES-GCM */
 struct aes_gcm_key_avx10 {
 	/*
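The layout of struct aes_gcm_key_aesni above can be checked outside the
kernel with a small C11 mirror. This is only a sketch: the mock_* types
are hypothetical stand-ins, the sizes of the AES context fields (60 u32
round-key words each for encryption and decryption, followed by the key
length) are my assumption about struct crypto_aes_ctx, and
fill_h_powers_xored() merely restates the comment on h_powers_xored in C.
In the patch itself the precomputation is done by the assembly code. The
offsets asserted here are the ones that gcm_setkey() checks with
BUILD_BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of the kernel's struct crypto_aes_ctx (484 bytes). */
struct mock_aes_ctx {
	uint32_t key_enc[60];
	uint32_t key_dec[60];
	uint32_t key_length;
};

/* Mirror of the common struct aes_gcm_key. */
struct mock_gcm_key {
	struct mock_aes_ctx aes_key;
	uint32_t rfc4106_nonce;
};

/* Mirror of struct aes_gcm_key_aesni, with _Alignas for __aligned(16). */
struct mock_gcm_key_aesni {
	struct mock_gcm_key base;
	_Alignas(16) uint64_t h_powers[8][2];
	_Alignas(16) uint64_t h_powers_xored[8];
	_Alignas(16) uint64_t h_times_x64[2];
};

/* The same offsets that the assembly code is said to assume. */
_Static_assert(offsetof(struct mock_gcm_key_aesni, base.aes_key.key_enc) == 0,
	       "round keys must start the struct");
_Static_assert(offsetof(struct mock_gcm_key_aesni, base.aes_key.key_length) == 480,
	       "key_length offset");
_Static_assert(offsetof(struct mock_gcm_key_aesni, h_powers) == 496,
	       "h_powers offset");
_Static_assert(offsetof(struct mock_gcm_key_aesni, h_powers_xored) == 624,
	       "h_powers_xored offset");
_Static_assert(offsetof(struct mock_gcm_key_aesni, h_times_x64) == 688,
	       "h_times_x64 offset");

/* h_powers_xored[i] is the XOR of the two 64-bit halves of h_powers[i]. */
static void fill_h_powers_xored(struct mock_gcm_key_aesni *key)
{
	for (int i = 0; i < 8; i++)
		key->h_powers_xored[i] = key->h_powers[i][0] ^
					 key->h_powers[i][1];
}
```

Keeping the XORed halves in the key means the Karatsuba middle term needs
only one XOR of the data halves per block at run time.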
@@ -1261,14 +911,32 @@ struct aes_gcm_key_avx10 {
  */
 #define FLAG_RFC4106	BIT(0)
 #define FLAG_ENC	BIT(1)
-#define FLAG_AVX10_512	BIT(2)
+#define FLAG_AVX	BIT(2)
+#if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
+#  define FLAG_AVX10_256	BIT(3)
+#  define FLAG_AVX10_512	BIT(4)
+#else
+   /*
+    * This should cause all calls to the AVX10 assembly functions to be
+    * optimized out, avoiding the need to ifdef each call individually.
+    */
+#  define FLAG_AVX10_256	0
+#  define FLAG_AVX10_512	0
+#endif
 
 static inline struct aes_gcm_key *
 aes_gcm_key_get(struct crypto_aead *tfm, int flags)
 {
-	return PTR_ALIGN(crypto_aead_ctx(tfm), 64);
+	if (flags & (FLAG_AVX10_256 | FLAG_AVX10_512))
+		return PTR_ALIGN(crypto_aead_ctx(tfm), 64);
+	else
+		return PTR_ALIGN(crypto_aead_ctx(tfm), 16);
 }
 
+asmlinkage void
+aes_gcm_precompute_aesni(struct aes_gcm_key_aesni *key);
+asmlinkage void
+aes_gcm_precompute_aesni_avx(struct aes_gcm_key_aesni *key);
 asmlinkage void
 aes_gcm_precompute_vaes_avx10_256(struct aes_gcm_key_avx10 *key);
 asmlinkage void
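The alignment arithmetic in AES_GCM_KEY_AESNI_SIZE and aes_gcm_key_get()
can be tried out in userspace with the sketch below. It is illustrative
only: align_up() is a hypothetical stand-in for the kernel's PTR_ALIGN(),
extra_for_alignment() generalizes the "(15 & ~(CRYPTO_MINALIGN - 1))"
term, and the CRYPTO_MINALIGN value of 8 used in the usage note is an
assumption, not something this patch states. The flag values match the
case where the AVX10 code is built in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits as defined in the hunk above (AVX10 support compiled in). */
enum {
	FLAG_AVX	= 1 << 2,
	FLAG_AVX10_256	= 1 << 3,
	FLAG_AVX10_512	= 1 << 4,
};

/* Round addr up to a power-of-two alignment, like PTR_ALIGN(). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Worst-case number of padding bytes needed to reach `want`-byte alignment
 * inside a buffer that is only guaranteed to be `minalign`-aligned.  Both
 * must be powers of two.  With want == 16 this is the extra term in
 * AES_GCM_KEY_AESNI_SIZE.
 */
static size_t extra_for_alignment(size_t want, size_t minalign)
{
	return (want - 1) & ~(minalign - 1);
}

/* Mirror of aes_gcm_key_get(): 64-byte alignment for AVX10, else 16. */
static uintptr_t key_addr(uintptr_t ctx, int flags)
{
	if (flags & (FLAG_AVX10_256 | FLAG_AVX10_512))
		return align_up(ctx, 64);
	return align_up(ctx, 16);
}
```

For example, with an assumed minimum alignment of 8 bytes,
extra_for_alignment(16, 8) is 8, so the context is over-allocated by 8
bytes and the key pointer is moved forward by at most that much. When the
minimum alignment is already 16, no extra space is needed.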
@@ -1283,13 +951,25 @@ static void aes_gcm_precompute(struct aes_gcm_key *key, int flags)
 	 * straightforward to provide a 512-bit one because of how the assembly
 	 * code is structured, and it works nicely because the total size of the
 	 * key powers is a multiple of 512 bits. So we take advantage of that.
+	 *
+	 * A similar situation applies to the AES-NI implementations.
 	 */
 	if (flags & FLAG_AVX10_512)
 		aes_gcm_precompute_vaes_avx10_512(AES_GCM_KEY_AVX10(key));
-	else
+	else if (flags & FLAG_AVX10_256)
 		aes_gcm_precompute_vaes_avx10_256(AES_GCM_KEY_AVX10(key));
+	else if (flags & FLAG_AVX)
+		aes_gcm_precompute_aesni_avx(AES_GCM_KEY_AESNI(key));
+	else
+		aes_gcm_precompute_aesni(AES_GCM_KEY_AESNI(key));
 }
 
+asmlinkage void
+aes_gcm_aad_update_aesni(const struct aes_gcm_key_aesni *key,
+			 u8 ghash_acc[16], const u8 *aad, int aadlen);
+asmlinkage void
+aes_gcm_aad_update_aesni_avx(const struct aes_gcm_key_aesni *key,
+			     u8 ghash_acc[16], const u8 *aad, int aadlen);
 asmlinkage void
 aes_gcm_aad_update_vaes_avx10(const struct aes_gcm_key_avx10 *key,
 			      u8 ghash_acc[16], const u8 *aad, int aadlen);
@@ -1297,10 +977,25 @@ aes_gcm_aad_update_vaes_avx10(const struct aes_gcm_key_avx10 *key,
 static void aes_gcm_aad_update(const struct aes_gcm_key *key, u8 ghash_acc[16],
 			       const u8 *aad, int aadlen, int flags)
 {
-	aes_gcm_aad_update_vaes_avx10(AES_GCM_KEY_AVX10(key), ghash_acc,
-				      aad, aadlen);
+	if (flags & (FLAG_AVX10_256 | FLAG_AVX10_512))
+		aes_gcm_aad_update_vaes_avx10(AES_GCM_KEY_AVX10(key), ghash_acc,
+					      aad, aadlen);
+	else if (flags & FLAG_AVX)
+		aes_gcm_aad_update_aesni_avx(AES_GCM_KEY_AESNI(key), ghash_acc,
+					     aad, aadlen);
+	else
+		aes_gcm_aad_update_aesni(AES_GCM_KEY_AESNI(key), ghash_acc,
+					 aad, aadlen);
 }
 
+asmlinkage void
+aes_gcm_enc_update_aesni(const struct aes_gcm_key_aesni *key,
+			 const u32 le_ctr[4], u8 ghash_acc[16],
+			 const u8 *src, u8 *dst, int datalen);
+asmlinkage void
+aes_gcm_enc_update_aesni_avx(const struct aes_gcm_key_aesni *key,
+			     const u32 le_ctr[4], u8 ghash_acc[16],
+			     const u8 *src, u8 *dst, int datalen);
 asmlinkage void
 aes_gcm_enc_update_vaes_avx10_256(const struct aes_gcm_key_avx10 *key,
 				  const u32 le_ctr[4], u8 ghash_acc[16],
@@ -1311,6 +1006,14 @@ aes_gcm_enc_update_vaes_avx10_512(const struct aes_gcm_key_avx10 *key,
 				  const u8 *src, u8 *dst, int datalen);
 
 asmlinkage void
+aes_gcm_dec_update_aesni(const struct aes_gcm_key_aesni *key,
+			 const u32 le_ctr[4], u8 ghash_acc[16],
+			 const u8 *src, u8 *dst, int datalen);
+asmlinkage void
+aes_gcm_dec_update_aesni_avx(const struct aes_gcm_key_aesni *key,
+			     const u32 le_ctr[4], u8 ghash_acc[16],
+			     const u8 *src, u8 *dst, int datalen);
+asmlinkage void
 aes_gcm_dec_update_vaes_avx10_256(const struct aes_gcm_key_avx10 *key,
 				  const u32 le_ctr[4], u8 ghash_acc[16],
 				  const u8 *src, u8 *dst, int datalen);
@@ -1330,22 +1033,45 @@ aes_gcm_update(const struct aes_gcm_key *key,
 			aes_gcm_enc_update_vaes_avx10_512(AES_GCM_KEY_AVX10(key),
 							  le_ctr, ghash_acc,
 							  src, dst, datalen);
-		else
+		else if (flags & FLAG_AVX10_256)
 			aes_gcm_enc_update_vaes_avx10_256(AES_GCM_KEY_AVX10(key),
 							  le_ctr, ghash_acc,
 							  src, dst, datalen);
+		else if (flags & FLAG_AVX)
+			aes_gcm_enc_update_aesni_avx(AES_GCM_KEY_AESNI(key),
+						     le_ctr, ghash_acc,
+						     src, dst, datalen);
+		else
+			aes_gcm_enc_update_aesni(AES_GCM_KEY_AESNI(key), le_ctr,
+						 ghash_acc, src, dst, datalen);
 	} else {
 		if (flags & FLAG_AVX10_512)
 			aes_gcm_dec_update_vaes_avx10_512(AES_GCM_KEY_AVX10(key),
 							  le_ctr, ghash_acc,
 							  src, dst, datalen);
-		else
+		else if (flags & FLAG_AVX10_256)
 			aes_gcm_dec_update_vaes_avx10_256(AES_GCM_KEY_AVX10(key),
 							  le_ctr, ghash_acc,
 							  src, dst, datalen);
+		else if (flags & FLAG_AVX)
+			aes_gcm_dec_update_aesni_avx(AES_GCM_KEY_AESNI(key),
+						     le_ctr, ghash_acc,
+						     src, dst, datalen);
+		else
+			aes_gcm_dec_update_aesni(AES_GCM_KEY_AESNI(key),
+						 le_ctr, ghash_acc,
+						 src, dst, datalen);
 	}
 }
 
+asmlinkage void
+aes_gcm_enc_final_aesni(const struct aes_gcm_key_aesni *key,
+			const u32 le_ctr[4], u8 ghash_acc[16],
+			u64 total_aadlen, u64 total_datalen);
+asmlinkage void
+aes_gcm_enc_final_aesni_avx(const struct aes_gcm_key_aesni *key,
+			    const u32 le_ctr[4], u8 ghash_acc[16],
+			    u64 total_aadlen, u64 total_datalen);
 asmlinkage void
 aes_gcm_enc_final_vaes_avx10(const struct aes_gcm_key_avx10 *key,
 			     const u32 le_ctr[4], u8 ghash_acc[16],
@@ -1357,11 +1083,30 @@ aes_gcm_enc_final(const struct aes_gcm_key *key,
 		  const u32 le_ctr[4], u8 ghash_acc[16],
 		  u64 total_aadlen, u64 total_datalen, int flags)
 {
-	aes_gcm_enc_final_vaes_avx10(AES_GCM_KEY_AVX10(key),
-				     le_ctr, ghash_acc,
-				     total_aadlen, total_datalen);
+	if (flags & (FLAG_AVX10_256 | FLAG_AVX10_512))
+		aes_gcm_enc_final_vaes_avx10(AES_GCM_KEY_AVX10(key),
+					     le_ctr, ghash_acc,
+					     total_aadlen, total_datalen);
+	else if (flags & FLAG_AVX)
+		aes_gcm_enc_final_aesni_avx(AES_GCM_KEY_AESNI(key),
+					    le_ctr, ghash_acc,
+					    total_aadlen, total_datalen);
+	else
+		aes_gcm_enc_final_aesni(AES_GCM_KEY_AESNI(key),
+					le_ctr, ghash_acc,
+					total_aadlen, total_datalen);
 }
 
+asmlinkage bool __must_check
+aes_gcm_dec_final_aesni(const struct aes_gcm_key_aesni *key,
+			const u32 le_ctr[4], const u8 ghash_acc[16],
+			u64 total_aadlen, u64 total_datalen,
+			const u8 tag[16], int taglen);
+asmlinkage bool __must_check
+aes_gcm_dec_final_aesni_avx(const struct aes_gcm_key_aesni *key,
+			    const u32 le_ctr[4], const u8 ghash_acc[16],
+			    u64 total_aadlen, u64 total_datalen,
+			    const u8 tag[16], int taglen);
 asmlinkage bool __must_check
 aes_gcm_dec_final_vaes_avx10(const struct aes_gcm_key_avx10 *key,
 			     const u32 le_ctr[4], const u8 ghash_acc[16],
@@ -1374,10 +1119,59 @@ aes_gcm_dec_final(const struct aes_gcm_key *key, const u32 le_ctr[4],
 		  u8 ghash_acc[16], u64 total_aadlen, u64 total_datalen,
 		  u8 tag[16], int taglen, int flags)
 {
-	return aes_gcm_dec_final_vaes_avx10(AES_GCM_KEY_AVX10(key),
-					    le_ctr, ghash_acc,
-					    total_aadlen, total_datalen,
-					    tag, taglen);
+	if (flags & (FLAG_AVX10_256 | FLAG_AVX10_512))
+		return aes_gcm_dec_final_vaes_avx10(AES_GCM_KEY_AVX10(key),
+						    le_ctr, ghash_acc,
+						    total_aadlen, total_datalen,
+						    tag, taglen);
+	else if (flags & FLAG_AVX)
+		return aes_gcm_dec_final_aesni_avx(AES_GCM_KEY_AESNI(key),
+						   le_ctr, ghash_acc,
+						   total_aadlen, total_datalen,
+						   tag, taglen);
+	else
+		return aes_gcm_dec_final_aesni(AES_GCM_KEY_AESNI(key),
+					       le_ctr, ghash_acc,
+					       total_aadlen, total_datalen,
+					       tag, taglen);
+}
+
+/*
+ * This is the Integrity Check Value (aka the authentication tag) length and can
+ * be 8, 12 or 16 bytes long.
+ */
+static int common_rfc4106_set_authsize(struct crypto_aead *aead,
+				       unsigned int authsize)
+{
+	switch (authsize) {
+	case 8:
+	case 12:
+	case 16:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int generic_gcmaes_set_authsize(struct crypto_aead *tfm,
+				       unsigned int authsize)
+{
+	switch (authsize) {
+	case 4:
+	case 8:
+	case 12:
+	case 13:
+	case 14:
+	case 15:
+	case 16:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 /*
@@ -1407,6 +1201,11 @@ static int gcm_setkey(struct crypto_aead *tfm, const u8 *raw_key,
 	}
 
 	/* The assembly code assumes the following offsets. */
+	BUILD_BUG_ON(offsetof(struct aes_gcm_key_aesni, base.aes_key.key_enc) != 0);
+	BUILD_BUG_ON(offsetof(struct aes_gcm_key_aesni, base.aes_key.key_length) != 480);
+	BUILD_BUG_ON(offsetof(struct aes_gcm_key_aesni, h_powers) != 496);
+	BUILD_BUG_ON(offsetof(struct aes_gcm_key_aesni, h_powers_xored) != 624);
+	BUILD_BUG_ON(offsetof(struct aes_gcm_key_aesni, h_times_x64) != 688);
 	BUILD_BUG_ON(offsetof(struct aes_gcm_key_avx10, base.aes_key.key_enc) != 0);
 	BUILD_BUG_ON(offsetof(struct aes_gcm_key_avx10, base.aes_key.key_length) != 480);
 	BUILD_BUG_ON(offsetof(struct aes_gcm_key_avx10, h_powers) != 512);
@@ -1424,7 +1223,9 @@ static int gcm_setkey(struct crypto_aead *tfm, const u8 *raw_key,
 		static const u8 x_to_the_minus1[16] __aligned(__alignof__(be128)) = {
 			[0] = 0xc2, [15] = 1
 		};
-		struct aes_gcm_key_avx10 *k = AES_GCM_KEY_AVX10(key);
+		static const u8 x_to_the_63[16] __aligned(__alignof__(be128)) = {
+			[7] = 1,
+		};
 		be128 h1 = {};
 		be128 h;
 		int i;
@@ -1441,12 +1242,29 @@ static int gcm_setkey(struct crypto_aead *tfm, const u8 *raw_key,
 		gf128mul_lle(&h, (const be128 *)x_to_the_minus1);
 
 		/* Compute the needed key powers */
-		for (i = ARRAY_SIZE(k->h_powers) - 1; i >= 0; i--) {
-			k->h_powers[i][0] = be64_to_cpu(h.b);
-			k->h_powers[i][1] = be64_to_cpu(h.a);
-			gf128mul_lle(&h, &h1);
+		if (flags & (FLAG_AVX10_256 | FLAG_AVX10_512)) {
+			struct aes_gcm_key_avx10 *k = AES_GCM_KEY_AVX10(key);
+
+			for (i = ARRAY_SIZE(k->h_powers) - 1; i >= 0; i--) {
+				k->h_powers[i][0] = be64_to_cpu(h.b);
+				k->h_powers[i][1] = be64_to_cpu(h.a);
+				gf128mul_lle(&h, &h1);
+			}
+			memset(k->padding, 0, sizeof(k->padding));
+		} else {
+			struct aes_gcm_key_aesni *k = AES_GCM_KEY_AESNI(key);
+
+			for (i = ARRAY_SIZE(k->h_powers) - 1; i >= 0; i--) {
+				k->h_powers[i][0] = be64_to_cpu(h.b);
+				k->h_powers[i][1] = be64_to_cpu(h.a);
+				k->h_powers_xored[i] = k->h_powers[i][0] ^
+						       k->h_powers[i][1];
+				gf128mul_lle(&h, &h1);
+			}
+			gf128mul_lle(&h1, (const be128 *)x_to_the_63);
+			k->h_times_x64[0] = be64_to_cpu(h1.b);
+			k->h_times_x64[1] = be64_to_cpu(h1.a);
 		}
-		memset(k->padding, 0, sizeof(k->padding));
 	}
 	return 0;
 }
@@ -1630,7 +1448,7 @@ out:
 			ctxsize, priority) \
 \
 static int gcm_setkey_##suffix(struct crypto_aead *tfm, const u8 *raw_key, \
-			       unsigned int keylen) \
+			       unsigned int keylen) \
 { \
 	return gcm_setkey(tfm, raw_key, keylen, (flags)); \
 } \
@@ -1646,7 +1464,7 @@ static int gcm_decrypt_##suffix(struct aead_request *req) \
 } \
 \
 static int rfc4106_setkey_##suffix(struct crypto_aead *tfm, const u8 *raw_key, \
-				   unsigned int keylen) \
+				   unsigned int keylen) \
 { \
 	return gcm_setkey(tfm, raw_key, keylen, (flags) | FLAG_RFC4106); \
 } \
@@ -1699,8 +1517,19 @@ static struct aead_alg aes_gcm_algs_##suffix[] = { { \
 \
 static struct simd_aead_alg *aes_gcm_simdalgs_##suffix[2]
 
+/* aes_gcm_algs_aesni */
+DEFINE_GCM_ALGS(aesni, /* no flags */ 0,
+		"generic-gcm-aesni", "rfc4106-gcm-aesni",
+		AES_GCM_KEY_AESNI_SIZE, 400);
+
+/* aes_gcm_algs_aesni_avx */
+DEFINE_GCM_ALGS(aesni_avx, FLAG_AVX,
+		"generic-gcm-aesni-avx", "rfc4106-gcm-aesni-avx",
+		AES_GCM_KEY_AESNI_SIZE, 500);
+
+#if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 /* aes_gcm_algs_vaes_avx10_256 */
-DEFINE_GCM_ALGS(vaes_avx10_256, 0,
+DEFINE_GCM_ALGS(vaes_avx10_256, FLAG_AVX10_256,
 		"generic-gcm-vaes-avx10_256", "rfc4106-gcm-vaes-avx10_256",
 		AES_GCM_KEY_AVX10_SIZE, 700);
 
@@ -1740,6 +1569,11 @@ static int __init register_avx_algs(void)
 					     &aes_xts_simdalg_aesni_avx);
 	if (err)
 		return err;
+	err = simd_register_aeads_compat(aes_gcm_algs_aesni_avx,
+					 ARRAY_SIZE(aes_gcm_algs_aesni_avx),
+					 aes_gcm_simdalgs_aesni_avx);
+	if (err)
+		return err;
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 	if (!boot_cpu_has(X86_FEATURE_AVX2) ||
 	    !boot_cpu_has(X86_FEATURE_VAES) ||
@@ -1795,6 +1629,10 @@ static void unregister_avx_algs(void)
 	if (aes_xts_simdalg_aesni_avx)
 		simd_unregister_skciphers(&aes_xts_alg_aesni_avx, 1,
 					  &aes_xts_simdalg_aesni_avx);
+	if (aes_gcm_simdalgs_aesni_avx[0])
+		simd_unregister_aeads(aes_gcm_algs_aesni_avx,
+				      ARRAY_SIZE(aes_gcm_algs_aesni_avx),
+				      aes_gcm_simdalgs_aesni_avx);
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 	if (aes_xts_simdalg_vaes_avx2)
 		simd_unregister_skciphers(&aes_xts_alg_vaes_avx2, 1,
@@ -1816,6 +1654,9 @@ static void unregister_avx_algs(void)
 #endif
 }
 #else /* CONFIG_X86_64 */
+static struct aead_alg aes_gcm_algs_aesni[0];
+static struct simd_aead_alg *aes_gcm_simdalgs_aesni[0];
+
 static int __init register_avx_algs(void)
 {
 	return 0;
@@ -1826,90 +1667,6 @@ static void unregister_avx_algs(void)
 }
 #endif /* !CONFIG_X86_64 */
 
-#ifdef CONFIG_X86_64
-static int generic_gcmaes_set_key(struct crypto_aead *aead, const u8 *key,
-				  unsigned int key_len)
-{
-	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(aead);
-
-	return aes_set_key_common(&ctx->aes_key_expanded, key, key_len) ?:
-	       aes_gcm_derive_hash_subkey(&ctx->aes_key_expanded,
-					  ctx->hash_subkey);
-}
-
-static int generic_gcmaes_encrypt(struct aead_request *req)
-{
-	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
-	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
-	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
-	__be32 counter = cpu_to_be32(1);
-
-	memcpy(iv, req->iv, 12);
-	*((__be32 *)(iv+12)) = counter;
-
-	return gcmaes_encrypt(req, req->assoclen, ctx->hash_subkey, iv,
-			      aes_ctx);
-}
-
-static int generic_gcmaes_decrypt(struct aead_request *req)
-{
-	__be32 counter = cpu_to_be32(1);
-	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
-	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
-	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
-
-	memcpy(iv, req->iv, 12);
-	*((__be32 *)(iv+12)) = counter;
-
-	return gcmaes_decrypt(req, req->assoclen, ctx->hash_subkey, iv,
-			      aes_ctx);
-}
-
-static struct aead_alg aesni_aeads[] = { {
-	.setkey = common_rfc4106_set_key,
-	.setauthsize = common_rfc4106_set_authsize,
-	.encrypt = helper_rfc4106_encrypt,
-	.decrypt = helper_rfc4106_decrypt,
-	.ivsize = GCM_RFC4106_IV_SIZE,
-	.maxauthsize = 16,
-	.base = {
-		.cra_name = "__rfc4106(gcm(aes))",
-		.cra_driver_name = "__rfc4106-gcm-aesni",
-		.cra_priority = 400,
-		.cra_flags = CRYPTO_ALG_INTERNAL,
-		.cra_blocksize = 1,
-		.cra_ctxsize = sizeof(struct aesni_rfc4106_gcm_ctx),
-		.cra_alignmask = 0,
-		.cra_module = THIS_MODULE,
-	},
-}, {
-	.setkey = generic_gcmaes_set_key,
-	.setauthsize = generic_gcmaes_set_authsize,
-	.encrypt = generic_gcmaes_encrypt,
-	.decrypt = generic_gcmaes_decrypt,
-	.ivsize = GCM_AES_IV_SIZE,
-	.maxauthsize = 16,
-	.base = {
-		.cra_name = "__gcm(aes)",
-		.cra_driver_name = "__generic-gcm-aesni",
-		.cra_priority = 400,
-		.cra_flags = CRYPTO_ALG_INTERNAL,
-		.cra_blocksize = 1,
-		.cra_ctxsize = sizeof(struct generic_gcmaes_ctx),
-		.cra_alignmask = 0,
-		.cra_module = THIS_MODULE,
-	},
-} };
-#else
-static struct aead_alg aesni_aeads[0];
-#endif
-
-static struct simd_aead_alg *aesni_simd_aeads[ARRAY_SIZE(aesni_aeads)];
-
 static const struct x86_cpu_id aesni_cpu_id[] = {
 	X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
 	{}
@@ -1923,17 +1680,6 @@ static int __init aesni_init(void)
 	if (!x86_match_cpu(aesni_cpu_id))
 		return -ENODEV;
 #ifdef CONFIG_X86_64
-	if (boot_cpu_has(X86_FEATURE_AVX2)) {
-		pr_info("AVX2 version of gcm_enc/dec engaged.\n");
-		static_branch_enable(&gcm_use_avx);
-		static_branch_enable(&gcm_use_avx2);
-	} else
-	if (boot_cpu_has(X86_FEATURE_AVX)) {
-		pr_info("AVX version of gcm_enc/dec engaged.\n");
-		static_branch_enable(&gcm_use_avx);
-	} else {
-		pr_info("SSE version of gcm_enc/dec engaged.\n");
-	}
 	if (boot_cpu_has(X86_FEATURE_AVX)) {
 		/* optimize performance of ctr mode encryption transform */
 		static_call_update(aesni_ctr_enc_tfm, aesni_ctr_enc_avx_tfm);
@@ -1951,8 +1697,9 @@ static int __init aesni_init(void)
 	if (err)
 		goto unregister_cipher;
 
-	err = simd_register_aeads_compat(aesni_aeads, ARRAY_SIZE(aesni_aeads),
-					 aesni_simd_aeads);
+	err = simd_register_aeads_compat(aes_gcm_algs_aesni,
+					 ARRAY_SIZE(aes_gcm_algs_aesni),
+					 aes_gcm_simdalgs_aesni);
 	if (err)
 		goto unregister_skciphers;
 
@@ -1977,9 +1724,9 @@ unregister_avx:
 	simd_unregister_skciphers(&aesni_xctr, 1, &aesni_simd_xctr);
 unregister_aeads:
 #endif /* CONFIG_X86_64 */
-	simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads),
-			      aesni_simd_aeads);
-
+	simd_unregister_aeads(aes_gcm_algs_aesni,
+			      ARRAY_SIZE(aes_gcm_algs_aesni),
+			      aes_gcm_simdalgs_aesni);
 unregister_skciphers:
 	simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers),
 				  aesni_simd_skciphers);
@@ -1990,8 +1737,9 @@ unregister_cipher:
 
 static void __exit aesni_exit(void)
 {
-	simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads),
-			      aesni_simd_aeads);
+	simd_unregister_aeads(aes_gcm_algs_aesni,
+			      ARRAY_SIZE(aes_gcm_algs_aesni),
+			      aes_gcm_simdalgs_aesni);
 	simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers),
 				  aesni_simd_skciphers);
 	crypto_unregister_alg(&aesni_cipher_alg);
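
Note on h_powers_xored: in the AES-NI branch of gcm_setkey(), each key power gets an extra value, h_powers_xored[i] = h_powers[i][0] ^ h_powers[i][1]. A precomputed XOR of an operand's two halves is what Karatsuba-style carryless multiplication needs for its middle term, so this is presumably what lets the AES-NI assembly use three multiplications per block instead of four. The following is a minimal userspace sketch of that identity only, not the kernel's assembly: it uses 16-bit halves of 32-bit polynomials so the product fits in a uint64_t, and clmul32() and clmul32_karatsuba() are hypothetical helper names introduced for illustration.

```c
#include <stdint.h>

/* Schoolbook carryless multiply of two 32-bit polynomials over GF(2). */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 32; i++)
		if ((b >> i) & 1)
			r ^= (uint64_t)a << i;
	return r;
}

/*
 * Same product via Karatsuba: three half-width multiplies instead of four.
 * a_xored is the caller's precomputed (a_lo ^ a_hi); it stands in for the
 * role h_powers_xored appears to play (an assumption, not kernel code).
 */
static uint64_t clmul32_karatsuba(uint32_t a, uint16_t a_xored, uint32_t b)
{
	uint16_t a_lo = a & 0xffff, a_hi = a >> 16;
	uint16_t b_lo = b & 0xffff, b_hi = b >> 16;
	uint64_t lo = clmul32(a_lo, b_lo);
	uint64_t hi = clmul32(a_hi, b_hi);
	/* (a_lo ^ a_hi) * (b_lo ^ b_hi) ^ lo ^ hi == a_lo*b_hi ^ a_hi*b_lo */
	uint64_t mid = clmul32(a_xored, (uint16_t)(b_lo ^ b_hi)) ^ lo ^ hi;

	return (hi << 32) ^ (mid << 16) ^ lo;
}
```

With the key as operand a, a_xored is computed once at setkey time and reused for every block, which mirrors why the value would be stored in the key struct rather than recomputed in the main loop.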