lowram: Stream matrix A element-by-element to reduce memory#1019

Merged: mkannwischer merged 2 commits into main from lowram-stream-a on Apr 20, 2026

@mkannwischer
Contributor

Replace the row-level matrix buffer (mld_polyvecl) with a single-poly
buffer in REDUCE_RAM mode. In the lazy path, matrix elements A[k][l]
are sampled on demand one at a time, and the matrix-vector product
accumulates element-by-element instead of row-by-row.
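
The difference between the two accumulation orders can be illustrated with a toy sketch. Everything below is hypothetical and simplified: the `toy_*` names, the tiny dimensions, and the stand-in sampler are assumptions for illustration (the real code derives each entry from rho with SHAKE128-based rejection sampling and uses Montgomery arithmetic). It only shows that streaming one entry at a time through a single buffer yields the same product as expanding the full matrix first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_N 4       /* coefficients per polynomial (256 in ML-DSA) */
#define TOY_K 3       /* matrix rows */
#define TOY_L 2       /* matrix columns */
#define TOY_Q 8380417 /* ML-DSA modulus q */

typedef struct
{
  int32_t coeffs[TOY_N];
} toy_poly;

/* Hypothetical stand-in for sampling A[k][l] from the seed rho.
 * It is deterministic in (rho, k, l), which is all this sketch needs. */
static void toy_sample_entry(toy_poly *a, uint32_t rho, unsigned k, unsigned l)
{
  unsigned i;
  assert(k < TOY_K && l < TOY_L);
  for (i = 0; i < TOY_N; i++)
  {
    uint32_t x = (rho + 1000u * k + 100u * l + i) * 2654435761u;
    a->coeffs[i] = (int32_t)(x % (uint32_t)TOY_Q);
  }
}

/* acc += a * b, coefficient-wise mod q. Inputs are assumed to lie in [0, q). */
static void toy_pointwise_acc(toy_poly *acc, const toy_poly *a,
                              const toy_poly *b)
{
  unsigned i;
  for (i = 0; i < TOY_N; i++)
  {
    int64_t t = (int64_t)a->coeffs[i] * b->coeffs[i] + acc->coeffs[i];
    acc->coeffs[i] = (int32_t)(t % TOY_Q);
  }
}

/* Eager: expand the full K x L matrix first, then multiply row by row. */
void toy_matvec_eager(toy_poly w[TOY_K], uint32_t rho, const toy_poly v[TOY_L])
{
  toy_poly mat[TOY_K][TOY_L];
  unsigned k, l;
  for (k = 0; k < TOY_K; k++)
    for (l = 0; l < TOY_L; l++)
      toy_sample_entry(&mat[k][l], rho, k, l);
  memset(w, 0, TOY_K * sizeof(toy_poly));
  for (k = 0; k < TOY_K; k++)
    for (l = 0; l < TOY_L; l++)
      toy_pointwise_acc(&w[k], &mat[k][l], &v[l]);
}

/* Lazy: a single poly buffer; sample A[k][l] on demand and accumulate. */
void toy_matvec_lazy(toy_poly w[TOY_K], uint32_t rho, const toy_poly v[TOY_L])
{
  toy_poly buf;
  unsigned k, l;
  memset(w, 0, TOY_K * sizeof(toy_poly));
  for (k = 0; k < TOY_K; k++)
    for (l = 0; l < TOY_L; l++)
    {
      toy_sample_entry(&buf, rho, k, l);
      toy_pointwise_acc(&w[k], &buf, &v[l]);
    }
}

/* Returns 1 if both variants agree on a fixed toy input vector. */
int toy_variants_agree(uint32_t rho)
{
  toy_poly v[TOY_L], w_eager[TOY_K], w_lazy[TOY_K];
  unsigned l, i;
  for (l = 0; l < TOY_L; l++)
    for (i = 0; i < TOY_N; i++)
      v[l].coeffs[i] = (int32_t)(7u * l + i + 1u);
  toy_matvec_eager(w_eager, rho, v);
  toy_matvec_lazy(w_lazy, rho, v);
  return memcmp(w_eager, w_lazy, sizeof(w_eager)) == 0;
}
```

In this sketch the lazy variant keeps one `toy_poly` of matrix data live at a time, at the price of calling the sampler inside the inner loop.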

Restructure polymat into eager/lazy variants following the same pattern
as s1hat/s2hat/t0hat:

  • mld_polymat_eager: stores full K x L matrix
  • mld_polymat_lazy: stores rho + single poly_buffer + tmp
  • mld_polyvec_matrix_expand_eager/_lazy: separate implementations
  • mld_polyvec_matrix_pointwise_montgomery_eager/_lazy: separate
    implementations with CBMC contracts only on the eager variants
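
As a rough illustration of the memory effect of the two layouts, the sketch below models them with hypothetical `sketch_*` types for the ML-DSA-87 dimensions (K = 8, L = 7, 256 coefficients of 32 bits each, 32-byte rho). The field names follow the bullet list above, but the exact types, in particular that `tmp` is a full polynomial, are assumptions and not taken from the PR.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_N 256        /* coefficients per polynomial */
#define SKETCH_K 8          /* matrix rows in ML-DSA-87 */
#define SKETCH_L 7          /* matrix columns in ML-DSA-87 */
#define SKETCH_SEEDBYTES 32 /* length of the seed rho */

typedef struct
{
  int32_t coeffs[SKETCH_N];
} sketch_poly;

/* Eager layout: the full K x L matrix is stored. */
typedef struct
{
  sketch_poly mat[SKETCH_K][SKETCH_L];
} sketch_polymat_eager;

/* Lazy layout: seed plus a single-poly buffer and a scratch poly.
 * Assumption: tmp is modelled here as one polynomial. */
typedef struct
{
  uint8_t rho[SKETCH_SEEDBYTES];
  sketch_poly poly_buffer;
  sketch_poly tmp;
} sketch_polymat_lazy;

/* Size in bytes of the eager layout (56 KiB on common ABIs). */
size_t sketch_eager_bytes(void) { return sizeof(sketch_polymat_eager); }

/* Size in bytes of the lazy layout (about 2 KiB on common ABIs). */
size_t sketch_lazy_bytes(void) { return sizeof(sketch_polymat_lazy); }
```

Under these assumptions the eager structure takes 8 * 7 * 1024 = 57344 bytes, while the lazy one takes 32 + 2 * 1024 = 2080 bytes.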

Move all polymat-related code from polyvec.h/polyvec.c into
polyvec_lazy.h/polyvec_lazy.c.

@mkannwischer mkannwischer force-pushed the lowram-stream-a branch 2 times, most recently from 7013003 to 5ccd3c2 Compare April 5, 2026 07:10
@oqs-bot
Contributor

oqs-bot commented Apr 5, 2026

CBMC Results (ML-DSA-65)

⚠️ Attention Required

Proof Status Current Previous Change
keccakf1600x4_permute_native ⚠️ 25s 12s +108%
Full Results (185 proofs)
Proof Status Current Previous Change
**TOTAL** 2285s 2277s +0.4%
polyvecl_pointwise_acc_montgomery_c 265s 494s -46%
sign_verify_internal 253s 210s +20%
polyvec_matrix_expand_eager 192s - new
poly_pointwise_montgomery_c 171s 144s +19%
rej_uniform_native 150s 135s +11%
mld_ct_memcmp 79s 71s +11%
mld_invntt_layer 68s 64s +6%
mld_ntt_layer 58s 48s +21%
mld_attempt_signature_generation 54s 60s -10%
sign_signature_internal 30s 22s +36%
sign_keypair_internal 29s 30s -3%
polyvec_matrix_expand_eager_serial 27s - new
keccakf1600x4_permute_native ⚠️ 25s 12s +108%
rej_uniform 23s 22s +5%
sign_pk_from_sk 23s 17s +35%
fqmul 22s 19s +16%
poly_chknorm_c 19s 20s -5%
poly_uniform_eta_4x 18s 15s +20%
polyveck_pointwise_poly_montgomery 18s 8s +125%
polyveck_decompose 17s 13s +31%
rej_uniform_c 16s 14s +14%
polyt0_unpack 15s 13s +15%
poly_add 14s 12s +17%
poly_uniform_4x 14s 15s -7%
polyveck_power2round 14s 16s -12%
mld_ntt_butterfly_block 12s 10s +20%
polyveck_add 12s 12s +0%
keccak_absorb_once_x4 10s 11s -9%
mld_check_pct 10s 10s +0%
polyvec_matrix_pointwise_montgomery_eager 10s - new
polyveck_invntt_tomont 10s 5s +100%
polyveck_use_hint 10s 5s +100%
polyvecl_ntt 10s 9s +11%
keccakf1600_permute_native 9s 8s +12%
mld_sample_s1_s2 9s 9s +0%
polyveck_sub 9s 10s -10%
keccakf1600_permute 8s 6s +33%
mld_prepare_domain_separation_prefix 8s 4s +100%
poly_power2round 8s 7s +14%
polyeta_pack 8s 3s +167%
polyz_unpack_c 8s 5s +60%
sign 8s 9s -11%
unpack_sk 8s 9s -11%
keccak_absorb 7s 7s +0%
mld_compute_pack_z 7s 9s -22%
mld_sample_s1_s2_serial 7s 4s +75%
poly_decompose_c 7s 6s +17%
poly_invntt_tomont_c 7s 6s +17%
poly_uniform 7s 3s +133%
polyveck_reduce 7s 6s +17%
pointwise_acc_native_aarch64 6s 4s +50%
pointwise_acc_native_x86_64 6s 5s +20%
pointwise_native_x86_64 6s 4s +50%
poly_caddq_c 6s 5s +20%
poly_permute_bitrev_to_custom_optional_native 6s - new
polyeta_unpack 6s 6s +0%
polyveck_caddq 6s 7s -14%
polyveck_ntt 6s 7s -14%
polyvecl_pack_eta 6s 3s +100%
polyvecl_pointwise_acc_montgomery_native 6s 3s +100%
unpack_hints 6s 5s +20%
caddq 5s 2s +150%
keccak_init 5s 2s +150%
mld_h 5s 5s +0%
poly_caddq 5s 1s +400%
poly_uniform_eta 5s 2s +150%
poly_use_hint_c 5s 5s +0%
polyt0_pack 5s 5s +0%
polyveck_chknorm 5s 5s +0%
polyveck_shiftl 5s 6s -17%
polyvecl_uniform_gamma1_serial 5s 6s -17%
polyz_pack 5s 2s +150%
rej_eta 5s 5s +0%
rej_eta_native 5s 4s +25%
sign_signature 5s 5s +0%
sign_verify 5s 5s +0%
sign_verify_pre_hash_shake256 5s 4s +25%
unpack_sig 5s 3s +67%
intt_native_x86_64 4s 4s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 4s 2s +100%
keccak_squeezeblocks_x4 4s 6s -33%
keccakf1600x4_permute 4s 3s +33%
pack_sk_rho_key_tr_s2_t0 4s 4s +0%
pointwise_native_aarch64 4s 2s +100%
poly_caddq_native 4s 4s +0%
poly_invntt_tomont 4s 4s +0%
poly_invntt_tomont_native 4s 3s +33%
poly_ntt 4s 2s +100%
poly_ntt_native 4s 3s +33%
poly_pointwise_montgomery_native 4s 3s +33%
poly_shiftl 4s 1s +300%
poly_sub 4s 5s -20%
poly_uniform_gamma1_4x 4s 3s +33%
polyveck_pack_t0 4s 2s +100%
polyvecl_unpack_eta 4s 3s +33%
polyvecl_unpack_z 4s 4s +0%
reduce32 4s 4s +0%
shake128_release 4s 3s +33%
sign_signature_extmu 4s 4s +0%
sign_signature_pre_hash_internal 4s 4s +0%
sign_verify_extmu 4s 4s +0%
sys_check_capability 4s 2s +100%
decompose 3s 3s +0%
fqscale 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 3s +0%
keccak_finalize 3s 2s +50%
keccak_squeeze 3s 2s +50%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_xor_bytes 3s 3s +0%
make_hint 3s 6s -50%
mld_ct_abs_i32 3s 1s +200%
mld_ct_cmask_nonzero_u32 3s 4s -25%
mld_ct_cmask_nonzero_u8 3s 4s -25%
mld_ct_get_optblocker_u32 3s 3s +0%
mld_keccakf1600_extract_bytes 3s 4s -25%
mld_value_barrier_i64 3s 4s -25%
montgomery_reduce 3s 5s -40%
pack_sig_c 3s 3s +0%
pack_sig_h_poly 3s 5s -40%
pack_sig_z 3s 3s +0%
pack_sk_s1 3s 3s +0%
poly_caddq_native_aarch64 3s 2s +50%
poly_challenge 3s 3s +0%
poly_chknorm_native 3s 2s +50%
poly_chknorm_native_aarch64 3s 5s -40%
poly_decompose 3s 4s -25%
poly_ntt_c 3s 3s +0%
poly_permute_bitrev_to_custom_optional 3s - new
poly_pointwise_montgomery 3s 4s -25%
poly_uniform_gamma1 3s 6s -50%
poly_use_hint 3s 2s +50%
polyt1_pack 3s 3s +0%
polyt1_unpack 3s 4s -25%
polyveck_pack_w1 3s 2s +50%
polyveck_unpack_eta 3s 3s +0%
polyveck_unpack_t0 3s 7s -57%
polyvecl_chknorm 3s 4s -25%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyw1_pack 3s 5s -40%
polyz_unpack 3s 1s +200%
power2round 3s 1s +200%
shake128_finalize 3s 3s +0%
shake128_squeeze 3s 2s +50%
shake128x4_absorb_once 3s 2s +50%
shake256 3s 4s -25%
shake256_release 3s 3s +0%
shake256_squeeze 3s 2s +50%
shake256x4_squeezeblocks 3s 5s -40%
sign_open 3s 5s -40%
sign_signature_pre_hash_shake256 3s 4s -25%
sign_verify_pre_hash_internal 3s 2s +50%
keccak_f1600_x1_native_aarch64_v84a 2s 4s -50%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 2s +0%
keccakf1600_xor_bytes 2s 3s -33%
mld_ct_cmask_neg_i32 2s 1s +100%
mld_ct_get_optblocker_i64 2s 1s +100%
mld_ct_sel_int32 2s 2s +0%
mld_value_barrier_u8 2s 3s -33%
ntt_native_aarch64 2s 4s -50%
ntt_native_x86_64 2s 4s -50%
pack_pk 2s 2s +0%
poly_chknorm 2s 3s -33%
poly_make_hint 2s 3s -33%
poly_reduce 2s 3s -33%
poly_use_hint_native 2s 3s -33%
polyveck_pack_eta 2s 3s -33%
polyvecl_uniform_gamma1 2s 3s -33%
polyz_unpack_native 2s 3s -33%
rej_eta_c 2s 4s -50%
shake128_absorb 2s 1s +100%
shake128_init 2s 2s +0%
shake128x4_squeezeblocks 2s 3s -33%
shake256_init 2s 2s +0%
shake256x4_absorb_once 2s 2s +0%
sign_keypair 2s 4s -50%
unpack_pk 2s 2s +0%
use_hint 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 4s -75%
keccakf1600x4_extract_bytes 1s 3s -67%
mld_ct_get_optblocker_u8 1s 3s -67%
mld_value_barrier_u32 1s 3s -67%
poly_decompose_native 1s 2s -50%
shake256_absorb 1s 2s -50%
shake256_finalize 1s 1s +0%

@oqs-bot
Contributor

oqs-bot commented Apr 5, 2026

CBMC Results (ML-DSA-44)

⚠️ Attention Required

Proof Status Current Previous Change
keccakf1600x4_permute_native ⚠️ 23s 14s +64%
Full Results (185 proofs)
Proof Status Current Previous Change
**TOTAL** 1687s 1712s -1.5%
sign_verify_internal 216s 153s +41%
poly_pointwise_montgomery_c 136s 143s -5%
rej_uniform_native 130s 134s -3%
polyvecl_pointwise_acc_montgomery_c 94s 155s -39%
mld_ct_memcmp 70s 73s -4%
mld_invntt_layer 60s 63s -5%
mld_ntt_layer 49s 50s -2%
mld_attempt_signature_generation 45s 49s -8%
polyvec_matrix_expand_eager 33s - new
keccakf1600x4_permute_native ⚠️ 23s 14s +64%
fqmul 22s 20s +10%
sign_keypair_internal 22s 19s +16%
rej_uniform 19s 23s -17%
sign_signature_internal 19s 17s +12%
poly_chknorm_c 18s 22s -18%
sign_pk_from_sk 16s 16s +0%
poly_uniform_eta_4x 15s 16s -6%
polyeta_unpack 15s 13s +15%
rej_uniform_c 15s 13s +15%
poly_uniform_4x 14s 13s +8%
polyt0_unpack 12s 17s -29%
polyveck_decompose 11s 4s +175%
polyz_unpack_c 11s 12s -8%
mld_ntt_butterfly_block 10s 9s +11%
poly_add 10s 10s +0%
poly_decompose_c 10s 8s +25%
keccak_absorb_once_x4 9s 8s +12%
mld_check_pct 9s 10s -10%
polyvec_matrix_expand_eager_serial 9s - new
keccakf1600_permute 8s 8s +0%
pointwise_acc_native_aarch64 8s 5s +60%
poly_caddq_c 8s 6s +33%
polyveck_add 8s 7s +14%
polyveck_chknorm 8s 2s +300%
rej_eta_native 8s 8s +0%
keccak_squeezeblocks_x4 7s 4s +75%
keccakf1600_permute_native 7s 9s -22%
poly_caddq_native_aarch64 7s 3s +133%
polyveck_invntt_tomont 7s 7s +0%
sign_verify_pre_hash_shake256 7s 4s +75%
unpack_sk 7s 7s +0%
keccak_absorb 6s 8s -25%
mld_compute_pack_z 6s 8s -25%
mld_h 6s 2s +200%
ntt_native_x86_64 6s 3s +100%
poly_invntt_tomont_c 6s 6s +0%
polyt0_pack 6s 3s +100%
polyvec_matrix_pointwise_montgomery_eager 6s - new
polyveck_ntt 6s 5s +20%
polyveck_shiftl 6s 7s -14%
polyveck_use_hint 6s 5s +20%
sign 6s 6s +0%
sign_signature 6s 4s +50%
make_hint 5s 6s -17%
pointwise_acc_native_x86_64 5s 5s +0%
poly_ntt_c 5s 2s +150%
poly_ntt_native 5s 1s +400%
poly_uniform_gamma1_4x 5s 7s -29%
poly_use_hint_c 5s 6s -17%
polyveck_pack_eta 5s 4s +25%
polyveck_power2round 5s 11s -55%
polyveck_sub 5s 7s -29%
polyveck_unpack_eta 5s 5s +0%
polyvecl_chknorm 5s 4s +25%
polyvecl_uniform_gamma1 5s 2s +150%
reduce32 5s 2s +150%
shake128_init 5s 2s +150%
shake256x4_absorb_once 5s 2s +150%
sign_signature_pre_hash_shake256 5s 6s -17%
sign_verify 5s 5s +0%
unpack_hints 5s 5s +0%
unpack_sig 5s 3s +67%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 4s 2s +100%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 2s +100%
mld_sample_s1_s2_serial 4s 5s -20%
pack_sk_s1 4s 4s +0%
pointwise_native_aarch64 4s 3s +33%
poly_challenge 4s 5s -20%
poly_chknorm_native 4s 2s +100%
poly_invntt_tomont_native 4s 1s +300%
poly_make_hint 4s 2s +100%
poly_power2round 4s 7s -43%
poly_shiftl 4s 3s +33%
polyt1_pack 4s 2s +100%
polyveck_caddq 4s 6s -33%
polyveck_pack_w1 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 3s +33%
polyvecl_unpack_eta 4s 2s +100%
polyw1_pack 4s 4s +0%
polyz_unpack 4s 4s +0%
power2round 4s 5s -20%
rej_eta 4s 2s +100%
shake128_squeeze 4s 2s +100%
shake128x4_absorb_once 4s 2s +100%
sign_keypair 4s 5s -20%
sign_open 4s 6s -33%
sign_signature_extmu 4s 2s +100%
sign_signature_pre_hash_internal 4s 3s +33%
sign_verify_extmu 4s 5s -20%
use_hint 4s 3s +33%
caddq 3s 2s +50%
decompose 3s 2s +50%
fqscale 3s 4s -25%
keccak_f1600_x1_native_aarch64 3s 1s +200%
keccakf1600_xor_bytes 3s 2s +50%
mld_ct_abs_i32 3s 2s +50%
mld_ct_cmask_neg_i32 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 3s +0%
mld_ct_get_optblocker_u32 3s 3s +0%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_prepare_domain_separation_prefix 3s 4s -25%
montgomery_reduce 3s 4s -25%
pack_pk 3s 3s +0%
pack_sk_rho_key_tr_s2_t0 3s 2s +50%
pointwise_native_x86_64 3s 3s +0%
poly_caddq_native 3s 4s -25%
poly_chknorm 3s 3s +0%
poly_decompose_native 3s 1s +200%
poly_invntt_tomont 3s 4s -25%
poly_ntt 3s 3s +0%
poly_pointwise_montgomery 3s 4s -25%
poly_pointwise_montgomery_native 3s 5s -40%
poly_sub 3s 4s -25%
poly_uniform 3s 4s -25%
polyeta_pack 3s 2s +50%
polyt1_unpack 3s 4s -25%
polyveck_pointwise_poly_montgomery 3s 6s -50%
polyveck_reduce 3s 4s -25%
polyvecl_ntt 3s 4s -25%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_unpack_z 3s 3s +0%
rej_eta_c 3s 3s +0%
shake128_absorb 3s 3s +0%
shake128_release 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_init 3s 3s +0%
shake256_release 3s 4s -25%
shake256x4_squeezeblocks 3s 1s +200%
sign_verify_pre_hash_internal 3s 2s +50%
intt_native_x86_64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_finalize 2s 2s +0%
keccak_init 2s 2s +0%
keccak_squeeze 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600_xor_bytes (big endian) 2s 3s -33%
keccakf1600x4_extract_bytes 2s 1s +100%
keccakf1600x4_permute 2s 2s +0%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_ct_get_optblocker_i64 2s 4s -50%
mld_sample_s1_s2 2s 7s -71%
mld_value_barrier_i64 2s 1s +100%
mld_value_barrier_u32 2s 2s +0%
ntt_native_aarch64 2s 4s -50%
pack_sig_h_poly 2s 4s -50%
poly_decompose 2s 3s -33%
poly_permute_bitrev_to_custom_optional 2s - new
poly_permute_bitrev_to_custom_optional_native 2s - new
poly_reduce 2s 3s -33%
poly_uniform_eta 2s 5s -60%
poly_uniform_gamma1 2s 4s -50%
poly_use_hint 2s 3s -33%
poly_use_hint_native 2s 4s -50%
polyveck_pack_t0 2s 5s -60%
polyveck_unpack_t0 2s 4s -50%
polyvecl_pack_eta 2s 2s +0%
polyvecl_pointwise_acc_montgomery_native 2s 5s -60%
polyz_pack 2s 4s -50%
polyz_unpack_native 2s 4s -50%
sys_check_capability 2s 3s -33%
unpack_pk 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
mld_ct_get_optblocker_u8 1s 2s -50%
mld_ct_sel_int32 1s 2s -50%
mld_value_barrier_u8 1s 1s +0%
pack_sig_c 1s 3s -67%
pack_sig_z 1s 3s -67%
poly_caddq 1s 3s -67%
poly_chknorm_native_aarch64 1s 4s -75%
shake128_finalize 1s 3s -67%
shake128x4_squeezeblocks 1s 1s +0%
shake256 1s 4s -75%
shake256_finalize 1s 1s +0%
shake256_squeeze 1s 1s +0%

@oqs-bot
Contributor

oqs-bot commented Apr 5, 2026

CBMC Results (ML-DSA-87)

⚠️ Attention Required

Proof Status Current Previous Change
keccakf1600x4_permute_native ⚠️ 24s 15s +60%
Full Results (185 proofs)
Proof Status Current Previous Change
**TOTAL** 2280s 3260s -30.1%
sign_verify_internal 279s 245s +14%
polyvecl_pointwise_acc_montgomery_c 268s 1095s -76%
poly_pointwise_montgomery_c 151s 154s -2%
polyvec_matrix_expand_eager 141s - new
rej_uniform_native 137s 144s -5%
mld_attempt_signature_generation 78s 89s -12%
mld_ct_memcmp 72s 75s -4%
mld_invntt_layer 62s 66s -6%
polyvec_matrix_expand_eager_serial 61s - new
mld_ntt_layer 51s 53s -4%
sign_keypair_internal 51s 48s +6%
sign_pk_from_sk 26s 24s +8%
sign_signature_internal 25s 36s -31%
keccakf1600x4_permute_native ⚠️ 24s 15s +60%
rej_uniform 21s 23s -9%
fqmul 19s 21s -10%
poly_chknorm_c 18s 21s -14%
poly_uniform_eta_4x 17s 16s +6%
polyveck_chknorm 16s 6s +167%
polyveck_decompose 16s 19s -16%
rej_uniform_c 16s 14s +14%
polyeta_unpack 15s 15s +0%
poly_uniform_4x 14s 15s -7%
polyveck_add 14s 12s +17%
polyvec_matrix_pointwise_montgomery_eager 13s - new
mld_check_pct 11s 10s +10%
mld_compute_pack_z 11s 8s +38%
polyt0_unpack 11s 13s -15%
mld_sample_s1_s2 10s 7s +43%
poly_add 10s 11s -9%
polyveck_caddq 10s 8s +25%
mld_ntt_butterfly_block 9s 12s -25%
polyveck_pointwise_poly_montgomery 9s 7s +29%
polyveck_power2round 9s 8s +12%
polyveck_use_hint 9s 8s +12%
polyz_unpack_c 9s 8s +12%
rej_eta_native 9s 6s +50%
sign 9s 8s +12%
unpack_sk 9s 10s -10%
keccak_absorb_once_x4 8s 9s -11%
keccakf1600_permute_native 8s 9s -11%
pointwise_acc_native_x86_64 8s 5s +60%
polyveck_invntt_tomont 8s 27s -70%
polyveck_reduce 8s 8s +0%
polyveck_shiftl 8s 8s +0%
keccak_absorb 7s 9s -22%
keccakf1600_permute 7s 9s -22%
pointwise_acc_native_aarch64 7s 8s -12%
poly_caddq_c 7s 8s -12%
polyveck_sub 7s 7s +0%
polyvecl_ntt 7s 8s -12%
keccakf1600x4_xor_bytes 6s 5s +20%
pack_sig_c 6s 3s +100%
pointwise_native_aarch64 6s 5s +20%
poly_decompose_c 6s 5s +20%
poly_power2round 6s 7s -14%
poly_use_hint 6s 2s +200%
polyveck_ntt 6s 7s -14%
polyvecl_chknorm 6s 7s -14%
sign_open 6s 4s +50%
unpack_hints 6s 4s +50%
keccak_squeezeblocks_x4 5s 5s +0%
mld_h 5s 7s -29%
poly_ntt_native 5s 3s +67%
polyt0_pack 5s 5s +0%
polyvecl_pointwise_acc_montgomery 5s 3s +67%
polyvecl_pointwise_acc_montgomery_native 5s 4s +25%
rej_eta_c 5s 8s -38%
shake128_squeeze 5s 2s +150%
sign_keypair 5s 3s +67%
sign_signature 5s 4s +25%
sign_signature_extmu 5s 3s +67%
sign_signature_pre_hash_internal 5s 5s +0%
sign_verify_extmu 5s 4s +25%
make_hint 4s 3s +33%
mld_ct_get_optblocker_u32 4s 1s +300%
mld_prepare_domain_separation_prefix 4s 7s -43%
mld_sample_s1_s2_serial 4s 6s -33%
ntt_native_x86_64 4s 6s -33%
poly_caddq 4s 2s +100%
poly_challenge 4s 4s +0%
poly_chknorm_native 4s 2s +100%
poly_decompose 4s 2s +100%
poly_decompose_native 4s 4s +0%
poly_invntt_tomont_c 4s 5s -20%
poly_invntt_tomont_native 4s 3s +33%
poly_make_hint 4s 4s +0%
poly_pointwise_montgomery_native 4s 2s +100%
poly_uniform 4s 6s -33%
poly_uniform_eta 4s 5s -20%
polyeta_pack 4s 2s +100%
polyveck_pack_w1 4s 3s +33%
polyveck_unpack_eta 4s 5s -20%
polyvecl_uniform_gamma1 4s 5s -20%
rej_eta 4s 4s +0%
shake256_squeeze 4s 2s +100%
shake256x4_squeezeblocks 4s 4s +0%
sign_signature_pre_hash_shake256 4s 6s -33%
sign_verify_pre_hash_internal 4s 3s +33%
sign_verify_pre_hash_shake256 4s 4s +0%
unpack_sig 4s 4s +0%
use_hint 4s 3s +33%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 1s +200%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 1s +200%
keccak_finalize 3s 2s +50%
keccak_init 3s 2s +50%
keccakf1600_xor_bytes 3s 1s +200%
keccakf1600_xor_bytes (big endian) 3s 4s -25%
keccakf1600x4_permute 3s 2s +50%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 5s -40%
mld_ct_cmask_nonzero_u8 3s 4s -25%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_ct_get_optblocker_u8 3s 1s +200%
mld_ct_sel_int32 3s 2s +50%
mld_value_barrier_u32 3s 1s +200%
mld_value_barrier_u8 3s 1s +200%
ntt_native_aarch64 3s 5s -40%
pack_pk 3s 5s -40%
pack_sig_h_poly 3s 3s +0%
pack_sig_z 3s 5s -40%
pointwise_native_x86_64 3s 5s -40%
poly_caddq_native 3s 6s -50%
poly_caddq_native_aarch64 3s 4s -25%
poly_chknorm 3s 3s +0%
poly_invntt_tomont 3s 4s -25%
poly_ntt_c 3s 4s -25%
poly_permute_bitrev_to_custom_optional_native 3s - new
poly_reduce 3s 3s +0%
poly_shiftl 3s 3s +0%
poly_sub 3s 3s +0%
poly_uniform_gamma1 3s 2s +50%
poly_uniform_gamma1_4x 3s 4s -25%
poly_use_hint_c 3s 3s +0%
poly_use_hint_native 3s 3s +0%
polyt1_pack 3s 3s +0%
polyt1_unpack 3s 4s -25%
polyveck_pack_eta 3s 3s +0%
polyveck_pack_t0 3s 3s +0%
polyveck_unpack_t0 3s 5s -40%
polyvecl_pack_eta 3s 2s +50%
polyvecl_uniform_gamma1_serial 3s 5s -40%
polyvecl_unpack_z 3s 2s +50%
polyz_pack 3s 5s -40%
polyz_unpack 3s 4s -25%
power2round 3s 4s -25%
shake128_finalize 3s 4s -25%
shake256_finalize 3s 2s +50%
shake256_init 3s 2s +50%
shake256x4_absorb_once 3s 1s +200%
sign_verify 3s 4s -25%
sys_check_capability 3s 4s -25%
decompose 2s 2s +0%
fqscale 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccak_squeeze 2s 4s -50%
keccakf1600_extract_bytes (big endian) 2s 2s +0%
keccakf1600x4_extract_bytes 2s 5s -60%
mld_ct_abs_i32 2s 3s -33%
mld_value_barrier_i64 2s 1s +100%
montgomery_reduce 2s 4s -50%
pack_sk_rho_key_tr_s2_t0 2s 3s -33%
pack_sk_s1 2s 5s -60%
poly_chknorm_native_aarch64 2s 3s -33%
poly_ntt 2s 4s -50%
poly_pointwise_montgomery 2s 4s -50%
polyvecl_unpack_eta 2s 3s -33%
polyw1_pack 2s 2s +0%
polyz_unpack_native 2s 3s -33%
shake128_absorb 2s 2s +0%
shake128_init 2s 3s -33%
shake128x4_absorb_once 2s 2s +0%
shake128x4_squeezeblocks 2s 3s -33%
shake256 2s 3s -33%
shake256_absorb 2s 3s -33%
shake256_release 2s 2s +0%
unpack_pk 2s 4s -50%
caddq 1s 2s -50%
mld_keccakf1600_extract_bytes 1s 1s +0%
poly_permute_bitrev_to_custom_optional 1s - new
reduce32 1s 3s -67%
shake128_release 1s 7s -86%

Comment thread mldsa/src/polyvec_lazy.h
Comment thread mldsa/src/polyvec_lazy.c Outdated
Comment thread mldsa/src/polyvec.h
Comment thread mldsa/src/polyvec_lazy.c Outdated
Comment thread mldsa/src/polyvec_lazy.c Outdated
Comment thread mldsa/src/polyvec_lazy.c Outdated
Comment thread mldsa/src/polyvec_lazy.c Outdated
Comment thread mldsa/src/polyvec_lazy.h Outdated
Contributor

@hanno-becker hanno-becker left a comment

I'm worried about the complexity we're building up here. There are (too) many small functions which make the code very difficult to oversee and which, I believe, aren't all necessary -- see comments. Let's see if we can clean this up a bit more before merging.

In principle, I'm OK with the optimization, though I don't think it's necessary to meet a 32K RAM target -- assuming all the other optimizations get merged, it seems like the row-by-row expansion is already enough? The latter is less intrusive and more performant since it allows you to still use the faster vector-vector scalar product.

@mkannwischer
Contributor Author

> In principle, I'm OK with the optimization, though I don't think it's necessary to meet a 32K RAM target -- assuming all the other optimizations get merged, it seems like the row-by-row expansion is already enough? The latter is less intrusive and more performant since it allows you to still use the faster vector-vector scalar product.

If speed is a goal, the first optimization in REDUCE_RAM mode to drop is the recomputation of y (#1031). Dropping it costs one L-polyvec of RAM but saves a lot of Keccak invocations inside the main signing loop. I'm fairly sure this outweighs the gains made by vector-vector polymul on most platforms.

@mkannwischer mkannwischer force-pushed the lowram-stream-a branch 3 times, most recently from 50a8e16 to ee70381 Compare April 19, 2026 04:33

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 269950 cycles | 269581 cycles | 1.00 |
| ML-DSA-44 sign | 806100 cycles | 805552 cycles | 1.00 |
| ML-DSA-44 verify | 273009 cycles | 273206 cycles | 1.00 |
| ML-DSA-65 keypair | 464042 cycles | 464038 cycles | 1.00 |
| ML-DSA-65 sign | 1315384 cycles | 1318075 cycles | 1.00 |
| ML-DSA-65 verify | 451650 cycles | 450690 cycles | 1.00 |
| ML-DSA-87 keypair | 791862 cycles | 791078 cycles | 1.00 |
| ML-DSA-87 sign | 1792119 cycles | 1789357 cycles | 1.00 |
| ML-DSA-87 verify | 775222 cycles | 775586 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: 933ad18 | Previous: f57eae6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34120 cycles | 34169 cycles | 1.00 |
| ML-DSA-44 sign | 120131 cycles | 119846 cycles | 1.00 |
| ML-DSA-44 verify | 38160 cycles | 38056 cycles | 1.00 |
| ML-DSA-65 keypair | 60035 cycles | 59700 cycles | 1.01 |
| ML-DSA-65 sign | 200707 cycles | 199973 cycles | 1.00 |
| ML-DSA-65 verify | 62810 cycles | 62637 cycles | 1.00 |
| ML-DSA-87 keypair | 92709 cycles | 94165 cycles | 0.98 |
| ML-DSA-87 sign | 236225 cycles | 236812 cycles | 1.00 |
| ML-DSA-87 verify | 94324 cycles | 96066 cycles | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: 933ad18 | Previous: f57eae6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 94250 cycles | 94048 cycles | 1.00 |
| ML-DSA-44 sign | 332883 cycles | 333043 cycles | 1.00 |
| ML-DSA-44 verify | 99360 cycles | 99382 cycles | 1.00 |
| ML-DSA-65 keypair | 159206 cycles | 158957 cycles | 1.00 |
| ML-DSA-65 sign | 544225 cycles | 544071 cycles | 1.00 |
| ML-DSA-65 verify | 161244 cycles | 161582 cycles | 1.00 |
| ML-DSA-87 keypair | 265958 cycles | 266114 cycles | 1.00 |
| ML-DSA-87 sign | 707822 cycles | 708527 cycles | 1.00 |
| ML-DSA-87 verify | 272096 cycles | 272454 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68352 cycles | 68455 cycles | 1.00 |
| ML-DSA-44 sign | 187631 cycles | 187785 cycles | 1.00 |
| ML-DSA-44 verify | 68864 cycles | 68741 cycles | 1.00 |
| ML-DSA-65 keypair | 118500 cycles | 118575 cycles | 1.00 |
| ML-DSA-65 sign | 300122 cycles | 300571 cycles | 1.00 |
| ML-DSA-65 verify | 115324 cycles | 115341 cycles | 1.00 |
| ML-DSA-87 keypair | 202213 cycles | 202501 cycles | 1.00 |
| ML-DSA-87 sign | 395184 cycles | 395925 cycles | 1.00 |
| ML-DSA-87 verify | 194567 cycles | 195166 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.


@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112330 cycles | 112291 cycles | 1.00 |
| ML-DSA-44 sign | 356038 cycles | 355994 cycles | 1.00 |
| ML-DSA-44 verify | 117692 cycles | 117673 cycles | 1.00 |
| ML-DSA-65 keypair | 194848 cycles | 195029 cycles | 1.00 |
| ML-DSA-65 sign | 587290 cycles | 587316 cycles | 1.00 |
| ML-DSA-65 verify | 194025 cycles | 194218 cycles | 1.00 |
| ML-DSA-87 keypair | 320688 cycles | 320593 cycles | 1.00 |
| ML-DSA-87 sign | 752370 cycles | 752846 cycles | 1.00 |
| ML-DSA-87 verify | 319539 cycles | 319380 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 55876 cycles | 55942 cycles | 1.00 |
| ML-DSA-44 sign | 181515 cycles | 181646 cycles | 1.00 |
| ML-DSA-44 verify | 61058 cycles | 61087 cycles | 1.00 |
| ML-DSA-65 keypair | 97628 cycles | 97794 cycles | 1.00 |
| ML-DSA-65 sign | 296968 cycles | 299830 cycles | 0.99 |
| ML-DSA-65 verify | 99998 cycles | 100506 cycles | 0.99 |
| ML-DSA-87 keypair | 164989 cycles | 155883 cycles | 1.06 |
| ML-DSA-87 sign | 362613 cycles | 358927 cycles | 1.01 |
| ML-DSA-87 verify | 160957 cycles | 156304 cycles | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-87 keypair | 164989 cycles | 155883 cycles | 1.06 |

This comment was automatically generated by workflow using github-action-benchmark.
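
The alert condition can be sketched as a simple ratio check. The helper `bench_regressed` below is hypothetical and is not part of github-action-benchmark. It assumes the comparison is made on the unrounded ratio `current / previous`, which is an inference from this thread: in the same c6i run, ML-DSA-87 verify is displayed with a rounded ratio of 1.03 (160957 vs. 156304 cycles) but is not listed in the alert. Integer arithmetic is used to avoid floating-point rounding.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if current / previous exceeds
 * threshold_percent / 100 (e.g. 103 for a threshold of 1.03).
 * Cross-multiplying keeps the comparison exact in integers. */
int bench_regressed(uint64_t current, uint64_t previous,
                    uint64_t threshold_percent)
{
  return current * 100u > previous * threshold_percent;
}
```

With the cycle counts reported for this benchmark, `bench_regressed(164989, 155883, 103)` flags the keypair result, while `bench_regressed(160957, 156304, 103)` does not flag verify, since its unrounded ratio is just under 1.03.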

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 134474 cycles | 134298 cycles | 1.00 |
| ML-DSA-44 sign | 524946 cycles | 525354 cycles | 1.00 |
| ML-DSA-44 verify | 146921 cycles | 147059 cycles | 1.00 |
| ML-DSA-65 keypair | 225592 cycles | 226152 cycles | 1.00 |
| ML-DSA-65 sign | 848197 cycles | 847920 cycles | 1.00 |
| ML-DSA-65 verify | 234858 cycles | 234621 cycles | 1.00 |
| ML-DSA-87 keypair | 370606 cycles | 370244 cycles | 1.00 |
| ML-DSA-87 sign | 1069071 cycles | 1069642 cycles | 1.00 |
| ML-DSA-87 verify | 382838 cycles | 383256 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 40937 cycles | 41700 cycles | 0.98 |
| ML-DSA-44 sign | 134014 cycles | 133791 cycles | 1.00 |
| ML-DSA-44 verify | 44304 cycles | 44598 cycles | 0.99 |
| ML-DSA-65 keypair | 71745 cycles | 73806 cycles | 0.97 |
| ML-DSA-65 sign | 213973 cycles | 217561 cycles | 0.98 |
| ML-DSA-65 verify | 74133 cycles | 74019 cycles | 1.00 |
| ML-DSA-87 keypair | 112631 cycles | 108217 cycles | 1.04 |
| ML-DSA-87 sign | 254435 cycles | 249678 cycles | 1.02 |
| ML-DSA-87 verify | 115420 cycles | 109668 cycles | 1.05 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-87 keypair | 112631 cycles | 108217 cycles | 1.04 |
| ML-DSA-87 verify | 115420 cycles | 109668 cycles | 1.05 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton4

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 67548 cycles | 67519 cycles | 1.00 |
| ML-DSA-44 sign | 203256 cycles | 202994 cycles | 1.00 |
| ML-DSA-44 verify | 70708 cycles | 70652 cycles | 1.00 |
| ML-DSA-65 keypair | 120034 cycles | 120143 cycles | 1.00 |
| ML-DSA-65 sign | 330759 cycles | 331159 cycles | 1.00 |
| ML-DSA-65 verify | 117620 cycles | 117687 cycles | 1.00 |
| ML-DSA-87 keypair | 197135 cycles | 197530 cycles | 1.00 |
| ML-DSA-87 sign | 428294 cycles | 429269 cycles | 1.00 |
| ML-DSA-87 verify | 194439 cycles | 194379 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 156935 cycles | 157262 cycles | 1.00 |
| ML-DSA-44 sign | 546406 cycles | 547323 cycles | 1.00 |
| ML-DSA-44 verify | 168862 cycles | 168995 cycles | 1.00 |
| ML-DSA-65 keypair | 268189 cycles | 269676 cycles | 0.99 |
| ML-DSA-65 sign | 895287 cycles | 899547 cycles | 1.00 |
| ML-DSA-65 verify | 274987 cycles | 275798 cycles | 1.00 |
| ML-DSA-87 keypair | 449412 cycles | 453529 cycles | 0.99 |
| ML-DSA-87 sign | 1159907 cycles | 1165869 cycles | 0.99 |
| ML-DSA-87 verify | 458899 cycles | 463032 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120172 cycles | 123524 cycles | 0.97 |
| ML-DSA-44 sign | 445464 cycles | 458580 cycles | 0.97 |
| ML-DSA-44 verify | 131023 cycles | 133900 cycles | 0.98 |
| ML-DSA-65 keypair | 204160 cycles | 209892 cycles | 0.97 |
| ML-DSA-65 sign | 722267 cycles | 741074 cycles | 0.97 |
| ML-DSA-65 verify | 210002 cycles | 215575 cycles | 0.97 |
| ML-DSA-87 keypair | 338859 cycles | 345400 cycles | 0.98 |
| ML-DSA-87 sign | 923385 cycles | 936742 cycles | 0.99 |
| ML-DSA-87 verify | 348127 cycles | 352512 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128107 cycles | 128004 cycles | 1.00 |
| ML-DSA-44 sign | 446022 cycles | 445772 cycles | 1.00 |
| ML-DSA-44 verify | 138229 cycles | 142108 cycles | 0.97 |
| ML-DSA-65 keypair | 220091 cycles | 219983 cycles | 1.00 |
| ML-DSA-65 sign | 720521 cycles | 721241 cycles | 1.00 |
| ML-DSA-65 verify | 222728 cycles | 223192 cycles | 1.00 |
| ML-DSA-87 keypair | 365646 cycles | 389083 cycles | 0.94 |
| ML-DSA-87 sign | 921270 cycles | 920684 cycles | 1.00 |
| ML-DSA-87 verify | 373447 cycles | 373241 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3

Details
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71794 cycles | 71681 cycles | 1.00 |
| ML-DSA-44 sign | 213095 cycles | 212955 cycles | 1.00 |
| ML-DSA-44 verify | 75733 cycles | 75718 cycles | 1.00 |
| ML-DSA-65 keypair | 126726 cycles | 126715 cycles | 1.00 |
| ML-DSA-65 sign | 348264 cycles | 348747 cycles | 1.00 |
| ML-DSA-65 verify | 125312 cycles | 125219 cycles | 1.00 |
| ML-DSA-87 keypair | 205241 cycles | 207570 cycles | 0.99 |
| ML-DSA-87 sign | 444871 cycles | 448999 cycles | 0.99 |
| ML-DSA-87 verify | 205500 cycles | 205110 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Details
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 211923 cycles | 211637 cycles | 1.00 |
| ML-DSA-44 sign | 758486 cycles | 758326 cycles | 1.00 |
| ML-DSA-44 verify | 228907 cycles | 228658 cycles | 1.00 |
| ML-DSA-65 keypair | 378802 cycles | 378077 cycles | 1.00 |
| ML-DSA-65 sign | 1245502 cycles | 1244464 cycles | 1.00 |
| ML-DSA-65 verify | 370935 cycles | 371521 cycles | 1.00 |
| ML-DSA-87 keypair | 602399 cycles | 604346 cycles | 1.00 |
| ML-DSA-87 sign | 1585487 cycles | 1587530 cycles | 1.00 |
| ML-DSA-87 verify | 618042 cycles | 617552 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Details
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138321 cycles | 138216 cycles | 1.00 |
| ML-DSA-44 sign | 482702 cycles | 482639 cycles | 1.00 |
| ML-DSA-44 verify | 148768 cycles | 156489 cycles | 0.95 |
| ML-DSA-65 keypair | 241490 cycles | 241428 cycles | 1.00 |
| ML-DSA-65 sign | 786734 cycles | 786658 cycles | 1.00 |
| ML-DSA-65 verify | 240649 cycles | 241139 cycles | 1.00 |
| ML-DSA-87 keypair | 395311 cycles | 443622 cycles | 0.89 |
| ML-DSA-87 sign | 1006802 cycles | 1006183 cycles | 1.00 |
| ML-DSA-87 verify | 402947 cycles | 402491 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Details
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 458730 cycles | 461159 cycles | 0.99 |
| ML-DSA-44 sign | 2126190 cycles | 2128509 cycles | 1.00 |
| ML-DSA-44 verify | 546750 cycles | 550232 cycles | 0.99 |
| ML-DSA-65 keypair | 771666 cycles | 774829 cycles | 1.00 |
| ML-DSA-65 sign | 3458520 cycles | 3481060 cycles | 0.99 |
| ML-DSA-65 verify | 850160 cycles | 854237 cycles | 1.00 |
| ML-DSA-87 keypair | 1242634 cycles | 1252977 cycles | 0.99 |
| ML-DSA-87 sign | 4261047 cycles | 4312551 cycles | 0.99 |
| ML-DSA-87 verify | 1365314 cycles | 1383952 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2

Details
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112688 cycles | 112603 cycles | 1.00 |
| ML-DSA-44 sign | 356591 cycles | 356703 cycles | 1.00 |
| ML-DSA-44 verify | 118169 cycles | 118040 cycles | 1.00 |
| ML-DSA-65 keypair | 195228 cycles | 195024 cycles | 1.00 |
| ML-DSA-65 sign | 587818 cycles | 587526 cycles | 1.00 |
| ML-DSA-65 verify | 194638 cycles | 194321 cycles | 1.00 |
| ML-DSA-87 keypair | 321588 cycles | 320970 cycles | 1.00 |
| ML-DSA-87 sign | 753806 cycles | 753974 cycles | 1.00 |
| ML-DSA-87 verify | 319966 cycles | 319985 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Details
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212209 cycles | 212778 cycles | 1.00 |
| ML-DSA-44 sign | 759219 cycles | 760574 cycles | 1.00 |
| ML-DSA-44 verify | 229397 cycles | 233684 cycles | 0.98 |
| ML-DSA-65 keypair | 378868 cycles | 379190 cycles | 1.00 |
| ML-DSA-65 sign | 1245431 cycles | 1246848 cycles | 1.00 |
| ML-DSA-65 verify | 372168 cycles | 371506 cycles | 1.00 |
| ML-DSA-87 keypair | 646219 cycles | 621559 cycles | 1.04 |
| ML-DSA-87 sign | 1589407 cycles | 1586087 cycles | 1.00 |
| ML-DSA-87 verify | 617945 cycles | 618164 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

A possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
The benchmark result of this commit is worse than the previous result by more than the alert threshold of 1.03.

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-87 keypair | 646219 cycles | 621559 cycles | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Details
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 822306 cycles | 822311 cycles | 1.00 |
| ML-DSA-44 sign | 3233057 cycles | 3234193 cycles | 1.00 |
| ML-DSA-44 verify | 921679 cycles | 921704 cycles | 1.00 |
| ML-DSA-65 keypair | 1389889 cycles | 1395328 cycles | 1.00 |
| ML-DSA-65 sign | 5268058 cycles | 5257303 cycles | 1.00 |
| ML-DSA-65 verify | 1472895 cycles | 1473562 cycles | 1.00 |
| ML-DSA-87 keypair | 2298504 cycles | 2301175 cycles | 1.00 |
| ML-DSA-87 sign | 6613086 cycles | 6645214 cycles | 1.00 |
| ML-DSA-87 verify | 2409667 cycles | 2418420 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Details
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 220428 cycles | 222878 cycles | 0.99 |
| ML-DSA-44 sign | 604384 cycles | 611922 cycles | 0.99 |
| ML-DSA-44 verify | 217235 cycles | 220194 cycles | 0.99 |
| ML-DSA-65 keypair | 398344 cycles | 387594 cycles | 1.03 |
| ML-DSA-65 sign | 1035800 cycles | 1002001 cycles | 1.03 |
| ML-DSA-65 verify | 382226 cycles | 368900 cycles | 1.04 |
| ML-DSA-87 keypair | 671079 cycles | 657172 cycles | 1.02 |
| ML-DSA-87 sign | 1399402 cycles | 1378779 cycles | 1.01 |
| ML-DSA-87 verify | 638082 cycles | 637541 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

A possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
The benchmark result of this commit is worse than the previous result by more than the alert threshold of 1.03.

| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-65 sign | 1035800 cycles | 1002001 cycles | 1.03 |
| ML-DSA-65 verify | 382226 cycles | 368900 cycles | 1.04 |
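
For readers wondering why ML-DSA-65 sign is flagged at a displayed ratio of 1.03 while ML-DSA-65 keypair (398344 vs. 387594 cycles, also displayed as 1.03 in the full table for this benchmark) is not: one plausible reading is that the check compares the unrounded ratio of current to previous cycle counts against the threshold with a strict inequality. The C sketch below illustrates that reading only. The function names `bench_ratio` and `bench_is_regression` are hypothetical and are not taken from github-action-benchmark or from this repository.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: ratio of current to previous cycle counts.
 * A value above 1.0 means the current commit needs more cycles. */
static double bench_ratio(uint64_t current_cycles, uint64_t previous_cycles)
{
  assert(previous_cycles != 0); /* avoid division by zero */
  return (double)current_cycles / (double)previous_cycles;
}

/* Assumption: an alert fires when the unrounded ratio is strictly
 * greater than the threshold (1.03 in the alerts on this page). */
static bool bench_is_regression(uint64_t current_cycles,
                                uint64_t previous_cycles, double threshold)
{
  return bench_ratio(current_cycles, previous_cycles) > threshold;
}
```

Under this assumption, `bench_is_regression(1035800, 1002001, 1.03)` is true (ratio about 1.0337), while `bench_is_regression(398344, 387594, 1.03)` is false (ratio about 1.0277), which would match the rows listed in the alert.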

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Details
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 304623 cycles | 308136 cycles | 0.99 |
| ML-DSA-44 sign | 1180931 cycles | 1205985 cycles | 0.98 |
| ML-DSA-44 verify | 345876 cycles | 341143 cycles | 1.01 |
| ML-DSA-65 keypair | 572371 cycles | 583043 cycles | 0.98 |
| ML-DSA-65 sign | 1962850 cycles | 2004177 cycles | 0.98 |
| ML-DSA-65 verify | 544392 cycles | 568318 cycles | 0.96 |
| ML-DSA-87 keypair | 888067 cycles | 916596 cycles | 0.97 |
| ML-DSA-87 sign | 2444956 cycles | 2554112 cycles | 0.96 |
| ML-DSA-87 verify | 896704 cycles | 920344 cycles | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.

Ports pq-code-package/mlkem-native#1654

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
@mkannwischer mkannwischer merged commit 56bfbee into main Apr 20, 2026
799 of 800 checks passed
@mkannwischer mkannwischer deleted the lowram-stream-a branch April 20, 2026 14:20
3 participants