lowram: Stream matrix A element-by-element to reduce memory #1019
mkannwischer merged 2 commits into main
CBMC Results (ML-DSA-65): Full Results (185 proofs)
CBMC Results (ML-DSA-44): Full Results (185 proofs)
CBMC Results (ML-DSA-87): Full Results (185 proofs)
hanno-becker left a comment
I'm worried about the complexity we're building up here. There are (too) many small functions which make the code very difficult to oversee and which, I believe, aren't all necessary -- see comments. Let's see if we can clean this up a bit more before merging.
In principle, I'm OK with the optimization, though I don't think it's necessary to meet a 32K RAM target -- assuming all the other optimizations get merged, it seems like the row-by-row expansion is already enough? The latter is less intrusive and more performant since it allows you to still use the faster vector-vector scalar product.
If speed is a goal, the first optimization in |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 269950 cycles | 269581 cycles | 1.00 |
| ML-DSA-44 sign | 806100 cycles | 805552 cycles | 1.00 |
| ML-DSA-44 verify | 273009 cycles | 273206 cycles | 1.00 |
| ML-DSA-65 keypair | 464042 cycles | 464038 cycles | 1.00 |
| ML-DSA-65 sign | 1315384 cycles | 1318075 cycles | 1.00 |
| ML-DSA-65 verify | 451650 cycles | 450690 cycles | 1.00 |
| ML-DSA-87 keypair | 791862 cycles | 791078 cycles | 1.00 |
| ML-DSA-87 sign | 1792119 cycles | 1789357 cycles | 1.00 |
| ML-DSA-87 verify | 775222 cycles | 775586 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 933ad18 | Previous: f57eae6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34120 cycles | 34169 cycles | 1.00 |
| ML-DSA-44 sign | 120131 cycles | 119846 cycles | 1.00 |
| ML-DSA-44 verify | 38160 cycles | 38056 cycles | 1.00 |
| ML-DSA-65 keypair | 60035 cycles | 59700 cycles | 1.01 |
| ML-DSA-65 sign | 200707 cycles | 199973 cycles | 1.00 |
| ML-DSA-65 verify | 62810 cycles | 62637 cycles | 1.00 |
| ML-DSA-87 keypair | 92709 cycles | 94165 cycles | 0.98 |
| ML-DSA-87 sign | 236225 cycles | 236812 cycles | 1.00 |
| ML-DSA-87 verify | 94324 cycles | 96066 cycles | 0.98 |
oqs-bot left a comment
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: f57eae6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 94250 cycles | 94048 cycles | 1.00 |
| ML-DSA-44 sign | 332883 cycles | 333043 cycles | 1.00 |
| ML-DSA-44 verify | 99360 cycles | 99382 cycles | 1.00 |
| ML-DSA-65 keypair | 159206 cycles | 158957 cycles | 1.00 |
| ML-DSA-65 sign | 544225 cycles | 544071 cycles | 1.00 |
| ML-DSA-65 verify | 161244 cycles | 161582 cycles | 1.00 |
| ML-DSA-87 keypair | 265958 cycles | 266114 cycles | 1.00 |
| ML-DSA-87 sign | 707822 cycles | 708527 cycles | 1.00 |
| ML-DSA-87 verify | 272096 cycles | 272454 cycles | 1.00 |
oqs-bot left a comment
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68352 cycles | 68455 cycles | 1.00 |
| ML-DSA-44 sign | 187631 cycles | 187785 cycles | 1.00 |
| ML-DSA-44 verify | 68864 cycles | 68741 cycles | 1.00 |
| ML-DSA-65 keypair | 118500 cycles | 118575 cycles | 1.00 |
| ML-DSA-65 sign | 300122 cycles | 300571 cycles | 1.00 |
| ML-DSA-65 verify | 115324 cycles | 115341 cycles | 1.00 |
| ML-DSA-87 keypair | 202213 cycles | 202501 cycles | 1.00 |
| ML-DSA-87 sign | 395184 cycles | 395925 cycles | 1.00 |
| ML-DSA-87 verify | 194567 cycles | 195166 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112330 cycles | 112291 cycles | 1.00 |
| ML-DSA-44 sign | 356038 cycles | 355994 cycles | 1.00 |
| ML-DSA-44 verify | 117692 cycles | 117673 cycles | 1.00 |
| ML-DSA-65 keypair | 194848 cycles | 195029 cycles | 1.00 |
| ML-DSA-65 sign | 587290 cycles | 587316 cycles | 1.00 |
| ML-DSA-65 verify | 194025 cycles | 194218 cycles | 1.00 |
| ML-DSA-87 keypair | 320688 cycles | 320593 cycles | 1.00 |
| ML-DSA-87 sign | 752370 cycles | 752846 cycles | 1.00 |
| ML-DSA-87 verify | 319539 cycles | 319380 cycles | 1.00 |
oqs-bot left a comment
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 55876 cycles | 55942 cycles | 1.00 |
| ML-DSA-44 sign | 181515 cycles | 181646 cycles | 1.00 |
| ML-DSA-44 verify | 61058 cycles | 61087 cycles | 1.00 |
| ML-DSA-65 keypair | 97628 cycles | 97794 cycles | 1.00 |
| ML-DSA-65 sign | 296968 cycles | 299830 cycles | 0.99 |
| ML-DSA-65 verify | 99998 cycles | 100506 cycles | 0.99 |
| ML-DSA-87 keypair | 164989 cycles | 155883 cycles | 1.06 |
| ML-DSA-87 sign | 362613 cycles | 358927 cycles | 1.01 |
| ML-DSA-87 verify | 160957 cycles | 156304 cycles | 1.03 |
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-87 keypair | 164989 cycles | 155883 cycles | 1.06 |
oqs-bot left a comment
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 134474 cycles | 134298 cycles | 1.00 |
| ML-DSA-44 sign | 524946 cycles | 525354 cycles | 1.00 |
| ML-DSA-44 verify | 146921 cycles | 147059 cycles | 1.00 |
| ML-DSA-65 keypair | 225592 cycles | 226152 cycles | 1.00 |
| ML-DSA-65 sign | 848197 cycles | 847920 cycles | 1.00 |
| ML-DSA-65 verify | 234858 cycles | 234621 cycles | 1.00 |
| ML-DSA-87 keypair | 370606 cycles | 370244 cycles | 1.00 |
| ML-DSA-87 sign | 1069071 cycles | 1069642 cycles | 1.00 |
| ML-DSA-87 verify | 382838 cycles | 383256 cycles | 1.00 |
oqs-bot left a comment
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 40937 cycles | 41700 cycles | 0.98 |
| ML-DSA-44 sign | 134014 cycles | 133791 cycles | 1.00 |
| ML-DSA-44 verify | 44304 cycles | 44598 cycles | 0.99 |
| ML-DSA-65 keypair | 71745 cycles | 73806 cycles | 0.97 |
| ML-DSA-65 sign | 213973 cycles | 217561 cycles | 0.98 |
| ML-DSA-65 verify | 74133 cycles | 74019 cycles | 1.00 |
| ML-DSA-87 keypair | 112631 cycles | 108217 cycles | 1.04 |
| ML-DSA-87 sign | 254435 cycles | 249678 cycles | 1.02 |
| ML-DSA-87 verify | 115420 cycles | 109668 cycles | 1.05 |
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-87 keypair | 112631 cycles | 108217 cycles | 1.04 |
| ML-DSA-87 verify | 115420 cycles | 109668 cycles | 1.05 |
oqs-bot left a comment
Graviton4
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 67548 cycles | 67519 cycles | 1.00 |
| ML-DSA-44 sign | 203256 cycles | 202994 cycles | 1.00 |
| ML-DSA-44 verify | 70708 cycles | 70652 cycles | 1.00 |
| ML-DSA-65 keypair | 120034 cycles | 120143 cycles | 1.00 |
| ML-DSA-65 sign | 330759 cycles | 331159 cycles | 1.00 |
| ML-DSA-65 verify | 117620 cycles | 117687 cycles | 1.00 |
| ML-DSA-87 keypair | 197135 cycles | 197530 cycles | 1.00 |
| ML-DSA-87 sign | 428294 cycles | 429269 cycles | 1.00 |
| ML-DSA-87 verify | 194439 cycles | 194379 cycles | 1.00 |
oqs-bot left a comment
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 156935 cycles | 157262 cycles | 1.00 |
| ML-DSA-44 sign | 546406 cycles | 547323 cycles | 1.00 |
| ML-DSA-44 verify | 168862 cycles | 168995 cycles | 1.00 |
| ML-DSA-65 keypair | 268189 cycles | 269676 cycles | 0.99 |
| ML-DSA-65 sign | 895287 cycles | 899547 cycles | 1.00 |
| ML-DSA-65 verify | 274987 cycles | 275798 cycles | 1.00 |
| ML-DSA-87 keypair | 449412 cycles | 453529 cycles | 0.99 |
| ML-DSA-87 sign | 1159907 cycles | 1165869 cycles | 0.99 |
| ML-DSA-87 verify | 458899 cycles | 463032 cycles | 0.99 |
oqs-bot left a comment
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120172 cycles | 123524 cycles | 0.97 |
| ML-DSA-44 sign | 445464 cycles | 458580 cycles | 0.97 |
| ML-DSA-44 verify | 131023 cycles | 133900 cycles | 0.98 |
| ML-DSA-65 keypair | 204160 cycles | 209892 cycles | 0.97 |
| ML-DSA-65 sign | 722267 cycles | 741074 cycles | 0.97 |
| ML-DSA-65 verify | 210002 cycles | 215575 cycles | 0.97 |
| ML-DSA-87 keypair | 338859 cycles | 345400 cycles | 0.98 |
| ML-DSA-87 sign | 923385 cycles | 936742 cycles | 0.99 |
| ML-DSA-87 verify | 348127 cycles | 352512 cycles | 0.99 |
oqs-bot left a comment
Graviton4 (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128107 cycles | 128004 cycles | 1.00 |
| ML-DSA-44 sign | 446022 cycles | 445772 cycles | 1.00 |
| ML-DSA-44 verify | 138229 cycles | 142108 cycles | 0.97 |
| ML-DSA-65 keypair | 220091 cycles | 219983 cycles | 1.00 |
| ML-DSA-65 sign | 720521 cycles | 721241 cycles | 1.00 |
| ML-DSA-65 verify | 222728 cycles | 223192 cycles | 1.00 |
| ML-DSA-87 keypair | 365646 cycles | 389083 cycles | 0.94 |
| ML-DSA-87 sign | 921270 cycles | 920684 cycles | 1.00 |
| ML-DSA-87 verify | 373447 cycles | 373241 cycles | 1.00 |
oqs-bot left a comment
Graviton3
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71794 cycles | 71681 cycles | 1.00 |
| ML-DSA-44 sign | 213095 cycles | 212955 cycles | 1.00 |
| ML-DSA-44 verify | 75733 cycles | 75718 cycles | 1.00 |
| ML-DSA-65 keypair | 126726 cycles | 126715 cycles | 1.00 |
| ML-DSA-65 sign | 348264 cycles | 348747 cycles | 1.00 |
| ML-DSA-65 verify | 125312 cycles | 125219 cycles | 1.00 |
| ML-DSA-87 keypair | 205241 cycles | 207570 cycles | 0.99 |
| ML-DSA-87 sign | 444871 cycles | 448999 cycles | 0.99 |
| ML-DSA-87 verify | 205500 cycles | 205110 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 211923 cycles | 211637 cycles | 1.00 |
| ML-DSA-44 sign | 758486 cycles | 758326 cycles | 1.00 |
| ML-DSA-44 verify | 228907 cycles | 228658 cycles | 1.00 |
| ML-DSA-65 keypair | 378802 cycles | 378077 cycles | 1.00 |
| ML-DSA-65 sign | 1245502 cycles | 1244464 cycles | 1.00 |
| ML-DSA-65 verify | 370935 cycles | 371521 cycles | 1.00 |
| ML-DSA-87 keypair | 602399 cycles | 604346 cycles | 1.00 |
| ML-DSA-87 sign | 1585487 cycles | 1587530 cycles | 1.00 |
| ML-DSA-87 verify | 618042 cycles | 617552 cycles | 1.00 |
oqs-bot left a comment
Graviton3 (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138321 cycles | 138216 cycles | 1.00 |
| ML-DSA-44 sign | 482702 cycles | 482639 cycles | 1.00 |
| ML-DSA-44 verify | 148768 cycles | 156489 cycles | 0.95 |
| ML-DSA-65 keypair | 241490 cycles | 241428 cycles | 1.00 |
| ML-DSA-65 sign | 786734 cycles | 786658 cycles | 1.00 |
| ML-DSA-65 verify | 240649 cycles | 241139 cycles | 1.00 |
| ML-DSA-87 keypair | 395311 cycles | 443622 cycles | 0.89 |
| ML-DSA-87 sign | 1006802 cycles | 1006183 cycles | 1.00 |
| ML-DSA-87 verify | 402947 cycles | 402491 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 458730 cycles | 461159 cycles | 0.99 |
| ML-DSA-44 sign | 2126190 cycles | 2128509 cycles | 1.00 |
| ML-DSA-44 verify | 546750 cycles | 550232 cycles | 0.99 |
| ML-DSA-65 keypair | 771666 cycles | 774829 cycles | 1.00 |
| ML-DSA-65 sign | 3458520 cycles | 3481060 cycles | 0.99 |
| ML-DSA-65 verify | 850160 cycles | 854237 cycles | 1.00 |
| ML-DSA-87 keypair | 1242634 cycles | 1252977 cycles | 0.99 |
| ML-DSA-87 sign | 4261047 cycles | 4312551 cycles | 0.99 |
| ML-DSA-87 verify | 1365314 cycles | 1383952 cycles | 0.99 |
oqs-bot left a comment
Graviton2
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112688 cycles | 112603 cycles | 1.00 |
| ML-DSA-44 sign | 356591 cycles | 356703 cycles | 1.00 |
| ML-DSA-44 verify | 118169 cycles | 118040 cycles | 1.00 |
| ML-DSA-65 keypair | 195228 cycles | 195024 cycles | 1.00 |
| ML-DSA-65 sign | 587818 cycles | 587526 cycles | 1.00 |
| ML-DSA-65 verify | 194638 cycles | 194321 cycles | 1.00 |
| ML-DSA-87 keypair | 321588 cycles | 320970 cycles | 1.00 |
| ML-DSA-87 sign | 753806 cycles | 753974 cycles | 1.00 |
| ML-DSA-87 verify | 319966 cycles | 319985 cycles | 1.00 |
oqs-bot left a comment
Graviton2 (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212209 cycles | 212778 cycles | 1.00 |
| ML-DSA-44 sign | 759219 cycles | 760574 cycles | 1.00 |
| ML-DSA-44 verify | 229397 cycles | 233684 cycles | 0.98 |
| ML-DSA-65 keypair | 378868 cycles | 379190 cycles | 1.00 |
| ML-DSA-65 sign | 1245431 cycles | 1246848 cycles | 1.00 |
| ML-DSA-65 verify | 372168 cycles | 371506 cycles | 1.00 |
| ML-DSA-87 keypair | 646219 cycles | 621559 cycles | 1.04 |
| ML-DSA-87 sign | 1589407 cycles | 1586087 cycles | 1.00 |
| ML-DSA-87 verify | 617945 cycles | 618164 cycles | 1.00 |
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-87 keypair | 646219 cycles | 621559 cycles | 1.04 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 822306 cycles | 822311 cycles | 1.00 |
| ML-DSA-44 sign | 3233057 cycles | 3234193 cycles | 1.00 |
| ML-DSA-44 verify | 921679 cycles | 921704 cycles | 1.00 |
| ML-DSA-65 keypair | 1389889 cycles | 1395328 cycles | 1.00 |
| ML-DSA-65 sign | 5268058 cycles | 5257303 cycles | 1.00 |
| ML-DSA-65 verify | 1472895 cycles | 1473562 cycles | 1.00 |
| ML-DSA-87 keypair | 2298504 cycles | 2301175 cycles | 1.00 |
| ML-DSA-87 sign | 6613086 cycles | 6645214 cycles | 1.00 |
| ML-DSA-87 verify | 2409667 cycles | 2418420 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 220428 cycles | 222878 cycles | 0.99 |
| ML-DSA-44 sign | 604384 cycles | 611922 cycles | 0.99 |
| ML-DSA-44 verify | 217235 cycles | 220194 cycles | 0.99 |
| ML-DSA-65 keypair | 398344 cycles | 387594 cycles | 1.03 |
| ML-DSA-65 sign | 1035800 cycles | 1002001 cycles | 1.03 |
| ML-DSA-65 verify | 382226 cycles | 368900 cycles | 1.04 |
| ML-DSA-87 keypair | 671079 cycles | 657172 cycles | 1.02 |
| ML-DSA-87 sign | 1399402 cycles | 1378779 cycles | 1.01 |
| ML-DSA-87 verify | 638082 cycles | 637541 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-65 sign | 1035800 cycles | 1002001 cycles | 1.03 |
| ML-DSA-65 verify | 382226 cycles | 368900 cycles | 1.04 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: 933ad18 | Previous: 7cc8dc1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 304623 cycles | 308136 cycles | 0.99 |
| ML-DSA-44 sign | 1180931 cycles | 1205985 cycles | 0.98 |
| ML-DSA-44 verify | 345876 cycles | 341143 cycles | 1.01 |
| ML-DSA-65 keypair | 572371 cycles | 583043 cycles | 0.98 |
| ML-DSA-65 sign | 1962850 cycles | 2004177 cycles | 0.98 |
| ML-DSA-65 verify | 544392 cycles | 568318 cycles | 0.96 |
| ML-DSA-87 keypair | 888067 cycles | 916596 cycles | 0.97 |
| ML-DSA-87 sign | 2444956 cycles | 2554112 cycles | 0.96 |
| ML-DSA-87 verify | 896704 cycles | 920344 cycles | 0.97 |
Ports pq-code-package/mlkem-native#1654
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Replace the row-level matrix buffer (mld_polyvecl) with a single-poly
buffer in REDUCE_RAM mode. In the lazy path, matrix elements A[k][l]
are sampled on demand one at a time, and the matrix-vector product
accumulates element-by-element instead of row-by-row.
Restructure polymat into eager/lazy variants following the same pattern
as s1hat/s2hat/t0hat:
implementations with CBMC contracts only on the eager variants
Move all polymat-related code from polyvec.h/polyvec.c into
polyvec_lazy.h/polyvec_lazy.c.
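The element-by-element accumulation described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from this PR: the names `toy_poly`, `toy_sample_element`, `toy_pointwise_acc`, `toy_matvec_eager`, `toy_matvec_lazy` and `toy_lazy_matches_eager` are hypothetical, the dimensions are shrunk to 2x3 with 4 coefficients per polynomial, the sampler is a deterministic stand-in for expanding A[k][l] from a seed, and the multiplication is a plain modular product rather than the Montgomery arithmetic a real implementation would likely use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy parameters, chosen for illustration only (hypothetical). */
#define TOY_N 4       /* coefficients per polynomial */
#define TOY_K 2       /* matrix rows */
#define TOY_L 3       /* matrix columns */
#define TOY_Q 8380417 /* ML-DSA modulus */

typedef struct
{
  int32_t coeffs[TOY_N];
} toy_poly;

/* Stand-in for sampling A[k][l] from a seed: a deterministic function of
 * (k, l), so the same element can be regenerated on demand. */
static void toy_sample_element(toy_poly *a, unsigned k, unsigned l)
{
  unsigned i;
  for (i = 0; i < TOY_N; i++)
  {
    a->coeffs[i] =
        (int32_t)((1000003u * (k + 1) + 7919u * (l + 1) + 31u * i) % TOY_Q);
  }
}

/* acc += a * b coefficient-wise, reduced into [0, TOY_Q). */
static void toy_pointwise_acc(toy_poly *acc, const toy_poly *a,
                              const toy_poly *b)
{
  unsigned i;
  for (i = 0; i < TOY_N; i++)
  {
    int64_t t;
    assert(acc->coeffs[i] >= 0 && acc->coeffs[i] < TOY_Q);
    t = (int64_t)acc->coeffs[i] + (int64_t)a->coeffs[i] * b->coeffs[i];
    acc->coeffs[i] = (int32_t)(t % TOY_Q);
  }
}

/* Eager variant: expand the whole matrix first (TOY_K * TOY_L polys). */
static void toy_matvec_eager(toy_poly w[TOY_K], const toy_poly v[TOY_L])
{
  toy_poly mat[TOY_K][TOY_L];
  unsigned k, l;
  for (k = 0; k < TOY_K; k++)
  {
    for (l = 0; l < TOY_L; l++)
    {
      toy_sample_element(&mat[k][l], k, l);
    }
  }
  for (k = 0; k < TOY_K; k++)
  {
    memset(&w[k], 0, sizeof(toy_poly));
    for (l = 0; l < TOY_L; l++)
    {
      toy_pointwise_acc(&w[k], &mat[k][l], &v[l]);
    }
  }
}

/* Lazy variant: one poly-sized buffer, refilled for every A[k][l]. */
static void toy_matvec_lazy(toy_poly w[TOY_K], const toy_poly v[TOY_L])
{
  toy_poly a; /* the only matrix storage */
  unsigned k, l;
  for (k = 0; k < TOY_K; k++)
  {
    memset(&w[k], 0, sizeof(toy_poly));
    for (l = 0; l < TOY_L; l++)
    {
      toy_sample_element(&a, k, l);
      toy_pointwise_acc(&w[k], &a, &v[l]);
    }
  }
}

/* Returns 1 if both variants compute the same product, 0 otherwise. */
int toy_lazy_matches_eager(void)
{
  toy_poly v[TOY_L], w_eager[TOY_K], w_lazy[TOY_K];
  unsigned l, i;
  for (l = 0; l < TOY_L; l++)
  {
    for (i = 0; i < TOY_N; i++)
    {
      v[l].coeffs[i] = (int32_t)(100u * l + i + 1u);
    }
  }
  toy_matvec_eager(w_eager, v);
  toy_matvec_lazy(w_lazy, v);
  return memcmp(w_eager, w_lazy, sizeof(w_eager)) == 0;
}
```

Calling `toy_lazy_matches_eager()` from a test harness checks that the two variants agree. In this model the lazy variant trades a matrix (or row) buffer for a single polynomial buffer, at the price of accumulating one element at a time instead of using a vector-vector product per row, which is the trade-off raised in the review comment.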