Update AVX2 decompose to use a more explainable (and very slightly faster) approach, along with bounds reasoning comments. #629
mkannwischer
left a comment
Thanks @jammychiou1.
I agree with the changes you made - it's easier to explain.
Please also remove the check-magic annotations concerning your comments and instead add these constants to the whitelist.
5082756 to fa07d1f
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46416 cycles | 46421 cycles | 1.00 |
| ML-DSA-44 sign | 132718 cycles | 132738 cycles | 1.00 |
| ML-DSA-44 verify | 47837 cycles | 47840 cycles | 1.00 |
| ML-DSA-65 keypair | 81452 cycles | 81443 cycles | 1.00 |
| ML-DSA-65 sign | 219217 cycles | 219207 cycles | 1.00 |
| ML-DSA-65 verify | 80136 cycles | 80134 cycles | 1.00 |
| ML-DSA-87 keypair | 132753 cycles | 132758 cycles | 1.00 |
| ML-DSA-87 sign | 280934 cycles | 280953 cycles | 1.00 |
| ML-DSA-87 verify | 130316 cycles | 130326 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115272 cycles | 115262 cycles | 1.00 |
| ML-DSA-44 sign | 431782 cycles | 431720 cycles | 1.00 |
| ML-DSA-44 verify | 122176 cycles | 122167 cycles | 1.00 |
| ML-DSA-65 keypair | 197436 cycles | 197490 cycles | 1.00 |
| ML-DSA-65 sign | 700971 cycles | 701274 cycles | 1.00 |
| ML-DSA-65 verify | 197693 cycles | 197702 cycles | 1.00 |
| ML-DSA-87 keypair | 325389 cycles | 325412 cycles | 1.00 |
| ML-DSA-87 sign | 884468 cycles | 884484 cycles | 1.00 |
| ML-DSA-87 verify | 328634 cycles | 328655 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115718 cycles | 115652 cycles | 1.00 |
| ML-DSA-44 sign | 377201 cycles | 377357 cycles | 1.00 |
| ML-DSA-44 verify | 120344 cycles | 120215 cycles | 1.00 |
| ML-DSA-65 keypair | 200127 cycles | 200073 cycles | 1.00 |
| ML-DSA-65 sign | 623016 cycles | 622766 cycles | 1.00 |
| ML-DSA-65 verify | 198223 cycles | 198195 cycles | 1.00 |
| ML-DSA-87 keypair | 327615 cycles | 326756 cycles | 1.00 |
| ML-DSA-87 sign | 791103 cycles | 789971 cycles | 1.00 |
| ML-DSA-87 verify | 325264 cycles | 324409 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 35631 cycles | 35116 cycles | 1.01 |
| ML-DSA-44 sign | 120705 cycles | 120958 cycles | 1.00 |
| ML-DSA-44 verify | 38074 cycles | 38274 cycles | 0.99 |
| ML-DSA-65 keypair | 61818 cycles | 62757 cycles | 0.99 |
| ML-DSA-65 sign | 199188 cycles | 201252 cycles | 0.99 |
| ML-DSA-65 verify | 62198 cycles | 62387 cycles | 1.00 |
| ML-DSA-87 keypair | 94415 cycles | 94461 cycles | 1.00 |
| ML-DSA-87 sign | 230678 cycles | 230993 cycles | 1.00 |
| ML-DSA-87 verify | 94054 cycles | 95279 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 95091 cycles | 95412 cycles | 1.00 |
| ML-DSA-44 sign | 349043 cycles | 349579 cycles | 1.00 |
| ML-DSA-44 verify | 100848 cycles | 101012 cycles | 1.00 |
| ML-DSA-65 keypair | 165116 cycles | 165049 cycles | 1.00 |
| ML-DSA-65 sign | 566948 cycles | 567954 cycles | 1.00 |
| ML-DSA-65 verify | 165483 cycles | 165700 cycles | 1.00 |
| ML-DSA-87 keypair | 267238 cycles | 267808 cycles | 1.00 |
| ML-DSA-87 sign | 723156 cycles | 723827 cycles | 1.00 |
| ML-DSA-87 verify | 272344 cycles | 272309 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 227025 cycles | 233223 cycles | 0.97 |
| ML-DSA-44 sign | 656604 cycles | 673055 cycles | 0.98 |
| ML-DSA-44 verify | 226548 cycles | 231292 cycles | 0.98 |
| ML-DSA-65 keypair | 399858 cycles | 399604 cycles | 1.00 |
| ML-DSA-65 sign | 1093277 cycles | 1092503 cycles | 1.00 |
| ML-DSA-65 verify | 382610 cycles | 378979 cycles | 1.01 |
| ML-DSA-87 keypair | 668662 cycles | 662585 cycles | 1.01 |
| ML-DSA-87 sign | 1457596 cycles | 1442394 cycles | 1.01 |
| ML-DSA-87 verify | 632700 cycles | 631363 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 214068 cycles | 213795 cycles | 1.00 |
| ML-DSA-44 sign | 781499 cycles | 782133 cycles | 1.00 |
| ML-DSA-44 verify | 230065 cycles | 230257 cycles | 1.00 |
| ML-DSA-65 keypair | 385084 cycles | 385239 cycles | 1.00 |
| ML-DSA-65 sign | 1326386 cycles | 1314084 cycles | 1.01 |
| ML-DSA-65 verify | 375339 cycles | 375765 cycles | 1.00 |
| ML-DSA-87 keypair | 606587 cycles | 606848 cycles | 1.00 |
| ML-DSA-87 sign | 1621233 cycles | 1623082 cycles | 1.00 |
| ML-DSA-87 verify | 617288 cycles | 617742 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69198 cycles | 69604 cycles | 0.99 |
| ML-DSA-44 sign | 184949 cycles | 187462 cycles | 0.99 |
| ML-DSA-44 verify | 69047 cycles | 69269 cycles | 1.00 |
| ML-DSA-65 keypair | 120248 cycles | 119917 cycles | 1.00 |
| ML-DSA-65 sign | 295658 cycles | 297151 cycles | 0.99 |
| ML-DSA-65 verify | 115575 cycles | 115546 cycles | 1.00 |
| ML-DSA-87 keypair | 202548 cycles | 202342 cycles | 1.00 |
| ML-DSA-87 sign | 386766 cycles | 386965 cycles | 1.00 |
| ML-DSA-87 verify | 193569 cycles | 193643 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 57336 cycles | 56956 cycles | 1.01 |
| ML-DSA-44 sign | 179376 cycles | 180499 cycles | 0.99 |
| ML-DSA-44 verify | 60900 cycles | 61247 cycles | 0.99 |
| ML-DSA-65 keypair | 99751 cycles | 99457 cycles | 1.00 |
| ML-DSA-65 sign | 296170 cycles | 296461 cycles | 1.00 |
| ML-DSA-65 verify | 99941 cycles | 100169 cycles | 1.00 |
| ML-DSA-87 keypair | 153766 cycles | 154195 cycles | 1.00 |
| ML-DSA-87 sign | 352782 cycles | 352935 cycles | 1.00 |
| ML-DSA-87 verify | 152815 cycles | 153194 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton2
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 116044 cycles | 116677 cycles | 0.99 |
| ML-DSA-44 sign | 377724 cycles | 379707 cycles | 0.99 |
| ML-DSA-44 verify | 120646 cycles | 121174 cycles | 1.00 |
| ML-DSA-65 keypair | 200451 cycles | 200327 cycles | 1.00 |
| ML-DSA-65 sign | 623509 cycles | 623378 cycles | 1.00 |
| ML-DSA-65 verify | 198593 cycles | 198489 cycles | 1.00 |
| ML-DSA-87 keypair | 328191 cycles | 327340 cycles | 1.00 |
| ML-DSA-87 sign | 792035 cycles | 790697 cycles | 1.00 |
| ML-DSA-87 verify | 325645 cycles | 324830 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42462 cycles | 42166 cycles | 1.01 |
| ML-DSA-44 sign | 129986 cycles | 130558 cycles | 1.00 |
| ML-DSA-44 verify | 44008 cycles | 44242 cycles | 0.99 |
| ML-DSA-65 keypair | 72320 cycles | 72946 cycles | 0.99 |
| ML-DSA-65 sign | 210845 cycles | 210881 cycles | 1.00 |
| ML-DSA-65 verify | 72922 cycles | 72765 cycles | 1.00 |
| ML-DSA-87 keypair | 109431 cycles | 111282 cycles | 0.98 |
| ML-DSA-87 sign | 248355 cycles | 252306 cycles | 0.98 |
| ML-DSA-87 verify | 109568 cycles | 110921 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135055 cycles | 136238 cycles | 0.99 |
| ML-DSA-44 sign | 539959 cycles | 544574 cycles | 0.99 |
| ML-DSA-44 verify | 148401 cycles | 149362 cycles | 0.99 |
| ML-DSA-65 keypair | 228535 cycles | 230475 cycles | 0.99 |
| ML-DSA-65 sign | 893053 cycles | 895163 cycles | 1.00 |
| ML-DSA-65 verify | 238247 cycles | 239847 cycles | 0.99 |
| ML-DSA-87 keypair | 373776 cycles | 376850 cycles | 0.99 |
| ML-DSA-87 sign | 1108012 cycles | 1112531 cycles | 1.00 |
| ML-DSA-87 verify | 387508 cycles | 389978 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton3
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 74308 cycles | 74255 cycles | 1.00 |
| ML-DSA-44 sign | 228603 cycles | 228755 cycles | 1.00 |
| ML-DSA-44 verify | 78250 cycles | 78127 cycles | 1.00 |
| ML-DSA-65 keypair | 130496 cycles | 130420 cycles | 1.00 |
| ML-DSA-65 sign | 378316 cycles | 378291 cycles | 1.00 |
| ML-DSA-65 verify | 129294 cycles | 129164 cycles | 1.00 |
| ML-DSA-87 keypair | 209590 cycles | 211688 cycles | 0.99 |
| ML-DSA-87 sign | 479315 cycles | 479661 cycles | 1.00 |
| ML-DSA-87 verify | 208641 cycles | 210182 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton4
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 70000 cycles | 69864 cycles | 1.00 |
| ML-DSA-44 sign | 214918 cycles | 215244 cycles | 1.00 |
| ML-DSA-44 verify | 72777 cycles | 72692 cycles | 1.00 |
| ML-DSA-65 keypair | 124033 cycles | 123579 cycles | 1.00 |
| ML-DSA-65 sign | 353286 cycles | 353468 cycles | 1.00 |
| ML-DSA-65 verify | 120824 cycles | 120718 cycles | 1.00 |
| ML-DSA-87 keypair | 202214 cycles | 201648 cycles | 1.00 |
| ML-DSA-87 sign | 451358 cycles | 451997 cycles | 1.00 |
| ML-DSA-87 verify | 198404 cycles | 198649 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 159164 cycles | 157982 cycles | 1.01 |
| ML-DSA-44 sign | 569001 cycles | 567777 cycles | 1.00 |
| ML-DSA-44 verify | 170753 cycles | 169763 cycles | 1.01 |
| ML-DSA-65 keypair | 271354 cycles | 271455 cycles | 1.00 |
| ML-DSA-65 sign | 926525 cycles | 925734 cycles | 1.00 |
| ML-DSA-65 verify | 275640 cycles | 275498 cycles | 1.00 |
| ML-DSA-87 keypair | 451014 cycles | 451543 cycles | 1.00 |
| ML-DSA-87 sign | 1182715 cycles | 1183249 cycles | 1.00 |
| ML-DSA-87 verify | 460835 cycles | 460624 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120143 cycles | 120533 cycles | 1.00 |
| ML-DSA-44 sign | 453777 cycles | 456002 cycles | 1.00 |
| ML-DSA-44 verify | 130320 cycles | 132129 cycles | 0.99 |
| ML-DSA-65 keypair | 204830 cycles | 207895 cycles | 0.99 |
| ML-DSA-65 sign | 732904 cycles | 742729 cycles | 0.99 |
| ML-DSA-65 verify | 209363 cycles | 211005 cycles | 0.99 |
| ML-DSA-87 keypair | 337665 cycles | 337927 cycles | 1.00 |
| ML-DSA-87 sign | 923416 cycles | 923041 cycles | 1.00 |
| ML-DSA-87 verify | 344913 cycles | 345844 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton2 (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 214842 cycles | 214202 cycles | 1.00 |
| ML-DSA-44 sign | 782554 cycles | 794999 cycles | 0.98 |
| ML-DSA-44 verify | 230631 cycles | 229962 cycles | 1.00 |
| ML-DSA-65 keypair | 385817 cycles | 385876 cycles | 1.00 |
| ML-DSA-65 sign | 1310148 cycles | 1307768 cycles | 1.00 |
| ML-DSA-65 verify | 376009 cycles | 376256 cycles | 1.00 |
| ML-DSA-87 keypair | 607294 cycles | 607000 cycles | 1.00 |
| ML-DSA-87 sign | 1624685 cycles | 1625772 cycles | 1.00 |
| ML-DSA-87 verify | 617770 cycles | 617491 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton3 (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138816 cycles | 138783 cycles | 1.00 |
| ML-DSA-44 sign | 493083 cycles | 493854 cycles | 1.00 |
| ML-DSA-44 verify | 148367 cycles | 148389 cycles | 1.00 |
| ML-DSA-65 keypair | 242529 cycles | 242264 cycles | 1.00 |
| ML-DSA-65 sign | 809972 cycles | 809969 cycles | 1.00 |
| ML-DSA-65 verify | 240719 cycles | 240614 cycles | 1.00 |
| ML-DSA-87 keypair | 396675 cycles | 396621 cycles | 1.00 |
| ML-DSA-87 sign | 1027482 cycles | 1027277 cycles | 1.00 |
| ML-DSA-87 verify | 401597 cycles | 401369 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton4 (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 133144 cycles | 133258 cycles | 1.00 |
| ML-DSA-44 sign | 498479 cycles | 498179 cycles | 1.00 |
| ML-DSA-44 verify | 144897 cycles | 144918 cycles | 1.00 |
| ML-DSA-65 keypair | 227070 cycles | 226755 cycles | 1.00 |
| ML-DSA-65 sign | 812705 cycles | 812078 cycles | 1.00 |
| ML-DSA-65 verify | 231517 cycles | 231580 cycles | 1.00 |
| ML-DSA-87 keypair | 374798 cycles | 375108 cycles | 1.00 |
| ML-DSA-87 sign | 1020759 cycles | 1020839 cycles | 1.00 |
| ML-DSA-87 verify | 383690 cycles | 383524 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 289514 cycles | 290746 cycles | 1.00 |
| ML-DSA-44 sign | 930458 cycles | 937533 cycles | 0.99 |
| ML-DSA-44 verify | 291385 cycles | 291943 cycles | 1.00 |
| ML-DSA-65 keypair | 491840 cycles | 493090 cycles | 1.00 |
| ML-DSA-65 sign | 1538201 cycles | 1526359 cycles | 1.01 |
| ML-DSA-65 verify | 477106 cycles | 476058 cycles | 1.00 |
| ML-DSA-87 keypair | 833556 cycles | 843754 cycles | 0.99 |
| ML-DSA-87 sign | 2048886 cycles | 2088455 cycles | 0.98 |
| ML-DSA-87 verify | 813904 cycles | 818519 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 309445 cycles | 304061 cycles | 1.02 |
| ML-DSA-44 sign | 1226595 cycles | 1204370 cycles | 1.02 |
| ML-DSA-44 verify | 347072 cycles | 331394 cycles | 1.05 |
| ML-DSA-65 keypair | 574923 cycles | 577955 cycles | 0.99 |
| ML-DSA-65 sign | 2020807 cycles | 1998603 cycles | 1.01 |
| ML-DSA-65 verify | 550026 cycles | 552006 cycles | 1.00 |
| ML-DSA-87 keypair | 869120 cycles | 870486 cycles | 1.00 |
| ML-DSA-87 sign | 2508517 cycles | 2493534 cycles | 1.01 |
| ML-DSA-87 verify | 894364 cycles | 896546 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: fa07d1f | Previous: 100e446 | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 347072 cycles | 331394 cycles | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
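For readers checking the alert by hand, the sketch below recomputes the flagged ratio from the two cycle counts in the table. It assumes the ratio is current divided by previous and that the alert fires when the ratio is strictly greater than the threshold; the helper names `cycle_ratio` and `exceeds_threshold` are hypothetical and are not part of github-action-benchmark.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ratio of the current cycle count to the previous one.
 * Assumption: lower cycle counts are better, so a ratio above 1 is a slowdown. */
static double cycle_ratio(uint64_t current, uint64_t previous) {
  return (double)current / (double)previous;
}

/* Assumption: the alert fires when the ratio is strictly above the threshold. */
static bool exceeds_threshold(uint64_t current, uint64_t previous,
                              double threshold) {
  return cycle_ratio(current, previous) > threshold;
}
```

With the values above, 347072 / 331394 is roughly 1.047, which is shown as 1.05 in the table and lies above the 1.03 threshold. The ML-DSA-44 sign row of the same run (1226595 against 1204370, roughly 1.018) stays below it.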
mkannwischer
left a comment
Thanks @jammychiou1. I am good with the changes now.
WDYT @hanno-becker?
hanno-becker
left a comment
Overall LGTM and really helps understanding the code, thank you @jammychiou1. A few smaller change requests, see comments.
db2e5c8 to 53abde1
53abde1 to d96ec5d
Thank you @hanno-becker for your suggestions (and your wonderful work on #659)! Please let me know if there is anything else to change. If not, I'll clean up the commit history to prepare for the merge into main.
The new approach is adapted from our Neon implementation. See <#411 (comment)> for more information on the idea. Bounds reasoning comments are also added. Signed-off-by: jammychiou1 <[email protected]>
Edit some comments while we're at it. Signed-off-by: jammychiou1 <[email protected]>
Signed-off-by: jammychiou1 <[email protected]>
Signed-off-by: jammychiou1 <[email protected]>
Signed-off-by: jammychiou1 <[email protected]>
Signed-off-by: jammychiou1 <[email protected]>
d96ec5d to b79682b
hanno-becker
left a comment
LGTM, Thank you @jammychiou1
This approach was suggested by Hanno Becker in #411 (comment), when we implemented the same function in AArch64.
The speedup is barely noticeable, less than 50 cycles per call on my laptop.
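As background on what the function computes, here is a minimal scalar sketch of decompose, written directly from the definition in FIPS 204 rather than taken from this PR. It is not the AVX2 code under review and does not use the faster approach discussed in #411; the names `mod_pm` and `decompose_ref` are hypothetical. It assumes the input is already reduced to [0, q) and that `gamma2` is (q-1)/88 or (q-1)/32.

```c
#include <stdint.h>

#define Q 8380417 /* ML-DSA modulus q = 2^23 - 2^13 + 1 */

/* Centered remainder of r modulo an even alpha, in (-alpha/2, alpha/2].
 * Assumes r >= 0, so the C remainder is already in [0, alpha). */
static int32_t mod_pm(int32_t r, int32_t alpha) {
  int32_t t = r % alpha;
  if (t > alpha / 2) {
    t -= alpha;
  }
  return t;
}

/* Split a in [0, Q) into a high part a1 (returned) and a low part *a0
 * such that a = a1 * 2 * gamma2 + *a0 (mod Q). */
static int32_t decompose_ref(int32_t *a0, int32_t a, int32_t gamma2) {
  int32_t r0 = mod_pm(a, 2 * gamma2);
  int32_t r1;
  if (a - r0 == Q - 1) {
    /* Wrap-around case: the high part would be (Q-1)/(2*gamma2),
     * which is mapped to 0, and the low part is decremented by 1. */
    r1 = 0;
    r0 -= 1;
  } else {
    r1 = (a - r0) / (2 * gamma2);
  }
  *a0 = r0;
  return r1;
}
```

A vectorized implementation has to produce the same outputs without a per-lane branch or division, which is where bounds reasoning on intermediate values becomes necessary.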