
Commit dfacb88

Remark differences between REF_AVX2's approach for decompose and ours
Signed-off-by: jammychiou1 <[email protected]>
1 parent 2335f16 commit dfacb88


2 files changed: +24 -0 lines


dev/x86_64/src/poly_decompose_32_avx2.c

Lines changed: 12 additions & 0 deletions
@@ -86,6 +86,18 @@ void mld_poly_decompose_32_avx2(__m256i *a1, __m256i *a0, const __m256i *a)
  * If f1 = 16, i.e. f > 31*GAMMA2, proceed as if f' = f - Q was given
  * instead. (For f = 31*GAMMA2 + 1 thus f' = -GAMMA2, we still round it to 0
  * like other "wrapped around" cases.)
+ *
+ * Reference: They handle wrap-around in a somewhat convoluted way. Most
+ * notably, they compute the remainder f0 from a quotient f1 that has
+ * already been wrapped around, so f0 is off by q (instead of by 1) from
+ * its final value. They detect the need for correction by checking
+ * whether f0 is abnormally large.
+ *
+ * Our approach is closer to Algorithm 36 in the specification, in that
+ * we compute f0 normally and correct f1 and f0 in the way it
+ * prescribes. The only real difference is that we check for
+ * wrap-around by examining f directly, instead of intermediates
+ * computed from it.
  */
 
  /* Check for wrap-around */
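To make the comparison concrete, here is a minimal scalar sketch for the GAMMA2 = (Q-1)/32 parameter set (Q = 8380417). It is not the AVX2 code of either implementation: the helper names `raw_quotient`, `decompose_alg36`, `decompose_ours`, `decompose_ref_style` and `count_mismatches` are hypothetical, and plain integer division stands in for whatever multiply-and-shift arithmetic the real code uses. The three variants are a literal transcription of Algorithm 36, the check on f described above, and the "wrap f1 first, then fix an oversized f0" strategy attributed to the reference. An exhaustive loop over 0 <= f < Q is expected to report zero disagreements.

```c
#include <stdint.h>

/* ML-DSA modulus and the GAMMA2 = (Q-1)/32 parameter set. */
#define Q 8380417
#define GAMMA2 ((Q - 1) / 32)

/* Unwrapped quotient f1 = ceil((f - GAMMA2) / (2*GAMMA2)) for 0 <= f < Q,
 * chosen so that f0 = f - 2*GAMMA2*f1 lies in (-GAMMA2, GAMMA2].
 * It ranges over 0..16 and reaches 16 exactly when f > 31*GAMMA2. */
static int32_t raw_quotient(int32_t f) {
  return (f + GAMMA2 - 1) / (2 * GAMMA2);
}

/* Algorithm 36 (Decompose), transcribed literally. */
static int32_t decompose_alg36(int32_t *f0, int32_t f) {
  int32_t f1 = raw_quotient(f);
  *f0 = f - 2 * GAMMA2 * f1;
  if (f - *f0 == Q - 1) { /* r - r0 == q - 1 */
    f1 = 0;
    *f0 -= 1;
  }
  return f1;
}

/* Compute f1, f0 normally; detect wrap-around from f itself. */
static int32_t decompose_ours(int32_t *f0, int32_t f) {
  int32_t f1 = raw_quotient(f);
  *f0 = f - 2 * GAMMA2 * f1;
  if (f > 31 * GAMMA2) {
    f1 = 0;
    *f0 -= 1;
  }
  return f1;
}

/* Wrap f1 first (16 -> 0), compute f0 from the wrapped quotient, then
 * subtract Q when f0 comes out abnormally large (> (Q-1)/2). */
static int32_t decompose_ref_style(int32_t *f0, int32_t f) {
  int32_t f1 = raw_quotient(f) & 15;
  *f0 = f - 2 * GAMMA2 * f1;
  if (*f0 > (Q - 1) / 2) {
    *f0 -= Q;
  }
  return f1;
}

/* Number of f in [0, Q) on which the three variants disagree. */
static long count_mismatches(void) {
  long bad = 0;
  for (int32_t f = 0; f < Q; f++) {
    int32_t a0, b0, c0;
    int32_t a1 = decompose_alg36(&a0, f);
    int32_t b1 = decompose_ours(&b0, f);
    int32_t c1 = decompose_ref_style(&c0, f);
    if (a1 != b1 || a0 != b0 || a1 != c1 || a0 != c0) {
      bad++;
    }
  }
  return bad;
}
```

The variants agree because "f > 31*GAMMA2" and "the unwrapped quotient equals 16" are the same condition, and on those inputs the wrapped-quotient remainder equals f, which always exceeds (Q-1)/2.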

dev/x86_64/src/poly_decompose_88_avx2.c

Lines changed: 12 additions & 0 deletions
@@ -87,6 +87,18 @@ void mld_poly_decompose_88_avx2(__m256i *a1, __m256i *a0, const __m256i *a)
  * If f1 = 44, i.e. f > 87*GAMMA2, proceed as if f' = f - Q was given
  * instead. (For f = 87*GAMMA2 + 1 thus f' = -GAMMA2, we still round it to 0
  * like other "wrapped around" cases.)
+ *
+ * Reference: They handle wrap-around in a somewhat convoluted way. Most
+ * notably, they compute the remainder f0 from a quotient f1 that has
+ * already been wrapped around, so f0 is off by q (instead of by 1) from
+ * its final value. They detect the need for correction by checking
+ * whether f0 is abnormally large.
+ *
+ * Our approach is closer to Algorithm 36 in the specification, in that
+ * we compute f0 normally and correct f1 and f0 in the way it
+ * prescribes. The only real difference is that we check for
+ * wrap-around by examining f directly, instead of intermediates
+ * computed from it.
  */
 
  /* Check for wrap-around */
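For the GAMMA2 = (Q-1)/88 parameter set, the "off by q (instead of by 1)" remark can be checked numerically. The following scalar sketch (hypothetical helper names, plain integer division rather than the vectorized arithmetic) computes f0 three ways: from the unwrapped quotient, from a quotient already wrapped from 44 to 0, and the final value per Algorithm 36. It then verifies over every 0 <= f < Q that the first two differ from the final f0 by exactly 1 and Q on wrap-around (f > 87*GAMMA2) and by 0 otherwise, and that the "f0 is abnormally large" test, taken here as f0 > (Q-1)/2, fires on exactly those inputs.

```c
#include <stdint.h>

/* ML-DSA modulus and the GAMMA2 = (Q-1)/88 parameter set. */
#define Q 8380417
#define GAMMA2 ((Q - 1) / 88)

/* Unwrapped quotient f1 = ceil((f - GAMMA2) / (2*GAMMA2)) for 0 <= f < Q;
 * it ranges over 0..44 and reaches 44 exactly when f > 87*GAMMA2. */
static int32_t raw_quotient(int32_t f) {
  return (f + GAMMA2 - 1) / (2 * GAMMA2);
}

/* f0 from the unwrapped quotient: lies in (-GAMMA2, GAMMA2]. */
static int32_t f0_unwrapped(int32_t f) {
  return f - 2 * GAMMA2 * raw_quotient(f);
}

/* f0 from a quotient that has already been wrapped (44 -> 0), before any
 * correction. On wrap-around this is just f. */
static int32_t f0_wrapped(int32_t f) {
  int32_t f1 = raw_quotient(f);
  if (f1 == 44) {
    f1 = 0;
  }
  return f - 2 * GAMMA2 * f1;
}

/* Final f0 as in Algorithm 36, with wrap-around detected from f itself. */
static int32_t f0_final(int32_t f) {
  int32_t f0 = f0_unwrapped(f);
  if (f > 87 * GAMMA2) {
    f0 -= 1;
  }
  return f0;
}

/* Counts violations over f in [0, Q) of: unwrapped f0 is off by 1 only on
 * wrap-around; wrapped f0 is off by Q only on wrap-around; and the test
 * "wrapped f0 > (Q-1)/2" is true exactly on wrap-around. */
static long count_violations(void) {
  long bad = 0;
  for (int32_t f = 0; f < Q; f++) {
    int wraps = f > 87 * GAMMA2;
    int32_t fin = f0_final(f);
    if (f0_unwrapped(f) - fin != (wraps ? 1 : 0)) bad++;
    if (f0_wrapped(f) - fin != (wraps ? Q : 0)) bad++;
    if ((f0_wrapped(f) > (Q - 1) / 2) != wraps) bad++;
  }
  return bad;
}
```

At the boundary f = 87*GAMMA2 + 1, for example, the unwrapped remainder is -GAMMA2 + 1, the wrapped remainder is f itself, and the final value is -GAMMA2, matching the comment's rounding of f' = -GAMMA2 to quotient 0.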
