Remark differences between REF_AVX2's approach for decompose and ours

jammychiou1 · jammychiou1 · commit dfacb88d0dbe · 2025-11-08T15:30:46.000+08:00
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
diff --git a/dev/x86_64/src/poly_decompose_32_avx2.c b/dev/x86_64/src/poly_decompose_32_avx2.c
@@ -86,6 +86,18 @@ void mld_poly_decompose_32_avx2(__m256i *a1, __m256i *a0, const __m256i *a)
      * If f1 = 16, i.e. f > 31*GAMMA2, proceed as if f' = f - Q was given
      * instead. (For f = 31*GAMMA2 + 1 thus f' = -GAMMA2, we still round it to 0
      * like other "wrapped around" cases.)
+     *
+     * Reference: They handle wrap-around in a somewhat convoluted way. Most
+     *            notably, they compute the remainder f0 from a quotient f1
+     *            that has already been wrapped around, so f0 is off by Q
+     *            (instead of by 1) from its final value. They detect the need
+     *            for correction by checking whether f0 is abnormally large.
+     *
+     *            Our approach is closer to Algorithm 36 in the specification,
+     *            in that we compute f0 normally and correct f1, f0 in the way
+     *            the specification prescribes. The only real difference is
+     *            that we detect wrap-around by examining f directly, instead
+     *            of some intermediate value computed from it.
      */
 
     /* Check for wrap-around */
diff --git a/dev/x86_64/src/poly_decompose_88_avx2.c b/dev/x86_64/src/poly_decompose_88_avx2.c
@@ -87,6 +87,18 @@ void mld_poly_decompose_88_avx2(__m256i *a1, __m256i *a0, const __m256i *a)
      * If f1 = 44, i.e. f > 87*GAMMA2, proceed as if f' = f - Q was given
      * instead. (For f = 87*GAMMA2 + 1 thus f' = -GAMMA2, we still round it to 0
      * like other "wrapped around" cases.)
+     *
+     * Reference: They handle wrap-around in a somewhat convoluted way. Most
+     *            notably, they compute the remainder f0 from a quotient f1
+     *            that has already been wrapped around, so f0 is off by Q
+     *            (instead of by 1) from its final value. They detect the need
+     *            for correction by checking whether f0 is abnormally large.
+     *
+     *            Our approach is closer to Algorithm 36 in the specification,
+     *            in that we compute f0 normally and correct f1, f0 in the way
+     *            the specification prescribes. The only real difference is
+     *            that we detect wrap-around by examining f directly, instead
+     *            of some intermediate value computed from it.
      */
 
     /* Check for wrap-around */