Conversation

@gmiodice (Contributor) commented Jul 8, 2025

  • Initial prototype to enable fp16 iGEMM with SME2 in conv2d

gmiodice added 2 commits July 8, 2025 16:10
- Initial prototype to enable fp16 iGEMM with SME2 in conv2d

Signed-off-by: Gian Marco Iodice <[email protected]>
Signed-off-by: Gian Marco Iodice <[email protected]>
#if XNN_ENABLE_KLEIDIAI
assert(kr == 2);

return kai_get_lhs_packed_offset_lhs_imatmul_pack_x16p2vlx2_x16p_sme(
Contributor:

This appears to be an SME kernel, so SME2 is not required?
If so, we have emulator support for SME that could test this function with SME, but not SME2.
Is there a unit test we can run for the x16-pack-lh microkernels?

Collaborator:

This is the KleidiAI naming convention, so not something that needs to be addressed here.

#endif // XNN_ENABLE_KLEIDIAI
}

size_t xnn_x16_pack_lh_offset__igemm_neonsme2(size_t m, size_t kc, size_t ks,
Contributor:

`neon` should be removed from these function/file ISA names. In this case, the function it calls is `_sme`, so cpuinfo should detect SME (1.0), and typically the compiler would need to support SME, but not SME2, to build the microkernels.

Collaborator:

This is OK as long as it follows the naming convention for the other pack-lh kernels, which it does.

}
}

void transpose_weights_x16(const uint16_t* in, uint16_t* out, size_t height,
Contributor:

Is this 'x16' meant to be 16 bits per element? If so, it should be at the start of the function/file name: x16_transpose_weights().
Unless you're proposing a public API for transpose, this should be static or inlined into the calling function.
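
A minimal sketch of the suggested change, assuming the fourth parameter is named width and the body is a plain row-major transpose (neither is shown in the snippet above):

#include <stddef.h>
#include <stdint.h>

// Internal-only helper: static keeps it out of the exported symbol set.
static void transpose_weights_x16(const uint16_t* in, uint16_t* out,
                                  size_t height, size_t width) {
  for (size_t y = 0; y < height; ++y) {
    for (size_t x = 0; x < width; ++x) {
      out[x * height + y] = in[y * width + x];
    }
  }
}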

Collaborator:

I think this function is meant to be internal-only.

Yeah, that seemed to be the intention, so I have made it static.

src/subgraph.c (outdated)

case xnn_node_type_convolution_2d:
case xnn_node_type_deconvolution_2d: {

Contributor:

remove blank line

Collaborator:

Please run clang-format using this project's config on all modified files.

clang-format was run against all the files in the latest push.

size_t mr_packed, size_t kr,
size_t sr) {
#if XNN_ENABLE_KLEIDIAI
assert(kr == 2);
Contributor:

The kai function appears to be for x16 (NR=16)?
I see kai struggles with the overuse of 'x' just like XNNPACK :-)
One of the x16s appears to be bits per element; in XNNPACK that goes first in the name.
The x in pack functions is from MRxNR, but MR is not applicable, so x16 is NR=16.
We also used to have x2 meaning unrolled by 2, but renamed that to u2.
Anyway, the kai name implies a specific tiling.
In packw microkernels it is the same: the function/file are specific to a set of parameters (NR, KR, SR).
The function is specialized and expected to fail if called with anything else.
If kai is the same, then beyond the assert for KR=2 (ah, x2 — we would call that c2), the assert should check all applicable parameters, but especially NR.
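
A minimal sketch of the suggested check (the expected sr value below is a placeholder, not taken from the KleidiAI kernel's actual tiling):

#if XNN_ENABLE_KLEIDIAI
  // Assert every tiling parameter this packed layout is specialized for,
  // not just kr; the expected sr here is illustrative only.
  assert(kr == 2);
  assert(sr == 1);
#endif  // XNN_ENABLE_KLEIDIAI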

Collaborator:

KleidiAI has very extensive documentation of its type naming conventions on its GitLab/GitHub pages.

@gonnet self-assigned this Aug 19, 2025
@gonnet (Collaborator) left a comment:

Sorry for the delay in reviewing this!

I think it's really important to get all the microkernel and subgraph tests hooked up to make sure that they all pass.

Comment on lines 250 to 251
x16_igemm_pack_lh_config.log2_input_element_size = 0;
x16_igemm_pack_lh_config.log2_packed_element_size = 0;
Collaborator:

These values should be 1 since fp16 is 1 << 1 bytes.
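
A minimal sketch of the corrected assignments (field and variable names copied from the snippet above):

// An fp16 element is 2 bytes, i.e. 1 << 1, so both log2 sizes are 1.
x16_igemm_pack_lh_config.log2_input_element_size = 1;
x16_igemm_pack_lh_config.log2_packed_element_size = 1;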

Updated to the correct value

return status;
}

enum xnn_status xnn_create_convolution2d_nhwc_pf16(
Collaborator:

This function and xnn_create_convolution2d_nhwc_f16 (below it) differ only in the gemm_config and operator_type.

Could you create a single create_convolution2d_nhwc_f16 that takes the gemm_config and operator_type as arguments, and have this function and xnn_create_convolution2d_nhwc_f16 call it?
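
A minimal, shape-only sketch of the suggested factoring (parameter lists are elided, and the pf16 operator-type and gemm-config names are assumptions, not XNNPACK's confirmed identifiers):

// Single shared creation routine; both public entry points delegate to it.
static enum xnn_status create_convolution2d_nhwc_f16(
    /* ...shared convolution parameters..., */
    const struct xnn_gemm_config* gemm_config,
    enum xnn_operator_type operator_type,
    xnn_operator_t* convolution_op_out);

enum xnn_status xnn_create_convolution2d_nhwc_f16(/* ... */) {
  return create_convolution2d_nhwc_f16(
      /* ..., */ f16_gemm_config, xnn_operator_type_convolution_nhwc_f16,
      convolution_op_out);
}

enum xnn_status xnn_create_convolution2d_nhwc_pf16(/* ... */) {
  return create_convolution2d_nhwc_f16(
      /* ..., */ pf16_gemm_config, xnn_operator_type_convolution_nhwc_pf16,
      convolution_op_out);
}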

Added a single common creation function, create_convolution2d_nhwc_f16, which takes the gemm config and op type as you suggested.

"`XNN_ENABLE_KLEIDIAI`." &&
0);
#endif // XNN_ENABLE_KLEIDIAI
} No newline at end of file
Collaborator:

Please add a final newline.

Newline added in the latest push.

xnn_pqs8_qc8w_igemm_minmax_fp32_ukernel_32x32c4__neonsme2)


#define DECLARE_PF16_F16_PACKED_IGEMM_MINMAX_UKERNEL_FUNCTION(fn_name) \
Collaborator:

Please fix formatting.

size_t k_stride, //
size_t extra_bytes);

XNN_INTERNAL void xnn_pack_kai_f16_conv_goki_w_sme2(
Collaborator:

Please fix formatting.

xnnpack::Buffer<xnn_float16> c_ref(m() * n(), 0, /*extra_bytes=*/{0},
"c_ref");

#if 0
Collaborator:

Does this test pass?

xnn_pack_weights_and_biases_fn pack,
xnn_packed_stride_weights_and_biases_fn packed_stride);

void Test_PF16(xnn_packed_f16_lhs_igemm_ukernel_fn packed_igemm,
Collaborator:

For this function to be called, you need to add your new kernel to the corresponding .yaml files and likely also add a few lines to the test generator script (see e.g. XNNPACK/test/qp8-f32-qc8w-gemm-minmax.cc).

Added the yml files and updated the tests; they should now run and pass.

@dsharlet (Collaborator):

@gmiodice It looks like this needs conflicts resolved

@gonnet @fbarchard can you please follow up? Your comments have been responded to.

xnnpack_unit_test(
    name = "pf16_f16_igemm_minmax_test",
    srcs = [
        "pf16-f16-igemm-minmax.cc",
Collaborator:

This also needs to be added to the corresponding CMakeLists.txt file.

}

size_t xnn_packed_size_kai_f16_conv_goki_w(size_t nc, size_t ks, size_t kc) {
#if XNN_ENABLE_KLEIDIAI
Collaborator:

I think you are already inside an #if XNN_ENABLE_KLEIDIAI, so the #else below is not needed.
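
A simplified sketch of the redundancy being pointed out (kai_based_size is a placeholder, not a real KleidiAI call):

#if XNN_ENABLE_KLEIDIAI  // outer guard around this whole section
size_t xnn_packed_size_kai_f16_conv_goki_w(size_t nc, size_t ks, size_t kc) {
#if XNN_ENABLE_KLEIDIAI  // redundant: the outer guard already holds here
  return kai_based_size(nc, ks, kc);  // placeholder for the real computation
#else  // dead branch, never compiled; drop it along with the inner #if
  return 0;
#endif
}
#endif  // XNN_ENABLE_KLEIDIAI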

True, removed the additional #if


for (size_t g_idx = 0; g_idx < g; ++g_idx) {
transpose_weights_x16(k, tmp_data, nc, ks * kc);
kai_run_rhs_imatmul_pack_kxn_x16p2vlx2b_x16_x16_sme(
Collaborator:

The answer to both would really be a new function variant, which would require the KleidiAI team to implement new function versions and provide them in a release. The majority of their function signatures are similar in terms of the bias, so I suspect they would hesitate to remove it.

Would it be sufficient for the function to handle b == NULL as if it were full of zeros? That would not require a change to the API.
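
A minimal sketch of that idea, with hypothetical names (the real KleidiAI packing entry point takes the bias alongside the weights):

#include <stdint.h>
#include <string.h>

// If the caller passes no bias, substitute zeros from a scratch buffer so
// the packing routine's signature does not have to change. An fp16 zero is
// all-zero bits, so memset is sufficient.
static const uint16_t* bias_or_zeros(const uint16_t* bias, size_t nc,
                                     uint16_t* scratch) {
  if (bias != NULL) {
    return bias;
  }
  memset(scratch, 0, nc * sizeof(uint16_t));
  return scratch;
}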

As for the transpose: this variant of the pack kernel is kxn; if we had an nxk version, we could remove the manual transpose, but for right now we will need it.

OK, could you add a TODO to remind us? Thanks!

@JonathanC-ARM:

Hi @gonnet, I have picked up this work from @gmiodice. I've created a new PR with the updates requested above in the latest commit #9005
