
Conversation

@yingxudeng
Collaborator

No description provided.

#elif defined(USE_CUDA)
cuda::act_and_mul(params.output, params.input, params.act_mode);
#else
LOG(FATAL) << "active not implemented";
Collaborator


Remove torch::Tensor active_tensor(ActivationParams& params) and instead add params.output = npu::active(params.input, params.act_mode) here for the NPU device.
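For concreteness, a minimal sketch of the dispatch this suggests, with an NPU branch next to the existing CUDA and fallback branches (the npu::active call is taken from this comment; the USE_NPU macro name is assumed):

#if defined(USE_NPU)
  // NPU kernels allocate their own output tensor and return it.
  params.output = npu::active(params.input, params.act_mode);
#elif defined(USE_CUDA)
  cuda::act_and_mul(params.output, params.input, params.act_mode);
#else
  LOG(FATAL) << "active not implemented";
#endif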

Collaborator Author


auto output = torch::empty(
    {batch_size,
     intermediate_size_ / parallel_args_.tp_group_->world_size()},
    gate_up.options());

This is a good suggestion. However, as described, the calling code still pre-allocates the output tensor. NPU operators typically allocate their own output and return it, so this unavoidable difference still forces the external calling code to use an #if block to skip the allocation specifically for the NPU case.

To standardize the external calling code, I personally recommend aligning with the NPU's behavior: allocate the space within the operator wrapper/layer and then return it. This approach allows for a unified code structure for all external calls.
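A minimal sketch of that allocate-and-return style, reusing the names from the snippets in this thread (the halved last dimension assumes a gated act-and-mul kernel; this is illustrative, not the final implementation):

torch::Tensor active_tensor(ActivationParams& params) {
  // The wrapper owns the allocation: for a gated act-and-mul kernel the
  // output is half as wide as the fused gate/up input.
  auto sizes = params.input.sizes().vec();
  sizes.back() /= 2;
  auto output = torch::empty(sizes, params.input.options());
#if defined(USE_CUDA)
  cuda::act_and_mul(output, params.input, params.act_mode);
#elif defined(USE_NPU)
  output = npu::active(params.input, params.act_mode);  // NPU allocates internally
#endif
  return output;  // callers no longer pre-allocate per backend
}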

Collaborator


So don't add these two functions, active_tensor and fused_layernorm_tensor, to ops_api.h, because no other platform will use such an API. Put them in npu_ops_api.h and call them directly in the NPU layer.
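A sketch of that split, using the function names from this thread (the namespace and includes are placeholders, not the actual layout):

// npu_ops_api.h -- NPU-only entry points, kept out of ops_api.h
#pragma once
#include <torch/torch.h>
// assumes the ActivationParams / FusedLayerNormParams structs are visible here
namespace npu {
torch::Tensor active_tensor(ActivationParams& params);
torch::Tensor fused_layernorm_tensor(FusedLayerNormParams& params);
}  // namespace npu

The NPU layer would then include npu_ops_api.h and call these directly, without going through the generic ops_api.h dispatch.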

Collaborator Author


Regarding the code snippet above: if we implement the changes as suggested, we would need to introduce #if directives here to skip memory allocation, since the NPU operator handles this internally.

Could we instead consider moving the memory allocation logic for MLU and CUDA into their respective kernel wrappers? This would make the behavior more similar to PyTorch and allow us to unify the calling code here.

(PS: I haven't modified the CUDA or MLU code yet.)
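If allocation moved into each backend's wrapper as described, the call site could reduce to a single form (sketch only; active_tensor here is the wrapper discussed above):

// Same call on CUDA, MLU, and NPU; no per-backend #if at the call site.
auto output = active_tensor(params);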

#endif
}

torch::Tensor fused_layernorm_tensor(FusedLayerNormParams& params) {
Collaborator


same as above

Collaborator Author

@yingxudeng Dec 2, 2025


Similar to the previous comment.

// Must be less than or equal to rope_seqlen if not using discrete
// position_ids.
int64_t max_query_len;
torch::Tensor positions;
Collaborator


std::optional<torch::Tensor> position_ids already exists.

Collaborator Author


During the implementation, I noticed that position_ids is set to empty during the prefill stage, so I initially added the positions field. However, I see that the latest CUDA code addresses the same issue with a different approach. To keep things consistent, I plan to align my implementation with the CUDA method.
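For reference, one way such a fallback could look (an assumed sketch only; the actual CUDA approach mentioned above may differ, and seq_len is a placeholder):

// Use position_ids when provided; otherwise (e.g. during prefill) derive
// contiguous positions for the sequence.
torch::Tensor positions = params.position_ids.has_value()
    ? params.position_ids.value()
    : torch::arange(seq_len, torch::dtype(torch::kInt64));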

@yingxudeng force-pushed the feat/npu_backend_torch_2_kernels branch from 7485463 to 9711f41 on December 2, 2025 at 15:40.