Conversation

@DrRyanHuang DrRyanHuang (Collaborator) commented Nov 3, 2025

Motivation

Support Deepseek V3 SOT dynamic-to-static conversion (Dy2St) with CUDAGraph enabled.

Modifications

Modify the Deepseek V3 network construction: move the two if branches out into a new function and add the decorator @paddle.jit.marker.capture_control_flow to apply local AST-based static conversion to it. Prerequisite PR:
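A minimal sketch of the restructuring (the mini-layer, its method names, and the shapes are invented for illustration; only the @paddle.jit.marker.capture_control_flow decorator comes from this PR and its prerequisite, so this assumes a Paddle build that provides that marker):

import paddle

class MLABlockSketch(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.prefill_proj = paddle.nn.Linear(8, 8)
        self.decode_proj = paddle.nn.Linear(8, 8)

    def forward(self, x, needs_prefill):
        # SOT traces forward(); the decorated helper below is converted via
        # AST instead, so its Tensor-conditioned branch can lower to an if op
        # rather than breaking whole-graph static conversion.
        return self.prefill_or_decode(x, needs_prefill)

    @paddle.jit.marker.capture_control_flow
    def prefill_or_decode(self, x, needs_prefill):
        if needs_prefill:  # Tensor condition, handled by AST conversion
            return self.prefill_proj(x)
        return self.decode_proj(x)

# Eager-mode usage: runs as plain Python; under SOT Dy2St the helper's
# branch becomes a pd_op.if in the IR graph.
layer = MLABlockSketch()
out = layer(paddle.randn([2, 8]), paddle.to_tensor(True))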

Usage or Command

None

Accuracy Tests

None

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are no unit tests, please state the reason in this PR.
  • Provide accuracy results.
  • If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Nov 3, 2025

Thanks for your contribution!


def forward(
@paddle.jit.marker.capture_control_flow
def prefill_or_decode(
Member commented:

Are you sure you want to call it this name?

DrRyanHuang (Collaborator Author) commented Nov 4, 2025:

@chang-wenbin what name would be more appropriate here?

Between prefill and decode, you chose or? 🤪

Collaborator commented:

I suggest renaming it to mla_attention.
But what is the difference between this change and the previous code? Won't there still be control flow?

DrRyanHuang (Collaborator Author) commented Nov 5, 2025:

This applies local AST-based static conversion to the mla_attention function, so the if control-flow statements can be converted into if operators (SOT static conversion does not support using a Tensor as an if condition). With this, the entire Deepseek V3 graph can be converted to static.

The figure below shows the if operators in the IR graph: inside the red boxes are the two if Ops from mla_attention, and between the two red boxes is the CUDAGraph Op.

[Figure: IR graph showing the two mla_attention if Ops (red boxes) with the CUDAGraph Op between them]

@DrRyanHuang DrRyanHuang changed the title [PD Disaggregation] Support deepseekV3 sot dy2st [Graph Optimization] Support deepseekV3 sot dy2st Nov 4, 2025
@DrRyanHuang DrRyanHuang changed the title [Graph Optimization] Support deepseekV3 sot dy2st [Graph Optimization] Support deepseekV3 SOT Dy2St && CUDAGraph Nov 5, 2025

self.block_size,
self.speculate_max_draft_token_num + 1,
)
forward_meta.needs_prefill = forward_meta.max_len_tensor_cpu[1] > 0
Collaborator commented:

Can these two parameters just use the previous logic directly?

DrRyanHuang (Collaborator Author) commented Nov 5, 2025:

If CUDAGraph is disabled and only SOT static conversion is enabled, the previous logic can be reused. With CUDAGraph + SOT static conversion both enabled, the previous logic cannot be reused, for the following reason.

The previous code was:

if forward_meta.max_len_tensor_cpu[0]:
    ...

This involves two implicit operations:

  • indexing into max_len_tensor_cpu, a CPU int Tensor
  • casting max_len_tensor_cpu[0], a CPU int scalar, to a bool scalar

Both of these run as CPU kernels. Because they are operators that precede the if op, they become sub-ops of the CUDAGraph Op, and because they are CPU kernels they cannot be captured by CUDAGraph.

As a result, even when max_len_tensor_cpu[0] == 0, execution would still enter the prefill branch.

After changing it to:

forward_meta.needs_prefill = forward_meta.max_len_tensor_cpu[1] > 0

if forward_meta.needs_prefill:
    ...

the if op is guaranteed to select the correct branch.
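A standalone eager-mode sketch of the two patterns (the tensor values are hypothetical; in eager mode both variants behave identically, and the difference only matters once the ops preceding the branch are captured into a CUDAGraph):

import paddle

# Hypothetical stand-in for forward_meta.max_len_tensor_cpu.
max_len_tensor_cpu = paddle.to_tensor([0, 4], dtype="int32", place=paddle.CPUPlace())

# Old pattern: the indexing and the implicit int->bool cast are CPU-kernel
# ops emitted immediately before the branch; once captured as sub-ops of the
# CUDAGraph Op they are not re-executed on replay, so the condition goes stale.
if max_len_tensor_cpu[0]:
    pass  # prefill path

# New pattern: materialize the condition as an explicit bool tensor first, so
# the if op consumes a value produced outside the captured region.
needs_prefill = max_len_tensor_cpu[1] > 0
if needs_prefill:
    pass  # prefill path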

forward_meta=forward_meta,
)

fmha_out_prefill = fmha_out_prefill.reshape([-1, self.num_attention_heads_tp, self.qk_head_dim])
Collaborator commented:

Has it been checked that the output shape here is already [-1, self.num_attention_heads_tp, self.qk_head_dim]?

DrRyanHuang (Collaborator Author) commented:

Yes, query, key, and fmha_out_prefill here all have the same shape: [bs, self.num_attention_heads_tp, self.qk_head_dim].

    raise RuntimeError("CUDAGraph full graph capture is not supported due to the presence of control flow.")
else:
    flag = "FLAGS_cuda_graph_blacklist"
    paddle.set_flags({flag: ",".join(list(set(paddle.get_flags(flag)[flag].split(",") + ["pd_op.if"])))})
Member commented:

Do we still need to manually set FLAGS_cuda_graph_blacklist externally when setting full_cuda_graph=false? Can users run with only full_cuda_graph=false set?

DrRyanHuang (Collaborator Author) commented Nov 6, 2025:

Now setting full_cuda_graph=false is all that is needed; pd_op.if is included by default.

Member commented:

For other models that probably isn't right; it wouldn't be pd_op.if.

DrRyanHuang (Collaborator Author) commented Nov 6, 2025:

This is the MLA backend, which contains if statements; currently it is only used by Deepseek V3. Other attention backends do not necessarily contain if statements.

Member commented:

So other models currently cannot run out of the box just by setting full_cuda_graph=false, right?

DrRyanHuang (Collaborator Author) commented:

They can run. For most models this parameter has no effect at all; only the append attention backend (ERNIE4.5Turbo) is affected. A separate PR will follow to add this paddle.set/get flags handling to the append attention backend.
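For reference, here is the flag-merge pattern from the diff above wrapped in a small helper (the function name is invented; this is the kind of utility such a follow-up PR might add):

import paddle

def add_to_cuda_graph_blacklist(op_name):
    # Merge op_name into FLAGS_cuda_graph_blacklist without clobbering entries
    # the user may already have set; drop empty fragments left by splitting an
    # initially empty flag value.
    flag = "FLAGS_cuda_graph_blacklist"
    current = [e for e in paddle.get_flags(flag)[flag].split(",") if e]
    paddle.set_flags({flag: ",".join(sorted(set(current + [op_name])))})

add_to_cuda_graph_blacklist("pd_op.if")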

SigureMo previously approved these changes Nov 6, 2025
gongshaotian previously approved these changes Nov 6, 2025
gongshaotian (Collaborator) left a comment:

LGTM
