
Conversation

@1145284121 (Contributor) commented Sep 6, 2025

On the sm90 architecture, sparge_mask_convert() repeats the mask along the query_idx dimension. This fix corrects the mask_id indexing in block_sparse_sage2_attn_cuda() accordingly.
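For illustration only, here is a minimal sketch of the kind of indexing mismatch involved; the helper name lookup_block_mask and the q_repeat factor are hypothetical and not the PR's actual diff. The idea is that once the query-block axis of the converted mask has been repeated, indexing it with the original query block id reads a row belonging to a different query block:

```python
import torch

def lookup_block_mask(converted_mask, q_block_idx, kv_block_idx, q_repeat=2):
    """Return whether a (query block, kv block) pair is kept in the sparse mask.

    converted_mask: [num_q_blocks * q_repeat, num_kv_blocks] bool tensor, as
                    produced when the query-block axis has been repeated.
    q_block_idx:    index into the ORIGINAL (un-repeated) query blocks.
    """
    # Buggy lookup: ignores the repetition, so it reads a row that belongs to a
    # different query block and can steer the kernel to out-of-range KV blocks.
    # wrong = converted_mask[q_block_idx, kv_block_idx]

    # Corrected lookup: scale the query index by the repetition factor so it
    # lands on a row that actually belongs to this query block.
    return converted_mask[q_block_idx * q_repeat, kv_block_idx]


# Tiny usage example with made-up sizes.
mask = torch.zeros(8, 4, dtype=torch.bool)   # 4 original q blocks, repeated twice
mask[6, 3] = True                             # block (q=3, kv=3) is active
print(lookup_block_mask(mask, q_block_idx=3, kv_block_idx=3))  # tensor(True)
```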

Before the fix, the query's mask_id in block_sparse_sage2_attn_cuda() was wrong, and the following error occurred:

```
sp-radial-attention/radial_attn/attn_mask.py", line 363, in RadialAttention
    return SpargeSageAttnBackend(query, key, value, mask_map, video_mask, pre_defined_mask, block_size=block_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "sp-radial-attention/radial_attn/attn_mask.py", line 276, in SpargeSageAttnBackend
    k=key[:pre_defined_mask[0].sum(), :, :],
      ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

After the fix, HunyuanVideo + Radial + SageAttention works successfully.

hunyuan_radial_sage_sp4.mp4

However, Wan2.1 (which has no text tokens in attention) produces higher-quality results:

wan_radial_sp4_sage.mp4

Limitations

  • The generated video is still low quality when using SageAttention with HunyuanVideo, compared with Wan2.1/Wan2.2. This is likely because the block mask granularity of 128 lets many text-padding key/value tokens participate in the attention computation.
  • An optimal solution would require processing key_video/key_text separately (as FlashInferBackend does), but this needs block_sparse_sage2_attn_cuda() to return the LSE, like flashinfer.single_prefill_with_kv_cache(); see the sketch after this list.
  • Another approach would be for block_sparse_sage2_attn_cuda() to support a 'last_page' parameter, similar to FlashInfer's BSR sparse mask representation.
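To make the LSE point concrete, below is a minimal sketch in plain PyTorch (not the repo's code) of how two partial attention outputs computed over disjoint key sets, e.g. video keys and text keys, can be merged once each path also returns its per-query log-sum-exp. This is exactly why block_sparse_sage2_attn_cuda() would need to return an LSE the way flashinfer.single_prefill_with_kv_cache() can:

```python
import torch

def merge_attention_states(out_a, lse_a, out_b, lse_b):
    """Merge two partial attention results computed over disjoint key sets.

    out_*: [num_tokens, num_heads, head_dim] partial attention outputs.
    lse_*: [num_tokens, num_heads] natural-log log-sum-exp of the scores that
           produced the corresponding output.
    Returns the output full attention over the union of both key sets would give.
    """
    lse_max = torch.maximum(lse_a, lse_b)
    w_a = torch.exp(lse_a - lse_max)              # renormalized, numerically stable
    w_b = torch.exp(lse_b - lse_max)
    denom = (w_a + w_b).unsqueeze(-1)
    return (out_a * w_a.unsqueeze(-1) + out_b * w_b.unsqueeze(-1)) / denom
```

With a merge like this, the video tokens could go through the block-sparse SageAttention path while the text key/value tokens are handled by a dense kernel, which would avoid pulling text-padding tokens into the 128-wide blocks.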
