Closed
Labels: blackwell (SM100/SM120)
Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
I believe this configuration used to work, and the kernel itself supports it. Launching the server with the following command fails:
python3 -m sglang.launch_server \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--tp 8 \
--kv-cache-dtype fp8_e4m3 \
--attention-backend trtllm_mha

Traceback (truncated):
File "/sgl-workspace/sglang/python/sglang/srt/layers/radix_attention.py", line 108, in forward
return forward_batch.attn_backend.forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/base_attn_backend.py", line 82, in forward
return self.forward_extend(
^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/trtllm_mha_backend.py", line 612, in forward_extend
o = flashinfer.prefill.trtllm_batch_context_with_kv_cache(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/flashinfer/prefill.py", line 3469, in trtllm_batch_context_with_kv_cache
run_func(
File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__
return self._op(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Error in function 'trtllm_paged_attention_launcher' at /usr/local/lib/python3.12/dist-packages/flashinfer/data/csrc/trtllm_fmha_kernel_launcher.cu:172: Missing TRTLLM-GEN kernel (context): qkvLayout=2, maskType=1, kernelType=0, tileScheduler=1, multiCtasKvMode=0, headDimPerCtaV=128, headDimQk=128, headDimV=128, tileSizeKv=128, numTokensPerPage=64, maxNumHeadsQPerKvInCta=1, reuseSmemKForV=0, uses2CtaMma=0
Reproduction
Run the launch command given above.
Environment
Latest main branch.