[Bug] TRT-LLM gen MHA + FP8 KV cache issue #12372

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.

### Describe the bug

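Launching the server with the `trtllm_mha` attention backend and an FP8 (`fp8_e4m3`) KV cache fails in the extend (prefill) path with a "Missing TRTLLM-GEN kernel (context)" error. I believe this combination used to work, and the kernel itself supports it.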

```
python3 -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --tp 8 \
  --kv-cache-dtype fp8_e4m3 \
  --attention-backend trtllm_mha
```

```
  File "/sgl-workspace/sglang/python/sglang/srt/layers/radix_attention.py", line 108, in forward
    return forward_batch.attn_backend.forward(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/base_attn_backend.py", line 82, in forward
    return self.forward_extend(
           ^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/trtllm_mha_backend.py", line 612, in forward_extend
    o = flashinfer.prefill.trtllm_batch_context_with_kv_cache(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/flashinfer/prefill.py", line 3469, in trtllm_batch_context_with_kv_cache
    run_func(
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Error in function 'trtllm_paged_attention_launcher' at /usr/local/lib/python3.12/dist-packages/flashinfer/data/csrc/trtllm_fmha_kernel_launcher.cu:172: Missing TRTLLM-GEN kernel (context): qkvLayout=2, maskType=1, kernelType=0, tileScheduler=1, multiCtasKvMode=0, headDimPerCtaV=128, headDimQk=128, headDimV=128, tileSizeKv=128, numTokensPerPage=64, maxNumHeadsQPerKvInCta=1, reuseSmemKForV=0, uses2CtaMma=0
```
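To make the requested kernel configuration easier to compare against the kernels that FlashInfer actually ships, the `name=value` pairs in the error message can be pulled into a dict. This is only a rough sketch: the helper name `parse_kernel_params` is hypothetical, the message string is copied from the traceback above, and the code does not try to interpret what the individual enum values mean.

```python
import re

# Error text copied from the RuntimeError in the traceback above.
ERROR = (
    "Missing TRTLLM-GEN kernel (context): qkvLayout=2, maskType=1, kernelType=0, "
    "tileScheduler=1, multiCtasKvMode=0, headDimPerCtaV=128, headDimQk=128, "
    "headDimV=128, tileSizeKv=128, numTokensPerPage=64, maxNumHeadsQPerKvInCta=1, "
    "reuseSmemKForV=0, uses2CtaMma=0"
)


def parse_kernel_params(message: str) -> dict[str, int]:
    """Hypothetical helper: collect every name=<integer> pair in the message."""
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", message)}


params = parse_kernel_params(ERROR)
for name, value in params.items():
    print(f"{name:>24} = {value}")
```

Diffing this dict between a working configuration and the failing one should show which parameter changes once the FP8 KV cache is enabled.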

### Reproduction

Run the launch command given under "Describe the bug" above.

### Environment

Latest `main` branch.
