
Conversation

b8zhong (Collaborator) commented Oct 31, 2025

Because #12307 is actually the correct solution

  • Update FP8 column for trtllm mha.

gemini-code-assist (Contributor) commented

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

b8zhong requested a review from Copilot on October 31, 2025 at 05:30

Fridge003 (Collaborator) left a comment

Cool

Fridge003 merged commit a076ec1 into sgl-project:main on Oct 31, 2025 (6 checks passed)

Copilot AI left a comment

Pull Request Overview

This PR removes a temporary workaround that was forcing bfloat16 KV cache for Llama4 models using the TRTLLM MHA backend on SM100 hardware, now that FP8 KV cache support is available.

  • Removes automatic downgrade from FP8 to bfloat16 for Llama4 with trtllm_mha backend
  • Updates documentation to reflect FP8 KV Cache support in TRTLLM MHA backend

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Reviewed files

  • python/sglang/srt/server_args.py: Removes the workaround that forced kv_cache_dtype to bfloat16 for Llama4 with trtllm_mha on SM100
  • docs/advanced_features/attention_backend.md: Updates the support matrix to indicate that TRTLLM MHA now supports FP8 KV cache
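For readers who want a concrete picture of what kind of guard was removed, here is a minimal, hypothetical sketch of the downgrade logic described above. It is not the actual diff: the `ServerArgs` fields, the `is_sm100()` helper, and the architecture string are illustrative assumptions, shown only to make the "FP8 silently downgraded to bfloat16" behavior concrete.

```python
from dataclasses import dataclass


@dataclass
class ServerArgs:
    # Hypothetical subset of sglang's server arguments, for illustration only.
    model_architecture: str = "Llama4ForConditionalGeneration"
    attention_backend: str = "trtllm_mha"
    kv_cache_dtype: str = "fp8_e4m3"


def is_sm100() -> bool:
    # Stand-in for a real compute-capability check (e.g. via torch.cuda).
    return True


def apply_old_workaround(args: ServerArgs) -> None:
    """Sketch of the pre-#12307 behavior: silently downgrade an FP8 KV cache
    request to bfloat16 for Llama4 + trtllm_mha on SM100."""
    if (
        "Llama4" in args.model_architecture
        and args.attention_backend == "trtllm_mha"
        and is_sm100()
        and args.kv_cache_dtype.startswith("fp8")
    ):
        args.kv_cache_dtype = "bfloat16"


args = ServerArgs()
apply_old_workaround(args)
# With the old workaround: prints "bfloat16".
# After this PR, no downgrade is applied and the requested FP8 dtype is kept.
print(args.kv_cache_dtype)
```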

b8zhong deleted the revert-12347-fix-llama4-kv-cache-layout branch on October 31, 2025 at 05:35
mingfeima pushed a commit to mingfeima/sglang that referenced this pull request Nov 6, 2025