Release v0.4.5 #5119
ispobock announced in Announcements
[BREAKING CHANGES] Quantization support requires a separate installation of
Highlights
The SGLang team is excited to announce the release of v0.4.5! This version introduces several significant features, including Llama 4 support, a FlashAttention 3 backend, EAGLE3 speculative decoding, DeepEP integration, and disaggregated prefill and decoding.
New Features
Llama 4 Support: We support the Llama 4 models with accuracy matching official benchmark numbers, achieving a zero-shot score of 75.2 on the MMLU Pro dataset for the Llama-4-Scout-17B-16E-Instruct model and 80.7 for the Llama-4-Maverick-17B-128E-Instruct model.
FlashAttention 3 Backend: Our implementation of the FlashAttention 3 backend delivers significant acceleration for long-context tasks. (Illustrative launch sketches follow this feature list.)
EAGLE3 Speculative Decoding: We’re proud to be the first to support EAGLE3 speculative decoding, offering substantial gains in decoding throughput. Learn more in our documentation and the EAGLE3 paper.
DeepEP Integration: By incorporating DeepEP, we enhanced performance for MoE inference.
Disaggregated Prefill and Decoding: We introduced a prototype for disaggregated prefill and decoding, with plans for further optimizations.
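As a rough illustration of how the Llama 4 and FlashAttention 3 items above can be tried out, here is a minimal sketch using SGLang's offline Engine API. The specific model path, tensor-parallel size, context length, and the "fa3" backend value are assumptions based on the usual server arguments, not a configuration taken from the release notes, so please verify them against the current documentation.

```python
import sglang as sgl

# Hedged sketch (not from the release notes): starting the offline Engine with
# Llama 4 and the FlashAttention 3 backend. The model path, tp_size,
# context_length, and the "fa3" backend value are assumptions mirroring the
# usual SGLang server arguments; check the documentation for the exact flags.
llm = sgl.Engine(
    model_path="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed HF path
    tp_size=8,                 # assumed: Scout's weights need several GPUs
    attention_backend="fa3",   # assumed value selecting the FlashAttention 3 backend
    context_length=131072,     # assumed; long-context workloads are where FA3 helps most
)

outputs = llm.generate(
    ["Give a one-sentence summary of speculative decoding."],
    {"temperature": 0.0, "max_new_tokens": 64},
)
print(outputs[0]["text"])
```

The same arguments can be passed as flags to `python -m sglang.launch_server` when running SGLang as a standalone server instead of an embedded engine.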
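Similarly, a hedged sketch of enabling EAGLE3 speculative decoding: the `speculative_*` arguments mirror the corresponding server flags, and the target model, draft model, and step/top-k/draft-token numbers below are illustrative assumptions rather than the release's reference configuration.

```python
import sglang as sgl

# Hedged sketch: enabling EAGLE3 speculative decoding through the offline Engine.
# The draft-model path and the step/top-k/draft-token numbers below are
# illustrative assumptions, not tuned values from the release.
llm = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    speculative_algorithm="EAGLE3",
    speculative_draft_model_path="yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",  # assumed draft model
    speculative_num_steps=5,          # draft steps per verification round (illustrative)
    speculative_eagle_topk=8,         # draft-tree branching factor (illustrative)
    speculative_num_draft_tokens=32,  # draft tokens verified per round (illustrative)
)

out = llm.generate(
    ["Explain EAGLE-style speculative decoding in one sentence."],
    {"temperature": 0.0, "max_new_tokens": 64},
)
print(out[0]["text"])
```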
Thanks very much to the NVIDIA team, LinkedIn team, EAGLE team, Oracle team, Meituan team, and our incredible open-source community for their invaluable contributions!
Coming Soon
Disaggregated Prefill and Decoding: [Roadmap] Prefill and Decoding Disaggregation #4655
Llama 4 Optimization: [Roadmap] Llama 4 Support #5118
EP Enhancement: [Roadmap] EP Enhancement #4734
FA3 Enhancement: [Roadmap] FlashAttention3 Support as SGLang Attention Backend #4709
We’re thrilled about these advancements and eager to hear your feedback! Join us on our Slack channel at slack.sglang.ai to connect and share your thoughts. Cheers!