[ROCm] add Qwen3 235B recipe #122
Summary of Changes
Hello @gbyu-amd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request establishes the foundational recipe for running the Qwen3-235B-A22B model on ROCm-enabled hardware using vLLM. It provides a complete guide from environment setup to model serving and initial performance/accuracy testing, serving as a baseline while acknowledging ongoing optimization efforts for improved efficiency and broader parallelism support.
Code Review
This pull request adds a new recipe for running Qwen3-235B-A22B on ROCm platforms. The documentation is a good starting point, but it contains several typos and critically incorrect shell commands that will fail. I have provided specific suggestions to correct these issues to ensure the recipe is usable.
2c16f57 to 9dec9ee
Signed-off-by: Guanbao Yu <[email protected]>
Signed-off-by: Guanbao Yu <[email protected]>
9dec9ee to 5f6040e
This PR adds a recipe for running Qwen3-235B-A22B on ROCm platforms.
The recipe is subject to change, as some optimizations are still on the way:
- fused_qknorm_rope_kernel rocm compatibility (vllm#28500) enables q_norm + k_norm + rope fusion on ROCm platforms, which was initially implemented for CUDA in "[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model" (vllm#27165). PR already merged to vLLM main!
- allreduce + rmsnorm fusion kernel (ROCm/vllm#803).

The current recipe only provides TP8+EP8 deployment as an example. We will try other parallel strategies for best performance across different scenarios.
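For readers unfamiliar with the TP8+EP8 shorthand, here is a minimal sketch of how such a launch command might be assembled. It is not the command from the recipe itself: the helper name `build_serve_command` is hypothetical, the model ID `Qwen/Qwen3-235B-A22B` is assumed to be the Hugging Face checkpoint the recipe targets, and the sketch assumes TP8+EP8 maps to vLLM's `--tensor-parallel-size 8` combined with `--enable-expert-parallel`. Any ROCm-specific environment variables or extra flags the recipe uses are not shown.

```python
import shlex


def build_serve_command(
    model: str = "Qwen/Qwen3-235B-A22B",  # assumed checkpoint, not confirmed by the PR text
    tp_size: int = 8,                      # tensor-parallel degree (the "TP8" part)
    expert_parallel: bool = True,          # shard MoE experts across the same GPUs ("EP8")
) -> list[str]:
    """Assemble an argv list for `vllm serve` (hypothetical helper for illustration)."""
    cmd = ["vllm", "serve", model, "--tensor-parallel-size", str(tp_size)]
    if expert_parallel:
        # With expert parallelism enabled, the MoE experts are distributed
        # across the tensor-parallel ranks instead of each expert being sharded.
        cmd.append("--enable-expert-parallel")
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command line that can be copied into a terminal.
    print(shlex.join(build_serve_command()))
```

Building the command as a list keeps each parallelism setting an explicit parameter, which makes it easier to try other strategies later by changing `tp_size` or toggling `expert_parallel`.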