feat(openshift): Split vllm-katan-a and vllm-katan-b to run on separate pods rather than the same semantic router pod. #593
Conversation
@szedan-rh I have a better idea to solve this in the short term; in the mid term, we need to wait for #486.
Hi @Xunzhuo, thanks for your comment. Could you please share your idea?
I guess this is to support configuring a k8s Service as a backend, right? Like configuring the Service namespace/name.
Nope, actually it's just changing how we were deploying the semantic router pods; the original setup was created just for the demo that Yossi did at the PyTorch conference.
@Xunzhuo, could you please take another look?
vllm_endpoint is for testing/PoC in local/docker-compose; I suggest looking at and testing https://vllm-semantic-router.com/docs/installation/k8s/ai-gateway
With this we don't need to care about managing the backend IP or service name: I just select the model, and the model can be in-cluster or an external LLM provider.
Not a blocker for this PR. Please fix the lint and I will get this in, thanks.
Thank you for sharing, @Xunzhuo. Will check that for sure.
Hi @szedan-rh, @Xunzhuo, I looked into the lint issues. It seems we had a bug in the pre-commit hook itself; the chicken-and-egg situation will be resolved once #609 is merged, and then the pre-commit test will pass.
…lint (#609)

The pre-commit yaml-and-yml-fmt hook was incorrectly calling 'make markdown-lint' instead of 'make yaml-lint', causing YAML files to not be properly linted locally while GitHub Actions CI would catch the issues. This led to confusion where PRs would fail in CI even though 'pre-commit run --all-files' passed locally, forcing contributors to fix pre-existing YAML linting issues in files they never touched (e.g., PR #593).

Fixes #608

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
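For context, a minimal sketch of how the mismatch shows up from a contributor's machine. The hook id and Makefile targets are taken from the commit message above; the actual Makefile recipes are not shown in this thread:

```bash
# Before the fix: the yaml-and-yml-fmt hook ran the wrong Makefile target,
# so YAML issues slipped through locally but still failed in CI.
pre-commit run yaml-and-yml-fmt --all-files   # used to invoke 'make markdown-lint'

# After the fix, the hook runs the same target CI uses:
make yaml-lint
```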
This commit consolidates the OpenShift deployment into a single unified script with automatic ClusterIP discovery for cross-cluster portability.

Changes:

- Enhanced deploy-to-openshift.sh with dynamic IP discovery
  - Auto-discovers vLLM service ClusterIPs at deployment time
  - Generates configuration with actual IPs (portable across clusters)
  - Fallback sed replacement for robustness
- Updated deployment.yaml for split architecture
  - Separate pods for vllm-model-a, vllm-model-b, and semantic-router
  - Each vLLM model with dedicated cache PVC
  - semantic-router + envoy-proxy in single pod (2 containers)
- Updated config-openshift.yaml with placeholder IPs
  - Comments indicate dynamic replacement by deploy script
  - Template IPs: 172.30.64.134 (model-a), 172.30.116.177 (model-b)
- Added comprehensive documentation
  - README-DYNAMIC-IPS.md: Technical details on dynamic IP feature
  - Updated README.md: Reflects consolidated script usage
- Removed single-namespace/ directory (consolidation complete)

Architecture:

- 3 Deployments: vllm-model-a, vllm-model-b, semantic-router
- Dynamic service discovery using oc get svc -o jsonpath
- llm-katan image built from Dockerfile via OpenShift BuildConfig
- gp3-csi storage class for all PVCs

Tested on OpenShift cluster with successful deployment verification:

- Model-A ClusterIP: 172.30.89.145:8000 ✓
- Model-B ClusterIP: 172.30.255.34:8001 ✓
- Both models responding to health checks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: szedan <[email protected]>
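A minimal sketch of the dynamic discovery step described above, assuming the Services are named vllm-model-a and vllm-model-b and using the template IPs quoted in the commit message; the actual script may differ:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Discover the ClusterIPs assigned on this particular cluster
# (service names are assumptions based on the deployment names above).
MODEL_A_IP=$(oc get svc vllm-model-a -o jsonpath='{.spec.clusterIP}')
MODEL_B_IP=$(oc get svc vllm-model-b -o jsonpath='{.spec.clusterIP}')

# Fallback sed replacement: swap the template placeholder IPs for the
# discovered ones so the generated config is portable across clusters.
sed -e "s/172\.30\.64\.134/${MODEL_A_IP}/g" \
    -e "s/172\.30\.116\.177/${MODEL_B_IP}/g" \
    config-openshift.yaml > config-openshift.generated.yaml
```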
Applied markdownlint auto-fixes to ensure documentation follows project style guidelines:

- Added blank lines around lists (MD032)
- Added blank lines around fenced code blocks (MD031)

Files fixed:

- deploy/openshift/README.md
- deploy/openshift/README-DYNAMIC-IPS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: szedan <[email protected]>
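The MD031/MD032 fixes above are the kind markdownlint-cli can apply on its own. A hedged one-liner, assuming the CLI is installed (the project may wrap this in a make target instead):

```bash
# Auto-fix blank-line rules (MD031/MD032) in the two touched docs.
markdownlint --fix deploy/openshift/README.md deploy/openshift/README-DYNAMIC-IPS.md
```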
Add PersistentVolumeClaim creation for vllm-model-a-cache and vllm-model-b-cache to ensure all required storage is provisioned during automated deployment. This fixes pods being stuck in Pending state when running the deploy-to-openshift.sh script.

Changes:

- Add vllm-model-a-cache PVC (10Gi)
- Add vllm-model-b-cache PVC (10Gi)
- Ensures full automation without manual PVC creation

Signed-off-by: szedan <[email protected]>
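A sketch of what the added PVC creation might look like in the deploy script. The PVC names and 10Gi sizes come from this commit and the gp3-csi storage class from the consolidation commit above; the access mode is an assumption:

```bash
# Create one cache PVC per model so pods don't stay Pending on first deploy.
for model in a b; do
  oc apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-model-${model}-cache
spec:
  accessModes:
    - ReadWriteOnce          # assumption; not stated in the commit
  storageClassName: gp3-csi
  resources:
    requests:
      storage: 10Gi
EOF
done
```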
Signed-off-by: szedan <[email protected]>

Auto-discovers vLLM service ClusterIPs at deployment time, enabling portable OpenShift deployments across different clusters.
Tested on OpenShift with split architecture (3 pods: vllm-model-a, vllm-model-b, semantic-router+envoy).
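One quick way to reproduce the health-check verification from inside the cluster is an ephemeral curl pod. The service names, ports 8000/8001, and the /health path follow the test notes above but are assumptions about the llm-katan servers:

```bash
# Spin up a throwaway curl pod and probe both model services.
oc run curl-check --rm -it --restart=Never --image=curlimages/curl -- \
  sh -c 'curl -sf http://vllm-model-a:8000/health && \
         curl -sf http://vllm-model-b:8001/health && echo OK'
```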