feat(openshift): Split vllm-katan-a and vllm-katan-b to run on separate pods rather than the same semantic router pod. #593
Conversation
@szedan-rh I have a better idea to solve this in the short term; in the mid term, we need to wait for #486.
Hi @Xunzhuo, thanks for your comment. Could you please share your idea?
I guess this is to support configuring a k8s Service as a backend, right? Like configuring the Service namespace/name.
Nope, actually it's just changing how we were deploying the semantic router pods; the original setup was created just for the demo that Yossi did at the PyTorch conference.
@Xunzhuo, could you please take another look?
vllm_endpoint is for testing/PoC in local/docker-compose; I suggest looking at and testing https://vllm-semantic-router.com/docs/installation/k8s/ai-gateway
With this we don't need to care about managing the backend IP or service name: I just select the model, and the model can be in-cluster or an external LLM provider.
Not a blocker for this PR. Please fix the lint and I will get this in, thanks.
Thank you for sharing, @Xunzhuo. Will check that for sure.
Hi @szedan-rh, @Xunzhuo, I looked into the lint issues. It seems we had a bug in the pre-commit hook itself; the chicken-and-egg situation will be resolved once #609 is merged, and then the pre-commit test will pass.
…lint (#609)

The pre-commit yaml-and-yml-fmt hook was incorrectly calling 'make markdown-lint' instead of 'make yaml-lint', causing YAML files to not be properly linted locally while GitHub Actions CI would catch the issues. This led to confusion where PRs would fail in CI even though 'pre-commit run --all-files' passed locally, forcing contributors to fix pre-existing YAML linting issues in files they never touched (e.g., PR #593).

Fixes #608

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
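For context, a minimal sketch of how the mismatch shows up from a contributor's machine. The hook id and Makefile targets are taken from the commit message above; the actual Makefile recipes are not shown in this thread:

```bash
# Before the fix: the yaml-and-yml-fmt hook ran the wrong Makefile target,
# so YAML issues slipped through locally but still failed in CI.
pre-commit run yaml-and-yml-fmt --all-files   # used to invoke 'make markdown-lint'

# After the fix, the hook runs the same target CI uses:
make yaml-lint
```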
This commit consolidates the OpenShift deployment into a single unified script with automatic ClusterIP discovery for cross-cluster portability.

Changes:

- Enhanced deploy-to-openshift.sh with dynamic IP discovery
  - Auto-discovers vLLM service ClusterIPs at deployment time
  - Generates configuration with actual IPs (portable across clusters)
  - Fallback sed replacement for robustness
- Updated deployment.yaml for split architecture
  - Separate pods for vllm-model-a, vllm-model-b, and semantic-router
  - Each vLLM model with dedicated cache PVC
  - semantic-router + envoy-proxy in single pod (2 containers)
- Updated config-openshift.yaml with placeholder IPs
  - Comments indicate dynamic replacement by deploy script
  - Template IPs: 172.30.64.134 (model-a), 172.30.116.177 (model-b)
- Added comprehensive documentation
  - README-DYNAMIC-IPS.md: Technical details on dynamic IP feature
  - Updated README.md: Reflects consolidated script usage
- Removed single-namespace/ directory (consolidation complete)

Architecture:

- 3 Deployments: vllm-model-a, vllm-model-b, semantic-router
- Dynamic service discovery using oc get svc -o jsonpath
- llm-katan image built from Dockerfile via OpenShift BuildConfig
- gp3-csi storage class for all PVCs

Tested on OpenShift cluster with successful deployment verification:

- Model-A ClusterIP: 172.30.89.145:8000 ✓
- Model-B ClusterIP: 172.30.255.34:8001 ✓
- Both models responding to health checks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: szedan <[email protected]>
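A minimal sketch of the dynamic discovery step described above, assuming the Services are named vllm-model-a and vllm-model-b and using the template IPs quoted in the commit message; the actual script may differ:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Discover the ClusterIPs assigned on this particular cluster
# (service names are assumptions based on the deployment names above).
MODEL_A_IP=$(oc get svc vllm-model-a -o jsonpath='{.spec.clusterIP}')
MODEL_B_IP=$(oc get svc vllm-model-b -o jsonpath='{.spec.clusterIP}')

# Fallback sed replacement: swap the template placeholder IPs for the
# discovered ones so the generated config is portable across clusters.
sed -e "s/172\.30\.64\.134/${MODEL_A_IP}/g" \
    -e "s/172\.30\.116\.177/${MODEL_B_IP}/g" \
    config-openshift.yaml > config-openshift.generated.yaml
```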
Applied markdownlint auto-fixes to ensure documentation follows project style guidelines:

- Added blank lines around lists (MD032)
- Added blank lines around fenced code blocks (MD031)

Files fixed:

- deploy/openshift/README.md
- deploy/openshift/README-DYNAMIC-IPS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: szedan <[email protected]>
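The MD031/MD032 fixes above are the kind markdownlint-cli can apply on its own. A hedged one-liner, assuming the CLI is installed (the project may wrap this in a make target instead):

```bash
# Auto-fix blank-line rules (MD031/MD032) in the two touched docs.
markdownlint --fix deploy/openshift/README.md deploy/openshift/README-DYNAMIC-IPS.md
```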
Add PersistentVolumeClaim creation for vllm-model-a-cache and vllm-model-b-cache to ensure all required storage is provisioned during automated deployment. This fixes pods being stuck in Pending state when running the deploy-to-openshift.sh script.

Changes:

- Add vllm-model-a-cache PVC (10Gi)
- Add vllm-model-b-cache PVC (10Gi)
- Ensures full automation without manual PVC creation

Signed-off-by: szedan <[email protected]>
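A sketch of what the added PVC creation might look like in the deploy script. The PVC names and 10Gi sizes come from this commit and the gp3-csi storage class from the consolidation commit above; the access mode is an assumption:

```bash
# Create one cache PVC per model so pods don't stay Pending on first deploy.
for model in a b; do
  oc apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-model-${model}-cache
spec:
  accessModes:
    - ReadWriteOnce          # assumption; not stated in the commit
  storageClassName: gp3-csi
  resources:
    requests:
      storage: 10Gi
EOF
done
```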
Signed-off-by: szedan <[email protected]>

Auto-discovers vLLM service ClusterIPs at deployment time, enabling portable OpenShift deployments across different clusters.
Tested on OpenShift with split architecture (3 pods: vllm-model-a, vllm-model-b, semantic-router+envoy).
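One quick way to reproduce the health-check verification from inside the cluster is an ephemeral curl pod. The service names, ports 8000/8001, and the /health path follow the test notes above but are assumptions about the llm-katan servers:

```bash
# Spin up a throwaway curl pod and probe both model services.
oc run curl-check --rm -it --restart=Never --image=curlimages/curl -- \
  sh -c 'curl -sf http://vllm-model-a:8000/health && \
         curl -sf http://vllm-model-b:8001/health && echo OK'
```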