
Conversation

@szedan-rh
Contributor

Auto-discovers vLLM service ClusterIPs at deployment time, enabling portable OpenShift deployments across different clusters.

Tested on OpenShift with split architecture (3 pods: vllm-model-a, vllm-model-b, semantic-router+envoy).
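For illustration, a minimal sketch of how the discovery plus sed fallback described above might look. This is not the actual deploy-to-openshift.sh; the function names are hypothetical, and the service names and placeholder IPs are assumptions taken from this PR's description and commit messages:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of dynamic ClusterIP discovery; not the PR's exact code.
set -euo pipefail

discover_cluster_ip() {
  # Ask OpenShift for the live ClusterIP of a Service at deployment time.
  oc get svc "$1" -o jsonpath='{.spec.clusterIP}'
}

render_config() {
  # Fallback sed replacement: swap the template's placeholder IPs
  # (from config-openshift.yaml) for the IPs discovered at deploy time.
  local template=$1 model_a_ip=$2 model_b_ip=$3
  sed -e "s/172\.30\.64\.134/${model_a_ip}/g" \
      -e "s/172\.30\.116\.177/${model_b_ip}/g" "$template"
}

# Usage (assumed service names from the split architecture):
# render_config config-openshift.yaml \
#   "$(discover_cluster_ip vllm-model-a)" \
#   "$(discover_cluster_ip vllm-model-b)" > config-rendered.yaml
```

Because the rendered config contains the real ClusterIPs of whatever cluster the script runs against, the same manifests stay portable across clusters.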

@netlify

netlify bot commented Nov 5, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 52b7855
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/690e70932fe2450008add62e
😎 Deploy Preview https://deploy-preview-593--vllm-semantic-router.netlify.app

@Xunzhuo
Member

Xunzhuo commented Nov 5, 2025

@szedan-rh I have a better idea to solve this in the short term; in the mid term, we need to wait for #486

@szedan-rh
Contributor Author

Hi @Xunzhuo, thanks for your comment; could you please share your idea?
This is an existing script that Yossi developed for the demo; I only changed it to run as split pods rather than one pod with all the containers inside it.

@Xunzhuo
Member

Xunzhuo commented Nov 5, 2025

I guess this is to support configuring a k8s Service as a backend, right? Like configuring the service namespace/name.

@szedan-rh
Contributor Author

Nope, actually it just changes how we were deploying the semantic router pods; it was created just for the demo that Yossi did at the PyTorch conference.
The previous implementation created one pod with 4 containers: semantic-router, envoy, vllm-katan-a, and vllm-katan-b.
The current implementation runs vllm-a and vllm-b in separate pods, with one pod for semantic-router and envoy, to demonstrate that the semantic router routes to model-a and model-b according to the classification.
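For illustration, a heavily trimmed sketch of what the split manifests could look like. The Deployment and container names are taken from this PR's commit messages; the images, labels, and ports are assumptions, not the PR's exact YAML:

```yaml
# Hypothetical, trimmed manifests illustrating the split architecture.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-model-a            # one pod per vLLM model (same shape for vllm-model-b)
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-model-a
  template:
    metadata:
      labels:
        app: vllm-model-a
    spec:
      containers:
        - name: vllm-model-a
          image: llm-katan:latest       # assumption: built via OpenShift BuildConfig
          ports:
            - containerPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: semantic-router         # router and envoy share one pod (2 containers)
spec:
  replicas: 1
  selector:
    matchLabels:
      app: semantic-router
  template:
    metadata:
      labels:
        app: semantic-router
    spec:
      containers:
        - name: semantic-router
          image: semantic-router:latest   # assumption
        - name: envoy-proxy
          image: envoyproxy/envoy:latest  # assumption
```

Splitting the models into their own Deployments is what makes the routing decision observable: traffic for model-a and model-b lands on different pods.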

@szedan-rh szedan-rh force-pushed the feat/openshift-dynamic-ip-deployment branch from ae2bfec to a28833a on November 5, 2025 17:46
@github-actions

github-actions bot commented Nov 5, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/openshift/README-DYNAMIC-IPS.md
  • deploy/openshift/README.md
  • deploy/openshift/config-openshift.yaml
  • deploy/openshift/deploy-to-openshift.sh
  • deploy/openshift/deployment.yaml


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@szedan-rh szedan-rh force-pushed the feat/openshift-dynamic-ip-deployment branch from 8cc8bdc to 41d72ab on November 5, 2025 20:28
@szedan-rh
Contributor Author

@Xunzhuo - could you please take another look?

@szedan-rh szedan-rh force-pushed the feat/openshift-dynamic-ip-deployment branch from b81d806 to a68b2df on November 6, 2025 15:36
Member

@Xunzhuo Xunzhuo left a comment


vllm_endpoint is for testing/PoC in local/docker-compose; I suggest looking at and testing https://vllm-semantic-router.com/docs/installation/k8s/ai-gateway

With this we don't need to care about managing a backend IP or service name: I just select the model, and the model can be in-cluster or an external LLM provider.

Not a blocker for this PR; please fix the lint and I will get this in. Thanks.

@szedan-rh szedan-rh force-pushed the feat/openshift-dynamic-ip-deployment branch 2 times, most recently from 5aa352f to cdd53e6, on November 7, 2025 15:33
@szedan-rh szedan-rh marked this pull request as draft November 7, 2025 15:34
@szedan-rh szedan-rh force-pushed the feat/openshift-dynamic-ip-deployment branch from cdd53e6 to d317d99 on November 7, 2025 15:42
@szedan-rh
Contributor Author

Thank you for sharing, @Xunzhuo; I will check that for sure.

@szedan-rh szedan-rh force-pushed the feat/openshift-dynamic-ip-deployment branch from d317d99 to fced99f on November 7, 2025 15:47
@szedan-rh szedan-rh marked this pull request as ready for review November 7, 2025 16:08
@szedan-rh szedan-rh requested a review from Xunzhuo November 7, 2025 16:08
@szedan-rh szedan-rh changed the title feat(openshift): dynamic IP configuration for cross-cluster deployment feat(openshift): Split vllm-katan-a and vllm-katan-b to run on seprte pods rather than the same semantic router pod. Nov 7, 2025
@szedan-rh szedan-rh changed the title feat(openshift): Split vllm-katan-a and vllm-katan-b to run on seprte pods rather than the same semantic router pod. feat(openshift): Split vllm-katan-a and vllm-katan-b to run on separate pods rather than the same semantic router pod. Nov 7, 2025
yossiovadia added a commit to yossiovadia/semantic-router that referenced this pull request Nov 7, 2025
…lint

The pre-commit yaml-and-yml-fmt hook was incorrectly calling 'make markdown-lint'
instead of 'make yaml-lint', causing YAML files to not be properly linted locally
while GitHub Actions CI would catch the issues.

This led to confusion where PRs would fail in CI even though 'pre-commit run
--all-files' passed locally, forcing contributors to fix pre-existing YAML
linting issues in files they never touched (e.g., PR vllm-project#593).

Fixes vllm-project#608

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
@yossiovadia
Collaborator

Hi @szedan-rh, @Xunzhuo, I looked into it (the lint issues); it seems we had a bug in the
.pre-commit-config.yaml. This should resolve it:
#609

The chicken-and-egg problem will be resolved once this PR is merged; then the #609 pre-commit test will pass.

rootfs pushed a commit that referenced this pull request Nov 7, 2025
…lint (#609)

The pre-commit yaml-and-yml-fmt hook was incorrectly calling 'make markdown-lint'
instead of 'make yaml-lint', causing YAML files to not be properly linted locally
while GitHub Actions CI would catch the issues.

This led to confusion where PRs would fail in CI even though 'pre-commit run
--all-files' passed locally, forcing contributors to fix pre-existing YAML
linting issues in files they never touched (e.g., PR #593).

Fixes #608

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
szedan-rh and others added 4 commits November 8, 2025 00:19
This commit consolidates the OpenShift deployment into a single unified
script with automatic ClusterIP discovery for cross-cluster portability.

Changes:
- Enhanced deploy-to-openshift.sh with dynamic IP discovery
  - Auto-discovers vLLM service ClusterIPs at deployment time
  - Generates configuration with actual IPs (portable across clusters)
  - Fallback sed replacement for robustness
- Updated deployment.yaml for split architecture
  - Separate pods for vllm-model-a, vllm-model-b, and semantic-router
  - Each vLLM model with dedicated cache PVC
  - semantic-router + envoy-proxy in single pod (2 containers)
- Updated config-openshift.yaml with placeholder IPs
  - Comments indicate dynamic replacement by deploy script
  - Template IPs: 172.30.64.134 (model-a), 172.30.116.177 (model-b)
- Added comprehensive documentation
  - README-DYNAMIC-IPS.md: Technical details on dynamic IP feature
  - Updated README.md: Reflects consolidated script usage
- Removed single-namespace/ directory (consolidation complete)

Architecture:
- 3 Deployments: vllm-model-a, vllm-model-b, semantic-router
- Dynamic service discovery using oc get svc -o jsonpath
- llm-katan image built from Dockerfile via OpenShift BuildConfig
- gp3-csi storage class for all PVCs

Tested on OpenShift cluster with successful deployment verification:
- Model-A ClusterIP: 172.30.89.145:8000 ✓
- Model-B ClusterIP: 172.30.255.34:8001 ✓
- Both models responding to health checks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: szedan <[email protected]>
Applied markdownlint auto-fixes to ensure documentation follows
project style guidelines:

- Added blank lines around lists (MD032)
- Added blank lines around fenced code blocks (MD031)

Files fixed:
- deploy/openshift/README.md
- deploy/openshift/README-DYNAMIC-IPS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: szedan <[email protected]>
Add PersistentVolumeClaim creation for vllm-model-a-cache and vllm-model-b-cache to ensure all required storage is provisioned during automated deployment. This fixes pods being stuck in Pending state when running the deploy-to-openshift.sh script.

Changes:
- Add vllm-model-a-cache PVC (10Gi)
- Add vllm-model-b-cache PVC (10Gi)
- Ensures full automation without manual PVC creation

Signed-off-by: szedan <[email protected]>
@szedan-rh szedan-rh force-pushed the feat/openshift-dynamic-ip-deployment branch from 5d058f7 to 52b7855 on November 7, 2025 22:20
@rootfs rootfs merged commit f203719 into vllm-project:main Nov 8, 2025
9 checks passed