docs: Adding the Gateway inference support documentation for Nginx Ga… #1789
Open: sindhushiv wants to merge 9 commits into kubernetes-sigs:main from sindhushiv:ngf/inference-support-documentation
+237
−0
343e558 docs: Adding the Gateway inference support documentation for Nginx Ga… (sindhushiv)
c694cf8 docs: Addressing comments (sindhushiv)
1bca5c6 docs: Addressed new set of comments (sindhushiv)
b0fbcc8 docs: Fixed the helm command (sindhushiv)
308bf89 docs: Fixed cleaned up command (sindhushiv)
d0e33d6 docs: Fixed cleaned up command (sindhushiv)
3888530 docs: Adding released version (sindhushiv)
d3dceff docs: Fixing the YAML files for test failure (sindhushiv)
8e0382e docs: Fixing the HTTP YAML file for test failure (sindhushiv)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| apiVersion: gateway.networking.k8s.io/v1 | ||
| kind: Gateway | ||
| metadata: | ||
|   name: inference-gateway | ||
| spec: | ||
|   gatewayClassName: nginx | ||
|   listeners: | ||
|   - name: http | ||
|     port: 80 | ||
|     protocol: HTTP |
18 changes: 18 additions & 0 deletions
config/manifests/gateway/nginxgatewayfabric/httproute.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| apiVersion: gateway.networking.k8s.io/v1 | ||
| kind: HTTPRoute | ||
| metadata: | ||
|   name: llm-route | ||
|   namespace: default | ||
| spec: | ||
|   parentRefs: | ||
|   - name: inference-gateway | ||
|   rules: | ||
|   - matches: | ||
|     - path: | ||
|         type: PathPrefix | ||
|         value: / | ||
|     backendRefs: | ||
|     - group: inference.networking.k8s.io | ||
|       kind: InferencePool | ||
|       name: vllm-llama3-8b-instruct | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -87,6 +87,22 @@ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extens | |
| helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true | ||
| ``` | ||
|
|
||
| === "NGINX Gateway Fabric" | ||
|
|
||
| 1. Requirements | ||
|
|
||
| - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed (Standard or Experimental channel). | ||
| - [Helm](https://helm.sh/docs/intro/install/) installed. | ||
| - A Kubernetes cluster with LoadBalancer or NodePort access. | ||
|
|
||
| 2. Install NGINX Gateway Fabric with the Inference Extension enabled by setting the `nginxGateway.gwAPIInferenceExtension.enable=true` Helm value: | ||
|
|
||
| ```bash | ||
| helm install ngf oci://ghcr.io/nginx/charts/nginx-gateway-fabric --create-namespace -n nginx-gateway --set nginxGateway.gwAPIInferenceExtension.enable=true | ||
| ``` | ||
| This enables NGINX Gateway Fabric to watch and manage Inference Extension resources such as InferencePool and InferenceObjective. | ||
|
|
||
|
|
||
| ### Deploy the InferencePool and Endpoint Picker Extension | ||
|
|
||
| Install an InferencePool named `vllm-llama3-8b-instruct` that selects endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000. The Helm install command automatically installs the endpoint picker and the InferencePool, along with provider-specific resources. | ||
|
|
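
After the Helm install in step 2 of the NGINX Gateway Fabric tab, it can help to confirm that the control plane is up before creating any Gateway resources. The following is a minimal sketch, not part of the PR: the helper name `wait_for_ngf` and the 120s default timeout are assumptions, and it presumes the chart creates one or more Deployments in the `nginx-gateway` namespace used above.

```bash
# Hypothetical helper (not from the PR): block until every Deployment in the
# NGINX Gateway Fabric namespace reports the Available condition.
#   $1 = namespace (defaults to nginx-gateway, as in the helm install above)
#   $2 = timeout   (defaults to 120s, an arbitrary choice)
wait_for_ngf() {
  kubectl wait --namespace "${1:-nginx-gateway}" \
    --for=condition=Available deployment --all \
    --timeout="${2:-120s}"
}
```

Calling `wait_for_ngf` with no arguments uses the defaults; `wait_for_ngf nginx-gateway 5m` allows a longer rollout.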
@@ -200,6 +216,57 @@ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extens | |
| kubectl get httproute llm-route -o yaml | ||
| ``` | ||
|
|
||
| === "NGINX Gateway Fabric" | ||
|
|
||
| NGINX Gateway Fabric is an implementation of the Gateway API that supports the Inference Extension. Follow these steps to deploy an Inference Gateway using NGINX Gateway Fabric. | ||
|
|
||
| 1. Deploy the Gateway | ||
|
|
||
| ```bash | ||
| kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/nginxgatewayfabric/gateway.yaml | ||
| ``` | ||
|
|
||
| 2. Verify the Gateway status | ||
|
|
||
| Ensure that the Gateway is running and has been assigned an address: | ||
|
|
||
| ```bash | ||
| kubectl get gateway inference-gateway | ||
| ``` | ||
|
|
||
| Check that the Gateway has been successfully provisioned and that its status shows Programmed=True. | ||
|
|
||
| 3. Deploy the HTTPRoute | ||
|
|
||
| Create the HTTPRoute resource to route traffic to your InferencePool: | ||
|
|
||
| ```bash | ||
| kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/nginxgatewayfabric/httproute.yaml | ||
|
Contributor review comment: ditto
||
| ``` | ||
|
|
||
| 4. Verify the route status | ||
|
|
||
| Check that the HTTPRoute was successfully configured and references were resolved: | ||
|
|
||
| ```bash | ||
| kubectl get httproute llm-route -o yaml | ||
| ``` | ||
|
|
||
| The route status should include Accepted=True and ResolvedRefs=True. | ||
|
|
||
| 5. Verify the InferencePool status | ||
|
|
||
| Make sure the InferencePool is active before sending traffic. | ||
|
|
||
| ```bash | ||
| kubectl describe inferencepools.inference.networking.k8s.io vllm-llama3-8b-instruct | ||
| ``` | ||
|
|
||
| Check that the status shows Accepted=True and ResolvedRefs=True. This confirms the InferencePool is ready to handle traffic. | ||
|
|
||
| For more information, see the [NGINX Gateway Fabric - Inference Gateway Setup guide](https://docs.nginx.com/nginx-gateway-fabric/how-to/gateway-api-inference-extension/#overview). | ||
|
|
||
|
|
||
| ### Deploy InferenceObjective (Optional) | ||
|
|
||
| Deploy the sample InferenceObjective which allows you to specify priority of requests. | ||
|
|
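
The verification steps in the NGINX Gateway Fabric tab (Programmed=True on the Gateway, Accepted=True and ResolvedRefs=True on the HTTPRoute and InferencePool) can also be checked from a script instead of reading YAML by eye. The sketch below is illustrative only. All helper names are hypothetical. It relies on the Gateway keeping its conditions at `.status.conditions` and the HTTPRoute keeping them under `.status.parents[*].conditions`; that the InferencePool uses the same `.status.parents` layout is an assumption to confirm against the installed CRD version.

```bash
# Hypothetical helpers (not from the PR) for scripted status checks.

# Block until the Gateway reports Programmed=True. Gateway conditions live at
# .status.conditions, so `kubectl wait` can read them directly.
#   $1 = Gateway name (default inference-gateway), $2 = timeout (default 120s)
wait_for_gateway() {
  kubectl wait --for=condition=Programmed "gateway/${1:-inference-gateway}" \
    --timeout="${2:-120s}"
}

# Print the status of one condition type for every entry in .status.parents.
#   $1 = resource, e.g. httproute/llm-route
#   $2 = condition type, e.g. Accepted
parent_condition() {
  kubectl get "$1" \
    -o jsonpath="{.status.parents[*].conditions[?(@.type==\"$2\")].status}"
}

# Succeed only if at least one parent reports the condition and every
# reported value is True. Relies on default shell word splitting.
condition_is_true() {
  _statuses="$(parent_condition "$1" "$2")" || return 1
  [ -n "$_statuses" ] || return 1
  for _s in $_statuses; do
    [ "$_s" = "True" ] || return 1
  done
}
```

For example, `condition_is_true httproute/llm-route ResolvedRefs` covers the route check in step 4, and `condition_is_true inferencepools.inference.networking.k8s.io/vllm-llama3-8b-instruct Accepted` covers step 5 under the layout assumption stated above.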
@@ -293,3 +360,27 @@ Deploy the sample InferenceObjective which allows you to specify priority of req | |
| ```bash | ||
| kubectl delete ns kgateway-system | ||
| ``` | ||
|
|
||
| === "NGINX Gateway Fabric" | ||
|
|
||
| Follow these steps to remove the NGINX Gateway Fabric Inference Gateway and all related resources. | ||
|
|
||
|
|
||
| 1. Remove the Inference Gateway and HTTPRoute: | ||
|
|
||
| ```bash | ||
| kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/nginxgatewayfabric/gateway.yaml --ignore-not-found | ||
| kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/nginxgatewayfabric/httproute.yaml --ignore-not-found | ||
|
Contributor review comment (on lines +372 to +373): same here
||
| ``` | ||
|
|
||
| 2. Uninstall NGINX Gateway Fabric: | ||
|
|
||
| ```bash | ||
| helm uninstall ngf -n nginx-gateway | ||
| ``` | ||
|
|
||
| 3. Clean up the namespace: | ||
|
|
||
| ```bash | ||
| kubectl delete ns nginx-gateway | ||
| ``` | ||
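
To confirm that the cleanup steps finished, a script can check that the namespace no longer exists. This is a small sketch with a hypothetical helper name, assuming the `nginx-gateway` namespace from the install step:

```bash
# Hypothetical helper (not from the PR): succeed when the namespace is gone.
# `kubectl get namespace` exits non-zero when the namespace does not exist
# (or when the cluster is unreachable, which this sketch does not distinguish).
#   $1 = namespace (default nginx-gateway)
namespace_gone() {
  ! kubectl get namespace "${1:-nginx-gateway}" >/dev/null 2>&1
}
```

A typical use is `namespace_gone && echo "cleanup complete"`.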
can we update the link to point to the latest release instead of main?
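
One way to act on this comment is to build the manifest URLs from a git ref rather than hard-coding `main`. The sketch below is an illustration, not the change made in the PR: `manifest_url` is a hypothetical helper, and it assumes the manifests exist at the same path under the chosen release tag.

```bash
# Hypothetical helper (not from the PR): build the raw GitHub URL for one of
# the NGINX Gateway Fabric manifests at a given git ref.
#   $1 = git ref (a release tag, or a branch such as main)
#   $2 = file name, e.g. gateway.yaml or httproute.yaml
manifest_url() {
  printf 'https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/%s/config/manifests/gateway/nginxgatewayfabric/%s\n' "$1" "$2"
}
```

With a placeholder variable such as `IGW_VERSION` set to the desired release tag, the deploy step becomes `kubectl apply -f "$(manifest_url "$IGW_VERSION" gateway.yaml)"`, and the same call can be reused in the cleanup step.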