Commit 6f43af4

fix: vllm rerank endpoint and upgrade the documentation (#609)

Fixes #608
Signed-off-by: ffais <[email protected]>

1 parent e2775c4

File tree

4 files changed, +4 -4 lines changed

Makefile

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 IMG ?= controller:latest
 # ENVTEST_K8S_VERSION refers to the version of kubebuilder assets to be downloaded by envtest binary.
 ENVTEST_K8S_VERSION = 1.33.0
-CRD_REF_DOCS_VERSION = v0.1.0
+CRD_REF_DOCS_VERSION = v0.2.0
 SKAFFOLD_VERSION = v2.13.2
 
 # Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)

docs/how-to/configure-reranking-models.md

Lines changed: 1 addition & 1 deletion

@@ -31,7 +31,7 @@ Once the pod is ready, you can call the rerank endpoint:
 ```python
 import requests
 resp = requests.post(
-    "http://localhost:8000/vllm/v1/rerank",
+    "http://localhost:8000/openai/v1/rerank",
     json={
         "model": "bge-rerank-base-cpu",
         "query": "Which document talks about apples?",
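The hunk above ends mid-snippet, so only the `model` and `query` fields of the request body are visible. The sketch below assembles the same request against the corrected `/openai/v1/rerank` path; the `documents` field and its contents are hypothetical additions for illustration, as they fall outside the diff hunk.

```python
import json

# Endpoint corrected by this commit: rerank now lives under the /openai
# prefix, alongside the other OpenAI-compatible routes.
RERANK_URL = "http://localhost:8000/openai/v1/rerank"


def build_rerank_request(model, query, documents):
    """Assemble the JSON body for the rerank endpoint.

    The "documents" field is a hypothetical addition for illustration;
    the diff hunk ends after the "query" field.
    """
    return {"model": model, "query": query, "documents": documents}


payload = build_rerank_request(
    "bge-rerank-base-cpu",
    "Which document talks about apples?",
    ["Apples are a popular fruit.", "Paris is the capital of France."],
)
# An actual call would be: requests.post(RERANK_URL, json=payload)
print(json.dumps(payload))
```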

docs/reference/kubernetes-api.md

Lines changed: 1 addition & 1 deletion

@@ -128,7 +128,7 @@ _Appears in:_
 
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
-| `url` _string_ | URL of the model to be served.<br />Currently the following formats are supported:<br /><br />For VLLM, FasterWhisper, Infinity engines:<br /><br />"hf://<repo>/<model>"<br />"pvc://<pvcName>"<br />"pvc://<pvcName>/<pvcSubpath>"<br />"gs://<bucket>/<path>" (only with cacheProfile)<br />"oss://<bucket>/<path>" (only with cacheProfile)<br />"s3://<bucket>/<path>" (only with cacheProfile)<br /><br />For OLlama engine:<br /><br />"ollama://<model>" | | Required: \{\} <br /> |
+| `url` _string_ | URL of the model to be served.<br />Currently the following formats are supported:<br />For VLLM, FasterWhisper, Infinity engines:<br />"hf://<repo>/<model>"<br />"pvc://<pvcName>"<br />"pvc://<pvcName>/<pvcSubpath>"<br />"gs://<bucket>/<path>" (only with cacheProfile)<br />"oss://<bucket>/<path>" (only with cacheProfile)<br />"s3://<bucket>/<path>" (only with cacheProfile)<br />For OLlama engine:<br />"ollama://<model>" | | Required: \{\} <br /> |
 | `adapters` _[Adapter](#adapter) array_ | | | |
 | `features` _[ModelFeature](#modelfeature) array_ | Features that the model supports.<br />Dictates the APIs that are available for the model. | | Enum: [TextGeneration TextEmbedding Reranking SpeechToText] <br /> |
 | `engine` _string_ | Engine to be used for the server process. | | Enum: [OLlama VLLM FasterWhisper Infinity] <br />Required: \{\} <br /> |

internal/openaiserver/handler.go

Lines changed: 1 addition & 1 deletion

@@ -38,7 +38,7 @@ func NewHandler(k8sClient client.Client, modelProxy *modelproxy.Handler) *Handle
 	handle("/openai/v1/chat/completions", http.StripPrefix("/openai", modelProxy))
 	handle("/openai/v1/completions", http.StripPrefix("/openai", modelProxy))
 	handle("/openai/v1/embeddings", http.StripPrefix("/openai", modelProxy))
-	handle("/vllm/v1/rerank", http.StripPrefix("/vllm", modelProxy))
+	handle("/openai/v1/rerank", http.StripPrefix("/openai", modelProxy))
 	handle("/openai/v1/audio/transcriptions", http.StripPrefix("/openai", modelProxy))
 	handle("/openai/v1/models", http.HandlerFunc(h.getModels))
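Because each route is wrapped in `http.StripPrefix`, the model proxy receives the same upstream path (`/v1/rerank`) both before and after this change; only the externally visible prefix moves from `/vllm` to `/openai`, making the route consistent with the other OpenAI-compatible endpoints. A minimal Python sketch of that prefix-stripping behavior (the function name here is illustrative, not taken from the Go code):

```python
def strip_prefix(prefix: str, path: str) -> str:
    """Mimic Go's http.StripPrefix: drop the mount prefix from the
    request path before it reaches the wrapped handler."""
    return path[len(prefix):] if path.startswith(prefix) else path


# Both the old and the new route deliver the same path to the proxy;
# only the externally visible prefix changes.
old = strip_prefix("/vllm", "/vllm/v1/rerank")
new = strip_prefix("/openai", "/openai/v1/rerank")
print(old, new)  # both are "/v1/rerank"
```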
4444
