Merged
Changes from 5 commits
12 changes: 12 additions & 0 deletions _snippets/embedder-vllm-reference.mdx
@@ -0,0 +1,12 @@
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `id` | `str` | Model identifier (HuggingFace model name) | `"intfloat/e5-mistral-7b-instruct"` |
| `dimensions` | `int` | Embedding vector dimensions | `4096` |
| `base_url` | `Optional[str]` | Remote vLLM server URL (enables remote mode) | `None` |
| `api_key` | `Optional[str]` | API key for remote server authentication | `getenv("VLLM_API_KEY")` |
| `enable_batch` | `bool` | Enable batch processing for multiple texts | `False` |
| `batch_size` | `int` | Number of texts to process per batch | `10` |
| `enforce_eager` | `bool` | Use eager execution mode (local mode) | `True` |
| `vllm_kwargs` | `Optional[Dict[str, Any]]` | Additional vLLM engine parameters (local mode) | `None` |
Contributor:
Can't we just repurpose client_params? Feels like conceptually the same thing

Contributor Author:

`vllm_kwargs` configures vLLM's `LLM` class in local mode, e.g. `{"disable_sliding_window": True, "max_model_len": 4096}`,
whereas `client_params` configures the `OpenAIClient` class in remote mode, e.g. `{"timeout": 30, "max_retries": 3}`.

They serve different purposes.

| `request_params` | `Optional[Dict[str, Any]]` | Additional request parameters (remote mode) | `None` |
| `client_params` | `Optional[Dict[str, Any]]` | OpenAI client configuration (remote mode) | `None` |
89 changes: 89 additions & 0 deletions concepts/knowledge/embedder/vllm.mdx
@@ -0,0 +1,89 @@
---
title: vLLM Embedder
sidebarTitle: vLLM
---

The vLLM Embedder provides high-performance embedding inference with support for both local and remote deployment modes. All models are downloaded from HuggingFace.

## Usage

### Local Mode

In local mode, vLLM loads the model directly via the vLLM library, with no need to host it on a server.

```python vllm_embedder.py
from agno.knowledge.embedder.vllm import VLLMEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector

# Get embeddings directly
embeddings = VLLMEmbedder(
id="intfloat/e5-mistral-7b-instruct",
dimensions=4096,
enforce_eager=True,
vllm_kwargs={
"disable_sliding_window": True,
"max_model_len": 4096,
},
).get_embedding("The quick brown fox jumps over the lazy dog.")

print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Use with Knowledge
knowledge = Knowledge(
vector_db=PgVector(
db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
table_name="vllm_embeddings",
embedder=VLLMEmbedder(
id="intfloat/e5-mistral-7b-instruct",
dimensions=4096,
enforce_eager=True,
vllm_kwargs={
"disable_sliding_window": True,
"max_model_len": 4096,
},
),
),
max_results=2,
)
```
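Once you have an embedding vector, a common sanity check is to compare two embeddings with cosine similarity. A minimal pure-Python helper (independent of vLLM; `vec_a` and `vec_b` are any equal-length float lists, such as two results of `get_embedding`) might look like:

```python
import math


def cosine_similarity(vec_a: list[float], vec_b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b)


# Identical vectors score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

Higher scores indicate more semantically similar texts, which is what the vector database uses for retrieval under the hood.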

### Remote Mode

You can connect to a running vLLM server via an OpenAI-compatible API.

```python vllm_embedder_remote.py
# Remote mode (for production deployments)
knowledge_remote = Knowledge(
vector_db=PgVector(
db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
table_name="vllm_embeddings_remote",
embedder=VLLMEmbedder(
id="intfloat/e5-mistral-7b-instruct",
dimensions=4096,
base_url="http://localhost:8000/v1", # Example endpoint for local development
api_key="your-api-key", # Optional
),
),
max_results=2,
)
```
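As the parameter table below notes, setting `base_url` is what switches the embedder into remote mode. A hypothetical helper illustrating that selection logic (the actual check lives inside `VLLMEmbedder`; this is only a sketch of the rule):

```python
from typing import Optional


def select_mode(base_url: Optional[str]) -> str:
    """Remote mode when a server URL is provided, otherwise local inference."""
    return "remote" if base_url else "local"


print(select_mode(None))                        # → local
print(select_mode("http://localhost:8000/v1"))  # → remote
```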

## Params

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `id` | `str` | `"intfloat/e5-mistral-7b-instruct"` | Model identifier (HuggingFace model name) |
| `dimensions` | `int` | `4096` | Embedding vector dimensions |
| `base_url` | `Optional[str]` | `None` | Remote vLLM server URL (enables remote mode) |
| `api_key` | `Optional[str]` | `getenv("VLLM_API_KEY")` | API key for remote server authentication |
| `enable_batch` | `bool` | `False` | Enable batch processing for multiple texts |
| `batch_size` | `int` | `10` | Number of texts to process per batch |
| `enforce_eager` | `bool` | `True` | Use eager execution mode (local mode) |
| `vllm_kwargs` | `Optional[Dict[str, Any]]` | `None` | Additional vLLM engine parameters (local mode) |
| `request_params` | `Optional[Dict[str, Any]]` | `None` | Additional request parameters (remote mode) |
| `client_params` | `Optional[Dict[str, Any]]` | `None` | OpenAI client configuration (remote mode) |
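When `enable_batch` is set, texts are embedded in groups of `batch_size` rather than one at a time. Conceptually (a sketch of the idea, not the embedder's actual internals), batching with `batch_size=10` chunks the input like this:

```python
def chunk(texts: list[str], batch_size: int = 10) -> list[list[str]]:
    """Split texts into consecutive batches of at most batch_size items."""
    return [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]


texts = [f"doc-{n}" for n in range(25)]
batches = chunk(texts, batch_size=10)
print([len(b) for b in batches])  # → [10, 10, 5]
```

Batching amortizes per-request overhead when embedding many documents, at the cost of slightly higher latency per individual text.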

## Developer Resources
- View [Cookbook](https://github.com/agno-agi/agno/tree/main/cookbook/knowledge/embedders/vllm_embedder.py)
3 changes: 3 additions & 0 deletions docs.json
Expand Up @@ -333,6 +333,7 @@
"concepts/knowledge/embedder/qdrant_fastembed",
"concepts/knowledge/embedder/sentencetransformers",
"concepts/knowledge/embedder/together",
"concepts/knowledge/embedder/vllm",
"concepts/knowledge/embedder/voyageai",
"concepts/knowledge/embedder/aws_bedrock"
]
Expand Down Expand Up @@ -1639,6 +1640,7 @@
"examples/concepts/knowledge/embedders/nebius-embedder",
"examples/concepts/knowledge/embedders/sentence-transformer-embedder",
"examples/concepts/knowledge/embedders/together-embedder",
"examples/concepts/knowledge/embedders/vllm-embedder",
"examples/concepts/knowledge/embedders/voyageai-embedder"
]
},
Expand Down Expand Up @@ -2996,6 +2998,7 @@
"reference/knowledge/embedder/openai",
"reference/knowledge/embedder/sentence-transformer",
"reference/knowledge/embedder/together",
"reference/knowledge/embedder/vllm",
"reference/knowledge/embedder/voyageai"
]
},
Expand Down
119 changes: 119 additions & 0 deletions examples/concepts/knowledge/embedders/vllm-embedder.mdx
@@ -0,0 +1,119 @@
---
title: vLLM Embedder
---

## Code

```python vllm_embedder.py
import asyncio

from agno.knowledge.embedder.vllm import VLLMEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector


def main():
# Basic usage - get embeddings directly
embeddings = VLLMEmbedder(
id="intfloat/e5-mistral-7b-instruct",
dimensions=4096,
enforce_eager=True,
vllm_kwargs={
"disable_sliding_window": True,
"max_model_len": 4096,
},
).get_embedding("The quick brown fox jumps over the lazy dog.")

# Print the embeddings and their dimensions
print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Local Mode with Knowledge
knowledge = Knowledge(
vector_db=PgVector(
db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
table_name="vllm_embeddings",
embedder=VLLMEmbedder(
id="intfloat/e5-mistral-7b-instruct",
dimensions=4096,
enforce_eager=True,
vllm_kwargs={
"disable_sliding_window": True,
"max_model_len": 4096,
},
),
),
max_results=2,
)

# Remote mode with Knowledge
knowledge_remote = Knowledge(
vector_db=PgVector(
db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
table_name="vllm_embeddings_remote",
embedder=VLLMEmbedder(
id="intfloat/e5-mistral-7b-instruct",
dimensions=4096,
base_url="http://localhost:8000/v1",
api_key="your-api-key", # Optional
),
),
max_results=2,
)

asyncio.run(
knowledge.add_content_async(
path="cookbook/knowledge/testing_resources/cv_1.pdf",
)
)


if __name__ == "__main__":
main()
```

## Usage

<Steps>
<Snippet file="create-venv-step.mdx" />

<Step title="Install libraries">
```bash
pip install -U agno vllm openai sqlalchemy psycopg[binary] pgvector pypdf
```
</Step>

<Step title="Run PgVector">
```bash
docker run -d \
-e POSTGRES_DB=ai \
-e POSTGRES_USER=ai \
-e POSTGRES_PASSWORD=ai \
-e PGDATA=/var/lib/postgresql/data/pgdata \
-v pgvolume:/var/lib/postgresql/data \
-p 5532:5432 \
--name pgvector \
agno/pgvector:16
```
</Step>

<Step title="Run the example">
<CodeGroup>
```bash Mac
python vllm_embedder.py
```

```bash Windows
python vllm_embedder.py
```
</CodeGroup>
</Step>
</Steps>

## Notes

- This example uses **local mode**, where vLLM loads the model directly (no server needed)
- For **remote mode**, the code includes the `knowledge_remote` example with the `base_url` parameter
- A GPU with ~14 GB of VRAM is required for the e5-mistral-7b-instruct model
- For CPU-only or lower-memory setups, use a smaller model such as `BAAI/bge-small-en-v1.5`
- Models are automatically downloaded from HuggingFace on first use
51 changes: 51 additions & 0 deletions reference/knowledge/embedder/vllm.mdx
@@ -0,0 +1,51 @@
---
title: vLLM
---

The vLLM Embedder provides high-performance embedding inference with support for both local and remote deployment modes. It can load models directly for local inference or connect to a remote vLLM server via an OpenAI-compatible API.

## Usage

```python
from agno.knowledge.embedder.vllm import VLLMEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector

# Local mode
embedder = VLLMEmbedder(
id="intfloat/e5-mistral-7b-instruct",
dimensions=4096,
enforce_eager=True,
vllm_kwargs={
"disable_sliding_window": True,
"max_model_len": 4096,
},
)

# Use with Knowledge
knowledge = Knowledge(
vector_db=PgVector(
db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
table_name="vllm_embeddings",
embedder=embedder,
),
)
```

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `id` | `str` | `"intfloat/e5-mistral-7b-instruct"` | Model identifier (HuggingFace model name) |
| `dimensions` | `int` | `4096` | Embedding vector dimensions |
| `base_url` | `Optional[str]` | `None` | Remote vLLM server URL (enables remote mode) |
| `api_key` | `Optional[str]` | `getenv("VLLM_API_KEY")` | API key for remote server authentication |
| `enable_batch` | `bool` | `False` | Enable batch processing for multiple texts |
| `batch_size` | `int` | `10` | Number of texts to process per batch |
| `enforce_eager` | `bool` | `True` | Use eager execution mode (local mode) |
| `vllm_kwargs` | `Optional[Dict[str, Any]]` | `None` | Additional vLLM engine parameters (local mode) |
| `request_params` | `Optional[Dict[str, Any]]` | `None` | Additional request parameters (remote mode) |
| `client_params` | `Optional[Dict[str, Any]]` | `None` | OpenAI client configuration (remote mode) |

## Developer Resources
- View [Cookbook](https://github.com/agno-agi/agno/tree/main/cookbook/knowledge/embedders/vllm_embedder.py)