docs: add vLLM embedder documentation #222 (Merged)

Commits (6):
- f0aa163 docs: add vLLM embedder documentation (uzaxirr)
- 8df4c17 Update concepts/knowledge/embedder/vllm.mdx (uzaxirr)
- 4d9ea1d Update concepts/knowledge/embedder/vllm.mdx (uzaxirr)
- 519e7d7 Update concepts/knowledge/embedder/vllm.mdx (uzaxirr)
- 31fd5b5 cmnts (uzaxirr)
- 1602d4e Apply suggestion from @dirkbrnd (dirkbrnd)
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `id` | `str` | Model identifier (HuggingFace model name) | `"intfloat/e5-mistral-7b-instruct"` |
| `dimensions` | `int` | Embedding vector dimensions | `4096` |
| `base_url` | `Optional[str]` | Remote vLLM server URL (enables remote mode) | `None` |
| `api_key` | `Optional[str]` | API key for remote server authentication | `getenv("VLLM_API_KEY")` |
| `enable_batch` | `bool` | Enable batch processing for multiple texts | `False` |
| `batch_size` | `int` | Number of texts to process per batch | `10` |
| `enforce_eager` | `bool` | Use eager execution mode (local mode) | `True` |
| `vllm_kwargs` | `Optional[Dict[str, Any]]` | Additional vLLM engine parameters (local mode) | `None` |
| `request_params` | `Optional[Dict[str, Any]]` | Additional request parameters (remote mode) | `None` |
| `client_params` | `Optional[Dict[str, Any]]` | OpenAI client configuration (remote mode) | `None` |
---
title: vLLM Embedder
sidebarTitle: vLLM
---

The vLLM Embedder provides high-performance embedding inference with support for both local and remote deployment modes. All models are downloaded from HuggingFace.

## Usage

### Local Mode

You can load models directly with the vLLM library, with no need to host a model on a server.

```python vllm_embedder.py
from agno.knowledge.embedder.vllm import VLLMEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector

# Get embeddings directly
embeddings = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",
    dimensions=4096,
    enforce_eager=True,
    vllm_kwargs={
        "disable_sliding_window": True,
        "max_model_len": 4096,
    },
).get_embedding("The quick brown fox jumps over the lazy dog.")

print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Use with Knowledge
knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="vllm_embeddings",
        embedder=VLLMEmbedder(
            id="intfloat/e5-mistral-7b-instruct",
            dimensions=4096,
            enforce_eager=True,
            vllm_kwargs={
                "disable_sliding_window": True,
                "max_model_len": 4096,
            },
        ),
    ),
    max_results=2,
)
```
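When `enable_batch=True`, texts are grouped into batches of at most `batch_size` before inference. The grouping itself amounts to simple chunking; the helper below is a sketch of that behavior for illustration only, not agno's internal code:

```python
from typing import Iterator, List


def batched(texts: List[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield successive batches of at most `batch_size` texts.

    Sketch only: illustrates how batch mode groups inputs; the real
    embedder performs this batching internally.
    """
    for start in range(0, len(texts), batch_size):
        yield texts[start : start + batch_size]


texts = [f"document {i}" for i in range(25)]
batches = list(batched(texts, batch_size=10))
print([len(b) for b in batches])  # → [10, 10, 5]
```

Batching amortizes per-call overhead across many texts, which matters most when embedding a large corpus on a GPU.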
### Remote Mode

You can connect to a running vLLM server via an OpenAI-compatible API.

```python vllm_embedder_remote.py
# Remote mode (for production deployments)
knowledge_remote = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="vllm_embeddings_remote",
        embedder=VLLMEmbedder(
            id="intfloat/e5-mistral-7b-instruct",
            dimensions=4096,
            base_url="http://localhost:8000/v1",  # Example endpoint for local development
            api_key="your-api-key",  # Optional
        ),
    ),
    max_results=2,
)
```
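Remote mode talks to a vLLM server's OpenAI-compatible `/v1/embeddings` endpoint (a server can typically be started with `vllm serve <model>`; the exact flags vary by vLLM version). As a rough sketch, the request body follows the OpenAI embeddings API shape, with anything in `request_params` merged in; the field names below are assumptions based on that API, not agno's exact wire format:

```python
import json

# Sketch of the body an OpenAI-compatible /v1/embeddings request carries.
# Field names follow the OpenAI embeddings API; the exact parameters agno
# sends may differ.
payload = {
    "model": "intfloat/e5-mistral-7b-instruct",
    "input": ["The quick brown fox jumps over the lazy dog."],
}
print(json.dumps(payload, indent=2))
```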
## Params

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `id` | `str` | `"intfloat/e5-mistral-7b-instruct"` | Model identifier (HuggingFace model name) |
| `dimensions` | `int` | `4096` | Embedding vector dimensions |
| `base_url` | `Optional[str]` | `None` | Remote vLLM server URL (enables remote mode) |
| `api_key` | `Optional[str]` | `getenv("VLLM_API_KEY")` | API key for remote server authentication |
| `enable_batch` | `bool` | `False` | Enable batch processing for multiple texts |
| `batch_size` | `int` | `10` | Number of texts to process per batch |
| `enforce_eager` | `bool` | `True` | Use eager execution mode (local mode) |
| `vllm_kwargs` | `Optional[Dict[str, Any]]` | `None` | Additional vLLM engine parameters (local mode) |
| `request_params` | `Optional[Dict[str, Any]]` | `None` | Additional request parameters (remote mode) |
| `client_params` | `Optional[Dict[str, Any]]` | `None` | OpenAI client configuration (remote mode) |

## Developer Resources

- View [Cookbook](https://github.com/agno-agi/agno/tree/main/cookbook/knowledge/embedders/vllm_embedder.py)
File: examples/concepts/knowledge/embedders/vllm-embedder.mdx (119 additions, 0 deletions)
---
title: vLLM Embedder
---

## Code

```python vllm_embedder.py
import asyncio

from agno.knowledge.embedder.vllm import VLLMEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector


def main():
    # Basic usage - get embeddings directly
    embeddings = VLLMEmbedder(
        id="intfloat/e5-mistral-7b-instruct",
        dimensions=4096,
        enforce_eager=True,
        vllm_kwargs={
            "disable_sliding_window": True,
            "max_model_len": 4096,
        },
    ).get_embedding("The quick brown fox jumps over the lazy dog.")

    # Print the embeddings and their dimensions
    print(f"Embeddings: {embeddings[:5]}")
    print(f"Dimensions: {len(embeddings)}")

    # Local mode with Knowledge
    knowledge = Knowledge(
        vector_db=PgVector(
            db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
            table_name="vllm_embeddings",
            embedder=VLLMEmbedder(
                id="intfloat/e5-mistral-7b-instruct",
                dimensions=4096,
                enforce_eager=True,
                vllm_kwargs={
                    "disable_sliding_window": True,
                    "max_model_len": 4096,
                },
            ),
        ),
        max_results=2,
    )

    # Remote mode with Knowledge
    knowledge_remote = Knowledge(
        vector_db=PgVector(
            db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
            table_name="vllm_embeddings_remote",
            embedder=VLLMEmbedder(
                id="intfloat/e5-mistral-7b-instruct",
                dimensions=4096,
                base_url="http://localhost:8000/v1",
                api_key="your-api-key",  # Optional
            ),
        ),
        max_results=2,
    )

    asyncio.run(
        knowledge.add_content_async(
            path="cookbook/knowledge/testing_resources/cv_1.pdf",
        )
    )


if __name__ == "__main__":
    main()
```
## Usage

<Steps>
  <Snippet file="create-venv-step.mdx" />

  <Step title="Install libraries">
    ```bash
    pip install -U agno vllm openai sqlalchemy psycopg[binary] pgvector pypdf
    ```
  </Step>

  <Step title="Run PgVector">
    ```bash
    docker run -d \
      -e POSTGRES_DB=ai \
      -e POSTGRES_USER=ai \
      -e POSTGRES_PASSWORD=ai \
      -e PGDATA=/var/lib/postgresql/data/pgdata \
      -v pgvolume:/var/lib/postgresql/data \
      -p 5532:5432 \
      --name pgvector \
      agno/pgvector:16
    ```
  </Step>

  <Step title="Run the example">
    <CodeGroup>
    ```bash Mac
    python vllm_embedder.py
    ```

    ```bash Windows
    python vllm_embedder.py
    ```
    </CodeGroup>
  </Step>
</Steps>
## Notes

- This example uses **local mode**, where vLLM loads the model directly (no server needed)
- For **remote mode**, the `knowledge_remote` example connects to a server via the `base_url` parameter
- The `e5-mistral-7b-instruct` model requires a GPU with roughly 14 GB of VRAM
- For CPU-only or lower-memory setups, use a smaller model such as `BAAI/bge-small-en-v1.5`
- Models are downloaded automatically from HuggingFace on first use
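Because `max_model_len=4096` caps the tokens per input, long documents need to be split before embedding. Below is a naive character-window chunker as a sketch; the ~4 characters-per-token ratio is a rough assumption, a real implementation would use the model's tokenizer, and this helper is not part of agno:

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 4096, chars_per_token: int = 4) -> List[str]:
    """Split text into windows that roughly fit the model context.

    Sketch only: character-based approximation, not tokenizer-accurate.
    """
    window = max_tokens * chars_per_token
    return [text[i : i + window] for i in range(0, len(text), window)]


doc = "x" * 40000  # a document longer than the model context
chunks = chunk_text(doc)
print(len(chunks), max(len(c) for c in chunks))  # → 3 16384
```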
---
title: vLLM
---

The vLLM Embedder provides high-performance embedding inference with support for both local and remote deployment modes. It can load models directly for local inference or connect to a remote vLLM server via an OpenAI-compatible API.

## Usage

```python
from agno.knowledge.embedder.vllm import VLLMEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector

# Local mode
embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",
    dimensions=4096,
    enforce_eager=True,
    vllm_kwargs={
        "disable_sliding_window": True,
        "max_model_len": 4096,
    },
)

# Use with Knowledge
knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="vllm_embeddings",
        embedder=embedder,
    ),
)
```

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `id` | `str` | `"intfloat/e5-mistral-7b-instruct"` | Model identifier (HuggingFace model name) |
| `dimensions` | `int` | `4096` | Embedding vector dimensions |
| `base_url` | `Optional[str]` | `None` | Remote vLLM server URL (enables remote mode) |
| `api_key` | `Optional[str]` | `getenv("VLLM_API_KEY")` | API key for remote server authentication |
| `enable_batch` | `bool` | `False` | Enable batch processing for multiple texts |
| `batch_size` | `int` | `10` | Number of texts to process per batch |
| `enforce_eager` | `bool` | `True` | Use eager execution mode (local mode) |
| `vllm_kwargs` | `Optional[Dict[str, Any]]` | `None` | Additional vLLM engine parameters (local mode) |
| `request_params` | `Optional[Dict[str, Any]]` | `None` | Additional request parameters (remote mode) |
| `client_params` | `Optional[Dict[str, Any]]` | `None` | OpenAI client configuration (remote mode) |
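Once stored, embeddings are ranked by vector similarity; pgvector supports cosine distance among other metrics. A minimal sketch of cosine similarity on toy vectors, shown only to make the retrieval metric concrete (this helper is not part of agno):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-d vectors, not real 4096-dimension embedder output
print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 3))  # → 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 3))  # → 0.0
```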
## Developer Resources

- View [Cookbook](https://github.com/agno-agi/agno/tree/main/cookbook/knowledge/embedders/vllm_embedder.py)
Review discussion:

> Can't we just repurpose `client_params`? Feels like conceptually the same thing.

`vllm_kwargs` configures vLLM's `LLM` class in local mode, e.g. `{"disable_sliding_window": True, "max_model_len": 4096}`, whereas `client_params` configures the OpenAI client class in remote mode, e.g. `{"timeout": 30, "max_retries": 3}`. They serve different purposes.