# docs: add vLLM embedder documentation (#222)
## Conversation
`concepts/knowledge/embedder/vllm.mdx` (outdated)
> Integrate with Agno's knowledge system:
>
> ```python
> from agno.knowledge.pdf import PDFKnowledgeBase
> ```
This is not at all our syntax?! That means this is probably vibe coded... and that just means these code snippets aren't tested, which is really bad.
`concepts/knowledge/embedder/vllm.mdx` (outdated)
> - Enable CPU offloading: `vllm_kwargs={"enforce_eager": False}`
>
> ### Model Download Issues
>
> - Models are downloaded from HuggingFace on first use
Are all the models downloaded from HF?
> ```python
> from agno.vectordb.pgvector import PgVector
>
> # Create knowledge base with vLLM embedder (local mode)
> knowledge_base = PDFKnowledgeBase(
> ```
Again, not at all our syntax.
> ## Code
>
> ```python cookbook/knowledge/embedders/vllm_embedder.py
> ```
We don't use cookbook paths anymore
Co-authored-by: Dirk Brand <[email protected]>
> <Step title="Run the agent">
>
> ```bash
> python cookbook/knowledge/embedders/vllm_embedder.py
> ```
Here as well
Co-authored-by: Dirk Brand <[email protected]>
`concepts/knowledge/embedder/vllm.mdx` (outdated)
> ```python
> embedder = VLLMEmbedder(
>     base_url="http://localhost:8000/v1",
> ```
I wouldn't use localhost, because that wouldn't be typical for production, right? Or at least mention that this is for a locally running server.
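For illustration, a hedged sketch of how the example could address this (the `VLLMEmbedder` import path and constructor keywords are assumptions based on this PR and the companion SDK PR; the remote hostname is a placeholder):

```python
from agno.embedder.vllm import VLLMEmbedder  # import path assumed from the SDK PR

# Production: point base_url at a dedicated vLLM server.
embedder = VLLMEmbedder(
    base_url="https://vllm.example.internal/v1",  # hypothetical remote endpoint
)

# Local development: a locally running vLLM server (e.g. started with
# `vllm serve <model>`) listens on localhost:8000 by default.
dev_embedder = VLLMEmbedder(base_url="http://localhost:8000/v1")
```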
> vLLM Embedder for local and remote embedding models with high-performance inference.
>
> <Snippet file="embedder-vllm-reference.mdx" />
Rather not use a snippet; they are buggy. Just use the code directly here.
> | Parameter | Type | Description | Default |
> | --- | --- | --- | --- |
> | `enable_batch` | `bool` | Enable batch processing for multiple texts | `False` |
> | `batch_size` | `int` | Number of texts to process per batch | `10` |
> | `enforce_eager` | `bool` | Use eager execution mode (local mode) | `True` |
> | `vllm_kwargs` | `Optional[Dict[str, Any]]` | Additional vLLM engine parameters (local mode) | `None` |
Can't we just repurpose client_params? Feels like conceptually the same thing
`vllm_kwargs` configures vLLM's `LLM` class in local mode, e.g. `{"disable_sliding_window": True, "max_model_len": 4096}`, whereas `client_params` configures the `OpenAIClient` class in remote mode, e.g. `{"timeout": 30, "max_retries": 3}`. They serve different purposes.
Related SDK PR: agno-agi/agno#5187
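To make the distinction concrete, a minimal sketch assuming the constructor shape discussed in this thread (the import path and the `id` keyword are assumptions; the kwarg values are the examples quoted above):

```python
from agno.embedder.vllm import VLLMEmbedder  # import path assumed from the SDK PR

# Local mode: vllm_kwargs is forwarded to vLLM's LLM engine constructor.
local_embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",  # hypothetical model id
    enforce_eager=True,
    enable_batch=True,  # batch multiple texts per call (default False)
    batch_size=10,
    vllm_kwargs={"disable_sliding_window": True, "max_model_len": 4096},
)

# Remote mode: client_params configures the OpenAI-compatible client
# used to talk to a running vLLM server.
remote_embedder = VLLMEmbedder(
    base_url="http://localhost:8000/v1",  # a locally running vLLM server
    client_params={"timeout": 30, "max_retries": 3},
)
```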