# docs: add vLLM embedder documentation (#222)
## Conversation
`concepts/knowledge/embedder/vllm.mdx` (outdated)
> Integrate with Agno's knowledge system:
>
> ```python
> from agno.knowledge.pdf import PDFKnowledgeBase
> ```
This is not at all our syntax?! That means this is probably vibe coded... and that just means these code snippets aren't tested, which is really bad.
`concepts/knowledge/embedder/vllm.mdx` (outdated)
> - Enable CPU offloading: `vllm_kwargs={"enforce_eager": False}`
>
> ### Model Download Issues
>
> - Models are downloaded from HuggingFace on first use
Are all the models downloaded from HF?
> ```python
> from agno.vectordb.pgvector import PgVector
>
> # Create knowledge base with vLLM embedder (local mode)
> knowledge_base = PDFKnowledgeBase(
> ```
Again, not at all our syntax.
> ## Code
>
> ```python cookbook/knowledge/embedders/vllm_embedder.py
> ```
We don't use cookbook paths anymore
Co-authored-by: Dirk Brand <[email protected]>
> <Step title="Run the agent">
>
> ```bash
> python cookbook/knowledge/embedders/vllm_embedder.py
> ```
Here as well
Co-authored-by: Dirk Brand <[email protected]>
`concepts/knowledge/embedder/vllm.mdx` (outdated)
> ```python
> embedder = VLLMEmbedder(
>     base_url="http://localhost:8000/v1",
> ```
I wouldn't use localhost, because that wouldn't be typical for production, right? Or at least mention that this is for a locally running server.
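For illustration, a hedged sketch of how the example could address this (the `VLLMEmbedder` import path and constructor keywords are assumptions based on this PR and the companion SDK PR; the remote hostname is a placeholder):

```python
from agno.embedder.vllm import VLLMEmbedder  # import path assumed from the SDK PR

# Production: point base_url at a dedicated vLLM server.
embedder = VLLMEmbedder(
    base_url="https://vllm.example.internal/v1",  # hypothetical remote endpoint
)

# Local development: a locally running vLLM server (e.g. started with
# `vllm serve <model>`) listens on localhost:8000 by default.
dev_embedder = VLLMEmbedder(base_url="http://localhost:8000/v1")
```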
> vLLM Embedder for local and remote embedding models with high-performance inference.
>
> <Snippet file="embedder-vllm-reference.mdx" />
Rather not use a snippet; they are buggy. Just use the code directly here.
> | Parameter | Type | Description | Default |
> | --- | --- | --- | --- |
> | `enable_batch` | `bool` | Enable batch processing for multiple texts | `False` |
> | `batch_size` | `int` | Number of texts to process per batch | `10` |
> | `enforce_eager` | `bool` | Use eager execution mode (local mode) | `True` |
> | `vllm_kwargs` | `Optional[Dict[str, Any]]` | Additional vLLM engine parameters (local mode) | `None` |
Can't we just repurpose client_params? Feels like conceptually the same thing
`vllm_kwargs` configures vLLM's `LLM` class in local mode, e.g. `{"disable_sliding_window": True, "max_model_len": 4096}`, whereas `client_params` configures the `OpenAIClient` class in remote mode, e.g. `{"timeout": 30, "max_retries": 3}`. They serve different purposes.
Related SDK PR: agno-agi/agno#5187
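To make the distinction concrete, a minimal sketch assuming the constructor shape discussed in this thread (the import path and the `id` keyword are assumptions; the kwarg values are the examples quoted above):

```python
from agno.embedder.vllm import VLLMEmbedder  # import path assumed from the SDK PR

# Local mode: vllm_kwargs is forwarded to vLLM's LLM engine constructor.
local_embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",  # hypothetical model id
    enforce_eager=True,
    enable_batch=True,  # batch multiple texts per call (default False)
    batch_size=10,
    vllm_kwargs={"disable_sliding_window": True, "max_model_len": 4096},
)

# Remote mode: client_params configures the OpenAI-compatible client
# used to talk to a running vLLM server.
remote_embedder = VLLMEmbedder(
    base_url="http://localhost:8000/v1",  # a locally running vLLM server
    client_params={"timeout": 30, "max_retries": 3},
)
```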