
Conversation

@uzaxirr (Contributor) commented on Oct 28, 2025:

Related SDK PR: agno-agi/agno#5187

Reviewer (Contributor) commented on the diff:

> Integrate with Agno's knowledge system:
>
> ```python
> from agno.knowledge.pdf import PDFKnowledgeBase
> ```

This is not at all our syntax?! That means this is probably vibe coded... and that just means these code snippets aren't tested... which is really bad.

Reviewer (Contributor) commented on the diff:

> - Enable CPU offloading: `vllm_kwargs={"enforce_eager": False}`
>
> ### Model Download Issues
> - Models are downloaded from HuggingFace on first use

Are all the models downloaded from HF?
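Background for the question above: in local mode vLLM resolves the model id against the Hugging Face Hub cache, so the first run downloads the weights unless they are already cached. A hedged sketch of pre-fetching a model so first use doesn't block on a download (the model id below is only an example, not one prescribed by the PR):

```python
from huggingface_hub import snapshot_download

# Populate the local HF cache ahead of time; vLLM will then load from the
# cache instead of downloading on first use.
snapshot_download(repo_id="intfloat/e5-mistral-7b-instruct")  # example model id
```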

Reviewer (Contributor) commented on the diff:

> ```python
> from agno.vectordb.pgvector import PgVector
>
> # Create knowledge base with vLLM embedder (local mode)
> knowledge_base = PDFKnowledgeBase(
> ```

Again, not at all our syntax.


Reviewer (Contributor) commented on the diff:

> ## Code
>
> ```python cookbook/knowledge/embedders/vllm_embedder.py
> ```

We don't use cookbook paths anymore.


Reviewer (Contributor) commented on the diff:

> <Step title="Run the agent">
> ```bash
> python cookbook/knowledge/embedders/vllm_embedder.py
> ```

Here as well.


Reviewer (Contributor) commented on the diff:

> ```python
> embedder = VLLMEmbedder(
>     base_url="http://localhost:8000/v1",
> ```

I wouldn't use localhost, because that wouldn't be typical for production, right? Or at least mention that this is for a locally running server.
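A hedged illustration of the point above, assuming the `VLLMEmbedder` constructor from the companion SDK PR accepts `base_url`; the import path and the production hostname are placeholders, not the documented API:

```python
from agno.knowledge.embedder.vllm import VLLMEmbedder  # import path assumed

# Local development: an OpenAI-compatible vLLM server running on this machine,
# e.g. started with something like `vllm serve <embedding-model>`.
dev_embedder = VLLMEmbedder(base_url="http://localhost:8000/v1")

# Production: point at the deployed vLLM endpoint instead of localhost.
prod_embedder = VLLMEmbedder(base_url="https://vllm.example.internal/v1")  # placeholder URL
```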


Reviewer (Contributor) commented on the diff:

> vLLM Embedder for local and remote embedding models with high-performance inference.
>
> <Snippet file="embedder-vllm-reference.mdx" />

I'd rather not use a snippet; they are buggy. Just use the code directly here.

Reviewer (Contributor) commented on the diff:

> | Parameter | Type | Description | Default |
> | --- | --- | --- | --- |
> | `enable_batch` | `bool` | Enable batch processing for multiple texts | `False` |
> | `batch_size` | `int` | Number of texts to process per batch | `10` |
> | `enforce_eager` | `bool` | Use eager execution mode (local mode) | `True` |
> | `vllm_kwargs` | `Optional[Dict[str, Any]]` | Additional vLLM engine parameters (local mode) | `None` |

Can't we just repurpose `client_params`? Feels like conceptually the same thing.

@uzaxirr (Author) replied:

`vllm_kwargs` configures vLLM's `LLM` class in local mode, e.g. `{"disable_sliding_window": True, "max_model_len": 4096}`, whereas `client_params` configures the OpenAI client in remote mode, e.g. `{"timeout": 30, "max_retries": 3}`.

They serve different purposes.
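To make the distinction concrete, a minimal sketch assuming the constructor from the companion SDK PR (agno-agi/agno#5187); the import path, the `id` parameter, and the model id are assumptions, while the parameter names and example values come from the discussion and parameter table above:

```python
from agno.knowledge.embedder.vllm import VLLMEmbedder  # import path assumed

# Local mode: vllm_kwargs is passed through to vLLM's LLM engine.
local_embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",  # example model id (assumption)
    vllm_kwargs={"disable_sliding_window": True, "max_model_len": 4096},
    enable_batch=True,  # batch multiple texts per call (defaults from the parameter table)
    batch_size=10,
)

# Remote mode: client_params configures the OpenAI-compatible client that
# talks to an already running vLLM server.
remote_embedder = VLLMEmbedder(
    base_url="http://localhost:8000/v1",
    client_params={"timeout": 30, "max_retries": 3},
)
```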

@uzaxirr self-assigned this on Nov 4, 2025.
@dirkbrnd merged commit 526874a into main on Nov 4, 2025 (1 check passed).