A Model Context Protocol (MCP) server that provides a personal knowledge base with RAG (Retrieval-Augmented Generation) capabilities. Share context across Claude Desktop, Claude Code, VS Code, and Open WebUI.
- Hybrid Storage: SQLite for full-text documents + Qdrant for semantic search
- Rich Metadata: Comprehensive metadata capture for future extensibility
- Dual Transport: stdio (for Claude Desktop/VS Code) + HTTP Streaming (for Open WebUI)
- Forward-Compatible: Strategy pattern allows adding advanced RAG features without refactoring
- Containerized: Runs in Docker, connects to existing Qdrant/Ollama/LiteLLM infrastructure
User Input → MCP Tool
↓
[1] Generate embedding (Ollama)
↓
[2] Store full text + metadata in SQLite
↓
[3] Store vector in Qdrant
↓
Return confirmation
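As a rough illustration of the write path, here is a minimal sketch in Python. It assumes `httpx` and `qdrant-client` are available, uses the Ollama embeddings API, a single SQLite `documents` table, and a Qdrant collection named `personal_rag`; the table, collection, and function names are illustrative, not the server's actual internals.

```python
# Sketch of the store flow; table, collection, and model names are assumptions.
import sqlite3
import uuid
import httpx
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

def embed(text: str, ollama_url: str = "http://localhost:11434") -> list[float]:
    # [1] Generate embedding with Ollama (nomic-embed-text is the model pulled later in this README)
    resp = httpx.post(f"{ollama_url}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def store_memory(text: str, namespace: str, title: str | None = None) -> str:
    doc_id = str(uuid.uuid4())
    vector = embed(text)

    # [2] Store full text + metadata in SQLite
    db = sqlite3.connect("./data/documents.db")
    db.execute("CREATE TABLE IF NOT EXISTS documents "
               "(id TEXT PRIMARY KEY, namespace TEXT, title TEXT, text TEXT)")
    db.execute("INSERT INTO documents VALUES (?, ?, ?, ?)", (doc_id, namespace, title, text))
    db.commit()

    # [3] Store the vector in Qdrant under the same id so both stores stay linked
    # (assumes the collection was created beforehand, e.g. by scripts/init_db.py)
    qdrant = QdrantClient(url="http://localhost:6333")
    qdrant.upsert(collection_name="personal_rag",
                  points=[PointStruct(id=doc_id, vector=vector,
                                      payload={"namespace": namespace, "title": title})])
    return doc_id
```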
Search Query
↓
[1] Embed query (Ollama)
↓
[2] Search Qdrant (semantic search)
↓
[3] Retrieve full text from SQLite
↓
[4] Generate response (LiteLLM)
↓
Return answer + sources
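And a matching sketch of the ask path. It assumes LiteLLM's standard OpenAI-compatible `/v1/chat/completions` endpoint and the same illustrative table and collection names as above; the prompt format and model name are placeholders.

```python
# Sketch of the ask flow; names and the prompt format are assumptions.
import sqlite3
import httpx
from qdrant_client import QdrantClient

def ask_with_context(question: str, limit: int = 5,
                     litellm_url: str = "http://localhost:4000") -> str:
    # [1] Embed the query with Ollama
    emb = httpx.post("http://localhost:11434/api/embeddings",
                     json={"model": "nomic-embed-text", "prompt": question})
    vector = emb.json()["embedding"]

    # [2] Semantic search in Qdrant
    qdrant = QdrantClient(url="http://localhost:6333")
    hits = qdrant.search(collection_name="personal_rag", query_vector=vector, limit=limit)

    # [3] Retrieve the full text from SQLite by the ids Qdrant returned
    db = sqlite3.connect("./data/documents.db")
    ids = [str(h.id) for h in hits]
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(f"SELECT text FROM documents WHERE id IN ({placeholders})", ids).fetchall()
    context = "\n---\n".join(r[0] for r in rows)

    # [4] Generate an answer via LiteLLM's OpenAI-compatible chat completions API
    resp = httpx.post(
        f"{litellm_url}/v1/chat/completions",
        headers={"Authorization": "Bearer sk-1234"},
        json={"model": "deepseek-r1-1.5b",
              "messages": [
                  {"role": "system", "content": f"Answer using this context:\n{context}"},
                  {"role": "user", "content": question}]},
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```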
Store notes, documents, or snippets in the knowledge base.
store_memory(
text="Your content here",
namespace="notes/personal", # Hierarchical organization
tags=["tag1", "tag2"],
title="Optional Title",
category="personal", # work, personal, family
content_type="note" # note, document, snippet
)
Semantic search across your knowledge base.
search_memory(
query="What did I learn about X?",
namespace="notes/personal", # Optional filter
limit=5,
content_type="note" # Optional filter
)
Ask questions with RAG (retrieval + generation).
ask_with_context(
question="What are my thoughts on X?",
namespace="notes/personal", # Optional filter
limit=5 # Context chunks to retrieve
)
personal-rag-mcp/
├── Dockerfile
├── requirements.txt
├── README.md
├── config/
│ ├── pipeline.yaml # RAG pipeline config
│ └── server.yaml # Server config
├── personal_rag_mcp/
│ ├── server.py # MCP server entry point
│ ├── storage/
│ │ ├── sqlite_store.py # SQLite document storage
│ │ ├── qdrant_store.py # Qdrant vector storage
│ │ └── schema.py # Pydantic metadata models
│ ├── pipeline/
│ │ ├── retriever.py # Retrieval strategies
│ │ ├── reranker.py # Reranking strategies
│ │ ├── expander.py # Query expansion
│ │ ├── generator.py # LLM generation
│ │ └── pipeline.py # RAG orchestration
│ └── utils/
│ ├── embeddings.py # Ollama embedding client
│ └── chunking.py # Text chunking
├── scripts/
│ ├── init_db.py # Initialize database
│ └── backup.py # Backup utility
└── tests/
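The pipeline/ package is where the forward-compatible strategy pattern lives. A hypothetical sketch of what such an interface could look like (class and method names are illustrative, not the actual contents of retriever.py or pipeline.py):

```python
# Hypothetical retrieval-strategy interface; names are illustrative only.
from abc import ABC, abstractmethod

class RetrievalStrategy(ABC):
    @abstractmethod
    def retrieve(self, query: str, limit: int) -> list[dict]:
        """Return the top-`limit` documents for `query`."""

class VectorRetriever(RetrievalStrategy):
    """Baseline strategy: embed the query and search Qdrant."""
    def retrieve(self, query: str, limit: int) -> list[dict]:
        ...

class HybridRetriever(RetrievalStrategy):
    """Future strategy: combine vector search with SQLite full-text search."""
    def retrieve(self, query: str, limit: int) -> list[dict]:
        ...

class RAGPipeline:
    """pipeline.py can swap strategies without changing the MCP tools."""
    def __init__(self, retriever: RetrievalStrategy):
        self.retriever = retriever
```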
# Transport
TRANSPORT=http # or stdio
PORT=8765
# Storage
SQLITE_PATH=/app/data/documents.db
QDRANT_URL=http://qdrant:6333
# AI Services
OLLAMA_URL=http://ollama:11434
LITELLM_URL=http://litellm:4000
# Create virtual environment
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
# Install dependencies
pip install -r requirements.txt
export SQLITE_PATH=./data/documents.db
export QDRANT_URL=http://localhost:6333
export OLLAMA_URL=http://localhost:11434
export LITELLM_URL=http://localhost:4000
python -m personal_rag_mcp.server
export TRANSPORT=http
export PORT=8765
python -m personal_rag_mcp.server
This MCP server depends on the following AI infrastructure services:
- Qdrant (vector database) - Port 6333
- Ollama (embeddings) - Port 11434
- LiteLLM (LLM proxy) - Port 4000/8000
services:
# Required: Qdrant vector database
qdrant:
image: qdrant/qdrant:latest
container_name: qdrant
ports:
- "6333:6333"
volumes:
- qdrant-data:/qdrant/storage
restart: unless-stopped
# Required: Ollama for embeddings
ollama:
image: ollama/ollama:latest
container_name: ollama
ports:
- "11434:11434"
volumes:
- ollama-data:/root/.ollama
restart: unless-stopped
# Required: LiteLLM proxy for LLM access
litellm-proxy:
image: ghcr.io/berriai/litellm:main-latest
container_name: litellm-proxy
ports:
- "4080:8000"
volumes:
- ./litellm_config.yaml:/app/config.yaml
environment:
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
- AWS_REGION=${AWS_REGION}
- OLLAMA_API_BASE=http://ollama:11434
entrypoint: ["litellm", "--config", "/app/config.yaml", "--port", "8000"]
depends_on:
- ollama
restart: unless-stopped
# Personal RAG MCP Server
personal-rag-mcp:
build: ./personal-rag-mcp
container_name: personal-rag-mcp
ports:
- "8765:8765"
environment:
- TRANSPORT=http
- PORT=8765
- QDRANT_URL=http://qdrant:6333
- OLLAMA_URL=http://ollama:11434
- LITELLM_URL=http://litellm-proxy:8000
- OPENAI_API_KEY=${LITELLM_API_KEY} # LiteLLM auth
- SQLITE_PATH=/app/data/documents.db
volumes:
- personal-rag-data:/app/data
- ./config/personal-rag:/app/config:ro
depends_on:
- qdrant
- ollama
- litellm-proxy
restart: unless-stopped
volumes:
qdrant-data:
ollama-data:
personal-rag-data:
The MCP server uses LiteLLM as a unified proxy, so you can use any LLM provider that LiteLLM supports:
- Local: Ollama (llama3, deepseek, qwen, etc.)
- Cloud: OpenAI, Anthropic Claude, Google Gemini, Cohere
- AWS Bedrock: Claude, Llama, Mistral, etc.
- Azure OpenAI: GPT-4, GPT-3.5
- 100+ other providers: See LiteLLM docs
Simply configure your preferred models in litellm_config.yaml:
model_list:
# Local Ollama models (no API key needed)
- model_name: deepseek-r1-1.5b
litellm_params:
model: ollama/deepseek-r1:1.5b
api_base: http://ollama:11434
# AWS Bedrock models
- model_name: bedrock-claude-3-5-sonnet-v2
litellm_params:
model: bedrock/us.anthropic.claude-3-5-sonnet-20241022-v2:0
aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
aws_region_name: us-east-2
# OpenAI models
- model_name: gpt-4o
litellm_params:
model: openai/gpt-4o
api_key: os.environ/OPENAI_API_KEY
# Anthropic Claude
- model_name: claude-3-5-sonnet
litellm_params:
model: anthropic/claude-3-5-sonnet-20241022
api_key: os.environ/ANTHROPIC_API_KEY
# Embedding model (for semantic search)
- model_name: nomic-embed-text
litellm_params:
model: ollama/nomic-embed-text
api_base: http://ollama:11434
general_settings:
master_key: sk-1234 # Set LITELLM_API_KEY in .env
The server defaults to whatever model is configured in LiteLLM, so you can switch between local and cloud models without changing the MCP server code.
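Because LiteLLM exposes an OpenAI-compatible API, switching models is just a matter of changing the `model` name in the request. A quick sanity check from the host, assuming the docker-compose port mapping above (4080 → 8000) and the example `sk-1234` master key:

```python
# Assumes the compose mapping above (host 4080 -> container 8000) and master_key sk-1234.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4080/v1", api_key="sk-1234")

# Any model_name from litellm_config.yaml works here: local Ollama or cloud.
for model in ("deepseek-r1-1.5b", "bedrock-claude-3-5-sonnet-v2"):
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(model, "->", reply.choices[0].message.content)
```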
# LiteLLM API Key
LITELLM_API_KEY=sk-1234
# AWS Credentials (optional, for Bedrock models)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-2
- Pull required Ollama models:
docker exec ollama ollama pull nomic-embed-text
docker exec ollama ollama pull deepseek-r1:1.5b
- Verify services are running:
curl http://localhost:6333/collections   # Qdrant
curl http://localhost:11434/api/tags     # Ollama
curl -H "Authorization: Bearer sk-1234" http://localhost:4080/v1/models   # LiteLLM
- Test the MCP server:
docker exec personal-rag-mcp python /app/scripts/test_e2e.py
For complete infrastructure setup, see the parent repository.
- ✅ Hybrid SQLite + Qdrant storage
- ✅ Basic RAG pipeline (vector retrieval)
- ✅ MCP tools (store, search, ask)
- ✅ Dual transport (stdio + HTTP)
- Advanced RAG features (reranking, hybrid search, query expansion)
- Bulk document ingestion (PDF, DOCX parsing)
- Conversation history capture
- Multi-user support with authentication
MIT