Multi-Agent RAG with MLOPS infrastructure

A multi-agent RAG application that combines local Deepseek models with retrieval-augmented generation and integrates several models, including Qwen2.5:4b, Llama3.2, and Deepseek. It is built with Deepseek (via Ollama), Snowflake Arctic for embeddings, Qdrant for vector storage, Langtrace for observability, and Hydra for configuration management. The application offers both simple local chat and RAG-enhanced interactions, with document processing, web search, and an end-to-end MLOps pipeline for deployment and management.

Features

Dual Operation Modes
- Local Chat Mode: Direct interaction with Deepseek locally
- RAG Mode: Enhanced reasoning with document context and web search integration (using Llama3.2)
Multi-Agent System: Combines various models, including Qwen2.5:4b, Llama3.2, and Deepseek, to enhance reasoning and improve the accuracy of responses.
Document Processing (RAG Mode)
- PDF document upload and processing
- Web page content extraction
- Automatic text chunking and embedding
- Vector storage in Qdrant cloud
Intelligent Querying (RAG Mode)
- RAG-based document retrieval
- Similarity search with threshold filtering
- Automatic fallback to web search
- Source attribution for answers
Advanced Capabilities
- Exa AI web search integration
- Custom domain filtering for web search
- Context-aware response generation
- Chat history management
- Thinking process visualization
MLOps Pipeline
- Code Versioning with GitHub: Managing code changes and collaboration
- Data Versioning with DVC: Tracking changes in large datasets
- Continuous Integration with CML: Automating ML workflows and testing
- Secrets Management: Securing sensitive information
- Configuration Management with Hydra: Organizing and managing configurations
- Observability using Langtrace
- RAG performance tracking using RAGAS
Model Specific Features
- Flexible model selection:
  - Deepseek r1 1.5b (lighter, suitable for most laptops)
  - Deepseek r1 7b (more capable, requires better hardware)
  - Qwen2.5:4b (advanced capabilities for more complex tasks)
  - Llama3.2 (latest model for enhanced reasoning)
- Snowflake Arctic Embedding model (SOTA) for vector embeddings
- Agno Agent framework for orchestration
- Streamlit-based interactive interface
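The "similarity search with threshold filtering" feature listed above can be pictured with a small, self-contained sketch. This is not the project's code: in the application the search runs inside Qdrant, and the function names and the 0.7 default threshold here are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def filter_by_threshold(query_vec, docs, threshold=0.7):
    """Keep documents scoring at or above the threshold, best match first.

    docs is a list of (text, vector) pairs; the threshold value is an
    illustrative assumption, not a setting taken from the project.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

Documents that fall below the threshold are dropped entirely, so an empty result is a natural signal that the web search fallback should take over.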

Prerequisites

1. Ollama Setup

Install Ollama
Pull the Deepseek r1 model(s):

# For the lighter model
ollama pull deepseek-r1:1.5b

# For the more capable model (if your hardware supports it)
ollama pull deepseek-r1:7b

# Embedding model and the additional chat models
ollama pull snowflake-arctic-embed
ollama pull llama3.2
ollama pull qwen2.5:4b

2. Qdrant Cloud Setup (for RAG Mode)

Visit Qdrant Cloud
Create an account or sign in
Create a new cluster
Get your credentials:
- Qdrant API Key: Found in API Keys section
- Qdrant URL: Your cluster URL (format: https://xxx-xxx.cloud.qdrant.io)
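One way to keep these credentials out of the source code is to read them from environment variables and validate them before use. The sketch below assumes the variable names QDRANT_URL and QDRANT_API_KEY, which are hypothetical and not taken from the project; the application itself may collect the values differently (for example, through the Streamlit sidebar).

```python
import os
from urllib.parse import urlparse


def load_qdrant_settings(env=None):
    """Read and validate Qdrant credentials from an environment-style mapping.

    QDRANT_URL and QDRANT_API_KEY are assumed names used for illustration.
    """
    env = os.environ if env is None else env
    url = env.get("QDRANT_URL", "").strip()
    api_key = env.get("QDRANT_API_KEY", "").strip()

    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("QDRANT_URL must be an https:// cluster URL")
    if not api_key:
        raise ValueError("QDRANT_API_KEY is missing")
    return {"url": url, "api_key": api_key}
```

If the qdrant-client package is installed, the returned values can be passed on as `QdrantClient(url=settings["url"], api_key=settings["api_key"])`.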

3. Exa AI API Key (Optional)

Visit Exa AI
Sign up for an account
Generate an API key for web search capabilities

How to Run

Clone the repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
cd awesome-llm-apps/rag_tutorials/deepseek_local_rag_agent

Install dependencies:

pip install -r requirements.txt

Run the application:

streamlit run deepseek_rag_agent.py

How the Application Works

The Deepseek Local RAG Reasoning Agent uses multiple components to provide advanced reasoning capabilities in RAG Mode. Here's how it works:

Model Selection: You can select between Deepseek r1 1.5b, Deepseek r1 7b, Qwen2.5:4b, and Llama3.2, depending on your hardware capability and task complexity.
Multi-Agent System: Multiple models interact to provide more accurate and flexible reasoning capabilities. This system combines models like Qwen2.5:4b, Llama3.2, and Deepseek.
Document Upload: In RAG Mode, you can upload PDF documents or input URLs to process content for knowledge retrieval. This data is embedded and stored in Qdrant.
Querying: The application uses the RAG approach to search your documents and falls back to web search through Exa AI when needed.
Web Search Fallback: When relevant documents aren't found, the system automatically switches to web search, using customizable domains (e.g., arxiv.org, wikipedia.org).
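The querying and fallback steps above can be sketched as a small routing function. This is an illustration under stated assumptions, not the script's actual logic: `search_documents` and `search_web` are hypothetical callables standing in for the Qdrant retrieval and the Exa AI search, and the 0.7 threshold is an assumed value.

```python
def gather_context(query, search_documents, search_web,
                   threshold=0.7, web_enabled=True):
    """Prefer document hits; fall back to web search when none qualify.

    search_documents(query) returns (text, score) pairs and
    search_web(query) returns text snippets. Both are hypothetical
    stand-ins for the real retrieval and search calls.
    """
    hits = [(text, score) for text, score in search_documents(query)
            if score >= threshold]
    if hits:
        return {"source": "documents", "passages": [text for text, _ in hits]}
    if web_enabled:
        return {"source": "web", "passages": list(search_web(query))}
    return {"source": "none", "passages": []}
```

Returning the source label alongside the passages is one simple way to support source attribution in the final answer.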

The Streamlit interface lets you toggle between the different modes and configurations.

Configuration

RAG Mode: Enable or disable RAG Mode from the sidebar to activate document-based reasoning.
Web Search Fallback: Toggle web search as a fallback for when relevant documents aren't found in the vector database.
Model Version: Choose from Deepseek r1 1.5b, Deepseek r1 7b, Qwen2.5:4b, or Llama3.2 models based on your hardware.
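These three settings can be modelled as a small settings object. The sketch below is hypothetical: the class name, field names, and defaults are assumptions, and only the Ollama tags are taken from the pull commands in the Prerequisites section. It also assumes that the web search fallback only applies while RAG Mode is on.

```python
from dataclasses import dataclass

# Ollama tags as they appear in the pull commands in this README.
MODEL_TAGS = {
    "Deepseek r1 1.5b": "deepseek-r1:1.5b",
    "Deepseek r1 7b": "deepseek-r1:7b",
    "Qwen2.5:4b": "qwen2.5:4b",
    "Llama3.2": "llama3.2",
}


@dataclass
class AppSettings:
    """Hypothetical container for the sidebar options described above."""
    model_version: str = "Deepseek r1 1.5b"
    rag_enabled: bool = False
    web_search_fallback: bool = False

    def ollama_tag(self):
        """Translate the displayed model name into its Ollama tag."""
        if self.model_version not in MODEL_TAGS:
            raise ValueError(f"Unknown model version: {self.model_version}")
        return MODEL_TAGS[self.model_version]

    def uses_web_search(self):
        """Assumption: the fallback is only relevant in RAG Mode."""
        return self.rag_enabled and self.web_search_fallback
```

For example, `AppSettings(model_version="Llama3.2", rag_enabled=True).ollama_tag()` yields the tag to hand to Ollama.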

Main Script

The core functionality is in the deepseek_rag_agent.py script, which handles the application’s logic. Here's an overview of the main components:

Langtrace: Tracks requests and API calls.
OllamaEmbedder: Embeds documents for RAG-based retrieval.
Qdrant: Provides vector storage and search capabilities.
Agno: Handles agent orchestration.
Streamlit: Provides an interactive web interface for the user to interact with the agent.

The script initializes models, processes inputs (PDFs or URLs), and uses Qdrant to store vectors for document search and retrieval.
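Before the processed text is embedded and stored, it is split into chunks. A minimal sketch of fixed-size chunking with overlap follows. The function name and the default sizes are assumptions for illustration; the script may rely on a library text splitter with different settings.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that overlap.

    chunk_size and overlap are illustrative defaults, not the values
    used by the project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap repeats the tail of each chunk at the head of the next one, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.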

