🎥 YouTube AI Analyzer

An AI-powered YouTube video analysis tool that transforms video content into actionable insights using RAG (Retrieval-Augmented Generation), LangChain, FAISS, and Google Gemini.

Features

  • Automatic Transcript Extraction: Seamlessly extracts transcripts from any YouTube video
  • AI-Powered Summarization: Generates concise, intelligent summaries using Google Gemini
  • Context-Aware Q&A: Ask questions about video content with RAG-based retrieval
  • Semantic Search: FAISS-powered vector similarity search for accurate context retrieval
  • Modern UI: Beautiful, responsive Streamlit interface
  • Docker Ready: Fully containerized for easy deployment
  • Production-Grade: Comprehensive testing, CI/CD, and code quality checks

Technical Stack

| Component        | Technology                            |
|------------------|---------------------------------------|
| LLM              | Google Gemini (gemini-2.5-flash)      |
| Embeddings       | Google Generative AI Embeddings       |
| Vector Store     | FAISS (Facebook AI Similarity Search) |
| Framework        | LangChain                             |
| Web UI           | Streamlit                             |
| Transcript API   | youtube-transcript-api                |
| Testing          | Pytest, pytest-cov                    |
| CI/CD            | GitHub Actions                        |
| Containerization | Docker, Docker Compose                |
| Code Quality     | Black, Flake8, isort, mypy            |
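
For a sense of the transcript layer, here is a minimal sketch using youtube-transcript-api directly (hedged: this targets the classic 0.x get_transcript API; newer 1.x releases use an instance-based fetch method, and src/youtube_handler.py may wrap it differently):

from youtube_transcript_api import YouTubeTranscriptApi

video_id = "VIDEO_ID"  # the value after watch?v= in the URL
segments = YouTubeTranscriptApi.get_transcript(video_id)  # list of {"text", "start", "duration"} dicts
text = " ".join(seg["text"] for seg in segments)
print(text[:200])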

Use Cases

  • Content Creators: Quickly review video content before citing it
  • Researchers: Extract key information from educational videos
  • Students: Study from lecture recordings efficiently
  • Journalists: Research video content for articles
  • Accessibility: Provide text-based summaries and searchable transcripts of video content

Architecture

graph TB
    A[YouTube URL] --> B[Transcript Extractor]
    B --> C[Text Processor]
    C --> D[Text Chunker]
    D --> E[Embedding Generator<br/>Google Gemini]
    E --> F[FAISS Vector Store]
    G[User Query] --> F
    F --> H[Context Retriever]
    H --> I[LLM QA Engine<br/>Google Gemini]
    I --> J[Answer]
    C --> K[Summarizer<br/>Google Gemini]
    K --> L[Summary]
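
The diagram boils down to two paths: indexing (chunk, embed, store) and querying (retrieve, generate). Below is a minimal sketch of both, assuming the langchain-google-genai and langchain-community packages; the actual classes in src/vector_store.py and src/rag_chain.py may differ:

from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

def build_index(transcript: str) -> FAISS:
    # Chunk the transcript (sizes mirror CHUNK_SIZE / CHUNK_OVERLAP in Configuration)
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_text(transcript)
    # Embed each chunk with Gemini embeddings and index it in FAISS
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    return FAISS.from_texts(chunks, embeddings)

def answer(store: FAISS, question: str) -> str:
    # Retrieve the top-k most similar chunks (TOP_K_RESULTS) as grounding context
    docs = store.similarity_search(question, k=5)
    context = "\n\n".join(d.page_content for d in docs)
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
    prompt = f"Answer using only this transcript context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content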

Project Structure

youtube-summarizer-rag/
├── .github/
│   └── workflows/
│       ├── ci.yml
│       └── deploy.yml
├── scripts/
│   ├── setup_environment.sh
│   └── benchmark.py
├── docs/
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_youtube_handler.py
│   ├── test_text_processor.py
│   ├── test_rag_chain.py
│   └── test_integration.py
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── youtube_handler.py
│   ├── text_processor.py
│   ├── vector_store.py
│   ├── rag_chain.py
│   ├── langgraph_workflow.py
│   └── app.py
├── .gitignore
├── .env.example
├── .pre-commit-config.yaml
├── Dockerfile
├── Makefile
├── image
├── LICENSE
├── pyproject.toml
├── README.md
├── requirements.txt
├── run.py
├── setup.py
└── docker-compose.yml

Configuration

Create a .env file:

# Google API Configuration
GOOGLE_API_KEY=your_google_api_key_here

# Model Configuration
GEMINI_MODEL=gemini-2.5-flash
EMBEDDING_MODEL=models/embedding-001

# Application Settings
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RESULTS=5

# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
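
These variables are typically read once at startup. A sketch of what src/config.py might look like (an assumption; the real module may differ), using python-dotenv to load .env:

import os
from dotenv import load_dotenv

load_dotenv()  # read .env into the process environment

GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]           # required, no default
GEMINI_MODEL = os.getenv("GEMINI_MODEL", "gemini-2.5-flash")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "models/embedding-001")
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "1000"))       # characters per chunk
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "200"))  # shared characters between chunks
TOP_K_RESULTS = int(os.getenv("TOP_K_RESULTS", "5"))    # chunks retrieved per query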

Quick Start

Prerequisites

  • Python 3 and pip
  • A Google API key with access to the Gemini API (see Configuration)
  • Docker and Docker Compose (Method 3 only)

Installation

Method 1: Automated Setup (Recommended)

# Clone the repository
git clone https://github.com/codebywiam/youtube-summarizer-rag.git
cd youtube-summarizer-rag

# Run automated setup script (includes dependency installation and testing)
chmod +x scripts/setup_environment.sh
./scripts/setup_environment.sh

# Activate virtual environment
source venv/bin/activate

# Launch the application
python run.py

Method 2: Manual Setup

git clone https://github.com/codebywiam/youtube-summarizer-rag.git
cd youtube-summarizer-rag

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env and set GOOGLE_API_KEY

# Create necessary directories
mkdir -p data/vector_store logs

# Run the application
python run.py

Method 3: Docker Setup

git clone https://github.com/codebywiam/youtube-summarizer-rag.git
cd youtube-summarizer-rag

# Copy environment file and add your API key
cp .env.example .env

# Start with Docker Compose
docker-compose up -d

# Application will be available at http://localhost:8501

First Run

  1. Open http://localhost:8501
  2. Enter your Google API key in the sidebar
  3. Paste a YouTube URL and click Process Video
  4. Generate summaries or ask questions

Programmatic Usage

# NOTE: the module paths below may differ from the current src/ layout
# (Project Structure lists youtube_handler.py, text_processor.py, rag_chain.py);
# adjust the imports to match your checkout.
from src.summarizer import YouTubeTranscriptExtractor, VideoSummarizer, TranscriptChunker
from src.qa_engine import RAGPipeline

# Initialize
api_key = "your_google_api_key"
url = "https://www.youtube.com/watch?v=VIDEO_ID"

# Extract transcript
extractor = YouTubeTranscriptExtractor()
transcript = extractor.fetch_transcript(url)
processed = extractor.process_transcript(transcript)

# Summarize
summarizer = VideoSummarizer(api_key)
summary = summarizer.summarize(processed)
print(summary)

# Q&A
chunker = TranscriptChunker()
chunks = chunker.chunk_text(processed)

rag_pipeline = RAGPipeline(api_key)
rag_pipeline.index_transcript(chunks)

answer = rag_pipeline.query("What are the main topics?")
print(answer)

Web Interface

  1. Enter YouTube URL
  2. Click "Load Video" to extract transcript
  3. Navigate to tabs:
    • Summary: Generate AI summary
    • Ask Questions: Interactive Q&A
    • Transcript: View full transcript

Testing

Run the complete test suite:

# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=html

# Run specific test file
pytest tests/test_rag_chain.py -v

# Run with specific markers
pytest -m "not slow" -v

View coverage report:

open htmlcov/index.html  # On macOS
xdg-open htmlcov/index.html  # On Linux
start htmlcov/index.html  # On Windows

Usage Examples

Example Video for Testing:
https://www.youtube.com/watch?v=T-D1KjGDW1M

Example 1: AI Breakthroughs Analysis

Let's analyze a video about AI advancements using our summarizer:

Video URL: https://www.youtube.com/watch?v=VIDEO_ID (substitute the ID of the AI-breakthroughs video you want to analyze)

Step 1: Process the Video

from src.langgraph_workflow import YouTubeRAGWorkflow

# Initialize the workflow
workflow = YouTubeRAGWorkflow()

# Process the AI breakthroughs video
result = workflow.process_video("https://www.youtube.com/watch?v=VIDEO_ID")

Step 2: Generate Summary

# Print the summary generated in Step 1 (no need to re-process the video)
print("Video Summary:")
print(result['summary'])

Expected Summary Output:


2024 AI Breakthroughs: A Transformative Year

This year has seen significant advancements in artificial intelligence, pushing the boundaries of what's possible. Key breakthroughs include:

Multi-Modal Models: The biggest development is the advent of models like Gemini that seamlessly understand and generate text, images, audio, and video. This integration allows for deeper contextual understanding and unlocks applications like generating recipe videos from fridge ingredients or comprehensive financial summaries from diverse data sources.

On-Device AI and Efficiency: A major push towards smaller, more specialized models is enabling AI to run directly on consumer devices like smartphones and laptops. Techniques like Quantization and Pruning improve efficiency, enhancing privacy (data stays on device) and speed (near-instantaneous responses for offline applications).

Autonomous AI Agents: AI agents can now perform complex, multi-step tasks by understanding high-level goals. They achieve this through advanced reasoning and tool-use capabilities, interacting with APIs, browsing the web, and executing code to plan trips or manage other intricate processes.

Ethical AI and Safety: There's a growing focus on "Constitutional AI," training models with explicit rules to prevent harmful or biased content, ensuring alignment with human values.

Explainable AI (XAI): Advances in explainability are helping researchers understand why models make certain decisions, which is crucial for high-stakes applications in fields like medicine and law, moving beyond just providing the right answer.


Step 3: Ask Questions About the Video

# Ask specific questions about the AI breakthroughs
questions = [
    "What are multi-modal models and why are they important?",
    "How does on-device AI improve privacy?",
    "What capabilities do autonomous AI agents have?",
    "Why is Explainable AI important for medicine?",
    "What are the main ethical concerns mentioned?"
]

for question in questions:
    result = workflow.process_video(
        "https://www.youtube.com/watch?v=VIDEO_ID",
        question
    )
    print(f"Q: {question}")
    print(f"A: {result['answer']}")
    print("-" * 50)

Sample Q&A Output:

Q: What are multi-modal models and why are they important?
A: Multi-modal models like Gemini can understand and generate multiple types of data including text, images, audio, and video simultaneously. They're important because they enable deeper contextual understanding and unlock practical applications like generating cooking videos from refrigerator contents or creating comprehensive reports from diverse data sources.

Q: How does on-device AI improve privacy?
A: On-device AI enhances privacy by processing data locally on the user's device rather than sending it to cloud servers. This means sensitive information like personal photos, messages, and documents never leaves the device, addressing major privacy concerns associated with cloud-based AI systems.

Q: What capabilities do autonomous AI agents have?
A: Autonomous AI agents can perform complex, multi-step tasks by understanding high-level objectives. They utilize advanced reasoning and tool-use capabilities to interact with APIs, browse the web, execute code, and manage intricate processes like trip planning or business workflow automation without constant human supervision.


Roadmap

  • Multi-language support
  • Batch processing for multiple videos
  • Export summaries to PDF/Word
  • Video timestamp navigation
  • Custom model selection
  • Playlist summarization
  • Sentiment analysis
  • Key topic extraction
  • Comparison between multiple videos

Acknowledgments

  • Google for the Gemini API
  • Facebook AI Research for FAISS
  • LangChain community
  • YouTube Transcript API contributors

License

This project is licensed under the MIT License - see the LICENSE file for details.
