An AI-powered YouTube video analysis tool that transforms video content into actionable insights using RAG (Retrieval-Augmented Generation), LangChain, FAISS, and Google Gemini.
- Automatic Transcript Extraction: Seamlessly extracts transcripts from any YouTube video
- AI-Powered Summarization: Generates concise, intelligent summaries using Google Gemini
- Context-Aware Q&A: Ask questions about video content with RAG-based retrieval
- Semantic Search: FAISS-powered vector similarity search for accurate context retrieval
- Modern UI: Beautiful, responsive Streamlit interface
- Docker Ready: Fully containerized for easy deployment
- Production-Grade: Comprehensive testing, CI/CD, and code quality checks
| Component | Technology |
|---|---|
| LLM | Google Gemini (gemini-2.5-flash) |
| Embeddings | Google Generative AI Embeddings |
| Vector Store | FAISS (Facebook AI Similarity Search) |
| Framework | LangChain |
| Web UI | Streamlit |
| Transcript API | youtube-transcript-api |
| Testing | Pytest, pytest-cov |
| CI/CD | GitHub Actions |
| Containerization | Docker, Docker Compose |
| Code Quality | Black, Flake8, isort, mypy |
Use cases:
- Content Creators: Quickly understand video content before citing it
- Researchers: Extract key information from educational videos
- Students: Study from lecture recordings efficiently
- Journalists: Research video content for articles
- Accessibility: Make video content more accessible
The processing pipeline (Mermaid diagram):
graph TB
A[YouTube URL] --> B[Transcript Extractor]
B --> C[Text Processor]
C --> D[Text Chunker]
D --> E[Embedding Generator<br/>Google Gemini]
E --> F[FAISS Vector Store]
G[User Query] --> F
F --> H[Context Retriever]
H --> I[LLM QA Engine<br/>Google Gemini]
I --> J[Answer]
C --> K[Summarizer<br/>Google Gemini]
K --> L[Summary]
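For orientation, the retrieval path in the diagram above can be sketched in a few lines of LangChain. This is a minimal illustration rather than the project's actual modules; it assumes the langchain-google-genai, langchain-community, and faiss-cpu packages are installed and that GOOGLE_API_KEY is set in the environment:

```python
# Minimal sketch of the chunk -> embed -> retrieve -> answer path (not the project's own code).
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS

transcript = "...full video transcript..."  # produced by the transcript extractor

# Split the transcript into overlapping chunks (matches CHUNK_SIZE / CHUNK_OVERLAP in .env).
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(transcript)

# Embed the chunks and index them in a FAISS vector store.
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector_store = FAISS.from_texts(chunks, embeddings)

# Retrieve the top-k chunks for a question and let Gemini answer from that context only.
question = "What are the main topics?"
context_docs = vector_store.similarity_search(question, k=5)
context = "\n\n".join(doc.page_content for doc in context_docs)

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
answer = llm.invoke(
    f"Answer the question using only this transcript excerpt:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```

The summarization branch follows the same pattern, except the transcript (or its chunks) is sent to the model with a summarization prompt instead of retrieved context.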
Project structure:
youtube-summarizer-rag/
├── .github/
│ └── workflows/
│ ├── ci.yml
│ └── deploy.yml
├── scripts/
│ ├── setup_environment.sh
│ └── benchmark.py
├── docs/
├── tests/
│ ├── __init__.py
│ ├── conftest.py
│ ├── test_youtube_handler.py
│ ├── test_text_processor.py
│ ├── test_rag_chain.py
│ └── test_integration.py
├── src/
│ ├── __init__.py
│ ├── config.py
│ ├── youtube_handler.py
│ ├── text_processor.py
│ ├── vector_store.py
│ ├── rag_chain.py
│ ├── langgraph_workflow.py
│ └── app.py
├── .gitignore
├── .env.example
├── .pre-commit-config.yaml
├── Dockerfile
├── Makefile
├── image
├── LICENSE
├── pyproject.toml
├── README.md
├── requirements.txt
├── run.py
├── setup.py
└── docker-compose.yml
Create a .env file:
# Google API Configuration
GOOGLE_API_KEY=your_google_api_key_here
# Model Configuration
GEMINI_MODEL=gemini-2.5-flash
EMBEDDING_MODEL=models/embedding-001
# Application Settings
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RESULTS=5
# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
Prerequisites:
- Python 3.9 or higher
- Google AI Studio API key (a free key is available from Google AI Studio)
- Git
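For reference, the .env values above would typically be read once at startup; a minimal sketch of that idea, assuming python-dotenv (the repository's actual src/config.py may differ):

```python
# Hypothetical sketch of loading the .env settings (the real src/config.py may differ).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY", "")
GEMINI_MODEL = os.getenv("GEMINI_MODEL", "gemini-2.5-flash")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "models/embedding-001")
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "1000"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "200"))
TOP_K_RESULTS = int(os.getenv("TOP_K_RESULTS", "5"))
```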
# Clone the repository
git clone https://github.com/codebywiam/youtube-summarizer-rag.git
cd youtube-summarizer-rag
# Run automated setup script (includes dependency installation and testing)
chmod +x scripts/setup_environment.sh
./scripts/setup_environment.sh
# Activate virtual environment
source venv/bin/activate
# Launch the application
python run.py
Alternatively, set up manually:
git clone https://github.com/codebywiam/youtube-summarizer-rag.git
cd youtube-summarizer-rag
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Create necessary directories
mkdir -p data/vector_store logs
# Run the application
python run.py
Or run with Docker:
git clone https://github.com/codebywiam/youtube-summarizer-rag.git
cd youtube-summarizer-rag
# Copy environment file and add your API key
cp .env.example .env
# Start with Docker Compose
docker-compose up -d
# Application will be available at http://localhost:8501
To use the web interface:
- Open http://localhost:8501
- Enter your Google API key in the sidebar
- Paste a YouTube URL and click Process Video
- Generate summaries or ask questions
Or use the Python API directly:
from src.summarizer import YouTubeTranscriptExtractor, VideoSummarizer, TranscriptChunker
from src.qa_engine import RAGPipeline
# Initialize
api_key = "your_google_api_key"
url = "https://www.youtube.com/watch?v=VIDEO_ID"
# Extract transcript
extractor = YouTubeTranscriptExtractor()
transcript = extractor.fetch_transcript(url)
processed = extractor.process_transcript(transcript)
# Summarize
summarizer = VideoSummarizer(api_key)
summary = summarizer.summarize(processed)
print(summary)
# Q&A
chunker = TranscriptChunker()
chunks = chunker.chunk_text(processed)
rag_pipeline = RAGPipeline(api_key)
rag_pipeline.index_transcript(chunks)
answer = rag_pipeline.query("What are the main topics?")
print(answer)
In the web interface:
- Enter YouTube URL
- Click "Load Video" to extract transcript
- Navigate to tabs:
- Summary: Generate AI summary
- Ask Questions: Interactive Q&A
- Transcript: View full transcript
Run the complete test suite:
# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=html
# Run specific test file
pytest tests/test_rag_chain.py -v
# Run with specific markers
pytest -m "not slow" -v
View coverage report:
open htmlcov/index.html # On macOS
xdg-open htmlcov/index.html # On Linux
start htmlcov/index.html  # On Windows
Example Video for Testing:
https://www.youtube.com/watch?v=T-D1KjGDW1M
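Unit tests are expected to run offline, so transcript fetching is usually mocked rather than hitting YouTube. Below is a hypothetical example of that pattern (not one of the test files listed above); it assumes the youtube-transcript-api 0.6.x interface, where get_transcript() returns a list of text/start/duration dicts:

```python
# Hypothetical test showing how transcript fetching can be mocked so the suite runs offline.
from unittest.mock import patch

FAKE_SEGMENTS = [
    {"text": "Welcome to the video.", "start": 0.0, "duration": 2.0},
    {"text": "Today we cover RAG.", "start": 2.0, "duration": 3.0},
]

@patch("youtube_transcript_api.YouTubeTranscriptApi.get_transcript", return_value=FAKE_SEGMENTS)
def test_segments_are_joined_into_plain_text(mock_fetch):
    from youtube_transcript_api import YouTubeTranscriptApi

    segments = YouTubeTranscriptApi.get_transcript("T-D1KjGDW1M")  # no network call is made
    text = " ".join(segment["text"] for segment in segments)

    assert mock_fetch.called
    assert text == "Welcome to the video. Today we cover RAG."
```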
Let's analyze a video about AI advancements using our summarizer:
Video URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
from src.langgraph_workflow import YouTubeRAGWorkflow
# Initialize the workflow
workflow = YouTubeRAGWorkflow()
# Process the AI breakthroughs video
result = workflow.process_video("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
# Get the AI video summary
summary = workflow.process_video("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print("Video Summary:")
print(summary['summary'])
Output:
This year has seen significant advancements in artificial intelligence, pushing the boundaries of what's possible. Key breakthroughs include:
• Multi-Modal Models: The biggest development is the advent of models like Gemini that seamlessly understand and generate text, images, audio, and video. This integration allows for deeper contextual understanding and unlocks applications like generating recipe videos from fridge ingredients or comprehensive financial summaries from diverse data sources.
• On-Device AI and Efficiency: A major push towards smaller, more specialized models is enabling AI to run directly on consumer devices like smartphones and laptops. Techniques like Quantization and Pruning improve efficiency, enhancing privacy (data stays on device) and speed (near-instantaneous responses for offline applications).
• Autonomous AI Agents: AI agents can now perform complex, multi-step tasks by understanding high-level goals. They achieve this through advanced reasoning and tool-use capabilities, interacting with APIs, browsing the web, and executing code to plan trips or manage other intricate processes.
• Ethical AI and Safety: There's a growing focus on "Constitutional AI," training models with explicit rules to prevent harmful or biased content, ensuring alignment with human values.
• Explainable AI (XAI): Advances in explainability are helping researchers understand why models make certain decisions, which is crucial for high-stakes applications in fields like medicine and law, moving beyond just providing the right answer.
# Ask specific questions about the AI breakthroughs
questions = [
"What are multi-modal models and why are they important?",
"How does on-device AI improve privacy?",
"What capabilities do autonomous AI agents have?",
"Why is Explainable AI important for medicine?",
"What are the main ethical concerns mentioned?"
]
for question in questions:
result = workflow.process_video(
"https://www.youtube.com/watch?v=dQw4w9WgXcQ",
question
)
print(f"Q: {question}")
print(f"A: {result['answer']}")
print("-" * 50)Q: What are multi-modal models and why are they important? A: Multi-modal models like Gemini can understand and generate multiple types of data including text, images, audio, and video simultaneously. They're important because they enable deeper contextual understanding and unlock practical applications like generating cooking videos from refrigerator contents or creating comprehensive reports from diverse data sources.
Q: How does on-device AI improve privacy? A: On-device AI enhances privacy by processing data locally on the user's device rather than sending it to cloud servers. This means sensitive information like personal photos, messages, and documents never leaves the device, addressing major privacy concerns associated with cloud-based AI systems.
Q: What capabilities do autonomous AI agents have? A: Autonomous AI agents can perform complex, multi-step tasks by understanding high-level objectives. They utilize advanced reasoning and tool-use capabilities to interact with APIs, browse the web, execute code, and manage intricate processes like trip planning or business workflow automation without constant human supervision.
Planned features:
- Multi-language support
- Batch processing for multiple videos
- Export summaries to PDF/Word
- Video timestamp navigation
- Custom model selection
- Playlist summarization
- Sentiment analysis
- Key topic extraction
- Comparison between multiple videos
Acknowledgements:
- Google for the Gemini API
- Facebook AI Research for FAISS
- LangChain community
- YouTube Transcript API contributors
This project is licensed under the MIT License - see the LICENSE file for details.
