
AI Video Transcriber

English | δΈ­ζ–‡

An AI-powered video transcription and summarization tool that supports 30+ video platforms, including YouTube, TikTok, and Bilibili.

Interface

✨ Features

  • πŸŽ₯ Multi-Platform Support: Works with YouTube, TikTok, Bilibili, and 30+ more platforms
  • πŸ—£οΈ Intelligent Transcription: High-accuracy speech-to-text using Faster-Whisper
  • πŸ€– AI Text Optimization: Automatic typo correction, sentence completion, and intelligent paragraphing
  • 🌍 Multi-Language Summaries: Generate intelligent summaries in multiple languages
  • ⚑ Real-Time Progress: Live progress tracking and status updates
  • βš™οΈ Conditional Translation: When the selected summary language differs from the detected transcript language, the system auto-translates with GPT‑4o
  • πŸ“± Mobile-Friendly: Responsive interface that works well on mobile devices


πŸš€ Quick Start

Prerequisites

  • Python 3.8+
  • FFmpeg
  • Optional: OpenAI API key (for AI summary features)

Installation

Method 1: Automatic Installation

# Clone the repository
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber

# Run installation script
chmod +x install.sh
./install.sh

Method 2: Docker

# Clone the repository
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber

# Using Docker Compose (easiest)
cp .env.example .env
# Edit .env file and set your OPENAI_API_KEY
docker-compose up -d

# Or using Docker directly
docker build -t ai-video-transcriber .
docker run -p 8000:8000 -e OPENAI_API_KEY="your_api_key_here" ai-video-transcriber

Method 3: Manual Installation

  1. Install Python Dependencies
# On macOS, PEP 668 (externally managed environments) makes a virtualenv effectively required
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
  2. Install FFmpeg
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# CentOS/RHEL
sudo yum install ffmpeg
  3. Configure Environment Variables
# Required for AI summary/translation features
export OPENAI_API_KEY="your_api_key_here"

# Optional: only if you use a custom OpenAI-compatible gateway
export OPENAI_BASE_URL="https://oneapi.basevec.com/v1"

Start the Service

python3 start.py

After the service starts, open your browser and visit http://localhost:8000
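
To confirm the server is actually serving, a quick smoke test in Python (the URL matches the default HOST/PORT; expect HTTP 200):

# minimal smoke test: the root path should serve the web UI
import urllib.request

print(urllib.request.urlopen("http://localhost:8000").status)  # 200 when healthy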

Production Mode (Recommended for long videos)

To avoid SSE disconnections during long processing, start in production mode (hot-reload disabled):

python3 start.py --prod

This keeps the SSE connection stable throughout long tasks (30–60+ min).
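
For reference, a minimal sketch of consuming such an SSE progress stream from Python; the endpoint path and task id below are assumptions for illustration, not the project's documented API:

# hypothetical SSE client; the /api/progress route and task id are illustrative
import requests

task_id = "demo-task"                                  # hypothetical identifier
url = f"http://localhost:8000/api/progress/{task_id}"  # hypothetical route

with requests.get(url, stream=True, timeout=None) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[5:].strip())  # each event carries a progress/status update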

Run with explicit env (example)

source venv/bin/activate
export OPENAI_API_KEY=your_api_key_here
# export OPENAI_BASE_URL=https://oneapi.basevec.com/v1   # if using a custom endpoint
python3 start.py --prod

πŸ“– Usage Guide

  1. Enter Video URL: Paste a video link from YouTube, Bilibili, or other supported platforms
  2. Select Summary Language: Choose the language for the generated summary
  3. Start Processing: Click the "Start" button
  4. Monitor Progress: Watch real-time progress through multiple stages:
    • Video download and parsing
    • Audio transcription with Faster-Whisper
    • AI-powered transcript optimization (typo correction, sentence completion, intelligent paragraphing)
    • AI summary generation in selected language
  5. View Results: Review the optimized transcript and intelligent summary
    • If the transcript language differs from the selected summary language, a third "Translation" tab appears with the translated transcript (see the sketch after this list)
  6. Download Files: Click download buttons to save Markdown-formatted files (Transcript / Translation / Summary)
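
A minimal sketch of that conditional-translation step, assuming the official OpenAI Python SDK; the function name and prompt are illustrative, and only the GPT-4o model name comes from this README:

# illustrative translation step; only the gpt-4o model name is from this README
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY (and OPENAI_BASE_URL, if set) from the environment

def translate_transcript(transcript: str, target_lang: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"Translate the following transcript into {target_lang}. Preserve paragraph breaks."},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content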

πŸ› οΈ Technical Architecture

Backend Stack

  • FastAPI: Modern Python web framework
  • yt-dlp: Video downloading and processing
  • Faster-Whisper: Efficient speech transcription (see the sketch below)
  • OpenAI API: Intelligent text summarization
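
A minimal sketch of how the first two stages typically compose; the output paths, options, and URL placeholder are illustrative assumptions, not the project's actual backend code:

# download audio with yt-dlp, then transcribe it with faster-whisper
import yt_dlp
from faster_whisper import WhisperModel

def download_audio(url: str, out: str = "temp/audio") -> str:
    opts = {
        "format": "bestaudio/best",
        "outtmpl": out + ".%(ext)s",
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download([url])
    return out + ".mp3"

model = WhisperModel("base")
segments, info = model.transcribe(download_audio("https://www.youtube.com/watch?v=VIDEO_ID"))
transcript = " ".join(seg.text.strip() for seg in segments)  # segments stream lazily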

Frontend Stack

  • HTML5 + CSS3: Responsive interface design
  • JavaScript (ES6+): Modern frontend interactions
  • Marked.js: Markdown rendering
  • Font Awesome: Icon library

Project Structure

AI-Video-Transcriber/
β”œβ”€β”€ backend/                 # Backend code
β”‚   β”œβ”€β”€ main.py             # FastAPI main application
β”‚   β”œβ”€β”€ video_processor.py  # Video processing module
β”‚   β”œβ”€β”€ transcriber.py      # Transcription module
β”‚   β”œβ”€β”€ summarizer.py       # Summary module
β”‚   └── translator.py       # Translation module
β”œβ”€β”€ static/                 # Frontend files
β”‚   β”œβ”€β”€ index.html          # Main page
β”‚   └── app.js              # Frontend logic
β”œβ”€β”€ temp/                   # Temporary files directory
β”œβ”€β”€ Dockerfile              # Docker image configuration
β”œβ”€β”€ docker-compose.yml      # Docker Compose configuration
β”œβ”€β”€ .dockerignore           # Docker ignore rules
β”œβ”€β”€ .env.example            # Environment variables template
β”œβ”€β”€ requirements.txt        # Python dependencies
β”œβ”€β”€ start.py               # Startup script
└── README.md              # Project documentation

βš™οΈ Configuration Options

Environment Variables

Variable             Description          Default  Required
OPENAI_API_KEY       OpenAI API key       -        Yes (for AI features)
HOST                 Server address       0.0.0.0  No
PORT                 Server port          8000     No
WHISPER_MODEL_SIZE   Whisper model size   base     No

Whisper Model Size Options

Model    Parameters  English-only  Multilingual  Speed      Memory Usage
tiny     39 M        βœ“             βœ“             Fast       Low
base     74 M        βœ“             βœ“             Medium     Low
small    244 M       βœ“             βœ“             Medium     Medium
medium   769 M       βœ“             βœ“             Slow       Medium
large    1550 M      βœ—             βœ“             Very Slow  High
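
A sketch of how the WHISPER_MODEL_SIZE setting would typically be honored when loading the model; the device and compute_type options are illustrative choices, not from this README:

# select the Whisper model from the environment; int8 is an illustrative
# compute type that keeps memory near the table's Low/Medium tiers
import os
from faster_whisper import WhisperModel

model_size = os.getenv("WHISPER_MODEL_SIZE", "base")
model = WhisperModel(model_size, device="cpu", compute_type="int8")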

πŸ”§ FAQ

Q: Why is transcription slow?

A: Transcription speed depends on video length, Whisper model size, and hardware performance. Try using smaller models (like tiny or base) to improve speed.

Q: Which video platforms are supported?

A: All platforms supported by yt-dlp, including but not limited to: YouTube, TikTok, Facebook, Instagram, Twitter, Bilibili, Youku, iQiyi, Tencent Video, etc.

Q: What if the AI optimization features are unavailable?

A: Both transcript optimization and summary generation require an OpenAI API key. Without it, the system provides the raw transcript from Whisper and a simplified summary.

Q: I get HTTP 500 errors when starting/using the service. Why?

A: In most cases this is an environment configuration issue rather than a code bug. Please check:

  • Ensure a virtualenv is activated: source venv/bin/activate
  • Install deps inside the venv: pip install -r requirements.txt
  • Set OPENAI_API_KEY (required for summary/translation)
  • If using a custom gateway, set OPENAI_BASE_URL correctly and ensure network access
  • Install FFmpeg: brew install ffmpeg (macOS) / sudo apt install ffmpeg (Debian/Ubuntu)
  • If port 8000 is occupied, stop the old process or change PORT

Q: How to handle long videos?

A: The system can process videos of any length, but processing time will increase accordingly. For very long videos, consider using smaller Whisper models.

Q: How to use Docker for deployment?

A: Docker provides the easiest deployment method:

Prerequisites:

  • Docker installed (Docker Compose is included with Docker Desktop)

Quick Start:

# Clone and setup
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber
cp .env.example .env
# Edit .env file to set your OPENAI_API_KEY

# Start with Docker Compose (recommended)
docker-compose up -d

# Or build and run manually
docker build -t ai-video-transcriber .
docker run -p 8000:8000 --env-file .env ai-video-transcriber

Common Docker Issues:

  • Port conflict: Change port mapping -p 8001:8000 if 8000 is occupied
  • Permission denied: Ensure Docker Desktop is running and you have proper permissions
  • Build fails: Check disk space (need ~2GB free) and network connection
  • Container won't start: Verify .env file exists and contains valid OPENAI_API_KEY

Docker Commands:

# View running containers
docker ps

# Check container logs
docker logs ai-video-transcriber-ai-video-transcriber-1

# Stop service
docker-compose down

# Rebuild after changes
docker-compose build --no-cache

Q: What are the memory requirements?

A: Memory usage varies depending on the deployment method and workload:

Docker Deployment:

  • Base memory: ~128MB for idle container
  • During processing: 500MB - 2GB depending on video length and Whisper model
  • Docker image size: ~1.6GB disk space required
  • Recommended: 4GB+ RAM for smooth operation

Traditional Deployment:

  • Base memory: ~50-100MB for FastAPI server
  • Whisper models memory usage:
    • tiny: ~150MB
    • base: ~250MB
    • small: ~750MB
    • medium: ~1.5GB
    • large: ~3GB
  • Peak usage: Base + Model + Video processing (~500MB additional)

Memory Optimization Tips:

# Use smaller Whisper model to reduce memory usage
export WHISPER_MODEL_SIZE=tiny   # or base; add to .env when using Docker

# For Docker, limit container memory if needed
docker run -m 1g -p 8000:8000 --env-file .env ai-video-transcriber

# Monitor memory usage
docker stats ai-video-transcriber-ai-video-transcriber-1

Q: Network connection errors or timeouts?

A: If you encounter network-related errors during video downloading or API calls, try these solutions:

Common Network Issues:

  • Video download fails with "Unable to extract" or timeout errors
  • OpenAI API calls return connection timeout or DNS resolution failures
  • Docker image pull fails or is extremely slow

Solutions:

  1. Switch VPN/Proxy: Try connecting to a different VPN server or switch your proxy settings
  2. Check Network Stability: Ensure your internet connection is stable
  3. Retry After Network Change: Wait 30-60 seconds after changing network settings before retrying
  4. Use Alternative Endpoints: If using custom OpenAI endpoints, verify they're accessible from your network
  5. Docker Network Issues: Restart Docker Desktop if container networking fails

Quick Network Test:

# Test video platform access
curl -I https://www.youtube.com/

# Test OpenAI API access (replace with your endpoint)
curl -I https://api.openai.com

# Test Docker Hub access
docker pull hello-world

🎯 Supported Languages

Transcription

  • Supports 100+ languages through Whisper
  • Automatic language detection (example below)
  • High accuracy for major languages
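
For example, faster-whisper reports the detected language alongside a confidence score; the audio path here is illustrative:

# language detection comes with faster-whisper's transcribe() call
from faster_whisper import WhisperModel

model = WhisperModel("base")
segments, info = model.transcribe("temp/audio.mp3")  # illustrative path
print(f"Detected {info.language} (probability {info.language_probability:.2f})")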

Summary Generation

  • English
  • Chinese (Simplified)
  • Japanese
  • Korean
  • Spanish
  • French
  • German
  • Portuguese
  • Russian
  • Arabic
  • And more...

πŸ“ˆ Performance Tips

  • Hardware Requirements:

    • Minimum: 4GB RAM, dual-core CPU
    • Recommended: 8GB RAM, quad-core CPU
    • Ideal: 16GB RAM, multi-core CPU, SSD storage
  • Processing Time Estimates:

    Video Length  Estimated Time  Notes
    1 minute      30s-1 minute    Depends on network and hardware
    5 minutes     2-5 minutes     Recommended for first-time testing
    15 minutes    5-15 minutes    Suitable for regular use

🀝 Contributing

We welcome Issues and Pull Requests!

  1. Fork the project
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Acknowledgments

  • yt-dlp - Powerful video downloading tool
  • Faster-Whisper - Efficient Whisper implementation
  • FastAPI - Modern Python web framework
  • OpenAI - Intelligent text processing API

πŸ“ž Contact

For questions or suggestions, please submit an Issue or contact Wendy.

⭐ Star History

If you find this project helpful, please consider giving it a star!
