English | 中文
An AI-powered video transcription and summarization tool that supports multiple video platforms, including YouTube, TikTok, Bilibili, and 30+ others.
- Multi-Platform Support: Works with YouTube, TikTok, Bilibili, and 30+ more platforms
- Intelligent Transcription: High-accuracy speech-to-text using Faster-Whisper
- AI Text Optimization: Automatic typo correction, sentence completion, and intelligent paragraphing
- Multi-Language Summaries: Generate intelligent summaries in multiple languages
- Real-Time Progress: Live progress tracking and status updates
- Conditional Translation: When the selected summary language differs from the detected transcript language, the system auto-translates with GPT-4o
- Mobile-Friendly: Full support for mobile devices
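The conditional-translation rule above boils down to a single comparison: translate only when the detected transcript language differs from the chosen summary language. A minimal sketch of that check (the function name and normalization heuristic are illustrative, not the project's actual API):

```python
def needs_translation(detected_lang: str, summary_lang: str) -> bool:
    """Translate the transcript only when the detected language differs
    from the language the user selected for the summary."""
    # Normalize region tags so "en-US" and "en" compare equal
    # (illustrative heuristic, not the project's real logic).
    normalize = lambda code: code.split("-")[0].lower()
    return normalize(detected_lang) != normalize(summary_lang)

# English transcript, Chinese summary requested -> translation needed
print(needs_translation("en", "zh"))
# Same base language -> no translation
print(needs_translation("en-US", "en"))
```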
- Python 3.8+
- FFmpeg
- Optional: OpenAI API key (for AI summary features)
```bash
# Clone the repository
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber

# Run the installation script
chmod +x install.sh
./install.sh
```

Docker deployment:

```bash
# Clone the repository
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber

# Using Docker Compose (easiest)
cp .env.example .env
# Edit the .env file and set your OPENAI_API_KEY
docker-compose up -d

# Or using Docker directly
docker build -t ai-video-transcriber .
docker run -p 8000:8000 -e OPENAI_API_KEY="your_api_key_here" ai-video-transcriber
```

Manual installation:

- Install Python Dependencies
```bash
# macOS (PEP 668) strongly recommends using a virtualenv
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```

- Install FFmpeg
```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# CentOS/RHEL
sudo yum install ffmpeg
```

- Configure Environment Variables
```bash
# Required for AI summary/translation features
export OPENAI_API_KEY="your_api_key_here"

# Optional: only if you use a custom OpenAI-compatible gateway
export OPENAI_BASE_URL="https://oneapi.basevec.com/v1"
```

- Start the service

```bash
python3 start.py
```

After the service starts, open your browser and visit http://localhost:8000.
To avoid SSE disconnections during long processing, start in production mode (hot reload disabled):

```bash
python3 start.py --prod
```

This keeps the SSE connection stable throughout long tasks (30-60+ minutes).
A complete production start sequence:

```bash
source venv/bin/activate
export OPENAI_API_KEY=your_api_key_here
# export OPENAI_BASE_URL=https://oneapi.basevec.com/v1  # if using a custom endpoint
python3 start.py --prod
```

Usage:

- Enter Video URL: Paste a video link from YouTube, Bilibili, or other supported platforms
- Select Summary Language: Choose the language for the generated summary
- Start Processing: Click the "Start" button
- Monitor Progress: Watch real-time progress through multiple stages:
- Video download and parsing
- Audio transcription with Faster-Whisper
- AI-powered transcript optimization (typo correction, sentence completion, intelligent paragraphing)
- AI summary generation in selected language
- View Results: Review the optimized transcript and intelligent summary
- If the transcript language differs from the selected summary language, a third "Translation" tab appears with the translated transcript
- Download Files: Click download buttons to save Markdown-formatted files (Transcript / Translation / Summary)
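The staged flow above can be sketched as a pipeline that emits progress events, which is what the real-time progress view consumes. The stage names mirror the list; the function and event shape are assumptions for illustration, not the project's actual SSE payload:

```python
def run_pipeline(url: str):
    """Yield (percent, stage) progress events, one per processing stage.

    The stage handlers are stubbed out; in the real project the work maps
    to backend modules such as video_processor, transcriber, summarizer.
    """
    stages = [
        "Video download and parsing",
        "Audio transcription with Faster-Whisper",
        "AI-powered transcript optimization",
        "AI summary generation",
    ]
    for i, stage in enumerate(stages, start=1):
        # ... real processing for this stage would run here ...
        yield (int(i / len(stages) * 100), stage)

# Collect the progress events for a (placeholder) video URL
events = list(run_pipeline("https://www.youtube.com/watch?v=example"))
for percent, stage in events:
    print(f"{percent:3d}%  {stage}")
```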
- FastAPI: Modern Python web framework
- yt-dlp: Video downloading and processing
- Faster-Whisper: Efficient speech transcription
- OpenAI API: Intelligent text summarization
- HTML5 + CSS3: Responsive interface design
- JavaScript (ES6+): Modern frontend interactions
- Marked.js: Markdown rendering
- Font Awesome: Icon library
```
AI-Video-Transcriber/
├── backend/                # Backend code
│   ├── main.py             # FastAPI main application
│   ├── video_processor.py  # Video processing module
│   ├── transcriber.py      # Transcription module
│   ├── summarizer.py       # Summary module
│   └── translator.py       # Translation module
├── static/                 # Frontend files
│   ├── index.html          # Main page
│   └── app.js              # Frontend logic
├── temp/                   # Temporary files directory
├── Dockerfile              # Docker image configuration
├── docker-compose.yml      # Docker Compose configuration
├── .dockerignore           # Docker ignore rules
├── .env.example            # Environment variables template
├── requirements.txt        # Python dependencies
├── start.py                # Startup script
└── README.md               # Project documentation
```
| Variable | Description | Default | Required |
|---|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key | - | Yes (for AI features) |
| `HOST` | Server address | `0.0.0.0` | No |
| `PORT` | Server port | `8000` | No |
| `WHISPER_MODEL_SIZE` | Whisper model size | `base` | No |
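These variables are typically read once at startup with the defaults above applied. A minimal sketch of that pattern (the actual variable handling in start.py may differ):

```python
import os

def load_config() -> dict:
    """Read server settings from the environment, falling back to the
    defaults listed in the table above."""
    return {
        "host": os.getenv("HOST", "0.0.0.0"),
        "port": int(os.getenv("PORT", "8000")),
        "whisper_model_size": os.getenv("WHISPER_MODEL_SIZE", "base"),
        # No default here: AI summary/translation features need a real key.
        "openai_api_key": os.getenv("OPENAI_API_KEY"),
    }

config = load_config()
```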
| Model | Parameters | English-only | Multilingual | Speed | Memory Usage |
|---|---|---|---|---|---|
| tiny | 39 M | ✓ | ✓ | Fast | Low |
| base | 74 M | ✓ | ✓ | Medium | Low |
| small | 244 M | ✓ | ✓ | Medium | Medium |
| medium | 769 M | ✓ | ✓ | Slow | Medium |
| large | 1550 M | ✗ | ✓ | Very Slow | High |
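One practical way to apply this table: pick the largest model whose footprint fits your machine. The RAM figures below are the approximate numbers quoted in the memory FAQ later in this README; the helper itself is purely illustrative:

```python
# Approximate RAM per Whisper model (from the memory FAQ), in MB.
MODEL_MEMORY_MB = {"tiny": 150, "base": 250, "small": 750, "medium": 1500, "large": 3000}

def pick_model(budget_mb: int) -> str:
    """Return the largest Whisper model whose approximate footprint
    fits within the given memory budget."""
    fitting = [m for m, mb in MODEL_MEMORY_MB.items() if mb <= budget_mb]
    if not fitting:
        raise ValueError(f"No Whisper model fits in {budget_mb} MB")
    return max(fitting, key=MODEL_MEMORY_MB.get)

# With ~1GB to spare, "small" is the largest model that fits
print(pick_model(1024))
```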
Q: Why is transcription slow?

A: Transcription speed depends on video length, Whisper model size, and hardware performance. Try a smaller model (such as tiny or base) to improve speed.
Q: Which video platforms are supported?

A: All platforms supported by yt-dlp, including but not limited to YouTube, TikTok, Facebook, Instagram, Twitter, Bilibili, Youku, iQiyi, Tencent Video, etc.
Q: Do I need an OpenAI API key?

A: Both transcript optimization and summary generation require an OpenAI API key. Without one, the system provides the raw transcript from Whisper and a simplified summary.
Q: The service fails to start or summaries are not generated. What should I check?

A: In most cases this is an environment configuration issue rather than a code bug. Please check:
- Ensure a virtualenv is activated: `source venv/bin/activate`
- Install deps inside the venv: `pip install -r requirements.txt`
- Set `OPENAI_API_KEY` (required for summary/translation)
- If using a custom gateway, set `OPENAI_BASE_URL` correctly and ensure network access
- Install FFmpeg: `brew install ffmpeg` (macOS) / `sudo apt install ffmpeg` (Debian/Ubuntu)
- If port 8000 is occupied, stop the old process or change `PORT`
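For the last point, a quick stdlib way to check whether something is already listening on port 8000 (generic Python, not part of this project):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # i.e. when a server is already listening on that port.
        return sock.connect_ex((host, port)) == 0

print(port_in_use(8000))
```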
Q: Is there a limit on video length?

A: The system can process videos of any length, but processing time increases accordingly. For very long videos, consider a smaller Whisper model.
Q: How do I deploy with Docker?

A: Docker provides the easiest deployment method:

Prerequisites:
- Install Docker Desktop from https://www.docker.com/products/docker-desktop/
- Ensure the Docker service is running

Quick Start:

```bash
# Clone and set up
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber
cp .env.example .env
# Edit the .env file to set your OPENAI_API_KEY

# Start with Docker Compose (recommended)
docker-compose up -d

# Or build and run manually
docker build -t ai-video-transcriber .
docker run -p 8000:8000 --env-file .env ai-video-transcriber
```

Common Docker Issues:
- Port conflict: change the port mapping to `-p 8001:8000` if 8000 is occupied
- Permission denied: ensure Docker Desktop is running and you have the proper permissions
- Build fails: check disk space (~2GB free needed) and the network connection
- Container won't start: verify that the .env file exists and contains a valid OPENAI_API_KEY

Docker Commands:

```bash
# View running containers
docker ps

# Check container logs
docker logs ai-video-transcriber-ai-video-transcriber-1

# Stop the service
docker-compose down

# Rebuild after changes
docker-compose build --no-cache
```

Q: How much memory does the application use?

A: Memory usage varies depending on the deployment method and workload:
Docker Deployment:
- Base memory: ~128MB for idle container
- During processing: 500MB - 2GB depending on video length and Whisper model
- Docker image size: ~1.6GB disk space required
- Recommended: 4GB+ RAM for smooth operation
Traditional Deployment:
- Base memory: ~50-100MB for FastAPI server
- Whisper models memory usage:
  - tiny: ~150MB
  - base: ~250MB
  - small: ~750MB
  - medium: ~1.5GB
  - large: ~3GB
- Peak usage: Base + Model + Video processing (~500MB additional)
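The peak-usage rule above (server baseline + Whisper model + ~500MB processing overhead) can be written out as a small calculator; all figures are the approximations quoted in this section:

```python
# Approximate figures from this FAQ section, in MB.
BASE_SERVER_MB = 100          # FastAPI server at rest (upper end of ~50-100MB)
PROCESSING_OVERHEAD_MB = 500  # additional memory during video processing
MODEL_MB = {"tiny": 150, "base": 250, "small": 750, "medium": 1500, "large": 3000}

def estimated_peak_mb(model: str) -> int:
    """Rough peak RAM: server baseline + Whisper model + processing overhead."""
    return BASE_SERVER_MB + MODEL_MB[model] + PROCESSING_OVERHEAD_MB

print(estimated_peak_mb("base"))
```

With the default `base` model this lands well under 1GB, which is why the 4GB RAM recommendation above leaves comfortable headroom.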
Memory Optimization Tips:
```bash
# Use a smaller Whisper model to reduce memory usage
WHISPER_MODEL_SIZE=tiny  # or base

# For Docker, limit container memory if needed
docker run -m 1g -p 8000:8000 --env-file .env ai-video-transcriber

# Monitor memory usage
docker stats ai-video-transcriber-ai-video-transcriber-1
```

Q: What should I do about network-related errors?

A: If you encounter network-related errors during video downloading or API calls, try these solutions:
Common Network Issues:
- Video download fails with "Unable to extract" or timeout errors
- OpenAI API calls return connection timeout or DNS resolution failures
- Docker image pull fails or is extremely slow
Solutions:
- Switch VPN/Proxy: Try connecting to a different VPN server or switch your proxy settings
- Check Network Stability: Ensure your internet connection is stable
- Retry After Network Change: Wait 30-60 seconds after changing network settings before retrying
- Use Alternative Endpoints: If using custom OpenAI endpoints, verify they're accessible from your network
- Docker Network Issues: Restart Docker Desktop if container networking fails
Quick Network Test:
```bash
# Test video platform access
curl -I https://www.youtube.com/

# Test OpenAI API access (replace with your endpoint)
curl -I https://api.openai.com

# Test Docker Hub access
docker pull hello-world
```

Language Support:

- Supports 100+ languages through Whisper
- Automatic language detection
- High accuracy for major languages
- English
- Chinese (Simplified)
- Japanese
- Korean
- Spanish
- French
- German
- Portuguese
- Russian
- Arabic
- And more...
Hardware Requirements:
- Minimum: 4GB RAM, dual-core CPU
- Recommended: 8GB RAM, quad-core CPU
- Ideal: 16GB RAM, multi-core CPU, SSD storage
Processing Time Estimates:

| Video Length | Estimated Time | Notes |
|---|---|---|
| 1 minute | 30s-1 minute | Depends on network and hardware |
| 5 minutes | 2-5 minutes | Recommended for first-time testing |
| 15 minutes | 5-15 minutes | Suitable for regular use |
We welcome Issues and Pull Requests!
- Fork the project
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- yt-dlp - Powerful video downloading tool
- Faster-Whisper - Efficient Whisper implementation
- FastAPI - Modern Python web framework
- OpenAI - Intelligent text processing API
For questions or suggestions, please submit an Issue or contact Wendy.
If you find this project helpful, please consider giving it a star!