# Audify

Convert ebooks and PDFs to audiobooks using AI text-to-speech and translation services.

Audify is an API-based system that transforms written content into high-quality audio using:
- Kokoro TTS API for natural speech synthesis
- Ollama + LiteLLM for intelligent translation
- LLM-powered audiobook generation for engaging audio content
## Features

- 📚 Multiple Formats: Convert EPUB ebooks and PDF documents
- 🎙️ Audiobook Creation: Generate audiobook-style content from books using an LLM
- 🌍 Multi-language Support: Translate content between languages
- 🎵 High-Quality TTS: Natural-sounding speech via the Kokoro API
- ⚙️ Flexible Configuration: Environment-based settings
## Prerequisites

- Python 3.10-3.13
- uv package manager (installation guide)
- Docker & Docker Compose (for the API services)
- CUDA-capable GPU (recommended for optimal performance)
## Installation

### 1. Clone the repository

```bash
git clone https://github.com/garciadias/audify.git
cd audify
```

### 2. Start the API services

```bash
# Start Kokoro TTS and Ollama services
docker compose up -d

# Wait for services to be ready (~2-3 minutes)
# Check status: docker compose ps
```

### 3. Set up the Python environment

```bash
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv sync
```

### 4. Pull the Ollama models

```bash
# Pull required models for translation and audiobook generation
docker compose exec ollama ollama pull qwen3:30b

# Or use lighter models for testing:
# docker compose exec ollama ollama pull llama3.2:3b
```

## Usage

### Quick start

```bash
# Convert EPUB to audiobook
task run path/to/your/book.epub

# Convert PDF to audiobook
task run path/to/your/document.pdf

# Create audiobook from EPUB
task audiobook path/to/your/book.epub
```

### Converting books

```bash
# English EPUB to audiobook
task run "book.epub"

# PDF with a specific language
task run "document.pdf" --language pt

# With translation (English to Spanish)
task run "book.epub" --language en --translate es
```

### Creating audiobooks

```bash
# Create audiobook from EPUB
task audiobook "book.epub"

# Limit to first 5 chapters
task audiobook "book.epub" --max-chapters 5

# Custom voice and language
task audiobook "book.epub" --voice af_bella --language en

# With translation
task audiobook "book.epub" --translate pt
```

### Other options

```bash
# List available languages
task run --list-languages

# List available TTS models
task run --list-models

# Save extracted text
task run "book.epub" --save-text

# Skip confirmation prompts
task run "book.epub" -y
```

## Configuration

### Environment variables

```bash
# Kokoro TTS API
export KOKORO_API_URL="http://localhost:8887/v1/audio"

# Ollama configuration
export OLLAMA_API_BASE_URL="http://localhost:11434"
export OLLAMA_TRANSLATION_MODEL="qwen3:30b"
export OLLAMA_MODEL="qwen3:30b"
```

### Docker services

The `docker-compose.yml` configures:

- Kokoro TTS: port 8887 (GPU-accelerated speech synthesis)
- Ollama: port 11434 (LLM for translation and audiobook generation)
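The environment variables above can be resolved in code with simple lookups that fall back to the documented defaults. This is a minimal sketch for illustration: `load_settings` is a hypothetical helper, not Audify's actual settings module.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Resolve service endpoints and models, falling back to the
    defaults documented above. (Hypothetical helper, for illustration.)"""
    return {
        "kokoro_api_url": env.get("KOKORO_API_URL", "http://localhost:8887/v1/audio"),
        "ollama_api_base_url": env.get("OLLAMA_API_BASE_URL", "http://localhost:11434"),
        "translation_model": env.get("OLLAMA_TRANSLATION_MODEL", "qwen3:30b"),
        "ollama_model": env.get("OLLAMA_MODEL", "qwen3:30b"),
    }


print(load_settings({})["kokoro_api_url"])  # http://localhost:8887/v1/audio
```

Passing the mapping explicitly (instead of reading `os.environ` at import time) keeps the lookup testable and makes the fallback behavior easy to verify.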
## Output Structure

```
data/output/
├── [book_name]/
│   ├── chapters.txt         # Book metadata
│   ├── cover.jpg            # Book cover image
│   ├── chapters_001.mp3     # Individual chapter audio
│   ├── chapters_002.mp3
│   ├── chapters_003.mp3
│   ├── ...                  # More chapters
│   └── book_name.m4b        # Final audiobook
│
└── audiobooks/
    └── [book_name]/
        ├── episode_01.mp3   # Audiobook episodes
        ├── episode_02.mp3
        └── scripts/         # Generated scripts
```
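The per-book layout can be expressed as a small path helper. This is a sketch only: `output_paths` is a hypothetical function that mirrors the tree above, not part of Audify's API.

```python
from pathlib import Path


def output_paths(book_name: str, n_chapters: int, root: str = "data/output") -> dict:
    """Return the expected artifact locations for one converted book.

    Hypothetical helper mirroring the documented output tree.
    """
    book_dir = Path(root) / book_name
    return {
        "metadata": book_dir / "chapters.txt",
        "cover": book_dir / "cover.jpg",
        "chapters": [
            book_dir / f"chapters_{i:03d}.mp3" for i in range(1, n_chapters + 1)
        ],
        "audiobook": book_dir / f"{book_name}.m4b",
    }


paths = output_paths("my_book", n_chapters=3)
print(paths["audiobook"].as_posix())  # data/output/my_book/my_book.m4b
print(paths["chapters"][0].name)      # chapters_001.mp3
```

Note the zero-padded chapter numbering (`chapters_001.mp3`), which keeps the files in playback order under a plain lexicographic sort.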
## Development

### Common tasks

```bash
task test        # Run tests with coverage
task format      # Format code with ruff
task run         # Convert ebook to audiobook
task audiobook   # Create audiobook from content
task up          # Start Docker services
```

### Workflow

```bash
# Install development dependencies
uv sync --group dev

# Run tests
task test

# Format code
task format

# Type checking (included in pre_test)
mypy ./audify ./tests --ignore-missing-imports
```

## Architecture

Audify uses a modern microservices architecture:
```
┌─────────────────┐   ┌────────────────────┐   ┌─────────────────────┐
│   Audify CLI    │   │   Kokoro TTS API   │   │   Ollama LLM API    │
│                 │   │                    │   │                     │
│ • EPUB/PDF Read │   │ • Speech Synthesis │   │ • Translation       │
│ • Text Process  │   │ • Multi-voice      │   │ • Audiobook scripts │
│ • Audio Combine │   │                    │   │                     │
└─────────────────┘   └────────────────────┘   └─────────────────────┘
```
- Text Extraction: EPUB/PDF parsing with chapter detection
- Translation: LiteLLM + Ollama for high-quality translation
- TTS: Kokoro API for natural speech synthesis
- Audiobook Generation: LLM-powered script creation
- Audio Processing: Pydub for format conversion and combining
## Supported Languages

**Primary**: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Hungarian, Korean, Japanese, Hindi

**Translation**: any language pair supported by your Ollama model
## Troubleshooting

**Services not responding:**

```bash
# Check service status
docker compose ps

# Restart services
docker compose restart

# Check logs
docker compose logs kokoro
docker compose logs ollama
```

**Ollama model not found:**

```bash
# List available models
docker compose exec ollama ollama list

# Pull required model
docker compose exec ollama ollama pull qwen3:30b
```

**GPU issues:**

```bash
# Check GPU availability
docker compose exec kokoro nvidia-smi

# If no GPU, services will run on CPU (slower)
```

## Performance Tips

- Use SSD storage for model caching
- Ensure adequate GPU memory (8GB+ recommended)
- Use lighter models for testing: `llama3.2:3b` instead of `magistral:24b`
- Consider running services on separate machines for large workloads
## Examples

Check the `examples/` directory for sample usage patterns and configuration files.

## Contributing

We welcome contributions! Please see our Contributing Guide for details.

- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: `task test`
- Submit a pull request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Kokoro TTS for high-quality speech synthesis
- Kokoro-FastAPI for making Kokoro accessible via FastAPI
- Ollama for local LLM inference
- LiteLLM for a unified LLM API interface