Audify


Convert ebooks and PDFs to audiobooks using AI text-to-speech and translation services.

Audify is an API-based system that transforms written content into high-quality audio using:

  • Kokoro TTS API for natural speech synthesis
  • Ollama + LiteLLM for intelligent translation
  • LLM-powered audiobook generation for engaging audio content

🚀 Features

  • 📚 Multiple Formats: Convert EPUB ebooks and PDF documents
  • 🎙️ Audiobook Creation: Generate audiobook-style content from books using an LLM
  • 🌍 Multi-language Support: Translate content into other languages
  • 🎵 High-Quality TTS: Natural-sounding speech via Kokoro API
  • ⚙️ Flexible Configuration: Environment-based settings

📋 Prerequisites

  • Python 3.10-3.13
  • UV package manager (installation guide)
  • Docker & Docker Compose (for API services)
  • CUDA-capable GPU (recommended for optimal performance)

🐳 Quick Start with Docker (Recommended)

1. Clone and Setup

git clone https://github.com/garciadias/audify.git
cd audify

2. Start API Services

# Start Kokoro TTS and Ollama services
docker compose up -d

# Wait for services to be ready (~2-3 minutes)
# Check status: docker compose ps
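Instead of waiting a fixed 2-3 minutes, readiness can be polled. The helper below is a sketch, not part of the repo's tooling; the probe endpoints (`/v1/audio/voices` for Kokoro, `/api/tags` for Ollama) are assumptions — only the ports come from this README.

```shell
# Sketch of a readiness poll: retry a probe command once per second until it
# succeeds or the attempt budget runs out.
wait_until() {
  tries=$1; shift
  while [ "$tries" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Example probes (endpoint paths are assumptions):
# wait_until 180 curl -sf http://localhost:8887/v1/audio/voices
# wait_until 180 curl -sf http://localhost:11434/api/tags
```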

3. Install Python Dependencies

# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv sync

4. Setup Ollama Models

# Pull required models for translation and audiobook generation
docker compose exec ollama ollama pull qwen3:30b

# Or use lighter models for testing:
# docker compose exec ollama ollama pull llama3.2:3b

5. Convert Your First Book

# Convert EPUB to audiobook
task run path/to/your/book.epub

# Convert PDF to audiobook
task run path/to/your/document.pdf

# Create audiobook from EPUB
task audiobook path/to/your/book.epub

📖 Usage Examples

Basic Audiobook Conversion

# English EPUB to audiobook
task run "book.epub"

# PDF with specific language
task run "document.pdf" --language pt

# With translation (English to Spanish)
task run "book.epub" --language en --translate es

Audiobook Generation

# Create audiobook from EPUB
task audiobook "book.epub"

# Limit to first 5 chapters
task audiobook "book.epub" --max-chapters 5

# Custom voice and language
task audiobook "book.epub" --voice af_bella --language en

# With translation
task audiobook "book.epub" --translate pt

Advanced Options

# List available languages
task run --list-languages

# List available TTS models
task run --list-models

# Save extracted text
task run "book.epub" --save-text

# Skip confirmation prompts
task run "book.epub" -y

βš™οΈ Configuration

Environment Variables

# Kokoro TTS API
export KOKORO_API_URL="http://localhost:8887/v1/audio"

# Ollama Configuration
export OLLAMA_API_BASE_URL="http://localhost:11434"
export OLLAMA_TRANSLATION_MODEL="qwen3:30b"
export OLLAMA_MODEL="qwen3:30b"
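The exports above can also be kept in a file and loaded per shell session. The `.env` pattern below is a sketch of one way to do that, not a mechanism the project documents:

```shell
# Sketch: persist the settings in a .env file and export them on load.
# (.env handling is an assumption; the README only shows plain exports.)
cat > .env <<'EOF'
KOKORO_API_URL=http://localhost:8887/v1/audio
OLLAMA_API_BASE_URL=http://localhost:11434
OLLAMA_TRANSLATION_MODEL=qwen3:30b
OLLAMA_MODEL=qwen3:30b
EOF

set -a    # auto-export every variable assigned while sourcing
. ./.env
set +a
```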

Docker Services

The docker-compose.yml configures:

  • Kokoro TTS: Port 8887 (GPU-accelerated speech synthesis)
  • Ollama: Port 11434 (LLM for translation and audiobook generation)
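For orientation, the two services have roughly the shape below. This is an illustrative sketch only: the image names, container ports, and GPU stanza are assumptions, and the repo's actual docker-compose.yml is authoritative.

```yaml
# Sketch of the two services described above (not the repo's exact file).
services:
  kokoro:
    image: ghcr.io/remsky/kokoro-fastapi-gpu:latest   # assumed image name
    ports:
      - "8887:8880"            # host port from this README; container port assumed
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama   # persist pulled models across restarts
volumes:
  ollama_data:
```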

πŸ“ Output Structure

data/output/
├── [book_name]/
│   ├── chapters.txt           # Book metadata
│   ├── cover.jpg              # Book cover image
│   ├── chapters_001.mp3       # Individual chapter audio
│   ├── chapters_002.mp3
│   ├── chapters_003.mp3
│   ├── ...                    # More chapters
│   └── book_name.m4b          # Final audiobook
│
└── audiobooks/
    └── [book_name]/
        ├── episode_01.mp3     # Audiobook episodes
        ├── episode_02.mp3
        └── scripts/           # Generated scripts

πŸ› οΈ Development

Available Tasks

task test        # Run tests with coverage
task format      # Format code with ruff
task run         # Convert ebook to audiobook
task audiobook   # Create audiobook from content
task up          # Start Docker services

Local Development Setup

# Install development dependencies
uv sync --group dev

# Run tests
task test

# Format code
task format

# Type checking (included in pre_test)
mypy ./audify ./tests --ignore-missing-imports

πŸ—οΈ Architecture

Audify uses a modern microservices architecture:

┌─────────────────┐    ┌────────────────┐    ┌─────────────────────┐
│   Audify CLI    │    │     Kokoro     │    │       Ollama        │
│                 │    │    TTS API     │    │       LLM API       │
│                 │    │                │    │                     │
│ • EPUB/PDF Read │    │ • Speech       │    │ • Translation       │
│ • Text Process  │    │   Synthesis    │    │ • Audiobook scripts │
│ • Audio Combine │    │ • Multi-voice  │    │                     │
└─────────────────┘    └────────────────┘    └─────────────────────┘

Key Components

  • Text Extraction: EPUB/PDF parsing with chapter detection
  • Translation: LiteLLM + Ollama for high-quality translation
  • TTS: Kokoro API for natural speech synthesis
  • Audiobook Generation: LLM-powered script creation
  • Audio Processing: Pydub for format conversion and combining

🌍 Supported Languages

Primary: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Hungarian, Korean, Japanese, Hindi

Translation: Any language pair supported by your Ollama model

🔧 Troubleshooting

Common Issues

Services not responding:

# Check service status
docker compose ps

# Restart services
docker compose restart

# Check logs
docker compose logs kokoro
docker compose logs ollama

Ollama model not found:

# List available models
docker compose exec ollama ollama list

# Pull required model
docker compose exec ollama ollama pull qwen3:30b

GPU issues:

# Check GPU availability
docker compose exec kokoro nvidia-smi

# If no GPU, services will run on CPU (slower)

Performance Tips

  • Use SSD storage for model caching
  • Ensure adequate GPU memory (8GB+ recommended)
  • Use lighter models for testing: llama3.2:3b instead of qwen3:30b
  • Consider running services on separate machines for large workloads

📚 Examples

Check the examples/ directory for sample usage patterns and configuration files.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: task test
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •