Audify


Convert ebooks and PDFs to audiobooks using AI text-to-speech and translation services.

Audify is an API-based system that transforms written content into high-quality audio using:

  • Kokoro TTS API for natural speech synthesis
  • Ollama + LiteLLM for intelligent translation
  • LLM-powered audiobook generation for engaging audio content

🚀 Features

  • 📚 Multiple Formats: Convert EPUB ebooks and PDF documents
  • 🎙️ Audiobook Creation: Generate audiobook-style content from books using an LLM
  • 🌍 Multi-language Support: Translate content into other languages
  • 🎵 High-Quality TTS: Natural-sounding speech via Kokoro API
  • ⚙️ Flexible Configuration: Environment-based settings

📋 Prerequisites

  • Python 3.10-3.13
  • UV package manager (installation guide)
  • Docker & Docker Compose (for API services)
  • CUDA-capable GPU (recommended for optimal performance)

🐳 Quick Start with Docker (Recommended)

1. Clone and Setup

git clone https://github.com/garciadias/audify.git
cd audify

2. Start API Services

# Start Kokoro TTS and Ollama services
docker compose up -d

# Wait for services to be ready (~2-3 minutes)
# Check status: docker compose ps
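Instead of waiting a fixed 2-3 minutes, readiness can be polled. The helper below is a sketch, not part of the repo's tooling; the probe endpoints (`/v1/audio/voices` for Kokoro, `/api/tags` for Ollama) are assumptions — only the ports come from this README.

```shell
# Sketch of a readiness poll: retry a probe command once per second until it
# succeeds or the attempt budget runs out.
wait_until() {
  tries=$1; shift
  while [ "$tries" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Example probes (endpoint paths are assumptions):
# wait_until 180 curl -sf http://localhost:8887/v1/audio/voices
# wait_until 180 curl -sf http://localhost:11434/api/tags
```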

3. Install Python Dependencies

# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv sync

4. Setup Ollama Models

# Pull required models for translation and audiobook generation
docker compose exec ollama ollama pull qwen3:30b

# Or use lighter models for testing:
# docker compose exec ollama ollama pull llama3.2:3b

5. Convert Your First Book

# Convert EPUB to audiobook
task run path/to/your/book.epub

# Convert PDF to audiobook
task run path/to/your/document.pdf

# Create audiobook from EPUB
task audiobook path/to/your/book.epub

📖 Usage Examples

Basic Audiobook Conversion

# English EPUB to audiobook
task run "book.epub"

# PDF with specific language
task run "document.pdf" --language pt

# With translation (English to Spanish)
task run "book.epub" --language en --translate es

Audiobook Generation

# Create audiobook from EPUB
task audiobook "book.epub"

# Limit to first 5 chapters
task audiobook "book.epub" --max-chapters 5

# Custom voice and language
task audiobook "book.epub" --voice af_bella --language en

# With translation
task audiobook "book.epub" --translate pt

Advanced Options

# List available languages
task run --list-languages

# List available TTS models
task run --list-models

# Save extracted text
task run "book.epub" --save-text

# Skip confirmation prompts
task run "book.epub" -y

βš™οΈ Configuration

Environment Variables

# Kokoro TTS API
export KOKORO_API_URL="http://localhost:8887/v1/audio"

# Ollama Configuration
export OLLAMA_API_BASE_URL="http://localhost:11434"
export OLLAMA_TRANSLATION_MODEL="qwen3:30b"
export OLLAMA_MODEL="qwen3:30b"
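The exports above can also be kept in a file and loaded per shell session. The `.env` pattern below is a sketch of one way to do that, not a mechanism the project documents:

```shell
# Sketch: persist the settings in a .env file and export them on load.
# (.env handling is an assumption; the README only shows plain exports.)
cat > .env <<'EOF'
KOKORO_API_URL=http://localhost:8887/v1/audio
OLLAMA_API_BASE_URL=http://localhost:11434
OLLAMA_TRANSLATION_MODEL=qwen3:30b
OLLAMA_MODEL=qwen3:30b
EOF

set -a    # auto-export every variable assigned while sourcing
. ./.env
set +a
```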

Docker Services

The docker-compose.yml configures:

  • Kokoro TTS: Port 8887 (GPU-accelerated speech synthesis)
  • Ollama: Port 11434 (LLM for translation and audiobook generation)
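For orientation, the two services have roughly the shape below. This is an illustrative sketch only: the image names, container ports, and GPU stanza are assumptions, and the repo's actual docker-compose.yml is authoritative.

```yaml
# Sketch of the two services described above (not the repo's exact file).
services:
  kokoro:
    image: ghcr.io/remsky/kokoro-fastapi-gpu:latest   # assumed image name
    ports:
      - "8887:8880"            # host port from this README; container port assumed
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama   # persist pulled models across restarts
volumes:
  ollama_data:
```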

πŸ“ Output Structure

data/output/
├── [book_name]/
│   ├── chapters.txt           # Book metadata
│   ├── cover.jpg              # Book cover image
│   ├── chapters_001.mp3       # Individual chapter audio
│   ├── chapters_002.mp3
│   ├── chapters_003.mp3
│   ├── ...                    # More chapters
│   └── book_name.m4b          # Final audiobook
│
└── audiobooks/
    └── [book_name]/
        ├── episode_01.mp3     # Audiobook episodes
        ├── episode_02.mp3
        └── scripts/           # Generated scripts

πŸ› οΈ Development

Available Tasks

task test        # Run tests with coverage
task format      # Format code with ruff
task run         # Convert ebook to audiobook
task audiobook   # Create audiobook from content
task up          # Start Docker services

Local Development Setup

# Install development dependencies
uv sync --group dev

# Run tests
task test

# Format code
task format

# Type checking (included in pre_test)
mypy ./audify ./tests --ignore-missing-imports

πŸ—οΈ Architecture

Audify uses a modern microservices architecture:

┌─────────────────┐    ┌────────────────┐    ┌─────────────────────┐
│   Audify CLI    │    │     Kokoro     │    │       Ollama        │
│                 │    │    TTS API     │    │       LLM API       │
│                 │    │                │    │                     │
│ • EPUB/PDF Read │    │ • Speech       │    │ • Translation       │
│ • Text Process  │    │   Synthesis    │    │ • Audiobook scripts │
│ • Audio Combine │    │ • Multi-voice  │    │                     │
└─────────────────┘    └────────────────┘    └─────────────────────┘

Key Components

  • Text Extraction: EPUB/PDF parsing with chapter detection
  • Translation: LiteLLM + Ollama for high-quality translation
  • TTS: Kokoro API for natural speech synthesis
  • Audiobook Generation: LLM-powered script creation
  • Audio Processing: Pydub for format conversion and combining

🌍 Supported Languages

Primary: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Hungarian, Korean, Japanese, Hindi

Translation: Any language pair supported by your Ollama model

🔧 Troubleshooting

Common Issues

Services not responding:

# Check service status
docker compose ps

# Restart services
docker compose restart

# Check logs
docker compose logs kokoro
docker compose logs ollama

Ollama model not found:

# List available models
docker compose exec ollama ollama list

# Pull required model
docker compose exec ollama ollama pull qwen3:30b

GPU issues:

# Check GPU availability
docker compose exec kokoro nvidia-smi

# If no GPU, services will run on CPU (slower)

Performance Tips

  • Use SSD storage for model caching
  • Ensure adequate GPU memory (8GB+ recommended)
  • Use lighter models for testing: llama3.2:3b instead of qwen3:30b
  • Consider running services on separate machines for large workloads

📚 Examples

Check the examples/ directory for sample usage patterns and configuration files.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: task test
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •