A minimal, robust Python package for whisper.cpp with CPU-optimized threading and integrated model management. It transcribes both audio and video files with high performance, targeting distributed cloud deployments and transcription workflows.
Install from PyPI:
```bash
pip install whisper-parallel-cpu
```

Use in Python:

```python
import whisper_parallel_cpu
# Transcribe audio files
text = whisper_parallel_cpu.transcribe("audio.mp3", model="base")
# Transcribe video files
text = whisper_parallel_cpu.transcribe("video.mp4", model="base")
# Or use specific functions
text = whisper_parallel_cpu.transcribe_audio("audio.wav", model="small")
text = whisper_parallel_cpu.transcribe_video("video.mkv", model="medium")Or use the CLI:
# Transcribe audio
whisper_parallel_cpu transcribe audio.mp3 --model base
# Transcribe video
whisper_parallel_cpu transcribe video.mp4 --model base
```

Features:

- Native C++/pybind11 speed (CPU & GPU acceleration)
- Automatic model download/caching - no manual setup required
- Simple Python & CLI interface - just
pip installand go - Input: Audio (
.mp3,.wav,.flac,.aac,.ogg,.m4a) and video (.mp4,.mkv,.avi,.mov) formats - Output: Transcribed text as a Python string
- Benchmarking: Built-in performance testing and optimization tools
- Cross-platform: Works on macOS, Linux, and Windows
Install from PyPI:

```bash
pip install whisper-parallel-cpu
```

Or build from source:

```bash
# Clone the repository
git clone https://github.com/krisfur/whisper-parallel-cpu.git
cd whisper-parallel-cpu
# Install in editable mode
pip install -e .
# Test the installation
python test_transcribe.py video.mp4
```

Requirements:

- C++17 compiler (`g++`, `clang++`) - automatically handled by pip
- cmake (>= 3.15) - automatically handled by pip
- `ffmpeg` (for audio extraction)
Note for uv users: if building from source with uv, ensure `build-essential` (Ubuntu/Debian) or equivalent C++ build tools are installed first (e.g. `sudo apt install build-essential`).
macOS:

```bash
brew install ffmpeg
```

Ubuntu/Debian:

```bash
sudo apt update && sudo apt install ffmpeg
```

Windows: Download from ffmpeg.org or use Chocolatey:

```bash
choco install ffmpeg
```

Basic usage:

```python
import whisper_parallel_cpu
# Transcribe any audio or video file (auto-detects format)
text = whisper_parallel_cpu.transcribe("audio.mp3", model="base", threads=4)
text = whisper_parallel_cpu.transcribe("video.mp4", model="small")
# Use specific functions for audio or video
text = whisper_parallel_cpu.transcribe_audio("audio.wav", model="base", threads=4)
text = whisper_parallel_cpu.transcribe_video("video.mkv", model="medium", threads=8)
# CPU-only mode (no GPU)
text = whisper_parallel_cpu.transcribe("audio.flac", model="base", use_gpu=False)For better performance when transcribing multiple files, use the WhisperModel class to load the model once and reuse it:
from whisper_parallel_cpu import WhisperModel
# Create a model instance (model is loaded on first use)
model = WhisperModel(model="base", use_gpu=False, threads=4)
# Transcribe multiple files using the same loaded model
files = ["audio1.mp3", "audio2.wav", "video1.mp4", "video2.mkv"]
for file_path in files:
    text = model.transcribe(file_path)
    print(f"Transcribed {file_path}: {text[:100]}...")
# Use as context manager
with WhisperModel(model="small", use_gpu=True) as model:
text1 = model.transcribe("audio1.mp3")
text2 = model.transcribe("audio2.wav")
# Model is automatically managed
# Memory management
model.clear_contexts() # Free memory
print(f"Active contexts: {model.get_context_count()}")Audio Formats:
.mp3,.wav,.flac,.aac,.ogg,.m4a,.wma,.opus,.webm,.3gp,.amr,.au,.ra,.mid,.midi
Video Formats:
.mp4,.avi,.mov,.mkv,.wmv,.flv,.webm,.m4v,.3gp,.ogv,.ts,.mts,.m2ts
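For example, a minimal sketch that filters a folder down to supported files before transcribing; the directory name and the extension subset are illustrative, not part of the package API:

```python
from pathlib import Path

import whisper_parallel_cpu

# Hypothetical media folder; adjust to your layout
MEDIA_DIR = Path("media")

# A subset of the supported extensions listed above
SUPPORTED = {".mp3", ".wav", ".flac", ".m4a", ".mp4", ".mkv", ".mov"}

for path in sorted(MEDIA_DIR.iterdir()):
    if path.suffix.lower() in SUPPORTED:
        text = whisper_parallel_cpu.transcribe(str(path), model="base")
        print(f"{path.name}: {text[:60]}...")
```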
The following models are available and will be downloaded automatically:
| Model | Size | Accuracy | Speed | Use Case |
|---|---|---|---|---|
| `tiny` | 74MB | Good | Fastest | Quick transcriptions |
| `base` | 141MB | Better | Fast | General purpose |
| `small` | 444MB | Better | Medium | High accuracy needed |
| `medium` | 1.4GB | Best | Slow | Maximum accuracy |
| `large` | 2.9GB | Best | Slowest | Professional use |

Manage models and transcribe from the CLI:

```bash
# List available models
whisper_parallel_cpu list
# Download a specific model
whisper_parallel_cpu download base
# Transcribe audio files
whisper_parallel_cpu transcribe audio.mp3 --model base --threads 4
whisper_parallel_cpu transcribe audio.wav --model small
# Transcribe video files
whisper_parallel_cpu transcribe video.mp4 --model base --threads 4
whisper_parallel_cpu transcribe video.mkv --model medium
# Transcribe without GPU (CPU-only)
whisper_parallel_cpu transcribe audio.flac --model small --no-gpu
```

Or manage models from Python:

```python
import whisper_parallel_cpu
# List available models
whisper_parallel_cpu.list_models()
# Download a specific model
whisper_parallel_cpu.download_model("medium")
# Force re-download
whisper_parallel_cpu.download_model("base", force=True)# Test with 5 audio/video copies
python benchmark.py audio.mp3 5
python benchmark.py video.mp4 5
```

The benchmark tests:

- Thread Scaling: Tests different thread counts (1, 2, 4, 8, 16, etc.) for single audio/video transcription
- Sequential Processing: Measures throughput when processing multiple audio/video files one after another
- Parallel Processing: Tests concurrent processing with different numbers of workers
- Optimal Configuration: Provides the best settings for your specific hardware
Performance tips:

- Model Reuse: Use the `WhisperModel` class for multiple transcriptions to avoid reloading the model each time
- GPU Acceleration: The system automatically uses Metal (macOS) or CUDA (Linux/Windows) when available
- Thread Count: Use the benchmark to find optimal thread count for your CPU
- Batch Processing: For multiple audio/video files, use parallel processing with `ThreadPoolExecutor` (see the sketch after this list)
- Model Size: Smaller models (base, small) are faster but less accurate than larger ones (medium, large)
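As a minimal sketch of that batch-processing tip: the file names are illustrative, and the worker and per-call thread counts are guesses you should tune with the benchmark:

```python
from concurrent.futures import ThreadPoolExecutor

import whisper_parallel_cpu

# Hypothetical input files
files = ["audio1.mp3", "audio2.wav", "video1.mp4", "video2.mkv"]

def transcribe_one(path: str) -> str:
    # Use fewer threads per call when several calls run concurrently
    return whisper_parallel_cpu.transcribe(path, model="base", threads=2)

# Two workers as a starting point; the benchmark can suggest a better number
with ThreadPoolExecutor(max_workers=2) as pool:
    for path, text in zip(files, pool.map(transcribe_one, files)):
        print(f"{path}: {text[:80]}...")
```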
When transcribing multiple files, using the `WhisperModel` class can provide significant performance improvements:

```python
import time

import whisper_parallel_cpu
from whisper_parallel_cpu import WhisperModel

files = ["audio1.mp3", "audio2.wav"]  # example file list

# Method 1: Using WhisperModel (model reuse) - FASTER
model = WhisperModel(model="base")
start = time.time()
for file in files:
    text = model.transcribe(file)
model_time = time.time() - start

# Method 2: Using the transcribe function (no reuse) - SLOWER
start = time.time()
for file in files:
    text = whisper_parallel_cpu.transcribe(file, model="base")
function_time = time.time() - start
print(f"Speedup with model reuse: {function_time / model_time:.2f}x")Typical speedups:
- 2-5x faster for multiple files with the same model
- Reduced memory usage through context sharing
- Better for batch processing workflows
`transcribe(file_path, model="base", threads=4, use_gpu=True)`

Transcribes an audio or video file using Whisper, detecting the file type automatically.
Parameters:

- `file_path` (str): Path to the audio or video file
- `model` (str): Model name (e.g. "base", "tiny", etc.) or path to a Whisper model binary (`.bin` file)
- `threads` (int): Number of CPU threads to use (default: 4)
- `use_gpu` (bool): Whether to use GPU acceleration (default: True)
Returns:

- `str`: Transcribed text
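Example:

```python
import whisper_parallel_cpu

# File type (audio vs. video) is detected automatically
text = whisper_parallel_cpu.transcribe("audio.mp3", model="base", threads=4)
text = whisper_parallel_cpu.transcribe("video.mp4", model="base")
```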
`transcribe_audio(audio_path, model="base", threads=4, use_gpu=True)`

Transcribes an audio file using Whisper.
Parameters:

- `audio_path` (str): Path to the audio file
- `model` (str): Model name (e.g. "base", "tiny", etc.) or path to a Whisper model binary (`.bin` file)
- `threads` (int): Number of CPU threads to use (default: 4)
- `use_gpu` (bool): Whether to use GPU acceleration (default: True)
Returns:

- `str`: Transcribed text
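Example:

```python
import whisper_parallel_cpu

text = whisper_parallel_cpu.transcribe_audio("audio.wav", model="small")
```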
`transcribe_video(video_path, model="base", threads=4, use_gpu=True)`

Transcribes a video file using Whisper.
Parameters:

- `video_path` (str): Path to the video file
- `model` (str): Model name (e.g. "base", "tiny", etc.) or path to a Whisper model binary (`.bin` file)
- `threads` (int): Number of CPU threads to use (default: 4)
- `use_gpu` (bool): Whether to use GPU acceleration (default: True)
Returns:

- `str`: Transcribed text
Example:

```python
import whisper_parallel_cpu
# Basic usage
text = whisper_parallel_cpu.transcribe_video("sample.mp4")
# Advanced usage
text = whisper_parallel_cpu.transcribe_video(
    "sample.mp4",
    model="medium",
    threads=8,
    use_gpu=False
)
```

`WhisperModel(model="base", use_gpu=False, threads=4)`

A class for efficient model reuse across multiple transcriptions.
Parameters:

- `model` (str): Model name (e.g. "base", "tiny", etc.) or path to a Whisper model binary (`.bin` file)
- `use_gpu` (bool): Whether to use GPU acceleration (default: False)
- `threads` (int): Number of CPU threads to use (default: 4)
Methods:

- `transcribe(file_path)`: Transcribe any audio or video file
- `transcribe_audio(audio_path)`: Transcribe an audio file
- `transcribe_video(video_path)`: Transcribe a video file
- `clear_contexts()`: Clear all cached contexts to free memory
- `get_context_count()`: Get the number of cached contexts
Example:

```python
from whisper_parallel_cpu import WhisperModel
# Create model instance
model = WhisperModel(model="base", use_gpu=False, threads=4)
# Transcribe multiple files efficiently
files = ["audio1.mp3", "audio2.wav", "video1.mp4"]
for file_path in files:
    text = model.transcribe(file_path)
    print(f"Transcribed: {text[:50]}...")
# Memory management
model.clear_contexts()
```

`clear_contexts()`

Clear all cached whisper contexts to free memory.
Example:

```python
import whisper_parallel_cpu
# Clear all cached contexts
whisper_parallel_cpu.clear_contexts()
```

`get_context_count()`

Get the number of currently cached whisper contexts.
Returns:

- `int`: Number of cached contexts
Example:

```python
import whisper_parallel_cpu
# Check how many contexts are cached
count = whisper_parallel_cpu.get_context_count()
print(f"Active contexts: {count}")- Fork the repository
- Create a feature branch:
git checkout -b feature-name - Make your changes and test thoroughly
- Commit your changes:
git commit -m 'Add feature' - Push to the branch:
git push origin feature-name - Submit a pull request
MIT License - see LICENSE file for details.
Acknowledgments:

- Built on whisper.cpp by Georgi Gerganov
- Uses pybind11 for Python bindings
- Model management inspired by the original OpenAI Whisper project