Speechmatics vCon Link

A vCon server link for transcribing audio dialogs using the Speechmatics API. This link processes vCon objects and stores transcription results in both Speechmatics native format and the standardized World Transcription Format (WTF).

Features

Batch transcription of audio dialogs via Speechmatics API
Dual output formats:
- Speechmatics native JSON format (preserves all provider-specific data)
- WTF format (standardized transcription schema for interoperability)
Speaker diarization support
Automatic language detection
Idempotent processing (skip already transcribed dialogs)
Retry logic with exponential backoff
Comprehensive logging

Installation

Install directly from GitHub:

pip install git+https://github.com/yourusername/speechmatics-link.git

Install from a specific version/tag:

pip install git+https://github.com/yourusername/[email protected]

Install for development:

git clone https://github.com/yourusername/speechmatics-link.git
cd speechmatics-link
pip install -e ".[dev]"

Configuration

vCon Server Configuration

Add to your vcon-server config.yml:

links:
  speechmatics:
    module: speechmatics_vcon_link
    pip_name: git+https://github.com/yourusername/speechmatics-link.git@main
    options:
      api_key: ${SPEECHMATICS_API_KEY}
      save_native_format: true
      save_wtf_format: true
      model: "enhanced"
      enable_diarization: false
      skip_if_exists: true

chains:
  transcription_chain:
    links:
      - speechmatics
    ingress_lists:
      - incoming_calls
    storages:
      - postgres
    enabled: 1

Configuration Options

Option	Type	Default	Description
`api_key`	string	`$SPEECHMATICS_API_KEY`	Speechmatics API key (required)
`api_url`	string	`https://asr.api.speechmatics.com/v2`	Speechmatics API base URL
`save_native_format`	bool	`true`	Store transcription in Speechmatics native format
`save_wtf_format`	bool	`true`	Store transcription in WTF format
`model`	string	`"enhanced"`	Transcription model: "standard" or "enhanced"
`language`	string	`null`	Language code (null for auto-detect)
`enable_diarization`	bool	`false`	Enable speaker diarization
`diarization_max_speakers`	int	`null`	Maximum speakers for diarization
`poll_interval`	int	`5`	Seconds between job status checks
`max_poll_attempts`	int	`120`	Maximum polling attempts before timeout
`skip_if_exists`	bool	`true`	Skip dialogs with existing transcription
`redis_host`	string	`"localhost"`	Redis host
`redis_port`	int	`6379`	Redis port
`redis_db`	int	`0`	Redis database number

Environment Variables

Set your Speechmatics API key:

export SPEECHMATICS_API_KEY=your-api-key-here

Output Formats

Speechmatics Native Format

Stored as vCon analysis with type speechmatics_transcription. Contains the complete Speechmatics API response including:

Word-level timing and confidence
Punctuation
Speaker labels (if diarization enabled)
Job metadata

WTF Format

Stored as vCon analysis with type wtf_transcription. Follows the WTF schema defined in draft-howe-vcon-wtf-extension-01:

{
  "transcript": {
    "text": "Complete transcription text...",
    "language": "en-US",
    "duration": 65.2,
    "confidence": 0.95
  },
  "segments": [
    {
      "id": 0,
      "start": 0.5,
      "end": 4.8,
      "text": "Hello, this is Alice from customer service.",
      "confidence": 0.97,
      "speaker": "S1",
      "words": [0, 1, 2, 3, 4, 5, 6]
    }
  ],
  "words": [
    {
      "id": 0,
      "start": 0.5,
      "end": 0.8,
      "text": "Hello",
      "confidence": 0.98,
      "speaker": "S1",
      "is_punctuation": false
    }
  ],
  "speakers": {
    "S1": {
      "id": "S1",
      "label": "Speaker S1",
      "segments": [0, 1],
      "total_time": 4.3,
      "confidence": 0.97
    }
  },
  "metadata": {
    "created_at": "2025-01-02T12:15:30Z",
    "processed_at": "2025-01-02T12:16:35Z",
    "provider": "speechmatics",
    "model": "enhanced",
    "processing_time": 12.5,
    "audio": {
      "duration": 65.2,
      "format": "wav"
    }
  },
  "quality": {
    "audio_quality": "high",
    "average_confidence": 0.95,
    "low_confidence_words": 0,
    "multiple_speakers": true
  },
  "extensions": {
    "speechmatics": {
      "job": { ... },
      "format": "2.9"
    }
  }
}

Usage Examples

Basic Usage

The link automatically processes audio dialogs in vCons:

# The run() function is called by vcon-server
from speechmatics_vcon_link import run

result = run(
    vcon_uuid="your-vcon-uuid",
    link_name="speechmatics",
    opts={
        "api_key": "your-api-key",
        "save_native_format": True,
        "save_wtf_format": True,
    }
)

Using the Client Directly

from speechmatics_vcon_link.client import SpeechmaticsClient, TranscriptionConfig

client = SpeechmaticsClient(api_key="your-api-key")

# Configure transcription
config = TranscriptionConfig(
    language="en",
    operating_point="enhanced",
    enable_diarization=True,
)

# Transcribe audio
result = client.transcribe(
    audio_url="https://example.com/audio.wav",
    config=config,
)

print(result)

Converting to WTF Format

from speechmatics_vcon_link.converter import convert_to_wtf

# Convert Speechmatics response to WTF format
wtf_result = convert_to_wtf(
    speechmatics_response,
    created_at="2025-01-02T12:00:00Z",
    processing_time=12.5,
)

print(wtf_result["transcript"]["text"])

Development

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=speechmatics_vcon_link --cov-report=html

# Run specific test file
pytest tests/test_converter.py -v

Code Formatting

black speechmatics_vcon_link/

Project Structure

speechmatics-link/
  speechmatics_vcon_link/
    __init__.py       # Main link implementation
    client.py         # Speechmatics API client
    converter.py      # WTF format converter
  tests/
    test_link.py      # Link tests
    test_client.py    # Client tests
    test_converter.py # Converter tests
    fixtures/         # Test data
  pyproject.toml      # Package config
  README.md           # This file

Troubleshooting

API Key Issues

Ensure SPEECHMATICS_API_KEY is set or api_key is provided in options
Verify your API key has batch transcription permissions
Check API key hasn't expired

Audio URL Issues

Audio must be accessible via HTTP/HTTPS URL
Supported formats: WAV, MP3, FLAC, OGG, WebM
URL must be publicly accessible or use signed URLs

Timeout Errors

Increase max_poll_attempts for longer audio files
Check Speechmatics service status
Verify audio file isn't corrupted

Redis Connection Issues

Verify Redis is running and accessible
Check redis_host, redis_port, redis_db settings
Ensure Redis JSON module is available

Module Not Found

Verify package is installed: pip list | grep speechmatics
Check module name in config matches: speechmatics_vcon_link
Restart vcon-server after installation

CI/CD

This project uses GitHub Actions for continuous integration:

Unit Tests: Run on every push and PR (Python 3.12 and 3.13)
Code Formatting: Checked with Black
Integration Tests: Run on main branch pushes (requires SPEECHMATICS_API_KEY secret)
Coverage: Uploaded to Codecov

Setting Up Secrets

For integration tests to run in CI, add the following secrets to your GitHub repository:

SPEECHMATICS_API_KEY - Your Speechmatics API key
CODECOV_TOKEN - (Optional) Codecov token for coverage reports

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests (pytest tests/ -v)
Format code (black speechmatics_vcon_link/ tests/)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

MIT License - see LICENSE for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Speechmatics vCon Link

Features

Installation

Configuration

vCon Server Configuration

Configuration Options

Environment Variables

Output Formats

Speechmatics Native Format

WTF Format

Usage Examples

Basic Usage

Using the Client Directly

Converting to WTF Format

Development

Running Tests

Code Formatting

Project Structure

Troubleshooting

API Key Issues

Audio URL Issues

Timeout Errors

Redis Connection Issues

Module Not Found

CI/CD

Setting Up Secrets

Contributing

License

References

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github/workflows		.github/workflows
speechmatics_vcon_link		speechmatics_vcon_link
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

License

vcon-dev/speechmatics-link

Folders and files

Latest commit

History

Repository files navigation

Speechmatics vCon Link

Features

Installation

Configuration

vCon Server Configuration

Configuration Options

Environment Variables

Output Formats

Speechmatics Native Format

WTF Format

Usage Examples

Basic Usage

Using the Client Directly

Converting to WTF Format

Development

Running Tests

Code Formatting

Project Structure

Troubleshooting

API Key Issues

Audio URL Issues

Timeout Errors

Redis Connection Issues

Module Not Found

CI/CD

Setting Up Secrets

Contributing

License

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages