Fix ElevenLabs language code extraction for multilingual ASR #107

louisjoecodes · 2025-10-28T15:20:14Z

Fixes hardcoded language_code="eng" in ElevenLabs transcription that was causing poor performance on multilingual ASR benchmarks.

Changes

- Added extract_language_code() helper to parse the language from dataset names (e.g., fleurs_fr → fr)
- Updated transcribe_with_retry() to accept and use a dataset parameter
- Replaced hardcoded "eng" with dynamic language extraction
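
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the parsing rule (a trailing two-letter suffix after an underscore), the function signatures, the retry and backoff settings, and the injected `transcribe` callable (a hypothetical stand-in for the ElevenLabs call) are all assumptions. Only the helper names, the `fleurs_fr` → `fr` example, and the "en" fallback for English-only datasets come from the description.

```python
import time
from typing import Callable

# Assumption: fallback for English-only datasets such as ami or librispeech.
DEFAULT_LANGUAGE = "en"


def extract_language_code(dataset: str, default: str = DEFAULT_LANGUAGE) -> str:
    """Return the two-letter suffix of a dataset name, or the default if there is none."""
    suffix = dataset.rsplit("_", 1)[-1].lower()
    # Only treat the suffix as a language code if it follows an underscore
    # and consists of exactly two letters (e.g. "fleurs_fr" -> "fr").
    if "_" in dataset and len(suffix) == 2 and suffix.isalpha():
        return suffix
    return default


def transcribe_with_retry(
    transcribe: Callable[[bytes, str], str],  # hypothetical stand-in for the provider call
    audio: bytes,
    dataset: str,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call transcribe(audio, language_code), retrying with exponential backoff."""
    # The language code is derived from the dataset instead of being hardcoded.
    language_code = extract_language_code(dataset)
    for attempt in range(max_retries):
        try:
            return transcribe(audio, language_code)
        except Exception:
            # Re-raise the last failure once the retry budget is used up.
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("max_retries must be at least 1")
```

Passing the provider call in as a callable keeps the sketch independent of any particular SDK version; in the real script the call would presumably be made directly against the ElevenLabs client.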

Results (on small sample size)

Before fix (hardcoded "eng"):

French FLEURS: 26.34% WER (leaderboard: 19.75%)
Portuguese FLEURS: 35.98% WER (leaderboard: 22.8%)

After fix (dynamic language code):

French FLEURS: 3.99% WER (85% relative improvement)
Portuguese FLEURS: 4.55% WER (87% relative improvement)
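
The improvement percentages are relative reductions in WER, not differences in percentage points. A small check of that arithmetic, using only the before and after figures reported above (the function name is illustrative):

```python
def relative_wer_reduction(before: float, after: float) -> float:
    """Relative WER reduction in percent: (before - after) / before * 100."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# WER figures taken from the results above.
french = relative_wer_reduction(26.34, 3.99)
portuguese = relative_wer_reduction(35.98, 4.55)

print(f"French: {french:.0f}%")          # French: 85%
print(f"Portuguese: {portuguese:.0f}%")  # Portuguese: 87%
```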

Previously, language_code was hardcoded to "eng" for all ElevenLabs transcriptions, causing poor WER on multilingual benchmarks (e.g., French: 26.34% WER, Portuguese: 35.98% WER).

This fix:
- Extracts language code from dataset name (e.g., "fleurs_fr" → "fr")
- Dynamically sets language_code parameter based on dataset
- Defaults to "en" for English-only datasets (ami, librispeech, etc.)

Test results:
- French: 26.34% → 3.99% WER (85% improvement)
- Portuguese: 35.98% → 4.55% WER (87% improvement)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
