Fuzhou Dialect Speech Recognition

An open-source project dedicated to fine-tuning a Whisper-based speech recognition model for the Fuzhou dialect (福州话).

🚀 Overview

This project builds an ASR (Automatic Speech Recognition) system for the Fuzhou dialect, an Eastern Min variety of Chinese with little existing speech and language technology support. It fine-tunes OpenAI's Whisper models and covers the full pipeline, from data collection to model deployment.

🎯 Key Features

Curated Dataset - Fuzhou dialect audio samples paired with subtitle-derived transcriptions
Fine-tuned Models - Whisper variants fine-tuned for the Fuzhou dialect
User-Friendly Interface - Interactive web application for immediate speech recognition

📋 Project Milestones

Dataset Construction

Collect diverse Fuzhou dialect audio/video content
Implement OCR workflow for subtitle extraction and alignment
Release curated training dataset: i18nJack/fuzhouhua
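The subtitle-extraction step can be pictured as collapsing per-frame OCR results into timed segments. The sketch below is a hypothetical illustration, not the project's actual implementation: it assumes OCR has already produced one `(timestamp_seconds, text)` pair per sampled video frame, and merges consecutive frames showing the same subtitle into a `(start, end, text)` segment that can then be paired with the matching audio span. `Segment`, `merge_frames`, `frame_interval`, and `min_duration` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_frames(frames, frame_interval, min_duration=0.3):
    """Merge per-frame OCR results into timed subtitle segments (hypothetical helper).

    frames: (timestamp, text) pairs sorted by timestamp, one per sampled frame.
    frame_interval: seconds between sampled frames (assumed constant).
    min_duration: assumed threshold for dropping one-frame OCR flicker.
    """
    segments = []
    current = None
    for ts, raw in frames:
        text = raw.strip()
        if current is not None and text == current.text:
            # Same subtitle still on screen: extend the open segment.
            current.end = ts + frame_interval
            continue
        if current is not None:
            segments.append(current)
        # An empty OCR result means no subtitle is visible, so no segment is open.
        current = Segment(ts, ts + frame_interval, text) if text else None
    if current is not None:
        segments.append(current)
    return [s for s in segments if s.end - s.start >= min_duration]
```

Each returned segment gives the time range to cut from the audio track and its transcription label; real subtitles usually need fuzzier matching than exact string equality, since OCR output can vary slightly between frames.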

Model Development

Fine-tune the Whisper architecture for Fuzhou dialect recognition
Release models in various sizes for different application requirements
- Release i18nJack/fuzhou-whisper-large-v3
Optimize for low latency and resource-constrained environments
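Fine-tuning Whisper imposes input constraints worth checking while preparing data: the model consumes 16 kHz audio in 30-second windows, so subtitle-aligned segments longer than that cannot be used as a single training example without splitting. A minimal filtering sketch, assuming segments are `(start_seconds, end_seconds, transcript)` tuples; the function names and the 0.5-second minimum are hypothetical choices, not taken from this repository:

```python
# Hypothetical data-preparation helpers (illustrative names, not from this repository).
SAMPLE_RATE = 16_000       # Whisper's feature extractor expects 16 kHz audio
MAX_INPUT_SECONDS = 30.0   # Whisper encodes one 30-second window per example


def num_samples(duration_s: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of audio samples a segment of `duration_s` seconds occupies."""
    return round(duration_s * sample_rate)


def filter_segments(segments, max_seconds=MAX_INPUT_SECONDS, min_seconds=0.5):
    """Keep (start, end, text) segments that fit in a single Whisper input window.

    `min_seconds` is an assumed threshold for discarding OCR flicker and
    fragments too short to carry a full utterance.
    """
    kept = []
    for start, end, text in segments:
        if not text.strip():
            continue  # no transcript, nothing to learn from
        if min_seconds <= end - start <= max_seconds:
            kept.append((start, end, text))
    return kept
```

Dropping over-long segments is the simplest option; splitting them at pauses would keep more data but needs word- or phrase-level timing that subtitles alone may not provide.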

Deployment

Develop a web-based interface for real-time speech recognition
Create API endpoints for third-party integration
Publish comprehensive documentation and usage examples

🛠️ Quick Start

Installation

# Clone the repository
git clone git@github.com:pangahn/fuzhouhua.git
cd fuzhouhua

# Install dependencies
uv sync

📥 Download Pretrained Models

To download the pretrained Whisper model and dataset required for Fuzhou dialect speech recognition:

cd fuzhouhua
wget -P scripts https://hf-mirror.com/hfd/hfd.sh
chmod a+x scripts/hfd.sh

export HF_ENDPOINT=https://hf-mirror.com

# Download Whisper Large V3 model
scripts/hfd.sh openai/whisper-large-v3 \
    --tool wget \
    --local-dir /your/path/to/fuzhouhua/models/openai/whisper-large-v3

# Download the Fuzhouhua dataset
scripts/hfd.sh i18nJack/fuzhouhua --dataset \
    --tool wget \
    --hf_username i18nJack \
    --hf_token hf_your_hf_token \
    --local-dir /your/path/to/fuzhouhua/data/datasets

Replace /your/path/to/ with your actual local directory paths, and ensure hf_your_hf_token is a valid Hugging Face token with dataset access permissions.
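If you prefer to script the downloads, for example to read the token from the `HF_TOKEN` environment variable instead of pasting it on the command line, the same two invocations can be assembled in Python. This is a hypothetical helper (`hfd_command` and `run_download` are illustrative names, not part of the repository); it passes only the hfd.sh flags used above and sets `HF_ENDPOINT` for the child process.

```python
import os
import subprocess

HF_MIRROR = "https://hf-mirror.com"


def hfd_command(repo_id, local_dir, dataset=False, username=None, token=None,
                script="scripts/hfd.sh"):
    """Build an hfd.sh argument list using only the flags shown in this README."""
    cmd = [script, repo_id]
    if dataset:
        cmd.append("--dataset")
    cmd += ["--tool", "wget"]
    if username:
        cmd += ["--hf_username", username]
    if token:
        cmd += ["--hf_token", token]
    cmd += ["--local-dir", local_dir]
    return cmd


def run_download(cmd):
    """Run hfd.sh with HF_ENDPOINT pointing at the mirror; raises on failure."""
    env = dict(os.environ, HF_ENDPOINT=HF_MIRROR)
    subprocess.run(cmd, check=True, env=env)


# Example usage (not executed here; replace the root path with your own):
# root = "/your/path/to/fuzhouhua"
# run_download(hfd_command("openai/whisper-large-v3",
#                          f"{root}/models/openai/whisper-large-v3"))
# run_download(hfd_command("i18nJack/fuzhouhua", f"{root}/data/datasets",
#                          dataset=True, username="i18nJack",
#                          token=os.environ.get("HF_TOKEN")))
```

Passing the arguments as a list avoids shell quoting issues, and keeping the token in the environment keeps it out of shell history.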

Configuration

Create a .env file in the project root:

# OCR configuration (OpenAI-compatible endpoint, here OpenRouter)
OCR_OPENAI_BASE_URL=https://openrouter.ai/api/v1
OCR_OPENAI_API_KEY=your_openrouter_key

# OpenAI-compatible LLM API configuration (here Moonshot)
OPENAI_BASE_URL=https://api.moonshot.cn/v1
MODEL_NAME=moonshot-v1-8k
OPENAI_API_KEY=your_moonshot_key

# Hugging Face access
HF_TOKEN=hf_your_token
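Before running the pipeline, it can help to confirm that every key above is present and non-empty. How the repository itself loads `.env` is not shown here (the python-dotenv package is a common choice); the sketch below is a hypothetical, stdlib-only checker that handles plain `KEY=VALUE` lines and `#` comments, but not `export` prefixes or escaped quotes.

```python
from pathlib import Path

# Keys taken from the .env example above.
REQUIRED_KEYS = (
    "OCR_OPENAI_BASE_URL", "OCR_OPENAI_API_KEY",
    "OPENAI_BASE_URL", "MODEL_NAME", "OPENAI_API_KEY",
    "HF_TOKEN",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of simple quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent or empty, in declaration order."""
    return [key for key in required if not values.get(key)]


def load_env_file(path=".env"):
    """Read and parse a .env file from disk (hypothetical helper)."""
    return parse_env(Path(path).read_text(encoding="utf-8"))
```

For example, `missing_keys(load_env_file())` returns an empty list when the file in the project root is complete.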

💻 Development Environment

Prerequisites

macOS or Linux (Windows users can use WSL)
Python 3.10+
Homebrew (for macOS users)
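The prerequisites can be checked with a short script before setting anything up. This is a hypothetical check, not a repository script; it also looks for `uv` on `PATH`, since the Setup Instructions below rely on it. Under WSL, `platform.system()` reports `"Linux"`, so WSL passes the OS check.

```python
import platform
import shutil
import sys


def check_prerequisites(version=None, system=None, uv_found=None):
    """Return a list of human-readable problems; an empty list means ready.

    Arguments default to the current interpreter, OS, and PATH, and can be
    overridden for testing.
    """
    version = version or tuple(sys.version_info[:2])
    system = system or platform.system()
    if uv_found is None:
        uv_found = shutil.which("uv") is not None

    problems = []
    if tuple(version) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    if system not in ("Darwin", "Linux"):
        problems.append(f"Unsupported OS {system!r}; on Windows, use WSL")
    if not uv_found:
        problems.append("uv not found on PATH")
    return problems
```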

Setup Instructions

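# Install uv package manager (macOS, via Homebrew)
brew install uv
# On Linux or WSL, use the official installer instead:
# curl -LsSf https://astral.sh/uv/install.sh | sh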

# Initialize project environment
uv init fuzhouhua --python=3.10

# Add required dependencies
uv add paddle2onnx==1.3.1

🔤 PaddleOCR Integration

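Convert the PaddleOCR detection, recognition, and text-direction classification models from Paddle inference format to ONNX, so they can run on ONNX-compatible runtimes such as ONNX Runtime: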

# Detection Model
paddle2onnx \
  --model_dir ./models/ocr/pdmodel/ch_PP-OCRv4_det_server_infer \
  --model_filename inference.pdmodel \
  --params_filename inference.pdiparams \
  --save_file ./models/ocr/onnxmodel/ch_PP-OCRv4_det_server_infer.onnx \
  --opset_version 11 \
  --enable_onnx_checker True

# Recognition Model
paddle2onnx \
  --model_dir ./models/ocr/pdmodel/ch_PP-OCRv4_rec_server_infer \
  --model_filename inference.pdmodel \
  --params_filename inference.pdiparams \
  --save_file ./models/ocr/onnxmodel/ch_PP-OCRv4_rec_server_infer.onnx \
  --opset_version 11 \
  --enable_onnx_checker True

# Classification Model
paddle2onnx \
  --model_dir ./models/ocr/pdmodel/ch_ppocr_mobile_v2.0_cls_infer \
  --model_filename inference.pdmodel \
  --params_filename inference.pdiparams \
  --save_file ./models/ocr/onnxmodel/ch_ppocr_mobile_v2_cls.onnx \
  --opset_version 11 \
  --enable_onnx_checker True
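The three conversions differ only in the model directory and output file name, so they can also be driven from one loop. The helper below is hypothetical, not part of the repository: it reuses exactly the paths and flags from the commands above, assumes the Paddle inference models already sit under `./models/ocr/pdmodel/`, and by default only builds the commands (a dry run) without executing them.

```python
import subprocess
from pathlib import Path

PDMODEL_DIR = Path("./models/ocr/pdmodel")
ONNX_DIR = Path("./models/ocr/onnxmodel")

# (source model directory, output ONNX file name), as in the commands above.
MODELS = [
    ("ch_PP-OCRv4_det_server_infer", "ch_PP-OCRv4_det_server_infer.onnx"),
    ("ch_PP-OCRv4_rec_server_infer", "ch_PP-OCRv4_rec_server_infer.onnx"),
    ("ch_ppocr_mobile_v2.0_cls_infer", "ch_ppocr_mobile_v2_cls.onnx"),
]


def paddle2onnx_command(model_dir, save_file, opset=11):
    """Build one paddle2onnx invocation with the flags used in this README."""
    return [
        "paddle2onnx",
        "--model_dir", str(model_dir),
        "--model_filename", "inference.pdmodel",
        "--params_filename", "inference.pdiparams",
        "--save_file", str(save_file),
        "--opset_version", str(opset),
        "--enable_onnx_checker", "True",
    ]


def convert_all(dry_run=True):
    """Build (and, unless dry_run, execute) the conversion commands in order."""
    commands = [paddle2onnx_command(PDMODEL_DIR / src, ONNX_DIR / dst)
                for src, dst in MODELS]
    if not dry_run:
        ONNX_DIR.mkdir(parents=True, exist_ok=True)  # make sure the output folder exists
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failed conversion
    return commands
```

Calling `convert_all(dry_run=False)` runs the conversions from the project root; the dry-run default makes it easy to inspect the commands first.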

🤝 Contributing

Contributions are welcome! Please check out our contribution guidelines before getting started.

📝 Citation

If you use this project in your research or applications, please cite:

@software{fuzhouhua_asr,
  author = {Pan, Gahn},
  title = {Fuzhou Dialect Speech Recognition},
  year = {2025},
  url = {https://github.com/pangahn/fuzhouhua}
}

📄 License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
