An open-source project dedicated to fine-tuning a Whisper-based speech recognition model for the Fuzhou dialect (福州话).
This project builds a robust ASR (Automatic Speech Recognition) system for the Fuzhou dialect—a regional Chinese language with limited existing NLP support. We leverage OpenAI's Whisper architecture to create an end-to-end pipeline from data collection to model deployment.
- Comprehensive Dataset - Professionally curated audio samples with accurate transcriptions
- State-of-the-Art Models - Fine-tuned Whisper variants optimized for the Fuzhou dialect
- User-Friendly Interface - Interactive web application for immediate speech recognition
- Collect diverse Fuzhou dialect audio/video content
- Implement OCR workflow for subtitle extraction and alignment
- Release curated training dataset: i18nJack/fuzhouhua
- Fine-tune Whisper architecture for Fuzhou dialect recognition
- Release models in various sizes for different application requirements
- Release i18nJack/fuzhou-whisper-large-v3
- Optimize for latency and resource-constrained environments
- Develop web-based interface for real-time speech recognition
- Create API endpoints for third-party integration
- Publish comprehensive documentation and usage examples
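The subtitle extraction-and-alignment step in the roadmap can be illustrated with a small sketch: OCR yields short timed cues, which are merged into longer clips suitable for Whisper training (Whisper accepts up to 30-second segments). The `Subtitle` class, `merge_adjacent` helper, and the gap/length thresholds below are hypothetical, not part of the project's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    start: float  # seconds
    end: float
    text: str


def merge_adjacent(subs, max_gap=0.3, max_len=15.0):
    """Merge consecutive subtitle cues separated by at most max_gap
    seconds, keeping each merged clip under max_len seconds."""
    merged = []
    for sub in subs:
        if (merged
                and sub.start - merged[-1].end <= max_gap
                and sub.end - merged[-1].start <= max_len):
            last = merged[-1]
            merged[-1] = Subtitle(last.start, sub.end, last.text + sub.text)
        else:
            merged.append(sub)
    return merged
```

For example, two cues 0.1 s apart are merged into a single clip, while a cue that starts several seconds later stays separate.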
# Clone the repository
git clone [email protected]:pangahn/fuzhouhua.git
cd fuzhouhua
# Install dependencies
uv sync

To download the pretrained Whisper model and dataset required for Fuzhou dialect speech recognition:
cd fuzhouhua
wget -P scripts https://hf-mirror.com/hfd/hfd.sh
chmod a+x scripts/hfd.sh
export HF_ENDPOINT=https://hf-mirror.com
# Download Whisper Large V3 model
scripts/hfd.sh openai/whisper-large-v3 \
--tool wget \
--local-dir /your/path/to/fuzhouhua/models/openai/whisper-large-v3
# Download the Fuzhouhua dataset
scripts/hfd.sh i18nJack/fuzhouhua --dataset \
--tool wget \
--hf_username i18nJack \
--hf_token hf_your_hf_token \
--local-dir /your/path/to/fuzhouhua/data/datasets

Replace `/your/path/to/` with your actual local directory paths, and ensure `hf_your_hf_token` is a valid Hugging Face token with dataset access permissions.
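A quick sanity check after downloading can save a failed training run later. The sketch below verifies that a local model directory contains the files we expect; the file names listed are assumptions based on typical Hugging Face model repositories, not a guaranteed manifest of openai/whisper-large-v3.

```python
from pathlib import Path

# Assumed minimal file set for a downloaded Hugging Face model repo.
EXPECTED_FILES = ["config.json", "tokenizer.json", "model.safetensors"]


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

After a successful download, `missing_model_files("models/openai/whisper-large-v3")` should return an empty list; any names it returns point at files to re-download.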
Create a .env file in the project root:
# OCR configuration
OCR_OPENAI_BASE_URL=https://openrouter.ai/api/v1
OCR_OPENAI_API_KEY=your_openai_key
# OpenAI API configuration
OPENAI_BASE_URL=https://api.moonshot.cn/v1
MODEL_NAME=moonshot-v1-8k
OPENAI_API_KEY=your_moonshot_key
# Hugging Face access
HF_TOKEN=hf_your_token

- macOS or Linux (Windows users can use WSL)
- Python 3.10+
- Homebrew (for macOS users)
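The .env file described above can be loaded into the process environment without extra dependencies; a minimal sketch is shown below (the python-dotenv package's `load_dotenv()` does the same more robustly, and this simplified parser is an assumption, not the project's loader).

```python
import os


def load_env(path=".env"):
    """Minimal .env loader: reads KEY=VALUE lines, skipping blanks
    and '#' comments. Existing environment variables are not overridden."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Calling `load_env()` once at startup makes values such as `HF_TOKEN` available via `os.environ` to the rest of the pipeline.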
# Install uv package manager
brew install uv
# Initialize project environment
uv init fuzhouhua --python=3.10
# Add required dependencies
uv add paddle2onnx==1.3.1

Convert PaddleOCR models to ONNX format for wider compatibility:
# Detection Model
paddle2onnx \
--model_dir ./models/ocr/pdmodel/ch_PP-OCRv4_det_server_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./models/ocr/onnxmodel/ch_PP-OCRv4_det_server_infer.onnx \
--opset_version 11 \
--enable_onnx_checker True
# Recognition Model
paddle2onnx \
--model_dir ./models/ocr/pdmodel/ch_PP-OCRv4_rec_server_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./models/ocr/onnxmodel/ch_PP-OCRv4_rec_server_infer.onnx \
--opset_version 11 \
--enable_onnx_checker True
# Classification Model
paddle2onnx \
--model_dir ./models/ocr/pdmodel/ch_ppocr_mobile_v2.0_cls_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./models/ocr/onnxmodel/ch_ppocr_mobile_v2_cls.onnx \
--opset_version 11 \
--enable_onnx_checker True

Contributions are welcome! Please check out our contribution guidelines before getting started.
If you use this project in your research or applications, please cite:
@software{fuzhouhua_asr,
author = {Pan, Gahn},
title = {Fuzhou Dialect Speech Recognition},
year = {2025},
url = {https://github.com/pangahn/fuzhouhua}
}

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).