This is an unofficial implementation of the Spectrogram VQ module from the DCTTS (Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation) paper. DCTTS proposes a novel TTS approach that leverages discrete diffusion models and contrastive learning to generate high-quality speech. This repository specifically implements the Spectrogram VQ component described in Section 2.1 of the paper, which quantizes mel-spectrograms into discrete representations.
Figure 1. Spectrogram VQ architecture from the DCTTS paper
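To illustrate what "quantizing mel-spectrograms into discrete representations" means in practice, here is a minimal sketch of a VQ codebook with a straight-through gradient estimator. This is not the code from this repository; the class name `Codebook` and the default `num_codes` / `latent_dim` / `beta` values are placeholders chosen for the example.

```python
import torch.nn as nn
import torch.nn.functional as F

class Codebook(nn.Module):
    """Minimal VQ codebook: snaps encoder outputs to their nearest code vectors."""

    def __init__(self, num_codes=1024, latent_dim=256, beta=0.25):
        super().__init__()
        self.beta = beta  # commitment loss weight
        self.embedding = nn.Embedding(num_codes, latent_dim)
        self.embedding.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z):
        # z: (B, C, T) latent sequence from the spectrogram encoder
        z_flat = z.permute(0, 2, 1).reshape(-1, z.shape[1])  # (B*T, C)
        # squared L2 distance from each latent vector to every codebook entry
        dist = (z_flat.pow(2).sum(1, keepdim=True)
                - 2 * z_flat @ self.embedding.weight.t()
                + self.embedding.weight.pow(2).sum(1))
        indices = dist.argmin(dim=1)  # discrete token ids
        z_q = self.embedding(indices).view(z.shape[0], z.shape[2], -1).permute(0, 2, 1)
        # codebook loss + commitment loss (stop-gradient via .detach())
        vq_loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z_q.detach(), z)
        # straight-through estimator: copy decoder gradients to the encoder
        z_q = z + (z_q - z).detach()
        return z_q, indices.view(z.shape[0], z.shape[2]), vq_loss
```

The straight-through line (`z + (z_q - z).detach()`) passes decoder gradients around the non-differentiable nearest-neighbour lookup, which is the standard VQ-VAE/VQGAN trick that a codebook like this relies on.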
- Docker image: `pytorch/pytorch:2.8.0-cuda12.8-cudnn9-devel`
- GPU: NVIDIA RTX 4060 (8GB VRAM)
- Clone this repository and install the Python requirements: `pip install -r requirements.txt`
- Download the LJSpeech dataset and place it under `/workspace/data/` (recommended)
- Prepare mel-spectrogram features in `.npy` format: `python preprocess.py` (a hypothetical preprocessing sketch is shown after this list)
- Train the Spectrogram VQ model: `python train_vqgan.py`
- Run inference: open the `inference.ipynb` notebook for inference examples and usage.
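For reference, below is a hypothetical sketch of the mel-spectrogram preprocessing step, not the repository's actual `preprocess.py`. The input/output paths are placeholders, and the mel parameters (22050 Hz sampling rate, 1024-point FFT, hop length 256, 80 mel bins) are the common LJSpeech/HiFi-GAN defaults rather than values confirmed from this repo's config.

```python
# Hypothetical preprocessing sketch: paths and mel parameters are assumptions.
import glob
import os

import librosa
import numpy as np

def wav_to_mel(wav_path, sr=22050, n_fft=1024, hop_length=256, n_mels=80):
    """Load a waveform and return a log-mel spectrogram of shape (n_mels, frames)."""
    wav, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    # clamp before the log to avoid -inf on silent frames
    return np.log(np.clip(mel, a_min=1e-5, a_max=None)).astype(np.float32)

if __name__ == "__main__":
    in_dir = "/workspace/data/LJSpeech-1.1/wavs"   # placeholder input path
    out_dir = "/workspace/data/mels"               # placeholder output path
    os.makedirs(out_dir, exist_ok=True)
    for wav_path in glob.glob(os.path.join(in_dir, "*.wav")):
        mel = wav_to_mel(wav_path)
        name = os.path.splitext(os.path.basename(wav_path))[0]
        np.save(os.path.join(out_dir, f"{name}.npy"), mel)
```

Each `.npy` file then holds a float32 array of shape `(n_mels, frames)` that a dataset loader can read directly during training.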
Audio samples can be found in the `sample/` directory. Most hyperparameters follow the VQGAN implementation from dome272/VQGAN-pytorch.
This implementation builds upon several excellent works:
- DCTTS Paper: Wu, Zhichao, et al. "DCTTS: Discrete diffusion model with contrastive learning for text-to-speech generation." ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024.
- VQGAN Implementation: dome272/VQGAN-pytorch for the VQGAN architecture reference.
- HiFi-GAN: keonlee9420's HiFi-GAN implementation for the vocoder parameters.
- Original HiFi-GAN: jik876/hifi-gan for the original HiFi-GAN model.
I am grateful to all the authors for making their work publicly available. 👏
