Commit 8d2ee1f

Fix sil (#22)
* Update infore dataset
* Use new textgrid data.
  - Update download url and hash.
  - Use sil instead of sp.
  - Normalize audio to match hifigan preprocessing.
  - Random dropout of tokens when training duration model to prevent overfitting.
* Load phoneme set from config instead of from lexicon file. This keeps the phoneme set unchanged even if the dataset or the lexicon file changes.
* Use `jax.tree_map` instead of `jax.tree_multimap`.
* Better log file names
* Remove colab links in notebooks
* Fix `zero_silence_segments` script.
* Update pretrained models
1 parent 07a5d8a commit 8d2ee1f
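
For context on the `jax.tree_multimap` item in the commit message: mapping a function over several pytrees with the same structure no longer needs a separate function, because the regular tree map accepts extra trees as additional arguments. The sketch below is illustrative only and is not code from this commit. The `sgd_step` helper and the toy parameter values are hypothetical. It uses `jax.tree_util.tree_map`, the long-standing spelling, since the top-level `jax.tree_map` alias has itself been deprecated in newer JAX releases.

```python
import jax


def sgd_step(params, grads, lr):
    """Apply p - lr * g leaf by leaf over two pytrees with the same structure.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # tree_map takes any number of trees after the first one, which is the
    # behaviour that used to require jax.tree_multimap.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)


# Toy pytrees with plain Python floats as leaves.
params = {"w": 1.0, "b": 2.0}
grads = {"w": 2.0, "b": 4.0}
updated = sgd_step(params, grads, lr=0.5)
```

With real model parameters the leaves would be arrays, but the call has the same shape.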

18 files changed: +8346 -4454 lines changed

README.md

Lines changed: 4 additions & 3 deletions
@@ -1,7 +1,7 @@
 A Vietnamese TTS
 ================

-Tacotron + HiFiGAN vocoder for vietnamese datasets.
+Duration model + Acoustic model + HiFiGAN vocoder for vietnamese text-to-speech application.

 Online demo at https://huggingface.co/spaces/ntt123/vietTTS.

@@ -32,12 +32,13 @@ Download InfoRe dataset
 -----------------------

 ```sh
-bash ./scripts/download_aligned_infore_dataset.sh
+python ./scripts/download_aligned_infore_dataset.py
 ```

 **Note**: this is a denoised and aligned version of the original dataset which is donated by the InfoRe Technology company (see [here](https://www.facebook.com/groups/j2team.community/permalink/1010834009248719/)). You can download the original dataset (**InfoRe Technology 1**) at [here](https://github.com/TensorSpeech/TensorFlowASR/blob/main/README.md#vietnamese).

-The Montreal Forced Aligner (MFA) is used to align transcript and speech (textgrid files). [Here](https://colab.research.google.com/gist/NTT123/c99b5a391af56e0cb8f7b190d3d7f0ee/infore-mfa-example.ipynb) is a Colab notebook to align InfoRe dataset. Visit [MFA](https://montreal-forced-aligner.readthedocs.io/en/latest/) for more information on how to create textgrid files.
+See `notebooks/denoise_infore_dataset.ipynb` for instructions on how to denoise the dataset. We use the Montreal Forced Aligner (MFA) to align transcript and speech (textgrid files).
+See `notebooks/align_text_audio_infore_mfa.ipynb` for instructions on how to create textgrid files.

 Train duration model
 --------------------
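
The README hunk above switches the dataset download to a Python script, and the commit message mentions an updated download URL and hash. The contents of `download_aligned_infore_dataset.py` are not shown on this page, so the following is only a sketch of how a downloaded archive could be checked against a known digest. SHA-256, the chunk size, and the function names `sha256_of_file` and `verify_download` are all assumptions, not the script's actual implementation.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks.

    Hypothetical helper; the real script may use a different hash function.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns b"" so large archives never have to
        # fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True if the file at `path` matches the expected hex digest."""
    return sha256_of_file(path) == expected_hex.lower()
```

A caller would typically run `verify_download` right after the download finishes and refuse to unpack the archive if it returns `False`.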

assets/infore/clip.wav

-57.5 KB
Binary file not shown.
