Conversation


@ikks ikks commented Jul 1, 2025

This bugfix allows training when only a CPU is available.

Previously, when the user passed the --accelerator cpu option on a machine without a GPU, the training process stopped with this error:

RuntimeError: Found no NVIDIA driver on your system. Please check that
you have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

Now, when I pass the 'cpu' option and do not have a GPU, I'm able to train.

Fixes the error raised when the --accelerator cpu option was given; it is now possible to train with CPU only, with small batches, a considerable amount of memory, and days for the training process to run.

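The fix amounts to never touching the CUDA runtime when the user explicitly asks for the CPU. A minimal sketch of that selection logic in plain Python (select_device is a hypothetical helper illustrating the idea, not the actual Piper code):

```python
def select_device(accelerator: str, cuda_available: bool) -> str:
    """Pick a torch device string, honoring an explicit 'cpu' request.

    Querying the CUDA runtime on a machine with no NVIDIA driver raises
    RuntimeError ("Found no NVIDIA driver on your system"), so the code
    must short-circuit before any CUDA call is made.
    """
    if accelerator == "cpu":
        return "cpu"  # never touch the CUDA runtime
    if accelerator == "gpu" and cuda_available:
        return "cuda"
    # Fall back gracefully instead of crashing when no GPU is present.
    return "cpu"

print(select_device("cpu", cuda_available=False))  # cpu
print(select_device("gpu", cuda_available=True))   # cuda
```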
@KiON-GiON

Isn't using the CPU very slow? Are you using small datasets when training on CPU?


ikks commented Jul 1, 2025

> Isn't using the CPU very slow? Are you using small datasets when training on CPU?

Yes, it is pretty slow; I see your point.

I guess an explicit 'cpu' request could be handled gracefully, for cases where the GPU quota has been reached.

As a side note:

Right now I'm working on adapting this to Kaggle; they offer 30 hours per week of GPU and 20 hours of TPU. I'm still testing the CPU there, given that it throws a similar error.

I'm still polishing the notebook.

@KiON-GiON

I'm interested in the Kaggle notebook if you get it working. Kaggle also supports multi-GPU, so I think you can use 2x T4, but I have no idea if that gives a noticeable speed gain. For multi-GPU to work, though, I think you'll have to modify the Piper source code directly.


ikks commented Jul 1, 2025

> I'm interested in the Kaggle notebook if you get it working. Kaggle also supports multi-GPU, so I think you can use 2x T4, but I have no idea if that gives a noticeable speed gain. For multi-GPU to work, though, I think you'll have to modify the Piper source code directly.

https://www.kaggle.com/code/scratchpad/notebook6c3922c7e3/edit I just opened it to the public; it is in an alpha state.

I haven't been able to use it properly with the two GPUs, nor with the TPU. It runs with CPU, with 2x T4 (using a single GPU), and with the P100 GPU, the latter being about 3x faster than one T4.

I'm new to ML, Jupyter notebooks, and Kaggle. The features of this notebook are:

  • One can provide the URL of the checkpoint to start the training from
  • One can also provide the URL of the wavs
  • The notebook has a section to produce a wav with the latest stored checkpoint (when the training has finished)
  • It also prepares the onnx and onnx.json files to be downloaded
  • There is an option to use HF datasets to store the checkpoint and resume the work later
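The checkpoint-URL and resume flow above can be sketched as assembling a piper_train command line. Everything here is a sketch: build_train_cmd is a hypothetical wrapper, and the flag names mirror common piper_train / PyTorch Lightning conventions but may differ in your version.

```python
import shlex
from typing import List, Optional


def build_train_cmd(dataset_dir: str, checkpoint: Optional[str] = None,
                    accelerator: str = "cpu", batch_size: int = 4) -> List[str]:
    """Hypothetical helper: assemble a piper_train invocation.

    Treat the flag names as assumptions, not the real CLI; check them
    against your checkout of piper before using this.
    """
    cmd = ["python3", "-m", "piper_train",
           "--dataset-dir", dataset_dir,
           "--accelerator", accelerator,
           "--batch-size", str(batch_size)]
    if checkpoint:
        # Resume from a previously downloaded checkpoint file.
        cmd += ["--resume_from_checkpoint", checkpoint]
    return cmd


print(shlex.join(build_train_cmd("/kaggle/working/dataset", "last.ckpt")))
```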

Right now I'm adding widgets to make it more usable.

I would be glad to move the conversation to another space if needed.

BTW, thanks for all your hard work.

Other things to be done:

  • Improve the r and rr sounds. I started from a checkpoint of the high-quality lessac voice, but I'm not happy with the pronunciation of r and rr on the data I have used. Maybe it would be good to have a voice trained from scratch so that this Spanish pronunciation sounds more natural.


ikks commented Jul 5, 2025

I have been testing the TPU in Kaggle, and it seems that direct use of cuda is not recommended there. For now, here is a sample with Lightning and TPU.
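Since direct cuda calls are the thing to avoid on TPU, one way to keep a notebook portable across Kaggle's hardware options is to map the chosen hardware to Trainer arguments and let Lightning handle device placement. A sketch of that mapping (trainer_kwargs is a hypothetical helper; the devices counts and the "ddp_notebook" strategy name are assumptions to verify against your Lightning version):

```python
def trainer_kwargs(hardware: str) -> dict:
    """Map a Kaggle hardware choice to pytorch_lightning.Trainer kwargs.

    On TPU, torch_xla manages device placement, so training code should
    never call .cuda() directly; selecting the accelerator here and
    letting Lightning move tensors is the portable route.
    """
    if hardware == "tpu":
        # Assumption: a v3-8 style instance exposing 8 cores.
        return {"accelerator": "tpu", "devices": 8}
    if hardware == "2x T4":
        # Multi-GPU also needs a strategy, e.g. "ddp_notebook" in notebooks.
        return {"accelerator": "gpu", "devices": 2, "strategy": "ddp_notebook"}
    if hardware == "P100":
        return {"accelerator": "gpu", "devices": 1}
    return {"accelerator": "cpu"}
```

The returned dict would then be unpacked into the Trainer, e.g. `Trainer(**trainer_kwargs("tpu"))`.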
