Training failed #110

@Arseny5

Hello! Could you please help me fix this error? Training fails on the first batch of epoch 1:

Epoch 1: : 0batch [00:00, ?batch/s]

Traceback (most recent call last):
  File "tasks/run.py", line 15, in <module>
    run_task()
  File "tasks/run.py", line 10, in run_task
    task_cls.start()
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/tasks/base_task.py", line 257, in start
    trainer.fit(task)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 580, in fit
    self.run_pretrain_routine(model)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 673, in run_pretrain_routine
    self.train()
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1448, in train
    self.run_training_epoch()
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1482, in run_training_epoch
    output = self.run_training_batch(batch, batch_idx)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1604, in run_training_batch
    loss = optimizer_closure()
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1569, in optimizer_closure
    output = self.training_forward(
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1678, in training_forward
    output = self.model.training_step(*args)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/tasks/base_task.py", line 128, in training_step
    loss_ret = self._training_step(sample, batch_idx, optimizer_idx)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/usr/task.py", line 57, in _training_step
    log_outputs = self.run_model(self.model, sample)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/usr/diffsinger_task.py", line 299, in run_model
    output = model(txt_tokens, mel2ph=mel2ph, spk_embed=spk_embed,
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/usr/diff/shallow_diffusion_tts.py", line 236, in forward
    ret = self.fs2(txt_tokens, mel2ph, spk_embed, ref_mels, f0, uv, energy,
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/modules/diffsinger_midi/fs2.py", line 63, in forward
    midi_dur_embedding = self.midi_dur_layer(kwargs['midi_dur'][:, :, None])  # [B, T, 1] -> [B, T, H]
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 91, in forward
    return F.linear(input, self.weight, self.bias)
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
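
CUDA errors can be reported asynchronously, so the `F.linear` frame above is not necessarily where the failure originates. Below is a minimal diagnostic sketch, not a fix: it forces synchronous kernel launches and collects the PyTorch and GPU details that would usually be needed to narrow this down. `describe_cuda_setup` is a hypothetical helper name, not part of DiffSinger, and it assumes the snippet runs before anything else initialises CUDA.

```python
import os

# Make CUDA kernel launches synchronous so that a traceback points at the
# call that actually failed. This has to be set before CUDA is initialised.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def describe_cuda_setup():
    """Collect version and device details (hypothetical helper for this issue)."""
    info = {"CUDA_LAUNCH_BLOCKING": os.environ.get("CUDA_LAUNCH_BLOCKING")}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; report that and stop.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA toolkit version this PyTorch build was compiled against
    # (None for CPU-only builds).
    info["built_for_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of GPU 0.
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_setup().items():
        print(f"{key}: {value}")
```

If the installed PyTorch build predates the GPU's architecture, that mismatch would be one candidate cause; another would be non-finite or wrongly typed values in `midi_dur` reaching `midi_dur_layer`. Both are guesses to check, not a confirmed diagnosis.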
