
Running for the first time and I got this error #310

@jxggx

D:\AI\diff-svc>python preprocessing/binarize.py --config training/config_nsf.yaml
| Hparams chains: ['training/config_nsf.yaml']
| Hparams:
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, binarization_args: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False},
binarizer_cls: preprocessing.SVCpre.SVCBinarizer, binary_data_dir: data/binary/nseebmytalk, check_val_every_n_epoch: 10, choose_test_manually: False, clip_grad_norm: 1,
config_path: training/config_nsf.yaml, content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2,
cwt_loss: l1, cwt_std_scale: 0.8, datasets: ['opencpop'], debug: False, dec_ffn_kernel_size: 9,
dec_layers: 4, decay_steps: 40000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet,
diff_loss_type: l2, dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'],
dur_loss: mse, dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4,
encoder_K: 8, encoder_type: fft, endless_ds: False, f0_bin: 256, f0_max: 1100.0,
f0_min: 40.0, ffn_act: gelu, ffn_padding: SAME, fft_size: 2048, fmax: 16000,
fmin: 40, fs2_ckpt: , gaussian_start: True, gen_dir_name: , gen_tgt_spk_id: -1,
hidden_size: 256, hop_size: 512, hubert_gpu: True, hubert_path: checkpoints/hubert/hubert_soft.pt, infer: False,
keep_bins: 128, lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 0.3,
lambda_sent_dur: 1.0, lambda_uv: 1.0, lambda_word_dur: 1.0, load_ckpt: D:\AI\diff-svc\checkpoints\nsf_hifigan, log_interval: 100,
loud_norm: False, lr: 0.0008, max_beta: 0.02, max_epochs: 3000, max_eval_sentences: 1,
max_eval_tokens: 60000, max_frames: 42000, max_input_tokens: 60000, max_sentences: 88, max_tokens: 128000,
max_updates: 1000000, mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120,
no_fs2: True, norm_type: gn, num_ckpt_keep: 10, num_heads: 2, num_sanity_val_steps: 1,
num_spk: 1, num_test_samples: 0, num_valid_plots: 10, optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98,
out_wav_norm: False, pe_ckpt: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, pe_enable: False, perform_enhance: True, pitch_ar: False,
pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l2, pitch_norm: log, pitch_type: frame,
pndm_speedup: 10, pre_align_args: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1,
predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256,
pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/nseebmytalk, ref_norm_layer: bn,
rel_pos: True, reset_phone_dict: True, residual_channels: 384, residual_layers: 20, save_best: False,
save_ckpt: True, save_codes: ['configs', 'modules', 'src', 'utils'], save_f0: True, save_gt: False, schedule_type: linear,
seed: 1234, sort_by_len: True, speaker_id: nseebmytalk, spec_max: [0.0], spec_min: [-5.0],
spk_cond_steps: [], stop_token_weight: 5.0, task_cls: training.task.SVC_task.SVCTask, test_ids: [], test_input_dir: ,
test_num: 0, test_prefixes: ['test'], test_set_name: test, timesteps: 1000, train_set_name: train,
use_crepe: True, use_denoise: False, use_energy_embed: False, use_gt_dur: False, use_gt_f0: False,
use_midi: False, use_nsf: True, use_pitch_embed: True, use_pos_embed: True, use_spk_embed: False,
use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, use_vec: False,
val_check_interval: 2000, valid_num: 0, valid_set_name: valid, validate: False, vocoder: network.vocoders.nsf_hifigan.NsfHifiGAN,
vocoder_ckpt: checkpoints/nsf_hifigan/model, warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 2048,
work_dir: ,
| Binarizer: <class 'preprocessing.SVCpre.SVCBinarizer'>
spkers: {'nseebmytalk'}
| spk_map: {'nseebmytalk': 0}
0%| | 0/5 [00:01<?, ?it/s]
Traceback (most recent call last):
File "D:\AI\diff-svc\preprocessing\binarize.py", line 20, in <module>
binarize()
File "D:\AI\diff-svc\preprocessing\binarize.py", line 15, in binarize
binarizer_cls().process()
File "D:\AI\diff-svc\preprocessing\base_binarizer.py", line 135, in process
self.process_data_split('valid')
File "D:\AI\diff-svc\preprocessing\base_binarizer.py", line 156, in process_data_split
item = self.process_item(*a)
File "D:\AI\diff-svc\preprocessing\base_binarizer.py", line 194, in process_item
return File2Batch.temporary_dict2processed_input(item_name, meta_data, self.phone_encoder, binarization_args)
File "D:\AI\diff-svc\preprocessing\process_pipeline.py", line 112, in temporary_dict2processed_input
wav, mel = VOCODERS[hparams['vocoder'].split('.')[-1]].wav2spec(temp_dict['wav_fn'])
File "D:\AI\diff-svc\network\vocoders\nsf_hifigan.py", line 89, in wav2spec
mel_torch = stft.get_mel(wav_torch.unsqueeze(0).to(device)).squeeze(0).T
File "D:\AI\diff-svc\modules\nsf_hifigan\nvSTFT.py", line 100, in get_mel
spec = torch.matmul(self.mel_basis[str(fmax)+'_'+str(y.device)], spec)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x1025 and 1x1025)

I am new to this. I don't know where I went wrong or how to fix this. Can anyone help?
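The traceback only shows that the 128x1025 mel filterbank (128 mel bins, fft_size 2048) could not be multiplied with the spectrogram computed from one of the input files. It does not say which file was at fault. As a first step, you could scan the raw data folder with the Python standard library. The script below is a hypothetical helper called check_wavs.py; it is not part of diff-svc. It flags files that are not mono, not at the configured 44100 Hz, or shorter than one fft_size window. These checks are my guesses at common preprocessing pitfalls, not a confirmed diagnosis, and the script does not reproduce diff-svc's own audio loading.

```python
import sys
import wave
from pathlib import Path

# Values copied from the hparams dump above (audio_sample_rate, fft_size, raw_data_dir).
EXPECTED_SR = 44100
FFT_SIZE = 2048
RAW_DIR = Path("data/raw/nseebmytalk")


def check_wav(path):
    """Return a list of human-readable problems found in one .wav file.

    Assumed checks only: mono, EXPECTED_SR, at least FFT_SIZE samples.
    """
    problems = []
    try:
        with wave.open(str(path), "rb") as w:
            channels = w.getnchannels()
            rate = w.getframerate()
            frames = w.getnframes()
    except (wave.Error, EOFError) as exc:
        # The stdlib reader only understands integer PCM WAV, so float or
        # otherwise encoded files are reported here instead of being checked.
        return [f"could not read as PCM WAV ({exc})"]
    if channels != 1:
        problems.append(f"{channels} channels (expected mono)")
    if rate != EXPECTED_SR:
        problems.append(f"sample rate {rate} (expected {EXPECTED_SR})")
    if frames < FFT_SIZE:
        problems.append(f"only {frames} samples (shorter than fft_size={FFT_SIZE})")
    return problems


def main(raw_dir=RAW_DIR):
    """Print problems for every .wav under raw_dir; return 1 if any were found."""
    wavs = sorted(Path(raw_dir).rglob("*.wav"))
    if not wavs:
        print(f"no .wav files found under {raw_dir}")
        return 1
    bad = 0
    for p in wavs:
        for msg in check_wav(p):
            print(f"{p}: {msg}")
            bad += 1
    print(f"checked {len(wavs)} files, {bad} problem(s)")
    return 0 if bad == 0 else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else RAW_DIR))
```

Run it from D:\AI\diff-svc with `python check_wavs.py`, or pass a different folder as the first argument. It exits with a non-zero status if any file was flagged. Any file it flags is a candidate to convert or remove before running binarize.py again.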
