Note: it is recommended to clone/fork the repository in order to read the README.md with the example audio, since GitHub currently does not support playing audio files inline.
Takes split songs (vocals and accompaniment) and creates a dataset from them, splitting each song into 2-second files of singing and accompaniment pairs.
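The chunking step can be sketched roughly as follows (the sample rate, chunk length, and function name are assumptions for illustration, not the repo's actual code):

```python
import numpy as np

def split_into_pairs(vocals, accompaniment, sr=22050, chunk_seconds=2):
    """Split aligned vocal/accompaniment tracks into fixed-length pairs.

    `vocals` and `accompaniment` are 1-D sample arrays of equal length;
    the trailing remainder shorter than one chunk is dropped.
    """
    chunk = sr * chunk_seconds
    n = min(len(vocals), len(accompaniment)) // chunk
    return [
        (vocals[i * chunk:(i + 1) * chunk],
         accompaniment[i * chunk:(i + 1) * chunk])
        for i in range(n)
    ]

# Example: a 5-second song yields two 2-second pairs (the ragged tail is dropped).
sr = 22050
pairs = split_into_pairs(np.zeros(5 * sr), np.zeros(5 * sr), sr=sr)
print(len(pairs))  # 2
```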
Runs Spleeter on the songs in a folder to separate them into vocals and accompaniment.
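A minimal sketch of driving the Spleeter CLI over a folder; the folder layout, file extension, and helper name are assumptions, and the exact CLI flags may differ between Spleeter versions:

```python
from pathlib import Path

def build_spleeter_commands(folder, out_dir="separated"):
    """Build one Spleeter CLI call per song. The 2-stems model writes
    vocals.wav and accompaniment.wav for each input file."""
    return [
        ["spleeter", "separate", "-p", "spleeter:2stems", "-o", out_dir, str(song)]
        for song in sorted(Path(folder).glob("*.mp3"))
    ]

# The commands can then be executed, e.g. with subprocess.run(cmd, check=True),
# which requires Spleeter to be installed.
for cmd in build_spleeter_commands("songs"):
    print(" ".join(cmd))
```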
Runs MelGAN with a power-loss factor.
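One plausible reading of the power-loss term is a weighted distance between the (log) power spectra of the real and generated waveforms; the exact definition used with MelGAN here is an assumption, as are the FFT parameters:

```python
import numpy as np

def power_loss(real, fake, factor=1.0, n_fft=512, hop=128):
    """Hypothetical power loss: mean squared error between log power
    spectra of the two waveforms, weighted by `factor`."""
    def log_power_spec(x):
        # Naive framed FFT power spectrum (no windowing, for brevity).
        frames = [x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, hop)]
        spec = np.abs(np.fft.rfft(np.stack(frames), axis=-1)) ** 2
        return np.log1p(spec)
    return factor * np.mean((log_power_spec(real) - log_power_spec(fake)) ** 2)

t = np.linspace(0, 1, 22050)
a = np.sin(2 * np.pi * 220 * t)
print(power_loss(a, a))            # 0.0 for identical signals
print(power_loss(a, 0.5 * a) > 0)  # True
```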
Audio samples: Generated Singing, Generated Speech, Original Speech, Complete Song Output.
Normalize both the input and the generated mel-spectrograms to make sure the discriminator has to address all components equally. Also add singing as a speech output, and add an identity loss when the speech input is actually singing.
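The normalization and the identity term can be sketched as below; the min-max normalization and the L1 identity distance are assumptions about what the repo does:

```python
import numpy as np

def normalize_mel(mel, eps=1e-8):
    """Per-spectrogram min-max normalization to [0, 1], so the discriminator
    sees real and generated mels on the same scale."""
    lo, hi = mel.min(), mel.max()
    return (mel - lo) / (hi - lo + eps)

def identity_loss(generator, singing_mel):
    """When the 'speech' input is actually singing, the generator should
    return it unchanged; penalize the L1 distance from identity."""
    return np.mean(np.abs(generator(singing_mel) - singing_mel))

mel = np.random.default_rng(0).uniform(-4, 2, size=(80, 64))
norm = normalize_mel(mel)
print(norm.min(), norm.max())           # ~0.0 and ~1.0
print(identity_loss(lambda m: m, mel))  # 0.0 for a perfect identity mapping
```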
Audio samples: Generated Singing, Generated Speech, Original Speech, Complete Song Output.
Use the HiFi-GAN generators, discriminators, and losses. The input is the original speech with a lowered accompaniment.
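Building that input can be sketched as mixing the speech with an attenuated accompaniment; the gain value and function name are assumptions:

```python
import numpy as np

def mix_with_lowered_accompaniment(speech, accompaniment, gain=0.3):
    """Generator input: original speech plus the accompaniment attenuated
    by `gain` (0.3 is an assumed value, not taken from the repo)."""
    n = min(len(speech), len(accompaniment))
    return speech[:n] + gain * accompaniment[:n]

speech = np.ones(4)
accomp = np.ones(6)
mix = mix_with_lowered_accompaniment(speech, accomp, gain=0.5)
print(mix)  # four samples, each 1.0 + 0.5 * 1.0 = 1.5
```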
Audio samples: Generated Singing, Generated Speech, Original Speech With Background.
Classic signal-processing approaches for manipulating the speech into singing that fits given MIDI notes.
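The core of such approaches is mapping each MIDI note to a target frequency and shifting the speech's pitch accordingly. The MIDI-to-Hz formula is standard (A4 = MIDI 69 = 440 Hz); the ratio helper is illustrative only, since a real system would use PSOLA or a vocoder to shift pitch without changing duration:

```python
def midi_to_hz(note):
    """Standard MIDI-to-frequency conversion (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def pitch_shift_ratio(source_f0, midi_note):
    """Factor by which the speech's fundamental frequency must be scaled
    to land on the target note."""
    return midi_to_hz(midi_note) / source_f0

print(midi_to_hz(69))                    # 440.0
print(pitch_shift_ratio(220.0, 69))      # 2.0 (one octave up)
```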
sentence: "Don't ask me to carry an"
Original Audio: sentence: "She had your dark suit in greasy wash water all year"
Original Audio: sentence: "Don't ask me to carry an oily rag like that"
Original Audio: input: sine wave
Original Audio: sentence: "Don't ask me to carry an oily rag like that"
Use FreeVC as-is and train it on speech and singing input; this produced some interesting results.
Use CycleGAN with the MelGAN-VC architecture.
Use CycleGAN with the MelGAN-VC architecture. In addition, pass singing to the speech2sing generator and speech to the sing2speech generator and expect identity output. This is done in order to try to preserve the content.
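The combined objective can be sketched as cycle-consistency plus the identity terms just described; the L1 distance and loss weights are assumptions:

```python
import numpy as np

def l1(a, b):
    return np.mean(np.abs(a - b))

def cyclegan_losses(speech2sing, sing2speech, speech_mel, sing_mel,
                    lam_cycle=10.0, lam_id=5.0):
    """Cycle-consistency plus identity terms: singing fed to speech2sing
    (and speech fed to sing2speech) should come back unchanged, which
    pushes both generators to preserve content."""
    cycle = (l1(sing2speech(speech2sing(speech_mel)), speech_mel)
             + l1(speech2sing(sing2speech(sing_mel)), sing_mel))
    identity = (l1(speech2sing(sing_mel), sing_mel)
                + l1(sing2speech(speech_mel), speech_mel))
    return lam_cycle * cycle + lam_id * identity

# With perfect identity generators every term vanishes:
mel = np.ones((80, 32))
print(cyclegan_losses(lambda m: m, lambda m: m, mel, mel))  # 0.0
```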



