
OfirShechter/speech2sing


speech2sing

Note: it is recommended to clone or fork the repository and read README.md locally to hear the example audio, since GitHub currently does not support playing audio files.

Prepare data:

Takes split songs (vocals and accompaniment) and creates a dataset from them by cutting them into 2-second pairs of singing and accompaniment files.
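A minimal sketch of the pairing step, assuming both stems are already loaded as time-aligned mono NumPy arrays at the same sample rate. The `split_into_pairs` helper is hypothetical, and dropping the trailing remainder is an assumption; the notebook may handle the last partial chunk differently.

```python
import numpy as np

def split_into_pairs(vocals, accompaniment, sample_rate, segment_seconds=2.0):
    """Cut time-aligned vocals/accompaniment arrays into equal-length pairs.

    Hypothetical helper: trailing samples that do not fill a whole
    segment are dropped.
    """
    if len(vocals) != len(accompaniment):
        raise ValueError("vocals and accompaniment must be time-aligned")
    seg_len = int(round(segment_seconds * sample_rate))
    n_segments = len(vocals) // seg_len
    pairs = []
    for i in range(n_segments):
        start = i * seg_len
        end = start + seg_len
        # Same slice of both stems, so each pair stays aligned in time
        pairs.append((vocals[start:end], accompaniment[start:end]))
    return pairs
```

For example, a 5-second song at 22050 Hz yields two 2-second pairs; the last second is discarded.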

Open In Colab

Spleeter

Run Spleeter on the songs in a folder to split them into vocals and accompaniment.
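A minimal sketch of this step using Spleeter's Python API (`Separator("spleeter:2stems")` and its `separate_to_file` method). The `separate_folder` helper, the extension list and the folder layout are assumptions for illustration, not the notebook's code. The separator can be passed in, so the loop can be tried without Spleeter installed.

```python
from pathlib import Path

# Assumed set of input formats; adjust to whatever the song folder contains
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}

def separate_folder(songs_dir, output_dir, separator=None):
    """Split every song in songs_dir into vocals and accompaniment (hypothetical helper)."""
    if separator is None:
        # Imported lazily so the helper can be exercised without Spleeter installed
        from spleeter.separator import Separator
        separator = Separator("spleeter:2stems")  # 2 stems: vocals + accompaniment
    songs = sorted(
        p for p in Path(songs_dir).iterdir() if p.suffix.lower() in AUDIO_EXTENSIONS
    )
    for song in songs:
        # With default settings Spleeter writes
        # <output_dir>/<song name>/vocals.wav and accompaniment.wav
        separator.separate_to_file(str(song), str(output_dir))
    return songs
```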

Open In Colab

MelGAN-VC

MelGAN-VC with power loss

Run MelGAN-VC with a power loss factor.
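This section does not define the power loss, so the sketch below is only one plausible reading, labeled hypothetical: penalize the difference in per-frame power (in dB) between the source mel-spectrogram and the generated one, scaled by a `power_factor`. In real training this would be computed on autodiff tensors so gradients reach the generator; NumPy is used here only to show the arithmetic.

```python
import numpy as np

def frame_power_db(mel, eps=1e-8):
    """Per-frame power in dB of a (n_mels, n_frames) linear-magnitude mel spectrogram."""
    return 10.0 * np.log10(np.mean(mel ** 2, axis=0) + eps)

def power_loss(source_mel, generated_mel, power_factor=1.0):
    """Hypothetical power loss: mean absolute per-frame power difference (dB),
    scaled by power_factor before being added to the other generator losses."""
    diff = frame_power_db(source_mel) - frame_power_db(generated_mel)
    return power_factor * float(np.mean(np.abs(diff)))
```

Doubling every magnitude quadruples the power, so the loss for that case is about 6.02 dB times `power_factor`.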

Open In Colab

Results sample:

Mel spectrograms: Generated Singing, Generated Speech, Original Speech, Complete Song Output.

Audio: Generated Singing, Generated Speech, Original Speech, Complete Song Output.

MelGAN-VC with Identity Loss and Normalized Mel-Spectrogram Data

Normalize both the input and the generated mel-spectrograms so that the discriminator has to address all components equally. Also feed singing as the generator's speech input, and add an identity loss when the input "speech" is actually singing.
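A minimal sketch of one common way to normalize log-mel spectrograms: clip to a fixed dB range and map it linearly to [-1, 1]. The range of -100 dB to 20 dB and the helper names are assumptions, not values taken from the notebook. Applying the same mapping to real and generated spectrograms keeps them on a comparable scale for the discriminator.

```python
import numpy as np

def normalize_log_mel(mel_db, min_db=-100.0, max_db=20.0):
    """Map a log-mel spectrogram (dB) linearly from [min_db, max_db] to [-1, 1].

    Values outside the range are clipped first (assumed range, hypothetical helper).
    """
    clipped = np.clip(mel_db, min_db, max_db)
    return 2.0 * (clipped - min_db) / (max_db - min_db) - 1.0

def denormalize_log_mel(norm, min_db=-100.0, max_db=20.0):
    """Inverse of normalize_log_mel for values that were inside the clipping range."""
    return (norm + 1.0) / 2.0 * (max_db - min_db) + min_db
```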

Open In Colab

Singing as input - results sample:

Mel spectrograms: Generated Singing, Generated Speech, Original Speech, Complete Song Output.

Audio: Generated Singing, Generated Speech, Original Speech, Complete Song Output.

Speech as input - results sample:

Mel spectrograms: Generated Singing, Generated Speech, Original Speech, Complete Song Output.

Audio: Generated Singing, Generated Speech, Original Speech, Complete Song Output.

HiFi-GAN

Use the HiFi-GAN generators, discriminators, and losses. The input is the original speech mixed with the accompaniment at a lower volume.
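A minimal sketch of building such an input, assuming mono float arrays at the same sample rate. The `mix_with_background` helper and the -12 dB default are assumptions; this section does not say how much the accompaniment is lowered.

```python
import numpy as np

def mix_with_background(speech, accompaniment, background_gain_db=-12.0):
    """Mix speech with an attenuated accompaniment (hypothetical helper, assumed gain)."""
    n = min(len(speech), len(accompaniment))      # trim to the shorter signal
    gain = 10.0 ** (background_gain_db / 20.0)    # dB -> linear amplitude
    mix = speech[:n] + gain * accompaniment[:n]
    peak = np.max(np.abs(mix)) if n else 0.0
    if peak > 1.0:
        mix = mix / peak  # rescale to avoid clipping when written to a 16-bit file
    return mix
```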

Open In Colab

Results sample:

Audio: Generated Singing, Generated Speech, Original Speech With Background.

Classic Approach

Classic approaches for manipulating speech into singing that fits given MIDI notes.
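Fitting speech to a MIDI note starts with the note's target frequency. The sketch below uses the standard equal-tempered mapping (note 69 = A4 = 440 Hz) and computes the pitch shift, in semitones, from a segment's estimated F0 (for example obtained with `librosa.pyin`). The helper names are hypothetical, and this is not necessarily how the linked repo does it.

```python
import math

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def semitones_to_target(f0_hz, note):
    """Pitch shift in semitones that moves a segment with F0 f0_hz onto `note`."""
    return 12.0 * math.log2(midi_to_hz(note) / f0_hz)
```

For example, a spoken segment around 220 Hz needs a shift of +12 semitones to land on A4.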

GitHub Repo

Result samples (on "Barbie Girl"):

Syllables Manipulation

sentence: "Don't ask me to carry an"

Original Audio: Output Audio:

Words Manipulation

sentence: "She had your dark suit in greasy wash water all year"

Original Audio: Output Audio:

Phonemes Manipulation

sentence: "Don't ask me to carry an oily rag like that"

Original Audio: Output Audio:
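Whatever the granularity (syllables, words, or phonemes), each speech segment has to be stretched to its note's length. A minimal sketch, assuming a one-to-one segment-to-note mapping, a fixed tempo in BPM, and segment durations already measured (e.g. from a forced aligner); `Note` and `stretch_factors` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Note:
    midi: int      # MIDI note number
    beats: float   # note length in beats

def note_seconds(note, bpm):
    """Duration of a note in seconds at the given tempo."""
    return note.beats * 60.0 / bpm

def stretch_factors(segment_durations, notes, bpm):
    """Target duration / original duration for each (segment, note) pair.

    > 1 means the segment must be lengthened, < 1 means shortened.
    """
    if len(segment_durations) != len(notes):
        raise ValueError("expected one note per segment")
    return [note_seconds(n, bpm) / d for d, n in zip(segment_durations, notes)]
```

At 120 BPM a one-beat note lasts 0.5 s, so a 0.25 s syllable sung on it gets a factor of 2.0.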

Time Manipulation (ignore the content)

input: sine wave

Original Audio: Output Audio:

sentence: "Don't ask me to carry an oily rag like that"

Original Audio: Output Audio:
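A minimal sketch of time stretching with plain overlap-add (OLA), a simple swapped-in technique that is not necessarily what the repo uses. Frames are read from the input at one hop and written to the output at another, so the duration changes while each frame's waveform is copied unchanged. Plain OLA causes phase artifacts; WSOLA or a phase vocoder (e.g. `librosa.effects.time_stretch`) sounds cleaner. The frame and hop sizes are assumptions.

```python
import numpy as np

def ola_time_stretch(x, factor, frame_len=1024, synth_hop=256):
    """Stretch x in time by `factor` (2.0 = twice as long) with plain overlap-add.

    Frames are read every synth_hop / factor samples and written every
    synth_hop samples; the output is normalized by the summed window.
    """
    if len(x) < frame_len:
        raise ValueError("input shorter than one frame")
    window = np.hanning(frame_len)
    ana_hop = synth_hop / factor
    n_frames = int((len(x) - frame_len) / ana_hop) + 1
    out = np.zeros((n_frames - 1) * synth_hop + frame_len)
    weight = np.zeros_like(out)
    for i in range(n_frames):
        start = int(i * ana_hop)   # analysis position in the input
        pos = i * synth_hop        # synthesis position in the output
        out[pos:pos + frame_len] += window * x[start:start + frame_len]
        weight[pos:pos + frame_len] += window
    weight[weight < 1e-8] = 1.0    # avoid dividing by ~0 at the edges
    return out / weight
```

With a one-second 440 Hz sine at 16 kHz, `factor=2.0` gives roughly two seconds of output.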

FreeVC

Use FreeVC as-is, trained on speech and singing input; this gave some interesting results.

GitHub Repo

Result samples:

Taylor Swift singing to Taylor Swift speech

Original Audio: Converted Audio:

The Beatles singing to Taylor Swift speech

Original Audio: Converted Audio:

Female LibriSpeech to Taylor Swift speech

Original Audio: Converted Audio:

Male LibriSpeech to Taylor Swift speech

Original Audio: Converted Audio:

Taylor Swift speech to Taylor Swift singing

Original Audio: Converted Audio:

Female LibriSpeech to Taylor Swift speech

Original Audio: Converted Audio:

Male LibriSpeech to Taylor Swift speech

Original Audio: Converted Audio:

The Beatles singing to Taylor Swift singing

Original Audio: Converted Audio:

CycleGAN - MelGAN-VC

CycleGAN - without identity loss

Use CycleGAN with the MelGAN-VC architecture.
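The term that makes this a CycleGAN is the cycle-consistency loss: speech converted to singing and back should reconstruct the original speech, and vice versa (the "Back To Speech" and "Back To Singing" samples). A minimal sketch with plain callables standing in for the two generators and an L1 distance on mel-spectrograms; the adversarial terms are omitted, and the function names are illustrative.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two spectrogram arrays."""
    return float(np.mean(np.abs(a - b)))

def cycle_consistency_loss(speech, singing, speech2sing, sing2speech):
    """CycleGAN cycle loss: speech -> singing -> speech and
    singing -> speech -> singing should both reconstruct their input."""
    back_to_speech = sing2speech(speech2sing(speech))
    back_to_singing = speech2sing(sing2speech(singing))
    return l1(back_to_speech, speech) + l1(back_to_singing, singing)
```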

Open In Colab

Results sample:

Original Speech: Generated Singing: Back To Speech: Original Singing: Generated Speech: Back To Singing:

Mel Spectrograms

CycleGAN - with identity loss

Use CycleGAN with the MelGAN-VC architecture. In addition, pass singing to the speech2sing generator and speech to the sing2speech generator, and expect each output to be identical to its input. This is done to try to preserve the content.
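A minimal sketch of that identity term and of how it could be weighted against the other generator losses. Plain callables stand in for the speech2sing and sing2speech generators, the distance is L1 on mel-spectrograms, and the weights `lambda_cyc=10.0` and `lambda_id=5.0` are illustrative assumptions, not values from the notebook.

```python
import numpy as np

def identity_loss(speech, singing, speech2sing, sing2speech):
    """Identity term: a generator fed a sample already in its target domain
    should return it unchanged (L1 on mel spectrograms)."""
    same_singing = speech2sing(singing)  # singing in, singing expected out
    same_speech = sing2speech(speech)    # speech in, speech expected out
    return float(np.mean(np.abs(same_singing - singing))
                 + np.mean(np.abs(same_speech - speech)))

def generator_objective(adv_loss, cyc_loss, id_loss, lambda_cyc=10.0, lambda_id=5.0):
    """Weighted sum of the generator terms; the weights are illustrative."""
    return adv_loss + lambda_cyc * cyc_loss + lambda_id * id_loss
```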

Open In Colab

Results sample:

Original Speech: Identity Speech: Generated Singing: Back To Speech:

Original Singing: Identity Singing: Generated Speech: Back To Singing:

Mel Spectrograms
