One-sided is a tool for processing spoken audio (e.g. podcasts) and selectively removing specific speakers. It's ideal when you want to eliminate extraneous chatter and focus on the crux of an interview.
# Install ffmpeg
apt-get install ffmpeg # Ubuntu
brew install ffmpeg # macOS
# Install python packages and open shell
poetry install
poetry shell

Download your podcast audio from YouTube:
yt-dlp -xv --audio-format wav --restrict-filenames -o "%(title)s.%(ext)s" -- https://www.youtube.com/watch?v=xxxx

Segment the audio by speaker, producing a CSV. This process is compute-intensive and performs best with a GPU:
python diarise.py my-podcast.wav
# outputs my-podcast.ds.csv

Produce sample audio to determine the identity of each speaker:
python remix.py --speakers --ds my-podcast.ds.csv --input my-podcast.wav
# outputs my-podcast.speakers.wav

Listen to the sample audio. It consists of one segment per speaker, separated by a chime. Make a note of which speakers you'd like to retain or remove.
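
When deciding which speakers to keep, it can help to see how much each one talks and which segments a given selection would retain. The following is a minimal sketch, not the logic used by `remix.py`. The CSV layout (`speaker`, `start`, `end` columns, times in seconds) is an assumption, so check the header of your own `.ds.csv` first. The helper names `load_segments`, `speaking_time`, and `select_segments` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# ASSUMPTION: a hypothetical layout for the diarisation CSV. The real
# columns written by diarise.py may be named or ordered differently.
SAMPLE_CSV = """speaker,start,end
1,0.0,12.5
2,12.5,15.0
1,15.0,40.0
3,40.0,42.5
"""


def load_segments(text):
    """Parse CSV text into a list of (speaker, start, end) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (int(row["speaker"]), float(row["start"]), float(row["end"]))
        for row in reader
    ]


def speaking_time(segments):
    """Return the total seconds spoken by each speaker."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    return dict(totals)


def select_segments(segments, include=None, exclude=None):
    """Keep segments whose speaker passes the include/exclude filters.

    ASSUMPTION: when both filters are given, a speaker must be in
    `include` and not in `exclude`. This is an illustrative choice only.
    """
    keep = None if include is None else set(include)
    drop = set(exclude or ())
    return [
        seg for seg in segments
        if (keep is None or seg[0] in keep) and seg[0] not in drop
    ]


segments = load_segments(SAMPLE_CSV)
totals = speaking_time(segments)
kept = select_segments(segments, exclude=[2])

for speaker, seconds in sorted(totals.items()):
    print(f"speaker {speaker}: {seconds:.1f}s")
print(f"segments kept after excluding speaker 2: {len(kept)}")
```

To try this on real output, read the file with `open("my-podcast.ds.csv").read()` and adjust the column names to match its header.
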
Output the processed audio:
# Include only specified speakers
python remix.py --ds my-podcast.ds.csv --input my-podcast.wav --include-speakers=1,3
# Exclude some speakers, retain the others:
python remix.py --ds my-podcast.ds.csv --input my-podcast.wav --exclude-speakers=2,4
# Either will output my-podcast.onesided.wav

- Spoken numbers by Amy Gedgaudas - https://freesound.org/people/tim.kahn/packs/4372/