
A repo for extracting embeddings of bird vocalizations using birdnetlib after performing silence padding


jongalon/emb_extraction


Project Setup

Setting Up the Environment

Option A — Local (Conda)

  1. Install Miniconda or Anaconda.
  2. Create the environment:
    conda env create -f environment.yml
    conda activate cloudspace
  3. (Optional) Update later:
    conda env update -f environment.yml --name cloudspace

Option B — Lightning Studio (single built-in Conda env)

Lightning Studios give you one default Conda environment (often called cloudspace). Update that active env in place:

# from the repo root
conda env update -f environment.yml

Option C — Pip-only install (CI / minimal images / quick start)

If you only need the pip: packages from the YAML (e.g., when Conda changes are not allowed), create a requirements.txt from the pip: block of environment.yml and install it:

pip install -r requirements.txt
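The requirements file can be generated by hand, but the extraction can also be scripted. The helper below is a sketch (not part of this repo) that assumes the standard Conda YAML layout, where pip packages sit in an indented list under a `- pip:` entry inside `dependencies`:

```python
# Sketch: pull the `pip:` block out of environment.yml so it can be fed to
# `pip install -r`. Assumes the standard Conda YAML layout described above.
def pip_requirements(text):
    reqs = []
    pip_indent = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines inside the file
        indent = len(line) - len(line.lstrip())
        if stripped == "- pip:":
            pip_indent = indent            # entering the pip block
        elif pip_indent is not None:
            if indent > pip_indent and stripped.startswith("- "):
                reqs.append(stripped[2:])  # one pip requirement
            else:
                pip_indent = None          # back to regular conda deps
    return reqs

# Usage (writes requirements.txt next to environment.yml):
#   with open("environment.yml") as f:
#       reqs = pip_requirements(f.read())
#   with open("requirements.txt", "w") as f:
#       f.write("\n".join(reqs) + "\n")
```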

Audio Datasets

The datasets used in this project are not included in this repository.
You can access them through the following shared folder:

Datasets - Google Drive link

Alternatively, you may collect the audio files directly from their original sources if you prefer.

Please follow these guidelines when preparing your local dataset structure:

  1. Folder location: place all datasets inside the Original_datasets folder located in the project root.
  2. Folder organization: within Original_datasets, create a separate folder for each species and store the corresponding WAV files inside it.
  3. File naming: keep the same naming pattern as in the Google Drive link to ensure compatibility with the provided notebooks.

Once your dataset is in place, you can start running the Jupyter notebooks.
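Before launching the notebooks, the layout above can be sanity-checked with a short script. This is a hypothetical helper (not part of the repo); only the `Original_datasets` folder name comes from the guidelines above:

```python
# Sketch: verify that Original_datasets contains one folder per species,
# each holding WAV files, as described in the guidelines above.
from pathlib import Path

def check_dataset_layout(root="Original_datasets"):
    """Return {species_folder: wav_count} for every subfolder of `root`."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"missing dataset root: {root}")
    report = {}
    for species in sorted(p for p in root.iterdir() if p.is_dir()):
        # count WAV files case-insensitively (.wav / .WAV)
        wavs = [f for f in species.iterdir() if f.suffix.lower() == ".wav"]
        report[species.name] = len(wavs)
    return report
```

A species folder reporting zero WAV files usually means the audio landed one directory level too deep.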

Metadata

The metadata for the datasets used in this project is not included in this repository. You can access it through the following shared folder:

Metadata - Google Drive link

Then, paste the downloaded files into the Output_metadata folder using the following structure:

Output_metadata
├── GreatTit_metadata
│   ├── final_greatTit_metadata.csv
│   ├── test_metadata.csv
│   ├── train_metadata.csv
│   └── val_metadata.csv
├── chiffchaff-fg
│   ├── chiffchaff-withinyear-fg-trn.csv
│   └── chiffchaff-withinyear-fg-tst.csv
├── KiwiTrimmed
│   └── kiwi_metadata.csv
├── littleowl-fg
│   ├── littleowl-acrossyear-fg-trn.csv
│   └── littleowl-acrossyear-fg-tst.csv
├── littlepenguin_metadata
│   └── littlepenguin_metadata_corrected.csv
├── pipit-fg
│   ├── pipit-withinyear-fg-trn.csv
│   └── pipit-withinyear-fg-tst.csv
└── rtbc_metadata
    └── rtbc_metadata.csv
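A quick check that the download matches the tree above can save a failed notebook run later. The sketch below is a hypothetical helper; the relative paths are taken directly from the expected structure:

```python
# Sketch: report which of the expected metadata files are missing under
# Output_metadata. Paths mirror the tree shown above.
from pathlib import Path

EXPECTED_METADATA = [
    "GreatTit_metadata/final_greatTit_metadata.csv",
    "GreatTit_metadata/test_metadata.csv",
    "GreatTit_metadata/train_metadata.csv",
    "GreatTit_metadata/val_metadata.csv",
    "chiffchaff-fg/chiffchaff-withinyear-fg-trn.csv",
    "chiffchaff-fg/chiffchaff-withinyear-fg-tst.csv",
    "KiwiTrimmed/kiwi_metadata.csv",
    "littleowl-fg/littleowl-acrossyear-fg-trn.csv",
    "littleowl-fg/littleowl-acrossyear-fg-tst.csv",
    "littlepenguin_metadata/littlepenguin_metadata_corrected.csv",
    "pipit-fg/pipit-withinyear-fg-trn.csv",
    "pipit-fg/pipit-withinyear-fg-tst.csv",
    "rtbc_metadata/rtbc_metadata.csv",
]

def missing_metadata(root="Output_metadata"):
    """Return the expected files that are not present under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_METADATA if not (root / rel).is_file()]
```

An empty return value means the folder is ready.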

Extracting BirdNET Embeddings

Before extracting embeddings, each vocalization must be padded so its duration is a multiple of 3 seconds.
Run the following notebook first:

Notebooks/3_Adding silence/Adding_silence_to_audios.ipynb

This notebook adds the necessary silence and outputs audio files ready to be processed by BirdNET.
For large datasets this step can be time-consuming, so please be patient.
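The padding arithmetic itself is simple and can be sketched as follows. BirdNET analyzes audio in 3-second windows, so each clip is extended with trailing silence up to the next multiple of 3 seconds (the function names here are illustrative, not the notebook's actual API):

```python
# Sketch of the padding rule: extend each clip with trailing silence so its
# duration becomes the next multiple of the 3 s BirdNET analysis window.
import math

WINDOW_S = 3.0  # BirdNET analysis window length, in seconds

def padded_duration(duration_s, window_s=WINDOW_S):
    """Smallest multiple of `window_s` that fits the clip (min. one window)."""
    return window_s * max(1, math.ceil(duration_s / window_s))

def silence_to_add(duration_s, window_s=WINDOW_S):
    """Seconds of trailing silence needed to reach the padded duration."""
    return padded_duration(duration_s, window_s) - duration_s
```

For example, a 4.2 s vocalization is padded to 6 s (1.8 s of silence added), while a clip that is already exactly 6 s long is left unchanged.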

Next, extract the embeddings with:

Notebooks/4_gettingEmbeddings/1_gettingEmbeddings_parquet.ipynb

This notebook uses the birdnetlib library to process the padded audio datasets, extract embeddings, and save the results in Parquet format.
Make sure to adjust the file paths and parameters inside the notebook to match your specific dataset and requirements.

Each dataset will produce a set of Parquet parts, saved under:

 Output_files/Embeddings_from_3sPadding/<dataset_name>_parquet_parts/

Example:
Output_files/Embeddings_from_3sPadding/littleowl_parquet_parts/part_0000.parquet
Output_files/Embeddings_from_3sPadding/littleowl_parquet_parts/littleowl_processed_files.parquet
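To work with the embeddings downstream, the parts for one dataset can be collected in order. The helper below is a sketch following the layout above; actually reading the parts requires pandas with a Parquet engine (pyarrow or fastparquet), which is assumed but not shown:

```python
# Sketch: gather a dataset's Parquet part files in order. The directory
# layout mirrors Output_files/Embeddings_from_3sPadding/<dataset>_parquet_parts/.
from pathlib import Path

def part_files(dataset, root="Output_files/Embeddings_from_3sPadding"):
    """Return the part_*.parquet files for `dataset`, sorted by part number."""
    parts_dir = Path(root) / f"{dataset}_parquet_parts"
    return sorted(parts_dir.glob("part_*.parquet"))

# Loading everything into one DataFrame (assumes pandas + pyarrow):
#   import pandas as pd
#   df = pd.concat(map(pd.read_parquet, part_files("littleowl")),
#                  ignore_index=True)
```

Note that the glob deliberately skips the `<dataset>_processed_files.parquet` bookkeeping file, which tracks which audio files were already processed.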
