Labeled data ingestion for train/eval #1201

@sammlapp

Simplify the user experience by letting the user pass a table that specifies, for each annotation file: the corresponding audio files or root paths, the annotation column name, the data split (e.g. train/val/test), the annotated class, and the annotation type (e.g. multi-hot CSV, categorical CSV, Raven, Audacity, Whombat, Dipper single-target, Dipper multi-target).

If the annotated class is empty, assume all classes are annotated.

Returns a dictionary with labels for each split
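
As a rough illustration of the summary table and the per-split grouping described above, here is a minimal sketch using only the standard library. The column names (`annotation_file`, `audio_root`, `annotation_column`, `split`, `annotated_class`, `annotation_type`), the file paths, the class codes, and the helper `group_summary_by_split` are all hypothetical; the issue does not fix a schema, and this sketch only plans which files belong to which split rather than parsing the annotation files themselves.

```python
import csv
import io
from collections import defaultdict

# Hypothetical summary table; column names and paths are assumptions for illustration.
SUMMARY_CSV = """annotation_file,audio_root,annotation_column,split,annotated_class,annotation_type
site_a.txt,/data/site_a,species,train,,raven
site_b.csv,/data/site_b,label,train,WOTH,categorical csv
site_c.csv,/data/site_c,,val,,multi hot csv
"""


def group_summary_by_split(rows, class_list):
    """Group summary rows by split.

    An empty annotated_class is resolved to the full class list,
    i.e. all classes are assumed to be annotated in that file.
    """
    by_split = defaultdict(list)
    for row in rows:
        annotated = row["annotated_class"].strip()
        entry = dict(row)
        entry["annotated_classes"] = [annotated] if annotated else list(class_list)
        by_split[row["split"].strip()].append(entry)
    return dict(by_split)


rows = list(csv.DictReader(io.StringIO(SUMMARY_CSV)))
plan = group_summary_by_split(rows, class_list=["WOTH", "OVEN"])
for split, entries in plan.items():
    print(split, [(e["annotation_file"], e["annotated_classes"]) for e in entries])
```

A real `ingest_labels` would presumably go one step further and load each file according to its annotation type before returning one labels object per split.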

catlabels = ingest_labels(labels_summary, class_list, upsample=500, downsample=None)  # labels_summary: path to labels_summary.csv or a DataFrame
catlabels["train"].label_counts()
catlabels["train"].convert()
catlabels["train"].trim_whitespace()
catlabels["train"].dropclips(labels="uncertain")

Optionally upsample and/or downsample the training split to N samples per class.
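
One possible reading of the `upsample` / `downsample` arguments, sketched for single-label (categorical) clips only: classes with fewer than `upsample` clips are padded by sampling with replacement, and classes with more than `downsample` clips are reduced by sampling without replacement. The function name `resample_per_class`, the `(clip_id, class_name)` tuple layout, and the `seed` argument are assumptions for illustration; multi-hot labels would need a different strategy.

```python
import random
from collections import Counter, defaultdict


def resample_per_class(clips, upsample=None, downsample=None, seed=0):
    """Return a new list of (clip_id, class_name) tuples, resampled per class.

    Hypothetical helper: assumes exactly one class per clip.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for clip in clips:
        by_class[clip[1]].append(clip)

    out = []
    for items in by_class.values():
        if downsample is not None and len(items) > downsample:
            # Reduce large classes by sampling without replacement.
            items = rng.sample(items, downsample)
        if upsample is not None and len(items) < upsample:
            # Pad small classes by repeating clips (sampling with replacement).
            items = items + rng.choices(items, k=upsample - len(items))
        out.extend(items)
    return out


train = [(f"a{i}", "A") for i in range(2)] + [(f"b{i}", "B") for i in range(10)]
resampled = resample_per_class(train, upsample=5, downsample=6)
counts = Counter(name for _, name in resampled)
print(counts)
```

Resampling only the training split, as the issue proposes, keeps the validation and test class distributions untouched.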

Consider also supporting Xeno-canto ingestion via britekit.
