Labeled data ingestion for train/eval

Simplify the user experience by letting the user pass a table that specifies, for each annotation file, the corresponding audio file or root path, the annotation column name, the data split (e.g. train/val/test), the annotated class, and the annotation type (e.g. multi-hot CSV, categorical CSV, Raven, Audacity, Whombat, Dipper single-target, Dipper multi-target).

If the annotated class is empty, all classes are assumed to be annotated.
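To make the table and the empty-class rule concrete, here is a minimal sketch in plain Python. The column names, file names, and the `annotated_classes` helper are hypothetical; the proposal does not fix a schema yet.

```python
import csv
import io

# Hypothetical summary table; column names and file names are illustrative only.
SUMMARY_CSV = """annotation_file,audio_file,annotation_column,split,annotated_class,annotation_type
site1.txt,site1.wav,species,train,,raven
site2.csv,site2.wav,label,val,owl,categorical csv
"""


def annotated_classes(row, class_list):
    """Return the classes that a row's annotation file is assumed to cover."""
    value = (row.get("annotated_class") or "").strip()
    # An empty annotated class means every class in class_list was annotated.
    return [value] if value else list(class_list)


class_list = ["owl", "frog"]
rows = list(csv.DictReader(io.StringIO(SUMMARY_CSV)))
coverage = [annotated_classes(row, class_list) for row in rows]
```

Here the first row leaves the annotated class empty and so covers the whole class list, while the second row covers only `owl`.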

Returns a dictionary of labels keyed by split.
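One possible shape for that return value, sketched with plain lists of dicts standing in for whatever label object the real implementation would use. The `group_by_split` helper and the clip fields are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_split(clips):
    """Group clip records into a dict keyed by their split name."""
    by_split = defaultdict(list)
    for clip in clips:
        # Keep only the fields that describe the clip itself.
        by_split[clip["split"]].append({"file": clip["file"], "label": clip["label"]})
    return dict(by_split)


# Hypothetical clip records, as they might look after parsing the annotation files.
clips = [
    {"file": "a.wav", "label": "owl", "split": "train"},
    {"file": "b.wav", "label": "frog", "split": "train"},
    {"file": "c.wav", "label": "owl", "split": "val"},
]
labels_by_split = group_by_split(clips)
```

Indexing the result with `labels_by_split["train"]` then gives only the training labels, which matches the access pattern proposed in this issue.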

catlabels_dict = ingest_labels("labels_summary.csv", class_list, upsample=500, downsample=None)  # first argument: CSV path or DataFrame
train_labels = catlabels_dict["train"]
train_labels.label_counts()
train_labels.convert()
train_labels.trim_whitespace()
train_labels.dropclips(labels="uncertain")

Optionally upsamples and/or downsamples the training split to N samples per class.
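A rough sketch of per-class resampling, assuming single-label clips stored as dicts. The `resample_per_class` helper is hypothetical, and multi-hot labels would need a different strategy because one clip can count toward several classes.

```python
import random
from collections import Counter, defaultdict


def resample_per_class(clips, upsample=None, downsample=None, seed=0):
    """Bring each class up to `upsample` clips and/or down to `downsample` clips."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for clip in clips:
        by_label[clip["label"]].append(clip)

    resampled = []
    for group in by_label.values():
        if upsample is not None and len(group) < upsample:
            # Keep every original clip and add random duplicates to reach the target.
            group = group + rng.choices(group, k=upsample - len(group))
        elif downsample is not None and len(group) > downsample:
            # Draw a random subset without replacement.
            group = rng.sample(group, downsample)
        resampled.extend(group)
    return resampled


# Hypothetical training split: 1 owl clip and 5 frog clips.
train = [{"file": "owl0.wav", "label": "owl"}]
train += [{"file": f"frog{i}.wav", "label": "frog"} for i in range(5)]
balanced = resample_per_class(train, upsample=3, downsample=4)
counts = Counter(clip["label"] for clip in balanced)
```

With `upsample=3` and `downsample=4`, the rare class is padded with duplicates and the common class is thinned. Setting both arguments to the same N gives exactly N samples per class.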

Consider also supporting Xeno-canto ingestion via britekit.


Labeled data ingestion for train/eval #1201
