8 changes: 8 additions & 0 deletions docs/source/opensoundscape.ml.rst
@@ -92,6 +92,14 @@ opensoundscape.ml.utils module
:undoc-members:
:show-inheritance:

opensoundscape.ml.export module
-------------------------------

.. automodule:: opensoundscape.ml.export
:members:
:undoc-members:
:show-inheritance:

Module contents
---------------

134 changes: 129 additions & 5 deletions docs/tutorials/train_cnn.ipynb
@@ -1246,6 +1246,123 @@
"scores_df.head()"
]
},
{
"cell_type": "markdown",
"id": "8aea4709",
"metadata": {},
"source": [
"## Saving and exporting the model\n",
"\n",
"There are a few different ways we can save the trained model depending on downstream use cases. In general, you can simply use the `model.save(path)` function to save the model in a JSON format that can be reloaded by OpenSoundscape (`opso.load_model(path)` or `CNN.load(path)`). To load the raw dictionary of saved content, use `torch.load(path,weights_only=False)`. \n",
"\n",
"When you want to continue training from a saved model file, it is helpful to use `model.save(pickle=True)` which saves a compressed version of the entire Python object, rather than a JSON-like format. Unlike the default saving method, this saved object retains temporary model training states like the optimizer and learning rate scheduler states. However, note that when you saved a pickled model object, you could encounter issues re-loading it in different Python environments or different versions of OpenSoundscape. After saving a pickled model, you can reload it in the same way as normal: `opso.load_model(path)` or `CNN.load(path)`. \n",
"\n",
"### ONNX Export\n",
"ONNX (Open Neural Network Exchange) is a cross-platform format for representing neural networks on a wide range of hardware and operating systems. Exporting a model to ONNX is useful for edge computing and other inference-only applications where you want independence from PyTorch. Note that not all models can be exported to ONNX. In particular, the supported method for creating a model that can be exported to ONNX is to initialize the model with the TorchSpectrogramPreprocessor class as the preprocessor. This preprocessor is designed for compatability with PyTorch's `torch.onnx.export` function. Here's an example:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a8d82b80",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/SML161/miniconda3/envs/opso_dev/lib/python3.13/site-packages/onnxscript/converter.py:457: DeprecationWarning: Expression.__init__ got an unexpected keyword argument 'lineno'. Support for arbitrary keyword arguments is deprecated and will be removed in Python 3.15.\n",
" expr = ast.Expression(expr, lineno=expr.lineno, col_offset=expr.col_offset)\n",
"/Users/SML161/miniconda3/envs/opso_dev/lib/python3.13/site-packages/onnxscript/converter.py:457: DeprecationWarning: Expression.__init__ got an unexpected keyword argument 'col_offset'. Support for arbitrary keyword arguments is deprecated and will be removed in Python 3.15.\n",
" expr = ast.Expression(expr, lineno=expr.lineno, col_offset=expr.col_offset)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[torch.onnx] Obtain model graph for `ONNXModel()` with `torch.export.export(..., strict=False)`...\n",
"[torch.onnx] Obtain model graph for `ONNXModel()` with `torch.export.export(..., strict=False)`... ✅\n",
"[torch.onnx] Run decomposition...\n",
"[torch.onnx] Run decomposition... ✅\n",
"[torch.onnx] Translate the graph into ONNX...\n",
"[torch.onnx] Translate the graph into ONNX... ✅\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/SML161/miniconda3/envs/opso_dev/lib/python3.13/site-packages/onnx/reference/ops/op_range.py:13: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)\n",
" return (np.arange(starts, ends, steps).astype(starts.dtype),)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Applied 9 of general pattern rewrite rules.\n"
]
}
],
"source": [
"from opensoundscape import CNN, preprocessors\n",
"\n",
"model = CNN(\n",
" architecture=\"efficientnet_b0\",\n",
" classes=[0, 1, 2, 3],\n",
" sample_duration=3,\n",
" preprocessor_cls=preprocessors.TorchSpectrogramPreprocessor,\n",
" sample_rate=32000,\n",
")\n",
"onnx_program = model.save_onnx(\"./opso_efficientnet.onnx\", activation_layer=\"sigmoid\")"
]
},
{
"cell_type": "markdown",
"id": "088f7597",
"metadata": {},
"source": [
"This saves an ONNX program for inference (prediction). The program produces three outputs by default: the pre-processed audio sample (a spectrogram), the embedding layer outputs, and the final classifier outputs. You can turn off any of these outputs using the arguments to `.save_onnx()`. If `model.network.classifier_layer` is not set, the function will not know which layer to use for embeddings, and will instead create a program that only exports the pre-processed sample and the final classifier outputs. \n",
"\n",
"You can also directly use the lower-level functions `opso.export.to_onnx_program` to export custom model classes, or inspect the code in that function to build a custom onnx export method. \n",
"\n",
"The ONNX program can be run in various ways once it is exported. In Python, you can run onnx programs using the onnx_runtime package. Here's a sample script:\n",
"\n",
"\n",
"```python\n",
"import onnx, onnxruntime\n",
"import numpy as np\n",
"\n",
"combined_model = onnx.load(\"opso_efficientnet.onnx\")\n",
"output_names = [node.name for node in combined_model.graph.output]\n",
"\n",
"onnx.checker.check_model(combined_model)\n",
"\n",
"\n",
"EP_list = [\"CPUExecutionProvider\"] # [\"CUDAExecutionProvider\", \"CPUExecutionProvider\"]\n",
"ort_session = onnxruntime.InferenceSession(\"opso_efficientnet.onnx\", providers=EP_list)\n",
"\n",
"# make up some random inputs\n",
"audio_samples_per_input = (\n",
" combined_model.graph.input[0].type.tensor_type.shape.dim[2].dim_value\n",
")\n",
"batch_size = 3\n",
"input_batched = np.random.rand(batch_size, 1, audio_samples_per_input).astype(\n",
" np.float32\n",
")\n",
"\n",
"# compute ONNX Runtime output prediction\n",
"ort_inputs = {ort_session.get_inputs()[0].name: input_batched}\n",
"ort_outs = ort_session.run(None, ort_inputs)\n",
"\n",
"# restore the name-value dictionary mapping of outputs\n",
"outs_dict = {name: ort_outs[i] for i, name in enumerate(output_names)}\n",
"print(f\"shape of outputs for inference on one batch of batch size {batch_size}:\")\n",
"print({k: v.shape for k, v in outs_dict.items()})\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "b1946c1f",
@@ -1255,24 +1372,31 @@
"\n",
"For guidance on how to use machine learning classifiers, see the Classifieres 101 Guide on opensoundscape.org and the tutorial on predicting with pre-trained CNNs.\n",
"\n",
"For transfer learning from pre-trained CNNs, see the transfer learning tutorial notebook.\n",
"\n",
"**Clean up:** Run the following cell to delete the files created in this tutorial. However, these files are used in other tutorials, so you may wish not to delete them just yet."
"**Clean up:** Run the following cell to delete the files created in this tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"id": "440ca518-abcd-4bac-94e8-12ff8b8e46b1",
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"from pathlib import Path\n",
"# uncomment to remove the training files\n",
"# shutil.rmtree('./annotated_data')\n",
"\n",
"shutil.rmtree(\"./wandb\")\n",
"shutil.rmtree(\"./model_training_checkpoints\")\n",
"if Path(\"./wandb\").exists():\n",
" shutil.rmtree(\"./wandb\")\n",
"if Path(\"./model_training_checkpoints\").exists():\n",
" shutil.rmtree(\"./model_training_checkpoints\")\n",
"try:\n",
" Path(\"opso_efficientnet.onnx\").unlink()\n",
"except:\n",
" pass\n",
"try:\n",
" Path(\"annotation_Files.zip\").unlink()\n",
"except:\n",
41 changes: 31 additions & 10 deletions opensoundscape/annotations.py
@@ -1092,9 +1092,13 @@ def clip_labels(
annotations result in at least one clip being labeled 1
(if there are no gaps between clips).
full_duration: The amount of time (seconds) to split into clips for each file
float or `None`; if `None`, attempts to get each file's duration
any of float, list, or `None`
- if `None`, attempts to get each file's duration
using `librosa.get_duration(path=file)` where file is the value
of `audio` for each row of self.df
- if float: uses this fixed duration for all files
- if list: should be the same length as `audio_files`, giving the
duration for each file
class_subset: list of classes for one-hot labels. If None, classes will
be all unique values of self.df['annotation']
audio_files: list of audio file paths (as str or pathlib.Path)
@@ -1159,15 +1163,32 @@ class names for each clip; also returns a second value, the list of class names
argument to `clip_labels()` will avoid the attempt to get
audio file durations from file paths."""
) from exc
else: # use fixed full_duration for all files
# make a clip df, will be re-used for each file
clip_df_template = generate_clip_times_df(
full_duration=full_duration, clip_duration=clip_duration, **kwargs
)
# make a clip df for all files
clip_df = pd.concat([clip_df_template] * len(audio_files))
# add file column, repeating value of file across each clip
clip_df["file"] = np.repeat(audio_files, len(clip_df_template))
else:
# full_duration is fixed for all files, or is a list of durations for each file
if isinstance(full_duration, (list, np.ndarray)):
assert len(full_duration) == len(audio_files), (
"`full_duration` should be a float or a list/array "
"with same length as `audio_files`, or None. Length did not match audio_files."
)
else:
# same duration for all files
assert isinstance(full_duration, (int, float)), (
"`full_duration` should be a float or a list/array "
f"with same length as `audio_files`, or None. Got type {type(full_duration)}."
)
full_duration = [full_duration] * len(audio_files)

dfs = []
for i, file in enumerate(audio_files):

df = generate_clip_times_df(
full_duration=full_duration[i],
clip_duration=clip_duration,
**kwargs,
)
df["file"] = file
dfs.append(df)
clip_df = pd.concat(dfs).reset_index(drop=True)
clip_df = clip_df.set_index(["file", "start_time", "end_time"])

# call labels_on_index with clip_df
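A hedged sketch of how the updated `full_duration` argument to `clip_labels()` might be used (the file names, durations, and `BoxedAnnotations` constructor arguments are illustrative and may differ from the real API):

```python
from opensoundscape import BoxedAnnotations

# hypothetical annotation and audio file pairs
annotations = BoxedAnnotations.from_raven_files(
    ["rec1.selections.txt", "rec2.selections.txt"],
    annotation_column="annotation",
    audio_files=["rec1.wav", "rec2.wav"],
)

# float: one fixed duration (in seconds) applied to every file
labels = annotations.clip_labels(clip_duration=3, full_duration=60)

# list: per-file durations, same length and order as audio_files
labels = annotations.clip_labels(clip_duration=3, full_duration=[60, 45])

# None: each file's duration is read with librosa.get_duration
labels = annotations.clip_labels(clip_duration=3, full_duration=None)
```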
1 change: 1 addition & 0 deletions opensoundscape/ml/__init__.py
@@ -6,6 +6,7 @@
from . import safe_dataset
from . import sampling
from . import utils
from . import export
import torch.multiprocessing

# using 'file_system' avoids errors with "Too many open files",