This repository includes source code for training and evaluating the retrieval-augmented neural field (RANF) and its extension named RANF+, proposed in the following IEEE Open Journal of Signal Processing submission and ICASSP 2025 paper:
@Article{Masuyama2025OJSP_ranf,
title = {{RANF}: Neural Field-Based {HRTF} Spatial Upsampling with Retrieval Augmentation and Parameter Efficient Fine-Tuning},
author = {Masuyama, Yoshiki and Wichern, Gordon and Germain, Fran\c{c}ois G. and Ick, Christopher and {Le Roux}, Jonathan},
journal = {IEEE Open Journal of Signal Processing},
year = 2025,
}
@InProceedings{Masuyama2024ICASSP_ranf,
author = {Masuyama, Yoshiki and Wichern, Gordon and Germain, Fran\c{c}ois G. and Ick, Christopher and {Le Roux}, Jonathan},
title = {Retrieval-Augmented Neural Field for {HRTF} Upsampling and Personalization},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year = 2025,
month = apr
}
The latest version supports only the experiments for our journal submission. Version 1.0.0 should be used to reproduce the experiment described in the ICASSP paper.
- Environment setup
- Supported sparsity levels and models
- Training and evaluating RANF
- Evaluating learning-free baseline methods
- Contributing
- Copyright and license
The code has been tested using Python 3.10.0 on Linux.
Necessary dependencies can be installed using the included requirements.txt:
pip install -r requirements.txt
- Our HRTF upsampling experiments were performed on the SONICOM dataset that is released under the MIT license (see Section 3.1 of this paper).
- We performed HRTF upsampling with four sparsity levels following Task 2 of the Listener Acoustic Personalization Challenge 2024. The number of measured directions, sp_level in run_example.sh, should be selected from {3, 5, 19, 100}, where smaller is more challenging to upsample.
- We currently support four NF-based methods. Please refer to our paper for their details.
  - NF with conditioning by concatenation (CbC): NF takes a subject-specific latent vector as an auxiliary input in addition to the sound source direction.
  - NF with low-rank adaptation (LoRA): The model weights will be updated by adding a subject-specific low-rank matrix.
  - RANF: NF takes HRTF magnitude and ITDs of the retrieved subjects in addition to the sound source direction. LoRA is also used to adapt the model.
  - RANF+: RANF incorporates the results of a panning-based method as auxiliary inputs.
    - RANF+ is applicable only to sp_level in {19, 100} because the panning-based method is infeasible under more sparse settings.
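The constraints above can be checked before launching a run. The following is a minimal sketch, not part of the repository: the function name is_supported and the model labels cbc, lora, ranf, and ranf_plus are hypothetical (the repository selects the model through config_path), and only the allowed sparsity levels and the RANF+ restriction come from the list above.

```shell
# Hypothetical helper: succeeds (exit status 0) if the model/sparsity pair
# is supported according to the list above.
# The model labels are illustrative; the repository uses config_path instead.
is_supported() {
    model="$1"
    sp_level="$2"

    # Only four sparsity levels are supported.
    case "$sp_level" in
        3|5|19|100) ;;
        *) return 1 ;;
    esac

    case "$model" in
        cbc|lora|ranf)
            return 0
            ;;
        ranf_plus)
            # RANF+ relies on the panning-based method,
            # which is feasible only for 19 or 100 measured directions.
            case "$sp_level" in
                19|100) return 0 ;;
                *) return 1 ;;
            esac
            ;;
        *)
            return 1
            ;;
    esac
}
```

In this sketch, is_supported ranf 3 succeeds, while is_supported ranf_plus 5 fails because the panning-based inputs are unavailable at that sparsity level.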
To train and evaluate RANF, RANF+, and the existing NF-based methods on the SONICOM dataset, complete Stage 0 below and then execute run_example.sh. The script consists of five stages (Stages 1 to 5). You can run the stages one at a time by changing stage and stop_stage in the script.
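One common way to implement this kind of stage selection in a shell script is sketched below. Only the variable names stage and stop_stage come from run_example.sh; the should_run helper and the echo placeholders are assumptions, and the actual script may be organized differently.

```shell
# Illustrative only: the stage bodies are placeholders,
# not the commands used in the repository.
stage=2        # first stage to run
stop_stage=3   # last stage to run

# Succeeds when stage <= $1 <= stop_stage.
should_run() {
    [ "$stage" -le "$1" ] && [ "$stop_stage" -ge "$1" ]
}

for n in 1 2 3 4 5; do
    if should_run "$n"; then
        echo "running Stage $n"
    else
        echo "skipping Stage $n"
    fi
done
```

In this sketch, stage=2 and stop_stage=3 run only Stages 2 and 3, and setting both variables to the same number runs a single stage.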
- Stage 0:
  - Before starting the training and evaluation, download the SONICOM dataset into a directory specified in original_path in run_example.sh and unzip the dataset.
    - The directory is assumed to contain KEMAR, P0001-P0005, ..., P0196-P0200, where each directory contains subdirectories for the corresponding subjects. For example, P0001-P0005 consists of P0001, P0002, ..., P0005.
    - If you find P0050_FreeFieldCompMinPhase_48kHz.sofa instead of P0051_FreeFieldCompMinPhase_48kHz.sofa in $original_path/P0051-P0055/P0051/HRTF/HRTF/48kHz, please copy it as follows:
      cp $original_path/P0051-P0055/P0051/HRTF/HRTF/48kHz/P0050_FreeFieldCompMinPhase_48kHz.sofa $original_path/P0051-P0055/P0051/HRTF/HRTF/48kHz/P0051_FreeFieldCompMinPhase_48kHz.sofa
  - Also download the challenge evaluation set into lap_challenge_path from the official repository for the challenge.
    - The directory is assumed to directly contain SOFA files.
    - While the challenge evaluation set skips subject P0209, you need to download P0209_FreeFieldCompMinPhase_48kHz.sofa from the SONICOM dataset as our code assumes that the subject indices are consecutive.
  - preprocessed_dataset_path should be specified to save the preprocessed SONICOM dataset.
  - Model checkpoints and log files will be stored in subdirectories under exp_base_path.
  - You can select a model and a sparsity level by config_path and sp_level, respectively.
- Stage 1:
  - This stage copies required HRTF files into $sonicom_path.
  - This stage is required only once regardless of the sparsity level, and you can start from Stage 2 if you want to train a new model.
- Stage 2:
  - This stage extracts features (spectra and ITDs) and computes distance matrices between subjects in terms of the spectra and ITDs based on the measured HRTFs.
  - This stage splits the datasets (train, valid, and test), where the option --skip_78 enforces that the training set excludes a subject with atypical ITD measurements.
  - The configuration file in $config_path/original_config.yaml will be modified based on the sparsity level and the data split, and then the updated configuration file will be saved in $exp_path/config.yaml.
- Stage 3:
  - This stage trains the model specified by $exp_path/original_config.yaml on the multi-subject training dataset.
  - The log file will be stored in $exp_path/log/exp.log, while the checkpoint with the best validation loss will be $exp_path/best.ckpt.
  - For RANF+, the panning-based method is applied to HRTFs of all the subjects before training the model, where the results are stored in a subdirectory under $sonicom_path in the SOFA format. The corresponding features are dumped into a subdirectory for each sparsity level under $preprocessed_dataset_path.
- Stage 4:
  - This stage adapts the pre-trained model to the target subject by fine-tuning a few parameters in the model.
  - The log file will be stored in $exp_path/log/adaptation/adaptation.log, while the checkpoint with the best adaptation loss will be $exp_path/adaptation.ckpt.
  - Currently, this stage simultaneously optimizes the subject-specific parameters of all target subjects since the parameters of each subject are independent of other subjects.
- Stage 5:
  - This stage runs inference and evaluates the results.
  - The metrics used in the LAP challenge for each subject will be in $exp_path/log/eval/eval.log, and the summarized result will be shown in the CLI.
  - The performance may vary from the results reported in the paper depending on your specific environment, especially when sp_level = 3. We used PyTorch 1.13.0 for the paper, while the current default in requirements.txt is 2.2.2.
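When resuming a partially completed run, it can help to see which of the files named in Stages 2 to 5 already exist. The sketch below is not part of the repository: the function name report_stage_outputs and its messages are hypothetical, and only the file locations under the experiment directory are taken from the stage descriptions above.

```shell
# Hypothetical helper: report which stage outputs exist under an
# experiment directory (passed as the first argument).
# File locations follow the stage descriptions; everything else is assumed.
report_stage_outputs() {
    exp_path="$1"
    for entry in \
        "2:config.yaml" \
        "3:best.ckpt" \
        "4:adaptation.ckpt" \
        "5:log/eval/eval.log"
    do
        stage_id="${entry%%:*}"   # text before the first colon
        rel_path="${entry#*:}"    # text after the first colon
        if [ -f "$exp_path/$rel_path" ]; then
            echo "Stage $stage_id: found $rel_path"
        else
            echo "Stage $stage_id: missing $rel_path"
        fi
    done
}
```

Calling report_stage_outputs "$exp_path" prints one line per stage. A file being present does not guarantee that the stage ran to completion, so the log files remain the authoritative record.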
In order to evaluate the learning-free methods, HRTF selection and nearest neighbor, please execute run_learningfree_methods.sh after specifying the paths as explained for RANF above. Stages 1 and 2 are the same as for RANF, and both inference and evaluation are performed in Stage 3.
See CONTRIBUTING.md for our policy on contributions.
Released under AGPL-3.0-or-later license, as found in the LICENSE.md file.
All files:
Copyright (c) 2024 Mitsubishi Electric Research Laboratories (MERL)
SPDX-License-Identifier: AGPL-3.0-or-later