This is code to train models on facial images to detect the presence or absence of an attribute (in this case, to diagnose acromegaly).
The suggested installation method uses the pyproject.toml to manage dependencies, ideally with a tool like Astral's uv.
First, install uv by following its installation instructions.
Once uv is installed, go to the project root and run:
$ uv sync

This creates the virtual environment for the project in .venv. Activate it by running:
$ source .venv/bin/activate

Now we add this project as a package to that python environment:
$ uv pip install -e .

Ideally, you as a user should not have to download any models manually. They should be automatically downloaded and saved into the pretrained_models directory.
This code relies heavily on configuration files to adjust its behavior. The configuration files are typically Python files which contain instantiations of specific configuration objects. You can find them in configs.
Noteworthy files and directories are:
- configs/preprocess/ - Used by the preprocessing script to determine how to generate images from the videos. This mainly entails detailing what angles the images should be taken from.
- configs/datasets/ - Tells the training framework how to load the files which belong to the dataset (e.g. what Dataset class to use). This can be used to specify that different views of the data should be used for training vs. evaluation (e.g. training using 20 video frames, evaluation using 3).
- configs/[model]_experiment.py - Entrypoints for the actual experiments. They detail the experiment process, what dataset config and model config to use, as well as how the dataset splitting and hyperparameter optimization should be performed.
- configs/models/ - Contains the configuration for the different models. This determines what model classes are used by the experiment-running framework, and their [hyper]parameters.
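Since the configuration files are ordinary Python modules, you can load and inspect one programmatically. The sketch below is illustrative only; it assumes the experiment configs expose an experiment_config object (as in the example further down in this README), and the framework's own loading mechanism may differ.

import importlib.util

# Minimal sketch: load a Python config module by path and pull out the object it defines.
# The attribute name 'experiment_config' is taken from the example config shown later in
# this README; other config types may use different names.
spec = importlib.util.spec_from_file_location(
    "experiment_config_module", "configs/farl_experiment_pretrain_10hp.py"
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

print(module.experiment_config)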
Data is assumed to be in video files, with the following naming scheme:
[Site code][study code]-[A or K]-[video name].mp4
The site code determines what site the data comes from, while the study code identifies the acromegaly-control pair. A or K denotes the label of the video: Acromegaly (A) or Control (K). The video name identifies which of the subject's videos this is, in case multiple videos have been captured, but this is mainly used for manual data management.
An example could be:
LU01-A-capture_1.mp4
This would be from the site LU, for the first acromegaly patient (or corresponding control). In this case, the A denotes it as a video of an acromegaly participant.
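If you need to check your own files against this scheme, a small script like the following can help. This is only an illustration; it assumes two-letter site codes and two-digit study codes, as in the example above.

import re

# Illustrative pattern for the naming scheme [Site code][study code]-[A or K]-[video name].mp4.
# Assumes two-letter site codes and two-digit study codes; adjust if your codes differ.
FILENAME_PATTERN = re.compile(
    r"(?P<site>[A-Z]{2})(?P<study>\d{2})-(?P<label>[AK])-(?P<video>.+)\.mp4$"
)

match = FILENAME_PATTERN.match("LU01-A-capture_1.mp4")
if match:
    print(match.group("site"), match.group("study"), match.group("label"), match.group("video"))
    # LU 01 A capture_1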
All source video files should be placed in the same folder, called raw, inside a root dataset folder (e.g. datasets). If you put the videos in datasets/raw/, then the following will produce the dataset containing the evaluation files (three images at 90, 45 and 0 degree angles).
$ python scripts/preprocess_data.py datasets/raw configs/preprocess/three_yaw_dataset.py

To produce the training files, run the same command with a different pre-processing config:
$ python scripts/preprocess_data.py datasets/raw configs/preprocess/9_degree_yaw_dataset.py

These two preprocessing runs have created a number of different directories containing frames from the original videos under datasets/processed:
masked_yaws_-90_81_20 masked_yaws_0_90_3 yaws yaws_-90_81_20 yaws_0_90_3
These contain partial results from the different steps. The directories you are interested in are the ones prefixed with masked, as these contain the frames from the desired yaws, where clothing has been automatically masked.
You might want to manually inspect the resulting images, at the very least to ensure that the evaluation images look correct. By default the images are masked automatically, but this mask is imperfect, sometimes not masking enough of the clothing or masking parts of the face. The masks are stored in the alpha channel of the images, so they can easily be corrected in an image manipulation program.
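If you prefer to check the masks programmatically, the alpha channel can be pulled out with Pillow. This is just a sketch; the file name below is a placeholder for one of your processed frames, and the exact image format may differ.

from PIL import Image

# Illustrative sketch: extract the clothing mask stored in the alpha channel of a
# processed frame and save it as a separate greyscale image for quick inspection.
# The file path is a placeholder; point it at one of your own processed frames.
img = Image.open("datasets/processed/masked_yaws_0_90_3/example_frame.png").convert("RGBA")
alpha = img.getchannel("A")
alpha.save("example_frame_mask.png")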
After you are happy with the clothing masks, you need to collapse the alpha channel of the images:
$ python scripts/remove_alpha_channel.py datasets/processed/masked_yaws_0_90_3
$ python scripts/remove_alpha_channel.py datasets/processed/masked_yaws_-90_81_20

This produces two new directories which are the ones we want to use for training:
masked_yaws_0_90_3_no-alpha masked_yaws_-90_81_20_no-alpha
The numbers tell us the starting angle, the stop angle, and how many frames there are. In this case we have one folder with 3 frames per video and one with 20. We'll use the one with 3 frames for evaluation, and the one with 20 for training.
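To make the naming concrete, here is a small illustrative snippet (not part of the framework) that pulls those three numbers out of a directory name:

# Illustrative only: decode the start angle, stop angle and frame count from a directory name.
name = "masked_yaws_-90_81_20_no-alpha"
start_angle, stop_angle, n_frames = map(int, name.replace("_no-alpha", "").split("_")[-3:])
print(start_angle, stop_angle, n_frames)  # -90 81 20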
Before we start training, we need to make the training framework aware of our data. We do this by setting up config files.

First, we configure the datasets. There is a default dataset config in configs/datasets/acroface/acroface_ml which is the recommended one. This assumes there are two directories for the dataset, dataset/acroface_ml_dataset/training_dataset and dataset/acroface_ml_dataset/test_dataset. You can create these two directories and copy the files you want into them, but creating symlinks allows you to more clearly denote which preprocessing results you are using:
$ mkdir -p dataset/acroface_ml_dataset
$ ln -s $PWD/datasets/processed/masked_yaws_-90_81_20_no-alpha dataset/acroface_ml_dataset/training_dataset
$ ln -s $PWD/datasets/processed/masked_yaws_0_90_3_no-alpha dataset/acroface_ml_dataset/test_dataset

If you wish to limit which subjects (videos) are part of the datasets, you can add a subjects.txt file in the experiment dataset root (e.g. dataset/acroface_ml_dataset in the example above). This text file should have one subject code per line, and the experiment framework will only use the subjects listed.
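As an illustration, a subjects.txt could look like the listing below. The codes shown are hypothetical and assume the subject code is the site code plus study code from the file naming scheme; check your processed file names to see exactly which identifiers the framework expects.

LU01
LU02
LU05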
Now that all data is set up, we can start running our experiments. They are defined by configuration files, one per pre-trained model used for classification:
config/densenet_experiment_pretrain_20hp.py
config/farl_experiment_pretrain_10hp.py
config/inceptionv3_experiment_pretrain_20hp.py
config/resnet_experiment_pretrain_20hp.py
The experiments look almost exactly the same; the differences are which model config file is loaded and how many hyperparameter trials are performed. FaRL is significantly more costly to train, so it is allotted fewer trials.
Before we run the experiments, we need to set up the dataset splits. This is done beforehand for two main reasons: determinism across model runs (the models will be trained on the same cross-validation folds) and checking (it is easier to inspect the dataset split to assure properly disjoint splits). The dataset split script uses an experiment config to determine how to split the data; since this is the same in all our experiments, we can use any one of them:
$ python scripts/setup_data_splits.py configs/farl_experiment_pretrain_10hp.py

This creates a JSON file describing the splits in configs/dataset_splits/base_splits.json. If you like, you can inspect this file manually. In this example, it will be a series of nested splits, due to the nested cross validation.
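If you want to inspect the splits without opening the file by hand, a short snippet like this will do; it makes no assumptions about the JSON structure beyond it being valid JSON.

import json

# Pretty-print the generated splits file for a quick sanity check.
with open("configs/dataset_splits/base_splits.json") as fp:
    splits = json.load(fp)

print(json.dumps(splits, indent=2)[:2000])  # show the beginning of the nested structure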
An experiment actually consists of a large number of independent training runs (as determined by the dataset splits). The experiment framework in this project decouples these runs from each other by defining each of them as a work package. These will automatically be distributed over multiple processes if the experiment config has a list of multiprocessing_envs (see below for details). To start an experiment run:
$ python scripts/run_experiment.py configs/farl_experiment_pretrain_10hp.py

This will create the work packages and start processing them, saving all results as files in the folder experiments/EXPERIMENT_NAME/TIMESTAMP, for example experiments/farl_experiment_pretrain_10hp/2024-04-17T05.20.14. Most files are human readable, so you can inspect training results as they are created.
The script has rudimentary support for running experiments in parallel. This is achieved by starting multiple processes and having them process different work packages. To configure this, set the multiprocessing_envs attribute of the experiment config to a list of dictionaries. Each dictionary lists the environment variables, and their values, that a worker process should be started with; if there are two dictionaries, two worker processes will be started, each with its own environment. This is useful to explicitly limit which devices (i.e. GPUs) those processes will see. Here's an example which would make the framework use two CUDA GPUs:
experiment_config = ExperimentConfig(name='farl_experiment_pretrain_10hp',
dataset_config_path=dataset_config_path,
data_split_path='configs/dataset_splits/base_splits.json',
model_config_path=model_config_path,
resample_strategy=top_resample_strategy,
nested_resample_strategy=nested_resample_strategy,
hpo_config=hpo_config,
multiprocessing_envs=[{'CUDA_VISIBLE_DEVICES': '0'},
{'CUDA_VISIBLE_DEVICES': '1'},
#{'CUDA_VISIBLE_DEVICES': '2'},
#{'CUDA_VISIBLE_DEVICES': '3'},
#{'CUDA_VISIBLE_DEVICES': '4'},
#{'CUDA_VISIBLE_DEVICES': '5'},
#{'CUDA_VISIBLE_DEVICES': '6'},
#{'CUDA_VISIBLE_DEVICES': '7'},
]
)

If the experiment run failed for some reason (e.g. power loss), you can continue running an experiment with scripts/continue_experiment.py, for example with the experiment config from above:
$ python scripts/continue_experiment.py configs/farl_experiment_pretrain_10hp.py experiments/farl_experiment_pretrain_10hp/TIMESTAMP

This will continue processing the incomplete work packages and finish the experiment.
The results of experiments are saved to the experiment folder. Since experiments are based on nested cross validation, the experiment results folder structure mimics this. In the root experiment folder there will be a children directory which contains the topmost cross-validation folds. This is the level at which the final models are created, and each folder in that top-level children directory corresponds to one test dataset fold (i.e. all models under experiments/farl_experiment_pretrain_10hp/TIMESTAMP/children/01 will have the same test data).
Each such folder will in turn have multiple children, corresponding to the nested cross validation (they will be trained with different subsamples of training data vs. development data).
You can automatically extract the final results from these top-level experiments by running:
$ python analysis/collate_ml_predicitons.py experiments/farl_experiment_pretrain_10hp/TIMESTAMP

By default this creates a Comma Separated Values (CSV) file under analysis/annotations/deep_learning_annotations containing subjects, ground truth labels and model predictions. This file can then be used as input to the further analysis.
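The CSV can also be loaded directly for your own analysis, for example with pandas. The file name below is a placeholder for whichever annotation file the collation script produced.

import pandas as pd

# Illustrative sketch: load the collated predictions. The file name is a placeholder;
# use the actual file created under analysis/annotations/deep_learning_annotations.
predictions = pd.read_csv("analysis/annotations/deep_learning_annotations/annotations.csv")

print(predictions.columns.tolist())  # subjects, ground truth labels and model predictions
print(predictions.head())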
To create bootstrapped predictions run:
$ python analysis/make_bootstrap_statistics.py analysis/annotations/deep_learning_annotations

This creates a new file, analysis/bootstraped_statistics/bootstrapped_statistics.json, which contains the bootstrapped predictions per subject from all predictors in an annotation file. Based on this we can then summarize the results with:
$ python analysis/make_performance_table.py analysis/bootstraped_statistics/bootstrapped_statistics.json

This creates the files analysis/results_tables/bootstrapped_results_human_readable_ci.xlsx and analysis/results_tables/bootstrapped_results_separate_ci.xlsx, which summarize the results.
The analysis scripts group predictions based on the annotation CSV file (e.g. the one in analysis/annotations/deep_learning_annotations),
so if you'd like to simulate ensembling predictions you can simply concatenate predictions from different files into new annotation files.
This will lead to all the predictions per subject being part of the bootstrapped predictions, so you'll essentially get the
bootstrapped predictions from the ensemble of models in each annotation file, regardless of whether the annotators in that file are
different models or not.
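A simple way to build such an ensemble annotation file is to concatenate existing ones, for example with pandas. The file names below are placeholders; use whichever annotation files you want to pool.

from pathlib import Path
import pandas as pd

# Illustrative sketch: pool predictions from several annotation files into one new
# annotation file, which the bootstrap script will then treat as a single ensemble.
# The input file names are placeholders.
files = [
    "analysis/annotations/deep_learning_annotations/farl_annotations.csv",
    "analysis/annotations/deep_learning_annotations/resnet_annotations.csv",
]
ensemble = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

out_dir = Path("analysis/annotations/ensemble_annotations")
out_dir.mkdir(parents=True, exist_ok=True)
ensemble.to_csv(out_dir / "ensemble.csv", index=False)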
There are a handful of AI-generated videos in the sora_dataset directory which we can use to try out this framework.
These are not real videos (which should be obvious), but they serve as a good example of how the pipeline works. Below is a full example
run using this data.
N.B. at one stage we will pad this data, essentially repeating the same videos multiple times. This is to avoid making the repository very large by including more than 20 example videos, but it means that the test and training datasets will overlap, so any results you get from this example are not valid.
- Preprocess the example videos:
  $ python scripts/preprocess_data.py sora_dataset configs/preprocess/three_yaw_dataset.py
  $ python scripts/preprocess_data.py sora_dataset configs/preprocess/9_degree_yaw_dataset.py
- Inspect the masked images under sora_dataset/processed/masked_yaws_-90_81_20/ and sora_dataset/processed/masked_yaws_0_90_3/.
- Remove the alpha channel:
  $ python scripts/remove_alpha_channel.py sora_dataset/processed/masked_yaws_-90_81_20/
  $ python scripts/remove_alpha_channel.py sora_dataset/processed/masked_yaws_0_90_3/
- Set up the dataset directories:
  $ mkdir -p dataset/acroface_ml_dataset
  $ ln -s $PWD/sora_dataset/processed/masked_yaws_-90_81_20_no-alpha dataset/acroface_ml_dataset/training_dataset
  $ ln -s $PWD/sora_dataset/processed/masked_yaws_0_90_3_no-alpha dataset/acroface_ml_dataset/test_dataset
- Pad the dataset. It only contains 4 videos, but to do nested cross validation we need at least 20, so for the sake of this example we repeat our images. Only do this for this example; multiple copies of the data invalidate any results!
  $ python scripts/pad_example_data.py dataset/acroface_ml_dataset
- Set up the data splits:
  $ python scripts/setup_data_splits.py configs/farl_experiment_pretrain_10hp.py
- Run the experiments:
  $ python scripts/run_experiment.py configs/densenet_experiment_pretrain_20hp.py
  $ python scripts/run_experiment.py configs/farl_experiment_pretrain_10hp.py
  $ python scripts/run_experiment.py configs/inceptionv3_experiment_pretrain_20hp.py
  $ python scripts/run_experiment.py configs/resnet_experiment_pretrain_20hp.py
- Collate the predictions:
  $ python analysis/collate_ml_predicitons.py experiments
- Compute the bootstrap statistics:
  $ python analysis/make_bootstrap_statistics.py analysis/annotations/deep_learning_annotations
- Create the performance tables:
  $ python analysis/make_performance_table.py analysis/bootstraped_statistics/bootstrapped_statistics.json
Now the results are in the files analysis/results_tables/bootstrapped_results_human_readable_ci.xlsx
and analysis/results_tables/bootstrapped_results_separate_ci.xlsx. Note that these results are invalid: they will very likely show perfect performance since the training datasets perfectly overlap the test datasets.