uoh-rislab/event-based_facial_expression_recognition

Event-based Facial Expression Recognition

Vision Transformer (ViT) for Facial Expression Recognition with Event-Based Cameras

This repository contains a training script (vit_train.py) that trains a Vision Transformer (vit_b_16) on grayscale frames generated from an event-based camera. The goal is to classify facial expressions in the CK+ dataset, which is preprocessed into frame sequences.

Dataset Structure

The dataset must be organized as follows:

output/e-ck+_frames_process_100fps/
├── Train_Set/
│   ├── 0/  # Class: Anger
│   ├── 1/  # Class: Contempt
│   ├── 2/  # Class: Disgust
│   ├── 3/  # Class: Fear
│   ├── 4/  # Class: Happy
│   ├── 5/  # Class: Sadness
│   └── 6/  # Class: Surprise
└── Test_Set/
    ├── 0/
    ├── 1/
    └── ...

Each subfolder contains .png grayscale event frames corresponding to a specific facial expression class.
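
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of vit_train.py: the helper name count_frames and the CLASS_NAMES mapping are assumptions for illustration, built only from the class indices and names listed in the tree above.

```python
from pathlib import Path

# Class indices and names as listed in the directory tree above.
CLASS_NAMES = {
    "0": "Anger", "1": "Contempt", "2": "Disgust", "3": "Fear",
    "4": "Happy", "5": "Sadness", "6": "Surprise",
}


def count_frames(split_dir):
    """Return {class_name: number_of_png_files} for one split directory.

    Hypothetical helper: raises FileNotFoundError if a class folder is missing.
    """
    split_dir = Path(split_dir)
    counts = {}
    for index, name in CLASS_NAMES.items():
        class_dir = split_dir / index
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        # Count only .png files, ignoring anything else in the folder.
        counts[name] = sum(
            1 for p in class_dir.iterdir() if p.suffix.lower() == ".png"
        )
    return counts
```

Calling count_frames("output/e-ck+_frames_process_100fps/Train_Set") returns the number of frames per class, which also makes any class imbalance visible before training starts.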

Installation

Install Python 3.7 or later, then install the required dependencies:

pip install torch torchvision tqdm matplotlib seaborn scikit-learn tensorboard
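
To check that the installation succeeded, one option is to test whether each package can be located without importing it. This is a small sketch using only the standard library. The REQUIRED mapping and the missing_packages helper are illustrative names, not part of the repository. Note that scikit-learn is imported as sklearn.

```python
import importlib.util

# pip package name -> import name (only scikit-learn differs).
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "tqdm": "tqdm",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "tensorboard": "tensorboard",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```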

Training

Run the training script:

python vit_train.py

A result directory will be automatically created under:

results/vit_e-ckplus_100fps_<timestamp>/
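
The directory name can be reproduced as follows. This is a sketch of one way to build it, assuming the YYYYMMDD_HHMMSS timestamp format shown in the output example; the function name make_run_dir and its parameters are hypothetical, and the script itself may construct the path differently.

```python
from datetime import datetime
from pathlib import Path


def make_run_dir(base="results", prefix="vit_e-ckplus_100fps", now=None):
    """Create and return <base>/<prefix>_<YYYYMMDD_HHMMSS>/ (hypothetical helper)."""
    now = now or datetime.now()
    run_dir = Path(base) / f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}"
    # exist_ok=False so that two runs never silently share a directory.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

Because the timestamp runs from year down to second, sorting run directories by name also sorts them chronologically.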

This directory will contain:

  • best_model_vit.pth: Best model based on validation accuracy
  • train_loss.png, accuracy.png: Training/validation plots
  • confusion_matrix_best_model.png: Final normalized confusion matrix
  • Raw and normalized .txt confusion matrix files
  • TensorBoard logs
  • Training metrics and hyperparameters in .txt and .json formats
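
The difference between the raw and normalized confusion matrices can be illustrated with a short sketch. It assumes normalization by row, so that each row (true class) sums to 1; the README does not state which axis the script normalizes over, and normalize_rows is an illustrative name.

```python
def normalize_rows(matrix):
    """Divide each row of a raw count matrix by its row total.

    Assumes rows are true classes. Rows with no samples are left as zeros
    to avoid dividing by zero.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


raw = [[3, 1], [0, 0]]
print(normalize_rows(raw))  # [[0.75, 0.25], [0.0, 0.0]]
```

With row normalization, the diagonal entries are the per-class recall. scikit-learn's confusion_matrix function offers the same behavior through its normalize="true" argument.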

Hyperparameters

  • Model: vit_b_16
  • Input size: 224x224 pixels (grayscale frames are converted to RGB when pretrained weights are used)
  • Pretrained: ImageNet (use_pretrained=True)
  • Optimizer: AdamW
  • Learning rate: 1e-4
  • Epochs: 20
  • Batch size: 4
  • Augmentations: Random crop, flip, rotation
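
The settings above could be recorded in a JSON file such as hyperparameters.json along these lines. This is only a sketch: the key names (other than use_pretrained) and the save_hyperparameters helper are assumptions, and the file written by vit_train.py may use a different schema.

```python
import json

# Values taken from the list above; key names are illustrative.
HYPERPARAMETERS = {
    "model": "vit_b_16",
    "input_size": [224, 224],
    "use_pretrained": True,
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "epochs": 20,
    "batch_size": 4,
    "augmentations": ["random_crop", "flip", "rotation"],
}


def save_hyperparameters(path, params=HYPERPARAMETERS):
    """Write the hyperparameters to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f, indent=2)
```

Storing the configuration next to the checkpoint makes it possible to tell later which settings produced a given best_model_vit.pth.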

TensorBoard

Launch TensorBoard to monitor training:

tensorboard --logdir results/vit_e-ckplus_100fps_<timestamp>/tensorboard
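
To avoid typing the timestamp by hand, the most recent run directory can be looked up first. The sketch below assumes run directories follow the naming pattern shown above; latest_run_dir is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def latest_run_dir(base="results", prefix="vit_e-ckplus_100fps_"):
    """Return the newest run directory under base, or None if there is none.

    Relies on the timestamp suffix sorting chronologically by name.
    """
    base = Path(base)
    if not base.is_dir():
        return None
    runs = sorted(
        (p for p in base.iterdir() if p.is_dir() and p.name.startswith(prefix)),
        key=lambda p: p.name,
    )
    return runs[-1] if runs else None
```

The returned path, with tensorboard appended, can then be passed to the --logdir option.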

Output Example

results/
└── vit_e-ckplus_100fps_YYYYMMDD_HHMMSS/
    ├── best_model_vit.pth
    ├── train_loss.txt
    ├── train_acc.txt
    ├── val_acc.txt
    ├── accuracy.png
    ├── train_loss.png
    ├── confusion_matrix_best_model.png
    ├── confusion_matrix_raw.txt
    ├── confusion_matrix_normalized.txt
    ├── hyperparameters.json
    ├── model_architecture.txt
    └── tensorboard/

License

This project is intended for academic and research use.
