ai-ctsr-eval

Automatic CBT Competency Assessment with LLMs

This project explores the use of large language models (LLMs) for automated CTS-R (Cognitive Therapy Scale–Revised) scoring in the context of Cognitive Behavioural Therapy (CBT) interventions, as part of an NIHR-funded project led by Prof. Heather O’Mahen at the University of Exeter (2024–present).

We use llama.cpp for local inference and provide scripts for running experiments, evaluating models, and comparing outputs to human ratings. The code in this repo was written for the GPU partition of the University of Exeter's ISCA supercomputer, using proprietary data from the 2013 Cobalt Trial (Wiles et al.). Due to the highly sensitive nature of the data we are unable to release it publicly; however, the code used for the project is presented here in full for the sake of posterity and in the event that someone takes over this project in future.

Features

  • Run CTS-R scoring experiments across multiple categories
  • Evaluate agreement between LLM-generated and human inter-rater scores
  • Flexible experiment configuration via CLI arguments
  • Built on llama.cpp for efficient local inference with GGUF models
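To make the agreement evaluation concrete, here is a minimal sketch of comparing LLM-generated and human CTS-R item scores (the `score_agreement` helper and its metrics are hypothetical illustrations, not the repo's actual implementation):

```python
from statistics import mean

def score_agreement(llm_scores, human_scores):
    """Mean absolute difference and Pearson r between two raters.

    Hypothetical helper for illustration; assumes both lists have the
    same length and neither rater gives a constant score.
    """
    mad = mean(abs(a - b) for a, b in zip(llm_scores, human_scores))
    mx, my = mean(llm_scores), mean(human_scores)
    cov = sum((a - mx) * (b - my) for a, b in zip(llm_scores, human_scores))
    var_x = sum((a - mx) ** 2 for a in llm_scores)
    var_y = sum((b - my) ** 2 for b in human_scores)
    return mad, cov / (var_x * var_y) ** 0.5

# Perfectly correlated raters who disagree by a constant offset:
# mad == 1, r == 1.0
mad, r = score_agreement([1, 2, 3, 4], [2, 3, 4, 5])
```

The two numbers capture complementary failure modes: a systematic offset inflates the mean absolute difference while leaving the correlation untouched.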

Installing llama.cpp on ISCA

To install and build the latest version of llama.cpp on ISCA, first clone the GitHub repo into your project directory.

git clone https://github.com/ggml-org/llama.cpp

Next, hook into a GPU node using either srun --jobid on one of your existing Slurm GPU jobs or srun --partition=gpu to start an interactive job on a GPU node. We do this so that llama.cpp can detect the GPU hardware and build accordingly.

srun --jobid=12345 --pty bash

or

srun --partition=gpu --account=Research_Project-T116269 --gres=gpu:1 --time=5:00:00 --mem=4G --pty bash

Next, load the appropriate CMake and CUDA modules for building llama.cpp (below are the module versions used on ISCA):

module load CMake/3.26.3-GCCcore-12.3.0
module load GCCcore/12.3.0
module load CUDA/12.2.2

Building llama.cpp

Use the following commands to build llama.cpp. Enabling CUDA is highly recommended; however, -DGGML_CUDA can be switched to OFF if you are building on a CPU-only partition.

cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

llama.cpp should now build successfully in your project directory. Once the build has completed and you have downloaded a model in .gguf format to the models/ folder, you can test your install with the following command (from within llama.cpp/):

./build/bin/llama-cli --model ./models/YOUR_MODEL.gguf --prompt 'What is 1 + 1?'  

For further build instructions, see the llama.cpp GitHub repository.

Downloading models for llama.cpp

To download models for use with llama.cpp, you will first need to install the Hugging Face CLI tool:

pip install huggingface_hub[cli]

Then simply use the hf download command to download the model. For example, to download gpt-oss-120B:

hf download ggml-org/gpt-oss-120b-GGUF --local-dir ./models

Running experiments

To run experiments you will need to use two scripts:

  1. sbatch_ctsr.sh, which runs the CTS-R experiments script on ISCA, and
  2. run_ctsr.py, which handles the LLM calls and outputs

See also ctsr_experiments.sh for examples of run_ctsr.py calls and the ISCA slurm modules needed for running CTS-R experiments.

sbatch_ctsr.sh is a standard Slurm job script. Feel free to change the parameters to your liking; however, at least 1 GPU is always necessary and 2 are recommended. run_ctsr.py takes a number of CLI arguments to alter the llama.cpp parameters, as well as the CTS-R prompts and models:

usage: run_ctsr.py [-h] [--outdir OUTDIR] [--instruction INSTRUCTION_PROMPT] [--sys SYS_PROMPT] [--cat CAT] [--inter-rater-only] [--test] [--temp [TEMP ...]] --model MODEL [-ngl N_GPU_LAYERS] [--batch-size BATCH_SIZE] [--ctx-size CTX_SIZE] [--seed SEED] [--top-k TOP_K]
                   [--top-p TOP_P] [--min-p MIN_P]

Run llama.cpp for CTS-R assessments

options:
  -h, --help            show this help message and exit
  --outdir OUTDIR       name of output directory
  --instruction INSTRUCTION_PROMPT
                        path to instruction prompt
  --sys SYS_PROMPT      path to system prompt
  --cat CAT             ctsr category to run (default is 1.Agenda setting and adherence)
  --inter-rater-only    score only the 9 double-rated transcripts rather than all 54; default is False (scores all 54 transcripts)
  --test                enables test mode and runs ctsr assessment on 1 single file only
  --temp [TEMP ...], -t [TEMP ...]
                        temperatures to test, accepts any number of args e.g. --temp 0.8 1.0 (default is 0.7)
  --model MODEL, -m MODEL
                        path to model in .gguf format (if the model is split across multiple .gguf files, simply specify the first one and llama.cpp will find the others)
  -ngl N_GPU_LAYERS, --gpu-layers N_GPU_LAYERS, --n-gpu-layers N_GPU_LAYERS
                        number of layers to offload to the GPU (max layers are automatically configured by default)
  --batch-size BATCH_SIZE, -b BATCH_SIZE
                        number of tokens per batch (default=2048, adjust lower if running out of CUDA memory)
  --ctx-size CTX_SIZE, -c CTX_SIZE
                        size of model context in tokens, default is 50,000 to save memory, specify 0 to use full context
  --seed SEED           set random seed for the model
  --top-k TOP_K         top-k sampling parameter, default is 40
  --top-p TOP_P         top-p sampling parameter, default is 0.95
  --min-p MIN_P         min-p sampling parameter, default is 0.05
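The multi-value --temp flag and the defaults above behave like standard argparse options. A minimal sketch of how a parser with these semantics could be declared (a hypothetical re-creation of a few options, not run_ctsr.py's actual code):

```python
import argparse

# Hypothetical re-creation of a few of run_ctsr.py's options;
# the real script may declare them differently.
parser = argparse.ArgumentParser(description="Run llama.cpp for CTS-R assessments")
parser.add_argument("--model", "-m", required=True,
                    help="path to model in .gguf format")
parser.add_argument("--temp", "-t", nargs="*", type=float, default=[0.7],
                    help="temperatures to test, e.g. --temp 0.8 1.0")
parser.add_argument("--inter-rater-only", action="store_true",
                    help="score only the 9 double-rated transcripts")
parser.add_argument("--ctx-size", "-c", type=int, default=50_000,
                    help="context size in tokens; 0 uses the full context")

# Multiple temperatures are collected into a single list of floats.
args = parser.parse_args(["--model", "models/example.gguf", "--temp", "0.8", "1.0"])
print(args.temp)  # [0.8, 1.0]
```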

Example call

python run_ctsr.py --model ./models/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf --temp 0.8 --cat 1 --outdir 32b-cat1-experiment
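For orientation, a hypothetical sketch of what an sbatch_ctsr.sh-style submission script might contain; the resource values are illustrative only, and the account name is taken from the srun example above:

```shell
#!/bin/bash
# Illustrative Slurm job script (not the repo's actual sbatch_ctsr.sh);
# adjust partition, time, and memory to your allocation.
#SBATCH --partition=gpu
#SBATCH --account=Research_Project-T116269
#SBATCH --gres=gpu:2            # at least 1 GPU is required, 2 recommended
#SBATCH --time=12:00:00
#SBATCH --mem=64G

# modules used on ISCA (see the build instructions above)
module load GCCcore/12.3.0
module load CUDA/12.2.2

python run_ctsr.py --model ./models/YOUR_MODEL.gguf --outdir my-experiment
```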

License

MIT License, see LICENSE.txt.

For any questions feel free to contact [email protected].
