Automatic CBT Competency Assessment with LLMs
This project explores the use of large language models (LLMs) for automated CTS-R (Cognitive Therapy Scale–Revised) scoring in the context of Cognitive Behavioural Therapy (CBT) interventions, as part of an NIHR-funded project led by Prof. Heather O’Mahen at the University of Exeter (2024-present).
We use llama.cpp for local inference and provide scripts for running experiments, evaluating models, and comparing outputs to human ratings. The code in this repo was written for use on the GPU partition of the University of Exeter's ISCA supercomputer with proprietary data from the 2013 Cobalt Trial (Wiles et al.). Due to the highly sensitive nature of the data, we are unable to release it publicly; however, the code used for the project is presented here in full for the sake of transparency and in the event that someone takes over this project in future.
Features
- Run CTS-R scoring experiments across multiple categories
- Evaluate agreement between LLM-generated and human inter-rater scores
- Flexible experiment configuration via CLI arguments
- Built on llama.cpp for efficient local inference with GGUF models
To install and build the latest version of llama.cpp on ISCA, first clone the GitHub repo into your project directory.
git clone https://github.com/ggml-org/llama.cpp
Next, hook into a GPU node by using either srun --jobid on one of your existing Slurm GPU jobs or srun --partition=gpu to start an interactive job on the GPU partition. We do this so that llama.cpp can detect the GPU hardware and build accordingly.
srun --jobid=12345 --pty bash
or
srun --partition=gpu --account=Research_Project-T116269 --gres=gpu:1 --time=5:00:00 --mem=4G --pty bash
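Once on the node, you can check that the GPU hardware is visible (and will therefore be picked up by the build) with:
nvidia-smi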
Next, load the appropriate CMake and CUDA modules for building llama.cpp (below are the module versions used on ISCA):
module load CMake/3.26.3-GCCcore-12.3.0
module load GCCcore/12.3.0
module load CUDA/12.2.2
Use the following commands to build llama.cpp. Enabling CUDA is highly recommended; however, -DGGML_CUDA can be switched to OFF if you are building for a CPU-only partition (see the variant after the commands below).
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
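For a CPU-only partition, the equivalent build (with CUDA disabled, as mentioned above) is:
cmake -B build -DGGML_CUDA=OFF
cmake --build build --config Release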
llama.cpp should now build successfully in your project directory. Once the build has completed and you have downloaded a model in .gguf format to the models/ folder (see the next section), you can test your install with the following command (run from within llama.cpp/):
./build/bin/llama-cli --model ./models/YOUR_MODEL.gguf --prompt 'What is 1 + 1?'
For further build instructions, see the llama.cpp GitHub repository.
To download models for use in llama.cpp, you will first need to install the Hugging Face CLI tool:
pip install "huggingface_hub[cli]"
Then simply use the hf download command to download the model. For example, to download gpt-oss-120b:
hf download ggml-org/gpt-oss-120b-GGUF --local-dir ./models
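If a repository publishes several quantisations and you only want one, the CLI lets you filter files with --include; the repository name and filename pattern below are illustrative placeholders only:
hf download SOME_ORG/SOME_MODEL-GGUF --include "*Q4_K_M*" --local-dir ./models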
To run experiments you will need to use two scripts:
- sbatch_ctsr.sh, which runs the CTS-R experiments script on ISCA
- run_ctsr.py, which handles the LLM calls and outputs
See also ctsr_experiments.sh for examples of run_ctsr.py calls and the ISCA Slurm modules needed for running CTS-R experiments.
sbatch_ctsr.sh is a standard Slurm job script (a minimal sketch of what such a script looks like is given after the example call below). Feel free to change the parameters to your liking; however, at least 1 GPU is always necessary and 2 are recommended. run_ctsr.py takes a number of CLI arguments to alter the llama.cpp parameters, as well as the CTS-R prompts and models:
usage: run_ctsr.py [-h] [--outdir OUTDIR] [--instruction INSTRUCTION_PROMPT] [--sys SYS_PROMPT] [--cat CAT] [--inter-rater-only] [--test] [--temp [TEMP ...]] --model MODEL [-ngl N_GPU_LAYERS] [--batch-size BATCH_SIZE] [--ctx-size CTX_SIZE] [--seed SEED] [--top-k TOP_K] [--top-p TOP_P] [--min-p MIN_P]
Run llama.cpp for CTS-R assessments
options:
-h, --help show this help message and exit
--outdir OUTDIR name of output directory
--instruction INSTRUCTION_PROMPT
path to instruction prompt
--sys SYS_PROMPT path to system prompt
--cat CAT ctsr category to run (default is 1.Agenda setting and adherence)
--inter-rater-only toggle on/off scoring across all transcripts or exclusively the 9 double-rated ones, default is False to run across all 54 transcripts
--test enables test mode and runs ctsr assessment on 1 single file only
--temp [TEMP ...], -t [TEMP ...]
temperatures to test, accepts any number of args e.g. --temp 0.8 1.0 (default is 0.7)
--model MODEL, -m MODEL
path to model in .gguf format (if the model is split across multiple .gguf files, simply specify the first one and llama.cpp will find the others)
-ngl N_GPU_LAYERS, --gpu-layers N_GPU_LAYERS, --n-gpu-layers N_GPU_LAYERS
number of layers to offload to the GPU (max layers are automatically configured by default)
--batch-size BATCH_SIZE, -b BATCH_SIZE
number of tokens per batch (default=2048, adjust lower if running out of CUDA memory)
--ctx-size CTX_SIZE, -c CTX_SIZE
size of model context in tokens, default is 50,000 to save memory, specify 0 to use full context
--seed SEED set random seed for the model
--top-k TOP_K top-k sampling parameter, default is 40
--top-p TOP_P top-p sampling parameter, default is 0.95
--min-p MIN_P min-p sampling parameter, default is 0.05
Example call
python run_ctsr.py --model ./models/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf --temp 0.8 --cat 1 --outdir 32b-cat1-experiment
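For reference, below is a minimal sketch of the kind of Slurm job script that sbatch_ctsr.sh represents. The partition, account, resource requests, module versions, and paths are assumptions based on the interactive example earlier in this README and will need adjusting to your setup; see ctsr_experiments.sh for the real run_ctsr.py calls and module list.
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --account=Research_Project-T116269
#SBATCH --gres=gpu:2        # at least 1 GPU required, 2 recommended
#SBATCH --time=24:00:00     # illustrative wall time
#SBATCH --mem=32G           # illustrative memory request

# Load the same toolchain used to build llama.cpp (versions as used on ISCA)
module load GCCcore/12.3.0
module load CUDA/12.2.2

# Run one CTS-R experiment (model path and output directory are placeholders)
python run_ctsr.py --model ./models/YOUR_MODEL.gguf --cat 1 --outdir example-experiment
Submit the job with sbatch sbatch_ctsr.sh.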
MIT License, see LICENSE.txt.
For any questions, feel free to contact [email protected].