Automatic CBT Competency Assessment with LLMs
This project explores the use of large language models (LLMs) for automated CTS-R (Cognitive Therapy Scale–Revised) scoring in the context of Cognitive Behavioural Therapy (CBT) interventions, as part of an NIHR-funded project led by Prof. Heather O’Mahen at the University of Exeter (2024-present).
We use llama.cpp for local inference and provide scripts for running experiments, evaluating models, and comparing outputs to human ratings. The code in this repo was written for use on the GPU partition of the University of Exeter's ISCA supercomputer with proprietary data from the 2013 Cobalt Trial (Wiles et al.). Due to the highly sensitive nature of the data, we are unable to release it publicly; however, the code used for the project is presented here in full for the sake of transparency and in the event that someone takes over this project in future.
Features
- Run CTS-R scoring experiments across multiple categories
- Evaluate agreement between LLM-generated and human inter-rater scores
- Flexible experiment configuration via CLI arguments
- Built on llama.cpp for efficient local inference with GGUF models
To install and build the latest version of llama.cpp on ISCA, first clone the GitHub repo into your project directory.
git clone https://github.com/ggml-org/llama.cpp
Next, hook into a GPU node by using either srun --jobid on one of your existing Slurm GPU jobs or srun --partition=gpu to start an interactive job on the GPU partition. We do this so that llama.cpp can detect the GPU hardware and build accordingly.
srun --jobid=12345 --pty bash
or
srun --partition=gpu --account=Research_Project-T116269 --gres=gpu:1 --time=5:00:00 --mem=4G --pty bash
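Once on the node, you can check that the GPU hardware is visible (and will therefore be picked up by the build) with:
nvidia-smi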
Next, load the appropriate CMake and CUDA modules for building llama.cpp (below are the module versions used on ISCA):
module load CMake/3.26.3-GCCcore-12.3.0
module load GCCcore/12.3.0
module load CUDA/12.2.2
Use the following commands to build llama.cpp. Enabling CUDA is highly recommended; however, -DGGML_CUDA can be switched to OFF if you are building for a CPU-only partition (see the variant after the commands below).
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
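For a CPU-only partition, the equivalent build (with CUDA disabled, as mentioned above) is:
cmake -B build -DGGML_CUDA=OFF
cmake --build build --config Release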
llama.cpp should now build successfully in your project directory. Once the build has completed and you have downloaded a model in .gguf format to the models/ folder (see the next section), you can test your install with the following command (run from within llama.cpp/):
./build/bin/llama-cli --model ./models/YOUR_MODEL.gguf --prompt 'What is 1 + 1?'
For further build instructions, see the llama.cpp GitHub repository.
To download models for use in llama.cpp, you will first need to install the Hugging Face CLI tool:
pip install "huggingface_hub[cli]"
Then simply use the hf download command to download the model. For example, to download gpt-oss-120b:
hf download ggml-org/gpt-oss-120b-GGUF --local-dir ./models
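If a repository publishes several quantisations and you only want one, the CLI lets you filter files with --include; the repository name and filename pattern below are illustrative placeholders only:
hf download SOME_ORG/SOME_MODEL-GGUF --include "*Q4_K_M*" --local-dir ./models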
To run experiments you will need to use two scripts:
- sbatch_ctsr.sh, which runs the CTS-R experiments script on ISCA
- run_ctsr.py, which handles the LLM calls and outputs
See also ctsr_experiments.sh for examples of run_ctsr.py calls and the ISCA Slurm modules needed for running CTS-R experiments.
sbatch_ctsr.sh is a standard Slurm job script (a minimal sketch of what such a script looks like is given after the example call below). Feel free to change the parameters to your liking; however, at least 1 GPU is always necessary and 2 are recommended. run_ctsr.py takes a number of CLI arguments to alter the llama.cpp parameters, as well as the CTS-R prompts and models:
usage: run_ctsr.py [-h] [--outdir OUTDIR] [--instruction INSTRUCTION_PROMPT] [--sys SYS_PROMPT] [--cat CAT] [--inter-rater-only] [--test] [--temp [TEMP ...]] --model MODEL [-ngl N_GPU_LAYERS] [--batch-size BATCH_SIZE] [--ctx-size CTX_SIZE] [--seed SEED] [--top-k TOP_K] [--top-p TOP_P] [--min-p MIN_P]
Run llama.cpp for CTS-R assessments
options:
-h, --help show this help message and exit
--outdir OUTDIR name of output directory
--instruction INSTRUCTION_PROMPT
path to instruction prompt
--sys SYS_PROMPT path to system prompt
--cat CAT ctsr category to run (default is 1.Agenda setting and adherence)
--inter-rater-only toggle on/off scoring across all transcripts or exclusively the 9 double-rated ones, default is False to run across all 54 transcripts
--test enables test mode and runs ctsr assessment on 1 single file only
--temp [TEMP ...], -t [TEMP ...]
temperatures to test, accepts any number of args e.g. --temp 0.8 1.0 (default is 0.7)
--model MODEL, -m MODEL
path to model in .gguf format (if the model is split across multiple .gguf files, simply specify the first one and llama.cpp will find the others)
-ngl N_GPU_LAYERS, --gpu-layers N_GPU_LAYERS, --n-gpu-layers N_GPU_LAYERS
number of layers to offload to the GPU (max layers are automatically configured by default)
--batch-size BATCH_SIZE, -b BATCH_SIZE
number of tokens per batch (default=2048, adjust lower if running out of CUDA memory)
--ctx-size CTX_SIZE, -c CTX_SIZE
size of model context in tokens, default is 50,000 to save memory, specify 0 to use full context
--seed SEED set random seed for the model
--top-k TOP_K top-k sampling parameter, default is 40
--top-p TOP_P top-p sampling parameter, default is 0.95
--min-p MIN_P min-p sampling parameter, default is 0.05
Example call
python run_ctsr.py --model ./models/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf --temp 0.8 --cat 1 --outdir 32b-cat1-experiment
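For reference, below is a minimal sketch of the kind of Slurm job script that sbatch_ctsr.sh represents. The partition, account, resource requests, module versions, and paths are assumptions based on the interactive example earlier in this README and will need adjusting to your setup; see ctsr_experiments.sh for the real run_ctsr.py calls and module list.
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --account=Research_Project-T116269
#SBATCH --gres=gpu:2        # at least 1 GPU required, 2 recommended
#SBATCH --time=24:00:00     # illustrative wall time
#SBATCH --mem=32G           # illustrative memory request

# Load the same toolchain used to build llama.cpp (versions as used on ISCA)
module load GCCcore/12.3.0
module load CUDA/12.2.2

# Run one CTS-R experiment (model path and output directory are placeholders)
python run_ctsr.py --model ./models/YOUR_MODEL.gguf --cat 1 --outdir example-experiment
Submit the job with sbatch sbatch_ctsr.sh.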
MIT License, see LICENSE.txt.
For any questions, feel free to contact [email protected].