This repository contains all the code necessary to reproduce the results from the paper: "Learning a Continue-Thinking Token for Enhanced Test-Time Scaling".
- Clone and install the Open-R1 repository.
- Enter the `trl-lib` directory and install it: `cd trl-lib && pip install -e .`
- Launch the training process using the script `scripts/launch_vllms_and_train.sh`. This script will save the embedding of the new token in the `scripts/` directory.
- Enter the evaluation harness directory and install dependencies: `cd eval/lm-evaluation-harness && pip install -e ".[math,vllm]"`
- Use the `save_model.py` script to load the trained token weights into a model and save the resulting model locally (a minimal sketch of this step is shown after the list).
- Modify `eval/commands.py` to configure the desired evaluation tasks (see the example after the list).
- Run the experiments using the script `eval/submit_all_sbatch.sh`.
- Evaluate the results using the LLM evaluator:
  - Start the LLM evaluator via vLLM: `vllm serve Qwen/Qwen2.5-7B-Instruct`
  - Run the script `eval/process_results.py` to generate the results and visualizations shown in the paper. This script is intended to be used interactively within VS Code (see the evaluator-query sketch after the list).
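
The snippet below is a minimal sketch of what the `save_model.py` step amounts to, assuming the training script saved the new token's embedding as a single tensor. The file path, token string, base model name, and output directory are placeholders, not the repository's actual values.

```python
# Hypothetical sketch: merge a trained continue-thinking token embedding into a base model.
# Paths, the token string, and the checkpoint format are assumptions, not the repo's layout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"                  # assumed base model
EMBEDDING_PATH = "scripts/continue_token_embedding.pt"   # assumed file written by the training script
NEW_TOKEN = "<|continue|>"                               # assumed token string
OUTPUT_DIR = "models/with_continue_token"                # where the merged model is written

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

# Register the new token and grow the embedding matrix by one row.
tokenizer.add_special_tokens({"additional_special_tokens": [NEW_TOKEN]})
model.resize_token_embeddings(len(tokenizer))

# Copy the trained embedding vector into the new token's row.
trained_embedding = torch.load(EMBEDDING_PATH, map_location="cpu")
token_id = tokenizer.convert_tokens_to_ids(NEW_TOKEN)
with torch.no_grad():
    model.get_input_embeddings().weight[token_id] = trained_embedding.to(model.dtype)

# Save the merged model and tokenizer for evaluation.
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
```

Since only the new token's embedding is trained, merging it back in this way leaves every other weight of the base model untouched.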
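For the `eval/commands.py` step, a hypothetical configuration might look like the following. The actual file's structure, task names, and model paths in the repository may differ; only the `lm_eval` command-line flags shown are part of the lm-evaluation-harness interface.

```python
# Hypothetical illustration only: the real eval/commands.py may be structured differently.
# The model path matches the OUTPUT_DIR assumed in the previous sketch.
MODELS = [
    "models/with_continue_token",
]

TASKS = [
    "minerva_math",   # example lm-evaluation-harness task names; adjust to the tasks from the paper
    "gsm8k",
]

def build_commands():
    """Build one lm_eval command line per (model, task) pair."""
    commands = []
    for model in MODELS:
        for task in TASKS:
            commands.append(
                f"lm_eval --model vllm --model_args pretrained={model} "
                f"--tasks {task} --batch_size auto"
            )
    return commands
```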
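As a rough illustration of the LLM-evaluator step, the snippet below queries the model served by `vllm serve` through its OpenAI-compatible endpoint. The judging prompt and scoring logic are assumptions and not necessarily how `eval/process_results.py` grades answers; only the endpoint (`http://localhost:8000/v1`) is standard vLLM behavior.

```python
# Sketch of querying the LLM evaluator served by vLLM (OpenAI-compatible API).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def judge_equivalence(question: str, gold_answer: str, model_answer: str) -> bool:
    """Ask the evaluator whether the model's final answer matches the reference."""
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {gold_answer}\n"
        f"Model answer: {model_answer}\n"
        "Are the two answers mathematically equivalent? Reply with YES or NO."
    )
    response = client.chat.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
        max_tokens=8,
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")
```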