Skip to content

liranringel/learning-continue-thinking-token

Repository files navigation

Introduction

This repository contains all the code necessary to reproduce the results from the paper: "Learning a Continue-Thinking Token for Enhanced Test-Time Scaling".


Training Instructions

  1. Clone and install the Open-R1 repository.
  2. Enter the trl-lib directory and install it:
    cd trl-lib
    pip install -e .
  3. Launch the training process using the script:
    scripts/launch_vllms_and_train.sh
    This script will save the embedding of the new token in the scripts/ directory.

Evaluation

  1. Enter the evaluation harness directory and install dependencies:
    cd eval/lm-evaluation-harness
    pip install -e .[math,vllm]
  2. Use the save_model.py script to load the trained token weights into a model and save the resulting model locally.
  3. Modify eval/commands.py to configure the desired evaluation tasks.
  4. Run the experiments using the script:
    eval/submit_all_sbatch.sh
  5. Evaluate the results using the LLM:
    • Start the LLM evaluator via vLLM:
      vllm serve Qwen/Qwen2.5-7B-Instruct
    • Run the script to generate results and visualizations (as shown in the paper):
      eval/process_results.py
      This script is intended to be used interactively within VS Code.

About

Learning a Continue-Thinking Token for Enhanced Test-Time Scaling

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published