This repository contains all the code necessary to reproduce the results from the paper: "Learning a Continue-Thinking Token for Enhanced Test-Time Scaling".
- Clone and install the Open-R1 repository.
- Enter the `trl-lib` directory and install it: `cd trl-lib && pip install -e .`
- Launch the training process using the script `scripts/launch_vllms_and_train.sh`. This script will save the embedding of the new token in the `scripts/` directory.
- Enter the evaluation harness directory and install dependencies: `cd eval/lm-evaluation-harness && pip install -e ".[math,vllm]"`
- Use the `save_model.py` script to load the trained token weights into a model and save the resulting model locally (a minimal sketch of this step is shown after the list).
- Modify `eval/commands.py` to configure the desired evaluation tasks (see the example after the list).
- Run the experiments using the script `eval/submit_all_sbatch.sh`.
- Evaluate the results using the LLM evaluator:
  - Start the LLM evaluator via vLLM: `vllm serve Qwen/Qwen2.5-7B-Instruct`
  - Run the script `eval/process_results.py` to generate the results and visualizations shown in the paper. This script is intended to be used interactively within VS Code (see the evaluator-query sketch after the list).
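
The snippet below is a minimal sketch of what the `save_model.py` step amounts to, assuming the training script saved the new token's embedding as a single tensor. The file path, token string, base model name, and output directory are placeholders, not the repository's actual values.

```python
# Hypothetical sketch: merge a trained continue-thinking token embedding into a base model.
# Paths, the token string, and the checkpoint format are assumptions, not the repo's layout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"                  # assumed base model
EMBEDDING_PATH = "scripts/continue_token_embedding.pt"   # assumed file written by the training script
NEW_TOKEN = "<|continue|>"                               # assumed token string
OUTPUT_DIR = "models/with_continue_token"                # where the merged model is written

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

# Register the new token and grow the embedding matrix by one row.
tokenizer.add_special_tokens({"additional_special_tokens": [NEW_TOKEN]})
model.resize_token_embeddings(len(tokenizer))

# Copy the trained embedding vector into the new token's row.
trained_embedding = torch.load(EMBEDDING_PATH, map_location="cpu")
token_id = tokenizer.convert_tokens_to_ids(NEW_TOKEN)
with torch.no_grad():
    model.get_input_embeddings().weight[token_id] = trained_embedding.to(model.dtype)

# Save the merged model and tokenizer for evaluation.
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
```

Since only the new token's embedding is trained, merging it back in this way leaves every other weight of the base model untouched.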
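For the `eval/commands.py` step, a hypothetical configuration might look like the following. The actual file's structure, task names, and model paths in the repository may differ; only the `lm_eval` command-line flags shown are part of the lm-evaluation-harness interface.

```python
# Hypothetical illustration only: the real eval/commands.py may be structured differently.
# The model path matches the OUTPUT_DIR assumed in the previous sketch.
MODELS = [
    "models/with_continue_token",
]

TASKS = [
    "minerva_math",   # example lm-evaluation-harness task names; adjust to the tasks from the paper
    "gsm8k",
]

def build_commands():
    """Build one lm_eval command line per (model, task) pair."""
    commands = []
    for model in MODELS:
        for task in TASKS:
            commands.append(
                f"lm_eval --model vllm --model_args pretrained={model} "
                f"--tasks {task} --batch_size auto"
            )
    return commands
```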
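As a rough illustration of the LLM-evaluator step, the snippet below queries the model served by `vllm serve` through its OpenAI-compatible endpoint. The judging prompt and scoring logic are assumptions and not necessarily how `eval/process_results.py` grades answers; only the endpoint (`http://localhost:8000/v1`) is standard vLLM behavior.

```python
# Sketch of querying the LLM evaluator served by vLLM (OpenAI-compatible API).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def judge_equivalence(question: str, gold_answer: str, model_answer: str) -> bool:
    """Ask the evaluator whether the model's final answer matches the reference."""
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {gold_answer}\n"
        f"Model answer: {model_answer}\n"
        "Are the two answers mathematically equivalent? Reply with YES or NO."
    )
    response = client.chat.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
        max_tokens=8,
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")
```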