TongSearch QR (previously known as TongSearch Reasoner) is the first family of query reasoning models adapted for reasoning-intensive retrieval tasks; "QR" is short for "query reasoning". It currently includes TongSearch-QR-1.5B, TongSearch-QR-3B, and TongSearch-QR-7B.
When equipped with standard retrievers, TongSearch-QR surpasses GPT-4 on BRIGHT (left figure) and achieves performance comparable to state-of-the-art large reasoning models, such as DeepSeek-R1 and QwQ-32B, while maintaining superior efficiency (right figure).
- TongSearch-QR-1.5B 🧠 Size: 1.5B 📏 Sequence Length: 32K
- TongSearch-QR-3B 🧠 Size: 3B 📏 Sequence Length: 32K
- TongSearch-QR-7B 🧠 Size: 7B 📏 Sequence Length: 32K
Install dependencies:
❯ git clone https://github.com/bigai-nlco/TongSearch-QR
❯ cd TongSearch-QR
❯ conda env create -f environment.yml
❯ conda activate tongsearch

The following code snippet illustrates how to use TongSearch-QR-7B to generate reasoning content for a given question.
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "TongSearch/TongSearch-QR-7B"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
question = (
    "Why can't humans drink sea water? It would seem to be a huge evolutionary advantage for an animal "
    "to be able to drink sea water rather than have to rely on fresh water, and it's provably not impossible "
    "due to the existence of sea mammals that must filter the sea water somehow. Could someone explain to me any likely reasons that this is the case?"
)
prompt = (
    f"Instructions:\n"
    f"1. Identify the essential problem.\n"
    f"2. Think step by step to reason and describe what information could be relevant and helpful to address the questions in detail.\n"
    f"3. Draft an answer with as many thoughts as you have.\n"
    f"Query: {question}\n\n"
)
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)You can directly use our sampled training questions for V1 and V2 training datasets in:
You can directly use our sampled training questions for the V1 and V2 training datasets in:

data_train/raw_data/V1.json
data_train/raw_data/V2.json

Alternatively, you can randomly sample questions from the HuggingFace H4 Stack Exchange Preferences dataset by:
❯ python ./scripts/data_process/sample_stackexchange.py --max-per-domain 1200 --save-path my_sample.json
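For reference, the sketch below shows roughly what such per-domain sampling can look like when done by hand with the datasets library. The field names ("question", "metadata"), the way the domain is parsed from the metadata URL, and the prefix-scan "sampling" are all assumptions made for illustration; sample_stackexchange.py is the authoritative implementation.

# Rough illustration of per-domain sampling from the H4 Stack Exchange
# Preferences dataset. ASSUMPTIONS: the "question" and "metadata" fields and
# the URL layout used to infer the domain; this scans a stream prefix rather
# than sampling uniformly at random, unlike the repo script.
import json
from collections import defaultdict
from datasets import load_dataset

MAX_PER_DOMAIN = 1200
stream = load_dataset(
    "HuggingFaceH4/stack-exchange-preferences", split="train", streaming=True
)

sampled = defaultdict(list)
for i, row in enumerate(stream):
    if i >= 200_000:  # only scan a prefix of the stream in this sketch
        break
    url = row["metadata"][0] if row.get("metadata") else ""
    # e.g. "https://academia.stackexchange.com/questions/..." -> "academia"
    domain = url.split("//")[-1].split(".")[0] if url else "unknown"
    if len(sampled[domain]) < MAX_PER_DOMAIN:
        sampled[domain].append({"domain": domain, "question": row["question"]})

with open("my_sample.json", "w") as f:
    json.dump([q for qs in sampled.values() for q in qs], f, indent=2)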
We release the training questions with their reasoned results (for V1) / answers (for V2) in:

data_train/raw_data/V1-R1.json
data_train/raw_data/V1-QwQ.json
data_train/raw_data/V2.json

You can find their corresponding dataset files in:
data_train/built_dataset/V1-R1_no_think
data_train/built_dataset/V1-QwQ_no_think
data_train/built_dataset/V2_no_think

You can also transform your own reasoned data into the dataset format for subsequent training:
❯ python ./scripts/data_process/build_dataset.py \
--input data_train/raw_data/V1-LRM.json \
--output-dir data_train/built_dataset \
--output-name V1-LRM_no_think

Our training process uses GRPO as implemented in Hugging Face TRL. First, launch the inference server for rollouts:
❯ bash scripts/train/run_server.sh

Then, start the training process by:
❯ bash scripts/train/run_train.sh

Note: make sure that the server uses different GPUs than the trainer; otherwise you may run into NCCL errors.
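If you are new to TRL's GRPO interface, the sketch below shows the rough shape of such a run, assuming a recent TRL version that provides GRPOTrainer. It is not the repository's training recipe (scripts/train/run_train.sh remains the reference): the base model name, the two-example in-memory dataset, and the length-based reward are placeholders; the real reward used for TongSearch-QR is defined in the training scripts, not here.

# Minimal shape of a GRPO run with Hugging Face TRL. This is NOT the repo's
# training script; model name, dataset, and reward below are placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a "prompt" column; these toy prompts stand in for the
# built datasets under data_train/built_dataset/.
train_dataset = Dataset.from_list([
    {"prompt": "Rewrite the query with step-by-step reasoning: Why can't humans drink sea water?"},
    {"prompt": "Rewrite the query with step-by-step reasoning: Why is the sky blue?"},
])

def toy_reward(completions, **kwargs):
    # Placeholder reward: mildly prefers longer reasoning, capped at 200 words.
    # The actual TongSearch-QR reward lives in the training scripts.
    return [min(len(c.split()), 200) / 200.0 for c in completions]

args = GRPOConfig(output_dir="outputs/grpo_sketch")
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # placeholder base model
    reward_funcs=toy_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()

In the actual setup, rollouts are served by the inference server launched with run_server.sh, and the training prompts are expected to come from the built datasets described above.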
We provide a group of scripts for evaluating the overall performance on the BRIGHT benchmark. After installing the dependencies required by the BRIGHT benchmark's evaluation scripts (see here), you may either:
Use our reasoned results saved in:
data_evaluation/TongSearch-QR-1.5B_reasoned_BRIGHT_queries.json
data_evaluation/TongSearch-QR-7B_reasoned_BRIGHT_queries.json

or produce new reasoning results by:
❯ export CUDA_VISIBLE_DEVICES=0,1
❯ python scripts/evaluation/question_rewriting.py \
--model_path TongSearch/TongSearch-QR-1.5B \
--output_file data_evaluation/TongSearch-QR-1.5B_reasoned_results.json

Then, you can run the following script to evaluate with the reasoned queries:
❯ bash scripts/evaluation/run_bright_eval.sh \
bm25_with_reasoner \
data_evaluation/TongSearch-QR-1.5B_reasoned_BRIGHT_queries.json \
./outputs

This should reproduce our reported results.
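The BRIGHT benchmark scores retrievers with nDCG@10. If you want to sanity-check a run outside the provided shell script, below is a minimal, self-contained version of that metric with binary relevance; the run and gold dictionaries are hypothetical stand-ins for whatever the evaluation writes to ./outputs, and the repository's evaluation scripts remain the source of truth for the reported numbers.

# Minimal nDCG@10 with binary relevance, for sanity checks only. The run/gold
# dictionaries are hypothetical examples.
import math

def ndcg_at_10(run, gold):
    """run: {query_id: {doc_id: score}}, gold: {query_id: set of relevant doc_ids}."""
    per_query = []
    for qid, doc_scores in run.items():
        ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)[:10]
        dcg = sum(1.0 / math.log2(r + 2) for r, d in enumerate(ranked) if d in gold[qid])
        idcg = sum(1.0 / math.log2(r + 2) for r in range(min(10, len(gold[qid]))))
        per_query.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(per_query) / len(per_query)

run = {"q1": {"d1": 2.3, "d2": 1.1, "d3": 0.4}}
gold = {"q1": {"d2"}}
print(f"nDCG@10 = {ndcg_at_10(run, gold):.4f}")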
TongSearch is licensed under the MIT License. You are free to use, modify, and distribute this project under the terms of the MIT license.
@article{qin2025tongsearch,
  title={TongSearch QR: Reinforced Query Reasoning for Retrieval},
  author={Xubo Qin and Jun Bai and Jiaqi Li and Zixia Jia and Zilong Zheng},
  year={2025},
  eprint={2506.11603},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.11603},
}


