Commit e0e2cc3

Squashed commit of the following:

commit 4cb1a25
Author: Kashif Rasul <[email protected]>
Date: Sat Nov 22 23:31:29 2025 +0100

    [SFT] Log mean token accuracy from Liger kernel (#4302)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 468b9d4
Author: Susant <[email protected]>
Date: Sun Nov 23 03:40:32 2025 +0530

    docs: add KTO (2402.01306) to Paper Index + link ref to KTOTrainer (#4440)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 9bc6206
Author: Behrooz Azarkhalili <[email protected]>
Date: Fri Nov 21 17:34:50 2025 -0800

    Move PRMTrainer to trl.experimental.prm (#4483)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit f7ac974
Author: Sergio Paniego Blanco <[email protected]>
Date: Fri Nov 21 16:01:04 2025 +0100

    Update OpenEnv guide with new notebook (#4555)

commit c0de042
Author: Sergio Paniego Blanco <[email protected]>
Date: Fri Nov 21 15:40:25 2025 +0100

    Add GRPO Wordle OpenEnv Colab (#4542)

commit 9f8ef40
Author: Behrooz Azarkhalili <[email protected]>
Date: Thu Nov 20 22:36:31 2025 -0800

    [ORPO] Move ORPOTrainer to experimental (#4480)

commit 3bb5d76
Author: Jen Wei <[email protected]>
Date: Thu Nov 20 18:53:10 2025 -0700

    fix+docs: `device_map=None` for DeepSpeed and add ZeRO paper (1910.02054) to Paper Index (#4551)

commit 375b3eb
Author: Jonny Li <[email protected]>
Date: Thu Nov 20 19:42:45 2025 -0500

    Add target_parameters to LoraConfig (#4536)

commit 237900d
Author: Kristian Schwethelm <[email protected]>
Date: Thu Nov 20 23:03:20 2025 +0100

    Fix bug with VLM processors in prompt-completion completion text-only training (#4553)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 52ed4df
Author: Quentin Gallouédec <[email protected]>
Date: Thu Nov 20 21:41:23 2025 +0000

    Fix style OpenEnv example

commit a263946
Author: Sergio Paniego Blanco <[email protected]>
Date: Thu Nov 20 14:44:15 2025 +0100

    Update OpenEnv guide with latest details (#4552)

    Co-authored-by: burtenshaw <[email protected]>

commit 1a9ff52
Author: Kashif Rasul <[email protected]>
Date: Wed Nov 19 15:34:25 2025 +0100

    [OpenEnv] browsergym example script (#4539)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit 6cbcd94
Author: Sergio Paniego Blanco <[email protected]>
Date: Wed Nov 19 14:39:44 2025 +0100

    Update OpenEnv example scripts (#4547)

commit 8510589
Author: Sergio Paniego Blanco <[email protected]>
Date: Wed Nov 19 14:39:20 2025 +0100

    Add OpenEnv Script examples to docs (#4533)

commit e622196
Author: Quentin Gallouédec <[email protected]>
Date: Mon Nov 17 03:12:30 2025 -0700

    [Doc] Drop dummy reward and dataset for DeepMath-103K and accuracy reward (#4524)

commit 1b1242c
Author: Kashif Rasul <[email protected]>
Date: Fri Nov 14 20:51:41 2025 +0100

    [OpenEnv] add vllm colocate mode to openenv scripts (#4510)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit f39d18a
Author: Fabio Milentiansen Sim <[email protected]>
Date: Fri Nov 14 23:39:02 2025 +0700

    fix(GOLDTrainer): Resolve incorrect attribute access and VLLMClient.generate() output type (#4526)

commit d45eaab
Author: Sergio Paniego Blanco <[email protected]>
Date: Fri Nov 14 12:12:09 2025 +0100

    Add vLLM quantization option for colocate (#4496)

    Co-authored-by: Kashif Rasul <[email protected]>

commit a91d4b3
Author: Sergio Paniego Blanco <[email protected]>
Date: Fri Nov 14 02:19:08 2025 +0100

    Prevent upcasting norm layers in `prepare_model_for_kbit_training` (#4457)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 121318e
Author: Behrooz Azarkhalili <[email protected]>
Date: Thu Nov 13 17:13:16 2025 -0800

    docs: Extend CLI basic usage examples to all supported CLIs (#4425)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 7918320
Author: Quentin Gallouédec <[email protected]>
Date: Thu Nov 13 13:20:52 2025 -0700

    Remove test trainer args (#4517)

commit 102dc41
Author: Quentin Gallouédec <[email protected]>
Date: Thu Nov 13 12:36:43 2025 -0700

    Rename `flash-attn` to `flash-attn2` (#4514)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit 5de62b0
Author: Quentin Gallouédec <[email protected]>
Date: Thu Nov 13 12:05:48 2025 -0700

    Add step time metric to GRPO Trainer for performance tracking (#4516)

    Co-authored-by: lewtun <[email protected]>

commit f1e6377
Author: Behrooz Azarkhalili <[email protected]>
Date: Thu Nov 13 11:01:19 2025 -0800

    Move PPOTrainer to trl.experimental.ppo (#4482)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 01f497e
Author: Behrooz Azarkhalili <[email protected]>
Date: Thu Nov 13 10:14:58 2025 -0800

    Move NashMDTrainer to experimental module (#4477)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit b6c838a
Author: Quentin Gallouédec <[email protected]>
Date: Thu Nov 13 16:53:26 2025 +0000

    `aws-general-8-plus` runner for Docker build

commit ed5c7bb
Author: YangKai0616 <[email protected]>
Date: Fri Nov 14 00:42:48 2025 +0800

    [Bug Fix] OnlineDPOTrainer with vLLM Server Mode (#4500)

commit ded9bc6
Author: lewtun <[email protected]>
Date: Thu Nov 13 17:33:59 2025 +0100

    Fix Docker images for Liger (#4522)

commit fd04760
Author: Pramodith Ballapuram <[email protected]>
Date: Thu Nov 13 11:31:10 2025 +0000

    Paper Index: Change `num_completions` to `num_generations` (#4515)

commit b7918c0
Author: Behrooz Azarkhalili <[email protected]>
Date: Wed Nov 12 20:35:44 2025 -0800

    Move GKDTrainer to experimental module (#4474)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 07b5011
Author: Tamoghno Kandar <[email protected]>
Date: Wed Nov 12 20:07:33 2025 -0800

    Replace flash attention2 with kernels-community/flash-attn2 (#4426)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 7a57fd4
Author: Yuxian Gu <[email protected]>
Date: Thu Nov 13 11:16:20 2025 +0800

    MiniLLM: Fix arguments in config & add to documentation index (#4518)

commit a145eaf
Author: Behrooz Azarkhalili <[email protected]>
Date: Wed Nov 12 16:35:46 2025 -0800

    refactor: Move CPOTrainer to experimental module (#4470)

commit d2dc717
Author: Taha Yassine <[email protected]>
Date: Thu Nov 13 00:56:47 2025 +0100

    Replace `wandb_log_unique_prompts` with `log_unique_prompts` (#4508)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 799b39b
Author: Quentin Gallouédec <[email protected]>
Date: Wed Nov 12 16:21:05 2025 -0700

    `device_map` and `dtype` to `"auto"` by default (#4509)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit a6a2beb
Author: Quentin Gallouédec <[email protected]>
Date: Wed Nov 12 09:42:31 2025 -0700

    Add temporary workaround for `lr_scheduler_kwargs` dtype issue in Transformers 4.57.0 (#4513)

commit 346701a
Author: lewtun <[email protected]>
Date: Wed Nov 12 17:42:18 2025 +0100

    Replace accelerate logging with stdlib in CLI (#4512)

commit 4db63af
Author: Quentin Gallouédec <[email protected]>
Date: Wed Nov 12 02:19:51 2025 +0000

    Fix GRPO unsqueeze advantages

commit ecb2811
Author: Yuxian Gu <[email protected]>
Date: Wed Nov 12 10:17:22 2025 +0800

    Add MiniLLM Trainer (#4504)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 89e4688
Author: Taha Yassine <[email protected]>
Date: Tue Nov 11 20:36:23 2025 +0100

    Add support for images inside tables with Trackio completions logging (#4505)

commit 2d3279c
Author: lewtun <[email protected]>
Date: Tue Nov 11 19:22:25 2025 +0100

    Tweak description for vLLM sleep mode (#4506)

    Co-authored-by: Quentin Gallouédec <[email protected]>

1 parent 20415b2 commit e0e2cc3
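
Several commits in this squash relocate trainers, and two of the messages spell out the new module paths: `trl.experimental.ppo` for `PPOTrainer` (#4482) and `trl.experimental.prm` for `PRMTrainer` (#4483). A minimal sketch of how calling code might resolve a moved trainer is shown here. The helper `load_experimental_trainer` is hypothetical, and the sketch assumes each module exposes the trainer under its class name, which the commit messages suggest but do not show.

```python
import importlib

# Module paths spelled out in the commit messages (#4482, #4483).
# Other trainers are described only as moving to "experimental", so their
# exact paths are deliberately not guessed here.
EXPERIMENTAL_MODULES = {
    "PPOTrainer": "trl.experimental.ppo",
    "PRMTrainer": "trl.experimental.prm",
}


def load_experimental_trainer(name):
    """Hypothetical helper: import a moved trainer, or return None if unavailable."""
    module_path = EXPERIMENTAL_MODULES[name]
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # TRL is not installed, or the installed version predates the move.
        return None
    # Assumption: the module exposes the trainer under its class name.
    return getattr(module, name, None)


trainer_cls = load_experimental_trainer("PPOTrainer")
print("PPOTrainer available:", trainer_cls is not None)
```

Returning `None` instead of raising keeps the lookup usable in environments where TRL is absent; code that requires the trainer would more likely let the `ImportError` propagate.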

116 files changed, with 9932 additions and 6182 deletions


.github/workflows/docker-build.yml

Lines changed: 4 additions & 2 deletions

@@ -13,7 +13,8 @@ concurrency:
 jobs:
   trl:
     name: "Build and push TRL Docker image"
-    runs-on: ubuntu-latest
+    runs-on:
+      group: aws-general-8-plus
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
@@ -52,7 +53,8 @@ jobs:
 
   trl-dev:
     name: "Build and push TRL Dev Docker image"
-    runs-on: ubuntu-latest
+    runs-on:
+      group: aws-general-8-plus
     steps:
       - name: Checkout code
         uses: actions/checkout@v4

README.md

Lines changed: 4 additions & 7 deletions

@@ -25,7 +25,7 @@ Explore how to seamlessly integrate TRL with OpenEnv in our [dedicated documenta
 
 ## Overview
 
-TRL is a cutting-edge library designed for post-training foundation models using advanced techniques like Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO). Built on top of the [🤗 Transformers](https://github.com/huggingface/transformers) ecosystem, TRL supports a variety of model architectures and modalities, and can be scaled-up across various hardware setups.
+TRL is a cutting-edge library designed for post-training foundation models using advanced techniques like Supervised Fine-Tuning (SFT), Group Realtive Policy Optimization (GRPO), and Direct Preference Optimization (DPO). Built on top of the [🤗 Transformers](https://github.com/huggingface/transformers) ecosystem, TRL supports a variety of model architectures and modalities, and can be scaled-up across various hardware setups.
 
 ## Highlights
 
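
The next hunk swaps the README's hand-written dummy reward for `accuracy_reward` in the GRPO quick-start. For readers unfamiliar with the contract, here is a minimal standalone sketch of a reward function in the same shape as the removed dummy: it takes the completions plus extra keyword arguments and returns one score per completion. The name `exact_match_reward` and the `solution` argument are assumptions for illustration, and completions are assumed to be plain strings; this is not the implementation of `accuracy_reward`.

```python
def exact_match_reward(completions, solution, **kwargs):
    """Hypothetical reward: 1.0 when a completion equals its reference answer, else 0.0.

    Assumes `completions` and `solution` are equal-length lists of strings.
    Surrounding whitespace is ignored before comparing.
    """
    return [
        1.0 if completion.strip() == reference.strip() else 0.0
        for completion, reference in zip(completions, solution)
    ]


# Standalone check; no TRL install is needed to run this.
scores = exact_match_reward(["4", " 5 ", "six"], solution=["4", "5", "6"])
print(scores)
```

A function of this shape could be passed as `reward_funcs` in place of the dummy, provided the dataset supplies a matching reference column; that pairing is an assumption here rather than something this diff shows.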
@@ -92,16 +92,13 @@ trainer.train()
 ```python
 from datasets import load_dataset
 from trl import GRPOTrainer
+from trl.rewards import accuracy_reward
 
-dataset = load_dataset("trl-lib/tldr", split="train")
-
-# Dummy reward function: count the number of unique characters in the completions
-def reward_num_unique_chars(completions, **kwargs):
-    return [len(set(c)) for c in completions]
+dataset = load_dataset("trl-lib/DeepMath-103K", split="train")
 
 trainer = GRPOTrainer(
     model="Qwen/Qwen2-0.5B-Instruct",
-    reward_funcs=reward_num_unique_chars,
+    reward_funcs=accuracy_reward,
     train_dataset=dataset,
 )
 trainer.train()

docker/trl-dev/Dockerfile

Lines changed: 2 additions & 2 deletions

@@ -1,5 +1,5 @@
-FROM pytorch/pytorch:2.8.0-cuda12.8-cudnn9-runtime
+FROM pytorch/pytorch:2.8.0-cuda12.8-cudnn9-devel
 RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
 RUN pip install --upgrade pip uv
 RUN uv pip install --system --no-cache "git+https://github.com/huggingface/trl.git#egg=trl[liger,peft,vlm]"
-RUN uv pip install --system hf_transfer kernels liger_kernel peft trackio
+RUN uv pip install --system kernels liger_kernel peft trackio

docker/trl/Dockerfile

Lines changed: 3 additions & 2 deletions

@@ -1,3 +1,4 @@
-FROM pytorch/pytorch:2.8.0-cuda12.8-cudnn9-runtime
+FROM pytorch/pytorch:2.8.0-cuda12.8-cudnn9-devel
+RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
 RUN pip install --upgrade pip uv
-RUN uv pip install --system trl[liger,peft,vlm] hf_transfer kernels trackio
+RUN uv pip install --system trl[liger,peft,vlm] kernels trackio

docs/source/_toctree.yml

Lines changed: 16 additions & 14 deletions

@@ -56,24 +56,12 @@
   title: Examples
 - sections:
   - sections: # Sorted alphabetically
-    - local: cpo_trainer
-      title: CPO
     - local: dpo_trainer
       title: DPO
-    - local: gkd_trainer
-      title: GKD
     - local: grpo_trainer
       title: GRPO
     - local: kto_trainer
       title: KTO
-    - local: nash_md_trainer
-      title: Nash-MD
-    - local: orpo_trainer
-      title: ORPO
-    - local: ppo_trainer
-      title: PPO
-    - local: prm_trainer
-      title: PRM
     - local: reward_trainer
       title: Reward
     - local: rloo_trainer
@@ -103,10 +91,12 @@
     title: BEMA for Reference Model
   - local: bco_trainer
     title: BCO
-  - local: online_dpo_trainer
-    title: Online DPO
+  - local: cpo_trainer
+    title: CPO
   - local: gfpo
     title: GFPO
+  - local: gkd_trainer
+    title: GKD
   - local: gold_trainer
     title: GOLD
   - local: grpo_with_replay_buffer
@@ -115,8 +105,20 @@
     title: GSPO-token
   - local: judges
     title: Judges
+  - local: minillm
+    title: MiniLLM
+  - local: nash_md_trainer
+    title: Nash-MD
+  - local: online_dpo_trainer
+    title: Online DPO
+  - local: orpo_trainer
+    title: ORPO
   - local: papo_trainer
     title: PAPO
+  - local: ppo_trainer
+    title: PPO
+  - local: prm_trainer
+    title: PRM
   - local: xpo_trainer
     title: XPO
   - local: openenv
