Move OnlineDPOTrainer to experimental module #4473

behroozazarkhalili · 2025-11-05T22:09:52Z

Summary

This PR addresses #4461 by moving OnlineDPOTrainer and OnlineDPOConfig from trl.trainer to trl.experimental.online_dpo as part of the V1 refactoring effort (#4223, #4374).

Changes Made

Module Structure

✅ Created trl/experimental/online_dpo/ module with:
- __init__.py - Module exports
- online_dpo_config.py - Configuration class (385 lines)
- online_dpo_trainer.py - Trainer class (1,475 lines) with fixed imports

Deprecation Stubs

✅ Replaced trl/trainer/online_dpo_config.py with deprecation stub (~40 lines)
✅ Replaced trl/trainer/online_dpo_trainer.py with deprecation stub (~120 lines)
Both stubs inherit from the experimental versions and emit a FutureWarning.

Internal Dependencies

✅ Updated trl/trainer/xpo_config.py - Now imports from experimental
✅ Updated trl/trainer/xpo_trainer.py - Now imports from experimental
✅ Updated trl/trainer/nash_md_config.py - Now imports from experimental
✅ Updated trl/trainer/nash_md_trainer.py - Now imports from experimental

Tests

✅ Updated tests/test_online_dpo_trainer.py - Uses experimental imports
✅ Updated tests/test_trainers_args.py - Uses experimental imports

Examples

✅ Updated examples/scripts/online_dpo.py - Uses experimental imports
✅ Updated examples/scripts/online_dpo_vlm.py - Uses experimental imports

Documentation

✅ Updated docs/source/_toctree.yml - Moved from Trainers to Experimental section
✅ Updated docs/source/online_dpo_trainer.md - Quick start example
✅ Updated docs/source/reducing_memory_usage.md - Import examples
✅ Updated docs/source/vllm_integration.md - 3 code examples
✅ Updated docs/source/speeding_up_training.md - Import examples

Migration Path

Before (deprecated, triggers FutureWarning)

from trl import OnlineDPOConfig, OnlineDPOTrainer

After (recommended)

from trl.experimental.online_dpo import OnlineDPOConfig, OnlineDPOTrainer
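For code that has to run against TRL releases both before and after this move, one option is to try the new location first and fall back to the old one. The helper below is a hypothetical sketch (`import_first` is not part of TRL) built only on the standard library.

```python
import importlib


def import_first(module_paths):
    """Return the first module in module_paths that can be imported."""
    last_error = None
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as err:  # also covers ModuleNotFoundError
            last_error = err
    raise ImportError(f"None of {list(module_paths)} could be imported") from last_error


# Usage (requires TRL to be installed, so it is left commented out here):
# online_dpo = import_first(["trl.experimental.online_dpo", "trl"])
# OnlineDPOConfig = online_dpo.OnlineDPOConfig
# OnlineDPOTrainer = online_dpo.OnlineDPOTrainer
```

Once the oldest supported TRL version ships the experimental module, the fallback can be dropped in favour of the plain import shown above.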

Backward Compatibility

The old import paths continue to work until TRL 0.29 but will show:

FutureWarning: The `OnlineDPOTrainer` is now located in `trl.experimental`. 
Please update your imports to `from trl.experimental.online_dpo import OnlineDPOTrainer`. 
The current import path will be removed and no longer supported in TRL 0.29.
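Downstream projects that want to catch leftover uses of the old path before the removal can escalate this warning to an error, for example in CI. The sketch below uses only the standard `warnings` module; `legacy_import_stand_in` is a hypothetical placeholder for code that still goes through the deprecated path, and the message pattern is an assumption based on the warning text quoted above.

```python
import warnings


def legacy_import_stand_in():
    # Hypothetical stand-in for code that still uses the old import path.
    warnings.warn(
        "The `OnlineDPOTrainer` is now located in `trl.experimental`.",
        FutureWarning,
        stacklevel=2,
    )


with warnings.catch_warnings():
    # Escalate only FutureWarnings whose message mentions trl.experimental.
    warnings.filterwarnings(
        "error", message=r".*trl\.experimental.*", category=FutureWarning
    )
    try:
        legacy_import_stand_in()
        escalated = False
    except FutureWarning:
        escalated = True
```

Scoping the filter with `catch_warnings` keeps the stricter behaviour local, so unrelated FutureWarnings elsewhere in the process are not affected.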

Testing

- All existing tests continue to pass with updated imports
- Deprecation warnings are properly triggered for old import paths
- XPO and NashMD trainers (which inherit from OnlineDPO) work correctly
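The checks listed above could be expressed along these lines. This is a self-contained sketch with hypothetical stand-in classes, not the tests in this PR: `ExperimentalBase` plays the role of the experimental trainer, `LegacyStub` the deprecated stub, and `DerivedTrainer` a trainer such as XPO that subclasses the experimental class directly.

```python
import warnings


class ExperimentalBase:
    """Hypothetical stand-in for trl.experimental.online_dpo.OnlineDPOTrainer."""


class LegacyStub(ExperimentalBase):
    """Hypothetical stand-in for the deprecated stub under trl.trainer."""

    def __init__(self):
        warnings.warn("moved to trl.experimental", FutureWarning, stacklevel=2)
        super().__init__()


class DerivedTrainer(ExperimentalBase):
    """Hypothetical stand-in for a trainer that subclasses the experimental class."""


def future_warnings_from(factory):
    """Call factory() and return the FutureWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        factory()
    return [w for w in caught if issubclass(w.category, FutureWarning)]


assert len(future_warnings_from(LegacyStub)) == 1  # old path warns
assert future_warnings_from(DerivedTrainer) == []  # derived trainer stays silent
```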

Related Issues

- Closes #4461 (Move OnlineDPOTrainer to trl.experimental)
- Part of #4223 ([RFC] Moving Most TRL Trainers to the experimental Submodule to Streamline the Core), the V1 refactoring
- Part of #4374 (Road to v1), under "Move to experimental"

This commit addresses issue #4461 by moving OnlineDPOTrainer and OnlineDPOConfig from trl.trainer to trl.experimental.online_dpo as part of the V1 refactoring effort.

Changes:
- Created trl/experimental/online_dpo/ module with OnlineDPOConfig and OnlineDPOTrainer
- Replaced original files with deprecation stubs showing FutureWarning
- Updated XPO and NashMD trainers to import from experimental module
- Updated all test files to use experimental imports
- Updated example scripts (online_dpo.py, online_dpo_vlm.py)
- Updated documentation imports across 5 doc files
- Moved Online DPO from Trainers to Experimental section in docs

The deprecation stubs maintain backward compatibility until TRL 0.29. All imports from trl.trainer.online_dpo_* will trigger FutureWarning.

Related: #4223, #4374
Closes: #4461

HuggingFaceDocBuilderDev · 2025-11-05T22:12:46Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


…dpo-to-experimental

# Conflicts:
# docs/source/online_dpo_trainer.md
# examples/scripts/online_dpo.py
# trl/trainer/nash_md_config.py
# trl/trainer/online_dpo_trainer.py
# trl/trainer/xpo_config.py
# trl/trainer/xpo_trainer.py

…OnlineDPOTrainer

- Update index.md to use experimental.online_dpo.OnlineDPOTrainer with 🧪 emoji
- Update dataset_formats.md to use experimental.online_dpo.OnlineDPOTrainer
- Update example_overview.md to use experimental.online_dpo.OnlineDPOTrainer (2 occurrences)
- Move test file to tests/experimental/ directory
- Update test imports from . to ..

behroozazarkhalili added 3 commits November 5, 2025 18:56

Fix ruff linting errors for OnlineDPOTrainer migration

6982489

- Reorder config import after utils imports in experimental/online_dpo/online_dpo_trainer.py
- Remove unused typing.Any import from trainer/online_dpo_trainer.py

All ruff checks now pass.
