@behroozazarkhalili
Collaborator

Summary

This PR addresses #4461 by moving OnlineDPOTrainer and OnlineDPOConfig from trl.trainer to trl.experimental.online_dpo as part of the V1 refactoring effort (#4223, #4374).

Changes Made

Module Structure

  • ✅ Created trl/experimental/online_dpo/ module with:
    • __init__.py - Module exports
    • online_dpo_config.py - Configuration class (385 lines)
    • online_dpo_trainer.py - Trainer class (1,475 lines) with fixed imports

Deprecation Stubs

  • ✅ Replaced trl/trainer/online_dpo_config.py with deprecation stub (~40 lines)
  • ✅ Replaced trl/trainer/online_dpo_trainer.py with deprecation stub (~120 lines)
  • Both stubs subclass the experimental classes and emit a FutureWarning
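A minimal sketch of the stub pattern described above, using stand-in classes so it runs without TRL installed. The class `ExperimentalOnlineDPOConfig`, the `beta` field, and the placement of the warning in `__init__` are assumptions for illustration; the actual stubs in this PR may be structured differently.

```python
import warnings


class ExperimentalOnlineDPOConfig:
    """Hypothetical stand-in for the class now living in trl.experimental.online_dpo."""

    def __init__(self, beta: float = 0.1):
        # `beta` is an illustrative field only, not taken from the PR.
        self.beta = beta


class OnlineDPOConfig(ExperimentalOnlineDPOConfig):
    """Deprecation stub kept at the old location (sketch)."""

    def __init__(self, *args, **kwargs):
        # Message modeled on the warning text quoted in this PR.
        warnings.warn(
            "The `OnlineDPOConfig` is now located in `trl.experimental`. "
            "Please update your imports to "
            "`from trl.experimental.online_dpo import OnlineDPOConfig`.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller, not the stub
        )
        super().__init__(*args, **kwargs)
```

Because the stub is a subclass, `isinstance` checks against the experimental class keep working for objects created through the old path.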

Internal Dependencies

  • ✅ Updated trl/trainer/xpo_config.py - Now imports from experimental
  • ✅ Updated trl/trainer/xpo_trainer.py - Now imports from experimental
  • ✅ Updated trl/trainer/nash_md_config.py - Now imports from experimental
  • ✅ Updated trl/trainer/nash_md_trainer.py - Now imports from experimental
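One plausible motivation for switching these imports, sketched with stand-in classes: if a stub warns on construction, any trainer that still subclasses the stub would surface the FutureWarning to its own users. All class names below are hypothetical and do not reproduce the real TRL classes.

```python
import warnings


class ExperimentalOnlineDPOTrainer:
    """Hypothetical stand-in for the relocated trainer."""

    def __init__(self):
        self.ready = True


class DeprecatedOnlineDPOTrainer(ExperimentalOnlineDPOTrainer):
    """Hypothetical stand-in for the stub left at the old path."""

    def __init__(self):
        warnings.warn("moved to `trl.experimental`", FutureWarning, stacklevel=2)
        super().__init__()


class XPOLikeTrainerOldBase(DeprecatedOnlineDPOTrainer):
    """Subclass that still inherits from the stub."""


class XPOLikeTrainerNewBase(ExperimentalOnlineDPOTrainer):
    """Subclass that inherits from the experimental class directly."""


def count_future_warnings(factory) -> int:
    """Call `factory` and count the FutureWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        factory()
    return sum(issubclass(w.category, FutureWarning) for w in caught)
```

In this sketch, only the subclass built on the stub triggers the warning; the one built on the experimental class stays silent.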

Tests

  • ✅ Updated tests/test_online_dpo_trainer.py - Uses experimental imports
  • ✅ Updated tests/test_trainers_args.py - Uses experimental imports

Examples

  • ✅ Updated examples/scripts/online_dpo.py - Uses experimental imports
  • ✅ Updated examples/scripts/online_dpo_vlm.py - Uses experimental imports

Documentation

  • ✅ Updated docs/source/_toctree.yml - Moved from Trainers to Experimental section
  • ✅ Updated docs/source/online_dpo_trainer.md - Quick start example
  • ✅ Updated docs/source/reducing_memory_usage.md - Import examples
  • ✅ Updated docs/source/vllm_integration.md - 3 code examples
  • ✅ Updated docs/source/speeding_up_training.md - Import examples

Migration Path

Before (deprecated, triggers FutureWarning)

from trl import OnlineDPOConfig, OnlineDPOTrainer

After (recommended)

from trl.experimental.online_dpo import OnlineDPOConfig, OnlineDPOTrainer
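For code that has to run against several TRL versions during the transition, a small fallback helper is one option. This is a sketch, not part of the PR: `import_first_available` is a hypothetical helper, and it only relies on the standard `importlib.import_module` call.

```python
import importlib


def import_first_available(module_paths, name):
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Module (or one of its parents) is missing; try the next path.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name!r} not found in any of {list(module_paths)}")


# Intended usage (commented out so the sketch runs without TRL installed):
# OnlineDPOTrainer = import_first_available(
#     ["trl.experimental.online_dpo", "trl"], "OnlineDPOTrainer"
# )
```

Listing the experimental path first means the new location is preferred whenever it exists, and the old path is only used on versions that predate the move.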

Backward Compatibility

The old import paths continue to work until TRL 0.29 but emit the following warning:

FutureWarning: The `OnlineDPOTrainer` is now located in `trl.experimental`. 
Please update your imports to `from trl.experimental.online_dpo import OnlineDPOTrainer`. 
The current import path will be removed and no longer supported in TRL 0.29.
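To locate remaining uses of the old path before the removal, one approach is to escalate this specific warning to an error, for example in a CI run. The sketch below uses only the standard `warnings` module; `legacy_call` and `unrelated_call` are hypothetical stand-ins for user code, and the regex is an assumption based on the message text quoted above.

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for code that still goes through the old import path."""
    warnings.warn(
        "The `OnlineDPOTrainer` is now located in `trl.experimental`.",
        FutureWarning,
        stacklevel=2,
    )


def unrelated_call():
    """Hypothetical stand-in for code that emits some other FutureWarning."""
    warnings.warn("some other future change", FutureWarning, stacklevel=2)


def uses_deprecated_path(fn) -> bool:
    """Return True if calling `fn` emits the relocation FutureWarning."""
    with warnings.catch_warnings():
        # Escalate only warnings whose message mentions trl.experimental.
        warnings.filterwarnings(
            "error", message=r".*trl\.experimental", category=FutureWarning
        )
        try:
            fn()
        except FutureWarning:
            return True
    return False
```

Matching on the message keeps unrelated FutureWarnings from other libraries from failing the run.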

Testing

  • All existing tests continue to pass with updated imports
  • Deprecation warnings are properly triggered for old import paths
  • XPO and NashMD trainers (which inherit from OnlineDPO) work correctly

Related Issues

This commit addresses issue #4461 by moving OnlineDPOTrainer and
OnlineDPOConfig from trl.trainer to trl.experimental.online_dpo as
part of the V1 refactoring effort.

Changes:
- Created trl/experimental/online_dpo/ module with OnlineDPOConfig and OnlineDPOTrainer
- Replaced original files with deprecation stubs showing FutureWarning
- Updated XPO and NashMD trainers to import from experimental module
- Updated all test files to use experimental imports
- Updated example scripts (online_dpo.py, online_dpo_vlm.py)
- Updated documentation imports across 5 doc files
- Moved Online DPO from Trainers to Experimental section in docs

The deprecation stubs maintain backward compatibility until TRL 0.29.
All imports from trl.trainer.online_dpo_* will trigger FutureWarning.

Related: #4223, #4374
Closes: #4461
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

- Reorder config import after utils imports in experimental/online_dpo/online_dpo_trainer.py
- Remove unused typing.Any import from trainer/online_dpo_trainer.py

All ruff checks now pass.
…dpo-to-experimental

# Conflicts:
#	docs/source/online_dpo_trainer.md
#	examples/scripts/online_dpo.py
#	trl/trainer/nash_md_config.py
#	trl/trainer/online_dpo_trainer.py
#	trl/trainer/xpo_config.py
#	trl/trainer/xpo_trainer.py
…OnlineDPOTrainer

- Update index.md to use experimental.online_dpo.OnlineDPOTrainer with 🧪 emoji
- Update dataset_formats.md to use experimental.online_dpo.OnlineDPOTrainer
- Update example_overview.md to use experimental.online_dpo.OnlineDPOTrainer (2 occurrences)
- Move test file to tests/experimental/ directory
- Update relative imports in the moved test file from . to ..
Development

Successfully merging this pull request may close these issues.

Move OnlineDPOTrainer to trl.experimental
