Move OnlineDPOTrainer to experimental module #4473
Open
+2,487
−1,824
Summary
This PR addresses #4461 by moving `OnlineDPOTrainer` and `OnlineDPOConfig` from `trl.trainer` to `trl.experimental.online_dpo` as part of the V1 refactoring effort (#4223, #4374).
Changes Made
Module Structure
`trl/experimental/online_dpo/` module with:
- `__init__.py` - Module exports
- `online_dpo_config.py` - Configuration class (385 lines)
- `online_dpo_trainer.py` - Trainer class (1,475 lines) with fixed imports
Deprecation Stubs
- `trl/trainer/online_dpo_config.py` with deprecation stub (~40 lines)
- `trl/trainer/online_dpo_trainer.py` with deprecation stub (~120 lines)
- Stubs emit `FutureWarning`
Internal Dependencies
- `trl/trainer/xpo_config.py` - Now imports from experimental
- `trl/trainer/xpo_trainer.py` - Now imports from experimental
- `trl/trainer/nash_md_config.py` - Now imports from experimental
- `trl/trainer/nash_md_trainer.py` - Now imports from experimental
Tests
- `tests/test_online_dpo_trainer.py` - Uses experimental imports
- `tests/test_trainers_args.py` - Uses experimental imports
Examples
- `examples/scripts/online_dpo.py` - Uses experimental imports
- `examples/scripts/online_dpo_vlm.py` - Uses experimental imports
Documentation
- `docs/source/_toctree.yml` - Moved from Trainers to Experimental section
- `docs/source/online_dpo_trainer.md` - Quick start example
- `docs/source/reducing_memory_usage.md` - Import examples
- `docs/source/vllm_integration.md` - 3 code examples
- `docs/source/speeding_up_training.md` - Import examples
Migration Path
Before (deprecated, triggers FutureWarning)
After (recommended)
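For code that has to run on TRL versions from both sides of this move, one option is a small loader that tries the new location first and falls back to the old one. This is a minimal sketch, not part of the PR: `load_online_dpo` is a hypothetical helper, and it assumes both `trl.experimental.online_dpo` and `trl.trainer` expose the names `OnlineDPOConfig` and `OnlineDPOTrainer`, as the summary describes.

```python
import importlib

# Module paths taken from the PR summary: new location first, legacy second.
NEW_PATH = "trl.experimental.online_dpo"
OLD_PATH = "trl.trainer"


def load_online_dpo(paths=(NEW_PATH, OLD_PATH)):
    """Return (module_path, OnlineDPOConfig, OnlineDPOTrainer).

    Hypothetical helper: tries each module path in order and uses the
    first one that imports and exposes both class names.
    """
    for path in paths:
        try:
            module = importlib.import_module(path)
            return path, module.OnlineDPOConfig, module.OnlineDPOTrainer
        except (ImportError, AttributeError):
            # Path missing in this TRL version (or TRL not installed); try the next.
            continue
    raise ImportError(f"OnlineDPO classes not found in any of: {', '.join(paths)}")
```

Calling `load_online_dpo()` on a release that contains this PR should resolve to the experimental path, so the deprecated path is never touched. Projects that only target newer releases can skip the helper and import directly from `trl.experimental.online_dpo`.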
Backward Compatibility
The old import paths continue to work until TRL 0.29 but emit a FutureWarning when used.
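A deprecation stub of this kind can be sketched with the standard library alone. The example below illustrates the general pattern and is not the code in this PR: both classes are stand-ins rather than TRL's real ones, and the warning message is hypothetical wording.

```python
import warnings


class OnlineDPOConfig:
    """Stand-in for the relocated class (hypothetical, not TRL's real config)."""

    def __init__(self, beta=0.1):
        self.beta = beta


class DeprecatedOnlineDPOConfig(OnlineDPOConfig):
    """Stub kept at the old location: warns, then defers to the new class."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            # Hypothetical message; the real stub's wording may differ.
            "This import path is deprecated; import from the experimental module instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller, not the stub
        )
        super().__init__(*args, **kwargs)


# Record warnings so the behaviour can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    config = DeprecatedOnlineDPOConfig(beta=0.2)

print(caught[0].category.__name__)
print(config.beta)
```

To find remaining uses of the old path before it is removed, a test suite can promote the warning to an error with `warnings.simplefilter("error", FutureWarning)`.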
Testing
Related Issues