Open
Labels
feature (The feature request)
Description
What's up Burn community!
I'd like to suggest adding a new optimizer, Muon.
Feature description
Add the Muon optimizer to burn-optim. Muon (MomentUm Orthogonalized by Newton-Schulz) is a recent optimizer specifically designed for training neural network hidden layers, and it is particularly effective for large language models and transformers.
Muon combines SGD with momentum and Newton-Schulz orthogonalization: each 2D parameter's momentum update is replaced with an approximation of the nearest semi-orthogonal matrix. This approach provides:
- Faster convergence compared to Adam/AdamW
- Better stability in large-scale training
- Memory efficiency (one momentum buffer per parameter, as in SGD with momentum, versus two state buffers for Adam/AdamW)
- Automatic learning rate transfer across different model scales
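To make the idea concrete, here is a minimal sketch of the update described above, in plain Rust with no dependencies. It is not Burn code and not a proposed API: the `Matrix` alias and the `transpose`, `matmul`, `newton_schulz`, and `muon_step` names are hypothetical helpers for illustration only. The quintic coefficients (3.4445, -4.7750, 2.0315) and the 5 iteration steps are assumed from the public reference implementation of Muon. The sketch leaves out Nesterov momentum, weight decay, and shape-dependent update scaling.

```rust
/// Dense row-major matrix; a hypothetical stand-in for a real tensor type.
type Matrix = Vec<Vec<f64>>;

fn transpose(m: &Matrix) -> Matrix {
    let (rows, cols) = (m.len(), m[0].len());
    (0..cols).map(|j| (0..rows).map(|i| m[i][j]).collect()).collect()
}

fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let (n, k, p) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; p]; n];
    for i in 0..n {
        for l in 0..k {
            for j in 0..p {
                out[i][j] += a[i][l] * b[l][j];
            }
        }
    }
    out
}

/// Approximately orthogonalizes `g` with a quintic Newton-Schulz iteration.
/// Coefficients are assumed from the public Muon reference implementation.
fn newton_schulz(g: &Matrix, steps: usize) -> Matrix {
    const A: f64 = 3.4445;
    const B: f64 = -4.7750;
    const C: f64 = 2.0315;

    // Work on the wide orientation so that X * X^T is the smaller product.
    let tall = g.len() > g[0].len();
    let mut x = if tall { transpose(g) } else { g.clone() };

    // Scale by the Frobenius norm so every singular value is at most 1.
    let norm = x.iter().flatten().map(|v| v * v).sum::<f64>().sqrt() + 1e-7;
    for row in x.iter_mut() {
        for v in row.iter_mut() {
            *v /= norm;
        }
    }

    for _ in 0..steps {
        // X <- A*X + (B*S + C*S^2) * X, where S = X * X^T
        let s = matmul(&x, &transpose(&x));
        let s2 = matmul(&s, &s);
        let poly: Matrix = s
            .iter()
            .zip(&s2)
            .map(|(rs, rs2)| rs.iter().zip(rs2).map(|(u, v)| B * u + C * v).collect())
            .collect();
        let px = matmul(&poly, &x);
        for (rx, rp) in x.iter_mut().zip(&px) {
            for (v, p) in rx.iter_mut().zip(rp) {
                *v = A * *v + p;
            }
        }
    }

    if tall { transpose(&x) } else { x }
}

/// One simplified Muon step for a single 2D parameter:
/// accumulate momentum, orthogonalize it, then apply the update.
fn muon_step(param: &mut Matrix, grad: &Matrix, momentum_buf: &mut Matrix, lr: f64, beta: f64) {
    for (rb, rg) in momentum_buf.iter_mut().zip(grad) {
        for (b, g) in rb.iter_mut().zip(rg) {
            *b = beta * *b + g;
        }
    }
    let update = newton_schulz(momentum_buf, 5);
    for (rp, ru) in param.iter_mut().zip(&update) {
        for (p, u) in rp.iter_mut().zip(ru) {
            *p -= lr * u;
        }
    }
}
```

Note that this iteration does not converge to an exactly orthogonal matrix. With these coefficients the singular values of the result end up roughly in the range 0.7 to 1.2, which is the trade-off the reference implementation makes for speed. The only optimizer state is `momentum_buf`, which is where the SGD-momentum-like memory footprint comes from.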
References
Feature motivation
Adding Muon to Burn would benefit:
- Large language model training projects
- Users seeking faster convergence with less hyperparameter tuning
- Applications where memory efficiency is critical
- ...and more!
(Optional) Suggest a Solution