Add Muon optimizer #3924

What's up Burn community!
I'd like to suggest adding a new optimizer, *Muon*.

### Feature description
Add the Muon optimizer to `burn-optim`. Muon (MomentUm Orthogonalized by Newton-Schulz) is a recent optimizer designed for the 2D weight matrices of neural network hidden layers, and is reported to be particularly effective for large language models and transformers.

Muon combines SGD with momentum and Newton-Schulz orthogonalization: the momentum update of each 2D parameter is replaced with an approximation of the nearest semi-orthogonal matrix. Its reported benefits are:
- Faster convergence compared to Adam/AdamW
- Better stability in large-scale training
- Memory efficiency: one momentum buffer per parameter, as in SGD with momentum (Adam/AdamW keep two)
- Automatic learning rate transfer across different model scales
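
To make the update rule concrete, here is a minimal sketch of the orthogonalization and a single optimizer step. It is not the Burn API: it uses plain `Vec<Vec<f64>>` matrices instead of Burn tensors, and the names `Matrix`, `newton_schulz`, and `muon_step` are hypothetical. The quintic coefficients are the ones used in the reference implementation linked below. Nesterov momentum, weight decay, and shape-dependent learning rate scaling are deliberately left out.

```rust
type Matrix = Vec<Vec<f64>>;

fn transpose(m: &Matrix) -> Matrix {
    let (rows, cols) = (m.len(), m[0].len());
    (0..cols).map(|j| (0..rows).map(|i| m[i][j]).collect()).collect()
}

fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let (n, k, p) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; p]; n];
    for i in 0..n {
        for l in 0..k {
            for j in 0..p {
                out[i][j] += a[i][l] * b[l][j];
            }
        }
    }
    out
}

/// Approximately orthogonalizes `g` with a quintic Newton-Schulz iteration.
/// The result only has singular values near 1; it is not exactly orthogonal.
fn newton_schulz(g: &Matrix, steps: usize) -> Matrix {
    // Coefficients taken from the reference implementation (assumption:
    // a port to Burn may want to make these configurable).
    let (a, b, c): (f64, f64, f64) = (3.4445, -4.7750, 2.0315);
    // Work in the wide orientation so the Gram matrix X * X^T stays small.
    let tall = g.len() > g[0].len();
    let mut x = if tall { transpose(g) } else { g.clone() };
    // Normalize by the Frobenius norm so every singular value is at most 1.
    let norm = x.iter().flatten().map(|v| v * v).sum::<f64>().sqrt() + 1e-7;
    for row in x.iter_mut() {
        for v in row.iter_mut() {
            *v /= norm;
        }
    }
    for _ in 0..steps {
        let gram = matmul(&x, &transpose(&x)); // A = X X^T
        let gram2 = matmul(&gram, &gram); // A^2
        // B = b * A + c * A^2
        let poly: Matrix = gram
            .iter()
            .zip(&gram2)
            .map(|(r1, r2)| r1.iter().zip(r2).map(|(p, q)| b * p + c * q).collect())
            .collect();
        let bx = matmul(&poly, &x);
        // X <- a * X + B X
        for (row, brow) in x.iter_mut().zip(&bx) {
            for (v, w) in row.iter_mut().zip(brow) {
                *v = a * *v + w;
            }
        }
    }
    if tall { transpose(&x) } else { x }
}

/// One simplified Muon step for a single 2D parameter (hypothetical helper).
fn muon_step(param: &mut Matrix, buf: &mut Matrix, grad: &Matrix, lr: f64, momentum: f64) {
    // SGD-momentum accumulation: buf = momentum * buf + grad
    for (brow, grow) in buf.iter_mut().zip(grad) {
        for (bv, gv) in brow.iter_mut().zip(grow) {
            *bv = momentum * *bv + gv;
        }
    }
    // Replace the raw momentum with its approximate orthogonalization.
    let update = newton_schulz(buf, 5);
    for (prow, urow) in param.iter_mut().zip(&update) {
        for (pv, uv) in prow.iter_mut().zip(urow) {
            *pv -= lr * uv;
        }
    }
}
```

The only optimizer state here is `buf`, which is where the SGD-like memory footprint comes from. A `burn-optim` version would presumably express the same steps with tensor operations and restrict them to 2D parameters.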

**References**
- [Original implementation](https://github.com/KellerJordan/Muon)
- [PyTorch documentation](https://docs.pytorch.org/docs/stable/generated/torch.optim.Muon.html)

### Feature motivation
Adding Muon to Burn would benefit:
- Large language model training projects
- Users seeking faster convergence with less hyperparameter tuning
- Applications where memory efficiency is critical
- ...and more!

### (Optional) Suggest a Solution
- [Draft PR implementing this](https://github.com/tracel-ai/burn/pull/3925)

