Add a bias_correction_v flag to scale_by_amsgrad to align with the original AMSGrad paper and Pytorch/tensorflow impl #1423

vvsvictor · 2025-09-26T14:40:28Z

Resolves #1389

google-cla · 2025-09-26T14:40:32Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

rdyro · 2025-10-06T17:14:55Z

Wouldn't setting bias_correction_v=False now skip the b2 bias correction entirely?

Is the point to drop the bias correction altogether, or to change the order of the operations as discussed here: pytorch/pytorch#142323

PyTorch still applies this bias correction in AMSGrad to this day: https://github.com/pytorch/pytorch/blob/2164b661219ab0a76aa018e955ba3d8e8f99c083/torch/optim/adam.py#L509

But TensorFlow does not (I think): https://github.com/keras-team/keras/blob/f6c4ac55692c132cd16211f4877fac6dbeead749/keras/src/optimizers/adam.py#L130-L150

vroulet

Thanks for the change!
Let's quickly agree on the new argument name, and then we can merge.

vroulet · 2025-10-10T23:01:33Z

optax/_src/transform.py

      `None` then the `dtype` is inferred from `params` and `updates`.
+    bias_correction_v: Whether to apply bias correction to the second moment
+      estimate before taking the elementwise maximum (``nu_max``). Set to
+      ``False`` to match the original AMSGrad paper and PyTorch/Keras behavior.


"match pytorch behavior" -> no, see the conversation.

So just say "Set to False to match the original AMSGrad paper."

vroulet · 2025-10-10T23:10:04Z

optax/_src/alias.py

+          eps=eps,
+          eps_root=eps_root,
+          mu_dtype=mu_dtype,
+          bias_correction_v=bias_correction_v


bias_correction_v is not a great name.

bias_correction_nu is already better

debias_nu may be even better, but "bias_correction" as a boolean argument is already used elsewhere, e.g. in rmsprop (shame on me for that naming).

@rdyro what do you think?

vroulet · 2025-10-10T23:16:04Z

> Wouldn't setting bias_correction_v=False now skip the b2 bias correction entirely?
>
> Is the point to drop the bias correction altogether, or to change the order of the operations as discussed here: pytorch/pytorch#142323
>
> PyTorch still applies this bias correction in AMSGrad to this day: https://github.com/pytorch/pytorch/blob/2164b661219ab0a76aa018e955ba3d8e8f99c083/torch/optim/adam.py#L509
>
> But TensorFlow does not (I think): https://github.com/keras-team/keras/blob/f6c4ac55692c132cd16211f4877fac6dbeead749/keras/src/optimizers/adam.py#L130-L150

I think optax's original implementation is the one that makes the most sense (doing the bias correction after taking the max does not make sense to me).
However, one could also simply remove the bias correction; see the plots in #1389. It seems to potentially improve results and, most importantly, it aligns with the paper.
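
To make the three options in this thread concrete, here is a minimal sketch in plain Python on scalar gradients. It is not optax, PyTorch, or Keras code: the function name and the dictionary keys are made up for illustration, and the labels only reflect how each ordering is described in the comments above.

```python
import math


def second_moment_denominators(grads, b2=0.999, eps=1e-8):
    """Return the final update denominator under three treatments of nu.

    correct_then_max: debias nu, then take the running max.
    max_then_correct: take the running max of raw nu, then debias the max
        (the reordering raised in pytorch/pytorch#142323).
    no_correction:    running max of raw nu with no debiasing at all.
    """
    nu = 0.0
    max_of_corrected = 0.0  # running max of the debiased second moment
    max_of_raw = 0.0        # running max of the raw second moment
    result = {}
    for t, g in enumerate(grads, start=1):
        # Exponential moving average of squared gradients.
        nu = b2 * nu + (1.0 - b2) * g * g
        correction = 1.0 - b2 ** t
        max_of_corrected = max(max_of_corrected, nu / correction)
        max_of_raw = max(max_of_raw, nu)
        result = {
            "correct_then_max": math.sqrt(max_of_corrected) + eps,
            "max_then_correct": math.sqrt(max_of_raw / correction) + eps,
            "no_correction": math.sqrt(max_of_raw) + eps,
        }
    return result


if __name__ == "__main__":
    # One large gradient followed by small ones.
    for name, value in second_moment_denominators([1.0, 0.1, 0.1]).items():
        print(f"{name}: {value:.6f}")
```

With a large first gradient followed by small ones, the three denominators separate clearly, because debiasing at early steps divides by a small `1 - b2 ** t`. The sketch only compares the denominators; it says nothing about which variant trains better.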

vroulet

Looking at the paper, neither mu nor nu has a bias correction.
So to fully align with the paper, give the option to remove both.
For example, have debias_mu: bool = True, debias_nu: bool = True (or bias_correction_mu: bool = True, bias_correction_nu: bool = True).

Please change the code accordingly.
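
A rough sketch of what the two-flag version could look like, written in plain Python on scalars rather than as an optax transformation. The names `AmsgradState` and `amsgrad_step` are hypothetical, `debias_mu` and `debias_nu` are just the names floated above (not a merged API), and `eps_root` and dtype handling are left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class AmsgradState:
    """Hypothetical scalar optimizer state for illustration."""
    count: int = 0
    mu: float = 0.0      # first moment estimate
    nu: float = 0.0      # second moment estimate
    nu_max: float = 0.0  # running maximum used in the denominator


def amsgrad_step(grad, state, b1=0.9, b2=0.999, eps=1e-8,
                 debias_mu=True, debias_nu=True):
    """One scalar AMSGrad scaling step with optional bias corrections.

    With both flags True, nu is debiased before the max is taken.
    With both flags False, no bias correction is applied to mu or nu.
    """
    count = state.count + 1
    mu = b1 * state.mu + (1.0 - b1) * grad
    nu = b2 * state.nu + (1.0 - b2) * grad * grad
    mu_hat = mu / (1.0 - b1 ** count) if debias_mu else mu
    nu_hat = nu / (1.0 - b2 ** count) if debias_nu else nu
    nu_max = max(state.nu_max, nu_hat)
    update = mu_hat / (math.sqrt(nu_max) + eps)
    return update, AmsgradState(count=count, mu=mu, nu=nu, nu_max=nu_max)


if __name__ == "__main__":
    default_update, _ = amsgrad_step(2.0, AmsgradState())
    paper_update, _ = amsgrad_step(
        2.0, AmsgradState(), debias_mu=False, debias_nu=False)
    print(default_update, paper_update)
```

Defaulting both flags to True keeps the current behavior unchanged, so the paper-style variant would be opt-in.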

fix amsgrad: add option to disable bias correction on second moment (google-deepmind#1389)

b4ee246

remove trailing whitespace in alias.py

f858c76

vvsvictor mentioned this pull request Sep 26, 2025

AMSGrad implementation differs from PyTorch/TensorFlow #1389

Open

ci retrigger actions

e7abe46

vroulet reviewed Oct 10, 2025
