Conversation

@vvsvictor

Resolves #1389

@google-cla

google-cla bot commented Sep 26, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@rdyro
Collaborator

rdyro commented Oct 6, 2025

Wouldn't setting `bias_correction_v=False` now skip the b2 bias correction entirely?

Is the point to drop the bias correction altogether, or to change the order of operations as discussed in pytorch/pytorch#142323?

PyTorch still applies the bias correction in AMSGrad to this day: https://github.com/pytorch/pytorch/blob/2164b661219ab0a76aa018e955ba3d8e8f99c083/torch/optim/adam.py#L509

But TensorFlow (Keras) does not, I think: https://github.com/keras-team/keras/blob/f6c4ac55692c132cd16211f4877fac6dbeead749/keras/src/optimizers/adam.py#L130-L150

Collaborator

@vroulet vroulet left a comment

Thanks for the change!
Let's just quickly agree on the new argument name and we should merge.

`None` then the `dtype` is inferred from `params` and `updates`.
bias_correction_v: Whether to apply bias correction to the second moment
estimate before taking the elementwise maximum (``nu_max``). Set to
``False`` to match the original AMSGrad paper and PyTorch/Keras behavior.
Collaborator

"Match PyTorch behavior" is not accurate, see the conversation.

So just say "Set to ``False`` to match the original AMSGrad paper."

eps=eps,
eps_root=eps_root,
mu_dtype=mu_dtype,
bias_correction_v=bias_correction_v
Collaborator

`bias_correction_v` is not a great name.

`bias_correction_nu` is already better.

`debias_nu` may be even better, but "bias_correction" as a boolean argument is already used in e.g. rmsprop (shame on me for that naming).

@rdyro what do you think?

@vroulet
Collaborator

vroulet commented Oct 10, 2025

> Wouldn't setting `bias_correction_v=False` now skip the b2 bias correction entirely?
>
> Is the point to drop the bias correction altogether, or to change the order of operations as discussed in pytorch/pytorch#142323?
>
> PyTorch still applies the bias correction in AMSGrad to this day: https://github.com/pytorch/pytorch/blob/2164b661219ab0a76aa018e955ba3d8e8f99c083/torch/optim/adam.py#L509
>
> But TensorFlow (Keras) does not, I think: https://github.com/keras-team/keras/blob/f6c4ac55692c132cd16211f4877fac6dbeead749/keras/src/optimizers/adam.py#L130-L150

I think the original optax implementation is the one that makes the most sense (applying the bias correction after taking the max does not make sense to me).
However, one could also simply remove the bias correction; see the plots in #1389. That seems to potentially improve results and, most importantly, it aligns with the paper.
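
To make the three orderings discussed in this thread concrete, here is a minimal scalar sketch in plain Python. It is not optax, PyTorch, or Keras code: the function name `second_moment_variants` and the labels (a), (b), (c) are hypothetical and only mirror how the variants are described above.

```python
def second_moment_variants(grads, b2=0.999):
    """Track the AMSGrad second-moment term under three orderings.

    Returns one tuple per step:
      (a) bias-correct nu, then take the running max,
      (b) take the running max of raw nu, then bias-correct the max,
      (c) take the running max of raw nu with no bias correction.
    """
    nu = 0.0
    max_debiased = 0.0  # running max of bias-corrected nu, used by (a)
    max_raw = 0.0       # running max of raw nu, used by (b) and (c)
    history = []
    for t, g in enumerate(grads, start=1):
        nu = b2 * nu + (1.0 - b2) * g * g
        correction = 1.0 - b2 ** t
        max_debiased = max(max_debiased, nu / correction)
        max_raw = max(max_raw, nu)
        history.append((max_debiased, max_raw / correction, max_raw))
    return history


for step, (a, b, c) in enumerate(second_moment_variants([1.0, 0.1, 0.1]), 1):
    print(f"step {step}: (a)={a:.6f} (b)={b:.6f} (c)={c:.6f}")
```

In this toy run with one large gradient followed by small ones, (a) and (c) never decrease, while (b) can shrink from one step to the next because the correction factor keeps changing after the max has been taken.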

Collaborator

@vroulet vroulet left a comment

Looking at the paper, neither mu nor nu has a bias correction.
So to fully align with the paper, give the option to remove both,
for example `debias_mu: bool = True, debias_nu: bool = True` (or `bias_correction_mu: bool = True, bias_correction_nu: bool = True`).

Please change the code accordingly.
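
As a rough illustration of what the two proposed flags could control, here is a scalar AMSGrad step in plain Python. This is only a sketch under assumptions: `amsgrad_step` and its state tuple are hypothetical and not the optax API, the flag names are taken from the suggestion above, and optax's `eps_root` and `mu_dtype` handling is left out.

```python
import math


def amsgrad_step(param, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
                 debias_mu=True, debias_nu=True):
    """One scalar AMSGrad update; state is (count, mu, nu, nu_max)."""
    count, mu, nu, nu_max = state
    count += 1
    mu = b1 * mu + (1.0 - b1) * grad
    nu = b2 * nu + (1.0 - b2) * grad * grad
    # Each bias correction is optional; disabling both is meant to mirror
    # the paper as described in this thread.
    mu_hat = mu / (1.0 - b1 ** count) if debias_mu else mu
    nu_hat = nu / (1.0 - b2 ** count) if debias_nu else nu
    # The running max is taken over whichever nu estimate was selected.
    nu_max = max(nu_max, nu_hat)
    new_param = param - lr * mu_hat / (math.sqrt(nu_max) + eps)
    return new_param, (count, mu, nu, nu_max)


init_state = (0, 0.0, 0.0, 0.0)
p_default, _ = amsgrad_step(1.0, 2.0, init_state)
p_paper, _ = amsgrad_step(1.0, 2.0, init_state,
                          debias_mu=False, debias_nu=False)
print(p_default, p_paper)
```

Keeping both defaults at `True` preserves the existing bias-corrected behavior in this sketch, so only callers who opt out see a change.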

Development

Successfully merging this pull request may close these issues.

AMSGrad implementation differs from PyTorch/TensorFlow