
@wkcn (Contributor) commented Dec 14, 2023

Description
The argument model_state.use_fp8_ddp is deprecated.
In the MS-AMP examples, model_state.use_fp8_ddp is always set to True. In addition, the function optimizer.all_reduce_grads is not used.

Major Revision

  • Remove model_state.use_fp8_ddp
  • Remove optimizer.all_reduce_grads
  • Remove the related unit tests
  • Update the unit test test_fp8linear_backward, since the type of the weight gradient is torch.Tensor when model_state.use_fp8_ddp is True.

@wkcn wkcn requested review from guoshzhao and tocean December 15, 2023 01:46
@tocean (Contributor) commented Dec 18, 2023

MS-AMP-Examples uses optimizer.all_reduce_grads, so it needs to be removed from the examples as well.
