
Conversation

@cchan commented Jan 18, 2023

Takes a full copy of the gradients off the peak memory usage.

Numbers based on `torch.cuda.max_memory_allocated()`:

  • For gpt-nano: 32,019,456 → 31,666,688 bytes
  • For gpt2-xl: 30,634,800,640 → 24,607,903,232 bytes (~6 GB saved!)
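
The diff itself isn't reproduced in this thread; as a minimal sketch, assuming the saving comes from PyTorch's `zero_grad(set_to_none=True)` (which frees the gradient tensors outright instead of overwriting them with zeros, so `backward()` never has to coexist with a stale full copy of every grad), the idiom looks like this. The model, optimizer, and loss here are hypothetical stand-ins, not the repo's actual training loop:

```python
import torch
from torch import nn

# Hypothetical stand-ins for the GPT model and its optimizer.
model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for _ in range(10):
    x = torch.randn(32, 1024, device="cuda")
    loss = model(x).pow(2).mean()  # dummy loss

    # set_to_none=True drops the .grad tensors entirely rather than filling
    # them with zeros; backward() then allocates fresh grads, so peak memory
    # no longer includes a second, stale copy of every gradient.
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

# The metric the numbers above are based on.
print(torch.cuda.max_memory_allocated())
```

Running the loop once with `set_to_none=True` and once without, and comparing `max_memory_allocated()`, reproduces the kind of before/after measurement quoted above.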

@karpathy (Owner)

:O ???
