Description
Through comparative experiments, we found that what really reduces GPU memory is "torch.set_default_dtype(torch.float16)" and DeepSpeed. We ran experiments with LLaMA-7B, using the following configuration to disable DeepSpeed's optimizations (ZeRO stage 0):
{
  "zero_optimization": {
    "stage": 0
  },
  "gradient_accumulation_steps": 1,
  "steps_per_print": 2000,
  "train_micro_batch_size_per_gpu": 1,
  "wall_clock_breakdown": false
}
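For reference, here is a minimal sketch of how a configuration like this is typically passed to DeepSpeed. The tiny stand-in model and SGD optimizer are placeholders for illustration only, not the repo's actual training code:

```python
import torch
import deepspeed

# Hypothetical stand-in for LLaMA-7B, just to show the wiring.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

ds_config = {
    "zero_optimization": {"stage": 0},  # ZeRO disabled
    "gradient_accumulation_steps": 1,
    "steps_per_print": 2000,
    "train_micro_batch_size_per_gpu": 1,
    "wall_clock_breakdown": False,
}

# With stage 0, DeepSpeed wraps training but applies no ZeRO
# partitioning, so its memory optimizations are effectively off.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, optimizer=optimizer, config=ds_config
)
```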
With this configuration, DeepSpeed's memory optimizations are effectively disabled. Even so, when we do not enable mixed precision, the model's outputs are fp16, which is clearly abnormal. After checking, we found that "torch.set_default_dtype(torch.float16)" plays the key role. When we remove both DeepSpeed and "torch.set_default_dtype(torch.float16)" and run the default configuration on the WiC dataset, training goes out of memory on an 80GB A100. After adding "torch.set_default_dtype(torch.float16)" back, memory usage drops to about 35GB. Even under normal mixed-precision training, the author's LOMO still runs out of memory on an 80GB A100.
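To illustrate the mechanism (a minimal sketch, not the repo's code): any parameters created after "torch.set_default_dtype(torch.float16)" are allocated in fp16, which explains both the fp16 outputs and the roughly halved weight memory compared to fp32.

```python
import torch

# A layer constructed before the call uses the fp32 default.
layer_fp32 = torch.nn.Linear(4096, 4096)
print(layer_fp32.weight.dtype)  # torch.float32

# After changing the default dtype, newly constructed modules
# allocate their parameters directly in fp16.
torch.set_default_dtype(torch.float16)
layer_fp16 = torch.nn.Linear(4096, 4096)
print(layer_fp16.weight.dtype)  # torch.float16
```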