
Commit a084ddd (parent: 63e1d78)

Update blogs/deepspeed-zenflow/README.md

Co-authored-by: Olatunji Ruwase <[email protected]>

File tree: 1 file changed (+1, -1 lines)


blogs/deepspeed-zenflow/README.md

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ For more detailed performance results, please refer to our [arXiv paper](https:/
 
 Training large models with offloading can save GPU memory, but often at the cost of *performance*. In this section, we briefly discuss three topics. **First**, we explain why coupling CPU-side optimizer updates with GPU compute leads to severe GPU stalls during LLM fine-tuning. **Next**, we quantify how full-gradient offloading saturates the limited PCIe bandwidth on A100/H100 servers, inflating iteration time. **Finally**, we reveal the highly skewed importance distribution of gradients, showing that uniformly updating all parameters on the GPU at the same time is wasteful and unnecessary.
 
-### CPU-Induced GPU Stalls
+### Offloading-Induced GPU Stalls
 
 
 <div align="center">
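The stall the diff's context paragraph describes can be illustrated with a toy timing model: when the optimizer step runs on the CPU and is serialized with GPU compute, the GPU sits idle for the transfer plus the update. A minimal sketch, with all stage durations hypothetical (not measured values from the blog):

```python
# Toy per-iteration timing model of CPU-offloaded fine-tuning.
# All numbers are illustrative assumptions, not measurements.
gpu_compute_ms = 120.0  # forward + backward pass on the GPU
pcie_xfer_ms = 80.0     # gradient offload + updated-parameter upload over PCIe
cpu_update_ms = 300.0   # full optimizer update on the CPU

# Synchronous offloading: the GPU must wait for the transfer and the
# CPU update before the next iteration can begin.
serialized_ms = gpu_compute_ms + pcie_xfer_ms + cpu_update_ms

# If transfers and the CPU update fully overlapped with GPU compute,
# iteration time would instead be bounded by the slowest stage.
overlapped_ms = max(gpu_compute_ms, pcie_xfer_ms + cpu_update_ms)

# The difference between serialized time and pure GPU compute is the
# per-iteration GPU stall.
stall_ms = serialized_ms - gpu_compute_ms

print(f"serialized: {serialized_ms:.0f} ms, overlapped: {overlapped_ms:.0f} ms")
print(f"GPU idle per iteration (serialized): {stall_ms:.0f} ms")
```

Under these assumed numbers the CPU-side work dominates the iteration, which is why hiding it behind GPU compute (or updating only the most important gradients) matters.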
