### Name
郑天宇

### Internship Project
Research on an efficient distributed checkpoint system for large-model training

### This Week

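1. Integrated the aoa_reverse component into paddleformers.
2. Studied Python's traceback mechanism and built AOATraceback, a dedicated facility for chained error reporting during AOA parsing and execution, so that users get traceable error messages.
3. Improved AOA error messages, and added a mapping check (key to safetensors file) when loading HF weights.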
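
Item 2 builds on Python's exception chaining. The sketch below is a minimal illustration under stated assumptions, not the actual AOATraceback: the class reuses that name, but its constructor, the toy `src -> dst` statement grammar, and the helpers `parse_statement`, `run_statement`, and `cause_chain` are all hypothetical. It shows how wrapping a low-level error with `raise ... from` keeps the full causal path available to the user.

```python
import traceback


class AOATraceback(Exception):
    """Hypothetical error type that adds AOA context to a lower-level failure."""

    def __init__(self, stage, statement, message):
        super().__init__(f"[AOA {stage}] {message} (statement: {statement!r})")
        self.stage = stage
        self.statement = statement


def parse_statement(statement):
    # Toy grammar (assumption): a statement must look like "src -> dst".
    if "->" not in statement:
        raise ValueError(f"missing '->' in {statement!r}")
    src, dst = (part.strip() for part in statement.split("->", 1))
    return src, dst


def run_statement(statement):
    try:
        return parse_statement(statement)
    except ValueError as err:
        # `raise ... from` stores the original error in __cause__.
        raise AOATraceback("parse", statement, "cannot parse AOA statement") from err


def cause_chain(exc):
    """List messages from the outermost error down to the root cause."""
    chain = []
    while exc is not None:
        chain.append(f"{type(exc).__name__}: {exc}")
        exc = exc.__cause__
    return chain


try:
    run_statement("layer.0.weight layer.0.w")
except AOATraceback as exc:
    for line in cause_chain(exc):
        print(line)
    # The standard library renders the whole chain, including both tracebacks.
    full_report = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
```

Running it prints the AOA-level message first and the original `ValueError` second. The design choice is to keep the root cause intact rather than replacing it, so the user sees both where in AOA the failure happened and what actually went wrong.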

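The key-to-file check in item 3 can be sketched against the standard Hugging Face sharded-checkpoint layout, where `model.safetensors.index.json` holds a `weight_map` from tensor name to shard filename. The function name `check_key_to_file_mapping`, its return shape, and the assumption that the expected keys come from the model after AOA mapping are hypothetical. This is an illustration, not the check that was actually added.

```python
import json
import tempfile
from pathlib import Path


def check_key_to_file_mapping(ckpt_dir, expected_keys,
                              index_name="model.safetensors.index.json"):
    """Hypothetical check: every expected key must map to an existing shard.

    Returns a dict of problem lists; all-empty lists mean the mapping looks sound.
    """
    ckpt_dir = Path(ckpt_dir)
    with open(ckpt_dir / index_name, encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]  # tensor name -> shard filename

    missing_keys = sorted(k for k in expected_keys if k not in weight_map)
    referenced_files = {weight_map[k] for k in expected_keys if k in weight_map}
    missing_files = sorted(name for name in referenced_files
                           if not (ckpt_dir / name).is_file())
    unused_keys = sorted(set(weight_map) - set(expected_keys))
    return {"missing_keys": missing_keys,
            "missing_files": missing_files,
            "unused_keys": unused_keys}


# Usage: a fake two-shard checkpoint where the second shard file is absent.
with tempfile.TemporaryDirectory() as tmp:
    index = {"metadata": {}, "weight_map": {
        "embed.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors"}}
    Path(tmp, "model.safetensors.index.json").write_text(json.dumps(index))
    Path(tmp, "model-00001-of-00002.safetensors").touch()
    report = check_key_to_file_mapping(
        tmp, ["embed.weight", "lm_head.weight", "norm.weight"])
```

Checking only the index file and shard existence is cheap because no tensor data is read, so mapping mistakes can surface before any weights are loaded.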

### Next Week

1. Migrate the Flex Checkpoint code into the Paddlefleet library and refine it there.

### Mentor Evaluation


