@Zhuohao-Li
Contributor

As titled, this adds a script under scripts/ to run Qwen3-4B w/ FSDP, in the same style as the existing run scripts.

I suggest adding more detailed docs for FSDP users on the optimization args.

@Zhuohao-Li
Contributor Author

[image] [image] @zhuzilin

@fzyzcjy
Collaborator

fzyzcjy commented Nov 2, 2025

qq: what is your truncation ratio over time? iirc we stopped using 8k rollout + qwen3-4b b/c the model was somehow learning to speak shorter instead of learning math...

could you also paste the eval aime accuracy?

@Zhuohao-Li
Contributor Author

Zhuohao-Li commented Nov 2, 2025

> qq: what is your truncation ratio over time? iirc we stopped using 8k rollout + qwen3-4b b/c the model was somehow learning to speak shorter instead of learning math...
>
> could you also paste the eval aime accuracy?

yes, it is indeed shorter...

[image]

intuitively it is better to use a 32768 rollout length. maybe we should also update the megatron script scripts/run-qwen3-4B.sh to match, and add extra eval benchmarks (gpqa/math), since aime-24 has limited data
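
for reference, a minimal sketch of how the truncation ratio and mean response length could be tracked per rollout step. This is not code from this repo: `truncation_ratio`, the `steps` dict, and the token counts are all hypothetical, and it assumes a response counts as truncated when its length reaches the generation cap.

```python
from typing import Sequence


def truncation_ratio(response_lengths: Sequence[int], max_response_len: int) -> float:
    """Fraction of responses whose token count reached the generation cap.

    Assumption: a response is treated as truncated when its length is
    >= max_response_len. An empty batch yields 0.0.
    """
    if not response_lengths:
        return 0.0
    truncated = sum(1 for n in response_lengths if n >= max_response_len)
    return truncated / len(response_lengths)


if __name__ == "__main__":
    # Hypothetical per-step response lengths (in tokens), for illustration only.
    steps = {
        0: [8192, 5000, 8192, 7000],
        100: [3000, 2500, 8192, 1800],
    }
    for step, lengths in steps.items():
        ratio = truncation_ratio(lengths, max_response_len=8192)
        mean_len = sum(lengths) / len(lengths)
        print(f"step={step} truncation_ratio={ratio:.2f} mean_len={mean_len:.0f}")
```

if the ratio and the mean length both drop together while eval accuracy stays flat, that would be consistent with the "learning to speak shorter" behavior mentioned above.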
