Llama-style transformer in PyTorch with multi-node / multi-GPU training. Includes pretraining, fine-tuning, DPO, LoRA, and knowledge distillation. Scripts for dataset mixing and training from scratch.
nlp machine-learning deep-learning text-generation pytorch transformer llama lora ddp knowledge-distillation distributed-training finetuning sft dpo huggingface pretraining llm instruction-tuning fsdp causal-lm
Updated Oct 24, 2025 · Python
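To illustrate one of the listed techniques, here is a minimal sketch of a LoRA adapter in PyTorch: a frozen `nn.Linear` wrapped with a trainable low-rank update. The class name `LoRALinear` and the hyperparameters `r` and `alpha` are illustrative assumptions, not identifiers from this repository.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank (LoRA) update.

    Illustrative sketch only; names and defaults are assumptions,
    not this repository's actual API.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight and bias
        # Low-rank factors: A is small-random, B is zero, so the wrapped
        # layer initially computes exactly the same output as the base layer.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B A x
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```

Because `lora_b` starts at zero, swapping `LoRALinear` in for an existing layer leaves the model's outputs unchanged until training updates the adapter, and only the `r * (in_features + out_features)` adapter parameters receive gradients.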