A repository for fine-tuning Moondream vision-language models using supervised and reinforcement learning approaches.
These trainers focus on improving the model's ability to detect and localize objects in images, but the other tasks Moondream supports could be added in the future.
So far, using this codebase, I've been able to improve the model's F1 score by 11% on a held-out test set for detecting basketball players.
| Before | After |
|---|---|
| ![]() | ![]() |
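For reference, the F1 here comes from matching predicted boxes to ground-truth boxes. A minimal sketch of detection F1 with greedy IoU matching (the 0.5 threshold and the matching scheme are assumptions for illustration, not necessarily the exact evaluation used here):

```python
# Illustrative sketch of detection F1 via greedy IoU matching.
# The repo's actual evaluation may differ; the 0.5 threshold is an assumption.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_f1(preds, gts, thresh=0.5):
    """Greedily match each prediction to an unused ground-truth box."""
    matched, used = 0, set()
    for p in preds:
        best, best_j = 0.0, None
        for j, g in enumerate(gts):
            score = iou(p, g)
            if j not in used and score > best:
                best, best_j = score, j
        if best >= thresh and best_j is not None:
            matched += 1
            used.add(best_j)
    precision = matched / len(preds) if preds else 0.0
    recall = matched / len(gts) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```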
This code works best with Moondream 2 and the teacher-forced trainer (`sft_trainer.py`).
```bash
pip install -r requirements.txt
```

Download the Moondream 2 base model from Hugging Face:

```bash
wget https://huggingface.co/vikhyatk/moondream2/resolve/main/model.safetensors
mv model.safetensors moondream2/model.safetensors
```

For Moondream 3, do the same, but place the model at `models/model_md3.safetensors`.
Any COCO-style dataset will work. I wanted something relatively difficult for the existing versions of Moondream so we could actually see some improvement, so I used the basketball player detection dataset made by Roboflow. You can download that dataset here.
Place that (or any other COCO-style dataset) in the `datasets/{dataset_name}/` directory.
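As a quick sanity check that the dataset is wired up correctly, you can inspect the annotation file with a few lines of Python. A sketch; the `_annotations.coco.json` filename matches Roboflow's COCO export and the `basketball` directory name is an example, so adjust the path for your dataset:

```python
import json
from collections import Counter

# Path assumes a Roboflow-style COCO export; adjust for your dataset.
with open("datasets/basketball/train/_annotations.coco.json") as f:
    coco = json.load(f)

categories = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(categories[a["category_id"]] for a in coco["annotations"])

print(f"{len(coco['images'])} images, {len(coco['annotations'])} boxes")
for name, n in counts.most_common():
    print(f"  {name}: {n}")
```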
Teacher-forced region fine-tuning that follows the generative detection path exactly as used during inference, so training is better aligned with how the model is actually run. Supports optional LoRA adapters.
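To make "teacher-forced" concrete: the ground-truth region tokens are fed as inputs and the loss is next-token cross-entropy against the same sequence shifted by one, so the model never conditions on its own mistakes during training. A hypothetical sketch; the `model` call signature here is illustrative, not Moondream's actual API:

```python
import torch
import torch.nn.functional as F

def teacher_forced_loss(model, image_embeds, gt_tokens):
    """Cross-entropy over ground-truth detection tokens.

    gt_tokens: (batch, seq) tokenized ground-truth regions.
    The model conditions on the ground truth (teacher forcing)
    rather than on its own previous predictions.
    """
    inputs = gt_tokens[:, :-1]            # condition on ground truth...
    targets = gt_tokens[:, 1:]            # ...and predict the next token
    logits = model(image_embeds, inputs)  # (batch, seq-1, vocab); illustrative API
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```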
Run with default settings:
```bash
python sft_trainer.py
```

Run with custom parameters:
```bash
python sft_trainer.py --lr=1e-5 --epochs=5 --use_lora=True --grad_accum_steps=16
```

Group Relative Policy Optimization (GRPO) trainer that uses reinforcement learning to fine-tune the region head by collecting rollouts and computing rewards based on detection quality.
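The "group relative" part of GRPO means each rollout's reward is normalized against the other rollouts for the same image, which removes the need for a learned value function. A hypothetical sketch of the core objective (the reward itself, e.g. IoU- or F1-based detection quality, is computed elsewhere):

```python
import torch

def grpo_loss(log_probs, rewards, eps=1e-6):
    """log_probs: (num_rollouts,) summed log-probs of each rollout.
    rewards:   (num_rollouts,) detection-quality rewards for the same image.

    Advantages are computed relative to the group: rollouts that beat the
    group mean are reinforced, the rest are suppressed.
    """
    advantages = (rewards - rewards.mean()) / (rewards.std() + eps)
    # REINFORCE-style objective; advantages are treated as constants.
    return -(advantages.detach() * log_probs).mean()
```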
Run with default settings:
```bash
python grpo_trainer.py
```

Run with custom parameters:
```bash
python grpo_trainer.py --learning_rate=5e-5 --batch_size=5 --num_rollouts=5 --num_epochs=3
```

The following shell script runs a hyperparameter search for the `sft_trainer.py` script:
```bash
./run_hparam_sweep.sh
```

The results are logged to Weights & Biases, which makes it easy to determine which configuration is best for your dataset.
Note that `run_hparam_sweep.sh` is just a wrapper around `sft_trainer.py`, so you can modify it to run your own hyperparameter search.
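If you prefer to drive a sweep from Python instead, here is a minimal sketch that shells out to the trainer using the flags shown above (the grid values are arbitrary examples):

```python
import itertools
import subprocess

# Arbitrary example grid; the flags match sft_trainer.py's CLI above.
lrs = [1e-5, 3e-5, 1e-4]
grad_accums = [8, 16]

for lr, ga in itertools.product(lrs, grad_accums):
    subprocess.run(
        [
            "python", "sft_trainer.py",
            f"--lr={lr}",
            "--epochs=3",
            f"--grad_accum_steps={ga}",
        ],
        check=True,
    )
```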
See here.
This work is based on the following repositories and tutorials:

