A repository for fine-tuning Moondream vision-language models using supervised and reinforcement learning approaches.
These trainers focus on improving the model's ability to detect and localize objects in images, but the other tasks Moondream supports could be added in the future.
So far, using this codebase, I've been able to improve the model's F1 score by 11% on a held-out test set for detecting basketball players.
| Before | After |
|---|---|
| ![]() | ![]() |
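For reference, the F1 here comes from matching predicted boxes to ground-truth boxes. A minimal sketch of detection F1 with greedy IoU matching (the 0.5 threshold and the matching scheme are assumptions for illustration, not necessarily the exact evaluation used here):

```python
# Illustrative sketch of detection F1 via greedy IoU matching.
# The repo's actual evaluation may differ; the 0.5 threshold is an assumption.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_f1(preds, gts, thresh=0.5):
    """Greedily match each prediction to an unused ground-truth box."""
    matched, used = 0, set()
    for p in preds:
        best, best_j = 0.0, None
        for j, g in enumerate(gts):
            score = iou(p, g)
            if j not in used and score > best:
                best, best_j = score, j
        if best >= thresh and best_j is not None:
            matched += 1
            used.add(best_j)
    precision = matched / len(preds) if preds else 0.0
    recall = matched / len(gts) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```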
This code works best with Moondream 2 and the teacher-forced trainer (`sft_trainer.py`).
```bash
pip install -r requirements.txt
```

Download the Moondream 2 base model from Hugging Face:

```bash
wget https://huggingface.co/vikhyatk/moondream2/resolve/main/model.safetensors
mv model.safetensors moondream2/model.safetensors
```

For Moondream 3, do the same, but place the model at `models/model_md3.safetensors`.
Any COCO-style dataset will work. I wanted something relatively difficult for the existing versions of Moondream so we could actually see some improvement, so I used the basketball player detection dataset made by Roboflow. You can download that dataset here.
Place that (or any other COCO-style dataset) in the `datasets/{dataset_name}/` directory.
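As a quick sanity check that the dataset is wired up correctly, you can inspect the annotation file with a few lines of Python. A sketch; the `_annotations.coco.json` filename matches Roboflow's COCO export and the `basketball` directory name is an example, so adjust the path for your dataset:

```python
import json
from collections import Counter

# Path assumes a Roboflow-style COCO export; adjust for your dataset.
with open("datasets/basketball/train/_annotations.coco.json") as f:
    coco = json.load(f)

categories = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(categories[a["category_id"]] for a in coco["annotations"])

print(f"{len(coco['images'])} images, {len(coco['annotations'])} boxes")
for name, n in counts.most_common():
    print(f"  {name}: {n}")
```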
Teacher-forced region fine-tuning that follows the generative detection path exactly as used during inference, so training is better aligned with how the model is actually run. Supports optional LoRA adapters.
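To make "teacher-forced" concrete: the ground-truth region tokens are fed as inputs and the loss is next-token cross-entropy against the same sequence shifted by one, so the model never conditions on its own mistakes during training. A hypothetical sketch; the `model` call signature here is illustrative, not Moondream's actual API:

```python
import torch
import torch.nn.functional as F

def teacher_forced_loss(model, image_embeds, gt_tokens):
    """Cross-entropy over ground-truth detection tokens.

    gt_tokens: (batch, seq) tokenized ground-truth regions.
    The model conditions on the ground truth (teacher forcing)
    rather than on its own previous predictions.
    """
    inputs = gt_tokens[:, :-1]            # condition on ground truth...
    targets = gt_tokens[:, 1:]            # ...and predict the next token
    logits = model(image_embeds, inputs)  # (batch, seq-1, vocab); illustrative API
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```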
Run with default settings:
```bash
python sft_trainer.py
```

Run with custom parameters:
```bash
python sft_trainer.py --lr=1e-5 --epochs=5 --use_lora=True --grad_accum_steps=16
```

Group Relative Policy Optimization (GRPO) trainer that uses reinforcement learning to fine-tune the region head by collecting rollouts and computing rewards based on detection quality.
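The "group relative" part of GRPO means each rollout's reward is normalized against the other rollouts for the same image, which removes the need for a learned value function. A hypothetical sketch of the core objective (the reward itself, e.g. IoU- or F1-based detection quality, is computed elsewhere):

```python
import torch

def grpo_loss(log_probs, rewards, eps=1e-6):
    """log_probs: (num_rollouts,) summed log-probs of each rollout.
    rewards:   (num_rollouts,) detection-quality rewards for the same image.

    Advantages are computed relative to the group: rollouts that beat the
    group mean are reinforced, the rest are suppressed.
    """
    advantages = (rewards - rewards.mean()) / (rewards.std() + eps)
    # REINFORCE-style objective; advantages are treated as constants.
    return -(advantages.detach() * log_probs).mean()
```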
Run with default settings:
```bash
python grpo_trainer.py
```

Run with custom parameters:
```bash
python grpo_trainer.py --learning_rate=5e-5 --batch_size=5 --num_rollouts=5 --num_epochs=3
```

The following shell script runs a hyperparameter search for the `sft_trainer.py` script:
```bash
./run_hparam_sweep.sh
```

The results are logged to Weights & Biases, which makes it easy to determine which configuration is best for your dataset.
Note that `run_hparam_sweep.sh` is just a wrapper around `sft_trainer.py`, so you can modify it to run your own hyperparameter search.
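If you prefer to drive a sweep from Python instead, here is a minimal sketch that shells out to the trainer using the flags shown above (the grid values are arbitrary examples):

```python
import itertools
import subprocess

# Arbitrary example grid; the flags match sft_trainer.py's CLI above.
lrs = [1e-5, 3e-5, 1e-4]
grad_accums = [8, 16]

for lr, ga in itertools.product(lrs, grad_accums):
    subprocess.run(
        [
            "python", "sft_trainer.py",
            f"--lr={lr}",
            "--epochs=3",
            f"--grad_accum_steps={ga}",
        ],
        check=True,
    )
```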
See here.
This work is based on the following repositories and tutorials:

