How do I run Gemma3_270M notebook result in Ollama? #3406
I went through the entire Gemma3_270M example notebook. As long as I stay in the notebook, inference works fine. However, when I export the model to my local machine and load it into Ollama, no matter what I ask it, the model gets stuck in an infinite loop, spitting out nonsense text until I stop it. What do I need to do to run the fine-tuned model locally in Ollama?

Notebook:

Steps to reproduce:
1. Download gemma-3-finetune.Q8_0.gguf.
2. Download gemma-3-finetune.zip and unzip it.
3. Copy gemma-3-finetune.Q8_0.gguf into the gemma-3-finetune directory.
4. Save the response from `_ollama_modelfile` as `Modelfile` in the gemma-3-finetune directory.
5. Edit the Modelfile so that the top `FROM` line points to the .gguf file without any extra path (`FROM gemma-3-finetune.Q8_0.gguf`).
6. From a command prompt in the gemma-3-finetune folder, run `ollama create unsloth_gemma3_model -f Modelfile`.

At this point you can run Ollama and use the unsloth_gemma3_model model. Whatever I ask it, I get an endless stream of gibberish, while the same model in the notebook works just fine.
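The steps above boil down to a Modelfile whose first line references the GGUF by its bare filename, followed by whatever the notebook's `_ollama_modelfile` emitted. A minimal sketch (only the `FROM` line is the part I edited; the rest is illustrative, not the actual notebook output):

```
# Sketch of a Modelfile for the exported model.
# The FROM line must point at the GGUF sitting next to this file,
# with no extra directory prefix.
FROM gemma-3-finetune.Q8_0.gguf

# ... TEMPLATE / PARAMETER lines from _ollama_modelfile go here ...
```

Then, from the same directory: `ollama create unsloth_gemma3_model -f Modelfile`.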
Replies: 1 comment
You need to use exactly the same chat template; see: https://docs.unsloth.ai/basics/running-and-saving-models/troubleshooting
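Concretely, the `TEMPLATE` (and stop token) in the Modelfile must match the chat template the model was fine-tuned with; if they mismatch, the model never emits its end-of-turn token and generates forever. As a rough sketch only, a Gemma-3-style turn structure in an Ollama Modelfile can look like the block below; the exact template should be taken from the notebook's `_ollama_modelfile` output or the troubleshooting page, not copied from here:

```
TEMPLATE """<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>
"""
PARAMETER stop "<end_of_turn>"
```

With a matching template and stop token, generation terminates at `<end_of_turn>` instead of looping.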