
Conversation

@junjunjd commented Dec 2, 2025

Add Gguf wrapper in quantized Gemma3 loader to fix missing metadata lookups.

Fixes #3215.

@ivarflakstad (Member)

@junjunjd Thanks for this!
I tried running the embedding-gemma model mentioned in the issue and unfortunately this doesn't fix the issue. Luckily the problem isn't as complex as it may seem.
Default gemma3 uses the prefix gemma3.*, while the embedding-gemma model uses gemma-embedding.* (ref).
ModelWeights in quantized_gemma3.rs is hardcoded to use gemma3.
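To confirm which prefix a given file actually uses, you can dump the GGUF metadata keys. A quick sketch (the file path is a placeholder):

use candle::quantized::gguf_file;

fn main() -> candle::Result<()> {
    // Parse the file's metadata (tensor data is not loaded here).
    let mut file = std::fs::File::open("embeddinggemma-300m.gguf")?;
    let content = gguf_file::Content::read(&mut file)?;
    for key in content.metadata.keys() {
        // embedding-gemma prints keys like "gemma-embedding.attention.head_count".
        println!("{key}");
    }
    Ok(())
}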
I'll post the findings in the issue as well.

@clocksmith (Contributor) commented Dec 30, 2025

@junjunjd Thanks for taking a crack at this! The GGUF wrapper is a nice cleanup for the tensor loading calls.

Unfortunately this doesn't quite fix the root issue. The problem is that md_get("gemma3.attention.head_count") still has the prefix hardcoded, and the wrapper passes through to the same metadata without transforming the keys.

As @ivarflakstad mentioned, if the actual prefix can be detected before the lookups happen, the problem can be addressed there. Luckily it can!

The fix

Right before this existing md_get closure on line 266 (currently not as robust):

let md_get = |s: &str| match ct.metadata.get(s) {
    None => candle::bail!("cannot find {s} in metadata"),
    Some(v) => Ok(v),
};

This can be added, along with a tweak to the md_get block above:

// NEW: detect which metadata prefix this GGUF file actually uses.
let prefix = ["gemma3", "gemma2", "gemma", "gemma-embedding"]
    .iter()
    .find(|p| ct.metadata.contains_key(&format!("{p}.attention.head_count")))
    .copied()
    .unwrap_or("gemma3");

// TWEAKED: md_get now prepends the detected prefix to every lookup.
let md_get = |s: &str| {
    let key = format!("{prefix}.{s}");
    match ct.metadata.get(&key) {
        None => candle::bail!("cannot find {key} in metadata"),
        Some(v) => Ok(v),
    }
};

Then replace all instances of md_get("gemma3.attention.head_count") with md_get("attention.head_count"), and likewise for the other prefixed lookups on lines 271-285 that follow.
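For illustration, a typical call site would change like this (a sketch; the exact lines in quantized_gemma3.rs may read slightly differently):

// Before: the prefix is baked into every lookup.
let head_count = md_get("gemma3.attention.head_count")?.to_u32()? as usize;

// After: md_get prepends the detected prefix itself.
let head_count = md_get("attention.head_count")?.to_u32()? as usize;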

Notes

BONUS: This proposed solution also adds robustness: future Gemma variants (gemma4, gemma-vision, etc.) can be supported by adding a single entry to the prefix list, with no other code changes.

NOTE: The Gguf wrapper from this PR could still be useful alongside this fix for a cleaner tensor API - they're complementary changes!

ALTS: I hit a similar issue in another project, where my solution was not as tight: https://github.com/clocksmith/gamma/blob/83d0565c938b18a363026070b526885682450c79/src/core/hardware/gguf_parser.py#L129. That also works, but the hardcoded list of names is probably better for a library of this maturity.
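For comparison, a fully dynamic variant in Rust (a sketch, not a port of the linked parser) would infer the prefix from whatever key carries the expected suffix, at the cost of silently accepting unknown model families:

// Sketch: take any metadata key ending in ".attention.head_count" and treat
// everything before the suffix as the prefix; fall back to "gemma3".
let prefix = ct
    .metadata
    .keys()
    .find_map(|k| k.strip_suffix(".attention.head_count"))
    .unwrap_or("gemma3")
    .to_string();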

Happy to help further if you'd like to update the PR.

@ivarflakstad (Member)

Closing since the root issue has been fixed. If you want to contribute with the intention of improving the gguf API, feel free :)
