
@surfiniaburger

Fixes #6049

Description

This pull request adds support for the new Gemma 3N family of models to the SafetensorsCkptLoader, enabling developers to convert these powerful, on-device multimodal models to the MediaPipe .task format.

Gemma 3N is specifically designed for high-performance on-device use. It introduces several key architectural innovations, including:

  1. A nested "Matryoshka" architecture, in which the core language model's weights are stored under a language_model. prefix in the checkpoint (illustrated below).
  2. Per-Layer Embeddings (PLE), a memory-management technique that reduces the footprint on hardware accelerators such as GPUs and TPUs.
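
To make the nested layout concrete, here is an illustrative set of checkpoint tensor names. The exact names are hypothetical, but the language_model. prefix and the sibling vision_tower / multi_modal_projector entries follow the structure this PR handles:

```python
# Illustrative tensor names for a nested Gemma checkpoint (hypothetical
# examples, not dumped from a real model). Core LLM weights live under the
# language_model. prefix; sibling towers belong to other modalities.
example_tensor_names = [
    "language_model.model.embed_tokens.weight",               # core LLM: keep, strip prefix
    "language_model.model.layers.0.self_attn.q_proj.weight",  # core LLM: keep, strip prefix
    "vision_tower.encoder.layers.0.attn.qkv.weight",          # non-LLM: skip
    "multi_modal_projector.mm_input_projection.weight",       # non-LLM: skip
]
```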

This PR updates the converter to correctly handle this nested structure, making these state-of-the-art models accessible to the MediaPipe community.

Changes Implemented

The implementation was carefully revised to ensure no regressions were introduced for previously supported models.

  1. mediapipe/tasks/python/genai/converter/safetensors_converter.py

    • Introduced a general is_nested flag in the GemmaMapper constructor. This replaces an earlier model-specific check and handles any Gemma model whose weights carry the nested language_model. prefix (see the sketch after this list).
    • The update_target_name method now uses the is_nested flag to conditionally strip that prefix. This is a more robust solution that keeps support intact for the existing Gemma3-4B model, which uses the same nested structure.
    • The SafetensorsCkptLoader now recognizes the new special model names for the Gemma 3N series (e.g., GEMMA3N_4B and GEMMA_3N_E2B_IT).
    • The loader sets is_nested=True for both the new Gemma 3N models and the pre-existing Gemma3-4B, ensuring both are handled correctly.
    • Also includes a minor fix for a variable-name typo (raw vs. raw_tensor) in the read_tensor_as_numpy function.
  2. mediapipe/tasks/python/genai/converter/safetensors_converter_test.py

    • To validate the changes and prevent regressions, a new parameterized test (testNestedGemmaConversion) has been added; a sketch follows this list.
    • This single test case runs twice, validating the conversion logic for both a new model (GEMMA3N_4B) and the existing nested model (Gemma3-4B).
    • The test asserts two key outcomes:
      1. The language_model. prefix is correctly stripped from all relevant tensor names.
      2. Non-language-model layers (e.g., vision_tower, multi_modal_projector) are correctly identified and skipped.
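
The following is a minimal sketch of the prefix-stripping logic described above. It is not the actual MediaPipe source: GemmaMapper, update_target_name, and the model names come from this PR's description, while the signatures, the _NESTED_GEMMA_MODELS tuple, and the _make_mapper helper are assumptions for illustration.

```python
# Minimal sketch of the nested-prefix handling (not the real converter code).
_NESTED_PREFIX = "language_model."

# Special model names that use the nested layout (per the PR description;
# the exact string for Gemma3-4B is an assumption here).
_NESTED_GEMMA_MODELS = ("GEMMA3_4B", "GEMMA3N_4B", "GEMMA_3N_E2B_IT")


class GemmaMapper:
  """Maps checkpoint tensor names to converter target names (simplified)."""

  def __init__(self, is_nested: bool = False):
    # When True, core LLM tensor names arrive with the language_model. prefix.
    self.is_nested = is_nested

  def update_target_name(self, target_name: str) -> str:
    # Conditionally strip the nested prefix; leave other names untouched so
    # downstream logic can recognize and skip non-LLM tensors.
    if self.is_nested and target_name.startswith(_NESTED_PREFIX):
      return target_name[len(_NESTED_PREFIX):]
    return target_name


def _make_mapper(special_model: str) -> GemmaMapper:
  # Hypothetical helper mirroring the loader change: both the new Gemma 3N
  # names and the pre-existing Gemma3-4B get is_nested=True.
  return GemmaMapper(is_nested=special_model in _NESTED_GEMMA_MODELS)
```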
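
And a hedged sketch of the parameterized test, built on the mapper sketch above rather than on the real converter:

```python
from absl.testing import absltest
from absl.testing import parameterized


class NestedGemmaConversionTest(parameterized.TestCase):

  # One logical test, run once per model name, mirroring the PR's
  # testNestedGemmaConversion.
  @parameterized.parameters("GEMMA3N_4B", "GEMMA3_4B")
  def testNestedGemmaConversion(self, special_model):
    mapper = _make_mapper(special_model)  # helper from the sketch above
    # 1. The language_model. prefix is stripped from LLM tensor names.
    self.assertEqual(
        mapper.update_target_name("language_model.model.embed_tokens.weight"),
        "model.embed_tokens.weight")
    # 2. Non-language-model tensors keep their names, so downstream logic
    #    can identify and skip them.
    self.assertEqual(
        mapper.update_target_name("vision_tower.encoder.weight"),
        "vision_tower.encoder.weight")


if __name__ == "__main__":
  absltest.main()
```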

A Note on Testing

As detailed in the original issue, setting up the full local build environment for Python changes on an Apple Silicon Mac proved to be exceptionally challenging. Therefore, while the core logic has been thoroughly validated with the new parameterized unit tests, a full, end-to-end conversion could not be performed locally.

I would be grateful if the maintainers could rely on the project's CI pipeline and their own established environments for final validation. This contribution should enable many developers to bring the latest on-device models to their mobile applications.

Thank you for your consideration.

@google-cla

google-cla bot commented Oct 8, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@surfiniaburger
Author

@google-cla check

so the google-cla bot can update the check
