
Is dynamic weight update / fine-tuning supported in QNN / XNNPACK backends? #16123

@qqqqqqqwy

Description

🚀 The feature, motivation and pitch

I’m working on a research project that fine-tunes a model on Android devices. I am exploring ExecuTorch with the QNN or XNNPACK backend for inference acceleration, but I need the backend to support dynamic modification of weights, i.e., after initialization, updating weights / biases and then running forward again. A minimal sketch of the access pattern I need follows.
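
For concreteness, here is that pattern written as plain eager PyTorch (an SPSA-style zeroth-order step; the model, data, and hyperparameters are placeholders, not my actual setup):

```python
# Minimal sketch of the desired on-device loop: perturb weights in
# place, run forward, and repeat. Model/data/hyperparameters are
# stand-ins for illustration only.
import torch

model = torch.nn.Linear(16, 2)                 # stand-in model
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
loss_fn = torch.nn.CrossEntropyLoss()
eps, lr = 1e-3, 1e-2

with torch.no_grad():
    for p in model.parameters():
        z = torch.randn_like(p)
        p.add_(eps * z)                        # in-place update, then forward
        loss_plus = loss_fn(model(x), y)
        p.sub_(2 * eps * z)                    # opposite perturbation
        loss_minus = loss_fn(model(x), y)
        p.add_(eps * z)                        # restore original weights
        grad_est = (loss_plus - loss_minus) / (2 * eps) * z
        p.sub_(lr * grad_est)                  # gradient-free step
```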

What I found

  • In executorch/extension/training/examples/XOR/train.cpp, training code built on ExecuTorch is provided, but it does not say which backends it supports.
  • The official XNNPACK backend documentation describes that, during runtime initialization, weights / biases are “packed” into XNNPACK’s internal data structures and the original preprocessed blob is freed. This seems to imply that, from the backend’s perspective, weights become static / immutable once the graph is built.
  • I did not find any mechanism in the docs or the runtime API to “unlock” or “update” those packed weights at runtime; the only workaround I can see is a full program reload (see the sketch after this list).
  • There is an existing issue (PT2E Dynamic Quantization with XNNPACK Backend for Android fails Loading of method forward in runtime #11355) reporting that even dynamic quantization + XNNPACK on Android may fail to load the “forward” method, which suggests that non-static quantization / dynamic behavior is fragile or unsupported.
  • For the QNN backend, I found open / triaged issues about compilation and binary loading, but none that explicitly mention runtime weight updates.
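
Given the packing behavior above, the only workaround I can see is to treat the lowered program as immutable and reload a freshly exported .pte whenever the weights change. A minimal sketch, assuming the pybindings entry point below (names may differ across ExecuTorch versions):

```python
# Hypothetical full-reload workaround: re-loading the program
# re-initializes the delegate, which re-packs whatever weights are
# baked into the (updated) .pte file. The pybindings import is my
# assumption about the current API surface.
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

def forward_with_new_weights(pte_path: str, x: torch.Tensor):
    module = _load_for_executorch(pte_path)    # fresh load -> fresh packing
    return module.forward((x,))
```

This obviously pays the full init / packing cost on every weight update, which is exactly what I am hoping to avoid.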

My questions

  1. Does ExecuTorch (in any of its backends: QNN, XNNPACK, Vulkan, etc.) currently support runtime in-place weight updates, i.e. treating model weights as mutable parameters and allowing them to be updated between forward calls, as required for fine-tuning / training / zeroth-order optimization?
  2. If not, is there a recommended workflow or workaround for on-device fine-tuning with ExecuTorch, or is this explicitly out of scope?
  3. If it’s not currently supported, would the maintainers be open to considering such a feature in the future (e.g. a “mutable weight” delegate, or a mechanism to reload new weights into the backend graph)? The export-side pipeline I would have to re-run under a full-reload approach is sketched below.
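
For completeness, this is that pipeline as I understand the standard XNNPACK lowering flow (model and paths are placeholders): the weights are captured as constants at export time and then packed and frozen by the backend at runtime init, so every update would mean re-running all of it.

```python
# Standard XNNPACK lowering flow as I understand it from the docs;
# re-running this per weight update on device is clearly impractical.
import torch
from torch.export import export
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

model = torch.nn.Linear(16, 2).eval()          # stand-in model
example_inputs = (torch.randn(1, 16),)

edge = to_edge(export(model, example_inputs))  # weights become constants
edge = edge.to_backend(XnnpackPartitioner())   # delegate to XNNPACK
with open("model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)       # serialized program
```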

Thank you for your time and for developing ExecuTorch — it is a great tool for on-device inference / deployment, and I hope it can support on-device fine-tuning in the future.

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

cc @JacobSzwejbka
