Feature: Add Z-Image-Turbo regional guidance #8672
Draft: Pfannkuchensack wants to merge 32 commits into invoke-ai:main from Pfannkuchensack:feat/z-image-regional-guidance
+61,739
−173
Conversation
Add comprehensive support for Z-Image-Turbo (S3-DiT) models.

Backend:
- New BaseModelType.ZImage in taxonomy
- Z-Image model config classes (ZImageTransformerConfig, Qwen3TextEncoderConfig)
- Model loader for Z-Image transformer and Qwen3 text encoder
- Z-Image conditioning data structures
- Step callback support for Z-Image with FLUX latent RGB factors

Invocations:
- z_image_model_loader: Load Z-Image transformer and Qwen3 encoder
- z_image_text_encoder: Encode prompts using Qwen3 with chat template
- z_image_denoise: Flow matching denoising with time-shifted sigmas
- z_image_image_to_latents: Encode images to 16-channel latents
- z_image_latents_to_image: Decode latents using FLUX VAE

Frontend:
- Z-Image graph builder for text-to-image generation
- Model picker and validation updates for z-image base type
- CFG scale now allows 0 (required for Z-Image-Turbo)
- CLIP skip disabled for Z-Image (uses Qwen3, not CLIP)
- Optimal dimension settings for Z-Image (1024x1024)

Technical details:
- Uses Qwen3 text encoder (not CLIP/T5)
- 16 latent channels with FLUX-compatible VAE
- Flow matching scheduler with dynamic time shift
- 8 inference steps recommended for Turbo variant
- bfloat16 inference dtype
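The flow-matching schedule with dynamic time shift mentioned above can be sketched as follows. This uses the FLUX-style shift transform; the shift value, helper names, and the plain Euler step are illustrative assumptions, not the PR's exact implementation:

```python
# Sketch of a flow-matching sigma schedule with time shift (assumption:
# the same shift transform FLUX uses; the shift value is illustrative).

def shifted_sigmas(num_steps: int, shift: float = 3.0) -> list[float]:
    # Linear sigmas from 1.0 down to 0.0 (num_steps + 1 boundaries).
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    # Time shift: sigma' = shift * sigma / (1 + (shift - 1) * sigma).
    return [shift * s / (1 + (shift - 1) * s) for s in sigmas]

def euler_flow_step(x, velocity, sigma_curr, sigma_next):
    # Flow matching predicts a velocity field; a plain Euler step moves
    # the latent from sigma_curr toward sigma_next.
    return x + (sigma_next - sigma_curr) * velocity

sigmas = shifted_sigmas(8)  # 8 steps recommended for the Turbo variant
```

The shift transform keeps the endpoints fixed (sigma 1.0 stays 1.0, sigma 0.0 stays 0.0) while spending more steps at high noise levels.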
Add comprehensive LoRA support for Z-Image models.

Backend:
- New Z-Image LoRA config classes (LoRA_LyCORIS_ZImage_Config, LoRA_Diffusers_ZImage_Config)
- Z-Image LoRA conversion utilities with key mapping for transformer and Qwen3 encoder
- LoRA prefix constants (Z_IMAGE_LORA_TRANSFORMER_PREFIX, Z_IMAGE_LORA_QWEN3_PREFIX)
- LoRA detection logic to distinguish Z-Image from Flux models
- Layer patcher improvements for proper dtype conversion and parameter
…ntification

Move the Flux layer structure check before the metadata check to prevent misidentifying Z-Image LoRAs (which use `diffusion_model.layers.X`) as Flux AI Toolkit format. Flux models use `double_blocks` and `single_blocks` patterns, which are now checked first regardless of metadata presence.
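The detection order this commit describes can be sketched as follows; the function name and return values are illustrative, while the key patterns (`double_blocks`/`single_blocks`, `diffusion_model.layers.X`) come from the commit message:

```python
import re

def classify_lora(state_dict_keys: list[str]) -> str:
    # Flux layer structure is checked first, regardless of metadata,
    # so Z-Image LoRAs are not misidentified as Flux AI Toolkit format.
    if any("double_blocks" in k or "single_blocks" in k for k in state_dict_keys):
        return "flux"
    # Z-Image LoRAs address transformer layers as diffusion_model.layers.X
    if any(re.match(r"diffusion_model\.layers\.\d+", k) for k in state_dict_keys):
        return "z-image"
    return "unknown"
```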
…ibility

Add comprehensive support for GGUF quantized Z-Image models and improve component flexibility.

Backend:
- New Main_GGUF_ZImage_Config for GGUF quantized Z-Image transformers
- Z-Image key detection (_has_z_image_keys) to identify S3-DiT models
- GGUF quantization detection and sidecar LoRA patching for quantized models
- Qwen3Encoder_Qwen3Encoder_Config for standalone Qwen3 encoder models

Model Loader:
- Split Z-Image model
Implement regional prompting for Z-Image (S3-DiT Transformer), allowing different prompts to affect different image regions using attention masks.

Backend changes:
- Add ZImageRegionalPromptingExtension for mask preparation
- Add ZImageTextConditioning and ZImageRegionalTextConditioning data classes
- Patch the transformer forward pass to inject 4D regional attention masks
- Use an additive float mask (0.0 = attend, -inf = block) in bfloat16 for compatibility
- Alternate regional/full attention layers for global coherence

Frontend changes:
- Update buildZImageGraph to support regional conditioning collectors
- Update addRegions to create z_image_text_encoder nodes for regions
- Update addZImageLoRAs to handle optional negCond when guidance_scale=0
- Add Z-Image validation (no IP adapters, no autoNegative)
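The additive masking semantics can be illustrated with a minimal plain-Python sketch. The real implementation builds 4D `bfloat16` torch tensors and alternates regional/full attention layers; all names here are illustrative:

```python
# Minimal sketch of an additive regional attention mask: 0.0 = attend,
# -inf = block. Token order is [img_tokens, txt_tokens], per the PR.
NEG_INF = float("-inf")

def build_regional_attn_mask(region_img_masks, txt_lens):
    """region_img_masks: per-region list of 0/1 flags over image tokens.
    txt_lens: number of text tokens per region."""
    num_img = len(region_img_masks[0])
    n = num_img + sum(txt_lens)
    mask = [[NEG_INF] * n for _ in range(n)]
    # Image tokens always attend to each other, for global coherence.
    for i in range(num_img):
        for j in range(num_img):
            mask[i][j] = 0.0
    offset = num_img
    for img_mask, t_len in zip(region_img_masks, txt_lens):
        txt_idx = range(offset, offset + t_len)
        img_idx = [i for i, flag in enumerate(img_mask) if flag]
        for t in txt_idx:
            for i in img_idx:
                mask[t][i] = 0.0  # region's text attends its image tokens
                mask[i][t] = 0.0  # and those image tokens attend the text
            for u in txt_idx:
                mask[t][u] = 0.0  # text tokens within a region see each other
        offset += t_len
    return mask
```

Cross-region pairs (a region's text tokens and another region's image or text tokens) stay at `-inf`, so each prompt only influences its own masked area.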
Fix Windows path again
…fannkuchensack/InvokeAI into feat/z-image-regional-guidance
…kuchensack/InvokeAI into feat/z-image-turbo-support
Add support for loading Z-Image transformer and Qwen3 encoder models from single-file safetensors format (in addition to the existing diffusers directory format).

Changes:
- Add Main_Checkpoint_ZImage_Config and Main_GGUF_ZImage_Config for single-file Z-Image transformer models
- Add Qwen3Encoder_Checkpoint_Config for single-file Qwen3 text encoder
- Add ZImageCheckpointModel and ZImageGGUFCheckpointModel loaders with automatic key conversion from original to diffusers format
- Add Qwen3EncoderCheckpointLoader using Qwen3ForCausalLM with fast loading via init_empty_weights and proper weight tying for lm_head
- Update z_image_denoise to accept Checkpoint format models
Add support for saving and recalling Z-Image component models (VAE and Qwen3 Encoder) in image metadata.

Backend:
- Add qwen3_encoder field to CoreMetadataInvocation (version 2.1.0)

Frontend:
- Add vae and qwen3_encoder to Z-Image graph metadata
- Add Qwen3EncoderModel metadata handler for recall
- Add ZImageVAEModel metadata handler (uses zImageVaeModelSelected instead of vaeSelected to set Z-Image-specific VAE state)
- Add qwen3Encoder translation key

This enables "Recall Parameters" / "Remix Image" to restore the VAE and Qwen3 Encoder settings used for Z-Image generations.
Add robust device capability detection for bfloat16, replacing the hardcoded dtype with runtime checks that fall back to float16/float32 on unsupported hardware. This prevents runtime failures on GPUs and CPUs without bfloat16.

Key changes:
- Add TorchDevice.choose_bfloat16_safe_dtype() helper for safe dtype selection
- Fix LoRA device mismatch in layer_patcher.py (add device= to .to() call)
- Replace all assert statements with descriptive exceptions (TypeError/ValueError)
- Add hidden_states bounds check and apply_chat_template fallback in text encoder
- Add GGUF QKV tensor validation (divisible by 3 check)
- Fix CPU noise generation to use float32 for compatibility
- Remove verbose debug logging from LoRA conversion utils
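A sketch of the safe dtype selection this commit describes; the helper name follows the commit message, but the exact capability checks below are assumptions:

```python
# Sketch of runtime bfloat16 capability detection with float16/float32
# fallbacks (assumed checks; the real helper lives on TorchDevice).
import torch

def choose_bfloat16_safe_dtype(device: torch.device) -> torch.dtype:
    if device.type == "cuda":
        if torch.cuda.is_bf16_supported():
            return torch.bfloat16
        # Older CUDA GPUs: fall back to float16.
        return torch.float16
    if device.type == "mps":
        return torch.float16
    # CPU: bfloat16 kernels may be missing or slow; use float32.
    return torch.float32
```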
…inModelConfig
The FLUX Dev license warning in model pickers used isCheckpointMainModelConfig
incorrectly:
```
isCheckpointMainModelConfig(config) && config.variant === 'dev'
```
This caused a TypeScript error because CheckpointModelConfig type doesn't
include the 'variant' property (it's extracted as `{ type: 'main'; format:
'checkpoint' }` which doesn't narrow to include variant).
Changes:
- Add isFluxDevMainModelConfig type guard that properly checks
base='flux' AND variant='dev', returning MainModelConfig
- Update MainModelPicker and InitialStateMainModelPicker to use new guard
- Remove isCheckpointMainModelConfig as it had no other usages
The function was removed because:
1. It was only used for detecting FLUX Dev models (incorrect use case)
2. No other code needs a generic "is checkpoint format" check
3. The pattern in this codebase is specific type guards per model variant
(isFluxFillMainModelModelConfig, isRefinerMainModelModelConfig, etc.)
…ters

- Add Qwen3EncoderGGUFLoader for llama.cpp GGUF quantized text encoders
- Convert llama.cpp key format (blk.X., token_embd) to PyTorch format
- Handle tied embeddings (lm_head.weight ↔ embed_tokens.weight)
- Dequantize embed_tokens for embedding lookups (GGMLTensor limitation)
- Add QK normalization key mappings (q_norm, k_norm) for Qwen3
- Set Z-Image defaults: steps=9, cfg_scale=0.0, width/height=1024
- Allow cfg_scale >= 0 (was >= 1) for Z-Image Turbo compatibility
- Add GGUF format detection for Qwen3 model probing
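The llama.cpp-to-PyTorch key conversion can be sketched as follows. The entries follow common llama.cpp GGUF naming (`blk.N.*`, `token_embd`) and the commit's QK-norm note, but the full mapping table used in the PR is larger and these exact entries are assumptions:

```python
# Sketch of llama.cpp GGUF -> PyTorch (transformers-style) key renaming.
import re

TOP_LEVEL_MAP = {
    "token_embd.weight": "model.embed_tokens.weight",
    "output_norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",  # tied to embed_tokens if absent
}
SUBKEY_MAP = {
    "attn_q": "self_attn.q_proj",
    "attn_k": "self_attn.k_proj",
    "attn_v": "self_attn.v_proj",
    "attn_output": "self_attn.o_proj",
    "attn_q_norm": "self_attn.q_norm",  # Qwen3 QK normalization
    "attn_k_norm": "self_attn.k_norm",
    "ffn_gate": "mlp.gate_proj",
    "ffn_up": "mlp.up_proj",
    "ffn_down": "mlp.down_proj",
    "attn_norm": "input_layernorm",
    "ffn_norm": "post_attention_layernorm",
}

def convert_gguf_key(key: str) -> str:
    if key in TOP_LEVEL_MAP:
        return TOP_LEVEL_MAP[key]
    m = re.match(r"^blk\.(\d+)\.([a-z_]+)\.(weight|bias)$", key)
    if m and m.group(2) in SUBKEY_MAP:
        return f"model.layers.{m.group(1)}.{SUBKEY_MAP[m.group(2)]}.{m.group(3)}"
    return key
```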
…rNorm

- Add CustomDiffusersRMSNorm for diffusers.models.normalization.RMSNorm
- Add CustomLayerNorm for torch.nn.LayerNorm
- Register both in AUTOCAST_MODULE_TYPE_MAPPING

Enables partial loading (enable_partial_loading: true) for Z-Image models by wrapping their normalization layers with device autocast support.
Fix DEFAULT_TOKENIZER_SOURCE to use Qwen/Qwen3-4B
Collaborator:
I will start my review after the base Z-Image-Turbo support is merged, which should be soon.
…noise node

The Z-Image denoise node outputs latents, not images, so these mixins were unnecessary. Metadata and board handling is correctly done in the L2I (latents-to-image) node. This aligns with how FLUX denoise works.
…fannkuchensack/InvokeAI into feat/z-image-regional-guidance
Labels: api, backend, frontend, invocations, python, python-deps, Root
Summary
This PR adds Regional Guidance support for Z-Image (S3-DiT Transformer) models, enabling users to apply different prompts to different regions of the image using attention masks.
Key implementation details:
Backend:
- `ZImageRegionalPromptingExtension` class that builds regional attention masks
- `ZImageTextConditioning` and `ZImageRegionalTextConditioning` dataclasses for managing regional text embeddings
- `patch_transformer_for_regional_prompting` context manager
- Additive float mask (`0.0` = attend, `-inf` = block) in `bfloat16` dtype
- Token order is `[img_tokens, txt_tokens]` (different from FLUX's `[txt_tokens, img_tokens]`)

Frontend:
- Updated `buildZImageGraph.ts` to support regional conditioning collectors
- Updated `addRegions.ts` to create `z_image_text_encoder` nodes for Z-Image regions
- Updated `addZImageLoRAs.ts` to handle optional `negCond` when `guidance_scale=0`
- Added validation in `validators.ts` (no IP adapters, no autoNegative support)
- Negative conditioning is only wired up when `guidance_scale > 0`

Related Issues / Discussions
#8670
Extends Z-Image support (from the Z-Image-Turbo PR) with regional prompting capabilities.
QA Instructions
Merge Plan
Should be merged after the main Z-Image support PR (`feat/z-image-turbo-support`), as this builds on top of that implementation.

Checklist
What's New copy (if doing a release after this PR)