Improve prefix handling, attention mask compatibility, and KV cache control #11
Changed prefix mask values from 0 to 1 so virtual prefix tokens can properly participate in attention.
Added logic to concatenate prefix key/value states during training (both changes are sketched together below).
Added support for attention_mask in 2D [B, T] format in addition to the 4D [B, H, T_q, T_k] format (see the mask-normalization sketch below).
Added a use_cache parameter; the KV cache is disabled during training (see the caching sketch below).
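
A minimal sketch of how the first two changes fit together, assuming a prefix-tuning-style self-attention layer. The module and parameter names (`PrefixSelfAttention`, `num_prefix`, `prefix_k`, `prefix_v`) are illustrative, not identifiers from this PR:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixSelfAttention(nn.Module):
    def __init__(self, hidden_size, num_heads, num_prefix):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size)
        self.proj = nn.Linear(hidden_size, hidden_size)
        # Learned "virtual" prefix key/value states, shared across the batch.
        self.prefix_k = nn.Parameter(torch.randn(num_heads, num_prefix, self.head_dim) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(num_heads, num_prefix, self.head_dim) * 0.02)

    def forward(self, x, attention_mask=None):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))  # each [B, H, T, D]

        # Concatenate the prefix K/V in front of the per-token K/V,
        # giving keys/values of length num_prefix + T.
        pk = self.prefix_k.unsqueeze(0).expand(B, -1, -1, -1)
        pv = self.prefix_v.unsqueeze(0).expand(B, -1, -1, -1)
        k = torch.cat([pk, k], dim=2)
        v = torch.cat([pv, v], dim=2)

        attn_bias = None
        if attention_mask is not None:  # [B, T], 1 = keep, 0 = padding
            # Prefix positions get mask value 1 (visible), not 0, so the
            # virtual tokens participate in attention.
            prefix_mask = attention_mask.new_ones(B, pk.size(2))
            full_mask = torch.cat([prefix_mask, attention_mask], dim=-1)
            # Broadcast [B, T_k] to an additive bias of shape [B, 1, 1, T_k].
            attn_bias = (1.0 - full_mask[:, None, None, :].to(q.dtype)) \
                        * torch.finfo(q.dtype).min

        out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias)
        return self.proj(out.transpose(1, 2).reshape(B, T, -1))
```

Using mask value 1 for the prefix positions is the point of the first change: a 0 there would mask the virtual tokens out of attention entirely, defeating the purpose of the learned prefix.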
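A sketch of accepting both mask layouts. The helper name `normalize_attention_mask` is hypothetical, and the 4D branch assumes a boolean mask uses True for positions that may be attended:

```python
import torch

def normalize_attention_mask(attention_mask, dtype):
    """Accept a 2D [B, T] padding mask (1 = keep, 0 = pad) or a 4D
    [B, H, T_q, T_k] mask and return an additive float bias that
    broadcasts against attention scores of shape [B, H, T_q, T_k]."""
    if attention_mask.dim() == 2:
        bias = attention_mask[:, None, None, :].to(dtype)    # [B, 1, 1, T_k]
        return (1.0 - bias) * torch.finfo(dtype).min
    if attention_mask.dim() == 4:
        if attention_mask.dtype == torch.bool:               # True = attend
            return (~attention_mask).to(dtype) * torch.finfo(dtype).min
        return attention_mask.to(dtype)                      # already additive
    raise ValueError(f"unsupported attention_mask rank: {attention_mask.dim()}")
```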
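An illustrative sketch of the use_cache control, using the common transformer convention of a `(k, v)` tuple for `past_key_value` rather than this repo's exact signature:

```python
import torch
import torch.nn.functional as F

def attention_step(q, k, v, past_key_value=None, use_cache=False, training=False):
    """One attention step with optional KV caching.
    q, k, v: [B, H, T, D]; past_key_value: (k_cache, v_cache) or None."""
    # Disable the cache during training: cached K/V would be stale after
    # the next optimizer step and would pin extra activation memory.
    use_cache = use_cache and not training
    if past_key_value is not None:
        # Append the new K/V to the cached states along the time axis.
        k = torch.cat([past_key_value[0], k], dim=2)
        v = torch.cat([past_key_value[1], v], dim=2)
    present = (k, v) if use_cache else None
    out = F.scaled_dot_product_attention(q, k, v)
    return out, present
```

Inside an `nn.Module` forward, the gate would typically read `use_cache and not self.training`, which matches the PR's behavior of disabling the KV cache during training.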