@YangYangGirl

- **Prefix mask fix**: changed the prefix mask values from 0 to 1 so that the virtual prefix tokens can properly participate in attention.
- **Prefix K/V concatenation during training**: added logic to concatenate the prefix key/value states during training.
- **2D attention mask support**: `attention_mask` can now be passed in `[B, T]` format in addition to the 4D `[B, H, T_q, T_k]` format.
- **Configurable KV cache usage**: added a `use_cache` parameter; the KV cache is disabled during training.

Minimal sketches of each change follow below.
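
A minimal sketch of the mask fix, assuming a prefix-tuning style setup where a fixed number of virtual tokens is prepended to every sequence. The function name `build_prefix_attention_mask` and the `num_prefix_tokens` argument are illustrative, not identifiers from this PR:

```python
import torch

def build_prefix_attention_mask(attention_mask: torch.Tensor, num_prefix_tokens: int) -> torch.Tensor:
    """Prepend mask entries for the virtual prefix tokens.

    Filling the prefix slots with 1 (rather than 0) marks them as valid key
    positions, so the real tokens are allowed to attend to the learned prefix.
    """
    batch_size = attention_mask.size(0)
    prefix_mask = torch.ones(  # was effectively zeros before the fix, masking the prefix out
        batch_size, num_prefix_tokens,
        dtype=attention_mask.dtype, device=attention_mask.device,
    )
    return torch.cat([prefix_mask, attention_mask], dim=-1)  # [B, P + T]
```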
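
A sketch of the training-time concatenation, assuming the usual `[B, H, T, D]` key/value layout; `prefix_key` and `prefix_value` stand in for whatever the prefix module produces and are not names taken from the diff:

```python
import torch

def concat_prefix_kv(key_states, value_states, prefix_key, prefix_value, training: bool):
    """Prepend the prefix key/value states to the current ones during training.

    All tensors are assumed to be in [B, H, T, D] layout, so the sequence
    dimension is dim=2. The explicit concatenation is only needed while
    training, when no KV cache is used.
    """
    if training and prefix_key is not None:
        key_states = torch.cat([prefix_key, key_states], dim=2)       # [B, H, P + T, D]
        value_states = torch.cat([prefix_value, value_states], dim=2)
    return key_states, value_states
```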
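
One way the 2D mask handling could look, assuming an additive attention mask in which padded positions receive a large negative value; the exact broadcasting in the PR may differ:

```python
import torch

def expand_attention_mask(attention_mask: torch.Tensor, query_len: int, dtype: torch.dtype) -> torch.Tensor:
    """Accept either a 2D [B, T_k] padding mask or a 4D [B, H, T_q, T_k] mask."""
    if attention_mask.dim() == 2:
        # [B, T_k] keep-mask (1 = attend, 0 = padding) -> broadcastable [B, 1, T_q, T_k]
        mask = attention_mask[:, None, None, :].to(dtype)
        mask = mask.expand(-1, 1, query_len, -1)
        # Turn the keep-mask into an additive mask: kept positions get 0,
        # padded positions get a large negative value. A causal mask, if any,
        # would be added separately.
        attention_mask = (1.0 - mask) * torch.finfo(dtype).min
    # 4D masks are assumed to already be in additive [B, H, T_q, T_k] form.
    return attention_mask
```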
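
A sketch of the `use_cache` behaviour, assuming a tuple-style `past_key_value` cache as in older decoder-layer implementations; the cache format is an assumption, only the train/inference switch is described above:

```python
import torch

def maybe_cache_kv(key_states, value_states, past_key_value, use_cache: bool, training: bool):
    """Only accumulate past key/values at inference time; skip the cache in training."""
    if use_cache and not training:
        if past_key_value is not None:
            # Append the new states to the cached ones along the sequence dimension.
            key_states = torch.cat([past_key_value[0], key_states], dim=2)
            value_states = torch.cat([past_key_value[1], value_states], dim=2)
        past_key_value = (key_states, value_states)
    else:
        past_key_value = None  # cache disabled during training
    return key_states, value_states, past_key_value
```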