Fix untying so it is automatic and applied only when needed (#1963)
When a model shares its input and output embeddings, we previously had to call the untie function to separate them before modifying those layers. This increased memory usage and was applied unconditionally, regardless of whether the embedding layers were actually targeted for modification.
Change: automatically detect when a transform or mixin needs to untie shared embeddings, and only untie them in that case.
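
A minimal sketch of the idea, not the actual implementation: tied embeddings can be recognized by checking whether the input and output embedding modules share the same weight parameter, and untying only happens when one of those modules is in the targeted set. The helper names `embeddings_are_tied` and `maybe_untie_embeddings` and the module names used below are illustrative assumptions, not llm-compressor APIs.

```python
# Illustrative sketch only; helper and module names are hypothetical.
import torch
from transformers import PreTrainedModel


def embeddings_are_tied(model: PreTrainedModel) -> bool:
    """Return True if the input and output embeddings share the same parameter."""
    input_emb = model.get_input_embeddings()
    output_emb = model.get_output_embeddings()
    return (
        input_emb is not None
        and output_emb is not None
        and input_emb.weight is output_emb.weight
    )


def maybe_untie_embeddings(model: PreTrainedModel, targeted_modules: set) -> None:
    """Untie shared embeddings only when an embedding module is actually targeted."""
    embedding_names = {"embed_tokens", "lm_head"}  # illustrative module names
    if not embeddings_are_tied(model):
        return
    if not (embedding_names & set(targeted_modules)):
        return  # nothing targeted touches the shared weight; avoid the extra memory
    # Give the output embedding its own copy of the weight so the two
    # layers can be modified independently.
    output_emb = model.get_output_embeddings()
    output_emb.weight = torch.nn.Parameter(output_emb.weight.detach().clone())
    # Mark the config so the weights are not re-tied later.
    model.config.tie_word_embeddings = False
```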
This also wraps the untying code in a try/except, so that if it is invoked on a model that cannot be untied it emits a warning instead of raising an error.
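
A hedged sketch of that fallback behavior, reusing the hypothetical `maybe_untie_embeddings` helper from the sketch above:

```python
import logging

logger = logging.getLogger(__name__)


def untie_if_possible(model, targeted_modules) -> None:
    """Try to untie shared embeddings; warn instead of failing on unsupported models."""
    try:
        # maybe_untie_embeddings is the illustrative helper from the sketch above
        maybe_untie_embeddings(model, targeted_modules)
    except Exception as exc:  # models with unusual embedding layouts may not support untying
        logger.warning(
            "Unable to untie shared embeddings for %s; proceeding without untying (%s)",
            type(model).__name__,
            exc,
        )
```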
New tests are added to cover this functionality, and existing tests are updated to rely on the automatic untying. The new tests were initially drafted with claude-code and then rewritten by hand.
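
For illustration only, a small pytest-style sketch of what such checks might look like, again using the hypothetical `maybe_untie_embeddings` helper rather than the real test file:

```python
from transformers import GPT2Config, GPT2LMHeadModel


def _tiny_tied_model():
    # Small randomly-initialized model; GPT-2 ties its embeddings by default.
    config = GPT2Config(vocab_size=32, n_positions=16, n_embd=8, n_layer=1, n_head=1)
    return GPT2LMHeadModel(config)


def test_untouched_embeddings_stay_tied():
    model = _tiny_tied_model()
    maybe_untie_embeddings(model, targeted_modules={"c_attn"})  # no embedding targeted
    assert model.get_input_embeddings().weight is model.get_output_embeddings().weight


def test_targeted_embeddings_are_untied():
    model = _tiny_tied_model()
    maybe_untie_embeddings(model, targeted_modules={"lm_head"})
    assert model.get_input_embeddings().weight is not model.get_output_embeddings().weight
```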
TEST PLAN:
pytest tests/llmcompressor/modifiers/quantization/test_handling_shared_embeddings.py
---------
Signed-off-by: HDCharles <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>