[NNCF] Enable data-aware weight compression for MatMul with transpose_b=False #3759
base: develop
Conversation
@Varshith-Yadav, thank you for the contribution!

Please also add unit tests.
I have reconsidered and now believe that transposing each weight can extend the total compression duration. What about implementing and using a "slice_weight" method with a transpose parameter?
@ljaljushkin I will update the implementation to use a slice_weight helper with a transpose parameter, as suggested. I'll proceed with this approach and update the PR shortly.
@ljaljushkin I also added a new test file test_utils_slice_weight.py to verify the helper works correctly for both NumPy and PyTorch tensors with different transpose_b settings.
    assign_weight_column,
    assign_weight_slice,
    extract_weight_column,
    slice_weight,
    zero_mask_columns,
I believe you need just 2 methods:

```python
def get_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], is_transposed: bool) -> Tensor:
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]


def set_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], value: Tensor, is_transposed: bool) -> None:
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value
```

    weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)
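A minimal, self-contained sketch of how these two helpers behave, using plain NumPy arrays in place of NNCF's Tensor wrapper (an assumption of this sketch; NNCF tensors support the same indexing style):

```python
import numpy as np

# Plain-NumPy stand-ins for the proposed helpers.
def get_weight_slice(weight, slice_obj, is_transposed):
    # Transposed weights are (out_features, in_features): slice along columns.
    # Non-transposed weights are (in_features, out_features): slice along rows.
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]

def set_weight_slice(weight, slice_obj, value, is_transposed):
    # In-place counterpart of get_weight_slice.
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value

w = np.arange(12, dtype=np.float32).reshape(3, 4)
col = get_weight_slice(w, 1, is_transposed=True).copy()   # column 1, shape (3,)
row = get_weight_slice(w, 1, is_transposed=False).copy()  # row 1, shape (4,)
set_weight_slice(w, slice(0, 2), np.zeros((3, 2), dtype=np.float32), is_transposed=True)
```

Note the `.copy()` calls in the demo: basic NumPy indexing returns views, so the extracted slices would otherwise change when the weight is later modified in place.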
    # Get transpose_b value to handle weight shape correctly
    transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id]["transpose"]
The same issue likely exists in the other data-aware algorithms: AWQ, lora_correction, scale_estimation.
I suggest copy-pasting a test for transpose_b=False + all these methods and checking whether it fails: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1960-R1961
    from nncf.quantization.algorithms.weight_compression.config import WeightCompressionParameters
    from nncf.quantization.algorithms.weight_compression.parameters import CompressedWeight
    from nncf.quantization.algorithms.weight_compression.scale_estimation import ScaleEstimation
    from nncf.quantization.algorithms.weight_compression.utils import (
The utils name violates the code style: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#474-file-naming
Possible name: tensor_slicing.py

I also recommend configuring automatic code formatting: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#2-automating-code-formatting
@ljaljushkin Thanks for the detailed feedback! I have updated the PR with the following changes:

- Refactored tensor slicing: implemented the simplified get_weight_slice and set_weight_slice helpers exactly as suggested (using generic slicing) in src/nncf/quantization/algorithms/weight_compression/tensor_slicing.py.
- Algorithm support: updated GPTQ, AWQ, Scale Estimation, and LoRA Correction to use the new helpers. They now correctly identify the reduction axis based on transpose_b to handle non-transposed weights.
- Testing: added test_compress_weights_algorithms_transpose_b_false, which verifies that all four algorithms work on a model with transpose_b=False without crashing. Also added a new test file tests/openvino/native/test_weight_compression_utils.py to unit-test the helpers with both NumPy and PyTorch tensors.
- Formatting: ran pre-commit to apply automatic Ruff formatting.

Ready for review!
Thanks @Varshith-Yadav!
@Varshith-Yadav, thank you for your contribution!
My initial comments are below. Besides that, I believe we have to expand the tests for each compression algorithm with the transpose_b option for each backend. I'm working on a similar issue right now (support of transpose_a), and when I finish my tests, I'll share them with you so you can do the same in your PR.
Thank you!
I'm not sure we really need to make a separate function for that. Here is an example of how to do slicing using the built-in slice: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-cefaf6a4a2cb473c23106efa01889f05dc899e43c0dfc74ef8e8d60830e8a467R276-R281
    transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
        "transpose"
    ]
Looks like OpenVINO/ONNX-specific code; please introduce a backend method to get this value from the model.
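One way to keep the backend-specific attribute access out of the common algorithm is a small backend hook. The class and method names below (`WeightCompressionBackend`, `get_weight_transpose_flag`) are illustrative assumptions, not NNCF's actual backend interface:

```python
from abc import ABC, abstractmethod

# Hypothetical backend abstraction; names are assumptions for this sketch.
class WeightCompressionBackend(ABC):
    @abstractmethod
    def get_weight_transpose_flag(self, node, weight_port_id):
        """Return True if the weight on the given port is transposed."""

class OVBackend(WeightCompressionBackend):
    def get_weight_transpose_flag(self, node, weight_port_id):
        # OpenVINO/ONNX expose the MatMul transpose flag via constant attributes.
        return node.layer_attributes.constant_attributes[weight_port_id]["transpose"]

class TorchBackend(WeightCompressionBackend):
    def get_weight_transpose_flag(self, node, weight_port_id):
        # torch.nn.Linear stores weights as (out_features, in_features),
        # which corresponds to transpose_b=True.
        return True

# Demo with minimal stand-ins for an NNCF graph node (also an assumption).
class _FakeAttrs:
    constant_attributes = {1: {"transpose": False}}

class _FakeNode:
    layer_attributes = _FakeAttrs()

flag = OVBackend().get_weight_transpose_flag(_FakeNode(), 1)
```

The common algorithm would then ask the backend for the flag instead of reaching into backend-specific layer attributes.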
    transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
        "transpose"
    ]
OpenVINO/ONNX-specific code as well.
    # Get transpose_b value to handle weight shape correctly
    transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]
OpenVINO/ONNX-specific code in a common algorithm.
    # Get transpose_b value to handle weight shape correctly
    transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]
ONNX/OpenVINO-specific code in a common algorithm.
Changes

fixes #3494

Added full support for data-aware weight compression when MatMul nodes use transpose_b=False.

Updated and validated test_compression_with_transpose to ensure it passes for transpose_b=False.

Reason for changes

Previously, NNCF's weight compression flow assumed that the weight input of MatMul operations was always transposed (transpose_b=True).

Related tickets
Tests

pytest tests/openvino/native/quantization/test_weights_compression.py -v

(All tests pass; test_scale_estimation[True] remains the expected XFAIL for ticket 176465.)
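For reference, the layout difference driving these changes can be sketched in NumPy: with transpose_b=True the weight is stored as (out_features, in_features) and per-output-channel statistics reduce over axis 1, while with transpose_b=False the layout flips and the reduction axis becomes 0 (a sketch of the general idea, not NNCF code):

```python
import numpy as np

x = np.random.rand(8, 4).astype(np.float32)     # activations: (batch, in_features)
w_t = np.random.rand(16, 4).astype(np.float32)  # transpose_b=True layout: (out, in)
w_n = w_t.T.copy()                               # transpose_b=False layout: (in, out)

y_t = x @ w_t.T  # MatMul(x, w_t, transpose_b=True)
y_n = x @ w_n    # MatMul(x, w_n, transpose_b=False)

# Per-output-channel statistics must reduce over the input dimension,
# which sits on a different axis depending on the layout.
scale_t = np.abs(w_t).max(axis=1)  # shape (16,)
scale_n = np.abs(w_n).max(axis=0)  # shape (16,)
```

Both layouts produce the same MatMul result and the same per-channel scales; only the axis being reduced differs, which is exactly what the algorithms previously hard-coded.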