[NNCF] Enable data-aware weight compression for MatMul with transpose_b=False #3759
base: develop
Conversation
@Varshith-Yadav, thank you for the contribution!

Please also add unit tests.
I have reconsidered and now believe that transposing each weight can extend the total compression duration. What about implementing and using a "slice_weight" method with a transpose parameter?
@ljaljushkin I will update the implementation to use a slice_weight helper with a transpose parameter, as suggested. I'll proceed with this approach and update the PR shortly.
@ljaljushkin I also added a new test file test_utils_slice_weight.py to verify the helper works correctly for both NumPy and PyTorch tensors with different transpose_b settings.
    assign_weight_column,
    assign_weight_slice,
    extract_weight_column,
    slice_weight,
    zero_mask_columns,
I believe you need just 2 methods:

```python
def get_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], is_transposed: bool) -> Tensor:
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]


def set_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], value: Tensor, is_transposed: bool) -> None:
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value
```

    weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)
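A minimal, self-contained sketch of how these two helpers behave, using plain NumPy arrays in place of NNCF's Tensor wrapper (an assumption of this sketch; NNCF tensors support the same indexing style):

```python
import numpy as np

# Plain-NumPy stand-ins for the proposed helpers.
def get_weight_slice(weight, slice_obj, is_transposed):
    # Transposed weights are (out_features, in_features): slice along columns.
    # Non-transposed weights are (in_features, out_features): slice along rows.
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]

def set_weight_slice(weight, slice_obj, value, is_transposed):
    # In-place counterpart of get_weight_slice.
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value

w = np.arange(12, dtype=np.float32).reshape(3, 4)
col = get_weight_slice(w, 1, is_transposed=True).copy()   # column 1, shape (3,)
row = get_weight_slice(w, 1, is_transposed=False).copy()  # row 1, shape (4,)
set_weight_slice(w, slice(0, 2), np.zeros((3, 2), dtype=np.float32), is_transposed=True)
```

Note the `.copy()` calls in the demo: basic NumPy indexing returns views, so the extracted slices would otherwise change when the weight is later modified in place.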
    # Get transpose_b value to handle weight shape correctly
    transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id]["transpose"]
The same issue likely exists in the other data-aware algorithms: AWQ, lora_correction, scale_estimation.
I suggest copy-pasting a test for transpose_b=False + all these methods and checking whether it fails: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1960-R1961
    from nncf.quantization.algorithms.weight_compression.config import WeightCompressionParameters
    from nncf.quantization.algorithms.weight_compression.parameters import CompressedWeight
    from nncf.quantization.algorithms.weight_compression.scale_estimation import ScaleEstimation
    from nncf.quantization.algorithms.weight_compression.utils import (
The utils name violates the code style: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#474-file-naming
Possible name: tensor_slicing.py

I also recommend configuring automatic code formatting: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#2-automating-code-formatting
@ljaljushkin Thanks for the detailed feedback! I have updated the PR with the following changes:

- Refactored tensor slicing: implemented the simplified get_weight_slice and set_weight_slice helpers exactly as suggested (using generic slicing) in src/nncf/quantization/algorithms/weight_compression/tensor_slicing.py.
- Algorithm support: updated GPTQ, AWQ, Scale Estimation, and LoRA Correction to use the new helpers. They now correctly identify the reduction axis based on transpose_b to handle non-transposed weights.
- Testing: added test_compress_weights_algorithms_transpose_b_false, which verifies that all four algorithms work on a model with transpose_b=False without crashing. Also added a new test file tests/openvino/native/test_weight_compression_utils.py to unit-test the helpers with both NumPy and PyTorch tensors.
- Formatting: ran pre-commit to apply automatic Ruff formatting.

Ready for review!
Thanks @Varshith-Yadav!
@Varshith-Yadav, thank you for your contribution!
My initial comments are below. Besides that, I believe we have to expand the tests for each compression algorithm with the transpose_b option for each backend. I'm working on a similar issue right now (support of transpose_a), and when I finish my tests, I'll share them with you so you can do the same in your PR.
Thank you!
I'm not sure we really need to make a separate function for that. Here is an example of how to do slicing using the built-in slice: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-cefaf6a4a2cb473c23106efa01889f05dc899e43c0dfc74ef8e8d60830e8a467R276-R281
    transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
        "transpose"
    ]
Looks like OpenVINO/ONNX-specific code; please introduce a backend method to get this value from the model.
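One way to keep the backend-specific attribute access out of the common algorithm is a small backend hook. The class and method names below (`WeightCompressionBackend`, `get_weight_transpose_flag`) are illustrative assumptions, not NNCF's actual backend interface:

```python
from abc import ABC, abstractmethod

# Hypothetical backend abstraction; names are assumptions for this sketch.
class WeightCompressionBackend(ABC):
    @abstractmethod
    def get_weight_transpose_flag(self, node, weight_port_id):
        """Return True if the weight on the given port is transposed."""

class OVBackend(WeightCompressionBackend):
    def get_weight_transpose_flag(self, node, weight_port_id):
        # OpenVINO/ONNX expose the MatMul transpose flag via constant attributes.
        return node.layer_attributes.constant_attributes[weight_port_id]["transpose"]

class TorchBackend(WeightCompressionBackend):
    def get_weight_transpose_flag(self, node, weight_port_id):
        # torch.nn.Linear stores weights as (out_features, in_features),
        # which corresponds to transpose_b=True.
        return True

# Demo with minimal stand-ins for an NNCF graph node (also an assumption).
class _FakeAttrs:
    constant_attributes = {1: {"transpose": False}}

class _FakeNode:
    layer_attributes = _FakeAttrs()

flag = OVBackend().get_weight_transpose_flag(_FakeNode(), 1)
```

The common algorithm would then ask the backend for the flag instead of reaching into backend-specific layer attributes.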
    transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
        "transpose"
    ]
OpenVINO/ONNX-specific code as well.
    # Get transpose_b value to handle weight shape correctly
    transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]
OpenVINO/ONNX-specific code in a common algorithm.
    # Get transpose_b value to handle weight shape correctly
    transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]
ONNX/OpenVINO-specific code in a common algorithm.
Changes

fixes #3494

Added full support for data-aware weight compression when MatMul nodes use transpose_b=False.

Updated and validated test_compression_with_transpose to ensure it passes for transpose_b=False.

Reason for changes

Previously, NNCF's weight compression flow assumed that the weight input of MatMul operations was always transposed (transpose_b=True).

Related tickets
Tests

pytest tests/openvino/native/quantization/test_weights_compression.py -v

(All tests pass; test_scale_estimation[True] remains the expected XFAIL for ticket 176465.)
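For reference, the layout difference driving these changes can be sketched in NumPy: with transpose_b=True the weight is stored as (out_features, in_features) and per-output-channel statistics reduce over axis 1, while with transpose_b=False the layout flips and the reduction axis becomes 0 (a sketch of the general idea, not NNCF code):

```python
import numpy as np

x = np.random.rand(8, 4).astype(np.float32)     # activations: (batch, in_features)
w_t = np.random.rand(16, 4).astype(np.float32)  # transpose_b=True layout: (out, in)
w_n = w_t.T.copy()                               # transpose_b=False layout: (in, out)

y_t = x @ w_t.T  # MatMul(x, w_t, transpose_b=True)
y_n = x @ w_n    # MatMul(x, w_n, transpose_b=False)

# Per-output-channel statistics must reduce over the input dimension,
# which sits on a different axis depending on the layout.
scale_t = np.abs(w_t).max(axis=1)  # shape (16,)
scale_n = np.abs(w_n).max(axis=0)  # shape (16,)
```

Both layouts produce the same MatMul result and the same per-channel scales; only the axis being reduced differs, which is exactly what the algorithms previously hard-coded.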