
Conversation

@Varshith-Yadav

Changes

fixes #3494
Added full support for data-aware weight compression when MatMul nodes use transpose_b=False.
Updated and validated test_compression_with_transpose to ensure it passes for transpose_b=False.

Reason for changes

Previously, NNCF’s weight compression flow assumed that the weight input of MatMul operations was always transposed (transpose_b=True), so data-aware compression did not handle models whose MatMul nodes use transpose_b=False.
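
For illustration only (not part of the PR): both layouts hold the same values, but the output-channel axis moves, so any per-channel logic in the compression algorithms has to switch axes. A minimal NumPy sketch:

import numpy as np

x = np.random.rand(1, 16).astype(np.float32)     # activation: [batch, in_features]
w_t = np.random.rand(32, 16).astype(np.float32)  # transpose_b=True layout: [out_features, in_features]
w = w_t.T                                        # transpose_b=False layout: [in_features, out_features]

# Same MatMul result either way, but per-channel slicing must switch axes.
assert np.allclose(x @ w_t.T, x @ w)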

Related tickets

Tests

pytest tests/openvino/native/quantization/test_weights_compression.py -v
(All tests pass; test_scale_estimation[True] remains the expected XFAIL for ticket 176465.)

@Varshith-Yadav Varshith-Yadav requested a review from a team as a code owner November 26, 2025 10:36
@ljaljushkin
Contributor

@Varshith-Yadav, thank you for the contribution!
Is it possible to avoid the many if conditions and do the transpose once, keeping the original logic for transposed weights?

@ljaljushkin
Contributor

ljaljushkin commented Nov 26, 2025

Please also add unit tests.
At a minimum, you can copy-paste from #3725: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1955-R1979 and make sure no exception is raised for transpose_b=False.

@ljaljushkin
Contributor

@Varshith-Yadav, thank you for the contribution! Is it possible to avoid many if conditions and do transpose once to keep original logic for transposed weights?

I have reconsidered and now believe that transposing each weight can extend the total compression duration. What about implementing and utilizing a "slice_weight" method with a transpose parameter?

@Varshith-Yadav
Author

@ljaljushkin
That makes sense. I agree that explicitly transposing the full weight tensor could introduce unnecessary overhead.

I will update the implementation to use a slice_weight helper method. This way, we can fetch the necessary channels dynamically based on the transpose_b parameter without physically reshaping the underlying tensor.

I'll proceed with this approach and update the PR shortly.

@github-actions github-actions bot added the NNCF OpenVINO (Pull requests that update NNCF OpenVINO) label Dec 1, 2025
@Varshith-Yadav
Author

@ljaljushkin
I've updated the implementation as requested. I added a slice_weight helper in utils.py to handle the data access without performing a full transpose, and refactored the GPTQ logic to use it.

I also added a new test file test_utils_slice_weight.py to verify the helper works correctly for both NumPy and PyTorch tensors with different transpose_b settings.

Comment on lines 31 to 35
assign_weight_column,
assign_weight_slice,
extract_weight_column,
slice_weight,
zero_mask_columns,
Contributor

I believe you need just 2 methods

from typing import Union

from nncf.tensor import Tensor  # NNCF's backend-agnostic tensor wrapper

def get_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], is_transposed: bool) -> Tensor:
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]

def set_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], value: Tensor, is_transposed: bool) -> None:
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value
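
For context, a hypothetical usage of these helpers (plain NumPy arrays stand in for Tensor; variable names are illustrative, not from the PR):

import numpy as np

transpose_b = False
weight = np.random.rand(16, 32).astype(np.float32)      # [in_channels, out_channels] since transpose_b=False
num_input_channels = weight.shape[1 if transpose_b else 0]

for i in range(num_input_channels):
    column = get_weight_slice(weight, i, is_transposed=transpose_b)       # one input channel across all outputs
    set_weight_slice(weight, i, column * 0.5, is_transposed=transpose_b)  # dummy in-place update

Keeping slice_obj generic (an int, a slice, or an index Tensor) lets the same two helpers cover single columns, contiguous blocks, and masked updates.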

weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)

# Get transpose_b value to handle weight shape correctly
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id]["transpose"]
Contributor

The same issue likely exists in the other data-aware algorithms: awq, lora_correction, scale_estimation.
I suggest copy-pasting a test for transpose_b=False covering all these methods and checking whether it fails: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1960-R1961
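
A minimal sketch of such a check, assuming the public OpenVINO and NNCF APIs (the shapes, group size, and helper name build_matmul_model are illustrative; gptq/lora_correction could be toggled the same way):

import numpy as np
import openvino as ov
import openvino.runtime.opset13 as opset  # opset import path may vary by OpenVINO version
import nncf


def build_matmul_model(transpose_b: bool) -> ov.Model:
    # Weight layout is [in, out] for transpose_b=False and [out, in] for transpose_b=True.
    inp = opset.parameter([1, 8, 32], dtype=np.float32, name="input")
    shape = (64, 32) if transpose_b else (32, 64)
    weight = opset.constant(np.random.rand(*shape).astype(np.float32))
    matmul = opset.matmul(inp, weight, transpose_a=False, transpose_b=transpose_b)
    return ov.Model([matmul], [inp])


def test_data_aware_compression_transpose_b_false():
    model = build_matmul_model(transpose_b=False)
    dataset = nncf.Dataset([np.random.rand(1, 8, 32).astype(np.float32) for _ in range(3)])
    # Should not raise for the data-aware algorithms.
    nncf.compress_weights(
        model,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        group_size=32,
        all_layers=True,
        dataset=dataset,
        awq=True,
        scale_estimation=True,
    )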

from nncf.quantization.algorithms.weight_compression.config import WeightCompressionParameters
from nncf.quantization.algorithms.weight_compression.parameters import CompressedWeight
from nncf.quantization.algorithms.weight_compression.scale_estimation import ScaleEstimation
from nncf.quantization.algorithms.weight_compression.utils import (
Contributor

utils name violates the code style: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#474-file-naming
Possible name: tensor_slicing.py

@Varshith-Yadav
Author

@ljaljushkin Thanks for the detailed feedback! I have updated the PR with the following changes:

Refactored tensor_slicing: I implemented the simplified get_weight_slice and set_weight_slice helpers exactly as suggested (using generic slicing) in src/nncf/quantization/algorithms/weight_compression/tensor_slicing.py.

Algorithm Support: I updated GPTQ, AWQ, Scale Estimation, and LoRA Correction to use these new helpers. They now correctly identify the reduction axis based on transpose_b to handle non-transposed weights (see the short axis-selection sketch after this comment).

Testing:

Added test_compress_weights_algorithms_transpose_b_false which successfully verifies that all 4 algorithms work on a model with transpose_b=False without crashing.

Added a new test file tests/openvino/native/test_weight_compression_utils.py to unit-test the helpers with both NumPy and PyTorch tensors.

Formatting: Ran pre-commit to apply the automatic Ruff formatting.

Ready for review!
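
For reference, a tiny sketch of the axis selection mentioned above (names are illustrative):

# transpose_b=True  -> weight is [out_channels, in_channels]
# transpose_b=False -> weight is [in_channels, out_channels]
reduction_axis = 1 if transpose_b else 0  # axis holding input channels (reduced during quantization)
output_axis = 1 - reduction_axis          # axis holding output channels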

@ljaljushkin
Contributor

Thanks @Varshith-Yadav!
@daniil-lyakhov, could you please help with the review?

@github-actions github-actions bot added the API (Public API-impacting changes) label Dec 10, 2025
@daniil-lyakhov daniil-lyakhov self-requested a review December 11, 2025 15:10
Collaborator

@daniil-lyakhov daniil-lyakhov left a comment

@Varshith-Yadav, thank you for your contribution!
My initial comments are below. Besides that, I believe we have to expand the tests for each compression algorithm with the transpose_b option for each backend. I'm working on a similar issue right now (support of transpose_a), and when I finish my tests I'll share them with you so you can do the same in your PR.

Thank you!

Collaborator

I'm not sure we really need to make a separate function for that. An example of how to do slicing using the built-in slice: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-cefaf6a4a2cb473c23106efa01889f05dc899e43c0dfc74ef8e8d60830e8a467R276-R281

Comment on lines +126 to +128
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
    "transpose"
]
Collaborator

This looks like OpenVINO/ONNX-specific code; please introduce a backend method to get this value from the model.
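
A hedged sketch of what such a backend hook could look like (class and method names are illustrative, not the existing NNCF API):

from abc import ABC, abstractmethod

from nncf.common.graph import NNCFNode


class WeightCompressionBackend(ABC):  # hypothetical common interface
    @staticmethod
    @abstractmethod
    def is_weight_transposed(node_with_weight: NNCFNode, weight_port_id: int) -> bool:
        """Return True if the MatMul consumes its weight in [out_channels, in_channels] layout."""


class OVBackend(WeightCompressionBackend):  # hypothetical OpenVINO implementation
    @staticmethod
    def is_weight_transposed(node_with_weight: NNCFNode, weight_port_id: int) -> bool:
        return node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]

The common algorithm would then call backend.is_weight_transposed(...) instead of reading OpenVINO-specific layer_attributes directly.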

Comment on lines +224 to +226
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
    "transpose"
]
Collaborator

OpenVINO/ONNX-specific code as well.

Comment on lines +144 to +146
# Get transpose_b value to handle weight shape correctly
transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]

Collaborator

OpenVINO/ONNX-specific code in a common algorithm.

Comment on lines +225 to +227
# Get transpose_b value to handle weight shape correctly
transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]

Collaborator

OpenVINO/ONNX-specific code in a common algorithm.


Labels

API (Public API-impacting changes), NNCF OpenVINO (Pull requests that update NNCF OpenVINO)


Development

Successfully merging this pull request may close these issues.

[Good First Issue][NNCF]: Support not transposed weight for data-aware weight compression methods
