Two test_matmul_cuda.py cases failed in torch-xpu-ops CI but passed on a local machine. #2481

@daisyden

🐛 Describe the bug

Failing cases:
op_ut,third_party.torch-xpu-ops.test.xpu.test_matmul_cuda_xpu.TestMatmulCudaXPU,test_cublas_deterministic_shape_1024_xpu_float32
op_ut,third_party.torch-xpu-ops.test.xpu.test_matmul_cuda_xpu.TestMatmulCudaXPU,test_cublas_deterministic_shape_512_xpu_float32

To reproduce:
pytest -v test/xpu/test_matmul_cuda_xpu.py -k test_cublas_deterministic_shape_1024_xpu_float32
pytest -v test/xpu/test_matmul_cuda_xpu.py -k test_cublas_deterministic_shape_512_xpu_float32

Traceback

  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
    method(*args, **kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 428, in instantiated_test
    result = test(self, **param_kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1435, in only_fn
    return fn(slf, *args, **kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2009, in wrapper
    return fn(*args, **kwargs)
  File "/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/test_matmul_cuda_xpu.py", line 434, in test_cublas_deterministic
    self.assertEqual(first, torch.matmul(inp, inp), atol=0.0, rtol=0.0)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
    return super().assertEqual(x, y, *args, **kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
    raise error_metas.pop()[0].to_error(  # type: ignore[index]
AssertionError: Tensor-likes are not equal!

Mismatched elements: 4810 / 262144 (1.8%)
Greatest absolute difference: 7.62939453125e-06 at index (144, 485)
Greatest relative difference: 0.00027137043070979416 at index (483, 414)
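
A note on reading these numbers, offered as a hedged sketch rather than a diagnosis. The element count 262144 equals 512 × 512, so the statistics above presumably come from the shape_512 case. The greatest absolute difference, 7.62939453125e-06, is exactly 2^-17, which is the spacing between adjacent float32 values for magnitudes in [64, 128). The issue does not state the element values, so this only suggests (it does not prove) last-bit differences of the kind that appear when a float32 reduction accumulates its terms in a different order between runs. The standard-library sketch below illustrates that effect; `f32`, `sum_f32`, and `ulp_f32` are hypothetical helper names, not part of PyTorch or torch-xpu-ops.

```python
import math
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values, order):
    """Sum `values` in float32, visiting the indices in the given order."""
    acc = 0.0
    for i in order:
        # Round after every addition to mimic a float32 accumulator.
        acc = f32(acc + f32(values[i]))
    return acc


def ulp_f32(x):
    """Spacing between adjacent float32 values near x (normal range only)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 24)


values = [1e8, 1.0, -1e8]
forward = sum_f32(values, [0, 1, 2])    # (1e8 + 1) - 1e8
reordered = sum_f32(values, [0, 2, 1])  # (1e8 - 1e8) + 1

print(forward, reordered)  # 0.0 1.0
print(ulp_f32(100.0) == 7.62939453125e-06)  # True
```

Because the same inputs give different float32 sums under different accumulation orders, a comparison with atol=0.0 and rtol=0.0 fails whenever two runs of the same matmul do not reduce in an identical order. Whether that is what happens on the CI runner is an assumption that would need to be confirmed separately.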

Versions

PR: #2432

The tests passed on PVC.

dpkg -l | grep intel
dpkg-query: warning: parsing file '/var/lib/dpkg/status' near line 17718 package 'libdpst':
 missing 'Maintainer' field
dpkg-query: warning: parsing file '/var/lib/dpkg/status' near line 20641 package 'libghe':
 missing 'Maintainer' field
ii  intel-fw-gpu                                                2025.13.2-398~22.04                     all          Firmware package for Intel integrated and discrete GPUs
ii  intel-gsc                                                   0.9.5-123~u22.04                        amd64        Intel(R) Graphics System Controller Firmware
ii  intel-i915-dkms                                             1.25.3.13.250521.17+i25-1               all          Out of tree i915 driver.
ii  intel-igc-cm                                                1.0.225.54105-1201~22.04                amd64        Frontend libraries for Intel(R) C for Metal Compiler
ii  intel-level-zero-gpu-raytracing                             1.0.0-92~u22.04                         amd64        oneAPI Level Zero Ray Tracing Support
ii  intel-media-va-driver-non-free:amd64                        25.4.3-1204~22.04                       amd64        Video Acceleration API driver for the Intel Core processors
ii  intel-metrics-discovery                                     1.14.182-1189~22.04                     amd64        Shared library for Intel(R) Metrics Discovery API
ii  intel-metrics-discovery-dev                                 1.14.182-1189~22.04                     amd64        Development files for Intel(R) Metrics Discovery API
ii  intel-metrics-library                                       1.0.200-1189~22.04                      amd64        Shared library for Intel(R) Metrics Library for Metrics Discovery API
ii  intel-metrics-library-dev                                   1.0.200-1189~22.04                      amd64        Development files for Intel(R) Metrics Library for Metrics Discovery API
ii  intel-microcode                                             3.20250812.0ubuntu0.22.04.1             amd64        Processor microcode firmware for Intel CPUs
ii  intel-ocloc                                                 25.40.35563.7-1206~22.04                amd64        ocloc compiler for Intel Graphics Compute Runtime
ii  intel-ocloc-dev                                             25.40.35563.7-1206~22.04                amd64        Development headers for Intel Graphics Compute Runtime
ii  intel-opencl-icd                                            25.40.35563.7-1206~22.04                amd64        OpenCL installable UMD driver for Intel Graphics Compute Runtime
ii  libdrm-intel1:amd64                                         2.4.121-2119~22.04                      amd64        Userspace interface to intel-specific kernel DRM services -- runtime
ii  libze-intel-gpu-dev                                         25.40.35563.7-1206~22.04                amd64        Development headers for Intel Graphics Compute Runtime
ii  libze-intel-gpu1                                            25.40.35563.7-1206~22.04                amd64        Shared library for Intel Graphics Compute Runtime
ii  xserver-xorg-video-intel                                    2:2.99.917+git20210115-1                amd64        X.Org X server -- Intel i8xx, i9xx display driver

Labels: skipped (Used for temp UT failure to parallel fix)
