Describe the bug
cusignal.correlate2d takes two array arguments, (a, b). When b is the larger array, the call takes much longer than when a is the larger one. This is unexpected: in 'full' mode, swapping the arguments only flips the output (and conjugates it for complex input), so the amount of work should be the same either way.
Steps/Code to reproduce bug
Code:
import cusignal
import cupy as cp
from cupyx.profiler import benchmark
large_array = cp.zeros((1000, 1000))
small_array = cp.zeros((1, 10))
# Expected order of arguments - fast runtime
print(benchmark(cusignal.correlate2d, (large_array, small_array, 'full'), n_repeat=5))
# Swapped order of arguments - slower runtime
print(benchmark(cusignal.correlate2d, (small_array, large_array, 'full'), n_repeat=5))
Expected behavior
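Both calls should take roughly the same time. A simple fix would be to check which input is larger and, if it is the second one, swap the arguments internally, then flip (and, for complex input, conjugate) the result so the output is unchanged.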
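The suggested swap can be sketched in NumPy. This is a minimal illustration under stated assumptions, not cusignal's implementation (which this report does not show): naive_correlate2d_full is a hypothetical, deliberately slow direct correlation used as a stand-in, whose cost per output element grows with the size of its second argument, and correlate2d_full_swapped is a hypothetical wrapper. The wrapper relies on the 'full'-mode identity correlate2d(a, b) == correlate2d(b, a)[::-1, ::-1].conj(), so after swapping, the result is flipped and conjugated to match the caller's argument order.

```python
import numpy as np


def naive_correlate2d_full(a, b):
    """Hypothetical stand-in: direct 2-D cross-correlation, 'full' mode.

    Every output element sums a window the size of b, so the work grows
    with b.size; this is the asymmetry the swap below avoids.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    p, q = b.shape
    # Zero-pad a so that b can slide over every partial overlap.
    padded = np.pad(a, ((p - 1, p - 1), (q - 1, q - 1)))
    out_shape = (a.shape[0] + p - 1, a.shape[1] + q - 1)
    out = np.empty(out_shape, dtype=np.result_type(a, b))
    b_conj = np.conj(b)
    for i in range(out_shape[0]):
        for j in range(out_shape[1]):
            out[i, j] = np.sum(padded[i:i + p, j:j + q] * b_conj)
    return out


def correlate2d_full_swapped(a, b, correlate2d=naive_correlate2d_full):
    """Hypothetical wrapper: always hand the larger input to correlate2d first.

    In 'full' mode, correlate2d(a, b) == correlate2d(b, a)[::-1, ::-1].conj(),
    so swapping only requires flipping and conjugating the result.
    """
    if np.size(b) > np.size(a):
        return correlate2d(b, a)[::-1, ::-1].conj()
    return correlate2d(a, b)
```

For example, correlate2d_full_swapped(small, large) does the same per-element work as correlate2d(large, small), yet returns the same array as correlate2d(small, large).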
Environment details
Nothing special here