Hi Dmitrii,
Could you clarify the current state of the convolution mode?
It seems that not all use cases are supported, and information on the current state is sparse; I gathered what I could from past issues, e.g.,
#79
#159
#193
#66
I am mainly interested in 2D C2C convolution.
I have run several experiments, and my observations are:
- Radix transforms are well supported. With multi-upload, however, one must either rely on VkFFT disabling fourStepReorder when it computes the transformed convolution kernel, or manually transpose the kernel to match the data layout.
- Non-radix transforms do not seem to be supported at all (I am not sure whether this is a bug). For example, a 1234 x 1234 convolution (1234 = 2 x 617, with 617 prime) always fails with error code 4031, which corresponds to VKFFT_ERROR_FAILED_TO_COMPILE_PROGRAM.
- On my NVIDIA A100 GPU, the transition to the multi-upload algorithm seems to occur at size 8192.
The following is the result I got:
VkFFTApp[CUDA]: (8192,8192) C2C/s/i/⨂ [rr] [21] buf= 0
app.nb_axis_upload: [1, 2]
app.use_bluestein_fft: [False, False]
app.tmp_buffer_nbytes: 0
app.axis_split [[8192 0 0 0]
[ 128 64 0 0]]
As you can see, I inspected the axisSplit variable, but it is unclear to me why only the second dimension uses a 2-upload scheme, even though both dimensions have the same size.
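
To make the layout concern from my first observation concrete, here is a minimal NumPy sketch of a generic textbook four-step FFT. It is not VkFFT's actual implementation; the helper `four_step_fft` and its `reorder` flag are hypothetical names of mine, and the 128 x 64 factorization is borrowed from the axis_split output above purely as an example. It shows that when the final reorder is skipped, the spectrum comes out in a permuted layout, so the pointwise product is only correct if the kernel spectrum is in the same layout.

```python
import numpy as np

def four_step_fft(x, n1, n2, reorder=True):
    """Textbook four-step FFT of length n1*n2 (hypothetical helper, not VkFFT code)."""
    n = n1 * n2
    # Step 1: length-n1 FFTs down the columns of the (n1, n2) view.
    a = np.fft.fft(x.reshape(n1, n2), axis=0)
    # Step 2: twiddle factors exp(-2*pi*i*k1*j2/n).
    k1 = np.arange(n1)[:, None]
    j2 = np.arange(n2)[None, :]
    a = a * np.exp(-2j * np.pi * k1 * j2 / n)
    # Step 3: length-n2 FFTs along the rows.
    b = np.fft.fft(a, axis=1)
    # Step 4 (optional): transpose to get the natural frequency order.
    return b.T.reshape(n) if reorder else b.reshape(n)

rng = np.random.default_rng(0)
n1, n2 = 128, 64                      # example split of 8192
n = n1 * n2
data = rng.standard_normal(n) + 1j * rng.standard_normal(n)
kernel = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Reference: product of spectra in natural order.
ref = np.fft.fft(data) * np.fft.fft(kernel)

# With the reorder step, the sketch agrees with the ordinary FFT.
full_ok = np.allclose(four_step_fft(data, n1, n2), np.fft.fft(data))

# Both spectra left in the permuted layout: the product is correct
# once the permutation is undone.
d_perm = four_step_fft(data, n1, n2, reorder=False)
k_perm = four_step_fft(kernel, n1, n2, reorder=False)
matched = (d_perm * k_perm).reshape(n1, n2).T.reshape(n)
matched_ok = np.allclose(matched, ref)

# Kernel spectrum in natural order, data spectrum permuted: layouts
# disagree, so the product is wrong.
k_nat = np.fft.fft(kernel)
mismatched = (d_perm * k_nat).reshape(n1, n2).T.reshape(n)
mismatched_ok = np.allclose(mismatched, ref)

print(full_ok, matched_ok, mismatched_ok)
```

In this sketch, either both spectra skip the reorder or the kernel spectrum is explicitly transposed into the permuted layout, which is the choice I described for the multi-upload case.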
Thank you!