@wangtianxiang

This PR introduces a performance optimization for FFT computations by leveraging batched FFT execution (via batchFFT). To provide flexibility and backward compatibility, a new parameter, fft_batch_size, is added with the following behavior:

  • fft_batch_size = 0 (default): Automatically determine the optimal FFT batch size based on system and problem characteristics.
  • fft_batch_size = 1: Disable batching and execute FFTs sequentially (identical to the original behavior).
  • fft_batch_size > 1: Use batched FFT with a maximum batch size of fft_batch_size.

Note that on NVIDIA GPUs, batched FFT can be slower than looped FFT in many cases; it is recommended to set fft_batch_size = 1 to disable batching and retain the original behavior.
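For reference, here is a minimal sketch of how the fft_batch_size dispatch described above could look. All identifiers below (chooseAutoBatchSize, runFFTs, the singleFFT/batchedFFT callbacks) are illustrative placeholders, not the actual functions touched by this PR; only the parameter semantics (0 = auto, 1 = original looped behavior, >1 = capped batch size) are taken from the description.

```cpp
// Sketch of the fft_batch_size dispatch logic, assuming a looped single-FFT
// path and a batched (batchFFT-style) path exist. Names are hypothetical.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <functional>

// Hypothetical: derive an automatic batch size from problem characteristics.
static std::size_t chooseAutoBatchSize(std::size_t numTransforms)
{
    // Placeholder heuristic; a real implementation would consider device
    // capabilities, transform sizes, and available memory.
    return std::min<std::size_t>(numTransforms, 8);
}

// Dispatch numTransforms FFTs according to fft_batch_size:
//   0  -> auto-select batch size
//   1  -> sequential execution, identical to the original behavior
//   >1 -> batched calls of at most fft_batch_size transforms each
static void runFFTs(std::size_t numTransforms,
                    std::size_t fft_batch_size,
                    const std::function<void(std::size_t)>&              singleFFT,
                    const std::function<void(std::size_t, std::size_t)>& batchedFFT)
{
    const std::size_t batch =
        (fft_batch_size == 0) ? chooseAutoBatchSize(numTransforms) : fft_batch_size;

    if (batch <= 1)
    {
        for (std::size_t i = 0; i < numTransforms; ++i)
        {
            singleFFT(i); // original looped execution
        }
        return;
    }

    for (std::size_t begin = 0; begin < numTransforms; begin += batch)
    {
        const std::size_t end = std::min(begin + batch, numTransforms);
        batchedFFT(begin, end); // one batched call covering transforms [begin, end)
    }
}

int main()
{
    // Example: 10 transforms with a maximum batch size of 4.
    runFFTs(10, 4,
            [](std::size_t i) { std::printf("single FFT %zu\n", i); },
            [](std::size_t b, std::size_t e) { std::printf("batched FFT [%zu, %zu)\n", b, e); });
}
```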

Signed-off-by: Tianxiang Wang <[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.