The csrc directory contains an axpby function written in CUDA (multi_tensor_axpby_kernel.cu). I have a question about its problem size (workload), since it is not clear how the function is called from outside the apex library. Do you know the typical dimensions of each tensor in the tensor lists (signature shown below) and the typical length of each list? Thanks.
void multi_tensor_axpby_cuda(int chunk_size, at::Tensor noop_flag, std::vector<std::vector<at::Tensor>> tensor_lists, float a, float b, int arg_to_check) {
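
To make the workload question concrete, here is a minimal CPU sketch of the computation as I understand it. It is not the apex implementation. It assumes that tensor_lists holds three lists of equal length in the order {x, y, out}, that each tensor is treated as a flat buffer, and that work is split into pieces of chunk_size elements. The names FlatTensor and axpby_reference are hypothetical, and noop_flag and arg_to_check are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a tensor: a flat buffer of floats.
using FlatTensor = std::vector<float>;

// Reference semantics (assumed): out[t][i] = a * x[t][i] + b * y[t][i]
// for every tensor t in the lists and every element i of that tensor.
// Assumes tensor_lists = {x_list, y_list, out_list}; this layout is an
// assumption, not something confirmed by the signature alone.
// Returns the number of chunks visited, i.e. the sum over all tensors of
// ceil(numel / chunk_size), which is one way to measure the workload.
std::size_t axpby_reference(int chunk_size,
                            std::vector<std::vector<FlatTensor>>& tensor_lists,
                            float a, float b) {
    assert(chunk_size > 0);
    assert(tensor_lists.size() == 3);
    const std::vector<FlatTensor>& xs = tensor_lists[0];
    const std::vector<FlatTensor>& ys = tensor_lists[1];
    std::vector<FlatTensor>& outs = tensor_lists[2];
    assert(xs.size() == ys.size() && ys.size() == outs.size());

    const std::size_t chunk = static_cast<std::size_t>(chunk_size);
    std::size_t chunks = 0;
    for (std::size_t t = 0; t < xs.size(); ++t) {
        const std::size_t n = xs[t].size();
        assert(ys[t].size() == n && outs[t].size() == n);
        // Walk the tensor one chunk at a time.
        for (std::size_t start = 0; start < n; start += chunk) {
            const std::size_t end = std::min(n, start + chunk);
            for (std::size_t i = start; i < end; ++i) {
                outs[t][i] = a * xs[t][i] + b * ys[t][i];
            }
            ++chunks;
        }
    }
    return chunks;
}
```

Under these assumptions the workload is determined by two things: the list length (the number of tensors per list) and the element count of each tensor, which together with chunk_size give the total number of chunks. Those are the two quantities I am asking about.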