How can I run multi-GPU inference from the CLI? Also, when I use a single GPU, I always get this error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
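
For the single-GPU case, one workaround that might help is to hide all but one GPU from the process with the `CUDA_VISIBLE_DEVICES` environment variable, so nothing can end up on `cuda:1`. This assumes the error comes from part of the model being placed on a second visible GPU, which I have not confirmed. Below is a minimal sketch; the helper names `single_gpu_env` and `run_on_single_gpu` are hypothetical, and the command you pass in is whatever CLI invocation you already use.

```python
import os
import subprocess


def single_gpu_env(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA renumbers visible devices from 0, so the chosen GPU shows up as cuda:0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_on_single_gpu(command, gpu_index=0):
    """Run `command` (a list of arguments) with only `gpu_index` visible."""
    return subprocess.run(command, env=single_gpu_env(gpu_index), check=False)


# Hypothetical usage: replace the list with your actual CLI command and arguments.
# run_on_single_gpu(["python", "your_cli_script.py"], gpu_index=0)
```

This only sidesteps the device mismatch on one GPU; it does not answer the multi-GPU question.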