I tried sam3-audio-large on an RTX Pro 6000 with 96 GB of VRAM and still got a "CUDA out of memory" error. Although the model on HF is only 16 GB, it downloads and runs several other models alongside it. So how much VRAM is needed for each size of the sam3-audio models?
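
To put a number on the "multiple other models" part, one rough check is to sum the weight files that actually got downloaded. Below is a minimal sketch, assuming the default Hugging Face cache location (`~/.cache/huggingface/hub`) and a guessed list of weight-file extensions; the helper name `total_weight_bytes` is made up for illustration. It only gives a lower bound on the weights themselves and ignores activations, the CUDA context, and allocator overhead, so real VRAM use will be higher.

```python
import os

# Assumed weight-file extensions; adjust to whatever the checkpoints actually use.
WEIGHT_EXTS = (".safetensors", ".bin", ".pt", ".pth", ".ckpt")


def total_weight_bytes(root):
    """Sum the on-disk size of weight files under root, counting each real file once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(WEIGHT_EXTS):
                continue
            # Resolve symlinks so a file linked from several places is counted once.
            real = os.path.realpath(os.path.join(dirpath, name))
            if real in seen or not os.path.isfile(real):
                continue
            seen.add(real)
            total += os.path.getsize(real)
    return total


if __name__ == "__main__":
    # Assumed default cache path; it may differ if HF_HOME or similar is set.
    cache = os.path.expanduser("~/.cache/huggingface/hub")
    size_gb = total_weight_bytes(cache) / 1e9
    print(f"{size_gb:.1f} GB of weight files under {cache}")
```

If the total is far below 96 GB, the out-of-memory error is more likely coming from activations or input length than from the weights alone.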