Recommended audio input duration for optimal performance?

Hi team! 

What is the recommended audio input length for SAM-Audio to achieve optimal performance and avoid memory issues?
Also, regarding the model's architecture: are there plans to support streaming inference in the near future?

Thanks for your hard work!