Hey, thanks for the great project.
My question is: if I have two LLMs for inference, is there a way to load model A on GPUs 0 and 1, and model B on GPUs 2 and 3? I am using the offline inference engine. Is it possible to do this in one script, or do I need to launch the two engines in two separate processes and use CUDA_VISIBLE_DEVICES to control GPU placement?
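For concreteness, here is a minimal sketch of the two-process workaround I have in mind (untested; `model-a` and `model-b` are placeholder model names). Each child process sets CUDA_VISIBLE_DEVICES before importing vLLM, so each engine's tensor parallelism stays within its own pair of GPUs:

```python
import multiprocessing as mp
import os

def run_model(model_name: str, gpu_ids: str, prompts: list[str]) -> None:
    # Must be set before vLLM/torch initialize CUDA in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids
    from vllm import LLM, SamplingParams  # import only after pinning GPUs

    # Each engine sees exactly two GPUs, so tensor_parallel_size=2
    # shards the model across them.
    llm = LLM(model=model_name, tensor_parallel_size=2)
    outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
    for out in outputs:
        print(f"[{model_name}] {out.outputs[0].text!r}")

if __name__ == "__main__":
    mp.set_start_method("spawn")  # fresh CUDA context per child process
    procs = [
        mp.Process(target=run_model, args=("model-a", "0,1", ["Hello"])),
        mp.Process(target=run_model, args=("model-b", "2,3", ["Hello"])),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

If there is a supported way to place the two engines on disjoint GPU sets within a single script, I'd prefer that over spawning separate processes.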