Hey, thanks for the great project.
My question is: if I have two LLMs for inference, is there a way to load model A on GPUs 0 and 1, and model B on GPUs 2 and 3? I am using the offline inference engine. Is it possible to do this in one script, or do I need to launch the two engines in two separate processes and use CUDA_VISIBLE_DEVICES to control GPU placement?
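For concreteness, here is a minimal sketch of the two-process workaround I have in mind (untested; `model-a` and `model-b` are placeholder model names). Each child process sets CUDA_VISIBLE_DEVICES before importing vLLM, so each engine's tensor parallelism stays within its own pair of GPUs:

```python
import multiprocessing as mp
import os

def run_model(model_name: str, gpu_ids: str, prompts: list[str]) -> None:
    # Must be set before vLLM/torch initialize CUDA in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids
    from vllm import LLM, SamplingParams  # import only after pinning GPUs

    # Each engine sees exactly two GPUs, so tensor_parallel_size=2
    # shards the model across them.
    llm = LLM(model=model_name, tensor_parallel_size=2)
    outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
    for out in outputs:
        print(f"[{model_name}] {out.outputs[0].text!r}")

if __name__ == "__main__":
    mp.set_start_method("spawn")  # fresh CUDA context per child process
    procs = [
        mp.Process(target=run_model, args=("model-a", "0,1", ["Hello"])),
        mp.Process(target=run_model, args=("model-b", "2,3", ["Hello"])),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

If there is a supported way to place the two engines on disjoint GPU sets within a single script, I'd prefer that over spawning separate processes.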