Replies: 2 comments
- @zhaochenyang20 WDYT?
- I've asked our team. Thanks!
Hey Community!
I am exploring collaboration with different inference engines to create reusable orchestration flows on Kubernetes. I have started a proposal at vLLM Production Stack.
The goal is to create a generic API so that different inference engines (vLLM or SGLang) can be deployed on Kubernetes for different performance, SLA, and resource usage goals.
Such an API should support the following use cases:
Currently, there are quite a few efforts toward this goal. However, they lack reusable support for the above use cases.
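To make the engine-agnostic API idea a bit more concrete, here is a minimal Python sketch of what such an abstraction could look like. Everything in it is hypothetical: `InferenceDeploymentSpec` and `render_container_args` are not part of vLLM Production Stack, SGLang, or any existing project, and the engine flags are illustrative and should be checked against each engine's current documentation.

```python
# Hypothetical sketch of an engine-agnostic deployment spec.
# None of these names exist in vLLM Production Stack or SGLang today;
# they only illustrate one API covering multiple inference engines.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InferenceDeploymentSpec:
    """Engine-agnostic description of one model deployment."""
    model: str                          # e.g. "meta-llama/Llama-3.1-8B-Instruct"
    engine: str                         # "vllm" or "sglang"
    replicas: int = 1
    gpus_per_replica: int = 1
    max_latency_ms: Optional[int] = None  # SLA hint for a router/autoscaler
    extra_args: dict = field(default_factory=dict)


def render_container_args(spec: InferenceDeploymentSpec) -> list:
    """Translate the generic spec into engine-specific container args
    (flags shown here are illustrative; verify against engine docs)."""
    if spec.engine == "vllm":
        args = ["vllm", "serve", spec.model,
                "--tensor-parallel-size", str(spec.gpus_per_replica)]
    elif spec.engine == "sglang":
        args = ["python", "-m", "sglang.launch_server",
                "--model-path", spec.model,
                "--tp", str(spec.gpus_per_replica)]
    else:
        raise ValueError(f"unsupported engine: {spec.engine}")
    for key, value in spec.extra_args.items():
        args += [key, str(value)]
    return args


if __name__ == "__main__":
    spec = InferenceDeploymentSpec(
        model="meta-llama/Llama-3.1-8B-Instruct", engine="sglang", replicas=2)
    print(render_container_args(spec))
```

The point of the sketch is that performance, SLA, and resource goals live in one spec, while engine-specific details are confined to a single translation step that Kubernetes controllers could reuse.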
I am looking forward to hearing from the community and starting a productive collaboration to explore this direction.
Cheers!
Huamin Chen, Red Hat