I've successfully got E2B self-hosted and running on a single machine, and it's working great! As I'm thinking about scaling this up (especially on Kubernetes), I have a few questions:
- Custom Template Startup Speed
I'm wondering how to handle custom templates at scale.
If I have many users creating their own templates, it's not feasible to cache all of them on every node. I noticed the scheduler doesn't seem to check for template cache hits when placing a sandbox.
This seems like it could cause slow "cold starts" for custom templates. How do you recommend solving this?
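To make the question concrete, here's a minimal sketch of what I mean by cache-aware placement (all names here are hypothetical, nothing from the E2B codebase): score candidate nodes so that a warm template cache outweighs load, and fall back to least-loaded when no node has the template.

```go
package scheduler

// Hypothetical sketch of cache-aware placement; none of these types exist
// in E2B, they just illustrate the idea.
type Node struct {
	ID               string
	CachedTemplates  map[string]bool // template ID -> rootfs already on disk
	RunningSandboxes int
}

// PickNode prefers nodes that already have the template's rootfs cached;
// a cache hit outweighs load unless the node is heavily oversubscribed.
func PickNode(nodes []*Node, templateID string) *Node {
	var best *Node
	bestScore := -1 << 31
	for _, n := range nodes {
		score := -n.RunningSandboxes // lighter load = higher score
		if n.CachedTemplates[templateID] {
			score += 1000 // large bonus for a warm template cache
		}
		if score > bestScore {
			best, bestScore = n, score
		}
	}
	return best
}
```

Even a crude bonus like this seems like it would avoid most cold starts while still spreading load. Is something along these lines planned, or is the intended fix at a different layer?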
- Resource Cleanup on Orchestrator Crash
My second question is about cleanup. I've observed that if the orchestrator process dies unexpectedly, it does leak resources—specifically orphaned Firecracker VMs and veth pairs from the network pool.
How are you currently handling this? Is there a garbage collector, a reconciliation loop, or some other process that cleans up these resources after the orchestrator restarts? I'm curious about your maintenance strategy here.
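For reference, this is the kind of reconciliation pass I'm imagining on orchestrator restart (a sketch only; the `veth-e2b` prefix and the two ownership helpers are my assumptions, not E2B's actual conventions):

```go
package cleanup

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

// Reconcile kills any Firecracker process and deletes any pooled veth pair
// that no known sandbox owns. Meant to run once when the orchestrator restarts.
func Reconcile(knownSandboxes map[string]bool) {
	// 1) Orphaned VMs: `pgrep -a` prints "<pid> <full command line>".
	out, _ := exec.Command("pgrep", "-a", "firecracker").Output()
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		owner := sandboxIDFromCmdline(fields[1:])
		if owner == "" || knownSandboxes[owner] {
			continue
		}
		if pid, err := strconv.Atoi(fields[0]); err == nil {
			fmt.Printf("killing orphaned firecracker pid=%d sandbox=%s\n", pid, owner)
			syscall.Kill(pid, syscall.SIGKILL)
		}
	}
	// 2) Stale veth pairs: walk host interfaces and drop unowned pool members.
	links, _ := os.ReadDir("/sys/class/net")
	for _, l := range links {
		name := l.Name()
		if strings.HasPrefix(name, "veth-e2b") && !knownSandboxes[vethOwner(name)] {
			exec.Command("ip", "link", "del", name).Run()
		}
	}
}

// Placeholders for however ownership is actually encoded (a cmdline flag,
// an interface-name suffix, a state file on disk, ...).
func sandboxIDFromCmdline(args []string) string { return "" }
func vethOwner(name string) string              { return "" }
```

Is there something like this in the codebase already, or is cleanup expected to happen out of band?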
- Co-hosting with Kubernetes (kubelet)
We already have clusters running internally on AWS and k8s (EKS).
I saw you mention that you're already running E2B on k8s, and this is exactly my goal.
I'm very curious how you manage running the orchestrator and kubelet on the same node without resource conflicts. Specifically:
How do you manage resources like hugepages so they don't conflict?
How do you make kubelet aware of the CPU and Memory that the orchestrator (and its VMs) are consuming, so k8s doesn't over-provision pods?
If kubelet is using the static CPU policy (with cpuset), how do you prevent the orchestrator's VMs from using those reserved CPUs? (One idea is sketched after this list.)
Are you running the orchestrator as a DaemonSet, or just as a separate process on the host?
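For the cpuset question specifically, the only approach I've come up with is to confine every Firecracker process to the complement of kubelet's reserved CPUs via a cgroup, along these lines (a sketch, assuming cgroup v2 with the cpuset controller delegated; the paths and the `2-15` range are illustrative):

```go
package orchestrator

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strconv"
)

const vmCgroup = "/sys/fs/cgroup/e2b-vms"

// StartConfinedVM launches Firecracker inside a cgroup whose cpuset is the
// complement of kubelet's reservedSystemCPUs, so VMs never land on the CPUs
// kubelet's static policy hands out exclusively. "2-15" is illustrative and
// must match whatever kubelet is NOT reserving on the node.
func StartConfinedVM(fcArgs ...string) error {
	if err := os.MkdirAll(vmCgroup, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(vmCgroup, "cpuset.cpus"), []byte("2-15"), 0o644); err != nil {
		return err
	}
	cmd := exec.Command("firecracker", fcArgs...)
	if err := cmd.Start(); err != nil {
		return err
	}
	// Move the new VM process into the confined cgroup.
	pid := strconv.Itoa(cmd.Process.Pid)
	if err := os.WriteFile(filepath.Join(vmCgroup, "cgroup.procs"), []byte(pid), 0o644); err != nil {
		return fmt.Errorf("confining pid %s: %w", pid, err)
	}
	return cmd.Wait()
}
```

On the kubelet side I assume `systemReserved` and `reservedSystemCPUs` in the KubeletConfiguration are how you'd make k8s account for the orchestrator's footprint, but I'd love to hear what you actually do, since moving the process after `Start` also leaves a small window where it can run on reserved CPUs.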
- Flexibility for envd (and Core Binaries)
My last question is about updating core components like envd that are baked into the templates.
It seems that if I want to add a feature to envd or update it, I would need to rebuild all existing templates (including all custom ones) to get the new version. This may be difficult to manage at scale.
Is this correct, or do you have another solution? I was wondering whether it would be technically feasible to mount the envd binary into the VM (e.g., right before boot) rather than baking it into the filesystem. This seems like it would be much more flexible for rolling out updates.
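Concretely, since Firecracker supports attaching extra block devices, I imagine envd could live in a tiny read-only ext4 image attached as a secondary drive right before boot, with the guest init mounting it and exec'ing envd from there. A sketch of the host side (the socket path, drive ID, and image path are made up; only the `PUT /drives/{drive_id}` call is real Firecracker API):

```go
package templates

import (
	"bytes"
	"context"
	"fmt"
	"net"
	"net/http"
)

// attachEnvdDrive adds a read-only drive holding just the envd binary to a
// not-yet-booted Firecracker VM via its API socket.
func attachEnvdDrive(ctx context.Context, fcSocket string) error {
	client := &http.Client{
		Transport: &http.Transport{
			// Firecracker serves its API on a unix socket, not TCP.
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", fcSocket)
			},
		},
	}
	body := []byte(`{
		"drive_id": "envd",
		"path_on_host": "/opt/e2b/envd-images/envd-current.ext4",
		"is_root_device": false,
		"is_read_only": true
	}`)
	req, err := http.NewRequestWithContext(ctx, http.MethodPut,
		"http://localhost/drives/envd", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Firecracker replies 204 No Content on success.
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("attach envd drive: unexpected status %s", resp.Status)
	}
	return nil
}
```

Updating envd would then mean swapping one image on the host instead of rebuilding every template, at the cost of a small init hook inside the guest. Does anything in your snapshot/template pipeline rule this out?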
Thanks for any insights, and thanks for building this!