feat(BA-2851): Add resource isolation options for multi-agent #6498
base: feat/BA-2753/multiple-agents
Pull Request Overview
This PR introduces resource isolation options for multi-agent setups, enabling multiple agents to run on the same physical host with controlled resource allocation. The implementation adds three allocation modes: SHARED (default, backward compatible), AUTO_SPLIT (automatic equal division), and MANUAL (explicit per-agent configuration).
Key changes:
- Introduces `ResourcePartitioner` class to manage resource allocation across agents
- Adds `ResourceAllocationMode` enum with SHARED, AUTO_SPLIT, and MANUAL modes
- Implements validation logic to ensure consistent manual allocations across agents
- Updates agent initialization to use resource partitioning
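As a rough illustration of how the three modes could differ for a single resource slot, here is a minimal sketch. It is not the PR's actual `ResourcePartitioner`: the function `partition_slot`, its parameters, and the enum string values are hypothetical and chosen only for this example; only the mode names come from the overview above.

```python
from decimal import Decimal
from enum import Enum
from typing import Optional


class ResourceAllocationMode(Enum):
    # Mode names come from the PR overview; the string values are assumed.
    SHARED = "shared"
    AUTO_SPLIT = "auto-split"
    MANUAL = "manual"


def partition_slot(
    mode: ResourceAllocationMode,
    total: Decimal,
    num_agents: int,
    manual_amount: Optional[Decimal] = None,
) -> Decimal:
    """Return how much of one resource slot a single agent may use (hypothetical helper)."""
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    if mode is ResourceAllocationMode.SHARED:
        # Every agent sees the whole pool (the backward-compatible default).
        return total
    if mode is ResourceAllocationMode.AUTO_SPLIT:
        # Equal division among all agents on the host.
        return total / num_agents
    # MANUAL: the operator must state the per-agent amount explicitly.
    if manual_amount is None:
        raise ValueError("MANUAL mode requires an explicit per-agent amount")
    return manual_amount
```

With one agent, SHARED and AUTO_SPLIT return the same amount in this sketch, which is consistent with single-agent deployments keeping access to all resources.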
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 23 comments.
| File | Description |
|---|---|
| src/ai/backend/agent/resources.py | Adds ResourcePartitioner class and changes abstract methods to raise NotImplementedError |
| src/ai/backend/agent/config/unified.py | Defines allocation modes, new config fields (allocated_cpu/mem/disk/devices), and validation logic |
| src/ai/backend/agent/agent.py | Integrates ResourcePartitioner into agent initialization and updates slot calculations |
| src/ai/backend/agent/server.py | Creates ResourcePartitioner instances per agent and adds resource reconciliation |
| src/ai/backend/agent/docker/agent.py | Adds resource_partitioner parameter to constructor |
| src/ai/backend/agent/kubernetes/agent.py | Adds resource_partitioner parameter to constructor |
| tests/agent/test_resource_allocation.py | Comprehensive unit tests for all three allocation modes |
| tests/agent/test_config_validation.py | Tests for config validation of allocation modes and device consistency |
| tests/agent/docker/test_agent.py | Updates test to pass ResourcePartitioner to agent |
| changes/6498.feature.md | Changelog entry |
This change adds configuration for partitioning host resources among agents, instead of every agent always seeing the full resource pool. Partitioning prevents unintended over-allocation that could crash kernels. SHARED mode lets every agent see the full resources (useful for stress testing); this is the same behavior as before. AUTO_SPLIT mode divides resources equally among agents. MANUAL mode lets users specify exact per-agent allocations for all resources. Single-agent deployments remain unaffected and retain access to all available hardware resources.
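The over-allocation risk can be made concrete with a small arithmetic sketch. The helper names `overcommit_ratio` and `validate_manual` are hypothetical, and the rule that manual allocations must not exceed the host total is an assumption made for illustration, not a check confirmed by this PR; `allocated_cpu` is borrowed from the config field name listed in the file summary.

```python
def overcommit_ratio(host_cpus: int, per_agent_cpus: list[int]) -> float:
    """Ratio of the CPUs advertised by all agents to what the host actually has."""
    return sum(per_agent_cpus) / host_cpus


def validate_manual(host_cpus: int, allocated_cpu: dict[str, int]) -> None:
    """Reject manual allocations whose sum exceeds host capacity (an assumed rule)."""
    requested = sum(allocated_cpu.values())
    if requested > host_cpus:
        raise ValueError(
            f"agents request {requested} CPUs but the host has {host_cpus}"
        )


host_cpus = 16
shared = [host_cpus] * 2           # SHARED: each of 2 agents sees all 16 cores
auto_split = [host_cpus // 2] * 2  # AUTO_SPLIT: each agent sees 8 cores

print(overcommit_ratio(host_cpus, shared))      # 2.0
print(overcommit_ratio(host_cpus, auto_split))  # 1.0
```

A ratio above 1.0 means the agents together advertise more capacity than the host has, which is the situation the partitioning modes are meant to avoid.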
resolves #6432 (BA-2851)
Checklist: (if applicable)
- `ai.backend.test`
- `docs` directory