feat: implement inject-and-continue for agent coordination (preempt-not-restart) #398
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
PR Title Format
feat: implement inject-and-continue for agent coordination (preempt-not-restart)
Description
This PR implements the "inject-and-continue" approach for agent coordination, replacing the previous "restart from scratch" behavior when new answers arrive. When an agent provides a
new_answerwhile other agents are working, those agents now receive an update injection and continue their work with preserved context, rather than being killed and restarted from zero.Key improvements:
Type of change
feat:) - Non-breaking change which adds functionalitydocs:) - Documentation updatesfix:) - Non-breaking change which fixes an issuebreaking:) - Fix or feature that would cause existing functionality to not work as expectedrefactor:) - Code changes that neither fix a bug nor add a featuretest:) - Adding missing tests or correcting existing testschore:) - Maintenance tasks, dependency updates, etc.perf:) - Code changes that improve performancestyle:) - Changes that do not affect the meaning of the codeci:) - Changes to CI/CD configuration files and scriptsChanges Made
Core Implementation (
massgen/orchestrator.py)Removed premature restart check (lines 1797-1802):
restart_pending=TrueEnhanced
_inject_update_and_continue()method:pending_agent_restartsflag after successful injectionImproved
_build_update_message()method:Coordination Tracking (
massgen/coordination_tracker.py,massgen/utils.py)UPDATE_INJECTEDevent type to track when agents receive mid-work updatesDocumentation Updates
Design Documentation (
docs/dev_notes/preempt_not_restart_design.md)Architecture Documentation (
docs/source/development/architecture.rst)Core Concepts Documentation (
docs/source/user_guide/concepts.rst)Technical Details
How It Works
Before (Restart Approach):
After (Inject-and-Continue):
Thus the agent will never throw away its context; instead of being forced to restart, the agent will continue its stream with the new answer from the other agent in its context. Without this, we are wasting runs and potentially eliminating diversity as agents will want to converge earlier.
Safe-Point Injection
Updates are injected at safe points:
Race Condition (Acceptable)
If an agent is deep in its first response when a new answer arrives:
Checklist
Pre-commit status
How to Test
Test CLI Command
Prerequisites:
test_preempt_not_restart.yamlas referenceTest command:
Alternative test (watch coordination table in real-time):
Expected Results
Console Output:
📨 [agent_id] receiving update with new answerswhen injection happensUPDATE_INJECTEDeventsLog Verification:
Example successful injection log:
Workspace Update Message (agents with filesystem):
What Success Looks Like
UPDATE_INJECTEDtypeEdge Cases Tested
Additional Context
Related Issues
Design Decisions
Why not interrupt mid-stream?
Why specific workspace paths?
Future Improvements
Breaking Changes
None. This is a pure enhancement - existing behavior remains for agents without active streams.
Migration Notes
No migration needed. Feature works automatically with existing configs.