feat: implement inject-and-continue for agent coordination (preempt-not-restart) #398

ncrispino · 2025-10-31T23:14:51Z

PR Title Format

feat: implement inject-and-continue for agent coordination (preempt-not-restart)

Description

This PR implements the "inject-and-continue" approach for agent coordination, replacing the previous "restart from scratch" behavior when new answers arrive. When an agent provides a new_answer while other agents are working, those agents now receive an update injection and continue their work with preserved context, rather than being killed and restarted from zero.

Key improvements:

Agents preserve their full thinking history when receiving updates
No wasted computation regenerating ideas from scratch
Enables true collaborative building where agents can synthesize and improve each other's work
More specific workspace update messages showing exactly which workspaces were affected

Type of change

Changes Made

Core Implementation (`massgen/orchestrator.py`)

Removed premature restart check (lines 1797-1802):
- Previously: Agents were restarted from scratch at the start of execution if restart_pending=True
- Now: All agents go through injection logic in the iteration loop for consistent behavior
Enhanced _inject_update_and_continue() method:
- Properly tracks coordination state in both orchestrator and coordination tracker
- Clears pending_agent_restarts flag after successful injection
- Handles edge case where agent already has full context (no new answers to inject)
Improved _build_update_message() method:
- Now shows specific workspace paths affected by the update
- Example: "agent1's work: /path/to/temp_workspaces/gemini_agent/"
- Helps agents know exactly where to find new files

Coordination Tracking (`massgen/coordination_tracker.py`, `massgen/utils.py`)

Added UPDATE_INJECTED event type to track when agents receive mid-work updates
Properly integrated with existing restart tracking mechanisms
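
As a rough sketch of what tracking such an event could look like: the `Tracker` class, the `CoordinationEvent` fields, and the `RESTART_TRIGGERED` member below are assumptions made for illustration. Only the `UPDATE_INJECTED` name comes from this PR, and the real tracker in `massgen/coordination_tracker.py` may be structured differently.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class EventType(Enum):
    # Only UPDATE_INJECTED is named in this PR; the other member is a placeholder.
    RESTART_TRIGGERED = "RESTART_TRIGGERED"
    UPDATE_INJECTED = "UPDATE_INJECTED"


@dataclass
class CoordinationEvent:
    event_type: EventType
    agent_id: str
    details: str = ""
    timestamp: float = field(default_factory=time.time)


class Tracker:
    """Hypothetical in-memory event log."""

    def __init__(self):
        self.events = []

    def record(self, event_type, agent_id, details=""):
        event = CoordinationEvent(event_type, agent_id, details)
        self.events.append(event)
        return event

    def count(self, event_type):
        return sum(1 for e in self.events if e.event_type is event_type)
```

Keeping injections as their own event type lets them be counted separately from restarts when reading the coordination log.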

Documentation Updates

Design Documentation (`docs/dev_notes/preempt_not_restart_design.md`)

Added implementation status section with completion date
Documented race condition limitation (acceptable by design)
Explained safe-point injection and why agents won't be interrupted mid-stream
Included real example from test logs

Architecture Documentation (`docs/source/development/architecture.rst`)

Added comprehensive "Inject-and-Continue (Preempt-Not-Restart)" section
Visual comparison of traditional vs MassGen approach
Explained benefits: context preservation, efficiency, better collaboration
Documented safe-point injection mechanism and race condition

Core Concepts Documentation (`docs/source/user_guide/concepts.rst`)

Updated coordination flow diagram from "RESTART coordination" to "INJECT update to others"
Changed key insight from "Restart on new_answer" to "Inject-and-continue"
Clarified that agents receive updates mid-work and continue with preserved thinking

Technical Details

How It Works

Before (Restart Approach):

Agent A: Working on solution... [thinking deeply about approach X]
Agent B: ✅ Provides new answer
         ↓
Agent A: 🔁 RESTART - Kill stream, clear context, start fresh
         ❌ Lost all thinking about approach X

After (Inject-and-Continue):

Agent A: Working on solution... [thinking deeply about approach X]
Agent B: ✅ Provides new answer
         ↓
Agent A: 📨 UPDATE RECEIVED - Inject new context and continue
         ✅ Keeps all thinking about approach X
         ✅ Can now build on Agent B's answer

With this change an agent never throws away its context: instead of being forced to restart, it continues its stream with the other agent's new answer added to its context. Under the old restart behavior we wasted runs and risked losing diversity, because agents will tend to converge earlier.

Safe-Point Injection

Updates are injected at safe points:

✅ Between iteration loops (after completing a response)
✅ When agent checks for new context
❌ NOT mid-stream (would break agent reasoning)
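
The safe-point rule can be illustrated with a small `asyncio` sketch. Everything here (`run_agent`, `stream_response`, the queue of pending updates) is a hypothetical stand-in for the orchestrator's iteration loop. It only shows the ordering guarantee: an update that arrives while a response is streaming is picked up before the next response starts, never in the middle of the current one.

```python
import asyncio


async def stream_response(agent_id, iteration, log):
    """Stand-in for one streamed response made of several chunks."""
    for chunk in range(3):
        log.append(f"{agent_id} iter{iteration} chunk{chunk}")
        await asyncio.sleep(0)  # yield control, as a real stream would


async def run_agent(agent_id, pending, log, iterations=2):
    for i in range(iterations):
        # Safe point: drain queued updates before the next response begins.
        while not pending.empty():
            log.append(f"{agent_id} injected {pending.get_nowait()}")
        await stream_response(agent_id, i, log)


async def main():
    log = []
    pending = asyncio.Queue()

    async def other_agent():
        await asyncio.sleep(0)  # answer arrives while agent_a is streaming
        pending.put_nowait("answer from agent_b")

    await asyncio.gather(run_agent("agent_a", pending, log), other_agent())
    return log


if __name__ == "__main__":
    print("\n".join(asyncio.run(main())))
```

In this sketch the injection line appears after all three chunks of the first response, which is the same property the safe points above are meant to provide.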

Race Condition (Acceptable)

If an agent is deep in its first response when a new answer arrives:

It won't see the injection until it completes that response
By then, it may already have received the full context from the orchestrator
The agent still gets all answers, just via a different mechanism
This is acceptable: the final outcome is the same

Checklist

I have run pre-commit on my changed files and all checks pass
My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Pre-commit status

# Pre-commit checks to be run before merging
uv run pre-commit run --files massgen/orchestrator.py massgen/coordination_tracker.py massgen/utils.py
uv run pre-commit run --files docs/dev_notes/preempt_not_restart_design.md
uv run pre-commit run --files docs/source/development/architecture.rst docs/source/user_guide/concepts.rst

How to Test

Test CLI Command

Prerequisites:

Configure test file with 3 agents (fast models recommended for quick testing)
Use test_preempt_not_restart.yaml as reference

Test command:

# Run with test config
uv run massgen --config test_preempt_not_restart.yaml "Create a simple website about dogs"

# Monitor logs for injection events
tail -f .massgen/massgen_logs/log_*/massgen.log | grep -i "inject"

Alternative test (watch coordination table in real-time):

# Start MassGen in one terminal
uv run massgen --config test_preempt_not_restart.yaml "Create a website about Bob Dylan"

# In another terminal, watch coordination events
watch -n 2 'find .massgen/massgen_logs -name "coordination_events.json" -exec tail -20 {} \;'

Expected Results

Console Output:

Agents show 📨 [agent_id] receiving update with new answers when injection happens
Coordination table logs should show UPDATE_INJECTED events
Agents should continue their reasoning without full restart

Log Verification:

# Check for successful injections
grep "Injecting update for" .massgen/massgen_logs/log_*/massgen.log

# Verify agents received NEW answers (not empty)
grep "NEW answers since agent started" .massgen/massgen_logs/log_*/massgen.log

# Count injection events
grep -c "UPDATE_INJECTED" .massgen/massgen_logs/log_*/attempt_1/coordination_events.json
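
For use inside a test script, the same count can be done in Python. This is a text-level sketch that mirrors `grep -c` (it counts lines containing the string) and deliberately does not parse the JSON, since the structure of `coordination_events.json` is not described here. The helper name `count_matching_lines` is made up for this example.

```python
from pathlib import Path


def count_matching_lines(root, pattern, needle):
    """Return {file path: number of lines containing needle} for files matching pattern."""
    counts = {}
    for path in sorted(Path(root).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as fh:
            counts[str(path)] = sum(1 for line in fh if needle in line)
    return counts


if __name__ == "__main__":
    result = count_matching_lines(
        ".massgen/massgen_logs",
        "log_*/attempt_1/coordination_events.json",
        "UPDATE_INJECTED",
    )
    for file_path, n in result.items():
        print(f"{file_path}:{n}")
```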

Example successful injection log:

17:03:23 | INFO | [Orchestrator] Agent grok_agent started with 0 answer(s), now has 1 answer(s)
17:03:23 | INFO | [Orchestrator] NEW answers since agent started: ['gemini_agent']
17:03:23 | INFO | [Orchestrator] Injecting update for grok_agent

Workspace Update Message (agents with filesystem):

WORKSPACE UPDATE:
- Your workspace files are preserved
- New workspace snapshots available from 2 agent(s):
  - agent1's work: /path/to/temp_workspaces/gemini_agent/
  - agent3's work: /path/to/temp_workspaces/grok_agent/
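
A message in this shape could be assembled along the following lines. The function name `build_workspace_update` and its input (a mapping from agent label to workspace path) are assumptions for illustration; the real `_build_update_message()` may take different arguments and include more text.

```python
def build_workspace_update(workspaces):
    """Format the workspace section of an update message.

    workspaces maps an agent label (e.g. "agent1") to its snapshot path.
    """
    lines = ["WORKSPACE UPDATE:", "- Your workspace files are preserved"]
    if workspaces:
        lines.append(
            f"- New workspace snapshots available from {len(workspaces)} agent(s):"
        )
        for label, path in workspaces.items():
            lines.append(f"  - {label}'s work: {path}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_workspace_update(
            {
                "agent1": "/path/to/temp_workspaces/gemini_agent/",
                "agent3": "/path/to/temp_workspaces/grok_agent/",
            }
        )
    )
```

Listing one path per agent, rather than a generic notice, is what lets the receiving agent go directly to the files that changed.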

What Success Looks Like

✅ Agents receive update injections mid-work (not always at start)
✅ Agents continue with preserved context
✅ Agents can reference new answers in their reasoning
✅ Coordination events show UPDATE_INJECTED type
✅ Workspace paths are specific to affected agents

Edge Cases Tested

Fast agent provides answer early: Other agents should get injection
Slow agent deep in first response: May get full context on restart (acceptable)
No new answers to inject: Flag cleared, agent proceeds normally
Multiple rapid updates: Each handled sequentially at safe points

Additional Context

Related Issues

Fixes #376 ([FEATURE] Capture agent in-progress summaries/memories during restart_pending to preserve diverse perspectives): preserve agent work-in-progress during coordination

Design Decisions

Why not interrupt mid-stream?

Would break agent reasoning and potentially corrupt responses
Safe-point injection ensures agent completes current thought
Race condition is acceptable - agent still gets all context

Why specific workspace paths?

Helps agents know exactly where to find new files
Reduces confusion about which workspaces have updates
Makes collaboration more explicit and debuggable

Future Improvements

Add metrics for injection effectiveness
Consider batching multiple rapid updates into single injection
Explore pre-emptive hints for agents about incoming updates

Breaking Changes

None. This is a pure enhancement - existing behavior remains for agents without active streams.

Migration Notes

No migration needed. Feature works automatically with existing configs.

ncrispino added 2 commits October 30, 2025 20:20

Fix preempt not restart; begun testing

8e29f90

Adjust docs

eee3044
