feat: unified operation registry for intelligent auth/rate limit recovery#1698
feat: unified operation registry for intelligent auth/rate limit recovery#1698
Conversation
…very - Add OperationRegistry singleton to track ALL Claude SDK operations (tasks, PR reviews, insights, etc.) - Implement intelligent pause/resume for rate limits with automatic wait until reset - Add auth failure pause phase with 24-hour timeout protection - Enable proactive account swapping for all operation types (not just autonomous tasks) - Add autoSwitchOnAuthFailure setting for multi-account auth failure handling - Fix infinite loop in WorktreeSelector dropdown (pre-existing bug) - Add comprehensive unit tests for OperationRegistry (39 tests) Backend changes: - Add is_rate_limit_error() and is_authentication_error() detection - Add RATE_LIMIT_PAUSED and AUTH_FAILURE_PAUSED execution phases - Implement wait_for_rate_limit_reset() with periodic resume checks - Implement wait_for_auth_resume() with 24-hour max timeout - Handle negative wait_seconds edge case Frontend changes: - Create operation-registry.ts for unified operation tracking - Update usage-monitor.ts to use registry instead of AgentManager - Update agent-manager.ts to register operations with registry - Extend subprocess-runner.ts with optional operation registration - Register PR reviews with operation registry - Add i18n keys for new settings Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Test User seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
|
Important Review skippedToo many files! This PR contains 165 files, which is 15 over the limit of 150. You can disable this status message by setting the
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @AndyMik90, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the application's resilience and user experience by introducing intelligent recovery mechanisms for API rate limits and authentication failures. It unifies the tracking of all Claude Agent SDK operations, enabling proactive account switching and graceful pausing/resuming of tasks. These changes address critical issues like infinite retry loops and limited proactive swapping, ensuring smoother operation and better resource management. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This is an impressive and significant feature enhancement that introduces a unified operation registry for intelligent auth and rate limit recovery. The implementation is robust, with comprehensive backend logic for pausing and resuming, a well-designed frontend registry, and thorough testing. The proactive account swapping for all operations is a major improvement. I've left a few comments with suggestions for minor refactoring to improve code clarity and maintainability, and one potential regression in a useEffect dependency.
apps/backend/agents/coder.py
Outdated
| if not error_info: | ||
| return None | ||
|
|
||
| message = error_info.get("message", "") | ||
|
|
||
| # Try to find patterns like "resets at 3:00 PM" or "in 5 minutes" | ||
| # Pattern: "in X minutes/hours" | ||
| in_time_match = re.search(r"in\s+(\d+)\s*(minute|hour|min|hr)s?", message, re.I) | ||
| if in_time_match: | ||
| amount = int(in_time_match.group(1)) | ||
| unit = in_time_match.group(2).lower() | ||
| if unit.startswith("hour") or unit.startswith("hr"): | ||
| delta = timedelta(hours=amount) | ||
| else: | ||
| delta = timedelta(minutes=amount) | ||
| return int((datetime.now() + delta).timestamp()) | ||
|
|
||
| # Pattern: "at HH:MM" (12 or 24 hour) | ||
| at_time_match = re.search(r"at\s+(\d{1,2}):(\d{2})(?:\s*(am|pm))?", message, re.I) | ||
| if at_time_match: | ||
| hour = int(at_time_match.group(1)) | ||
| minute = int(at_time_match.group(2)) | ||
| meridiem = at_time_match.group(3) | ||
| if meridiem: | ||
| if meridiem.lower() == "pm" and hour < 12: | ||
| hour += 12 | ||
| elif meridiem.lower() == "am" and hour == 12: | ||
| hour = 0 | ||
|
|
||
| now = datetime.now() | ||
| reset_time = now.replace(hour=hour, minute=minute, second=0, microsecond=0) | ||
| if reset_time <= now: | ||
| reset_time += timedelta(days=1) | ||
| return int(reset_time.timestamp()) | ||
|
|
||
| # Default: 5 hour wait (typical session reset) | ||
| return int((datetime.now() + timedelta(hours=5)).timestamp()) |
There was a problem hiding this comment.
datetime.now() is called multiple times within this function. This is slightly inefficient and could lead to minor inconsistencies if the function's execution spans a second boundary. It's better to capture the current time once at the beginning of the function and reuse it.
if not error_info:
return None
now = datetime.now()
message = error_info.get("message", "")
# Try to find patterns like "resets at 3:00 PM" or "in 5 minutes"
# Pattern: "in X minutes/hours"
in_time_match = re.search(r"in\s+(\d+)\s*(minute|hour|min|hr)s?", message, re.I)
if in_time_match:
amount = int(in_time_match.group(1))
unit = in_time_match.group(2).lower()
if unit.startswith("hour") or unit.startswith("hr"):
delta = timedelta(hours=amount)
else:
delta = timedelta(minutes=amount)
return int((now + delta).timestamp())
# Pattern: "at HH:MM" (12 or 24 hour)
at_time_match = re.search(r"at\s+(\d{1,2}):(\d{2})(?:\s*(am|pm))?", message, re.I)
if at_time_match:
hour = int(at_time_match.group(1))
minute = int(at_time_match.group(2))
meridiem = at_time_match.group(3)
if meridiem:
if meridiem.lower() == "pm" and hour < 12:
hour += 12
elif meridiem.lower() == "am" and hour == 12:
hour = 0
reset_time = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
if reset_time <= now:
reset_time += timedelta(days=1)
return int(reset_time.timestamp())
# Default: 5 hour wait (typical session reset)
return int((now + timedelta(hours=5)).timestamp())
apps/backend/agents/coder.py
Outdated
|
|
||
| reset_timestamp = parse_rate_limit_reset_time(error_info) | ||
| if reset_timestamp: | ||
| from datetime import datetime |
| // Register with unified OperationRegistry for proactive swap support | ||
| const activeProfile = profileManager.getActiveProfile(); | ||
| if (activeProfile) { | ||
| // Keep internal state tracking for backward compatibility | ||
| this.assignProfileToTask(taskId, activeProfile.id, activeProfile.name, 'proactive'); | ||
|
|
||
| // Register with unified registry for proactive swap | ||
| const operationRegistry = getOperationRegistry(); | ||
| operationRegistry.registerOperation( | ||
| taskId, | ||
| 'spec-creation', | ||
| activeProfile.id, | ||
| activeProfile.name, | ||
| (newProfileId: string) => this.restartTask(taskId, newProfileId), | ||
| { | ||
| stopFn: () => { this.killTask(taskId); }, | ||
| metadata: { projectPath, taskDescription, specDir } | ||
| } | ||
| ); | ||
| console.log('[AgentManager] Task registered with OperationRegistry:', { taskId, profileId: activeProfile.id, profileName: activeProfile.name, type: 'spec-creation' }); | ||
| } |
There was a problem hiding this comment.
| setInitSuccess(false); | ||
| setInitError(null); | ||
| }, [selectedProjectId]); | ||
| }, []); |
There was a problem hiding this comment.
Changing the dependency array of this useEffect from [selectedProjectId] to [] may introduce a UI bug. Previously, the initialization status (initSuccess, initError) was reset immediately when switching projects. Now, it only resets on the initial mount. This means if you switch from a project with an initialization error to another project, the error message might persist until the new project's initialization check completes. It's safer to keep selectedProjectId as a dependency to ensure a clean state on project switch.
| }, []); | |
| }, [selectedProjectId]); |
| return; | ||
| } | ||
|
|
||
| // If task failed and hit max retries, clean up | ||
| if (context.swapCount >= 2) { | ||
| this.taskExecutionContext.delete(taskId); | ||
| // Unregister from OperationRegistry | ||
| getOperationRegistry().unregisterOperation(taskId); | ||
| } | ||
| // Otherwise keep context for potential restart | ||
| }, 1000); // Delay to allow restart logic to run first |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
- Merge origin/develop into fix/auth-swapping/pause - Resolve agent-manager.ts conflict: keep OperationRegistry code + add projectId parameter - Resolve test_integration_phase4.py conflict: use cleaner comment style - Fix task-store.ts: restore reviewReason parameter name (was incorrectly renamed by Biome) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
AndyMik90
left a comment
There was a problem hiding this comment.
🤖 Auto Claude PR Review
Merge Verdict: 🔴 BLOCKED
🔴 Blocked - Merge conflicts must be resolved before merge.
Blocked: PR has merge conflicts with base branch. Resolve conflicts before merge.
Risk Assessment
| Factor | Level | Notes |
|---|---|---|
| Complexity | High | Based on lines changed |
| Security Impact | Low | Based on security findings |
| Scope Coherence | Good | Based on structural review |
🚨 Blocking Issues (Must Fix)
- Merge Conflicts: PR has conflicts with base branch that must be resolved
- CI Failed: Lint Complete
- CI Failed: TypeScript (Biome)
Findings Summary
- Medium: 4 issue(s)
- Low: 3 issue(s)
Generated by Auto Claude PR Review
Findings (7 selected of 7 total)
🔵 [1a6a9ff8067b] [LOW] Subprocess error output exposed to renderer via IPC
📁 apps/frontend/src/main/rate-limit-detector.ts:334
The detectAuthFailure() function captures the full subprocess stdout/stderr output and stores it in the 'originalError' field, which is then sent to the renderer via CLAUDE_AUTH_FAILURE IPC channel. While this is useful for debugging, subprocess output may contain internal paths, API error details, or other information that could be valuable to an attacker if the renderer context were compromised. In this desktop application context where users have filesystem access anyway, this is low risk.
Suggested fix:
Consider sanitizing or truncating the originalError field before sending to renderer, or only include it when DEBUG mode is enabled.
🟡 [612c5e50b7a2] [MEDIUM] Misleading docstring - function never returns None
📁 apps/backend/agents/coder.py:175
The docstring for parse_rate_limit_reset_time claims 'Returns: Unix timestamp of reset time, or None if not parseable' but the function has a default return that always returns a 5-hour future timestamp when no patterns match. The function never returns None except when error_info is None/falsy. This could mislead callers into thinking the function distinguishes between 'successfully parsed' and 'could not parse' when it actually always returns a valid timestamp.
Suggested fix:
Either: (1) Remove 'or None if not parseable' from docstring since function always returns a timestamp when error_info has a message, or (2) Return None when no pattern matches and let caller decide the default wait time.
🟡 [b1d7c4c719aa] [MEDIUM] Race condition in agent-manager cleanup with 1s delay
📁 apps/frontend/src/main/agent/agent-manager.ts:62
The agent-manager uses a 1-second setTimeout before cleaning up task context after process exit. If a task is rapidly restarted (e.g., via auto-swap) within that window, the delayed cleanup could incorrectly remove the context for the newly restarted task since both use the same taskId. The comment acknowledges this is intentional ('Delay to allow restart logic to run first') but provides no synchronization guarantee.
Suggested fix:
Add a version/generation counter to the context that is incremented on restart. The cleanup callback should check if the context's generation matches before deleting, ensuring old cleanups don't affect new task instances.
🔵 [53b12a28267f] [LOW] No validation of parsed time values in parse_rate_limit_reset_time
📁 apps/backend/agents/coder.py:200
When parsing 'at HH:MM' time patterns, there's no validation that hour is in valid range (0-23 after meridiem conversion) or minute is 0-59. Malformed error messages like 'at 25:99' would create invalid datetime objects. While Python's datetime.replace() will raise ValueError for invalid values, this could cause unhandled exceptions.
Suggested fix:
Add validation: `if not (0 <= hour <= 23 and 0 <= minute <= 59): return None` before calling datetime.replace(), or wrap in try/except ValueError to fall back to the default 5-hour wait.
🔵 [25ae1e4b3e09] [LOW] Inline imports inside function instead of module level
📁 apps/backend/agents/coder.py:177
The parse_rate_limit_reset_time function imports re and datetime inside the function body (lines 177-178), and similar inline imports appear in the rate limit/auth error handlers (lines 871, 915-916). While this works, it's inconsistent with Python conventions of placing imports at module level. This adds minor overhead on each function call and reduces code clarity.
Suggested fix:
Move `import re` and `from datetime import datetime, timedelta` to the top of the file with other imports. These modules are already likely imported elsewhere in the codebase and don't create circular dependency issues.
🟡 [12d4a8c58980] [MEDIUM] parse_rate_limit_reset_time always returns timestamp, contradicting docstring
📁 apps/backend/agents/coder.py:215
The docstring states 'Returns: Unix timestamp of reset time, or None if not parseable' but the function ALWAYS returns a timestamp. When no pattern matches, it defaults to a 5-hour wait (line 215-216). This means the caller's else branch at line 898-903 ('Couldn't parse reset time - fall back to standard retry') is dead code and will NEVER execute. Every unparseable rate limit error will result in a 5-hour wait instead of the intended standard retry behavior.
Suggested fix:
Either update the docstring to document the 5-hour default behavior, or return None when no pattern matches and handle the None case in the caller (which would make the else branch at line 898-903 reachable as intended).
🟡 [5381fb6b4092] [MEDIUM] Duplicate operation tracking system exists (unused)
📁 apps/frontend/src/main/claude-profile/operation-registry.ts:79
The new ClaudeOperationRegistry in operation-registry.ts introduces operation tracking functionality that overlaps with the existing SDKSessionRecoveryCoordinator in services/sdk-session-recovery-coordinator.ts. Both have RegisteredOperation interfaces, registerOperation/unregisterOperation methods, and getOperationsByProfile. The SDKSessionRecoveryCoordinator is currently unused (only referenced in its own tests) while the new OperationRegistry is actively integrated. Consider removing or deprecating SDKSessionRecoveryCoordinator to avoid future confusion about which system to use.
Suggested fix:
Add a comment in SDKSessionRecoveryCoordinator marking it as deprecated in favor of OperationRegistry, or remove it entirely since it's unused outside of tests. Alternatively, migrate its cooldown and notification batching features to OperationRegistry if they're needed.
This review was generated by Auto Claude.
Fixes 7 issues from Auto Claude PR Review: 1. Sanitize subprocess error output before sending to renderer - Added sanitizeErrorOutput() that truncates to 500 chars - Only includes error details when DEBUG=true 2-6. Fix parse_rate_limit_reset_time in coder.py: - Return None when no pattern matches (fixes misleading docstring) - Add hour/minute validation (0-23, 0-59) with try/except - Move re, json, datetime imports to module level 3. Fix race condition in agent-manager cleanup: - Added generation counter to task context - Cleanup callback checks generation before deleting 7. Mark SDKSessionRecoveryCoordinator as deprecated: - Added @deprecated JSDoc comments with migration guide - Recommends using ClaudeOperationRegistry instead Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed sanitizeErrorOutput() to always return a truncated string instead of returning undefined when DEBUG mode is off. This maintains security through truncation (500 char limit) while ensuring tests that expect originalError to be present continue to pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace `any` types with proper types (ReturnType<typeof vi.fn>,
MockFn, unknown as T patterns)
- Add comments to intentionally empty mockImplementation blocks
to satisfy noEmptyBlockStatements rule
- Use `unknown as { prop: T }` pattern for private property access
instead of `as any`
Files fixed:
- config-path-validator.test.ts
- python-env-manager.test.ts
- settings-onboarding.test.ts
- utils.test.ts
- agent-state.test.ts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused import `RegisteredOperation` from operation-registry.test.ts - Remove unused private class member `paths` from SessionManager - Add biome-ignore comment for intentional control character regex in app-updater.ts (sanitization pattern) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- PresetsPanel.test.tsx: Use separate assertion and type cast instead of optional chain with non-null assertion - TaskDetailModal.tsx: Remove unnecessary optional chain since we're inside a truthy guard for task.metadata?.prUrl - TaskMetadata.tsx: Same fix - use task.metadata.prUrl since we're inside the prUrl truthy check Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| // Unregister from OperationRegistry | ||
| getOperationRegistry().unregisterOperation(taskId); | ||
| return; | ||
| } | ||
|
|
||
| // If task failed and hit max retries, clean up | ||
| if (context.swapCount >= 2) { | ||
| this.taskExecutionContext.delete(taskId); | ||
| // Unregister from OperationRegistry | ||
| getOperationRegistry().unregisterOperation(taskId); |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
AndyMik90
left a comment
There was a problem hiding this comment.
🤖 Auto Claude PR Review
Merge Verdict: 🔴 BLOCKED
🔴 Blocked - 4 CI check(s) failing. Fix CI before merge.
Blocked: 4 CI check(s) failing. Fix CI before merge.
Risk Assessment
| Factor | Level | Notes |
|---|---|---|
| Complexity | High | Based on lines changed |
| Security Impact | Low | Based on security findings |
| Scope Coherence | Good | Based on structural review |
🚨 Blocking Issues (Must Fix)
- CI Failed: CI Complete
- CI Failed: test-frontend (windows-latest)
- CI Failed: test-frontend (macos-latest)
- CI Failed: test-frontend (ubuntu-latest)
Findings Summary
- Medium: 7 issue(s)
- Low: 5 issue(s)
Generated by Auto Claude PR Review
Findings (12 selected of 12 total)
🟡 [6cd33b955da0] [MEDIUM] Unsanitized error messages written to filesystem pause files
📁 apps/backend/agents/coder.py:880
When a rate limit or auth failure occurs, the raw error message from the API exception (str(e)) is written to pause files on disk (RATE_LIMIT_PAUSE and AUTH_PAUSE) via json.dumps(). The error_info dict is populated in session.py line 672 with 'message': str(e), and this untruncated/unsanitized message is then written to the filesystem in coder.py lines 880 and 921. While the frontend's rate-limit-detector.ts properly sanitizes error output with sanitizeErrorOutput() (truncating to 500 chars) before sending to the renderer, the backend writes the full exception message to disk. API error messages can contain request IDs, internal endpoint URLs, account identifiers, or token fragments that should not be persisted to the filesystem. This is a different code path from the frontend sanitization.
Suggested fix:
Truncate and sanitize the error message before writing to the pause file, similar to the frontend's sanitizeErrorOutput() approach. For example: error_msg = str(error_info.get('message', 'Rate limit reached'))[:500] and use that in the pause_data dict.
🔵 [51add3a8e717] [LOW] Broad string matching in is_authentication_error may cause false classification
📁 apps/backend/agents/session.py:111
The is_authentication_error() function in session.py uses substring matching against the lowercased error string with patterns like '401', 'authentication', 'unauthorized', and 'access denied'. The pattern '401' could match non-HTTP-401 contexts (e.g., an error message mentioning '4010 items processed' or a port number). Similarly, 'authentication' and 'access denied' could appear in legitimate error messages about the agent's work (e.g., an agent trying to access a resource and reporting back). A false positive here would cause the task to pause for 24 hours waiting for re-auth, rather than using standard retry logic. While this is more of a reliability concern than a direct security vulnerability, misclassifying errors could lead to unexpected behavior where tasks pause indefinitely instead of failing properly.
Suggested fix:
Consider more specific patterns that reduce false positives, such as matching HTTP status code patterns (e.g., 'HTTP 401' or 'status 401') or using word boundary matching. Alternatively, check exception type hierarchy first before falling back to string matching.
🟡 [9eeb54d366b4] [MEDIUM] Overly broad authentication error detection risks false positives
📁 apps/backend/agents/session.py:111
The is_authentication_error() function matches on substrings like '401', 'authentication', 'unauthorized', and 'access denied' in the lowercased error string. These patterns are too broad and can match non-auth errors. For example: an error message containing 'Error processing 14012 records' would match '401'. An error about 'access denied' for a file system operation (e.g., 'access denied /tmp/lockfile') would be misclassified as an authentication error. Since this classification triggers a fundamentally different recovery path (pausing for re-auth instead of retrying), false positives would cause tasks to pause indefinitely waiting for re-authentication that isn't needed.
Suggested fix:
Use more specific patterns that combine HTTP status codes with contextual keywords. For example, require '401' to appear with 'status' or 'http' or 'error' nearby, similar to how `is_tool_concurrency_error` requires both '400' AND tool-specific keywords. Consider checking for the error type/class first (e.g., `isinstance(error, HttpError)`) before falling back to string matching.
🟡 [2a62e7a6cc24] [MEDIUM] PR review operations registered with no-op restart function
📁 apps/frontend/src/main/ipc-handlers/github/utils/subprocess-runner.ts:176
When operationRegistration is provided without a restartFn, the subprocess-runner registers the operation with restartFn || (() => false) - a function that always returns false. In pr-handlers.ts (lines 1321-1327), PR review operations are registered without a restartFn, and the comment acknowledges 'PR reviews don't support restart'. This means when UsageMonitor triggers restartOperationsOnProfile(), it will call stopFn (killing the subprocess) but then restartFn returns false, so the operation is killed but never restarted. The user's PR review silently disappears with no feedback.
Suggested fix:
Either (a) don't register PR reviews with the operation registry if they can't be restarted (avoid killing them during swaps), or (b) register them without a `stopFn` so the swap only updates tracking metadata without killing the process, or (c) implement actual restart support for PR reviews.
🔵 [91fd968c07f4] [LOW] OperationRegistryEvents interface defined but not used for type safety
📁 apps/frontend/src/main/claude-profile/operation-registry.ts:69
The OperationRegistryEvents interface is defined (lines 69-74) to document the event contract, but ClaudeOperationRegistry extends raw EventEmitter without type-safe event binding. This means the interface serves as documentation only - callers can emit or listen for any event string without type checking. This is a minor maintainability concern since the event interface could become out of sync with actual usage. | The OperationRegistryEvents interface (lines 69-74) is exported but never used for typing. The ClaudeOperationRegistry class extends EventEmitter directly without leveraging this interface. Searched Grep('OperationRegistryEvents') - found only the definition in this file, no imports or usage elsewhere. This is consistent with the existing codebase pattern (e.g., AgentManagerEvents in types.ts is also defined but not used for EventEmitter typing), so it's not a deviation - but it is dead code. Since the interface provides useful documentation of the event contracts, it's acceptable as-is but could be noted as documentation-only.
Suggested fix:
Consider using a typed EventEmitter pattern (e.g., `class ClaudeOperationRegistry extends (EventEmitter as new () => TypedEmitter<OperationRegistryEvents>)`) to get compile-time safety on event names and argument types.
🟡 [44f96c557efc] [MEDIUM] Error classifiers can misclassify: '401' in rate limit messages matches as auth, '429' patterns overlap
📁 apps/backend/agents/session.py:84
The is_rate_limit_error and is_authentication_error functions use simple substring matching on lowercased error strings, and the classification in run_agent_session (line 632-640) uses priority ordering. However, there's overlap risk:
is_authentication_errorchecks for '401' as a substring. If ANY error message contains '401' (e.g., 'Error at line 401', or a URL containing '401'), it will be classified as authentication.- Both
is_rate_limit_errorandis_authentication_errorcan match the same error. For example, a rate limit error saying 'access denied due to rate limiting' would matchis_authentication_error's 'access denied' pattern. The priority ordering (rate_limit checked before auth at line 635-637) mitigates this for that specific case, but the reverse could happen. is_rate_limit_errorchecks for '429' as a substring. An auth error message referencing HTTP code history like 'failed after 429 retries then 401' would match both.
The classification priority order (concurrency > rate_limit > auth) handles most cases correctly since is_tool_concurrency_error requires both '400' AND specific tool keywords. But the broad substring matching creates fragile classification that could route errors to incorrect recovery logic (pause vs retry). | Similar to the authentication error detection, is_rate_limit_error() matches the substring '429' anywhere in the error string. While less likely to false-positive than '401', any error message coincidentally containing '429' (e.g., 'processed 4290 items') would be misclassified. Since the existing is_tool_concurrency_error function demonstrates a better pattern (requiring BOTH a status code AND contextual keywords), the same approach should be applied here for consistency.
Suggested fix:
Consider making the pattern matching more specific. For example, use regex patterns that match HTTP status codes in context (e.g., `status[_\s]*code[_\s:]*401` or `\b401\b` with word boundaries). The 'access denied' pattern in auth errors is particularly broad and could conflict with rate limit scenarios.
🟡 [9bd5101532d3] [MEDIUM] Rate limit fallback path missing continue - falls through without resetting concurrency state
📁 apps/backend/agents/coder.py:907
When parse_rate_limit_reset_time returns None (can't parse reset time), the code sleeps for AUTO_CONTINUE_DELAY_SECONDS and then falls through without a continue statement. This means it reaches the code after all the elif/else blocks (line 958+) which adds a 'small delay between sessions'. While this does retry (the main while True loop continues), it notably does NOT call _reset_concurrency_state() — unlike the 'other errors' path at line 956. This means if a concurrency error happened before this rate limit, the concurrency tracking state remains polluted. Also, the status_manager is set to BuildState.ERROR but never reset to BuildState.BUILDING before continuing, which may leave the UI in an incorrect 'error' state during the retry.
I read the complete rate limit handler block (lines 833-907) and the 'other errors' block (lines 948-956). The continue pattern is used consistently in all other recovery branches (line 849, 901, 946) but is missing from this specific fallback.
Suggested fix:
Add `_reset_concurrency_state()`, `status_manager.update(state=BuildState.BUILDING)`, and `continue` after the `asyncio.sleep()` call on line 907, consistent with the other recovery paths.
🔵 [f42c4876b1f5] [LOW] getSummary().byProfile returns operation IDs (string[]) but log suggests it's operations
📁 apps/frontend/src/main/claude-profile/usage-monitor.ts:2016
The getSummary() method returns byProfile: Record<string, string[]> where each value is an array of operation ID strings. The code at line 2016 assigns this to operationsOnOldProfile and on line 2029 logs 'operations to restart:' with these string IDs. Then on line 2042, these string IDs are emitted as operationIds in the event.
This is technically correct (it's used only for length check and logging), but the naming operationsOnOldProfile is misleading since it contains string IDs, not RegisteredOperation objects. The actual restart at line 2032 correctly calls restartOperationsOnProfile which internally re-fetches the full operation objects via getOperationsByProfile.
This is cosmetic and doesn't cause incorrect behavior, but could lead to confusion if a future developer tries to access .type or .restartFn on these strings.
Suggested fix:
Rename to `operationIdsOnOldProfile` for clarity, or use `getOperationsByProfile(currentProfileId)` directly for the length check.
🔵 [704dbc3a0d18] [LOW] Redundant if wait_seconds > 0 check after wait_seconds <= 0 guard
📁 apps/backend/agents/coder.py:867
At line 842, there's a guard if wait_seconds <= 0 that handles negative/zero wait_seconds by continuing. At line 851, if wait_seconds > MAX_RATE_LIMIT_WAIT_SECONDS handles excessively long waits by breaking. Then at line 867, there's if wait_seconds > 0 which is always true at this point (since <=0 already continued and >MAX already broke). The entire block under this condition could be unindented by one level since the condition is guaranteed to be true. This doesn't cause incorrect behavior but adds unnecessary nesting and could confuse readers.
Suggested fix:
Remove the `if wait_seconds > 0:` check on line 867 and unindent the block, since at this point wait_seconds is guaranteed to be > 0 and <= MAX_RATE_LIMIT_WAIT_SECONDS.
🟡 [60af75ec861c] [MEDIUM] Dead code kept as underscore-prefixed variable
📁 apps/frontend/src/main/ipc-handlers/project-handlers.ts:144
The settingsPath constant was renamed to _settingsPath instead of being removed. The variable is still computed (calling app.getPath('userData')) but never used. This is dead code that should be removed rather than underscore-prefixed. Similar pattern seen in several other files in this PR: _resolveClaudeCliInvocation (env-handlers.ts:35), _profile (profile-service.ts:93), _context (gitlab/investigation-handlers.ts:133), _output (gitlab/oauth-handlers.ts:251), and _mockProfilesPath (profile-manager.test.ts:49). Underscore-prefixing is appropriate for function parameters that must exist for signature compatibility, but not for standalone variables/functions that can simply be deleted.
Suggested fix:
Remove the dead code entirely rather than prefixing with underscore. For function parameters that need to stay for signature compatibility (like `_identifier`, `_agentManager`, `_projectPath`), the underscore prefix is correct. For standalone variables and functions, remove them.
🟡 [fe0ff62d9490] [MEDIUM] Underscore-prefixed variable still actively written to
📁 apps/frontend/src/main/ipc-handlers/gitlab/investigation-handlers.ts:133
The _context variable is still actively computed and mutated (lines 133-141) - it's assigned from buildIssueContext(), then appended to with comment text. Prefixing with underscore signals 'intentionally unused' but the variable IS used (written to). The issue is that the variable's value is built but never consumed downstream. This should either be fully removed (if the context is truly unused) or the underscore should be removed if it will be needed when the investigation feature is fully implemented.
Suggested fix:
Either remove the entire context-building block if it's truly dead code, or keep it without the underscore prefix and add a TODO comment explaining that it will be used when the full investigation implementation is complete.
🔵 [e42eb051c270] [LOW] Dead variable assignment with underscore prefix
📁 apps/frontend/src/main/services/profile-service.ts:93
The _profile variable is assigned from file.profiles[profileIndex] but never used. The code only uses profileIndex for the splice operation and checks file.activeProfileId === id. The variable assignment should be removed entirely rather than underscore-prefixed.
Suggested fix:
Remove the line `const _profile = file.profiles[profileIndex];` entirely since the value is never consumed.
This review was generated by Auto Claude.
Backend fixes: - Sanitize error messages before writing to pause files (500 char limit) - Use regex word boundaries (\b429\b, \b401\b) for error classification - Add missing _reset_concurrency_state() in rate limit fallback path - Remove redundant wait_seconds > 0 check Frontend fixes: - Remove dead code (_settingsPath, _context, _profile variables) - Fix TypeScript type errors in test files and components - Add null checks for optional task.metadata.prUrl access - Document intentional no-op restart for PR review operations - Rename operationsOnOldProfile to operationIdsOnOldProfile Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| if (!context.taskDescription) { | ||
| console.error('[AgentManager] Cannot restart spec creation: taskDescription is missing'); | ||
| return; | ||
| } | ||
| this.startSpecCreation( | ||
| taskId, | ||
| context.projectPath, | ||
| context.taskDescription!, | ||
| context.taskDescription, | ||
| context.specDir, | ||
| context.metadata, | ||
| context.baseBranch, |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
AndyMik90
left a comment
There was a problem hiding this comment.
🤖 Auto Claude PR Review
Merge Verdict: 🟠 NEEDS REVISION
🟠 Needs revision - 2 issue(s) require attention.
2 issue(s) must be addressed (0 required, 2 recommended), 7 suggestions
Risk Assessment
| Factor | Level | Notes |
|---|---|---|
| Complexity | High | Based on lines changed |
| Security Impact | None | Based on security findings |
| Scope Coherence | Good | Based on structural review |
Findings Summary
- Medium: 2 issue(s)
- Low: 7 issue(s)
Generated by Auto Claude PR Review
Findings (9 selected of 9 total)
🟡 [06bd64bb7a52] [MEDIUM] Inconsistent event emission in updateOperationProfile
📁 apps/frontend/src/main/claude-profile/operation-registry.ts:297
The updateOperationProfile() method updates profile information silently without emitting any event, while similar mutation methods (registerOperation, unregisterOperation, restartOperationsOnProfile) all emit events. This inconsistency could cause tracking issues for consumers that rely on events to monitor profile changes. The restartOperationsOnProfile method does emit 'operation-restarted' when it updates profiles, but direct calls to updateOperationProfile would go unnoticed by event listeners.
Suggested fix:
Emit an 'operation-profile-updated' event after updating the profile: `this.emit('operation-profile-updated', id, oldProfileId, newProfileId);` Also add this event to the OperationRegistryEvents interface documentation.
🔵 [602b1f2ea72e] [LOW] OperationRegistryEvents interface not enforced at type level
📁 apps/frontend/src/main/claude-profile/operation-registry.ts:66
The OperationRegistryEvents interface is documented as being 'for documentation purposes only' and not enforced at the type system level. While the comment explains this limitation, it means consumers of the registry can make typos in event names or use incorrect callback signatures without TypeScript catching these errors.
Suggested fix:
Consider using a typed event emitter pattern (e.g., from 'typed-emitter' package) or adding wrapper methods like `onOperationRegistered(callback: (op: RegisteredOperation) => void)` for type-safe event subscription.
🔵 [22148f46665c] [LOW] Authentication error patterns may have false positive risk
📁 apps/backend/agents/session.py:104
The is_authentication_error() function uses broad string matching patterns like 'authentication', 'unauthorized', and 'access denied' which could match in unintended contexts (e.g., an error message about user permissions that isn't actually an auth error with the API). The patterns are reasonable for robustness but could benefit from more specific matching.
Suggested fix:
Consider adding the HTTP status code 401 check earlier (already done) and perhaps requiring the auth keywords to appear near API/OAuth-related context, or document the known false positive scenarios in comments.
🔵 [86017227cbfd] [LOW] Magic numbers in wait intervals not extracted to constants
📁 apps/backend/agents/coder.py:97
The wait/check intervals in the pause/resume functions use inline magic numbers (30 seconds, 10 seconds, 86400 for 24 hours) that could be extracted to named constants in base.py alongside the other configuration constants for better maintainability and discoverability.
Suggested fix:
Extract these intervals to named constants in `agents/base.py`, e.g., `RATE_LIMIT_CHECK_INTERVAL_SECONDS = 30`, `AUTH_RESUME_CHECK_INTERVAL_SECONDS = 10`, `AUTH_RESUME_MAX_WAIT_SECONDS = 86400`
🔵 [760ff42b2bb1] [LOW] Deprecated module retained without clear migration timeline
📁 apps/frontend/src/main/services/sdk-session-recovery-coordinator.ts:1
The SDKSessionRecoveryCoordinator is marked as deprecated in favor of ClaudeOperationRegistry, but there's no indication of when it will be removed. The deprecation comment suggests 'retained for backward compatibility' but doesn't specify which code still depends on it or when migration will be complete.
Suggested fix:
Add a TODO comment with a target version or PR number for removal, and grep the codebase to identify remaining usages that need migration.
🟡 [ab1b57f12b3d] [MEDIUM] Time parsing may return wrong timestamp for edge case 24-hour inputs with meridiem
📁 apps/backend/agents/coder.py:197
In parse_rate_limit_reset_time(), when parsing 24-hour format times that incorrectly include AM/PM (e.g., 'at 13:00 pm'), the code treats the hour as-is without validating it's in 1-12 range for 12-hour format. While the function won't crash (hour 13 passes validation 0<=13<=23), it produces semantically incorrect times. Example: 'at 13:00 pm' would return 13:00 (1 PM) rather than being rejected as invalid.
Suggested fix:
Add validation that when meridiem is present, hour must be in range 1-12. If hour > 12 with meridiem, return None to fall back to standard retry logic.
🔵 [cd0da400a29d] [LOW] restartOperationsOnProfile updates stale reference after restartFn re-registers operation
📁 apps/frontend/src/main/claude-profile/operation-registry.ts:262
In restartOperationsOnProfile(), after calling restartFn(newProfileId), the code updates op.profileId on line 262. However, for AgentManager tasks, restartFn calls startSpecCreation/startTaskExecution which re-registers the operation with registerOperation(), creating a NEW object in the Map. The update to op.profileId then modifies a stale reference that's no longer in the registry. This is not a functional bug since the newly registered operation already has the correct profile, but it's misleading code.
Suggested fix:
After successful restart, fetch the fresh operation from the Map using this.operations.get(op.id) before updating, or rely on the fact that re-registration already sets the correct profile and remove the update.
🔵 [eefa912ff5ad] [LOW] stopFn in subprocess-runner returns before process actually exits
📁 apps/frontend/src/main/ipc-handlers/github/utils/subprocess-runner.ts:157
The stopFn created in subprocess-runner.ts sends SIGTERM/taskkill but returns immediately without waiting for the process to actually terminate. During restartOperationsOnProfile(), this means restartFn() may be called while the old process is still shutting down. For PR reviews (where restartFn returns false), this is benign. For operations with actual restart logic, this could potentially cause resource conflicts if the new process starts before the old one fully exits.
Suggested fix:
Consider waiting for the 'close' event before returning from stopFn, or document that restartFn implementations should handle potential overlap with the dying process.
🔵 [61c405acd832] [LOW] OperationRegistryEvents interface exported but unused
📁 apps/frontend/src/main/claude-profile/operation-registry.ts:75
The OperationRegistryEvents interface is exported but never imported or used elsewhere in the codebase. The comment explains it's for 'documentation purposes only', which is acceptable but differs from the TypeScript pattern where such interfaces would typically be used with generic EventEmitter wrappers for type safety.
Suggested fix:
Consider using a typed EventEmitter pattern (e.g., extending EventEmitter with TypedEventEmitter<OperationRegistryEvents>) to enforce type safety at compile time, similar to patterns used elsewhere in Node.js codebases. However, the current documentation-only approach is acceptable given EventEmitter's runtime nature.
This review was generated by Auto Claude.
Backend fixes: - Add documentation for auth error pattern false positive risks - Extract magic numbers to named constants in base.py - Add validation for hour range (1-12) when AM/PM is present Frontend fixes: - Add event emission in updateOperationProfile() - Add type-safe event subscription wrapper methods - Add deprecation TODO with v0.5.0 target for recovery coordinator - Add documentation for stopFn async behavior - Update profile after restart using updateOperationProfile() Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds required constants to base.py: - RATE_LIMIT_PAUSE_FILE, AUTH_FAILURE_PAUSE_FILE, RESUME_FILE - MAX_RATE_LIMIT_WAIT_SECONDS - RATE_LIMIT_CHECK_INTERVAL_SECONDS, AUTH_RESUME_CHECK_INTERVAL_SECONDS - AUTH_RESUME_MAX_WAIT_SECONDS These are imported by coder.py for the pause/resume error recovery flow. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| // and this cleanup is for a stale exit event that shouldn't affect the new task | ||
| if (generationAtExit !== undefined && context.generation !== generationAtExit) { | ||
| return; // Stale exit event - task was restarted, don't clean up new context | ||
| } | ||
|
|
||
| // If task completed successfully, always clean up | ||
| if (code === 0) { | ||
| this.taskExecutionContext.delete(taskId); | ||
| // Unregister from OperationRegistry | ||
| getOperationRegistry().unregisterOperation(taskId); |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
AndyMik90
left a comment
There was a problem hiding this comment.
✅ Auto Claude Review - APPROVED
Status: Ready to Merge
Summary: Skipped review: Review already in progress (started 19m ago)
This automated review found no blocking issues. The PR can be safely merged.
Generated by Auto Claude
AndyMik90
left a comment
There was a problem hiding this comment.
🤖 Auto Claude PR Review
Merge Verdict: 🟠 NEEDS REVISION
🟠 Needs revision - 3 issue(s) require attention.
3 issue(s) must be addressed (0 required, 3 recommended), 5 suggestions
Risk Assessment
| Factor | Level | Notes |
|---|---|---|
| Complexity | High | Based on lines changed |
| Security Impact | Low | Based on security findings |
| Scope Coherence | Good | Based on structural review |
Findings Summary
- Medium: 3 issue(s)
- Low: 5 issue(s)
Generated by Auto Claude PR Review
Findings (8 selected of 8 total)
🟡 [3e5c7afdb030] [MEDIUM] Overly broad auth error detection may misclassify non-auth errors
📁 apps/backend/agents/session.py:140
The is_authentication_error() function in session.py matches generic substrings like 'authentication', 'unauthorized', and 'access denied' against the full lowercased error string. Since Claude SDK exceptions may contain user-generated content (e.g., from tool output, file content, or task descriptions), an error message that incidentally contains these words could be misclassified as an authentication error. This would trigger the auth failure pause flow (24-hour wait for re-auth) instead of standard retry, effectively causing a denial-of-service for the task. The code's own docstring acknowledges this risk but the mitigation ('user can simply resume') requires manual intervention. The priority ordering in session.py (line 660-667) means rate_limit is checked before auth, but concurrency errors containing 'access denied' would be correctly prioritized. The concern is with generic 'other' errors that happen to contain these substrings.
Suggested fix:
Consider narrowing patterns to match API-specific error formats (e.g., require 'HTTP 401' or check exception type attributes like `hasattr(e, 'status_code')`) rather than broad substring matching on the full error string. Alternatively, check the exception type first (e.g., `isinstance(e, anthropic.AuthenticationError)`) before falling back to string matching.
🔵 [046fccf9ec46] [LOW] Pause file error messages written to disk include raw API error content
📁 apps/backend/agents/coder.py:883
When a rate limit or auth failure occurs, the backend writes pause files (RATE_LIMIT_PAUSE and AUTH_PAUSE) to the spec directory containing the error message truncated to 500 chars. While truncation is applied, the raw API error message may still contain sensitive information like partial tokens, internal API URLs, or account identifiers within those 500 characters. These files are written to the spec directory which is within the project's .auto-claude/specs/ path (gitignored). The risk is low because: (1) the spec directory is local-only and gitignored, (2) error messages from Claude API typically don't contain secrets, and (3) truncation to 500 chars limits exposure. However, the sanitization is truncation-only with no scrubbing of potentially sensitive patterns.
Suggested fix:
Consider stripping or redacting known sensitive patterns (like bearer tokens or API keys) from error messages before writing to pause files, beyond simple truncation.
🟡 [0c971f055cc8] [MEDIUM] is_authentication_error pattern 'authentication' is overly broad - may misclassify errors
📁 apps/backend/agents/session.py:104
The is_authentication_error() function checks for the substring 'authentication' in the error message. This is extremely broad and will match ANY error that contains the word 'authentication' in its message, including error messages about rate limits that mention authentication context, informational messages, or completely unrelated errors. Because the error classification in session.py (line 660-667) checks is_rate_limit before is_auth, a rate-limit error containing 'authentication' would be correctly classified. However, any 'other' type error containing the word 'authentication' anywhere in its message (e.g., 'Error loading authentication config', 'authentication provider timeout') would be misclassified as an authentication error, triggering the 24-hour pause-and-wait-for-reauth flow instead of the standard retry flow. The docstring acknowledges false positive risks but the mitigation ('user can simply resume') understates the impact: in CLI-only mode the task would block for up to 24 hours waiting for a RESUME file that may never come.
Suggested fix:
Consider narrowing the 'authentication' pattern to 'authentication failed' or 'authentication error', or add a priority check in the coder.py error handler to also verify the HTTP status code from the error_info dict before entering the 24-hour auth pause flow.
🔵 [58caade7e63d] [LOW] stopFn sends SIGTERM but does not await process exit before restart proceeds
📁 apps/frontend/src/main/ipc-handlers/github/utils/subprocess-runner.ts:160
The stopFn created in subprocess-runner.ts sends SIGTERM (or taskkill on Windows) and returns immediately. When restartOperationsOnProfile() in operation-registry.ts calls await op.stopFn() and then immediately calls op.restartFn(), there is a race condition: the old process may still be running when the restart begins. The code comment on line 157-159 acknowledges this: 'This sends SIGTERM and returns immediately without waiting for process exit. RestartFn implementations should handle potential overlap.' However, for subprocess-runner operations, there is no restartFn provided (it defaults to () => false), so the stop function fires but there's no restart to race with. For AgentManager tasks that DO have restartFn, the stopFn calls this.killTask(taskId) which also doesn't await. This means the stop-then-restart sequence could have the new process conflicting with the dying old one.
Suggested fix:
For subprocess-runner, the current behavior is acceptable since restartFn returns false. For AgentManager operations, the `restartTask` method already has a 500ms setTimeout delay before restarting (line 528 in agent-manager.ts), which mitigates the race. Consider documenting this timing dependency more explicitly in the OperationRegistry interface.
🔵 [88402759210b] [LOW] wait_for_rate_limit_reset uses elapsed time tracking with asyncio.sleep which can drift
📁 apps/backend/agents/coder.py:89
The wait_for_rate_limit_reset function tracks elapsed time by summing sleep durations (elapsed += sleep_time). Due to asyncio scheduling latency, the actual wall-clock time spent waiting may exceed the nominal sleep_time values, causing the function to potentially wait significantly longer than wait_seconds. For example, with a 30-second check interval and event loop contention, actual elapsed time could drift by seconds per iteration. Over a 2-hour rate limit wait, this could result in waiting noticeably longer than intended. The same issue exists in wait_for_auth_resume (lines 128-165).
Suggested fix:
Track a `start_time = asyncio.get_event_loop().time()` at the beginning and compute `elapsed = loop.time() - start_time` after each sleep, instead of summing sleep durations. This ensures wall-clock accuracy.
🔵 [25cded71159e] [LOW] Duplicate OperationRegistry registration pattern between startSpecCreation and startTaskExecution
📁 apps/frontend/src/main/agent/agent-manager.ts:198
The OperationRegistry registration code is duplicated nearly identically between startSpecCreation (lines 198-218) and startTaskExecution (lines 296-323). Both blocks: (1) get the active profile, (2) call assignProfileToTask, (3) register with operationRegistry.registerOperation with the same parameter structure, (4) log the same message. The only differences are the operation type string ('spec-creation' vs 'task-execution') and metadata fields. This is a PR-internal duplication that should be extracted to a shared helper.
Suggested fix:
Extract a private helper method like `registerWithOperationRegistry(taskId, operationType, profileManager, metadata)` that encapsulates the registration + logging pattern, and call it from both methods.
🔵 [69fba7de0cd4] [LOW] parse_rate_limit_reset_time uses datetime.now() (local time) which may cause incorrect wait durations
📁 apps/backend/agents/coder.py:168
The parse_rate_limit_reset_time function uses datetime.now() (local time, not UTC) to compute reset timestamps. On line 195, it computes the absolute timestamp via (datetime.now() + delta).timestamp(). Then in the caller (line 845), it computes wait_seconds as reset_timestamp - datetime.now().timestamp(). While this is internally consistent (both use local time), if the rate limit API response specifies a time in UTC or a different timezone, the parsed 'at HH:MM' time (lines 198-227) would be interpreted in the server's local timezone rather than the timezone intended by the API. For the 'in X minutes' pattern this is fine (it's relative), but for the 'at HH:MM' pattern, timezone mismatches could result in waiting too long or not long enough.
Suggested fix:
Consider checking whether the Claude API rate limit response includes timezone info. If the time is UTC-based, use `datetime.now(timezone.utc)` instead. Add a comment documenting the timezone assumption. The relative 'in X minutes' pattern is timezone-safe and doesn't need changes.
🟡 [558c4e945131] [MEDIUM] ExecutionProgressData.phase duplicates and diverges from shared ExecutionPhase type
📁 apps/frontend/src/main/agent/types.ts:21
The ExecutionProgressData.phase type is a hardcoded string union that does NOT include the new pause phases (rate_limit_paused, auth_failure_paused). Meanwhile, the shared ExecutionPhase type in shared/constants/phase-protocol.ts (re-exported via shared/types/task.ts) includes all phases. The sibling type ExecutionProgress in shared/types/task.ts:25-26 correctly uses ExecutionPhase. This forces unsafe as casts in agent-events.ts:33 (structuredEvent.phase as ExecutionProgressData['phase']) to bypass TypeScript type checking, defeating the purpose of having typed phases. The codebase pattern is to derive phase types from the single source of truth in phase-protocol.ts. Searched Grep('ExecutionProgressData', 'src/') and Grep('ExecutionProgress', 'src/shared/types/task.ts') - confirmed the shared type uses ExecutionPhase properly while this internal type diverges.
Suggested fix:
Replace the hardcoded union with the shared type: `import type { ExecutionPhase } from '../../shared/constants/phase-protocol'; ... phase: ExecutionPhase;` and remove the `as` casts in agent-events.ts lines 33.
This review was generated by Auto Claude.
Backend fixes: - Narrow auth error detection patterns (use 'authentication failed/error') - Add sanitize_error_message() to redact API keys/tokens in pause files - Fix elapsed time drift using event loop time instead of cumulative sleep - Document timezone assumptions in parse_rate_limit_reset_time() Frontend fixes: - Document stopFn timing dependencies for subprocess-runner - Extract registerTaskWithOperationRegistry() helper to reduce duplication - Use shared ExecutionPhase type in ExecutionProgressData - Remove unnecessary type assertions in agent-events.ts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
AndyMik90
left a comment
There was a problem hiding this comment.
🤖 Auto Claude PR Review
🟠 Follow-up Review: Needs Revision
🟠 Needs revision - 1 unresolved finding(s) from previous review.
Resolution Status
- ✅ Resolved: 7 previous findings addressed
- ❌ Unresolved: 1 previous findings remain
- 🆕 New Issues: 4 new findings in recent changes
Finding Validation
- 🔍 Dismissed as False Positives: 3 findings were re-investigated and found to be incorrect
- ✓ Confirmed Valid: 4 findings verified as genuine issues
- 👤 Needs Human Review: 2 findings require manual verification
🚨 Blocking Issues
- security: Unsanitized error message in session.py task logger and error_info
- security: sanitize_error_message regex fails to match Anthropic API key format
Verdict
CI Status: 1 CI check pending — cannot be READY_TO_MERGE regardless of code quality.
Previous Findings: Excellent progress — 7 of 8 findings resolved, 1 partially resolved with adequate documentation-based mitigation.
New Issues Requiring Attention:
- [HIGH] NEW-004: The
sanitize_error_message()regex for API keys (sk-[a-zA-Z0-9]{20,}) completely fails to match Anthropic-format keys likesk-ant-api03-...because dashes are not in the character class. This undermines the security fix for finding 046fccf9ec46 — keys written to pause files would still be exposed. Fix: include dashes in the character class like the Bearer token pattern already does. - [MEDIUM] NEW-001: Raw
str(e)is logged to task_logger and returned in error_info in session.py without sanitization, despite the newsanitize_error_message()function existing in coder.py. This is a gap in the sanitization coverage.
Low Severity (suggestions, don't block merge):
- NEW-007: Double killTask during profile swap (redundant but harmless)
- NEW-008: asyncio.get_event_loop() → get_running_loop() (future-proofing)
Needs Human Review:
- 58caade7e63d: stopFn race condition — mitigated by convention but fragile API contract
- NEW-006: Failed restart leaves operations in zombie state — design decision
Both HIGH and MEDIUM findings block merge per quality gates. The HIGH finding (NEW-004) is particularly important as it directly undermines the security improvement from the previous review cycle.
Review Process
Agents invoked: resolution-verifier, new-code-reviewer, finding-validator
This is an AI-generated follow-up review using parallel specialist analysis with finding validation.
Findings (4 selected of 4 total)
🟡 [NEW-001] [MEDIUM] Unsanitized error message in session.py task logger and error_info
📁 apps/backend/agents/session.py:698
Raw exception message str(e) is logged via task_logger.log_error() at line 698 and stored in error_info['message'] at line 702 without sanitization. The sanitize_error_message() function exists in coder.py and is used there for pause files, but is never imported or called in session.py. API error responses could contain sensitive data like tokens or partial credentials that would be persisted to task logs on disk.
Suggested fix:
Import sanitize_error_message from coder.py (or move it to a shared module to avoid circular imports) and apply it: sanitized = sanitize_error_message(str(e)); then use sanitized in task_logger.log_error() and error_info['message'].
🟠 [NEW-004] [HIGH] sanitize_error_message regex fails to match Anthropic API key format
📁 apps/backend/agents/coder.py:105
The regex pattern sk-[a-zA-Z0-9]{20,} does not include dashes in the character class. Anthropic API keys follow the format 'sk-ant-api03-...' with dashes in the key body. The regex requires 20+ consecutive alphanumeric chars after 'sk-', but Anthropic keys have dashes after only 3 chars ('ant'), so the regex completely fails to match and the full API key is left exposed. The Bearer token regex on line 113 already uses the correct character class [a-zA-Z0-9._-].
Suggested fix:
Update the regex to include dashes and underscores: re.sub(r"\bsk-[a-zA-Z0-9._\-]{20,}", "[REDACTED_API_KEY]", error_message). Apply the same fix to the key- pattern on line 108.
🔵 [NEW-007] [LOW] Double killTask during OperationRegistry profile swap restart
📁 apps/frontend/src/main/agent/agent-manager.ts:134
AgentManager registers stopFn: () => { this.killTask(taskId) } with OperationRegistry, but restartTask (the restartFn) also calls this.killTask(taskId) internally at line 526. During a profile swap, OperationRegistry calls stopFn (first kill) then restartFn (second kill via restartTask). The second kill is redundant since the process is already terminated.
Suggested fix:
Remove the stopFn from the AgentManager registration since restartTask already handles stopping internally. Change line 134 to not pass stopFn, or pass a no-op stopFn.
🔵 [NEW-008] [LOW] asyncio.get_event_loop() should be get_running_loop() in async functions
📁 apps/backend/agents/coder.py:149
Both wait_for_rate_limit_reset() (line 149) and wait_for_auth_resume() (line 194) use asyncio.get_event_loop() inside async functions. Since Python 3.10, asyncio.get_running_loop() is the recommended API for async contexts. While get_event_loop() works correctly here (there IS a running loop), it's deprecated for this use case and may behave unexpectedly in future Python versions.
Suggested fix:
Replace asyncio.get_event_loop() with asyncio.get_running_loop() in both functions.
This review was generated by Auto Claude.
HIGH: Fix API key regex to match Anthropic format (sk-ant-api03-...) - Updated regex from [a-zA-Z0-9] to [a-zA-Z0-9._\-] to match dashes - Applied same fix to key- pattern MEDIUM: Sanitize error messages in session.py - Moved sanitize_error_message() to base.py (shared module) - Import and use in session.py for task_logger and error_info - Prevents sensitive data in error logs LOW: Remove redundant stopFn in agent-manager.ts - restartTask() already calls killTask() internally - Removed double-kill during profile swaps LOW: Use asyncio.get_running_loop() instead of get_event_loop() - Future-proofing for Python 3.10+ deprecation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| if (context.swapCount >= 2) { | ||
| this.taskExecutionContext.delete(taskId); | ||
| // Unregister from OperationRegistry | ||
| getOperationRegistry().unregisterOperation(taskId); | ||
| } | ||
| // Otherwise keep context for potential restart | ||
| }, 1000); // Delay to allow restart logic to run first |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
…nctions - Extract _check_and_clear_resume_file() helper to reduce code duplication - Add debug logging for OSError exceptions with file path context - Add max(0, elapsed) to ensure non-negative elapsed time values - Add comprehensive JSDoc for operation reference stability - Add hasOperation() helper method for reference validation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolved conflicts in: - apps/backend/agents/base.py: Keep auth-swapping constants and sanitize function - apps/frontend/.../AuthStatusIndicator.tsx: Remove dead code (OAuth account section) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
|
||
| // Check if the context's generation matches - if not, a restart incremented it | ||
| // and this cleanup is for a stale exit event that shouldn't affect the new task | ||
| if (generationAtExit !== undefined && context.generation !== generationAtExit) { |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
AndyMik90
left a comment
There was a problem hiding this comment.
🤖 Auto Claude PR Review
Merge Verdict: 🟠 NEEDS REVISION
🟠 Needs revision - 2 issue(s) require attention.
2 issue(s) must be addressed (0 required, 2 recommended), 1 suggestions
Risk Assessment
| Factor | Level | Notes |
|---|---|---|
| Complexity | High | Based on lines changed |
| Security Impact | Low | Based on security findings |
| Scope Coherence | Good | Based on structural review |
Findings Summary
- Medium: 2 issue(s)
- Low: 1 issue(s)
Generated by Auto Claude PR Review
Findings (3 selected of 3 total)
🟡 [327455bc1a17] [MEDIUM] Unsanitized error messages printed to stdout before sanitization
📁 apps/backend/agents/session.py:686
In session.py, raw exception messages (which may contain API keys, tokens, or other secrets from HTTP error responses) are printed to stdout via print() BEFORE the sanitize_error_message() function is called. Lines 686, 690, 694 print str(e)[:200] (truncated but unsanitized), and line 696 prints the full unsanitized error. Since this is a subprocess whose stdout is captured and parsed by the frontend (agent-process.ts), these unsanitized error messages could be forwarded to the renderer or stored in logs. The sanitization at line 699 only applies to the return value and task logger, not to stdout. This partially undermines the sanitization effort introduced in this PR.
Suggested fix:
Apply sanitize_error_message() to the error string BEFORE printing to stdout. Move the `sanitized_error = sanitize_error_message(str(e))` line before the print statements and use `sanitized_error` in all print() calls instead of `str(e)`. For the 'else' branch at line 696, also use the sanitized version.
🔵 [f84f2277b011] [LOW] restartOperationsOnProfile iterates snapshot that may be mutated during restart
📁 apps/frontend/src/main/claude-profile/operation-registry.ts:335
The restartOperationsOnProfile method calls getOperationsByProfile() on line 335, which creates a new array of operation references. It then iterates this array and calls op.restartFn() for each operation. For AgentManager tasks, restartFn calls this.restartTask() which internally calls startSpecCreation or startTaskExecution, which in turn calls registerTaskWithOperationRegistry() — this re-registers the operation in the Map with a NEW RegisteredOperation object (new startedAt, new restartFn closure).
The issue: after restartFn executes and re-registers the operation, this.updateOperationProfile(op.id, ...) on line 369 fetches the NEW object from the Map and updates it. This works correctly. However, the op variable in the loop still references the OLD (stale) object. The comment on line 365-367 acknowledges this ('updates the stale reference'), but the op.type logged on line 373 reads from the stale reference.
More critically: if restartFn for operation A fails and the operation remains with the old profileId, a subsequent operation B's restartFn could theoretically trigger operations on the same profile again. While the array snapshot prevents infinite iteration, the race between stopFn (async SIGTERM) and restartFn (500ms setTimeout delay in AgentManager) is timing-dependent.
Practical impact is LOW because: (1) the snapshot array prevents Map mutation from affecting iteration, (2) AgentManager uses a 500ms setTimeout for restart, and (3) the stale reference reading is only used for logging. This is more of a latent correctness concern than an active bug.
Suggested fix:
Consider re-fetching the operation from the Map after restart to ensure consistent state: `const currentOp = this.operations.get(op.id);` and use `currentOp` for subsequent operations and logging.
🟡 [4ad45738514a] [MEDIUM] RESUME file path may not match backend's spec_dir for worktree-based tasks
📁 apps/frontend/src/main/ipc-handlers/task/execution-handlers.ts:817
The TASK_RESUME_PAUSED handler constructs the RESUME file path using the task's specsPath or falls back to path.join(project.path, specsBaseDir, task.specId). The backend's wait_for_rate_limit_reset() and wait_for_auth_resume() look for the RESUME file at spec_dir / RESUME_FILE, where spec_dir is the spec directory passed to run_autonomous_agent().
For worktree-based tasks, the backend runs inside the worktree and spec_dir points to the worktree's spec directory. The frontend handler writes the RESUME file to both the main spec dir (line 826) and the worktree spec dir (lines 837-842). This dual-write approach should cover both cases.
However, there's a subtle edge case: if findTaskWorktree() returns null (worktree not found or not yet created), the RESUME file is only written to the main project's spec directory. If the backend is running inside a worktree, it would be looking for the RESUME file in the worktree's spec directory, not the main one. The worktree write failure is logged as non-fatal (line 844-845), but if the worktree path detection fails entirely, the backend would never see the RESUME file.
Practical impact is MEDIUM: The worktree should always exist when a task is running (it's created before the build starts), but edge cases like worktree deletion during a pause could leave the backend stuck waiting.
Suggested fix:
Consider logging a warning when `findTaskWorktree()` returns null for a task that is known to be in a paused state (rate_limit_paused or auth_failure_paused), since the backend is presumably running in that worktree. Alternatively, have the backend also check the main spec directory as a fallback location for the RESUME file.
This review was generated by Auto Claude.
- Sanitize error messages before printing to stdout (session.py) - Re-fetch operation from Map after restart for consistent state (operation-registry.ts) - Add fallback RESUME file check in main project spec dir for worktree tasks (coder.py) - Warn when worktree not found for paused task resume (execution-handlers.ts) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| resumed_early = await wait_for_rate_limit_reset( | ||
| spec_dir, wait_seconds, source_spec_dir | ||
| ) |
There was a problem hiding this comment.
Bug: The _handle_build_interrupt function fails to pass source_spec_dir when resuming an interrupted build in a worktree, preventing early resume from rate-limit or auth pauses.
Severity: MEDIUM
Suggested Fix
In apps/backend/cli/build_commands.py, pass the source_spec_dir parameter to the run_autonomous_agent() call within the _handle_build_interrupt function. This value can be extracted from the worktree_manager object.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: apps/backend/agents/coder.py#L973-L975
Potential issue: In `_handle_build_interrupt`, when a user interrupts a build during a
rate-limit or auth-failure pause in a worktree and chooses to resume, the subsequent
call to `run_autonomous_agent` omits the `source_spec_dir` parameter. This prevents the
agent from detecting the `RESUME` file in the main project's spec directory.
Consequently, the agent fails to resume early and instead waits for the full timeout
period, which can be minutes for rate limits or hours for authentication failures,
degrading the user experience.
Summary
Implements a unified system for tracking ALL Claude Agent SDK operations across the application, enabling intelligent auth/rate limit recovery with proactive account swapping.
Key improvements:
Motivation
When OAuth tokens are refreshed, old tokens are immediately revoked. Running agent processes retain old tokens in environment variables and cannot receive new tokens. This caused:
Changes
Backend (
apps/backend/)agents/session.py: Addis_rate_limit_error()andis_authentication_error()detectionagents/coder.py: Implement intelligent pause/resume logicwait_for_rate_limit_reset()with periodic checks for resume/cancelwait_for_auth_resume()with 24-hour max timeoutwait_secondsedge casecore/phase_event.py: AddRATE_LIMIT_PAUSEDandAUTH_FAILURE_PAUSEDphasesagents/base.py: Add pause file constantsFrontend (
apps/frontend/)operation-registry.ts: Singleton that tracks ALL Claude SDK operationsusage-monitor.ts: Now uses OperationRegistry instead of AgentManageragent-manager.ts: Registers operations with the registrysubprocess-runner.ts: Optional operation registration for subprocessespr-handlers.ts: Registers PR reviews with operation registryAccountSettings.tsx: AddautoSwitchOnAuthFailuretoggleWorktreeSelector.tsx: Fix infinite loop bug in dropdown (pre-existing)Tests
operation-registry.test.ts: 39 comprehensive unit teststest_phase_event.py: Updated for new phasesTesting
Architecture
Scenarios Handled