@ssrihari ssrihari commented Sep 23, 2025

Implement a simple workflow that mimics a typical SDLC process for breaking down problems and iteratively improving the resulting units of work.

Workflow

Phase 1: Codebase Context

  • Retrieves repository file structure
  • Generates questions about the codebase based on PRD, tech spec, and repo
    structure
  • Queries the repository to answer those questions
  • Stores the context for later use

Phase 2: Story Generation (Iterative)

  • Generates initial list of user story titles based on PRD, tech spec, and
    codebase context
  • Shows story titles to user for approval
  • If rejected: takes user comments and regenerates stories (loops until approved)
  • If approved: moves to next phase

Phase 3: Story Detailing (Per Story, Iterative)

  • For each story in the approved list:
    • Step 3a: Defines acceptance criteria for the story
    • Step 3b: Enriches the story with additional context from PRD, tech spec, and
      repo context
    • Shows the detailed story (title + ACs + context) to user
    • If rejected: takes comments and revises both ACs and context (loops until
      approved)
    • If approved: saves the detailed story and moves to next one

Phase 4: Final Output

  • Prints complete list of all stories with their acceptance criteria and enriched
    context
  • Returns the final story list

Key Characteristics:

  • Two-level iteration: story list approval, then individual story detailing
  • Combines PRD, tech spec, and live codebase context throughout
  • User has veto power at both the story list level and individual story level
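
The two-level loop above can be sketched as plain Python. All names here are illustrative stand-ins for the PR's actual activities (`generate_titles`, `detail_story`, and `get_feedback` are injected callables, not functions from this codebase):

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    title: str
    acceptance_criteria: list[str] = field(default_factory=list)
    context: str = ""

def run_workflow(generate_titles, detail_story, get_feedback) -> list[Story]:
    """Two-level approval loop: story list first, then each story."""
    # Level 1: regenerate the title list until the user approves it.
    comments = None
    while True:
        titles = generate_titles(comments)
        approved, comments = get_feedback(("titles", titles))
        if approved:
            break
    # Level 2: detail each story, looping on per-story feedback.
    stories: list[Story] = []
    for title in titles:
        comments = None
        while True:
            story = detail_story(title, comments)
            approved, comments = get_feedback(("story", story))
            if approved:
                stories.append(story)
                break
    return stories
```

`get_feedback` returns an `(approved, comments)` pair, so rejection comments flow back into the next generation attempt at either level.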

TODOs

  • Find a better hook for ruff check fix (perhaps pre-push / pre-commit)
  • Clean up duplicated request creation in follow up calls
  • Figure out confusion when there's mixed feedback for ACs and enriched context
  • Refactor logging to use module name instead of manual function/file context

Notes

  • Moving models to a separate file so they can be used in any higher-level code
  • Separate calling AI from domain logic
  • Extract tool call ritual - making another request with function call and response makes the response complete and allows continuation
  • Using OpenAI's conversations API to manage state with global state assumption (only 1 ongoing conversation per CLI run)
  • Structured logging pollutes code a bit right now, will tweak as needed with progress
  • story_machine.py has been superseded by current implementation
  • New implementations don't have tests yet
  • Currently using Claude Code SDK for repo questions, but can switch to something else in the future (preferably something that doesn't need local codebase)
  • Claude Code SDK is isolated in ai.py for easy implementation swap
  • CC CLI needs to be installed on user machine for this to work
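
The tool-call ritual mentioned above can be sketched against the OpenAI Responses API: after the model emits a function call, the follow-up request echoes the call together with a `function_call_output` item. The item shapes below follow the Responses API as documented; exact field names may differ between SDK versions:

```python
def build_tool_followup_input(function_call: dict, output: str = "") -> list[dict]:
    """Build the input items that complete a pending tool call.

    The tool has no real implementation here, so the output defaults to an
    empty string -- the useful payload already sits in the call's JSON
    arguments.
    """
    return [
        function_call,  # echo the model's function call back verbatim
        {
            "type": "function_call_output",
            "call_id": function_call["call_id"],
            "output": output,
        },
    ]
```

A follow-up might then be sent with something like `client.responses.create(model=..., previous_response_id=response.id, input=build_tool_followup_input(call))`; skipping this step leaves the response incomplete and the conversation can't be continued.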

- Moving them to another file so they can be used in any higher-level code
- Separate calling AI from domain logic
- Simplify parsing the response using a list comprehension
- Extract the tool call; might reuse it for other calls
- CLI calls into workflow
- Removing the older flow temporarily, not generating files, no target dir
- Extracting reasoning effort into env, and refactoring settings to
  have the file name specified inside config.py itself
- Extract spinner as an activity so it can be used only around parts
  of the workflow
- Rename FeedbackResponse and status appropriately
- Make another request to the Responses API with both the function call
  and its response.
- Not doing this leaves the response incomplete, and we can't continue it.
- Note that the actual tool call output is just an empty string, because
  there is no real tool; the payload is carried in the tool call's JSON arguments.
- This was removing imports
- Need to find a better hook for this, perhaps pre-push / pre-commit
- Use OpenAI's conversations API to manage state
- Use global state with the assumption that there will only ever be 1
  ongoing conversation per CLI run
- Only prompt to use the tool, nothing else.
- Don't enforce the story structure too much, be a bit flexible
- Add basic t-shirt size estimates
- Add a note about ordering stories
- Add a note about cutting pieces for CRUD tasks
- Add prompt, activity and extension to workflow
- Aware that this pollutes the code a bit right now
- Adding function and file context for debuggability, should ideally
  just use module name. Will refactor this later.
- This is an init commit, will tweak logging as needed with progress
- Refactor ai.py to output reasoning, and then send it back
- Refactor it to use types defined in the python library
- And extract the response parsing into a function
- Some of the request creation is duplicated in the follow-up. Want to
  clean that up sometime.
- Remove story_machine.py. It has been superseded by the current
  implementation.
- cli.py only has the basic CLI setup. The workflow is extracted into its
  own module.
- New implementations don't have tests yet.
- Add these details along with ACs, and then ask the user for
  feedback.
- Use the same prompt when iterating too. This makes it confusing when
  there's mixed feedback for ACs and enriched context. Need to figure
  this out.
- Given PRD and tech spec, get a list of relevant questions to ask
  about the codebase
- Reuse the existing call_openai function, just without tool calls
- Temporarily breaking the workflow until the next step of getting the
  codebase context is implemented
- Given the repo and questions, get answers. Currently using the Claude
  Code SDK, but we can switch to something else in the future, preferably
  something that doesn't need a local codebase. Note that the SDK is a
  wrapper over the CC CLI, which needs to be installed on the user's
  machine for this to work.

- Isolate the Claude Code SDK integration into ai.py. We should be able to
  swap out the implementation easily enough.

- Set up config and environment for the Claude Code SDK.

- Add CLI input for repo path, add to workflow input
- Continue the workflow with the added context of what's in the repo
@ssrihari ssrihari marked this pull request as ready for review October 2, 2025 20:05

claude bot commented Oct 2, 2025

Pull Request Review: Workflows

Overview

This PR implements a workflow system for breaking down problems into user stories with an iterative improvement process. The implementation is substantial (1179 additions, 752 deletions) and introduces several new architectural patterns.

🎯 Code Quality & Best Practices

✅ Strengths

  1. Good separation of concerns: activities.py, ai.py, types.py, and workflow.py provide clear module boundaries
  2. Type hints present: Functions have type annotations as per CLAUDE.md requirements
  3. Structured logging: Using structlog for better observability
  4. Configuration management: Proper use of Pydantic settings

⚠️ Issues Found

Critical Issues

  1. Missing Type Hints in Several Functions

    • ai.py:102: getattr calls without proper type narrowing
    • activities.py:109: Missing return type annotation for parse_text_from_response
  2. Inconsistent Error Handling

    • ai.py:267: Raises generic ValueError instead of specific exception
    • Missing try-except blocks around file I/O operations in cli.py:33-35
    • No error handling for OpenAI API failures in call_openai_api function
  3. Line Length Violations

    • activities.py:177: Line exceeds 120 chars (per CLAUDE.md)
    • Multiple lines in ai.py exceed the limit

Code Quality Issues

  1. Global State Management (ai.py:20)

    conversation_id: Optional[str] = None

    Global mutable state is problematic for testing and concurrent usage. Consider using a context manager or dependency injection pattern.

  2. Exception Handling (ai.py:267)

    raise ValueError("ANTHROPIC_API_KEY not found in .env file")

    Per CLAUDE.md guidelines, should catch specific exceptions. Consider creating custom exception types:

    class ConfigurationError(Exception):
        pass
    
    if not settings.anthropic_api_key:
        raise ConfigurationError("ANTHROPIC_API_KEY not found")
  3. Dynamic Attribute Setting (ai.py:97-100)

    setattr(response, "_reasoning_items", reasoning_items)
    setattr(response, "_function_calls", function_calls)

    This bypasses type safety. Consider wrapping the Response object in a domain-specific class:

    @dataclass
    class ParsedResponse:
        response: Response
        reasoning_items: List[ResponseReasoningItem]
        function_calls: List[ResponseFunctionToolCall]
  4. Spinner Implementation (activities.py:23-48)

    • Uses daemon threads which can cause issues on shutdown
    • The spinner writes to stderr without checking if it's a TTY
    • Consider using a library like rich or checking sys.stderr.isatty()
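
A spinner addressing both points (TTY check, no abandoned daemon thread) might look like this sketch; it is illustrative, not the PR's activities.py implementation:

```python
import itertools
import sys
import threading
import time
from contextlib import contextmanager

@contextmanager
def spinner(message: str = "working"):
    """Show a spinner on stderr while the body runs.

    Skips the animation entirely when stderr is not a TTY (pipes, CI),
    and joins its worker thread on exit instead of abandoning a daemon.
    """
    if not sys.stderr.isatty():
        yield
        return
    stop = threading.Event()

    def spin() -> None:
        for ch in itertools.cycle("|/-\\"):
            if stop.is_set():
                break
            sys.stderr.write(f"\r{message} {ch}")
            sys.stderr.flush()
            time.sleep(0.1)
        # Clear the spinner line before handing stderr back.
        sys.stderr.write("\r" + " " * (len(message) + 2) + "\r")
        sys.stderr.flush()

    t = threading.Thread(target=spin)  # non-daemon: we join it explicitly
    t.start()
    try:
        yield
    finally:
        stop.set()
        t.join()  # wait for the final write before returning
```

Because the thread is joined in `finally`, the cleanup write always completes before the caller resumes, which avoids the race noted below.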

🐛 Potential Bugs

  1. Race Condition in Spinner (activities.py:40-47)
    The stop.set() and t.join() sequence could have timing issues. The thread might not have finished writing when the cleanup starts.

  2. Unclosed Resources (ai.py:149)
    The OpenAI client is created but never explicitly closed. While Python's garbage collector will handle this, it's better practice to use context managers:

    with OpenAI(api_key=settings.openai_api_key) as client:
        # use client
  3. File Path Handling (workflow.py:33-35)

    prd_content = Path(prd_path).read_text()

    No validation that the file exists or is readable. Should handle FileNotFoundError, PermissionError, etc.

  4. Empty Response Handling (activities.py:200)

    return updated_stories[0] if updated_stories else story

    Silently returns the original story if API fails. Should log this case or raise an error.

🚀 Performance Considerations

  1. Synchronous API Calls in Loop (workflow.py:71-78)
    The workflow processes stories sequentially. Since you mentioned wanting parallelization in the TODO doc, consider using asyncio.gather() for concurrent story processing:

    enriched_stories = await asyncio.gather(
        *[enrich_story_async(story) for story in stories]
    )
  2. Repeated Settings Initialization
    Settings() is called in multiple functions. Consider injecting it once at the top level.

  3. Large Prompt Loading
    get_prompt() reads files synchronously every time. Consider caching prompt templates.

🔒 Security Concerns

  1. API Key Exposure

    • Keys are properly loaded from .env
    • Environment variable is set in ai.py:270 which persists for the process lifetime - consider more scoped approach
  2. Input Validation

    • No validation of user inputs in get_human_input()
    • File paths from CLI args aren't validated before use
    • Consider sanitizing repo paths before passing to Claude SDK
  3. Prompt Injection Risk
    User comments are directly interpolated into prompts without sanitization. While OpenAI has protections, consider adding input validation.

🧪 Test Coverage

Critical Gap

No tests for the new implementation (as noted in PR description). This is a significant concern given the complexity:

  • activities.py: 284 lines, 0 tests
  • ai.py: 304 lines, 0 tests
  • workflow.py: 100+ lines, 0 tests

Required Tests (per CLAUDE.md)

  1. Unit tests for each activity function
  2. Mock tests for OpenAI/Anthropic API calls
  3. Integration tests for the workflow
  4. Error case tests (API failures, invalid inputs, etc.)

Suggest using pytest with fixtures:

@pytest.fixture
def mock_openai_client(monkeypatch):
    # Mock OpenAI responses
    pass

@pytest.mark.anyio  # As per CLAUDE.md for async tests
async def test_get_codebase_context(mock_openai_client):
    # Test implementation
    pass

📝 Documentation Issues

  1. Missing Docstrings

    • workflow.py:18: w1() function lacks docstring
    • Several helper functions in activities.py need docs
  2. Incomplete Type Documentation

    • types.py: Enums and classes need docstrings explaining when to use each state
  3. README Update
    README was updated but doesn't explain the new workflow concept or show example usage

✨ Recommendations

High Priority

  1. Add comprehensive test suite - This should block merging
  2. Fix error handling - Add proper exception handling around all I/O and API calls
  3. Remove global state - Refactor conversation_id to use dependency injection
  4. Run formatters:
    uv run --frozen ruff format .
    uv run --frozen ruff check . --fix
    uv run --frozen pyright

Medium Priority

  1. Add input validation for file paths and user inputs
  2. Implement proper resource cleanup for API clients
  3. Add docstrings to all public functions
  4. Consider extracting the spinner to a separate utility module

Low Priority

  1. Refactor dynamic attributes to use proper domain classes
  2. Add prompt template caching for performance
  3. Consider using rich library for better CLI formatting

🎨 Style Notes

From CLAUDE.md:

  • ✅ Using uv correctly
  • ⚠️ Some lines exceed 120 chars
  • ⚠️ Need to run uv run --frozen ruff format .
  • ❌ Missing pytest configuration for anyio

📋 Before Merging Checklist

  • Add test suite with >80% coverage
  • Run and fix all formatting/linting issues
  • Add error handling for all I/O and API operations
  • Add docstrings to public APIs
  • Validate all file paths before use
  • Refactor global state
  • Update documentation with workflow examples
  • Address TODOs in PR description (especially the ruff check hook)

💭 Final Thoughts

This is a solid architectural foundation with good separation of concerns. The main concerns are:

  1. Lack of tests (blocking issue)
  2. Error handling gaps (important for production use)
  3. Global state management (affects testability)

The workflow concept is well-designed and the code is generally readable. With tests and improved error handling, this will be a strong addition to the codebase.

Great work on the structured approach! Looking forward to seeing the tests and fixes. 🚀


claude bot commented Oct 3, 2025

Code Review: Workflows PR

Overview

This PR introduces a significant refactor implementing a workflow-based approach. Changes: 1691 additions, 756 deletions.

Critical Issues

1. Global Mutable State - src/storymachine/ai.py:20

Global conversation_id breaks functional programming principles and causes issues with concurrent execution and testing.
Recommendation: Pass conversation_id as part of WorkflowInput or create a ConversationContext class.

2. Missing Test Coverage

Core workflow logic has zero test coverage, violating project guidelines.
Recommendation: Add unit and integration tests before merging.

3. Resource Leak - src/storymachine/logging.py:23

File handle never closed, causing descriptor leaks.
Recommendation: Use proper file rotation setup.

4. Unsafe Dynamic Attributes - src/storymachine/ai.py:100-103

Using setattr on OpenAI response objects violates immutability contract.
Recommendation: Create wrapper class instead.

Security Concerns

Environment Variable Mutation - src/storymachine/ai.py:276

Modifying os.environ at runtime can leak credentials and create race conditions.
Recommendation: Pass API keys directly to SDK.

API Key Validation Missing

No validation or redaction in logs.
Recommendation: Add field validators.

Other Issues

  • Missing error handling in answer_repo_questions
  • Broad exception handling instead of specific types
  • Type checking suppressions that could be fixed
  • Magic numbers should be in Settings
  • Sequential story processing could be parallelized
  • Prompt templates not cached

Must Fix Before Merge

  1. Add comprehensive test coverage
  2. Fix global conversation_id state
  3. Fix logging file handle leak
  4. Add error handling

Conclusion

Well-structured architecture but should NOT be merged without tests. Global state and resource leak also need fixing.

Estimated effort: Critical fixes 4-6h, test coverage 8-12h.

Happy to help implement fixes!

- Nudging it to make parallel tool calls
- Not reading the full codebase

claude bot commented Oct 7, 2025

Code Review for PR #13: Workflows

This is a substantial refactoring that implements a workflow system for story generation. Overall, the architecture is clean with good separation of concerns.

Strengths

  1. Good separation of concerns between AI logic, workflow orchestration, and activities
  2. Proper use of type hints and Pydantic models
  3. Well-implemented structured logging with context
  4. Nice iterative workflow with human-in-the-loop validation

Critical Issues

1. Resource Leak in logging.py:23
The log file is opened but never closed. This will remain open for program lifetime.

2. Unsafe JSON Parsing in activities.py:96
No error handling for invalid JSON or missing keys. Will crash with unhandled exception. Should wrap in try/except per CLAUDE.md.

3. Global Mutable State in ai.py:20
Global conversation_id is problematic for testing, concurrent usage, and state management.

4. Missing Error Handling in activities.py:137-142
No exception handling for network errors, API failures, or invalid repo URLs. Per CLAUDE.md, should catch ConnectionError, TimeoutError.

5. Type Safety Issue in ai.py:69
Missing type hint for logger parameter.

Security Concerns

1. API Key Exposure in ai.py:276
Setting environment variables at runtime can expose secrets to child processes.

2. Logging Sensitive Data
Multiple locations log full API responses and prompts which may contain sensitive user data (ai.py:97, ai.py:303, activities.py:147).

3. Missing Input Validation
get_prompt doesn't validate kwargs which could allow template injection.

Performance Issues

1. Sequential Story Processing in workflow.py:64
Stories processed one-by-one. Could be parallelized using asyncio.gather for significant speedup.

2. Settings Recreated Every Call
Settings() instantiated repeatedly. Should be memoized per TODO in config.py:6.

Test Coverage Gap

Major Issue: No tests for new core modules:

  • activities.py (297 lines, 0 tests)
  • workflow.py (113 lines, 0 tests)
  • ai.py (304 lines, 0 tests)
  • types.py (47 lines, 0 tests)
  • logging.py (46 lines, 0 tests)

Per CLAUDE.md: "New features require tests". This is 807 lines of untested code.

Recommendations

High Priority:

  1. Add comprehensive tests for all new modules
  2. Fix resource leak in logging
  3. Add proper exception handling for JSON parsing and network calls
  4. Remove global mutable state
  5. Validate inputs to prevent crashes

Medium Priority:

  6. Implement Settings memoization
  7. Fix line length violations (120 char limit)
  8. Add missing type hints
  9. Parallelize story processing
  10. Review and redact sensitive data in logs

Summary

Well-structured refactoring with good architecture, but has significant gaps in error handling, testing, and resource management. The lack of tests for 807 lines of new code is particularly concerning.

Recommendation: Request changes to address critical bugs and add test coverage before merging.


claude bot commented Oct 7, 2025

Code Review for PR #13: Workflows

Overview

This PR implements a significant refactoring that introduces a workflow-based architecture for breaking down PRDs and tech specs into user stories. The implementation is well-structured overall, but there are several areas that need attention before merging.


🔴 Critical Issues

1. Missing Tests for New Code

  • Issue: The PR introduces ~1700 lines of new code in activities.py, ai.py, workflow.py, types.py, and logging.py, but no corresponding unit tests
  • Files: src/storymachine/activities.py, src/storymachine/ai.py, src/storymachine/workflow.py, src/storymachine/logging.py
  • Impact: High - Zero test coverage for core workflow logic violates CLAUDE.md requirement: "New features require tests"
  • Recommendation:
    • Add tests for parse_stories_from_response() - line 89 in activities.py
    • Add tests for get_or_create_conversation() - line 48 in ai.py
    • Add tests for supports_reasoning_parameters() - line 23 in ai.py
    • Add tests for workflow state transitions in w1() - workflow.py:20
    • Add tests for error handling in get_codebase_context() - activities.py:114

2. Global Mutable State

  • Issue: Global conversation_id variable in ai.py:20 creates shared state that persists across workflow runs
  • File: src/storymachine/ai.py:20
  • Impact: High - Multiple CLI runs in the same process will share conversation state, breaking isolation
  • Code:
# Global conversation state
conversation_id: Optional[str] = None
  • Recommendation: Move conversation state to a class or pass it through function parameters. Consider using a context manager or dependency injection.

3. Broad Exception Handling Missing

  • Issue: Multiple async functions and API calls lack proper exception handling
  • Files:
    • activities.py:114 - get_codebase_context() has no try/except around ask-github call
    • ai.py:265 - answer_repo_questions() has no error handling for SDK failures
  • Impact: Medium-High - Per CLAUDE.md: "Catch specific exceptions where possible"
  • Recommendation: Add specific exception handling for network errors, API failures, and JSON parsing issues
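
The dependency-injection recommendation from issue 2 above could look like this minimal sketch; the class name and `create_fn` callable are hypothetical, not the PR's actual types:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ConversationContext:
    """Per-run conversation state, passed through the workflow instead of
    living in a module-level global."""
    conversation_id: Optional[str] = None

    def get_or_create(self, create_fn: Callable[[], str]) -> str:
        # create_fn is any zero-arg callable that returns a fresh id,
        # e.g. lambda: client.conversations.create().id
        if self.conversation_id is None:
            self.conversation_id = create_fn()
        return self.conversation_id
```

Each CLI run constructs its own `ConversationContext`, so tests can hand in a fake `create_fn` and concurrent runs stop sharing state.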

🟡 Code Quality Issues

4. Inconsistent Type Annotations

  • Issue: Missing return type hint on _create_and_parse_response() parameter logger
  • File: src/storymachine/ai.py:68-70
  • Code:
def _create_and_parse_response(
    client: OpenAI, params: dict, logger, log_prefix: str  # logger has no type hint
) -> Response:
  • Impact: Medium - Violates CLAUDE.md requirement: "Type hints required for all code"
  • Recommendation: Add logger: structlog.BoundLogger type hint

5. Hardcoded Magic Values

  • Issue: Multiple hardcoded values that should be configurable
  • Files:
    • activities.py:141 - max_iterations=100 hardcoded
    • ai.py:22 - Model version strings hardcoded in set
  • Impact: Low-Medium - Reduces flexibility and makes testing harder
  • Recommendation: Move to config or constants

6. Line Length Violations

  • Issue: Several lines exceed 120 character limit per CLAUDE.md
  • Files:
    • activities.py:139 - 140+ characters
    • ai.py:228-235 - Complex dict comprehension exceeds limit
  • Recommendation: Run uv run --frozen ruff format . and uv run --frozen ruff check . --fix

7. Docstring Quality

  • Issue: Several public functions have minimal docstrings that don't describe parameters or return values
  • Examples:
    • get_codebase_context() - doesn't document the async nature or potential exceptions
    • problem_break_down() - doesn't explain the comments parameter behavior
  • Impact: Low-Medium - Violates CLAUDE.md: "Public APIs must have docstrings"
  • Recommendation: Add comprehensive docstrings with Args, Returns, and Raises sections

🟢 Performance Considerations

8. Sequential Story Processing

  • Issue: Stories are processed sequentially in workflow.py:64-108, but could be parallelized
  • File: src/storymachine/workflow.py:64-108
  • Impact: Low - Performance issue for PRDs with many stories
  • Recommendation: Consider using asyncio.gather() to process multiple stories in parallel, though this may conflict with the interactive approval workflow

9. File I/O on Every Log

  • Issue: logging.py:23 opens log file in append mode on every log entry
  • File: src/storymachine/logging.py:23
  • Code:
logger_factory=structlog.WriteLoggerFactory(file=log_file.open("a"))
  • Impact: Low - May cause file handle exhaustion with heavy logging
  • Recommendation: Consider using a context manager or buffered writer

🔒 Security Concerns

10. API Key Environment Variable Manipulation

  • Issue: ai.py:276 directly sets os.environ["ANTHROPIC_API_KEY"]
  • File: src/storymachine/ai.py:276
  • Impact: Medium - Modifying environment can affect other code and creates side effects
  • Code:
os.environ["ANTHROPIC_API_KEY"] = settings.anthropic_api_key
  • Recommendation: Check if Claude SDK supports passing API key as parameter instead of relying on environment variables

11. Potential Prompt Injection

  • Issue: User comments are directly interpolated into prompts without sanitization
  • Files:
    • activities.py:164 - comments parameter inserted directly into prompt
    • Various prompt files receive unsanitized user input
  • Impact: Low-Medium - Malicious comments could manipulate AI behavior
  • Recommendation: Add comment validation/sanitization or use structured prompt formats

📊 Test Coverage

12. No Tests for New Modules

  • Missing Test Files:
    • tests/test_activities.py - doesn't exist
    • tests/test_ai.py - doesn't exist
    • tests/test_workflow.py - doesn't exist
    • tests/test_logging.py - doesn't exist
    • tests/test_types.py - doesn't exist
  • Impact: Critical - Per CLAUDE.md: "New features require tests"
  • Recommendation: Achieve at least 70% coverage for new code before merging

✅ Positive Aspects

  1. Good Separation of Concerns: AI logic, activities, and workflow are well separated
  2. Structured Logging: Good use of structlog for observability
  3. Type Safety: Most code has proper type hints (besides noted issues)
  4. Clear Workflow Structure: The w1() workflow function is easy to follow
  5. User Experience: Spinner feedback and progressive disclosure are well implemented
  6. Documentation: CLAUDE.md guidelines are generally followed

📝 Recommendations Summary

Before Merging (Required):

  1. ✅ Add comprehensive unit tests for all new modules
  2. ✅ Fix global conversation_id state management
  3. ✅ Add exception handling for async API calls
  4. ✅ Fix type hint for logger parameter in ai.py:69
  5. ✅ Run formatters: uv run --frozen ruff format . && uv run --frozen ruff check . --fix

Nice to Have:

  1. Move hardcoded values to config
  2. Improve docstrings with Args/Returns/Raises
  3. Consider parallelizing story processing
  4. Remove os.environ mutation for API keys
  5. Add input sanitization for user comments

Test Execution

Before merging, ensure all tests pass:

PYTEST_DISABLE_PLUGIN_AUTOLOAD="" uv run --frozen pytest
uv run --frozen pyright
uv run --frozen ruff check .

Great work on the architecture refactoring! The code is well-structured and the workflow approach is solid. Address the critical issues above and this will be ready to merge.

- pass the right flags in
- read the right configs
- update README

claude bot commented Oct 7, 2025

PR Review: Workflows

This is a significant refactoring with excellent architectural improvements but several critical issues need addressing.

Critical Issues

1. Missing Exception Handling (activities.py:89-98, ai.py:72-73)

  • parse_stories_from_response() could fail on malformed JSON with no recovery
  • client.responses.create() doesn't catch network/API errors
  • get_codebase_context() has no error handling

Per CLAUDE.md, catch specific exceptions (json.JSONDecodeError, ConnectionError, etc.)

2. Resource Leaks (logging.py:23)
File handle never closed - will leak file descriptors

3. Type Safety Issues (ai.py:68-70, activities.py:89)
Missing type hints violate CLAUDE.md requirements:

  • _create_and_parse_response() needs Dict[str, Any]
  • logger parameters missing types
  • parse_stories_from_response() response param untyped

4. Global Mutable State (ai.py:19-20)
Global conversation_id creates thread-safety issues. Use context manager or singleton.

Security Concerns

1. Sensitive Data Logging (ai.py:91-98, 174-185)
Logging full response objects may expose API keys/sensitive content

2. Environment Variable Injection (ai.py:276)
Direct os.environ manipulation could affect other code

3. Token Validation (activities.py:136-141)
No validation that tokens exist - will fail silently if None

Code Quality

1. Line Length Violations
Multiple lines exceed 120 chars. Run: uv run --frozen ruff format . && uv run --frozen ruff check . --fix

2. Incomplete Docstrings
Many functions missing Args, Returns, Raises sections per CLAUDE.md

3. Magic Numbers (activities.py:148, ai.py:22)
Hard-coded values (max_iterations=100) should be constants

4. Code Duplication (ai.py:167-172, 214-219)
Reasoning parameter setup duplicated - extract helper

Test Coverage

Major concern: Zero tests for new modules (activities.py, ai.py, workflow.py, types.py)

  • 250 lines deleted from test_cli.py
  • test_story_machine.py deleted entirely (172 lines)

Need unit tests with mocked API calls and error path testing.

Positive Aspects

  1. Excellent separation: activities, AI layer, workflow orchestration
  2. Good use of structlog for observability
  3. Clean dataclasses and enums in types.py
  4. Great UX with spinner and iterative feedback
  5. Nice reasoning summaries integration

Action Items

Must Fix:

  • Add exception handling per CLAUDE.md
  • Fix resource leak in logging.py
  • Add all type hints
  • Address global state
  • Add token validation
  • Add tests (aim >70% coverage)

Should Fix:

  • Run ruff format/check
  • Complete docstrings
  • Sanitize logging
  • Extract magic numbers

Nice to Have:

  • Parallelize story processing (asyncio.gather)
  • Memoize Settings
  • Clean up doc/workflow.md

Recommendation

Request changes - excellent architecture but needs hardening for production. Prioritize exception handling, resource management, and test coverage.

Code references: activities.py:89, ai.py:19, ai.py:68, logging.py:23, workflow.py:64


claude bot commented Oct 9, 2025

Pull Request Review: Workflows

Overview

This PR introduces a significant refactoring that implements a workflow-based approach to story generation, replacing the previous single-pass implementation. The changes represent good progress toward an iterative, feedback-driven system.


✅ Strengths

Code Organization

  • Clean separation of concerns: ai.py, activities.py, workflow.py, and types.py provide clear boundaries
  • Good abstraction: OpenAI API interactions isolated in ai.py makes it easy to swap implementations
  • Structured logging: Using structlog with JSON output provides good observability

Type Safety

  • Proper use of type hints throughout the codebase
  • Dataclasses for structured data (Story, WorkflowInput, FeedbackResponse)
  • Type narrowing with hasattr() checks before attribute access

Workflow Design

  • Iterative feedback loops allow for quality improvement
  • Reasoning summaries provide transparency into AI decision-making
  • Conversation context management enables coherent multi-turn interactions

🔴 Critical Issues

1. Resource Leak in Logging (logging.py:23)

File handle opened but never closed. Use filename string instead of file handle or implement proper cleanup.

2. Missing Error Handling for External Dependencies

  • activities.py:144-149 - No error handling for ask() call (network, auth, API limits)
  • ai.py:72, 187, 238 - OpenAI API calls lack error handling
  • Per CLAUDE.md: Use specific exceptions where possible

3. Token Management (activities.py:136-141)

If user provides GitLab URL but no GITLAB_TOKEN, token is None causing ask() to fail. Add validation with clear error messages.
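
A fail-fast check might look like the following sketch; the helper name and the env-var mapping are assumptions about how the project resolves tokens:

```python
import os
from urllib.parse import urlparse

def resolve_repo_token(repo_url: str) -> str:
    """Fail fast with a clear message when the host's token is missing,
    instead of passing token=None into the repo-question call."""
    host = urlparse(repo_url).hostname or ""
    env_var = "GITLAB_TOKEN" if "gitlab" in host else "GITHUB_TOKEN"
    token = os.environ.get(env_var)
    if not token:
        raise SystemExit(
            f"{env_var} is required to query {host} repositories; "
            "set it in your environment or .env file"
        )
    return token
```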


⚠️ Important Issues

Type Safety

  • activities.py:89 - Missing type annotation on response parameter
  • ai.py:68 - Logger parameter lacks type hint
  • Per CLAUDE.md: Type hints required for all code

Global State (ai.py:19)

conversation_id global mutable state makes testing difficult. Consider passing through workflow context or documenting thread-safety.

Security - Path Traversal (ai.py:62-64)

get_prompt() reads files without validating filename. Add checks to prevent path traversal attacks.


📊 Test Coverage Gaps

Per CLAUDE.md: "New features require tests"

Missing tests for:

  • activities.py (304 lines)
  • ai.py (262 lines)
  • workflow.py (113 lines)
  • types.py (47 lines)
  • logging.py (46 lines)

Only config.py and cli.py have tests. Need async tests using anyio as specified.


🔒 Security Considerations

  1. Secrets in Logs - Request params logged may contain sensitive PRD/tech spec data
  2. Path Traversal - Template loading needs filename validation

🎯 Code Quality

Positive:
✅ Proper Optional usage
✅ Docstrings on public functions
✅ Context managers for resources
✅ Structured logging

Needs Improvement:

  • Magic numbers (max_iterations=100)
  • Duplicate code in workflow loops
  • Mixed concerns (I/O in activities)
  • Settings instantiation repeated (per TODO: should memoize)
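The repeated Settings() instantiation and the max_iterations magic number can be tackled together. A minimal sketch, assuming a plain dataclass stand-in for the real settings object (which presumably lives in config.py):

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Settings:
    """Stand-in for the real settings object; max_iterations lives here
    instead of being hard-coded at the call site."""
    max_iterations: int = 100

@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # lru_cache memoizes the instance, so every caller shares one object.
    return Settings()
```

Call sites then read `get_settings().max_iterations` rather than constructing `Settings()` repeatedly.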

✅ Recommendations

Before Merge (Critical)

  1. Fix resource leak in logging.py
  2. Add error handling for external API calls
  3. Add type hints to all function parameters
  4. Fix token validation for GitLab/GitHub
  5. Add path traversal protection

High Priority

  1. Comprehensive test coverage for new modules
  2. Refactor global state
  3. Memoize Settings
  4. Add retry logic for APIs
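Retry logic for the API calls could be as small as a jittered exponential-backoff wrapper; a sketch (names and the set of retryable exceptions are assumptions, to be matched to the real client's errors):

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retryable: Tuple[Type[Exception], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Call fn, retrying transient failures with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
    raise AssertionError("unreachable")
```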

Medium Priority

  1. Extract duplicate workflow logic
  2. Separate I/O from business logic
  3. Make max_iterations configurable
  4. Add workflow documentation

Overall Assessment

Solid architectural progress with good separation of concerns. Main gaps:

  1. Test coverage - Critical gap
  2. Error handling - Essential for production
  3. Resource management - File leak needs fixing

Recommendation: Address critical issues before merge, prioritize tests in follow-up.

🤖 Generated with Claude Code

@claude

claude bot commented Oct 9, 2025

Code Review: Workflows PR

This PR implements a significant refactoring that introduces a workflow-based approach for breaking down PRDs into user stories. The implementation is well-structured with good separation of concerns, but there are several areas that need attention before merging.

Critical Issues

1. Missing Tests for Core Functionality
The PR notes "New implementations don't have tests yet" - this is a critical gap. According to CLAUDE.md, new features require tests. Need unit tests for parse_stories_from_response, problem_break_down, define_acceptance_criteria, enrich_context, and mock-based tests for call_openai_api.
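A mock-based test for the API wrapper needs no network at all. A sketch, where `call_openai_api`'s body is an assumed shape (only the name comes from the PR; the wrapper shown here is hypothetical):

```python
from unittest.mock import MagicMock

def call_openai_api(client, prompt: str) -> str:
    """Hypothetical thin wrapper mirroring the shape of the call in ai.py."""
    response = client.responses.create(model="gpt-4o", input=prompt)
    return response.output_text

def test_call_openai_api_returns_model_text():
    client = MagicMock()
    client.responses.create.return_value.output_text = "Story 1: login"
    assert call_openai_api(client, "break down this PRD") == "Story 1: login"
    client.responses.create.assert_called_once()
```

The same pattern covers error paths: configure `client.responses.create.side_effect` with an exception and assert the wrapper's handling.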

2. Exception Handling Issues

  • activities.py:96 - json.loads can raise JSONDecodeError or KeyError - should catch specific exceptions
  • activities.py:132 - list_tree has no error handling for network operations
  • ai.py:62-63 - File I/O without error handling for OSError/PermissionError
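For the json.loads case, catching the specific exceptions and re-raising one well-labeled error keeps failures diagnosable. A sketch of what parse_stories_from_response could do (the payload shape `{"stories": [{"title": ...}]}` is an assumption):

```python
import json

def parse_story_titles(raw: str) -> list:
    """Parse model output into story titles, converting malformed payloads
    into one specific error instead of a bare crash."""
    try:
        payload = json.loads(raw)
        return [story["title"] for story in payload["stories"]]
    except json.JSONDecodeError as exc:
        raise ValueError(f"model returned invalid JSON: {exc}") from exc
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected story payload shape: {exc}") from exc
```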

3. Resource Management Issue (logging.py:23)
The file handle is never closed, creating a resource leak. Use a context manager, or hand the filename to a handler that owns the file.

4. Global Mutable State (ai.py:19)
conversation_id as global state makes testing difficult. Consider dependency injection, or at minimum add a reset function.

5. Type Safety Issues
Missing type hints for logger parameter in ai.py:67 and response parameter in activities.py:89

Security Concerns

1. Token Exposure Risk
Ensure tokens are never logged and that the ask-github library is vetted. Consider rate limiting.

2. Input Validation Missing
No validation for repo_url, user comments length, or file content sizes.
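Input validation could start with a small guard; a sketch, where the allowed hosts and the comment-length budget are assumptions to be tuned:

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"github.com", "gitlab.com"}  # assumption: hosted forges only
MAX_COMMENT_CHARS = 4_000                     # assumption: prompt-size budget

def validate_inputs(repo_url: str, user_comment: str) -> None:
    """Reject malformed repo URLs and oversized feedback before any API call."""
    parsed = urlparse(repo_url)
    if parsed.scheme != "https" or parsed.netloc not in ALLOWED_HOSTS:
        raise ValueError(f"unsupported repo URL: {repo_url!r}")
    if len(user_comment) > MAX_COMMENT_CHARS:
        raise ValueError(f"comment exceeds {MAX_COMMENT_CHARS} characters")
```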

Performance Considerations

1. Sequential Story Processing
Stories processed one at a time - intentional but slow for large PRDs. Document performance characteristics.

2. Redundant Settings Creation
Settings() instantiated multiple times. Per TODO in config.py:6, should be memoized.

3. Unnecessary API Calls
Follow-up API call pattern doubles costs - consider more efficient approach.

Code Quality

Positives:

  • Good separation of concerns
  • Comprehensive docstrings
  • Structured logging
  • Clean dataclass usage

Issues:

  • Line length violations (run ruff check)
  • Magic numbers (activities.py:157, logging.py:22, ai.py:132)
  • Spinner thread safety concerns
  • Missing README documentation for workflow

Recommendations

Must Fix Before Merge:

  1. Add comprehensive test coverage
  2. Fix exception handling
  3. Fix resource leak in logging.py
  4. Add missing type hints
  5. Run ruff check and pyright

Should Fix Before Merge:

  1. Implement Settings memoization
  2. Add input validation
  3. Fix global state issue
  4. Document workflow in README
  5. Add network error handling

Recommendation: Request changes - primarily for test coverage and critical bug fixes.

The code shows good architecture but lacks tests, which is blocking. Once addressed, this will be a solid foundation.

@ssrihari
Member Author

ssrihari commented Oct 9, 2025

I'm ignoring tests for this work, because it's very likely going to be rewritten in another form and most of this code will be discarded; the workflow was created as a stepping stone. Similarly for handling exceptions: currently, I think, any exception will just loudly stop the program.

@ssrihari ssrihari merged commit a0d7c5e into main Oct 9, 2025
2 checks passed