Implementation context generation #12

tfidfwastaken · 2025-09-19T13:34:19Z

Summary

Add optional implementation-context block to generated stories.
New --repo CLI flag wires a local repo to context generation.
Rename story_machine → story_generator and update imports.
Enable Ruff rule BLE001 to forbid broad exceptions.
Ignore macOS .DS_Store.

Details

CLI
- get_context_enriched_stories(...) now accepts repo_path: str | None.
- When --repo is provided, stories include an <implementation_context> block rendered by render_context_enriched_story(...).
- Graceful fallback on repo errors with status unreadable.
Implementation engine
- New storymachine/implementation_context.py:
  - ImplementationContext dataclass with statuses: ok | none_identified | unreadable | disabled.
  - Tool loop using OpenAI Responses API with tools:
    - list_paths(directory)
    - read_file(path)
    - emit_implementation_context(files, notes)
  - Heuristics limit traversal depth and ignore common build/cache dirs.
Module rename
- story_machine.py → story_generator.py. All imports updated.
Quality
- pyproject.toml: [tool.ruff] lint.select = ["BLE001"].
- .gitignore: add .DS_Store.

Tests

New tests/test_implementation_context.py.
tests/test_cli.py:
- Verifies block inclusion when --repo is set.
- Verifies --repo is forwarded.
Rename tests to match module: test_story_generator.py.
Mocks cover tool-call flow and error handling.

Usage

storymachine --prd path/to/prd.md --tech-spec path/to/spec.md --target-dir out/
# with repo context
storymachine --prd prd.md --tech-spec spec.md --target-dir out/ --repo /path/to/repo

Renames the `story_machine` module to `story_generator` to better reflect its purpose. Updates all imports and references to the module across the codebase, including in the CLI and tests.

claude · 2025-09-19T13:37:21Z

Code Review: StoryMachine Implementation

I've completed a thorough review of the StoryMachine codebase. Here's my assessment:

✅ Strengths

Code Architecture & Quality

Clean, modular design with proper separation of concerns (CLI, story generation, implementation context)
Excellent use of type hints throughout the codebase - all functions have proper type annotations
Follows Python best practices with dataclasses and pydantic for configuration
Good error handling with specific exception catching (OSError, PermissionError, JSONDecodeError)
Thread-safe spinner implementation with proper cleanup

Development Practices

Comprehensive CI/CD pipeline with linting (ruff), type checking (pyright), and testing
Proper adherence to CLAUDE.md guidelines using uv for package management
Well-structured test suite with fixtures and proper mocking
Clear project documentation and README

🔍 Areas for Improvement

1. Security Considerations

API Key Handling: While using pydantic-settings is good, consider adding validation for the API key format
File Path Validation: In cli.py:98-104, the broad exception catch could mask security issues. Consider more specific error handling for repository access
Input Sanitization: The slugify() function properly sanitizes file names, but consider adding length limits to prevent overly long filenames

2. Performance Optimizations

Model Selection Logic: The supports_reasoning_parameters() function (story_generator.py:28-57) uses multiple string operations that could be optimized with a compiled regex or set operations
File I/O: Consider implementing async file operations for better performance when processing multiple stories
Repository Scanning: The implementation context generation could benefit from parallel file processing for large repositories

3. Test Coverage Gaps

Missing edge case tests for malformed PRD/tech spec content
No tests for concurrent story generation scenarios
Limited testing of error recovery paths in implementation context generation
Consider adding integration tests for the full CLI workflow

4. Minor Code Quality Issues

Line 121 in cli.py: # pyright: ignore[reportCallIssue] - This suppression should be addressed with proper typing
Exception handling at line 98 uses bare except Exception which violates CLAUDE.md guidelines about specific exception handling
The spinner function could benefit from being a separate utility module for reusability

5. Documentation & UX

Consider adding progress indicators for long-running operations
The error messages could be more informative (e.g., include expected formats)
Add validation for MODEL environment variable to ensure it's a supported model

🚀 Recommendations

Add input validation layer: Implement a validation module for PRD and tech spec files before processing
Implement retry logic: Add exponential backoff for OpenAI API calls to handle rate limits gracefully
Add logging: Implement structured logging for better debugging and monitoring
Consider caching: Cache processed stories to avoid redundant API calls for the same inputs
Add configuration validation: Validate the model name against supported models before making API calls

✨ Overall Assessment

This is a well-structured, production-ready codebase that follows modern Python best practices. The code is clean, properly typed, and includes comprehensive testing. The adherence to CLAUDE.md guidelines is excellent, particularly the use of uv for package management and proper formatting standards.

The main areas for improvement are around security hardening, performance optimization for large-scale usage, and expanding test coverage for edge cases. None of these are blocking issues, but addressing them would make the codebase even more robust.

Verdict: APPROVED with suggestions ✅

Great work on this implementation! The StoryMachine tool shows thoughtful design and solid engineering practices.

chatgpt-codex-connector

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

chatgpt-codex-connector · 2025-09-19T13:38:13Z

pyproject.toml

+[tool.ruff]
+lint.select = ["BLE001"]


[P1] Preserve existing Ruff lint set when enabling BLE001

The new [tool.ruff] section sets lint.select = ["BLE001"], which overrides Ruff’s default selection (E/F and other checks) and restricts linting to only broad-exception warnings. That effectively disables the rest of the linter and will let unused imports, syntax mistakes, etc. slip through, which seems unintended given the goal of merely enabling BLE001. Consider using extend-select or adding BLE001 to the existing selection instead of replacing it outright.

Useful? React with 👍 / 👎.

claude · 2025-10-02T17:45:27Z

PR Review: Implementation Context Generation

Thank you for this comprehensive implementation! Overall, the code is well-structured and the tests are thorough.

Strengths

Excellent test coverage - Tests cover all four status cases
Good error handling - Proper validation of repo paths with early failure
Security-conscious - Path traversal protection via _safe_join() and traversal depth limits
Clean separation - New functionality isolated in its own module
BLE001 enforcement - Good catch on broad exception handling

Potential Bugs & Issues

1. Exception Handling Inconsistency (cli.py:98-105)

The broad Exception catch violates BLE001 that you just enabled. Per CLAUDE.md, catch specific exceptions instead.

2. CLI Logic Duplication (cli.py:176-191)

Unnecessary if/else - the function already handles repo_path=None as default parameter.

3. Line Length - Check with ruff, some lines may exceed 120 chars

Security

Path traversal protection is good
File size limits prevent memory issues
Depth limits prevent infinite recursion
Consider adding .env explicitly to ignore list

Test Coverage

Strong coverage of status cases, tool loop, and errors. Consider adding tests for max_tool_calls limit and file truncation behavior.

Action Items

Must fix:

Run uv run --frozen ruff format .
Run uv run --frozen ruff check . --fix
Run uv run --frozen pyright
Fix broad Exception catch or document necessity
Remove duplicate CLI logic (lines 176-191)

Should consider:

Add test for max_tool_calls limit
Refactor generate_implementation_context (127 lines)

Overall

Solid PR with good architecture. Main concerns are CLAUDE.md compliance for line length and exception handling.

Recommendation: Request changes for must-fix items, then approve.

claude · 2025-10-02T19:56:25Z

PR Review: Implementation Context Generation

Thanks for this comprehensive PR! The implementation looks solid overall. Here is my detailed feedback:

Strengths

Well-structured implementation - Clean separation of concerns with the new implementation_context.py module
Comprehensive test coverage - Good tests covering all four status cases
Error handling - Proper exception handling with graceful degradation
Security considerations - Good path validation with _safe_join() to prevent directory traversal

Potential Bugs and Issues

1. CLI Conditional Logic Redundancy (cli.py:176-191)
The branching logic checking args.repo is None is redundant since get_context_enriched_stories already handles None as default.

2. Exception Handling Too Broad (cli.py:98)
Catching bare Exception could hide bugs. Consider more specific exceptions or add logging.

3. Assertion Without Context (implementation_context.py:42)
Using assert can fail with python -O. Use explicit error: if impl_ctx is None: raise ValueError(...)

4. Tool Call Loop Edge Cases (implementation_context.py:119-192)
Missing call_id handling could cause infinite loops. json.JSONDecodeError continues with empty args causing silent failures.

5. Ruff Configuration (pyproject.toml)
CRITICAL: lint.select replaces defaults instead of extending. Should use extend-select to avoid losing other checks.

Performance Considerations

File walking continues after max_results - use dirs[:] = [] to stop traversal
Large file reading loads entire file before truncation - read max_bytes directly in binary mode
Heuristic candidates could short-circuit earlier

Security

Strong security posture with path traversal protection, file size limits, and depth limits. Consider adding timeout for tool-calling loop to prevent runaway costs.

Test Coverage

Good coverage of main paths but missing edge cases: max_tool_calls exhaustion, permission errors, malformed JSON, path traversal attempts, and integration tests.

Code Style

Mostly compliant with CLAUDE.md. Missing return type on _build_impl_prompt.

Recommendations Priority

HIGH: Fix Ruff config, replace assertion, add call_id validation
MEDIUM: Improve file reading performance, add timeout, narrow exceptions
LOW: Add edge case tests, optimize file walking

Overall Assessment

Quality: 8/10 | Security: 9/10 | Testing: 7/10 | Performance: 7/10

Solid implementation ready for merge after addressing high-priority issues, especially the Ruff configuration.

tfidfwastaken added 2 commits September 19, 2025 19:01

refactor: Rename story_machine module to story_generator

83bc854

Renames the `story_machine` module to `story_generator` to better reflect its purpose. Updates all imports and references to the module across the codebase, including in the CLI and tests.

Add implementation context support

afc7786

chatgpt-codex-connector bot reviewed Sep 19, 2025

View reviewed changes

tfidfwastaken closed this Nov 19, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implementation context generation #12

Implementation context generation #12

Uh oh!

tfidfwastaken commented Sep 19, 2025

Uh oh!

claude bot commented Sep 19, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Sep 19, 2025

Uh oh!

claude bot commented Oct 2, 2025

Uh oh!

claude bot commented Oct 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Implementation context generation #12

Implementation context generation #12

Uh oh!

Conversation

tfidfwastaken commented Sep 19, 2025

Summary

Details

Tests

Usage

Uh oh!

claude bot commented Sep 19, 2025

Code Review: StoryMachine Implementation

✅ Strengths

🔍 Areas for Improvement

🚀 Recommendations

✨ Overall Assessment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Sep 19, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot commented Oct 2, 2025

PR Review: Implementation Context Generation

Strengths

Potential Bugs & Issues

Security

Test Coverage

Action Items

Overall

Uh oh!

claude bot commented Oct 2, 2025

PR Review: Implementation Context Generation

Strengths

Potential Bugs and Issues

Performance Considerations

Security

Test Coverage

Code Style

Recommendations Priority

Overall Assessment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants