Skip to content

Conversation

@phernandez
Copy link
Member

Summary

Fixes #390 - Prevents duplicate section headers when using edit_note with replace_section operation.

When LLMs include the section header in the content parameter (which they shouldn't, but sometimes do), the operation now detects and strips the duplicate header before appending the content.

Changes

  • Modified replace_section_content() in entity_service.py to strip duplicate headers
  • Added regression test test_edit_entity_replace_section_strips_duplicate_header()
  • Updated docstring to clarify expected behavior

Test Plan

  • Existing tests pass
  • New regression test passes
  • Manual testing with LLM confirms fix

Generated with Claude Code

Fixes #390

When using edit_note with replace_section operation, if the LLM includes
the section header in the content parameter (e.g., "## Testing\nNew content"),
it would create duplicate headers in the resulting document.

This fix detects when new_content starts with the same header that's being
replaced and strips it before appending the content.

Changes:
- Modified replace_section_content() to strip duplicate headers from new_content
- Added test test_edit_entity_replace_section_strips_duplicate_header()
- Updated docstring to clarify content should not include the header

Co-authored-by: Drew Cain <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Oct 22, 2025

Claude finished @phernandez's task —— View job


PR Review: fix: Strip duplicate headers in edit_note replace_section

Reviewing PR #396 against Basic Memory team checklist...

Todo List

  • Read changed files to understand implementation
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture alignment
  • Review security & performance
  • Provide summary and recommendations

Review Summary

This is a well-executed defensive fix for a real-world LLM behavior issue. The implementation is clean, well-tested, and follows all project conventions. ✅

Code Quality & Standards ✅

Strengths:

  • Clean implementation: The duplicate header detection logic in entity_service.py:638-643 is straightforward and easy to understand
  • Type annotations: Fully compliant with Python 3.12+ standards
  • Line length: All lines under 100 characters ✅
  • No commented code: Clean, production-ready code
  • Follows conventions: Matches Basic Memory's coding style (CLAUDE.md compliant)

Code Review:

# src/basic_memory/services/entity_service.py:638-643
new_content_lines = new_content.lstrip().split("\n")
if new_content_lines and new_content_lines[0].strip() == section_header.strip():
    # Remove the duplicate header line
    new_content = "\n".join(new_content_lines[1:]).lstrip()

The logic is sound and handles edge cases well:

  • Uses .lstrip() before splitting to handle leading whitespace
  • Strips both sides when comparing headers (handles spacing variations)
  • Re-applies .lstrip() after rejoining to maintain clean formatting

Testing & Documentation ✅

Test Coverage:

  • Regression test added: test_edit_entity_replace_section_strips_duplicate_header() (lines 1279-1323)
  • Comprehensive test: Validates both the fix (no duplicate headers) AND existing behavior (section replacement still works)
  • Clear test scenario: Mirrors real LLM behavior patterns

Documentation:

  • Updated docstring: Clarifies expected behavior at line 626
  • Inline comments: Excellent context about why this fix is needed (LLM behavior)
  • Issue reference: Properly references [BUG] edit_note creates duplicate headings #390 in code comments

Architecture Alignment ✅

Preserves Basic Memory Principles:

  • Atomic operation: No side effects, pure content transformation
  • Local-first: File-based operation, no network dependencies
  • Knowledge graph integrity: Doesn't affect relations or observations
  • MCP tool compatibility: Defensive fix improves MCP tool robustness

Design Pattern:
This follows the defensive programming pattern appropriate for LLM interactions - the tool gracefully handles incorrect LLM behavior rather than failing or producing malformed output.

Security & Performance ✅

Security:

  • ✅ No user input vulnerabilities
  • ✅ No injection risks
  • ✅ No secrets or credentials

Performance:

  • ✅ Minimal overhead (single string split and comparison)
  • ✅ O(n) complexity where n = content length (same as before)
  • ✅ No memory leaks or excessive allocations

Minor Observations

Edge Cases to Consider (already handled):

  1. ✅ Empty content after stripping header - handled by .lstrip() and list slicing
  2. ✅ Headers with varying whitespace - handled by .strip() on both sides
  3. ✅ Case sensitivity - uses exact match (appropriate for markdown headers)
  4. ✅ Multiple headers in content - only strips first line (correct behavior)

Potential Enhancement (not required):
The current implementation only strips the header if it's on the first line after lstrip. Consider this edge case:

content = "  \n  \n## Testing\nNew content"  # Multiple blank lines before header

Current behavior: Won't strip (header not on first line after lstrip)
Expected: Might want to strip even with blank lines above

However, this is likely acceptable because:

  • LLMs typically generate ## Testing\nContent, not with leading blank lines
  • The current behavior is safer (less aggressive stripping)
  • Can be addressed in a future PR if it becomes an issue

Test Plan Verification

From PR Description:

  • Existing tests pass - Needs human verification (run just test)
  • New regression test passes - Code review confirms test is well-written
  • Manual testing with LLM confirms fix - Needs human verification

Checklist Results

Code Quality & Standards ✅

  • Follows Basic Memory's coding conventions in CLAUDE.md
  • Python 3.12+ type annotations and async patterns
  • SQLAlchemy 2.0 best practices (N/A - no DB changes)
  • FastAPI and Typer conventions followed (N/A - service layer only)
  • 100-character line length limit maintained
  • No commented-out code blocks

Testing & Documentation ✅

  • Unit tests for new functions/methods
  • Integration tests for new MCP tools (N/A - existing tool enhanced)
  • Test coverage for edge cases
  • Documentation updated (README, docstrings)
  • CLAUDE.md updated if conventions change (N/A - no convention changes)

Basic Memory Architecture ✅

  • MCP tools follow atomic, composable design
  • Database changes include Alembic migrations (N/A - no schema changes)
  • Preserves local-first architecture principles
  • Knowledge graph operations maintain consistency
  • Markdown file handling preserves integrity
  • AI-human collaboration patterns followed

Security & Performance ✅

  • No hardcoded secrets or credentials
  • Input validation for MCP tools
  • Proper error handling and logging
  • Performance considerations addressed
  • No sensitive data in logs or commits

Recommendation

APPROVE

This PR is production-ready and should be merged after:

  1. ✅ Existing tests pass (run just test)
  2. ✅ Manual verification with LLM interaction (optional but recommended)

The implementation is exactly the right approach for this issue - defensive, well-tested, and minimal impact on existing functionality.

--- | Branch: claude/issue-390-20251021-1529

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] edit_note creates duplicate headings

2 participants