fix: Strip duplicate headers in edit_note replace_section #396

phernandez · 2025-10-22T19:41:17Z

Summary

Fixes #390 - Prevents duplicate section headers when using edit_note with replace_section operation.

When LLMs include the section header in the content parameter (which they shouldn't, but sometimes do), the operation now detects and strips the duplicate header before appending the content.

Changes

Modified replace_section_content() in entity_service.py to strip duplicate headers
Added regression test test_edit_entity_replace_section_strips_duplicate_header()
Updated docstring to clarify expected behavior

Test Plan

Existing tests pass
New regression test passes
Manual testing with LLM confirms fix

Generated with Claude Code

Fixes #390 When using edit_note with replace_section operation, if the LLM includes the section header in the content parameter (e.g., "## Testing\nNew content"), it would create duplicate headers in the resulting document. This fix detects when new_content starts with the same header that's being replaced and strips it before appending the content. Changes: - Modified replace_section_content() to strip duplicate headers from new_content - Added test test_edit_entity_replace_section_strips_duplicate_header() - Updated docstring to clarify content should not include the header Co-authored-by: Drew Cain <[email protected]>

github-actions · 2025-10-22T19:41:51Z

Claude finished @phernandez's task —— View job

PR Review: fix: Strip duplicate headers in edit_note replace_section

Reviewing PR #396 against Basic Memory team checklist...

Todo List

Read changed files to understand implementation
Review code quality & standards
Review testing & documentation
Review architecture alignment
Review security & performance
Provide summary and recommendations

Review Summary

This is a well-executed defensive fix for a real-world LLM behavior issue. The implementation is clean, well-tested, and follows all project conventions. ✅

Code Quality & Standards ✅

Strengths:

Clean implementation: The duplicate header detection logic in entity_service.py:638-643 is straightforward and easy to understand
Type annotations: Fully compliant with Python 3.12+ standards
Line length: All lines under 100 characters ✅
No commented code: Clean, production-ready code
Follows conventions: Matches Basic Memory's coding style (CLAUDE.md compliant)

Code Review:

# src/basic_memory/services/entity_service.py:638-643
new_content_lines = new_content.lstrip().split("\n")
if new_content_lines and new_content_lines[0].strip() == section_header.strip():
    # Remove the duplicate header line
    new_content = "\n".join(new_content_lines[1:]).lstrip()

The logic is sound and handles edge cases well:

Uses .lstrip() before splitting to handle leading whitespace
Strips both sides when comparing headers (handles spacing variations)
Re-applies .lstrip() after rejoining to maintain clean formatting

Testing & Documentation ✅

Test Coverage:

Regression test added: test_edit_entity_replace_section_strips_duplicate_header() (lines 1279-1323)
Comprehensive test: Validates both the fix (no duplicate headers) AND existing behavior (section replacement still works)
Clear test scenario: Mirrors real LLM behavior patterns

Documentation:

Updated docstring: Clarifies expected behavior at line 626
Inline comments: Excellent context about why this fix is needed (LLM behavior)
Issue reference: Properly references [BUG] edit_note creates duplicate headings #390 in code comments

Architecture Alignment ✅

Preserves Basic Memory Principles:

✅ Atomic operation: No side effects, pure content transformation
✅ Local-first: File-based operation, no network dependencies
✅ Knowledge graph integrity: Doesn't affect relations or observations
✅ MCP tool compatibility: Defensive fix improves MCP tool robustness

Design Pattern:
This follows the defensive programming pattern appropriate for LLM interactions - the tool gracefully handles incorrect LLM behavior rather than failing or producing malformed output.

Security & Performance ✅

Security:

✅ No user input vulnerabilities
✅ No injection risks
✅ No secrets or credentials

Performance:

✅ Minimal overhead (single string split and comparison)
✅ O(n) complexity where n = content length (same as before)
✅ No memory leaks or excessive allocations

Minor Observations

Edge Cases to Consider (already handled):

✅ Empty content after stripping header - handled by .lstrip() and list slicing
✅ Headers with varying whitespace - handled by .strip() on both sides
✅ Case sensitivity - uses exact match (appropriate for markdown headers)
✅ Multiple headers in content - only strips first line (correct behavior)

Potential Enhancement (not required):
The current implementation only strips the header if it's on the first line after lstrip. Consider this edge case:

content = "  \n  \n## Testing\nNew content"  # Multiple blank lines before header

Current behavior: Won't strip (header not on first line after lstrip)
Expected: Might want to strip even with blank lines above

However, this is likely acceptable because:

LLMs typically generate ## Testing\nContent, not with leading blank lines
The current behavior is safer (less aggressive stripping)
Can be addressed in a future PR if it becomes an issue

Test Plan Verification

From PR Description:

Existing tests pass - Needs human verification (run just test)
New regression test passes - Code review confirms test is well-written
Manual testing with LLM confirms fix - Needs human verification

Checklist Results

Code Quality & Standards ✅

Follows Basic Memory's coding conventions in CLAUDE.md
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices (N/A - no DB changes)
FastAPI and Typer conventions followed (N/A - service layer only)
100-character line length limit maintained
No commented-out code blocks

Testing & Documentation ✅

Unit tests for new functions/methods
Integration tests for new MCP tools (N/A - existing tool enhanced)
Test coverage for edge cases
Documentation updated (README, docstrings)
CLAUDE.md updated if conventions change (N/A - no convention changes)

Basic Memory Architecture ✅

MCP tools follow atomic, composable design
Database changes include Alembic migrations (N/A - no schema changes)
Preserves local-first architecture principles
Knowledge graph operations maintain consistency
Markdown file handling preserves integrity
AI-human collaboration patterns followed

Security & Performance ✅

No hardcoded secrets or credentials
Input validation for MCP tools
Proper error handling and logging
Performance considerations addressed
No sensitive data in logs or commits

Recommendation

APPROVE ✅

This PR is production-ready and should be merged after:

✅ Existing tests pass (run just test)
✅ Manual verification with LLM interaction (optional but recommended)

The implementation is exactly the right approach for this issue - defensive, well-tested, and minimal impact on existing functionality.

--- | Branch: claude/issue-390-20251021-1529

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: Strip duplicate headers in edit_note replace_section #396

fix: Strip duplicate headers in edit_note replace_section #396

Uh oh!

phernandez commented Oct 22, 2025

Uh oh!

github-actions bot commented Oct 22, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: Strip duplicate headers in edit_note replace_section #396

Are you sure you want to change the base?

fix: Strip duplicate headers in edit_note replace_section #396

Uh oh!

Conversation

phernandez commented Oct 22, 2025

Summary

Changes

Test Plan

Uh oh!

github-actions bot commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix: Strip duplicate headers in edit_note replace_section

Todo List

Review Summary

Code Quality & Standards ✅

Testing & Documentation ✅

Architecture Alignment ✅

Security & Performance ✅

Minor Observations

Test Plan Verification

Checklist Results

Code Quality & Standards ✅

Testing & Documentation ✅

Basic Memory Architecture ✅

Security & Performance ✅

Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Oct 22, 2025 •

edited

Loading