
Conversation

@richbodo (Owner) commented Nov 16, 2025

Contributor License Statement

By submitting this pull request, I confirm that:

  • I have the rights to contribute this code.
  • I license this contribution under the GPLv3 license.
  • I grant a patent license consistent with the GPLv3 terms.

Signed-off-by: (type your GitHub handle)


Note

Updates default LLM selection to prefer gpt-oss:20b and supported models, expands CLI/TUI usage docs, and adds a spec plus a repro test for the llama3-8b-local communication issue.

  • LLM selection:
    • Update prt_src/llm_model_registry.get_default_model() to prioritize gpt-oss:20b, then officially supported models, then any available model.
  • Docs:
    • README and docs/DEV_SETUP.md significantly expand CLI/TUI usage, debugging commands, and daily workflow, clarifying TUI as default and adding AI testing commands (prt-debug-info, list-models, chat examples).
  • Testing/Specs:
    • Add specs/bug_llama3_8b_local_communication_error.md detailing an Ollama 400 error with llama3-8b-local and steps to fix/validate.
    • Introduce test_llama3_communication.py to reproduce and compare model communication behavior.
  • Scripts:
    • Minor cleanup in init.sh (mac/Linux setup messaging).

Written by Cursor Bugbot for commit 9efc8ed. This will update automatically on new commits.

- Replace vague 'ALL SQL queries require confirm=true' with explicit parameter specification
- Add concrete example showing exact execute_sql tool call syntax (see the sketch after this list)
- Clarify that the tool rejects ANY SQL query without the confirm=true parameter
- Specify boolean data type and automatic execution behavior
- Eliminate ambiguity for different LLM providers about confirmation requirements
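
For reference, a minimal sketch of what such a call could look like, using the common OpenAI-style function-calling payload; the exact argument schema PRT's execute_sql tool expects is an assumption here:

# Illustrative only: an execute_sql tool call in OpenAI-style function-calling form.
# The surrounding schema is an assumption; the key point is the explicit confirm flag.
execute_sql_call = {
    "name": "execute_sql",
    "arguments": {
        "query": "SELECT COUNT(*) FROM contacts",
        "confirm": True,  # boolean; the tool rejects any query without confirm=true
    },
}
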
Phase 1: Extract Help System
- Create centralized help system in cli_modules/help.py (sketched below)
- Move help content to markdown file for single source of truth
- Update cli.py to use new help module
- Maintain backward compatibility
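
As a rough illustration of this extraction (not the project's actual code), a centralized help module might look like the following; the help.md filename and function names are assumptions:

# cli_modules/help.py -- minimal sketch; filenames and function names are assumptions
from pathlib import Path

HELP_FILE = Path(__file__).parent / "help.md"  # single markdown source of truth


def get_help_text() -> str:
    """Return the CLI help content loaded from the markdown file."""
    return HELP_FILE.read_text(encoding="utf-8")


def print_help() -> None:
    """Print the help content, falling back to Typer's built-in --help hint."""
    try:
        print(get_help_text())
    except FileNotFoundError:
        print("Help file missing; use --help for Typer's generated usage.")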

Phase 2: Extract Services
- Extract 5 service modules with zero UI dependencies:
  * export.py: Export and JSON serialization services
  * images.py: Profile image export services
  * directory.py: Interactive directory generation
  * import_google.py: Google import services
  * llm.py: LLM chat integration services
- Update imports in cli.py to use extracted services
- Remove duplicate function definitions
- All services are now independently testable

This refactoring improves maintainability and separation of concerns
while preserving all existing functionality.
… fixes

This commit completes the CLI modularization refactoring and resolves critical
compatibility issues that were causing test failures.

CLI Modularization (Phases 1-3):
- Split monolithic CLI into specialized modules
- Maintained backward compatibility layer
- Added enhanced functionality with new flags
- Fixed missing imports and test compatibility issues

Key Features:
- --prt-debug-info flag for system diagnostics
- --classic flag for forced CLI mode
- Complete modular architecture with 26+ new files
- Zero test regressions, all functionality preserved

Validation: All CLI tests pass (8/8), debug info tests pass (13/13)
- Remove custom help flag handling in favor of Typer's native --help
- Eliminate print_custom_help() import and manual help processing
- Streamline command interface by letting Typer handle help automatically
- Reduces code complexity while maintaining same functionality

This is a cleanup following the CLI modularization in PR #153.
- Add specific RuntimeError handling for critical errors (permissions, disk space)
- Provide informative stderr warnings when credential setup fails
- Differentiate between critical errors and unexpected errors
- Improve user guidance about potential system issues
- Continue gracefully while preserving error context

This improves robustness of the credential system introduced in the CLI
modularization, ensuring users get clear feedback about configuration issues.
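
A hedged sketch of the error-handling pattern described above; the helper name and messages are illustrative, not the project's actual code:

import sys


def run_credential_setup(setup_fn) -> None:
    """Run credential setup, warning on stderr instead of aborting (hypothetical helper)."""
    try:
        setup_fn()
    except RuntimeError as exc:
        # Critical but recognizable failures: permissions, disk space, etc.
        print(
            f"Warning: credential setup failed ({exc}). "
            "Check file permissions and available disk space.",
            file=sys.stderr,
        )
    except Exception as exc:  # unexpected errors: continue gracefully, keep context
        print(f"Warning: unexpected error during credential setup: {exc!r}", file=sys.stderr)
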
Security improvements:
- Replace random.choices() with secrets.choice() for Mistral tool call ID generation
- Use cryptographically secure random number generation for security-sensitive operations

Robustness improvements:
- Improve base URL processing to avoid corrupting domains like v1.example.com
- Use safer string manipulation for /v1 suffix removal
- Add explicit imports for security modules

These changes strengthen the security posture of the Mistral LLM integration
added in PR #153 and improve reliability of URL handling.
Use removesuffix() instead of rstrip() for URL handling to fix potential
issues with domains containing 'v1' in their name. This aligns with the
URL handling improvements from main branch.
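
A sketch of both changes, assuming a 9-character alphanumeric ID format and a simple base-URL normalizer (names and lengths are assumptions):

import secrets
import string


def generate_tool_call_id(length: int = 9) -> str:
    """Cryptographically secure tool call ID (alphabet and length are assumptions)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def normalize_base_url(base_url: str) -> str:
    """Strip a trailing /v1 without corrupting hosts such as v1.example.com."""
    return base_url.rstrip("/").removesuffix("/v1")  # removesuffix needs Python 3.9+
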
@chatgpt-codex-connector

💡 Codex Review

class LlamaCppLLM(BaseLLM):
    """LlamaCpp LLM client with tool calling support."""

    def __init__(
        self,
        api: PRTAPI,
        model_path: Optional[str] = None,
        n_ctx: int = 4096,
        n_gpu_layers: int = 0,
        n_threads: Optional[int] = None,
        timeout: Optional[int] = None,
        config_manager: Optional[LLMConfigManager] = None,
    ):
        """Initialize LlamaCpp LLM client.

        Args:
            api: PRTAPI instance for database operations
            model_path: Path to .gguf model file
            n_ctx: Context window size (default: 4096)
            n_gpu_layers: Number of layers to offload to GPU (default: 0 = CPU only)
            n_threads: Number of CPU threads (default: None = auto-detect)
            timeout: Request timeout in seconds
            config_manager: LLMConfigManager instance. If None, loads config automatically.
        """
        # Load configuration
        if config_manager is None:
            config_manager = LLMConfigManager()

        # Call parent constructor (gets tools automatically from registry)
        super().__init__(api, config_manager)

        # Use config values, falling back to explicit parameters
        self.model_path = model_path or config_manager.llm.model_path
        self.n_ctx = n_ctx if n_ctx != 4096 else getattr(config_manager.llm, "n_ctx", 4096)
        self.n_gpu_layers = (
            n_gpu_layers if n_gpu_layers != 0 else getattr(config_manager.llm, "n_gpu_layers", 0)
        )
        self.n_threads = n_threads or getattr(config_manager.llm, "n_threads", None)
        self.timeout = timeout if timeout is not None else config_manager.llm.timeout
        self.temperature = config_manager.llm.temperature

        # Validate model path
        if not self.model_path:
            raise ValueError("model_path is required for LlamaCpp provider")

        model_file = Path(self.model_path)
        if not model_file.exists():
            raise FileNotFoundError(f"Model file not found: {self.model_path}")

        # Extract friendly model name for display (filename without extension)
        self.model = model_file.stem  # e.g., "Meta-Llama-3-8B-Instruct-Q4_K_M"

        # Initialize llama-cpp-python

P1: Implement _get_model_name in LlamaCppLLM

After inheriting from BaseLLM, LlamaCppLLM no longer provides an implementation of _get_model_name. Because BaseLLM marks this method as @abstractmethod, attempting to instantiate LlamaCppLLM now raises TypeError: Can't instantiate abstract class ... with abstract method _get_model_name, so the llama-cpp backend becomes unusable. Add a concrete _get_model_name implementation that returns the loaded model’s identifier.
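
A minimal sketch of the missing override, assuming BaseLLM._get_model_name takes no arguments and that the display name derived in __init__ is the right identifier to return:

    # Sketch only: to be added inside LlamaCppLLM
    def _get_model_name(self) -> str:
        """Return the loaded model's identifier (filename stem, e.g. Meta-Llama-3-8B-Instruct-Q4_K_M)."""
        return self.model
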


prt/prt_src/llm_tools.py

Lines 324 to 338 in ab7ad25

def get_write_tool_names() -> Set[str]:
    """Get set of tool names that perform write operations."""
    return {
        "add_tag_to_contact",
        "remove_tag_from_contact",
        "create_tag",
        "delete_tag",
        "add_note_to_contact",
        "remove_note_from_contact",
        "create_note",
        "update_note",
        "delete_note",
        "add_contact_relationship",
        "remove_contact_relationship",
        "execute_sql",  # Provider-specific implementation

P1: Register execute_sql tool in shared registry

LLMToolRegistry.get_write_tool_names() advertises an execute_sql operation, but neither _create_read_tools nor _create_write_tools define a corresponding Tool. With the new BaseLLM initialization relying solely on this registry, any model that tries to call execute_sql receives Tool 'execute_sql' not found and SQL queries can no longer be executed via tool calling. Add a Tool definition for execute_sql (or remove it from the set) to restore the previous capability.
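
A hedged sketch of one way to register the tool; the Tool constructor's actual fields in llm_tools.py are not shown above, so name/description/parameters/handler are assumptions:

# Sketch only: field names assume a Tool type with name/description/parameters/handler,
# and api.execute_sql stands in for whatever handler the registry actually wires up.
execute_sql_tool = Tool(
    name="execute_sql",
    description="Execute a SQL query against the PRT database. Requires confirm=true.",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "SQL statement to run"},
            "confirm": {"type": "boolean", "description": "Must be true to execute"},
        },
        "required": ["query", "confirm"],
    },
    handler=lambda query, confirm=False: api.execute_sql(query) if confirm else None,
)
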

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Fix README.md CLI examples from outdated 'python -m prt_src.cli' to current 'python -m prt_src' structure
- Add comprehensive CLI testing and debugging examples including:
  * prt-debug-info for system health checks
  * CLI chat mode for testing AI functionality
  * Model testing and selection commands
- Update docs/DEV_SETUP.md with current CLI patterns and troubleshooting workflows
- Add dedicated CLI Testing & Debugging Commands section with practical examples
- Modernize interface descriptions to reflect TUI as default with CLI as power-user tool
- Include specific examples mentioned for development: chat testing, directory generation, system diagnostics

The CLI is now properly documented as an excellent testing and debugging tool for developers.
Changes the model selection logic in OllamaModelRegistry.get_default_model()
to prioritize gpt-oss:20b as the primary default, followed by any officially
supported models, then any available models.

New priority order:
1. gpt-oss:20b (preferred default for tool calling support)
2. Any officially supported model (mistral:7b-instruct, llama3:8b, etc.)
3. Any available model (fallback)

This ensures users get the recommended model for PRT's tool calling features
when no specific model is requested via command line.
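
A self-contained sketch of that priority order (function and parameter names here are illustrative; the real logic lives in OllamaModelRegistry.get_default_model()):

from typing import Iterable, Optional

PREFERRED_DEFAULT = "gpt-oss:20b"


def pick_default_model(available: Iterable[str], supported: Iterable[str]) -> Optional[str]:
    """Sketch of the priority: gpt-oss:20b, then any supported model, then any available."""
    names = set(available)
    if PREFERRED_DEFAULT in names:
        return PREFERRED_DEFAULT
    for name in supported:  # e.g. mistral:7b-instruct, llama3:8b
        if name in names:
            return name
    return next(iter(names), None)  # any installed model, else None
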
- Remove outdated SQLCipher comments from init.sh (Issue #41 migration complete)
- Add bug spec for llama3-8b-local communication error with Ollama
- Add test script to reproduce llama3 communication issues for debugging

These changes support the ongoing LLM model registry and Ollama integration work.
claude bot commented Nov 16, 2025

Code Review - PR #156

I've reviewed this pull request and have the following feedback:

Strengths

  1. Clear Documentation Improvements: The README and DEV_SETUP updates significantly improve discoverability of CLI commands and testing workflows. The new command tables are well-organized and helpful.

  2. Logical Model Priority: Changing the default model selection to prefer gpt-oss:20b and then fall back to supported models is a sensible approach that aligns with the existing supported models registry.

  3. Consistent Docstring Updates: The docstring for get_default_model() has been properly updated to reflect the new priority order.

⚠️ Concerns & Recommendations

1. Potential Test Failures

The existing test test_get_default_model_first_available in tests/unit/test_llm_model_registry.py may need updating. The test fixture creates models in this order:

  • llama3:8b
  • gpt-oss:20b
  • mistral:7b

With the new logic, the test should expect gpt-oss:20b to be returned (priority 1), not just "any cached model." The test assertion at line 517 is too loose:

# Current (may be incorrect now):
assert default in cached_models

# Should probably be:
assert default == "gpt-oss:20b"

Recommendation: Add a specific test case that verifies the priority order works correctly when gpt-oss:20b is available.

2. Dictionary Iteration Order

At line 419, the code iterates over supported_models dictionary:

for model_name in supported_models:

While Python 3.7+ guarantees dict insertion order, the actual priority might not be what you expect. The SUPPORTED_MODELS dict in llm_supported_models.py has gpt-oss:20b first, but if you want guaranteed priority, consider using a list or explicitly ordering by support_status.

Recommendation: Either document that the dictionary order matters, or be more explicit about prioritization logic (e.g., prefer "official" status models over "experimental").

3. Import Placement

The import of get_supported_models is placed inside the function at line 404:

def get_default_model(self) -> Optional[str]:
    from prt_src.llm_supported_models import get_supported_models

While this avoids circular imports, it's inconsistent with the module's import style (see lines 8-19 where all imports are at the top).

Recommendation: If this is intentional to avoid circular imports, add a comment explaining why. Otherwise, move it to the top-level imports.

4. Missing Test Coverage for New Behavior

I don't see tests that verify:

  • The new priority logic selects gpt-oss:20b when available
  • The fallback to supported models works correctly
  • The integration with get_supported_models() functions properly

Recommendation: Add test cases like:

def test_get_default_model_prefers_gpt_oss():
    """Test that gpt-oss:20b is preferred when available."""
    ...

def test_get_default_model_fallback_to_supported():
    """Test fallback to supported models when gpt-oss:20b unavailable."""
    ...

📝 Minor Documentation Issues

  1. README.md Line 91: "AI-Powered Chat (Great for Testing & Development)" - Consider adding a note that Ollama must be running with models installed for these commands to work.

  2. DEV_SETUP.md Line 70: The "Notes" column could clarify that prt-debug-info is the first command to run when troubleshooting (it's mentioned in Pro Tips but could be in the table too).

🔒 Security & Performance

  • ✅ No security concerns identified
  • ✅ No performance issues (caching is already in place)
  • ✅ No breaking changes to public APIs

Overall Assessment

This is a solid improvement to the codebase. The documentation updates are valuable, and the model selection logic is clearer. However, the existing tests may need updating to match the new behavior, and additional test coverage would strengthen confidence in the priority fallback logic.

Recommendation:

  1. Update/add tests for the new default model selection logic
  2. Consider clarifying the dictionary iteration order behavior
  3. Run the full test suite to verify no regressions

Would you like me to help write the additional test cases?

@richbodo merged commit 113a030 into main on Nov 16, 2025 (3 of 7 checks passed)
claude bot commented Nov 16, 2025

Code Review: Feature/mistral support and cli enhancements

Overall Assessment

This PR makes several valuable improvements to LLM model selection, documentation, and testing infrastructure. The changes are well-structured and align with the project's goals. However, there are some concerns around code organization, test coverage, and potential issues that should be addressed.

Summary:

  • Documentation improvements are excellent and significantly improve usability
  • Model selection logic is improved with better defaults
  • ⚠️ Test file placement violates project structure conventions
  • ⚠️ Spec file is extensive but describes unimplemented work
  • ⚠️ Import optimization in model registry could cause issues

Detailed Feedback

1. Code Quality & Best Practices

Good: Model Selection Logic (prt_src/llm_model_registry.py:393-426)

The new get_default_model() implementation is a clear improvement:

  • Well-documented priority system (gpt-oss:20b → supported models → any model)
  • Good use of sets for O(1) lookups (model_names = {model.name for model in models})
  • Appropriate logging at each decision point

Minor suggestion: The import at line 404 is inside the method:

from prt_src.llm_supported_models import get_supported_models

While this avoids circular imports, it's unconventional. Consider:

  1. Moving to module-level if no circular dependency exists
  2. If circular dependency is unavoidable, add a comment explaining why:
# Import here to avoid circular dependency between registry and supported_models
from prt_src.llm_supported_models import get_supported_models

⚠️ Concern: llama3:8b Still Experimental (prt_src/llm_supported_models.py:87-102)

The PR description mentions "mistral support" but doesn't actually change llama3:8b from "experimental" to "official". The spec file specs/bug_llama3_8b_local_communication_error.md describes a plan to fix llama3-8b-local, but no implementation is included.

Recommendation: Either:

  1. Implement the llama3 fixes described in the spec, OR
  2. Remove the spec file and test file from this PR and create a separate issue/PR for llama3 support

2. Test Coverage & Testing Concerns

Critical: Test File in Wrong Location (test_llama3_communication.py)

The file test_llama3_communication.py is placed at the repository root instead of the tests/ directory. This violates the project structure as documented in CLAUDE.md.

Per CLAUDE.md testing guidelines:

# Run all tests
./prt_env/bin/pytest tests/ -v

Required fix:

  • Move to tests/integration/test_llama3_communication.py
  • Add proper pytest markers (@pytest.mark.integration)
  • Follow existing test patterns in tests/integration/

⚠️ Concern: Test File Tests Broken Functionality

The test file appears to be a reproduction script for a bug, not a proper test:

  • It prints to stdout instead of using assertions
  • It returns True/False instead of asserting expected behavior
  • The llama3-8b-local model is not in the supported models list

Expected behavior:
Either this should be:

  1. A test that validates the fix (but the fix isn't in this PR), OR
  2. A diagnostic script in utils/ or tools/ folder

⚠️ Missing Test Coverage

No tests were added for the new get_default_model() logic. Given that this is critical functionality, tests should verify:

  • Returns gpt-oss:20b when available
  • Falls back to supported models
  • Falls back to any model
  • Returns None when no models exist

Example test structure:

def test_get_default_model_prefers_gpt_oss_20b(mock_ollama_models):
    registry = OllamaModelRegistry()
    # Mock ollama to return gpt-oss:20b and other models
    ...
    assert registry.get_default_model() == "gpt-oss:20b"

3. Documentation Quality

Excellent: README.md Updates (README.md:43-141)

The documentation improvements are outstanding:

  • Clear examples of TUI vs CLI usage
  • Practical debugging commands (prt-debug-info, list-models)
  • Good structure with purpose-driven organization
  • Helpful for both users and developers

Minor suggestion: The AI chat examples could specify when to use which model:

# For general queries (best tool calling)
python -m prt_src --model gpt-oss-20b --chat "find friends"

# For lower resource systems
python -m prt_src --model mistral-7b-instruct --chat "count contacts"

Excellent: DEV_SETUP.md Enhancements (docs/DEV_SETUP.md:43-226)

The developer workflow table is very helpful. Great addition for onboarding and daily development.


4. Spec File Concerns

⚠️ Large Spec for Unimplemented Work (specs/bug_llama3_8b_local_communication_error.md)

This 157-line spec describes:

  • A bug analysis
  • Root cause investigation
  • Step-by-step implementation tasks
  • Validation commands

However, none of this work is implemented in the PR.

Recommendation:

  1. Option A (Preferred): Remove the spec file from this PR. Create a separate issue referencing the spec, then implement in a future PR.
  2. Option B: Implement the spec in this PR (but this would make it much larger and change the scope significantly).

Per CLAUDE.md:

Do what has been asked; nothing more, nothing less.

The spec describes work that hasn't been done yet, creating confusion about what this PR actually delivers.


5. Security & Performance

No Security Concerns

The changes don't introduce security issues:

  • No new external dependencies
  • No unsafe data handling
  • Documentation-only changes are safe

Performance: Model Selection Optimization

The use of sets for model name lookups is good:

model_names = {model.name for model in models}  # O(1) lookups

However, get_supported_models() returns a Dict.copy() which is unnecessary if we only need the keys. Minor optimization opportunity:

supported_model_names = get_supported_models().keys()
for model_name in supported_model_names:
    if model_name in model_names:
        ...

6. Code Organization

Good: Cleanup in init.sh

Removing obsolete SQLCipher comments (lines removed at 35 and 49) is good housekeeping.


Recommended Actions

Must Fix (Blocking)

  1. Move test file: test_llama3_communication.py → tests/integration/test_llama3_communication.py
  2. Clarify scope: Either implement llama3 fixes OR remove spec/test files and create separate issue

Should Fix (Recommended)

  1. Add tests for get_default_model() priority logic
  2. Document import placement in llm_model_registry.py line 404 if it's intentional to avoid circular imports
  3. Update test file to use pytest patterns (assertions, markers) instead of print statements

Nice to Have

  1. Add model selection guidance to README examples
  2. Minor performance optimization in get_default_model()

Testing Checklist

Before merging, verify:

  • All existing tests pass: ./prt_env/bin/pytest tests/ -v
  • New test file runs successfully in tests/ directory
  • Documentation examples work as written
  • Default model selection works with various Ollama configurations
  • No regressions in existing model selection

Conclusion

This PR delivers valuable improvements to documentation and model selection. The main issues are:

  1. Scope confusion: The spec describes llama3 fixes that aren't implemented
  2. Test organization: Test file in wrong location and doesn't follow pytest conventions
  3. Missing coverage: No tests for the new model selection logic

Recommendation: Address the test file placement and either implement the llama3 work or remove those files. The documentation improvements are excellent and ready to merge once these issues are resolved.

for model_name in supported_models:
    if model_name in model_names:
        logger.debug(f"Found supported default model: {model_name}")
        return model_name

Bug: Default Model Selection Ignores Documented Priority

The get_default_model() method claims to prioritize officially supported models but actually iterates through all models in the registry regardless of support status. The loop at line 419 iterates over supported_models dictionary keys without filtering by support_status, so it can return experimental or deprecated models before checking other official ones. This contradicts the documented priority and could select a lower-quality experimental model when better official alternatives exist.
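
One possible fix, sketched as a drop-in replacement for the loop quoted above, under the assumption that each supported-model entry exposes a support_status field:

# Sketch only: filter to officially supported entries before falling back further.
official_names = [
    name for name, info in get_supported_models().items()
    if getattr(info, "support_status", None) == "official"
]
for model_name in official_names:
    if model_name in model_names:
        logger.debug(f"Found supported default model: {model_name}")
        return model_name
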


import traceback

# Add prt_src to path
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "prt_src"))

Bug: Module Import Path Conflict

The path manipulation adds the prt_src subdirectory to sys.path, but the imports use from prt_src.api and from prt_src.llm_factory, which would look for modules at prt_src/prt_src/api.py. The correct fix is either to add the parent directory instead (sys.path.insert(0, os.path.dirname(__file__))) or change imports to from api import PRTAPI and from llm_factory import create_llm.


