-
Notifications
You must be signed in to change notification settings - Fork 2
feat: add Mistral AI provider with dedicated OCR support #106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adds Mistral as a new LLM provider alongside Gemini, OpenAI, Claude, and Grok. Includes dedicated OCR capabilities via mistral-ocr-latest model for high-accuracy document text extraction, plus Pixtral vision models for interactive image analysis. Key additions: - 7 Mistral models (mistral-large, mistral-small, pixtral-large, pixtral-12b, codestral, ministral-8b, mistral-ocr-latest) - 6 MCP tools (ask, analyze_image, process_ocr, list_models, server_info, test_connection) - Complete test coverage (unit + integration tests) - Model loader support for Mistral provider - Configuration and dependency updates The dedicated OCR model provides 2000 pages/min processing at $1 per 1,000 pages, ideal for research document extraction. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
@copilot resolve conflicts |
|
@williamjameshandley I've opened a new pull request, #107, to work on those changes. Once the pull request is ready, I'll request review from you. |
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| file_content = file_path.read_bytes() | ||
| encoded_content = base64.b64encode(file_content).decode() | ||
| content_parts.append({ | ||
| "type": "image_url", | ||
| "image_url": f"data:image/jpeg;base64,{encoded_content}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve MIME type when attaching files
Non‑text files are always wrapped as image_url with a hard‑coded data:image/jpeg prefix (lines 88‑92), so passing a PNG, PDF, or any other binary via the ask tool sends bytes with the wrong MIME. Mistral will either reject the attachment or misinterpret it as a broken JPEG, meaning attached context is lost for any non‑text input. Derive the data URI from the actual file type instead of forcing JPEG.
Useful? React with 👍 / 👎.
| messages = [] | ||
| if system_instruction: | ||
| messages.append({ | ||
| "role": "system", | ||
| "content": system_instruction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Carry conversation history into image analysis
The analyze_image adapter accepts a history parameter for agent memory but builds the messages list from scratch with only the current system and user entries (lines 210‑214), never appending the prior conversation. Any caller using agent_name to maintain context for vision queries will silently lose past turns, so the model responds without the expected memory. Append the history entries before sending the request.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive Mistral AI provider support to the MCP Handley Lab toolkit, introducing 7 models with specialized OCR capabilities alongside general-purpose and vision models. The implementation follows the established patterns from other providers (Gemini, OpenAI, Claude, Grok) with complete test coverage and proper configuration management.
Key Changes:
- New Mistral provider with 7 models including dedicated OCR model (mistral-ocr-latest)
- 6 MCP tools for text generation, image analysis, OCR processing, and management
- Complete test suite with unit and integration tests
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/mcp_handley_lab/llm/mistral/tool.py |
Core implementation of Mistral provider with tools for ask, analyze_image, process_ocr, list_models, server_info, and test_connection |
src/mcp_handley_lab/llm/mistral/models.yaml |
Model configurations for 7 Mistral models with pricing, capabilities, and metadata |
src/mcp_handley_lab/llm/mistral/__init__.py |
Package initialization file |
src/mcp_handley_lab/llm/model_loader.py |
Added Mistral provider support to model loading infrastructure |
src/mcp_handley_lab/common/config.py |
Added mistral_api_key configuration field |
tests/unit/test_mistral_unit.py |
Unit tests for model configuration and helper functions |
tests/integration/test_mistral_integration.py |
Integration tests for all Mistral tools with API calls |
pyproject.toml |
Updated version to 0.13.0, added mistralai>=1.9.0 dependency and mcp-mistral entry point |
PKGBUILD |
Version bump to 0.13.0 for Arch Linux packaging |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
|
|
||
| # Generate session ID once at module load time | ||
| _SESSION_ID = f"_session_{os.getpid()}_{int(time.time())}" |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable _SESSION_ID is defined but never used in this file. The _get_session_id() function calls get_session_id(mcp) which generates the session ID dynamically. Consider removing this unused module-level variable to keep the code clean.
Note: Other providers (OpenAI, Claude) don't define this variable, while Gemini has it but it's also unused there.
| _SESSION_ID = f"_session_{os.getpid()}_{int(time.time())}" |
| MODEL_CONFIGS, DEFAULT_MODEL, _get_model_config_from_loader = load_provider_models("mistral") | ||
|
|
||
|
|
||
| def _get_session_id() -> LLMResult: |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type annotation for _get_session_id() is incorrect. It should return str (as returned by get_session_id(mcp)), not LLMResult.
This will cause type checking errors and misleads developers about the function's return value.
| def _get_session_id() -> LLMResult: | |
| def _get_session_id() -> str: |
| # For images, encode as base64 | ||
| file_content = file_path.read_bytes() | ||
| encoded_content = base64.b64encode(file_content).decode() | ||
| content_parts.append({ | ||
| "type": "image_url", | ||
| "image_url": f"data:image/jpeg;base64,{encoded_content}" |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MIME type for non-text files is hardcoded to image/jpeg, which is incorrect for PNG, WebP, and other image formats. This could cause issues with the Mistral API or lead to incorrect content type headers.
Consider using determine_mime_type(file_path) from mcp_handley_lab.llm.common (already imported) to determine the actual MIME type, similar to how it's done in the process_ocr function below (lines 426-436).
| # For images, encode as base64 | |
| file_content = file_path.read_bytes() | |
| encoded_content = base64.b64encode(file_content).decode() | |
| content_parts.append({ | |
| "type": "image_url", | |
| "image_url": f"data:image/jpeg;base64,{encoded_content}" | |
| # For images or other binary files, encode as base64 and use correct MIME type | |
| file_content = file_path.read_bytes() | |
| encoded_content = base64.b64encode(file_content).decode() | |
| mime_type = determine_mime_type(file_path) | |
| content_parts.append({ | |
| "type": "image_url", | |
| "image_url": f"data:{mime_type};base64,{encoded_content}" |
| if output_file == "-": | ||
| import json | ||
| print(json.dumps(result, indent=2)) | ||
| else: | ||
| import json | ||
| Path(output_file).write_text(json.dumps(result, indent=2)) |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The json module is imported twice within the function scope (lines 479 and 482). Move the import to the top of the function or to the module level to avoid redundant imports.
| for image_item in images: | ||
| image_bytes = resolve_image_data(image_item) | ||
| encoded_image = base64.b64encode(image_bytes).decode() | ||
| content.append({ | ||
| "type": "image_url", | ||
| "image_url": f"data:image/jpeg;base64,{encoded_image}" | ||
| }) |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image MIME type is hardcoded to image/jpeg for all images. This is incorrect for PNG, WebP, GIF, and other image formats. The actual MIME type should be determined from the image data.
Consider detecting the MIME type from the image bytes or file extension, similar to how it's done in the process_ocr function. The resolve_image_data function returns bytes, so you could use libraries like imghdr or inspect the file signature to determine the correct MIME type.
| focus: str = Field( | ||
| default="general", | ||
| description="The area of focus for the analysis (e.g., 'ocr', 'objects'). Note: This is a placeholder parameter in the current implementation.", | ||
| ), |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The focus parameter is documented as "a placeholder parameter in the current implementation" but is still passed to process_llm_request (line 367). This creates an unused parameter that serves no functional purpose.
Consider either:
- Removing the parameter entirely if it's not implemented
- Implementing the functionality if it's needed
- Removing it from the function call if it's truly just for future compatibility
Unused parameters can confuse users about what functionality is actually available.
| focus: str = Field( | |
| default="general", | |
| description="The area of focus for the analysis (e.g., 'ocr', 'objects'). Note: This is a placeholder parameter in the current implementation.", | |
| ), |
Merge master into feat/mistral-provider and update Mistral tool to match the new LLM tool patterns from PR #105: - Remove max_output_tokens parameter from ask() and analyze_image() - Make output_file a required parameter (no default) - Add "Only change if user explicitly requests" guidance for model/temperature - Update process_ocr to write directly to file (no stdout option) - Bump version to 0.14.0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Rename --tokens to --token-format (CLI breaking change) - Update valid token format options: format, raw (removed only, none) - Update valid sort options: name_asc, name_desc, date_asc, date_desc (removed tokens_asc, tokens_desc) - Remove include_priority parameter (no longer supported) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Rename --tokens to --token-format (code2prompt CLI change) - Update token format options: format, raw - Update sort options: name_asc, name_desc, date_asc, date_desc - Remove include_priority parameter (no longer supported) - Fix Mistral unit tests to not check for supports_vision in MODEL_CONFIGS (model loader only includes output_tokens) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Remove mail-parser (unused - only stdlib email.parser is used) - Remove pyfzf (unused) - Move chromadb to optional dependency (unused, for future semantic features) - Fix python-nbclient -> jupyter-nbclient (correct Arch package name) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Add new Mistral AI tools: - transcribe_audio(): Voxtral audio transcription with timestamps - get_embeddings(): text/code embeddings via mistral-embed and codestral-embed - moderate_content(): content safety analysis - fill_in_middle(): FIM code completion for Codestral Update model catalog to v25.12: - Frontier: Mistral Large 3, Medium 3.1, Small 3.2 - Edge: Ministral 3B/8B with vision - Reasoning: Magistral Medium/Small 1.2 (40k output tokens) - Coding: Codestral, Devstral - Audio: Voxtral Small/Mini - Vision: Pixtral Large/12B - Specialist: OCR, Moderation - Embeddings: mistral-embed, codestral-embed Add capability flags to model_loader for Mistral provider. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Iteration 2 fixes: - Use mimetypes.guess_type() for proper MIME detection in _resolve_files - Add conversation history to analyze_image adapter for multi-turn context - Add supports_grounding to model_loader capability flags - Remove unused _SESSION_ID and its imports (os, time) Iteration 3 fixes: - Fail fast with clear error for unsupported file types (non-text, non-image) - Add max 16 texts validation in get_embeddings before API call - Use model_dump() for Pydantic models in moderation category extraction - Update test to expect ValueError for unsupported file types Reviewed and APPROVED by GPT-5 iterative review. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #106 +/- ##
=========================================
Coverage ? 42.60%
=========================================
Files ? 45
Lines ? 5049
Branches ? 0
=========================================
Hits ? 2151
Misses ? 2898
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
The Mistral SDK returns categories as a plain dict, not a Pydantic model. Add isinstance(cats, dict) check before trying model_dump or vars(). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Magistral models return ThinkChunk/TextChunk lists instead of plain strings. Add _extract_text_content() helper to handle both formats, wrapping thinking content in <thinking> tags. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
8a2055c to
582cdd0
Compare
Add include_thinking parameter to ask() to control whether reasoning model thinking is included in output. Defaults to False (final answer only). Set True to see step-by-step reasoning wrapped in <thinking> tags. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Adds Mistral as a new LLM provider alongside Gemini, OpenAI, Claude, and Grok. Includes dedicated OCR capabilities via mistral-ocr-latest model for high-accuracy document text extraction, plus Pixtral vision models for interactive image analysis.
Key additions:
The dedicated OCR model provides 2000 pages/min processing at $1 per 1,000 pages, ideal for research document extraction.
🤖 Generated with Claude Code