
Conversation


Copilot AI commented Nov 24, 2025

Description

The PR branch had an unrelated history from a grafted commit, causing Git to reject merges with "fatal: refusing to merge unrelated histories". Resolved by merging origin/master into the feature branch with --allow-unrelated-histories, preserving all Mistral functionality while incorporating the latest master changes (v0.13.1, dependency updates, bug fixes).

Merge resolution:

  • Accepted all master versions for conflicting files (29 conflicts across core tools, models, tests)
  • Preserved Mistral additions: source files, tests, dependencies, model loader integration
  • Branch now has proper git history lineage from master
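The merge strategy above can be sketched as follows. This is a self-contained demonstration using two throwaway repositories as stand-ins for the feature branch and origin/master; directory names, identities, and commit messages are illustrative, not the repository's actual ones.

```shell
# Reproduce and resolve "refusing to merge unrelated histories" with two
# throwaway repos standing in for the feature branch and origin/master.
tmp=$(mktemp -d) && cd "$tmp"
git init -q feature && git init -q master-repo
git -C master-repo -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "master history"
git -C feature -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "grafted feature history"
cd feature
git fetch -q ../master-repo HEAD

# Without the flag, Git refuses: the two branches share no common ancestor.
git merge FETCH_HEAD 2>&1 | grep -q "refusing to merge unrelated histories" \
    && echo "merge rejected as expected"

# The fix used in this PR: allow the unrelated histories, then resolve any
# conflicts by taking master's side (git checkout --theirs <path>).
git -c user.email=ci@example.com -c user.name=ci \
    merge --allow-unrelated-histories FETCH_HEAD -m "Merge master"
```

After this merge the feature branch has a merge commit with both lineages as parents, which is what restores a proper history from master.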

Changes from original PR (preserved):

  • 7 Mistral models with dedicated OCR support via mistral-ocr-latest
  • 6 MCP tools: ask, analyze_image, process_ocr, list_models, server_info, test_connection
  • Complete test coverage (unit + integration)
  • Model loader support for Mistral provider
  • mistralai>=1.9.0 dependency and mcp-mistral entry point

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Related Issue

Resolves merge conflict in #106

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have bumped the version number using python scripts/bump_version.py.
  • My code follows the project's coding standards (ruff format . and ruff check . pass).
  • I have added tests that prove my fix is effective or that my feature works.
  • All new and existing tests passed (pytest).

This PR may have been created by an AI assistant on behalf of a user



williamjameshandley and others added 3 commits November 19, 2025 22:51
* feat: add Gemini 3 Pro support with enhanced capabilities

Add support for Google's Gemini 3 Pro (gemini-3-pro-preview), the most
intelligent model with state-of-the-art reasoning and multimodal understanding.

Key Changes:
- Added gemini-3-pro-preview model configuration with 1M context window and 64k output
- Updated pricing: $2/$12 per 1M tokens (<200k), $4/$18 per 1M tokens (>200k)
- Set gemini-3-pro-preview as new default model
- Added model capabilities: thinking_level, media_resolution, thought signatures
- Updated display categories to include "Gemini 3 Series"
- Added comprehensive usage notes for Gemini 3 features
- Updated unit tests to include new model and validate configuration
- Bumped version to 0.13.0

Model Features:
- Best for agentic workflows, autonomous coding, complex multimodal tasks
- Supports thinking_level parameter (low/high) for reasoning depth control
- Supports media_resolution parameter for multimodal vision processing
- Uses thought signatures for maintaining reasoning context
- Defaults to temperature 1.0 (strongly recommended per documentation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor: streamline Gemini model configuration and pricing

Remove deprecated models and update to current generation:
- Remove all Gemini 1.5 models (1.5-pro, 1.5-flash, 1.5-flash-8b)
- Update image generation models to official versions (imagen-4.0-generate-001, etc.)
- Add missing imagen-4.0-fast-generate-001 model ($0.02/image)
- Update veo-2 to official veo-2.0-generate-001 naming

Fix pricing inaccuracies:
- Correct audio pricing for flash-lite: $0.30/1M tokens (was $0.50)
- Add free_tier flag to gemini-2.5-pro

Improve tool descriptions and documentation:
- Update tool descriptions to explicitly recommend Gemini 3 Pro Preview
- Clarify model parameter examples to reference current models
- Streamline usage notes with clearer, more concise language
- Update display categories to emphasize Gemini 3 recommendation

Update tests:
- Remove Gemini 1.5 model references from unit tests
- Update expected model list to match current configuration
- All 9 unit tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor: remove code2prompt references from LLM tool descriptions

- Remove "For code reviews, use code2prompt first" from all LLM ask() tools
  (Gemini, OpenAI, Claude, Grok)
- Update code2prompt description to clarify it generates summaries "for LLM analysis"
- Code2prompt is a separate skill and should not be cross-referenced in tool descriptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor: discourage max_output_tokens usage across all LLM tools

Update max_output_tokens parameter descriptions to strongly discourage usage:
- Changed from "0 means use model's default maximum"
- To "Rarely needed - leave at 0 to use model's maximum output. Only set if you specifically need to limit response length."

This discourages Claude Code from unnecessarily setting max_output_tokens,
which was an annoying behavior that limited response lengths when not needed.

Applied to all LLM tools:
- Gemini (ask, analyze_image)
- OpenAI (ask, analyze_image)
- Claude (ask, analyze_image)
- Grok (ask, analyze_image)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: add GPT-5.1 and update OpenAI models with latest pricing

- Add GPT-5.1 as new flagship model with configurable reasoning
- Add GPT-5 pro for highest quality outputs ($15/$120 per 1M tokens)
- Add o3-pro, o3-mini reasoning models
- Update o3, o4-mini, o1, o1-mini pricing to match official pricing
- Add cached input support for o-series models
- Update default model from gpt-5 to gpt-5.1
- Update tool descriptions to recommend gpt-5.1
- Update tests to include new models

Pricing verified from https://platform.openai.com/docs/pricing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor: remove deprecated OpenAI models to streamline configuration

Removed 9 deprecated OpenAI models to maintain a lean configuration:
- gpt-5-chat-latest (ChatGPT-specific, not recommended for API use)
- o1-preview (deprecated preview version)
- gpt-4o-2024-11-20, gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18 (dated snapshots)
- gpt-4-turbo (superseded by gpt-4.1 with better pricing)
- gpt-4 (original GPT-4 with limited 8K context)
- gpt-3.5-turbo (outclassed by gpt-5-nano)
- dall-e-2 (superseded by dall-e-3)

Updated tests:
- test_openai_unit.py: Updated expected models and added tests for new models
- test_common_unit.py: Replaced o1-preview test with o3
- test_image_generation_integration.py: Updated dall-e-2 tests to use dall-e-3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: update Grok and Claude models to latest offerings

Update Grok models:
- Remove: grok-4, grok-2-1212 (superseded)
- Add: grok-4-fast-reasoning, grok-4-fast-non-reasoning, grok-4-0709
- Change default model to grok-4-fast-reasoning
- Update pricing and context windows (2M tokens for fast models)
- Update usage notes for 2M context and 20MB image limit
- Update all unit tests to reflect new model names

Update Claude models:
- Remove: All Claude 3.x and legacy 3.5 models (9 models)
- Add: claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, claude-opus-4-1-20250805
- Change default model to claude-sonnet-4-5-20250929
- Update model aliases: sonnet→4.5, haiku→4.5, opus→4.1
- Add extended thinking support for all models
- Update cache pricing for new models
- Update test_connection to use claude-haiku-4-5-20251001
- Update analyze_image to use DEFAULT_MODEL
- Update all unit tests to reflect new model names

Both providers now focus on latest generation models only:
- Grok: 8 models (3 Grok-4 variants, 2 Grok-3, 3 specialized)
- Claude: 3 models (Sonnet 4.5, Haiku 4.5, Opus 4.1)

All unit tests passing for both providers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: resolve Pydantic validation errors and add vision tags to GPT-5 models

- Change Field type from str to str | None for optional prompt parameters
- Allows None values for system_prompt, prompt_file, etc.
- Add vision tags to all GPT-5 models (verified via OpenAI docs)
- All 8 GPT-5 integration tests now passing

* fix: convert Gemini MIME tests to MCP protocol and update to current models

- Replace direct gemini_ask() calls with MCP call_tool() for proper Field validation
- Update all tests to async functions with @pytest.mark.asyncio
- Replace deprecated gemini-1.5 models with current gemini-2.5 models
- Change response handling from object attributes to dictionary access
- 7/9 tests passing (2 transient 503 API overload errors)

* fix: add Imagen 4 preview model to Gemini pricing config

Adds imagen-4.0-generate-preview-06-06 model entry to resolve test failure.
Preview model uses same pricing as standard Imagen 4 ($0.040 per image).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: convert all LLM integration tests to MCP protocol

Converted 13 test functions from direct function calls to MCP protocol:
- test_llm_ask_basic, test_llm_ask_with_files (already done)
- test_llm_analyze_image
- test_llm_memory_disabled
- test_llm_server_info
- test_llm_input_validation
- test_llm_error_scenarios
- test_llm_response_metadata_fields
- test_openai_logprobs_configuration
- TestLLMMemory class (2 methods)
- test_llm_prompt_file_basic
- test_llm_prompt_file_with_template_vars
- test_llm_system_prompt_file_with_templates
- test_llm_prompt_file_xor_validation

Changes:
- Updated all parametrize signatures to include mcp and provider
- Made all functions async with @pytest.mark.asyncio
- Converted direct calls to await mcp.call_tool()
- Changed provider detection from module checks to provider names
- Changed response access from object attributes to dictionary keys
- Removed system_prompt=None and other unnecessary defaults

All 51 tests now collect successfully without errors.
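The conversion pattern can be sketched with a stdlib stand-in for the FastMCP server. FakeMCP, the ask tool, and the test name here are illustrative only; the point is the shape of the change: an async test that awaits call_tool() and reads the response by dictionary key rather than attribute.

```python
import asyncio

class FakeMCP:
    """Stdlib stand-in for a FastMCP server: dispatches tools by name and
    returns plain dicts, which is why the converted tests switched from
    attribute access to key access."""

    def __init__(self):
        self._tools = {}

    def tool(self, fn):
        self._tools[fn.__name__] = fn
        return fn

    async def call_tool(self, name, arguments):
        return {"content": self._tools[name](**arguments)}

mcp = FakeMCP()

@mcp.tool
def ask(prompt, system_prompt=None):
    return f"echo: {prompt}"

# Before: response = ask(prompt="Hello"); assert response.content
# After: go through the protocol layer, async, and access by key.
async def test_ask_via_protocol():
    result = await mcp.call_tool("ask", {"prompt": "Hello"})
    assert result["content"] == "echo: Hello"

asyncio.run(test_ask_via_protocol())
print("ok")
```

In the real test suite the same shape uses @pytest.mark.asyncio on the test function and the actual FastMCP instance in place of FakeMCP.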

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: update Claude test model from 3.5 to 4.5 Haiku

Changed test model from claude-3-5-haiku-20241022 (removed) to
claude-haiku-4-5-20251001 (current) in llm_providers and error_scenarios.

This fixes test failures due to missing model in pricing config.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: convert system prompt tests to MCP protocol and update model configs

- Convert all system prompt integration tests from direct function calls to MCP protocol
- Update all test parametrizations to include mcp and provider parameters
- Fix Pydantic FieldInfo validation issues by removing system_prompt=None defaults
- Update Claude model references from 3.5 to 4.5 series
- Add imagen-4.0-generate-preview-06-06 to Gemini models.yaml
- Update test expectations for new default models:
  - Gemini: gemini-3-pro-preview
  - OpenAI: gpt-5.1
  - Claude: claude-sonnet-4-5-20250929
- Fix Union type handling in test_llm_provider_wiring.py to support Python 3.10+ syntax
- Update model loader tests to match new model names and categories

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: convert unhappy paths tests to MCP protocol

Convert all test methods in test_llm_unhappy_paths.py from direct function
calls to MCP protocol using call_tool(). This fixes 36 test failures caused
by Pydantic FieldInfo validation errors when using direct function calls.

Changes:
- Import MCP instances instead of individual functions
- Update parametrize lists to include mcp and provider parameters
- Convert all 13 test methods to async with @pytest.mark.asyncio
- Replace direct function calls with await mcp.call_tool()
- Update response access from object attributes to dictionary keys
- Update Claude model references to current versions
- Add ToolError to exception handling for MCP error wrapping
- Update pytest.raises() blocks to expect ToolError
- Update all try/except blocks to catch ToolError

Test classes converted:
- TestLLMRateLimitingErrors (1 method)
- TestLLMLargeInputHandling (3 methods)
- TestLLMFileInputErrors (3 methods)
- TestLLMImageAnalysisUnhappyPaths (2 methods)
- TestLLMProviderSpecificErrors (2 methods)
- TestLLMOutputFileErrors (2 methods)

Results: All 35 unhappy paths tests now pass (was 0 passing, 35 failing)

* fix: fix LLM integration test validation and model references

Fixed 9 failing tests in test_llm_integration.py by:
1. Added ToolError import from mcp.server.fastmcp.exceptions
2. Updated test_llm_input_validation to expect ToolError instead of ValueError/RuntimeError
3. Updated test_llm_prompt_file_xor_validation to expect ToolError instead of ValueError
4. Updated output_file validation error assertion to handle directory errors
5. Updated Claude image analysis model from deprecated 'claude-3-5-sonnet-20240620' to current 'claude-sonnet-4-5-20250929'

All LLM integration tests now pass (39 passed, 12 skipped).
Reduced total failures from 9 to 7 (822 passed vs 820 previously).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: fix remaining integration test failures

Fixed 6 additional test failures:

1. test_arxiv_version_handling:
   - Added "403" and "forbidden" to acceptable version error keywords
   - ArXiv can return 403 Forbidden for nonexistent versions

2. test_gemini_grounding_metadata_fields:
   - Marked as skip due to Gemini API returning empty responses
   - Needs cassette re-recording with valid Gemini grounding response

3. test_gemini_grounding_search_entry_point_structure:
   - Already passing (no changes needed)

4. test_google_calendar_search_events_empty_date_range:
   - Changed from dynamic dates (datetime.now() + 3650 days) to fixed dates
   - Now uses "2035-08-11" and "2035-08-12" to match VCR cassette
   - Prevents cassette mismatch due to daily date changes

Test results:
- Before: 822 passed, 7 failed, 26 skipped
- After: 823 passed, 0 failed, 27 skipped ✅

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: handle missing grounding metadata keys in Gemini tool

- Use direct key access for fail-fast behavior per CLAUDE.md philosophy
- SDK converts API camelCase keys to Python snake_case via to_json_dict()
- Will raise KeyError if Google changes the grounding metadata schema
- No defensive programming - assume happy paths, fail loud on API changes

* fix: handle empty grounding metadata and remove agent from grounding test

- Use direct key access for fail-fast per CLAUDE.md
- Handle empty {} grounding_metadata (happens with conversational history)
- Remove agent_name from test to prevent cross-test contamination
- Test now passes in isolation and when run with other tests

* fix: increase max_output_tokens for grounding metadata test to prevent empty responses

* fix: add 403 Forbidden to acceptable ArXiv error responses

* style: apply pre-commit formatting fixes

* fix: remove corrupted ArXiv VCR cassettes

The cassettes contained corrupted gzip headers (\x1F\x08 instead of \x1F\x8B) that cannot be reliably fixed. Tests will use /tmp cache locally or generate fresh cassettes in CI.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* fix(vcr): preserve binary gzip data in cassettes

Root cause identified by Gemini 3:
- scrub_oauth_tokens was using decode("utf-8", errors="ignore")
- This silently deleted invalid UTF-8 bytes like \x8B (139)
- Result: gzip magic bytes \x1F\x8B\x08 became \x1F\x08\x08

Fix:
- Use strict decode() and catch UnicodeDecodeError for binary data
- Set decode_compressed_response=False to preserve gzip encoding
- Binary responses (gzip/tar) now pass through unchanged

Diagnosis saved to: /tmp/gemini_vcr_fix_advice.md
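The decode fix can be sketched as follows. scrub_body and the token-redaction regex here are illustrative stand-ins for the repository's scrub_oauth_tokens helper; the binary pass-through behaviour is the part described above.

```python
import re

def scrub_body(body: bytes) -> bytes:
    """Redact tokens in text bodies; pass binary bodies through untouched."""
    try:
        text = body.decode("utf-8")  # strict decode: raises on binary payloads
    except UnicodeDecodeError:
        return body                  # gzip/tar responses pass through unchanged
    return re.sub(r"ya29\.[\w.-]+", "REDACTED", text).encode("utf-8")

gzip_magic = b"\x1f\x8b\x08"
assert scrub_body(gzip_magic) == gzip_magic

# The old behaviour silently dropped the invalid byte \x8b:
assert gzip_magic.decode("utf-8", errors="ignore").encode() == b"\x1f\x08"
```

With errors="ignore", any byte that is not valid UTF-8 simply vanishes, which is exactly how the gzip magic number lost its \x8B and corrupted the cassettes.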

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
….13.1) (#103)

**Remove ineffective branch protection CI workflow:**
The GitHub Actions workflow cannot actually block direct pushes - it only
fails AFTER the push is already accepted. This gives a false sense of
security. Actual branch protection must be configured in GitHub Settings
→ Branches with "Do not allow bypassing the above settings" enabled.

**Update PKGBUILD dependencies:**
- Add conflicts=('python-mcp-handley-lab-git') to prevent parallel installation
- Add missing runtime dependencies:
  - python-numpy>=1.24.0 (used in embeddings)
  - python-chromadb>=1.0.0
  - python-mail-parser>=3.15.0
  - python-nbclient>=0.8.0
  - jupyter-nbformat>=5.0.0
  - python-pyfzf>=0.3.1
  - python-dateutil>=2.8.0
- Add python-vcrpy>=4.0.0 to checkdepends for test cassettes
- Remove python-pybase64-git (not in pyproject.toml)
- Move python-vcrpy from depends to checkdepends (dev-only)

All dependencies verified via AST analysis of source imports.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
Resolved unrelated histories by merging origin/master into the Mistral AI provider PR branch. This brings the PR up to date with the latest master changes while preserving all Mistral functionality.

Changes from master:
- Version bump to 0.13.1
- Latest dependency updates (numpy, etc.)
- Bug fixes and enhancements from recent PRs

Mistral additions preserved:
- 7 Mistral models (mistral-large, mistral-small, pixtral-large, pixtral-12b, codestral, ministral-8b, mistral-ocr-latest)
- 6 MCP tools (ask, analyze_image, process_ocr, list_models, server_info, test_connection)
- Complete test coverage (unit + integration tests)
- Model loader support for Mistral provider
Copilot AI changed the title [WIP] Add Mistral AI provider with dedicated OCR support fix: resolve merge conflicts by merging master into Mistral PR branch Nov 24, 2025
Copilot finished work on behalf of williamjameshandley November 24, 2025 14:15