@spartandingo

Summary

Adds production-grade streaming support for Vertex AI Gemini 3.0 models (Pro and Flash) to the rig-vertexai integration. The implementation follows Rig framework conventions, using the generic HttpClientExt trait abstraction for transport and GenericEventSource for SSE parsing.

Key Features

  • StreamingCompletionModel: Generic HTTP client support via HttpClientExt trait
  • SSE Parsing: Uses Rig's GenericEventSource for reliable event stream handling with automatic retry
  • Gemini 3.0 Only: Version-gated to support gemini-3-pro and gemini-3-flash, with clear error messages for unsupported models (a gating sketch follows this list)
  • Extended Thinking: Support for thoughtSignature metadata from Gemini 3.0 models
  • Tool Calling: Full function call support with proper argument passing
  • Token Tracking: Comprehensive token usage tracking including cached content, candidates, and thought tokens
  • Model Constants: Added GEMINI_3_PRO and GEMINI_3_FLASH constants
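
The version gating mentioned above boils down to a simple model-name check. A minimal sketch for illustration only; the free function and string error type here are hypothetical, not the crate's actual API:

```rust
/// Illustrative version gate: streaming is only enabled for Gemini 3.0
/// models, as described above. The function name and error type are
/// assumptions, not the crate's real signatures.
fn check_streaming_support(model: &str) -> Result<(), String> {
    const SUPPORTED: &[&str] = &["gemini-3-pro", "gemini-3-flash"];
    if SUPPORTED.contains(&model) {
        Ok(())
    } else {
        Err(format!(
            "streaming is only supported for Gemini 3.0 models \
             (gemini-3-pro, gemini-3-flash); got `{model}`"
        ))
    }
}
```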

Implementation Details

  • Streaming format differs between Gemini 2.5 and 3.0, so streaming is only enabled for Gemini 3.0+
  • Uses the http::Request builder rather than hardcoding a specific HTTP client implementation (see the sketch after this list)
  • Removes model constants for Gemini 2.5 and below (they don't support streaming)
  • Aligns with Rig's own Gemini provider streaming pattern in rig-core
  • Added to workspace dependencies: http = "1.3.1"
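
As a rough sketch of the builder-based approach: the endpoint path follows Vertex AI's streamGenerateContent REST shape, but the exact body type and headers the crate uses are assumptions:

```rust
use http::Request;

/// Hypothetical request construction via the generic `http::Request`
/// builder, avoiding any concrete HTTP client type. `url` is the full
/// streaming endpoint and `body_json` the serialized request payload.
fn build_stream_request(url: &str, body_json: String) -> http::Result<Request<String>> {
    Request::builder()
        .method("POST")
        .uri(url)
        .header("Content-Type", "application/json")
        // With `?alt=sse`, the response arrives as a text/event-stream.
        .header("Accept", "text/event-stream")
        .body(body_json)
}
```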

Testing

  • 4 new unit tests in the streaming module (an illustrative example follows this list), covering:
    • Text response deserialization
    • Function call deserialization with thoughtSignature
    • Token usage calculation
    • Final response token usage tracking
  • All 22 existing and new tests passing
  • No breaking changes to existing non-streaming completion API
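
For flavor, an illustrative test in the spirit of the ones above, assuming a serde struct whose fields mirror the Gemini part JSON; the crate's real type names may differ:

```rust
#[cfg(test)]
mod tests {
    use serde::Deserialize;

    /// Illustrative stand-in for a streamed Gemini `Part`; field names
    /// mirror the wire format, not necessarily the crate's types.
    #[derive(Deserialize)]
    #[serde(rename_all = "camelCase")]
    struct Part {
        text: Option<String>,
        thought_signature: Option<String>,
    }

    #[test]
    fn deserializes_text_part_with_thought_signature() {
        let json = r#"{"text":"hello","thoughtSignature":"sig-123"}"#;
        let part: Part = serde_json::from_str(json).unwrap();
        assert_eq!(part.text.as_deref(), Some("hello"));
        assert_eq!(part.thought_signature.as_deref(), Some("sig-123"));
    }
}
```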

Changes

  • rig-integrations/rig-vertexai/src/streaming.rs - New streaming module (415 lines)
  • rig-integrations/rig-vertexai/src/lib.rs - Export StreamingCompletionModel
  • rig-integrations/rig-vertexai/src/completion.rs - Add Gemini 3.0 model constants
  • rig-integrations/rig-vertexai/Cargo.toml - Add http dependency
  • Cargo.toml - Add http to workspace dependencies

- Implement StreamingCompletionModel<HttpClient: HttpClientExt> for generic HTTP client support
- Use Rig's GenericEventSource for SSE parsing with automatic retry handling
- Support Gemini 3.0 models (gemini-3-pro, gemini-3-flash) with extended thinking
- Support tool calling with function calls and thoughtSignature metadata
- Track token usage comprehensively (input, output, cached, thoughts)
- Gate by version: only Gemini 3.0+ models supported, with clear error messages
- Add 4 unit tests covering deserialization, tool calls, and token counting
- Remove model constants for Gemini 2.5 and lower (streaming unsupported)
- Add model constants for Gemini 3.0 variants
- Add 'http' to workspace dependencies for the Request builder
- Align with Rig's own Gemini provider implementation

update

chore: Use direct http dependency for rig-vertexai instead of workspace
@spartandingo force-pushed the feat/gemini-3-streaming branch from 765d58f to 8d8e151 on December 22, 2025.
- streaming_endpoint() now returns full https://... URLs instead of relative paths
- Fixes 'RelativeUrlWithoutBase' error when creating HTTP requests
- Properly handles both global (Gemini 3) and regional endpoints (see the sketch below)
- All 22 tests passing
- Gemini 3 streaming uses aiplatform.googleapis.com, not {region}-aiplatform.googleapis.com
- Matches endpoint structure from working implementation
- Regional endpoints only for non-Gemini-3 models
- All 22 tests passing
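
Putting those two commits together, the endpoint selection is roughly the following. A sketch only: the path layout follows Vertex AI's REST conventions and the helper signature is hypothetical:

```rust
/// Illustrative endpoint selection: Gemini 3 streaming goes through the
/// global `aiplatform.googleapis.com` host, everything else stays on the
/// regional host. Signature and path details are assumptions.
fn streaming_endpoint(project: &str, region: &str, model: &str) -> String {
    if model.starts_with("gemini-3") {
        // Global endpoint for Gemini 3 models.
        format!(
            "https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/publishers/google/models/{model}:streamGenerateContent?alt=sse"
        )
    } else {
        // Regional endpoint for everything else.
        format!(
            "https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent?alt=sse"
        )
    }
}
```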
…ertex AI streaming

- Expose credentials() method in VertexAI Client for manual authentication
- Clarify authentication requirements for StreamingCompletionModel
- Callers should pass authenticated HTTP clients with GCP Bearer tokens

This enables integrations to handle authentication via interceptors, middleware,
or pre-configured auth headers for Vertex AI API requests.
- Implement GcpAuthMiddleware for injecting Bearer tokens via reqwest-middleware
- Add BearerToken type for managing GCP access tokens
- Update streaming.rs to clarify auth requirements
- Follows Rig's pattern of internal auth handling for clean provider APIs

This enables StreamingCompletionModel to work with authenticated HTTP clients
that inject GCP Bearer tokens automatically for all Vertex AI requests.
- Remove BearerToken requirement from GcpAuthMiddleware constructor
- Implement dynamic token fetching with caching on each request
- Add token-source dependency for token management
- Add Default trait implementation for convenience
- Simplify GcpAuthMiddleware to a placeholder for future enhancements
- Document authentication requirements for StreamingCompletionModel
- Users should configure auth via reqwest-middleware or ADC (see the sketch below)
- All 22 tests passing in rig-vertexai
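
Since the final commit leaves auth to the caller, here is a minimal sketch of wiring a Bearer token in via reqwest-middleware, assuming reqwest-middleware 0.3 (whose Middleware trait receives http::Extensions); the static token stands in for one fetched from ADC or a token source:

```rust
use http::Extensions;
use reqwest::{header::AUTHORIZATION, Request, Response};
use reqwest_middleware::{ClientBuilder, Middleware, Next, Result};

/// Injects a GCP Bearer token into every outgoing request. A real
/// implementation would fetch and refresh the token (e.g. via ADC)
/// instead of holding a static string.
struct GcpAuth {
    token: String,
}

#[async_trait::async_trait]
impl Middleware for GcpAuth {
    async fn handle(
        &self,
        mut req: Request,
        extensions: &mut Extensions,
        next: Next<'_>,
    ) -> Result<Response> {
        let value = format!("Bearer {}", self.token)
            .parse()
            .expect("valid header value");
        req.headers_mut().insert(AUTHORIZATION, value);
        next.run(req, extensions).await
    }
}

// Usage: wrap a plain reqwest client and hand the result to whatever
// drives StreamingCompletionModel on the caller's side.
fn authed_client(token: String) -> reqwest_middleware::ClientWithMiddleware {
    ClientBuilder::new(reqwest::Client::new())
        .with(GcpAuth { token })
        .build()
}
```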