@spartandingo

Summary

Adds production-grade streaming support for Vertex AI Gemini 3.0 models (Pro and Flash) to the rig-vertexai integration. The implementation follows Rig framework conventions, using the generic HttpClientExt trait abstraction for transport and GenericEventSource for SSE parsing.

Key Features

  • StreamingCompletionModel: Generic HTTP client support via HttpClientExt trait
  • SSE Parsing: Uses Rig's GenericEventSource for reliable event stream handling with automatic retry
  • Gemini 3.0 Only: Version-gated to support gemini-3-pro and gemini-3-flash, with clear error messages for unsupported models (a gating sketch follows this list)
  • Extended Thinking: Support for thoughtSignature metadata from Gemini 3.0 models
  • Tool Calling: Full function call support with proper argument passing
  • Token Tracking: Comprehensive token usage tracking including cached content, candidates, and thought tokens
  • Model Constants: Added GEMINI_3_PRO and GEMINI_3_FLASH constants
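
The version gating mentioned above boils down to a simple model-name check. A minimal sketch for illustration only; the free function and string error type here are hypothetical, not the crate's actual API:

```rust
/// Illustrative version gate: streaming is only enabled for Gemini 3.0
/// models, as described above. The function name and error type are
/// assumptions, not the crate's real signatures.
fn check_streaming_support(model: &str) -> Result<(), String> {
    const SUPPORTED: &[&str] = &["gemini-3-pro", "gemini-3-flash"];
    if SUPPORTED.contains(&model) {
        Ok(())
    } else {
        Err(format!(
            "streaming is only supported for Gemini 3.0 models \
             (gemini-3-pro, gemini-3-flash); got `{model}`"
        ))
    }
}
```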

Implementation Details

  • Streaming format differs between Gemini 2.5 and 3.0, so streaming is only enabled for Gemini 3.0+
  • Uses the http::Request builder rather than hardcoding a specific HTTP client implementation (see the sketch after this list)
  • Removes model constants for Gemini 2.5 and below (they don't support streaming)
  • Aligns with Rig's own Gemini provider streaming pattern in rig-core
  • Added to workspace dependencies: http = "1.3.1"
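
As a rough sketch of the builder-based approach: the endpoint path follows Vertex AI's streamGenerateContent REST shape, but the exact body type and headers the crate uses are assumptions:

```rust
use http::Request;

/// Hypothetical request construction via the generic `http::Request`
/// builder, avoiding any concrete HTTP client type. `url` is the full
/// streaming endpoint and `body_json` the serialized request payload.
fn build_stream_request(url: &str, body_json: String) -> http::Result<Request<String>> {
    Request::builder()
        .method("POST")
        .uri(url)
        .header("Content-Type", "application/json")
        // With `?alt=sse`, the response arrives as a text/event-stream.
        .header("Accept", "text/event-stream")
        .body(body_json)
}
```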

Testing

  • 4 new unit tests in the streaming module (an illustrative example follows this list), covering:
    • Text response deserialization
    • Function call deserialization with thoughtSignature
    • Token usage calculation
    • Final response token usage tracking
  • All 22 existing and new tests passing
  • No breaking changes to existing non-streaming completion API
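
For flavor, an illustrative test in the spirit of the ones above, assuming a serde struct whose fields mirror the Gemini part JSON; the crate's real type names may differ:

```rust
#[cfg(test)]
mod tests {
    use serde::Deserialize;

    /// Illustrative stand-in for a streamed Gemini `Part`; field names
    /// mirror the wire format, not necessarily the crate's types.
    #[derive(Deserialize)]
    #[serde(rename_all = "camelCase")]
    struct Part {
        text: Option<String>,
        thought_signature: Option<String>,
    }

    #[test]
    fn deserializes_text_part_with_thought_signature() {
        let json = r#"{"text":"hello","thoughtSignature":"sig-123"}"#;
        let part: Part = serde_json::from_str(json).unwrap();
        assert_eq!(part.text.as_deref(), Some("hello"));
        assert_eq!(part.thought_signature.as_deref(), Some("sig-123"));
    }
}
```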

Changes

  • rig-integrations/rig-vertexai/src/streaming.rs - New streaming module (415 lines)
  • rig-integrations/rig-vertexai/src/lib.rs - Export StreamingCompletionModel
  • rig-integrations/rig-vertexai/src/completion.rs - Add Gemini 3.0 model constants
  • rig-integrations/rig-vertexai/Cargo.toml - Add http dependency
  • Cargo.toml - Add http to workspace dependencies

- Implement StreamingCompletionModel<HttpClient: HttpClientExt> for generic HTTP client support
- Use Rig's GenericEventSource for SSE parsing with automatic retry handling
- Support Gemini 3.0 models (gemini-3-pro, gemini-3-flash) with extended thinking
- Support tool calling with function calls and thoughtSignature metadata
- Track token usage comprehensively (input, output, cached, thoughts)
- Gate by version: only Gemini 3.0+ models supported, with clear error messages
- Add 4 unit tests covering deserialization, tool calls, and token counting
- Remove model constants for Gemini 2.5 and lower (streaming unsupported)
- Add model constants for Gemini 3.0 variants
- Add 'http' to workspace dependencies for the Request builder
- Align with Rig's own Gemini provider implementation

update

chore: Use direct http dependency for rig-vertexai instead of workspace
@spartandingo force-pushed the feat/gemini-3-streaming branch from 765d58f to 8d8e151 on December 22, 2025.
- streaming_endpoint() now returns full https://... URLs instead of relative paths
- Fixes 'RelativeUrlWithoutBase' error when creating HTTP requests
- Properly handles both global (Gemini 3) and regional endpoints (see the sketch below)
- All 22 tests passing
- Gemini 3 streaming uses aiplatform.googleapis.com, not {region}-aiplatform.googleapis.com
- Matches endpoint structure from working implementation
- Regional endpoints only for non-Gemini-3 models
- All 22 tests passing
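
Putting those two commits together, the endpoint selection is roughly the following. A sketch only: the path layout follows Vertex AI's REST conventions and the helper signature is hypothetical:

```rust
/// Illustrative endpoint selection: Gemini 3 streaming goes through the
/// global `aiplatform.googleapis.com` host, everything else stays on the
/// regional host. Signature and path details are assumptions.
fn streaming_endpoint(project: &str, region: &str, model: &str) -> String {
    if model.starts_with("gemini-3") {
        // Global endpoint for Gemini 3 models.
        format!(
            "https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/publishers/google/models/{model}:streamGenerateContent?alt=sse"
        )
    } else {
        // Regional endpoint for everything else.
        format!(
            "https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent?alt=sse"
        )
    }
}
```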
…ertex AI streaming

- Expose credentials() method in VertexAI Client for manual authentication
- Clarify authentication requirements for StreamingCompletionModel
- Callers should pass authenticated HTTP clients with GCP Bearer tokens

This enables integrations to handle authentication via interceptors, middleware,
or pre-configured auth headers for Vertex AI API requests.
- Implement GcpAuthMiddleware for injecting Bearer tokens via reqwest-middleware
- Add BearerToken type for managing GCP access tokens
- Update streaming.rs to clarify auth requirements
- Follows Rig's pattern of internal auth handling for clean provider APIs

This enables StreamingCompletionModel to work with authenticated HTTP clients
that inject GCP Bearer tokens automatically for all Vertex AI requests.
- Remove BearerToken requirement from GcpAuthMiddleware constructor
- Implement dynamic token fetching with caching on each request
- Add token-source dependency for token management
- Add Default trait implementation for convenience
- Simplify GcpAuthMiddleware to a placeholder for future enhancements
- Document authentication requirements for StreamingCompletionModel
- Users should configure auth via reqwest-middleware or ADC (see the sketch below)
- All 22 tests passing in rig-vertexai
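
Since the final commit leaves auth to the caller, here is a minimal sketch of wiring a Bearer token in via reqwest-middleware, assuming reqwest-middleware 0.3 (whose Middleware trait receives http::Extensions); the static token stands in for one fetched from ADC or a token source:

```rust
use http::Extensions;
use reqwest::{header::AUTHORIZATION, Request, Response};
use reqwest_middleware::{ClientBuilder, Middleware, Next, Result};

/// Injects a GCP Bearer token into every outgoing request. A real
/// implementation would fetch and refresh the token (e.g. via ADC)
/// instead of holding a static string.
struct GcpAuth {
    token: String,
}

#[async_trait::async_trait]
impl Middleware for GcpAuth {
    async fn handle(
        &self,
        mut req: Request,
        extensions: &mut Extensions,
        next: Next<'_>,
    ) -> Result<Response> {
        let value = format!("Bearer {}", self.token)
            .parse()
            .expect("valid header value");
        req.headers_mut().insert(AUTHORIZATION, value);
        next.run(req, extensions).await
    }
}

// Usage: wrap a plain reqwest client and hand the result to whatever
// drives StreamingCompletionModel on the caller's side.
fn authed_client(token: String) -> reqwest_middleware::ClientWithMiddleware {
    ClientBuilder::new(reqwest::Client::new())
        .with(GcpAuth { token })
        .build()
}
```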