(feat) schema caching #6
base: main
Conversation
1. **Centralized API Request Logic**
   - Created `executeColumnarQuery` function to eliminate duplicate code
   - Standardized error handling and response processing
2. **Schema Caching Mechanism** (a sketch follows this list)
   - Added `cacheSchema` and `getCachedSchema` functions to store and retrieve schemas
   - Implemented LRU-style cache eviction to prevent memory issues
   - Added 1-hour caching for schemas to significantly reduce API calls
   - Generated unique cache keys for both collections and custom queries
3. **Optimized Document Processing**
   - Replaced the recursive algorithm with an iterative approach using a queue
   - Added a maximum depth limit to prevent stack overflow with deeply nested documents
   - Optimized memory usage by avoiding unnecessary object creation
   - Added better logging that won't overflow log limits
4. **Improved Nested Value Extraction**
   - Enhanced the `getNestedValue` function with an iterative approach
   - Better handling of array indexing
   - Added more robust error checking
5. **Code Structure Improvements**
   - Moved utility functions to the beginning of the file, before first usage
   - Added proper JSDoc documentation for all new functions
   - Better function parameter validation

These improvements significantly enhance:
- Performance, through reduced API calls and more efficient algorithms
- Reliability, by eliminating recursion stack-overflow risks
- Memory efficiency, with better data structures
- Code maintainability, through reduced duplication
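To make the caching mechanism above concrete, here is a minimal sketch of what `cacheSchema` and `getCachedSchema` could look like, assuming Apps Script's `CacheService` as the backing store and the 1-hour TTL described above; the LRU-style eviction bookkeeping is not shown, and the exact implementation in `connector.gs` may differ.

```javascript
var SCHEMA_CACHE_TTL_SECONDS = 3600; // 1-hour caching, per the PR description

/**
 * Stores an inferred schema under the given cache key.
 * Sketch only: CacheService values must be strings, so the schema is
 * serialized with JSON.stringify before being stored.
 */
function cacheSchema(cacheKey, schema) {
  var cache = CacheService.getScriptCache();
  cache.put(cacheKey, JSON.stringify(schema), SCHEMA_CACHE_TTL_SECONDS);
}

/**
 * Retrieves a previously cached schema, or null on a cache miss or expiry.
 */
function getCachedSchema(cacheKey) {
  var cache = CacheService.getScriptCache();
  var cached = cache.get(cacheKey);
  return cached ? JSON.parse(cached) : null;
}
```

Note that `CacheService` caps each cached value at roughly 100 KB, so very large inferred schemas may need chunking or a different store.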
Hello @teetangh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request introduces schema caching to the Couchbase Columnar connector, aiming to reduce API calls and improve performance. It also includes code improvements such as replacing a recursive algorithm with an iterative one for document processing, enhancing nested value extraction, and improving code structure. The changes centralize API request logic, standardize error handling, and add better logging.
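As an illustration of the nested value extraction mentioned above, an iterative `getNestedValue` might look like the following sketch; the dot-separated path syntax and numeric array-index segments are assumptions for illustration, not necessarily what the connector uses.

```javascript
/**
 * Iterative nested-value lookup sketch, assuming dot-separated paths with
 * optional numeric segments for array indices (e.g. "items.0.name").
 */
function getNestedValue(obj, path) {
  if (obj == null || !path) {
    return undefined;
  }
  var segments = String(path).split('.');
  var current = obj;
  for (var i = 0; i < segments.length; i++) {
    if (current == null) {
      return undefined;
    }
    var segment = segments[i];
    if (Array.isArray(current) && /^\d+$/.test(segment)) {
      current = current[Number(segment)]; // array indexing by position
    } else if (typeof current === 'object' && segment in current) {
      current = current[segment];
    } else {
      return undefined; // missing key: fail soft rather than throw
    }
  }
  return current;
}
```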
Highlights
- Schema Caching: Implements schema caching using the `cacheSchema`, `getCachedSchema`, and `generateSchemaCacheKey` functions. It uses LRU-style eviction and caches schemas for 1 hour to reduce API calls.
- Optimized Document Processing: Replaces the recursive algorithm in `processDocument` with an iterative approach using a queue, adds a maximum depth limit to prevent stack overflow, and optimizes memory usage (see the sketch after this list).
- API Request Centralization: Introduces an `executeColumnarQuery` function to centralize API request logic, standardizing error handling and response processing.
- Improved Nested Value Extraction: Enhances the `getNestedValue` function with an iterative approach and better handling of array indexing.
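The sketch referenced in the Optimized Document Processing highlight: a queue-based flattening loop with a depth limit. The function name, key format, and depth limit below are assumptions for illustration, not the PR's exact `processDocument`.

```javascript
var MAX_FLATTEN_DEPTH = 10; // assumed limit; the PR's actual value may differ

/**
 * Flattens a nested document iteratively with a work queue, so deeply
 * nested documents cannot overflow the call stack.
 */
function flattenDocumentIteratively(doc) {
  var result = {};
  var queue = [{ value: doc, prefix: '', depth: 0 }];
  while (queue.length > 0) {
    var item = queue.shift();
    var value = item.value;
    var isObject = value !== null && typeof value === 'object';
    if (isObject && item.depth < MAX_FLATTEN_DEPTH) {
      var keys = Object.keys(value);
      for (var i = 0; i < keys.length; i++) {
        queue.push({
          value: value[keys[i]],
          prefix: item.prefix ? item.prefix + '.' + keys[i] : keys[i],
          depth: item.depth + 1
        });
      }
    } else {
      // Leaf value, or depth limit reached: record it under the flattened key.
      result[item.prefix || 'value'] = value;
    }
  }
  return result;
}
```

A queue walks the document breadth-first rather than depth-first, but the resulting flattened keys are the same; the important difference is that nesting depth no longer grows the call stack.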
Changelog
- src/columnar/connector.gs
  - Introduces a `constructApiUrl` function to construct API URLs, ensuring HTTPS and adding a default port.
  - Adds an `executeColumnarQuery` function to centralize API requests, handle errors, and process responses (a sketch follows this list).
  - Refactors `validateCredentials` to use `executeColumnarQuery` for authentication validation.
  - Updates `fetchCouchbaseMetadata` to use `executeColumnarQuery` for checking System.Metadata accessibility.
  - Implements `cacheSchema`, `getCachedSchema`, and `generateSchemaCacheKey` functions for schema caching.
  - Modifies `processDocument` to use an iterative approach for document processing and adds a maximum depth limit (line 813).
  - Updates `getSchema` to check for cached schemas before inferring and caches the inferred schema (lines 943, 1181).
  - Improves `getNestedValue` with an iterative approach and better array-indexing handling (line 1464).
  - Moves the `constructApiUrl` function to the utilities section at the beginning of the file.
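Where the changelog mentions centralizing API requests, a hedged sketch of the general shape such a helper could take is shown below. The endpoint path, port, payload shape, and credential field names are assumptions, not the connector's actual values.

```javascript
/**
 * Sketch of a centralized query helper in the spirit of executeColumnarQuery.
 * Assumed: HTTPS on port 18095 with an analytics-style endpoint, and basic
 * auth built from configParams.username/password. The real connector may differ.
 */
function runColumnarStatement(configParams, statement) {
  var url = 'https://' + configParams.host + ':18095/analytics/service';
  var options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify({ statement: statement }),
    headers: {
      Authorization: 'Basic ' + Utilities.base64Encode(
          configParams.username + ':' + configParams.password)
    },
    muteHttpExceptions: true // surface HTTP errors as data instead of throwing
  };
  try {
    var response = UrlFetchApp.fetch(url, options);
    var code = response.getResponseCode();
    if (code !== 200) {
      return { error: true, code: code, message: response.getContentText() };
    }
    return { error: false, code: code, data: JSON.parse(response.getContentText()) };
  } catch (e) {
    // Standardized error shape, mirroring the snippet reviewed further below.
    return { error: true, code: 500, message: e.toString() };
  }
}
```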
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A cache, a fleeting store,
Schemas kept, to use once more.
Speedier now the query flies,
As data swiftly satisfies.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces several significant improvements, including schema caching, optimized document processing, and improved nested value extraction. The changes aim to enhance performance, reliability, memory efficiency, and code maintainability. Overall, the changes look good, but there are a few areas that could be improved.
Summary of Findings
- Error Handling in `executeColumnarQuery`: The `executeColumnarQuery` function includes comprehensive error logging, but the error message returned in the catch block could be more informative. Consider including the stack trace in the message to aid debugging (a sketch follows this list).
- Cache Key Generation: The `generateSchemaCacheKey` function uses a simple string hash function. While this is sufficient for many cases, consider using a more robust hashing algorithm to reduce the likelihood of collisions, especially for complex queries.
- Logging Verbosity in `processDocument`: The `processDocument` function logs the final flattened document. For large documents, this can lead to excessive logging. Consider reducing the logging level or truncating the output for very large documents.
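For the first finding, a small sketch of how the standardized error object could carry the stack trace (the wrapper name is hypothetical, not part of the PR):

```javascript
/**
 * Builds the standardized error response, including the stack trace when the
 * thrown value exposes one, so failures are easier to debug from the logs.
 */
function toErrorResponse(e) {
  var details = (e && e.stack) ? e.toString() + '\n' + e.stack : String(e);
  Logger.log('Columnar query failed: %s', details);
  return {
    error: true,
    code: 500,
    message: details
  };
}
```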
Merge Readiness
The pull request is well-structured and addresses the stated goals. However, addressing the identified issues, particularly the error handling in `executeColumnarQuery`, would improve the overall quality and reliability of the code. I am unable to directly approve this pull request and recommend that others review and approve this code before merging; the high-severity issues should be addressed first.
return {
  error: true,
  code: 500,
  message: e.toString()
const query = configParams.query.trim();
let hash = 0;
for (let i = 0; i < query.length; i++) {
  const char = query.charCodeAt(i);
  hash = ((hash << 5) - hash) + char;
  hash = hash & hash; // Convert to 32bit integer
}
return 'query:' + hash;
While this hash function is simple and effective for basic queries, it might be prone to collisions for more complex queries. Consider using a more robust hashing algorithm to minimize the risk of collisions. A library might be helpful here.
const query = configParams.query.trim();
let hash = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_256, query, Utilities.Charset.UTF_8).map(function(byte) {
var hex = (byte & 0xFF).toString(16);
return hex.length === 1 ? '0' + hex : hex;
}).join('');
return 'query:' + hash;
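Building on that suggestion, one possible shape for `generateSchemaCacheKey` that covers both collections and custom queries (the config field names `database`, `scope`, `collection`, and `query` are assumptions):

```javascript
/**
 * Sketch of a cache-key generator: custom queries are hashed with SHA-256 as
 * suggested above, while collection-based sources use a readable compound key.
 */
function generateSchemaCacheKey(configParams) {
  if (configParams.query) {
    var query = configParams.query.trim();
    var hash = Utilities.computeDigest(
        Utilities.DigestAlgorithm.SHA_256, query, Utilities.Charset.UTF_8)
      .map(function(byte) {
        var hex = (byte & 0xFF).toString(16);
        return hex.length === 1 ? '0' + hex : hex;
      })
      .join('');
    return 'query:' + hash;
  }
  return ['collection', configParams.database, configParams.scope,
          configParams.collection].join(':');
}
```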
Logger.log('processDocument: Final flattened document (keys: %s): %s',
    Object.keys(result).length,
    Object.keys(result).length > 20 ?
        JSON.stringify(Object.keys(result)) :
        JSON.stringify(result));
For very large documents, logging the entire flattened document could lead to excessive logging and potentially exceed log limits. Consider reducing the logging level or truncating the output if the document exceeds a certain size.
Logger.log('processDocument: Final flattened document (keys: %s): %s',
Object.keys(result).length,
Object.keys(result).length > 20 ?
'Too many keys to display' :
JSON.stringify(result));