
Conversation

@teetangh
Collaborator

@teetangh teetangh commented May 7, 2025

1. **API Request Centralization**
   - Created `executeColumnarQuery` function to eliminate duplicate code
   - Standardized error handling and response processing

2. **Schema Caching Mechanism**
   - Added `cacheSchema` and `getCachedSchema` functions to store and retrieve schemas
   - Implemented LRU-style cache eviction to prevent memory issues
   - Added 1-hour caching for schemas to significantly reduce API calls
   - Generated unique cache keys for both collections and custom queries

3. **Optimized Document Processing**
   - Replaced the recursive algorithm with an iterative, queue-based approach
   - Added a maximum depth limit to prevent stack overflow on deeply nested documents
   - Optimized memory usage by avoiding unnecessary object creation
   - Added better logging that won't overflow log limits

4. **Improved Nested Value Extraction**
   - Enhanced the `getNestedValue` function with an iterative approach
   - Better handling of array indexing
   - More robust error checking

5. **Code Structure Improvements**
   - Moved utility functions to the beginning of the file, before first usage
   - Added JSDoc documentation for all new functions
   - Better function parameter validation

These improvements significantly enhance:

- **Performance** through reduced API calls and more efficient algorithms
- **Reliability** by eliminating recursion stack-overflow risks
- **Memory efficiency** with better data structures
- **Code maintainability** through reduced duplication
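
The TTL-plus-LRU caching behaviour described above can be sketched in plain JavaScript. This is only an illustration of the idea: the real connector presumably builds on Apps Script facilities, and the `Map`-based store, `SCHEMA_CACHE_TTL_MS`, and `MAX_CACHE_ENTRIES` here are assumptions, not the actual implementation.

```javascript
// Illustrative sketch: schema entries expire after one hour, and an
// LRU-style eviction keeps the cache bounded.
const SCHEMA_CACHE_TTL_MS = 60 * 60 * 1000; // 1-hour TTL, per the description
const MAX_CACHE_ENTRIES = 50;               // illustrative bound

// A Map preserves insertion order, so the first key is the least
// recently used one and can be evicted cheaply.
const schemaCache = new Map();

function cacheSchema(key, schema) {
  // Evict the least recently used entry once the cache is full.
  if (schemaCache.size >= MAX_CACHE_ENTRIES && !schemaCache.has(key)) {
    const oldestKey = schemaCache.keys().next().value;
    schemaCache.delete(oldestKey);
  }
  schemaCache.delete(key); // re-inserting moves the key to the "most recent" end
  schemaCache.set(key, { schema: schema, storedAt: Date.now() });
}

function getCachedSchema(key) {
  const entry = schemaCache.get(key);
  if (!entry) return null;
  if (Date.now() - entry.storedAt > SCHEMA_CACHE_TTL_MS) {
    schemaCache.delete(key); // entry expired; force a fresh schema inference
    return null;
  }
  // Refresh recency so frequently used schemas survive eviction.
  schemaCache.delete(key);
  schemaCache.set(key, entry);
  return entry.schema;
}
```

A `getSchema`-style caller would check `getCachedSchema` first and only fall back to schema inference (and a `cacheSchema` call) on a miss.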


@gemini-code-assist gemini-code-assist bot left a comment


Hello @teetangh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces schema caching to the Couchbase Columnar connector, aiming to reduce API calls and improve performance. It also includes code improvements such as replacing a recursive algorithm with an iterative one for document processing, enhancing nested value extraction, and improving code structure. The changes centralize API request logic, standardize error handling, and add better logging.

Highlights

  • Schema Caching: Implements schema caching using `cacheSchema`, `getCachedSchema`, and `generateSchemaCacheKey` functions. It uses LRU-style eviction and caches schemas for 1 hour to reduce API calls.
  • Optimized Document Processing: Replaces the recursive algorithm in `processDocument` with an iterative approach using a queue, adds a maximum depth limit to prevent stack overflow, and optimizes memory usage.
  • API Request Centralization: Introduces the `executeColumnarQuery` function to centralize API request logic, standardizing error handling and response processing.
  • Improved Nested Value Extraction: Enhances the `getNestedValue` function with an iterative approach and better handling of array indexing.
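
The queue-based rewrite described in the highlights can be pictured with a minimal flattener. The `flattenDocument` name and `MAX_DEPTH` value below are hypothetical; the PR's actual `processDocument` signature and field naming are not shown here, so treat this only as a sketch of the technique.

```javascript
// Sketch of iterative, queue-based document flattening with a depth cap.
const MAX_DEPTH = 10; // illustrative limit; prevents runaway nesting

function flattenDocument(doc) {
  const result = {};
  // Each queue item carries the value to process, its dotted path so far,
  // and its nesting depth. No recursion, so no call-stack overflow.
  const queue = [{ value: doc, prefix: '', depth: 0 }];
  while (queue.length > 0) {
    const { value, prefix, depth } = queue.shift();
    if (value !== null && typeof value === 'object' && depth < MAX_DEPTH) {
      for (const key of Object.keys(value)) {
        const path = prefix ? prefix + '.' + key : key;
        queue.push({ value: value[key], prefix: path, depth: depth + 1 });
      }
    } else {
      // Leaves (and anything beyond MAX_DEPTH) are stored under their dotted path.
      result[prefix] = value;
    }
  }
  return result;
}
```

The depth cap is what eliminates the stack-overflow risk called out in the PR: pathological documents simply stop being expanded past `MAX_DEPTH` instead of crashing.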

Changelog

  • `src/columnar/connector.gs`
    • Introduces the `constructApiUrl` function to construct API URLs, ensuring HTTPS and adding a default port.
    • Adds the `executeColumnarQuery` function to centralize API requests, handle errors, and process responses.
    • Refactors `validateCredentials` to use `executeColumnarQuery` for authentication validation.
    • Updates `fetchCouchbaseMetadata` to use `executeColumnarQuery` for checking System.Metadata accessibility.
    • Implements `cacheSchema`, `getCachedSchema`, and `generateSchemaCacheKey` functions for schema caching.
    • Modifies `processDocument` to use an iterative approach for document processing and adds a maximum depth limit (line 813).
    • Updates `getSchema` to check for cached schemas before inferring, and caches the inferred schema (lines 943, 1181).
    • Improves `getNestedValue` with an iterative approach and better array-index handling (line 1464).
    • Moves the `constructApiUrl` function to the utilities section at the beginning of the file.
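
The `getNestedValue` improvement noted in the changelog can be pictured as an iterative path walk that also accepts numeric array indices. This is a sketch under assumed semantics (dotted paths with `[n]` indexing), not the connector's actual code.

```javascript
// Sketch of iterative nested-value extraction supporting dotted paths
// and numeric array indices such as 'items[0].name'.
function getNestedValue(obj, path) {
  // Normalize 'items[0].name' into the segments ['items', '0', 'name'].
  const parts = path.replace(/\[(\d+)\]/g, '.$1').split('.').filter(Boolean);
  let current = obj;
  for (const part of parts) {
    // Bail out with undefined instead of throwing on missing segments.
    if (current === null || current === undefined) return undefined;
    current = current[part];
  }
  return current;
}
```

Because the loop checks for `null`/`undefined` before each step, a missing intermediate field returns `undefined` rather than raising a TypeError, which matches the "more robust error checking" goal.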
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A cache, a fleeting store,
Schemas kept, to use once more.
Speedier now the query flies,
As data swiftly satisfies.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several significant improvements, including schema caching, optimized document processing, and improved nested value extraction. The changes aim to enhance performance, reliability, memory efficiency, and code maintainability. Overall, the changes look good, but there are a few areas that could be improved.

Summary of Findings

  • Error Handling in `executeColumnarQuery`: The `executeColumnarQuery` function includes comprehensive error logging, but the error message returned in the catch block could be more informative. Consider including the stack trace in the message to aid debugging.
  • Cache Key Generation: The `generateSchemaCacheKey` function uses a simple string hash function. While this is sufficient for many cases, consider using a more robust hashing algorithm to reduce the likelihood of collisions, especially for complex queries.
  • Logging Verbosity in `processDocument`: The `processDocument` function logs the final flattened document. For large documents, this can lead to excessive logging. Consider reducing the logging level or truncating the output for very large documents.

Merge Readiness

The pull request is well-structured and addresses the stated goals. However, addressing the identified issues, particularly the error handling in `executeColumnarQuery`, would improve the overall quality and reliability of the code. I am unable to directly approve this pull request; I recommend that others review and approve this code, and that the high-severity issues be addressed before merging.

Comment on lines +108 to +111
    return {
      error: true,
      code: 500,
      message: e.toString()


high

Consider including the stack trace in the error message to provide more context for debugging. This can help identify the exact location of the error.

    return { 
      error: true, 
      code: 500, 
      message: e.toString() + '\nStack: ' + e.stack
    };

Comment on lines +707 to +714
    const query = configParams.query.trim();
    let hash = 0;
    for (let i = 0; i < query.length; i++) {
      const char = query.charCodeAt(i);
      hash = ((hash << 5) - hash) + char;
      hash = hash & hash; // Convert to 32-bit integer
    }
    return 'query:' + hash;


medium

While this hash function is simple and effective for basic queries, it might be prone to collisions for more complex queries. Consider using a more robust hashing algorithm to minimize the risk of collisions. A library might be helpful here.

    const query = configParams.query.trim();
    const hash = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_256, query, Utilities.Charset.UTF_8)
      .map(function(byte) {
        const hex = (byte & 0xFF).toString(16);
        return hex.length === 1 ? '0' + hex : hex;
      })
      .join('');
    return 'query:' + hash;

Comment on lines +900 to +904
    Logger.log('processDocument: Final flattened document (keys: %s): %s',
               Object.keys(result).length,
               Object.keys(result).length > 20 ?
                 JSON.stringify(Object.keys(result)) :
                 JSON.stringify(result));


medium

For very large documents, logging the entire flattened document could lead to excessive logging and potentially exceed log limits. Consider reducing the logging level or truncating the output if the document exceeds a certain size.

  Logger.log('processDocument: Final flattened document (keys: %s): %s', 
             Object.keys(result).length, 
             Object.keys(result).length > 20 ? 
               'Too many keys to display' : 
               JSON.stringify(result));

@teetangh teetangh self-assigned this May 14, 2025
@teetangh teetangh added the enhancement New feature or request label May 14, 2025
@teetangh teetangh marked this pull request as draft July 21, 2025 11:05
