(feat) schema caching #6
base: main
Conversation
1. **Centralized API Request Logic**
   - Created `executeColumnarQuery` function to eliminate duplicate code
   - Standardized error handling and response processing
2. **Schema Caching Mechanism** (a sketch follows this list)
   - Added `cacheSchema` and `getCachedSchema` functions to store and retrieve schemas
   - Implemented LRU-style cache eviction to prevent memory issues
   - Added 1-hour caching for schemas to significantly reduce API calls
   - Generated unique cache keys for both collections and custom queries
3. **Optimized Document Processing**
   - Replaced the recursive algorithm with an iterative approach using a queue
   - Added a maximum depth limit to prevent stack overflow with deeply nested documents
   - Optimized memory usage by avoiding unnecessary object creation
   - Added better logging that won't overflow log limits
4. **Improved Nested Value Extraction**
   - Enhanced the `getNestedValue` function with an iterative approach
   - Better handling of array indexing
   - Added more robust error checking
5. **Code Structure Improvements**
   - Moved utility functions to the beginning of the file, before first usage
   - Added proper JSDoc documentation for all new functions
   - Better function parameter validation

These improvements significantly enhance:
- Performance, through reduced API calls and more efficient algorithms
- Reliability, by eliminating recursion stack-overflow risks
- Memory efficiency, with better data structures
- Code maintainability, through reduced duplication
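To make the caching mechanism above concrete, here is a minimal sketch of what `cacheSchema` and `getCachedSchema` could look like, assuming Apps Script's `CacheService` as the backing store and the 1-hour TTL described above; the LRU-style eviction bookkeeping is not shown, and the exact implementation in `connector.gs` may differ.

```javascript
var SCHEMA_CACHE_TTL_SECONDS = 3600; // 1-hour caching, per the PR description

/**
 * Stores an inferred schema under the given cache key.
 * Sketch only: CacheService values must be strings, so the schema is
 * serialized with JSON.stringify before being stored.
 */
function cacheSchema(cacheKey, schema) {
  var cache = CacheService.getScriptCache();
  cache.put(cacheKey, JSON.stringify(schema), SCHEMA_CACHE_TTL_SECONDS);
}

/**
 * Retrieves a previously cached schema, or null on a cache miss or expiry.
 */
function getCachedSchema(cacheKey) {
  var cache = CacheService.getScriptCache();
  var cached = cache.get(cacheKey);
  return cached ? JSON.parse(cached) : null;
}
```

Note that `CacheService` caps each cached value at roughly 100 KB, so very large inferred schemas may need chunking or a different store.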
Hello @teetangh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request introduces schema caching to the Couchbase Columnar connector, aiming to reduce API calls and improve performance. It also includes code improvements such as replacing a recursive algorithm with an iterative one for document processing, enhancing nested value extraction, and improving code structure. The changes centralize API request logic, standardize error handling, and add better logging.
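As an illustration of the nested value extraction mentioned above, an iterative `getNestedValue` might look like the following sketch; the dot-separated path syntax and numeric array-index segments are assumptions for illustration, not necessarily what the connector uses.

```javascript
/**
 * Iterative nested-value lookup sketch, assuming dot-separated paths with
 * optional numeric segments for array indices (e.g. "items.0.name").
 */
function getNestedValue(obj, path) {
  if (obj == null || !path) {
    return undefined;
  }
  var segments = String(path).split('.');
  var current = obj;
  for (var i = 0; i < segments.length; i++) {
    if (current == null) {
      return undefined;
    }
    var segment = segments[i];
    if (Array.isArray(current) && /^\d+$/.test(segment)) {
      current = current[Number(segment)]; // array indexing by position
    } else if (typeof current === 'object' && segment in current) {
      current = current[segment];
    } else {
      return undefined; // missing key: fail soft rather than throw
    }
  }
  return current;
}
```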
Highlights
- Schema Caching: Implements schema caching using the `cacheSchema`, `getCachedSchema`, and `generateSchemaCacheKey` functions. It uses LRU-style eviction and caches schemas for 1 hour to reduce API calls.
- Optimized Document Processing: Replaces the recursive algorithm in `processDocument` with an iterative approach using a queue, adds a maximum depth limit to prevent stack overflow, and optimizes memory usage (see the sketch after this list).
- API Request Centralization: Introduces an `executeColumnarQuery` function to centralize API request logic, standardizing error handling and response processing.
- Improved Nested Value Extraction: Enhances the `getNestedValue` function with an iterative approach and better handling of array indexing.
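The sketch referenced in the Optimized Document Processing highlight: a queue-based flattening loop with a depth limit. The function name, key format, and depth limit below are assumptions for illustration, not the PR's exact `processDocument`.

```javascript
var MAX_FLATTEN_DEPTH = 10; // assumed limit; the PR's actual value may differ

/**
 * Flattens a nested document iteratively with a work queue, so deeply
 * nested documents cannot overflow the call stack.
 */
function flattenDocumentIteratively(doc) {
  var result = {};
  var queue = [{ value: doc, prefix: '', depth: 0 }];
  while (queue.length > 0) {
    var item = queue.shift();
    var value = item.value;
    var isObject = value !== null && typeof value === 'object';
    if (isObject && item.depth < MAX_FLATTEN_DEPTH) {
      var keys = Object.keys(value);
      for (var i = 0; i < keys.length; i++) {
        queue.push({
          value: value[keys[i]],
          prefix: item.prefix ? item.prefix + '.' + keys[i] : keys[i],
          depth: item.depth + 1
        });
      }
    } else {
      // Leaf value, or depth limit reached: record it under the flattened key.
      result[item.prefix || 'value'] = value;
    }
  }
  return result;
}
```

A queue walks the document breadth-first rather than depth-first, but the resulting flattened keys are the same; the important difference is that nesting depth no longer grows the call stack.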
Changelog
- src/columnar/connector.gs
  - Introduces a `constructApiUrl` function to construct API URLs, ensuring HTTPS and adding a default port.
  - Adds an `executeColumnarQuery` function to centralize API requests, handle errors, and process responses (a sketch follows this list).
  - Refactors `validateCredentials` to use `executeColumnarQuery` for authentication validation.
  - Updates `fetchCouchbaseMetadata` to use `executeColumnarQuery` for checking System.Metadata accessibility.
  - Implements `cacheSchema`, `getCachedSchema`, and `generateSchemaCacheKey` functions for schema caching.
  - Modifies `processDocument` to use an iterative approach for document processing and adds a maximum depth limit (line 813).
  - Updates `getSchema` to check for cached schemas before inferring and caches the inferred schema (lines 943, 1181).
  - Improves `getNestedValue` with an iterative approach and better array-indexing handling (line 1464).
  - Moves the `constructApiUrl` function to the utilities section at the beginning of the file.
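Where the changelog mentions centralizing API requests, a hedged sketch of the general shape such a helper could take is shown below. The endpoint path, port, payload shape, and credential field names are assumptions, not the connector's actual values.

```javascript
/**
 * Sketch of a centralized query helper in the spirit of executeColumnarQuery.
 * Assumed: HTTPS on port 18095 with an analytics-style endpoint, and basic
 * auth built from configParams.username/password. The real connector may differ.
 */
function runColumnarStatement(configParams, statement) {
  var url = 'https://' + configParams.host + ':18095/analytics/service';
  var options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify({ statement: statement }),
    headers: {
      Authorization: 'Basic ' + Utilities.base64Encode(
          configParams.username + ':' + configParams.password)
    },
    muteHttpExceptions: true // surface HTTP errors as data instead of throwing
  };
  try {
    var response = UrlFetchApp.fetch(url, options);
    var code = response.getResponseCode();
    if (code !== 200) {
      return { error: true, code: code, message: response.getContentText() };
    }
    return { error: false, code: code, data: JSON.parse(response.getContentText()) };
  } catch (e) {
    // Standardized error shape, mirroring the snippet reviewed further below.
    return { error: true, code: 500, message: e.toString() };
  }
}
```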
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A cache, a fleeting store,
Schemas kept, to use once more.
Speedier now the query flies,
As data swiftly satisfies.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces several significant improvements, including schema caching, optimized document processing, and improved nested value extraction. The changes aim to enhance performance, reliability, memory efficiency, and code maintainability. Overall, the changes look good, but there are a few areas that could be improved.
Summary of Findings
- Error Handling in `executeColumnarQuery`: The `executeColumnarQuery` function includes comprehensive error logging, but the error message returned in the catch block could be more informative. Consider including the stack trace in the message to aid debugging (a sketch follows this list).
- Cache Key Generation: The `generateSchemaCacheKey` function uses a simple string hash function. While this is sufficient for many cases, consider using a more robust hashing algorithm to reduce the likelihood of collisions, especially for complex queries.
- Logging Verbosity in `processDocument`: The `processDocument` function logs the final flattened document. For large documents, this can lead to excessive logging. Consider reducing the logging level or truncating the output for very large documents.
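For the first finding, a small sketch of how the standardized error object could carry the stack trace (the wrapper name is hypothetical, not part of the PR):

```javascript
/**
 * Builds the standardized error response, including the stack trace when the
 * thrown value exposes one, so failures are easier to debug from the logs.
 */
function toErrorResponse(e) {
  var details = (e && e.stack) ? e.toString() + '\n' + e.stack : String(e);
  Logger.log('Columnar query failed: %s', details);
  return {
    error: true,
    code: 500,
    message: details
  };
}
```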
Merge Readiness
The pull request is well-structured and addresses the stated goals. However, addressing the identified issues, particularly the error handling in `executeColumnarQuery`, would improve the overall quality and reliability of the code. I am unable to directly approve this pull request and recommend that others review and approve this code before merging; the high-severity issues should be addressed first.
return {
  error: true,
  code: 500,
  message: e.toString()
const query = configParams.query.trim();
let hash = 0;
for (let i = 0; i < query.length; i++) {
  const char = query.charCodeAt(i);
  hash = ((hash << 5) - hash) + char;
  hash = hash & hash; // Convert to 32bit integer
}
return 'query:' + hash;
While this hash function is simple and effective for basic queries, it might be prone to collisions for more complex queries. Consider using a more robust hashing algorithm to minimize the risk of collisions. A library might be helpful here.
const query = configParams.query.trim();
let hash = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_256, query, Utilities.Charset.UTF_8).map(function(byte) {
var hex = (byte & 0xFF).toString(16);
return hex.length === 1 ? '0' + hex : hex;
}).join('');
return 'query:' + hash;
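Building on that suggestion, one possible shape for `generateSchemaCacheKey` that covers both collections and custom queries (the config field names `database`, `scope`, `collection`, and `query` are assumptions):

```javascript
/**
 * Sketch of a cache-key generator: custom queries are hashed with SHA-256 as
 * suggested above, while collection-based sources use a readable compound key.
 */
function generateSchemaCacheKey(configParams) {
  if (configParams.query) {
    var query = configParams.query.trim();
    var hash = Utilities.computeDigest(
        Utilities.DigestAlgorithm.SHA_256, query, Utilities.Charset.UTF_8)
      .map(function(byte) {
        var hex = (byte & 0xFF).toString(16);
        return hex.length === 1 ? '0' + hex : hex;
      })
      .join('');
    return 'query:' + hash;
  }
  return ['collection', configParams.database, configParams.scope,
          configParams.collection].join(':');
}
```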
Logger.log('processDocument: Final flattened document (keys: %s): %s',
    Object.keys(result).length,
    Object.keys(result).length > 20 ?
        JSON.stringify(Object.keys(result)) :
        JSON.stringify(result));
For very large documents, logging the entire flattened document could lead to excessive logging and potentially exceed log limits. Consider reducing the logging level or truncating the output if the document exceeds a certain size.
Logger.log('processDocument: Final flattened document (keys: %s): %s',
Object.keys(result).length,
Object.keys(result).length > 20 ?
'Too many keys to display' :
JSON.stringify(result));