-
Notifications
You must be signed in to change notification settings - Fork 0
(fix) arrays handling in analytics connector #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…for aggregation of both object arrays and primitive arrays. Refine value extraction logic in getData function to support nested paths and improve data representation. Introduce a helper function for nested value retrieval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @teetangh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request addresses the handling of arrays in the analytics connector, specifically within the buildSchema and getData functions. Previously, arrays were skipped during schema construction. Now, the code aggregates array data into string fields, differentiating between arrays of objects and arrays of primitives. The getData function has also been modified to handle nested results and extract values from aggregated array fields. A new utility function, getNestedValue, has been added to assist with accessing nested values within objects, including those within arrays.
Highlights
- Array Handling in Schema Building: The
buildSchemafunction now processes arrays, creating aggregated string fields for both arrays of objects (e.g., 'schedule.day') and arrays of primitives (e.g., 'aliases'). Empty arrays are still skipped. - Data Extraction from Arrays: The
getDatafunction has been updated to extract data from aggregated array fields. It now identifies the base part and nested part of a field ID (e.g., 'schedule' and 'day' in 'schedule.day') and aggregates values accordingly. - Nested Value Retrieval: A new utility function,
getNestedValue, has been introduced to handle nested value retrieval within objects, including those nested within arrays. This function replaces the previousreducebased approach.
Changelog
- src/analytics/connector.gs
- Modified
buildSchemato handle arrays of objects and primitives, creating aggregated string fields in the schema. - Modified
getDatato handle nested results and extract values from aggregated array fields, using the newgetNestedValuefunction. - Added a new utility function
getNestedValueto retrieve nested values within objects, including those within arrays.
- Modified
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
In code we dive, arrays in sight,
To string them together, with all our might.
Nested values we chase,
In this digital space,
Making data shine ever so bright.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request addresses the handling of arrays in the analytics connector, which is a valuable improvement. The changes introduce logic to process arrays of objects and primitives, creating aggregated string fields for schema generation. The code is generally well-structured and includes logging for debugging purposes. However, there are a few areas that could be improved for clarity and efficiency.
Summary of Findings
- Potential Performance Bottleneck: The use of
schema.somewithin loops to check for existing fields could lead to performance issues, especially with large schemas. Consider using a more efficient data structure for faster lookups. - Inconsistent Logging: The logging messages use different formats and levels of detail. Standardizing the logging format would improve readability and maintainability.
- Code Duplication: The logic for adding fields to the schema is repeated for both arrays of objects and arrays of primitives. This could be refactored into a single function to reduce code duplication.
Merge Readiness
The pull request introduces important functionality for handling arrays, but there are some potential performance and maintainability issues that should be addressed before merging. Specifically, the performance of the schema.some checks should be improved, the logging format should be standardized, and the duplicated code for adding fields to the schema should be refactored. I am unable to approve this pull request, and recommend that others review and approve this code before merging.
| if (!schema.some(f => f.name === aggFieldName)) { | ||
| schema.push({ | ||
| name: aggFieldName, | ||
| label: aggFieldName, | ||
| dataType: 'STRING', | ||
| semantics: { conceptType: 'DIMENSION' } | ||
| }); | ||
| Logger.log('addFieldToSchema (Array of Objects): Added aggregated field: %s', aggFieldName); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema.some method is used to check if a field already exists before pushing it to the schema. This approach has a time complexity of O(n) for each field being added, where n is the number of fields in the schema. For large schemas, this can become a performance bottleneck. Consider using a Set or Map to store the field names for O(1) lookup time.
Suggestion: Use a Set to track existing field names for faster lookups.
const existingFields = new Set(schema.map(f => f.name));
Array.from(allKeys).forEach(key => {
const aggFieldName = `${arrayPrefix}.${key}`;
if (!existingFields.has(aggFieldName)) { // O(1) lookup
schema.push({
name: aggFieldName,
label: aggFieldName,
dataType: 'STRING',
semantics: { conceptType: 'DIMENSION' }
});
Logger.log('addFieldToSchema (Array of Objects): Added aggregated field: %s', aggFieldName);
}
});
| if (!schema.some(f => f.name === aggFieldName)) { | ||
| schema.push({ | ||
| name: aggFieldName, | ||
| label: aggFieldName, | ||
| dataType: 'STRING', | ||
| semantics: { conceptType: 'DIMENSION' } | ||
| }); | ||
| Logger.log('addFieldToSchema (Array of Primitives): Added aggregated field: %s', aggFieldName); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the array of objects case, the schema.some method is used here as well. Consider using the same optimization with a Set to improve performance for large schemas.
Suggestion: Use a Set to track existing field names for faster lookups.
const existingFields = new Set(schema.map(f => f.name));
const aggFieldName = fieldName;
if (!existingFields.has(aggFieldName)) { // O(1) lookup
schema.push({
name: aggFieldName,
label: aggFieldName,
dataType: 'STRING',
semantics: { conceptType: 'DIMENSION' }
});
Logger.log('addFieldToSchema (Array of Primitives): Added aggregated field: %s', aggFieldName);
}
| if (value.length > 0 && typeof value[0] === 'object' && value[0] !== null) { | ||
| // Array of Objects: Create aggregated string fields | ||
| const arrayPrefix = fieldName; // Use the current fieldName as the prefix (e.g., 'schedule') | ||
| const allKeys = new Set(); | ||
| Object.keys(value[0]).forEach(key => allKeys.add(key)); // Assume consistent keys based on first item | ||
|
|
||
| Logger.log('addFieldToSchema (Array of Objects): Found keys [%s] for prefix \'%s\'', Array.from(allKeys).join(', '), arrayPrefix); | ||
|
|
||
| Array.from(allKeys).forEach(key => { | ||
| const aggFieldName = `${arrayPrefix}.${key}`; // e.g., schedule.day | ||
| // Check if field already exists before pushing | ||
| if (!schema.some(f => f.name === aggFieldName)) { | ||
| schema.push({ | ||
| name: aggFieldName, | ||
| label: aggFieldName, | ||
| dataType: 'STRING', | ||
| semantics: { conceptType: 'DIMENSION' } | ||
| }); | ||
| Logger.log('addFieldToSchema (Array of Objects): Added aggregated field: %s', aggFieldName); | ||
| } | ||
| }); | ||
| } else if (value.length > 0) { | ||
| // Array of Primitives: Create one aggregated string field | ||
| const aggFieldName = fieldName; // Use the current fieldName (e.g., 'aliases') | ||
| if (!schema.some(f => f.name === aggFieldName)) { | ||
| schema.push({ | ||
| name: aggFieldName, | ||
| label: aggFieldName, | ||
| dataType: 'STRING', | ||
| semantics: { conceptType: 'DIMENSION' } | ||
| }); | ||
| Logger.log('addFieldToSchema (Array of Primitives): Added aggregated field: %s', aggFieldName); | ||
| } | ||
| } else { | ||
| // Empty array: Skip (consistent with original logic) | ||
| Logger.log('addFieldToSchema: Skipping empty array for key: %s', fieldName); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code for handling arrays of objects and arrays of primitives is very similar. Consider refactoring this logic into a separate function to reduce code duplication and improve maintainability. This function could take the array and a flag indicating whether it's an array of objects or primitives as input.
Suggestion: Refactor the array handling logic into a separate function.
} else if (value.length > 0 && typeof value[0] === 'object' && value[0] !== null) {
handleArrayOfObjects(value, fieldName, schema);
} else if (value.length > 0) {
handleArrayOfPrimitives(value, fieldName, schema);
} else {
// Empty array: Skip (consistent with original logic)
Logger.log('addFieldToSchema: Skipping empty array for key: %s', fieldName);
}
No description provided.