-
Notifications
You must be signed in to change notification settings - Fork 136
fix: Address issue with temporal fields and BigQuery causing MDL issues #1290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* **mdl:** correctly handle null date, datetime, and timestamp values during SQL generation
* **mdl:** ensure `None` variants of date/time scalars are converted to SQL `NULL` instead of panicking or producing an empty string
### Features
* **dialect(bigquery):** add comprehensive support for BigQuery temporal functions
* Implement `DATE_TRUNC`, `DATETIME_TRUNC`, `TIMESTAMP_TRUNC`, and `TIME_TRUNC`.
* Implement `DATE_DIFF`, `DATETIME_DIFF`, `TIMESTAMP_DIFF`, and `TIME_DIFF`.
* Implement `PARSE_DATE`, `PARSE_DATETIME`, and `PARSE_TIMESTAMP`.
* Implement `FORMAT_DATE`, `FORMAT_DATETIME`, and `FORMAT_TIMESTAMP`.
* Implement `CURRENT_DATE`, `CURRENT_DATETIME`, and `CURRENT_TIMESTAMP`.
* Implement `TIME` data type parity with `TIME_ADD`, `TIME_SUB`, and `TIME_TRUNC`.
* **dialect(bigquery):** enhance `EXTRACT` to support all date and time parts ()
* Add support for `HOUR`, `MINUTE`, `SECOND`, `MILLISECOND`, and `MICROSECOND`.
* Add support for timezone-aware extraction using `AT TIME ZONE`.
* **dialect(bigquery):** implement advanced `INTERVAL` arithmetic for date/time functions
* Add support for both `IntervalDayTime` and `IntervalYearMonth` scalar types.
* **dialect(bigquery):** add support for `GENERATE_DATE_ARRAY` for time-series scaffolding
* **mdl:** add comprehensive formatting for all date, datetime, and timestamp scalar values
### Refactor
* **mdl:** integrate `scalar_value_to_ast_value` into `WrenDialect` for robust literal value conversion
WalkthroughExpands BigQuery dialect SQL rewrites to cover many date/time functions and intervals, adds ScalarValue→AST conversion and a defensive lit(...) override that emits literals or inner expressions, extends supported DateTimeField variants, and adds sqllogictest coverage for the new behaviors. Changes
Sequence Diagram(s)sequenceDiagram
participant Planner
participant BigQueryDialect
participant ASTBuilder
Planner->>BigQueryDialect: scalar_function_to_sql_overrides(name, args)
alt date_part with timezone
BigQueryDialect->>ASTBuilder: build EXTRACT(...), wrap AT TIME ZONE
else trunc/add/sub/diff
BigQueryDialect->>ASTBuilder: validate arity, parse interval if needed, build Function(UPPER(name), args...)
else parse/format/current/generate
BigQueryDialect->>ASTBuilder: validate arity, build Function(name_upper, args)
end
BigQueryDialect-->>Planner: Some(ast::Expr) or None
sequenceDiagram
participant Planner
participant WrenDialect
participant Utils
participant AST
Planner->>WrenDialect: scalar_function_to_sql_overrides("lit", [arg])
alt arg is Expr::Literal(ScalarValue)
WrenDialect->>Utils: scalar_value_to_ast_value(value)
Utils-->>WrenDialect: ast::Value
WrenDialect->>AST: ast::Expr::Value(...)
WrenDialect-->>Planner: Some(AST Expr)
else arg is Expr (non-literal)
WrenDialect->>AST: unparser.expr_to_sql(arg)
WrenDialect-->>Planner: Some(AST Expr)
else wrong arity
WrenDialect-->>Planner: Err("lit requires exactly 1 argument")
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)
89-97: lit(...) override is correct for literals, but consider guarding misuse and non-literal argsToday we only rewrite when the first arg is a Literal; otherwise we fall back to Ok(None), which likely renders as
lit(<expr>)in SQL (invalid in all target dialects). Two suggestions:
- Enforce arity 1 for
litand error early if misused.- If
litis invoked with a non-literal (shouldn’t happen), consider passing the inner expression through instead of emittinglit(...).Apply this diff to handle both cases defensively:
// Add override for Literal _ => { - if let Some(Expr::Literal(value)) = args.get(0) { - if func_name == "lit" { - return Ok(Some(ast::Expr::Value(scalar_value_to_ast_value(value)))); - } - } + if func_name == "lit" { + match args.get(0) { + Some(Expr::Literal(value)) => { + return Ok(Some(ast::Expr::Value(scalar_value_to_ast_value(value)))); + } + Some(other) => { + // Fall back to the expression itself to avoid emitting `lit(...)` in SQL + return Ok(Some(unparser.expr_to_sql(other)?)); + } + None => { + return plan_err!("lit requires exactly 1 argument"); + } + } + } Ok(None) }wren-core/core/src/mdl/dialect/inner_dialect.rs (1)
188-243: Intervals: unit selection may produce invalid BigQuery for DATE_ADD/SUB; consider mapping DayTime to DAY for DATE_, and disallow MONTH intervals for TIME_You currently convert IntervalDayTime to MILLISECOND for all *_add/_sub variants. In BigQuery:
- DATE_ADD/SUB only accept DAY/WEEK/MONTH/QUARTER/YEAR (no MILLISECOND).
- TIME_ADD/SUB don’t accept YEAR/MONTH (YearMonth intervals).
Without upstream type checks, this can generate invalid SQL. A light-touch improvement: use DAY for date_* with IntervalDayTime, and reject YearMonth for time_*.
Minimal change to map DATE_* + IntervalDayTime to DAY:
- Expr::Literal(ScalarValue::IntervalDayTime(Some(interval))) => { - let (days, ms) = (*interval >> 32, *interval as i32); - AstExpr::Value(Value::Interval(Interval { - value: Box::new(AstExpr::Value(Value::Number( - format!("{}", days * 24 * 3600 * 1000 + ms as i64), // Convert to ms - false, - ))), - leading_field: Some(DateTimeField::Millisecond), - leading_precision: None, - last_field: None, - fractional_seconds_precision: None, - })) - } + Expr::Literal(ScalarValue::IntervalDayTime(Some(interval))) => { + let (days, ms) = (*interval >> 32, *interval as i32); + let use_day_unit = matches!(function_name, "date_add" | "date_sub"); + let (value_str, unit) = if use_day_unit { + (format!("{}", days), DateTimeField::Day) + } else { + (format!("{}", days * 24 * 3600 * 1000 + ms as i64), DateTimeField::Millisecond) + }; + AstExpr::Value(Value::Interval(Interval { + value: Box::new(AstExpr::Value(Value::Number(value_str, false))), + leading_field: Some(unit), + leading_precision: None, + last_field: None, + fractional_seconds_precision: None, + })) + }Optionally, reject invalid unit combinations up-front:
- Return plan_err! if function_name starts with "time_" and interval is IntervalYearMonth(Some(_)).
- Return plan_err! if function_name starts with "date_" and IntervalDayTime has non-zero ms.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
wren-core/core/src/mdl/dialect/inner_dialect.rs(5 hunks)wren-core/core/src/mdl/dialect/wren_dialect.rs(3 hunks)wren-core/core/src/mdl/utils.rs(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
wren-core/core/src/mdl/dialect/wren_dialect.rs (3)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)
get_inner_dialect(71-78)wren-core/core/src/mdl/utils.rs (1)
scalar_value_to_ast_value(263-319)wren-core/core/src/mdl/dialect/utils.rs (1)
args(30-48)
wren-core/core/src/mdl/dialect/inner_dialect.rs (3)
wren-core/core/src/mdl/dialect/wren_dialect.rs (3)
args(137-140)args(176-191)new(130-134)wren-core/core/src/mdl/dialect/utils.rs (1)
args(30-48)wren-core/core/src/logical_plan/analyze/access_control.rs (1)
expr_to_sql(655-658)
🔇 Additional comments (1)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)
401-406: WEEK() parsing looks fine; ensure sqlparser supports Week(Some(Ident)) round-tripThe AST variant DateTimeField::Week(Some(Ident)) must be supported by the unparser; otherwise the emitted SQL may be malformed. If the unparser doesn’t render the weekday (or requires a different node), consider adding coverage tests.
Bug Fixes * dialect(bigquery): correct _DIFF function argument order to (start, end, part) to match dialect expectations * dialect(bigquery): fix invalid SQL generation by applying AT TIME ZONE to the timestamp argument inside EXTRACT instead of wrapping the function call * dialect(bigquery): improve interval handling for DATE_ADD/SUB to prevent generating invalid units for DATE and TIME types * mdl: prevent panics by replacing .unwrap() with fallible conversions in scalar_value_to_ast_value * mdl: correct Date32 epoch calculation to use 1970-01-01 as the base * test: created BigQuery-specific feature test suite. Mostly focuses on temporal items with some coverage for column aliases, structs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
wren-core/core/src/mdl/utils.rs (1)
279-288: Kudos: Date32 epoch fix and removal of unwrapsYou corrected the Date32 epoch bug (now offset from 1970-01-01 with checked_add_signed) and replaced all chrono unwraps with fallible conversions returning SQL NULL on overflow. This addresses earlier concerns.
🧹 Nitpick comments (4)
wren-core/core/src/mdl/utils.rs (2)
279-299: Timezone info in ScalarValue::Timestamp is ignored; consider honoring tz when present*The Timestamp* arms discard the timezone parameter. If tz is Some(...), the formatted value may not reflect the intended local time. Depending on dialect behavior, you may want to either:
- Normalize to UTC and append ‘Z’ or
- Convert to the provided time zone before formatting, or
- Emit a typed literal or an explicit CAST/AT TIME ZONE expression.
If desired, I can propose a minimal enhancement that conditionally wraps the formatted string when tz.is_some().
Also applies to: 300-311
263-327: Add unit/sqllogictest coverage for non-temporal None casesGiven the earlier bug, please add tests asserting that Int32(None), Boolean(None), Utf8(None), Decimal(None, …) etc. map to SQL NULL (not 'NULL'). This will prevent regressions.
I can draft targeted sqllogictest cases (e.g., SELECT lit_int_null(), lit_bool_null()) or a Rust unit test invoking scalar_value_to_ast_value for a sample of None variants. Would you like me to open a follow-up PR with tests?
wren-core/sqllogictest/test_files/bigquery_features.slt (2)
7-90: Solid temporal coverage; suggest adding None/nullability cases and higher-precision timestampsGreat coverage of CAST(NULL AS DATE), trunc/add/diff/parse/format/current/generate, and timezone extraction. To lock in the MDL bugfix and new formatting:
- Add cases asserting that typed-None values render as SQL NULL (e.g., SELECT lit(int_null()), lit(bool_null()), lit(string_null())).
- Add TIMESTAMP_MICRO/NANO formatting coverage to ensure fractional seconds are preserved.
If you share how to encode typed-None literals via sqllogictest in this repo, I’ll propose concrete test lines.
106-125: Array/Struct: good breadth; consider adding UNNEST with STRUCT arrays and WITH OFFSETYou’ve covered make_array, named_struct with field access, and UNNEST on a primitive array. Consider adding:
- UNNEST over array of STRUCTs with field projection
- UNNEST ... WITH OFFSET to ensure parser/unparser handles offsets
I can draft a couple of minimal lines if helpful.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
wren-core/core/src/mdl/dialect/inner_dialect.rs(5 hunks)wren-core/core/src/mdl/utils.rs(3 hunks)wren-core/sqllogictest/test_files/bigquery_features.slt(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-core/core/src/mdl/dialect/inner_dialect.rs
🔇 Additional comments (3)
wren-core/core/src/mdl/utils.rs (1)
72-72: Micro-optimization/readability: char-based split is fineUsing .split('.') avoids compiling a &str pattern and is idiomatic. No changes requested.
wren-core/sqllogictest/test_files/bigquery_features.slt (2)
15-20: lit() override behavior looks correctThe rewrite to raw literals for numeric and string args is exercised and matches expectation.
95-101: Alias sanitization: nice regression guardThe mapping of special-character aliases to a sanitized identifier is verified. Good to have this locked down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)
88-104: Consider case-insensitive match if upstream normalizes inconsistentlyIf
func_nameisn’t guaranteed lowercased by all call sites,eq_ignore_ascii_case("lit")prevents surprises. If normalization is guaranteed, ignore this.- if func_name == "lit" { + if func_name.eq_ignore_ascii_case("lit") {wren-core/sqllogictest/test_files/bigquery_features.slt (4)
21-27: Make DATE_DIFF reordering assertion meaningfulThe comment says you’re validating (start, end, part) reordering, but the input already matches the expected ordering. Flip the input args so the test proves the rewriter reorders them.
-SELECT DATE_DIFF(CAST('2025-01-15' AS DATE), CAST('2025-01-01' AS DATE), DAY) +SELECT DATE_DIFF(CAST('2025-01-01' AS DATE), CAST('2025-01-15' AS DATE), DAY) ---- SELECT DATE_DIFF(CAST('2025-01-15' AS DATE), CAST('2025-01-01' AS DATE), DAY)
14-20: Add a pass-through test for non-literal lit() argumentsThis exercises the “emit the underlying expression’s SQL” branch (e.g.,
lit(1 + 2)→1 + 2) and guards against regressions.SELECT lit(123), lit('abc') ---- SELECT 123, 'abc' + +-- Test: Validate lit() pass-through for non-literal expression +statement ok +SELECT lit(1 + 2) + +---- +SELECT 1 + 2
110-116: Broaden NULL coverage for lit() to temporal typesGiven the PR’s focus on NULL temporal handling, add casts for DATE/TIMESTAMP to ensure they normalize to bare NULL as intended.
SELECT lit(CAST(NULL AS INT)), lit(CAST(NULL AS BOOLEAN)), lit(CAST(NULL AS VARCHAR)) ---- SELECT NULL, NULL, NULL + +-- Test: Validate NULL handling for temporal types via lit() +statement ok +SELECT lit(CAST(NULL AS DATE)), lit(CAST(NULL AS TIMESTAMP)) + +---- +SELECT NULL, NULL
1-165: Add negative-arity tests for lit()With the stricter arity check, it’s worth asserting both too-few and too-many args produce an error.
+-- Test: lit() with no args should error +statement error +SELECT lit() + +-- Test: lit() with too many args should error +statement error +SELECT lit(1, 2)If your sqllogictest harness requires matching error messages, we can add regex assertions accordingly. Do you want me to tailor these to your runner?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
wren-core/core/src/mdl/dialect/wren_dialect.rs(3 hunks)wren-core/core/src/mdl/utils.rs(3 hunks)wren-core/sqllogictest/test_files/bigquery_features.slt(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-core/core/src/mdl/utils.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
wren-core/core/src/mdl/dialect/wren_dialect.rs (2)
wren-core/core/src/mdl/utils.rs (1)
scalar_value_to_ast_value(263-356)wren-core/core/src/mdl/dialect/utils.rs (1)
args(30-48)
🔇 Additional comments (1)
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)
21-21: Good call: reuse centralized ScalarValue→AST conversionImporting and using
scalar_value_to_ast_valuekeeps literal rendering consistent across dialects and fixes NULL-temporal handling without duplicating logic.
|
Thanks @cougrimes for working on this. There are some building failures. Could you check them? |
Nonevariants of date/time scalars are converted to SQLNULLinstead of panicking or producing an empty stringFeatures
DATE_TRUNC,DATETIME_TRUNC,TIMESTAMP_TRUNC, andTIME_TRUNC.DATE_DIFF,DATETIME_DIFF,TIMESTAMP_DIFF, andTIME_DIFF.PARSE_DATE,PARSE_DATETIME, andPARSE_TIMESTAMP.FORMAT_DATE,FORMAT_DATETIME, andFORMAT_TIMESTAMP.CURRENT_DATE,CURRENT_DATETIME, andCURRENT_TIMESTAMP.TIMEdata type parity withTIME_ADD,TIME_SUB, andTIME_TRUNC.EXTRACTto support all date and time parts ()HOUR,MINUTE,SECOND,MILLISECOND, andMICROSECOND.AT TIME ZONE.INTERVALarithmetic for date/time functionsIntervalDayTimeandIntervalYearMonthscalar types.GENERATE_DATE_ARRAYfor time-series scaffoldingRefactor
scalar_value_to_ast_valueintoWrenDialectfor robust literal value conversionSummary by CodeRabbit