- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.7k
Labels
enhancementNew feature or requestNew feature or request
Description
Is your feature request related to a problem or challenge?
COPY (SELECT '2025-10-25T00:15:00Z'::timestamptz AS ts) TO 'ts.parquet';
CREATE EXTERNAL TABLE t STORED AS PARQUET LOCATION 'ts.parquet';
EXPLAIN
SELECT * FROM t WHERE ts = '1761630189642';Results in
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │    CoalesceBatchesExec    │ |
|               | │    --------------------   │ |
|               | │     target_batch_size:    │ |
|               | │            8192           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │         FilterExec        │ |
|               | │    --------------------   │ |
|               | │         predicate:        │ |
|               | │ ts = CAST(1761630189642 AS│ |
|               | │    Timestamp(Nanosecond,  │ |
|               | │      Some("+00:00")))     │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      │ |
|               | │    --------------------   │ |
|               | │ partition_count(in->out): │ |
|               | │          1 -> 12          │ |
|               | │                           │ |
|               | │    partitioning_scheme:   │ |
|               | │    RoundRobinBatch(12)    │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │          files: 1         │ |
|               | │      format: parquet      │ |
|               | │                           │ |
|               | │         predicate:        │ |
|               | │ ts = CAST(1761630189642 AS│ |
|               | │    Timestamp(Nanosecond,  │ |
|               | │      Some("+00:00")))     │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
Note that we keep the cast all the way down into the physical plan.
This makes it harder to catch and surface errors. For example if you remove the EXPLAIN on this query you get:
Arrow error: Parser error: Error parsing timestamp from '1761630189642': error parsing date
This is actually an error that happens inside of ParquetOpener(ArrowRowFilter(ParquetRowFilter(...))). If we applied casts to literals during logical plan optimization we would:
- Catch the error earlier and within DataFusion itself before we start calling into arrow-rsand it calls us back.
- Possibly be more efficient (I guess the cast gets evaluated for each batch?).
Describe the solution you'd like
Resolve literal casts during logical plan optimizations.
Describe alternatives you've considered
No response
Additional context
No response
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request