fix: SQL parser support STRUCT type with angle brackets and colon #5449
Greptile Overview
Greptile Summary
Added support for parsing STRUCT<field: type> syntax with angle brackets and colons in SQL type definitions. The implementation validates colon usage and explicitly rejects the STRUCT(...) parentheses syntax with a helpful error message.
Key changes:
- Preprocessing step tokenizes input to detect STRUCT syntax before passing to sqlparser
- Validates that colons appear consistently (either 0 or exactly one per field)
- Replaces colons with spaces so sqlparser can parse the modified syntax
- Added tests for both `STRUCT<a bool, b int>` and `STRUCT<a: bool, b: int>` syntaxes
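The "either 0 or exactly one colon per field" rule can be sketched as follows. This is a minimal illustration and not the code in the PR: it works on characters rather than sqlparser tokens, and the name `validate_colons` is borrowed from the sequence diagram while its signature here is an assumption. Only top-level fields are checked; nested structs are skipped rather than validated recursively.

```rust
/// Hypothetical sketch: `inner` is the text between `STRUCT<` and its
/// matching `>`. Returns Ok if every top-level field has zero colons,
/// or every top-level field has exactly one.
fn validate_colons(inner: &str) -> Result<(), String> {
    let mut depth = 0usize; // nesting depth of angle brackets
    let mut colons_in_field = 0usize; // colons seen in the current field
    let mut per_field: Vec<usize> = Vec::new();

    for ch in inner.chars() {
        match ch {
            '<' => depth += 1,
            '>' => {
                depth = depth
                    .checked_sub(1)
                    .ok_or_else(|| "unbalanced '>'".to_string())?;
            }
            // Only colons and commas at depth 0 belong to this struct.
            ':' if depth == 0 => colons_in_field += 1,
            ',' if depth == 0 => {
                per_field.push(colons_in_field);
                colons_in_field = 0;
            }
            _ => {}
        }
    }
    per_field.push(colons_in_field); // the last field has no trailing comma

    if depth != 0 {
        return Err("unbalanced '<'".to_string());
    }

    let all_zero = per_field.iter().all(|&n| n == 0);
    let all_one = per_field.iter().all(|&n| n == 1);
    if all_zero || all_one {
        Ok(())
    } else {
        Err("colons must appear in every field or in none".to_string())
    }
}
```

Under these assumptions, `validate_colons("a: bool, b int")` is rejected because the two fields disagree, while both `"a bool, b int"` and `"a: bool, b: int"` are accepted.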
Issues found:
- Critical: Line 75 uses global string replacement (`s.replace(':', " ")`), which affects ALL colons in the input, not just those within the STRUCT definition. This will corrupt inputs containing colons outside the STRUCT context.
- The validation at lines 112-115 doesn't prevent keywords like `STRUCT` from appearing before a colon, which could allow invalid nested struct syntax to pass validation.
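One way to address the global replacement issue is to rewrite colons only while inside a `STRUCT<...>` span. The sketch below is an illustration under stated assumptions, not the fix adopted in the PR: `replace_struct_colons` and `is_struct_open` are hypothetical names, the scan is character-based rather than token-based, and it assumes no whitespace between `STRUCT` and `<`.

```rust
/// Hypothetical helper: true if `STRUCT<` (case-insensitive) starts at
/// index `i` and is not the tail of a longer identifier.
fn is_struct_open(chars: &[char], i: usize) -> bool {
    let kw: Vec<char> = "STRUCT<".chars().collect();
    if i + kw.len() > chars.len() {
        return false;
    }
    let prev_ok = i == 0 || !(chars[i - 1].is_alphanumeric() || chars[i - 1] == '_');
    prev_ok
        && chars[i..i + kw.len()]
            .iter()
            .zip(kw.iter())
            .all(|(a, b)| a.eq_ignore_ascii_case(b))
}

/// Hypothetical sketch: replaces ':' with ' ' only inside STRUCT<...>,
/// leaving colons elsewhere in the input untouched.
fn replace_struct_colons(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    let mut depth = 0usize; // > 0 while inside a STRUCT<...> span
    let mut i = 0;

    while i < chars.len() {
        let ch = chars[i];
        if depth == 0 {
            if is_struct_open(&chars, i) {
                // Copy the keyword and '<' verbatim, then enter the span.
                out.extend(&chars[i..i + 7]);
                depth = 1;
                i += 7;
                continue;
            }
            out.push(ch);
        } else {
            match ch {
                '<' => depth += 1,
                '>' => depth -= 1, // depth > 0 here, so this cannot underflow
                _ => {}
            }
            out.push(if ch == ':' { ' ' } else { ch });
        }
        i += 1;
    }
    out
}
```

For an input such as `x: STRUCT<a: int>`, a global `replace(':', " ")` also rewrites the colon after `x`, whereas this scoped version leaves it in place.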
Confidence Score: 2/5
- This PR has a critical bug that will corrupt inputs with colons outside STRUCT definitions
- While the PR successfully addresses the original issue of supporting STRUCT syntax with angle brackets and colons, it contains a critical logic error on line 75, where `s.replace(':', " ")` performs a global replacement of all colons in the entire input string rather than only those within the STRUCT definition. This will break any SQL type strings containing colons elsewhere (e.g., nested contexts, or if combined with other syntax). The validation logic also has gaps for nested struct edge cases.
- src/daft-sql/src/schema.rs requires immediate attention to fix the global colon replacement bug
 
Important Files Changed
File Analysis
| Filename | Score | Overview | 
|---|---|---|
| src/daft-sql/src/schema.rs | 3/5 | Added STRUCT type parsing with angle brackets and colon syntax, validates colon usage, rejects parentheses syntax; found critical issue with global colon replacement | 
Sequence Diagram
sequenceDiagram
    participant User
    participant try_parse_dtype
    participant check_and_modify_struct
    participant validate_colons
    participant Tokenizer
    participant Parser
    participant sql_dtype_to_dtype
    User->>try_parse_dtype: "STRUCT<a: INT, b: STRING>"
    try_parse_dtype->>check_and_modify_struct: Check for STRUCT syntax
    check_and_modify_struct->>Tokenizer: Tokenize input
    Tokenizer-->>check_and_modify_struct: Token list
    
    alt STRUCT with parentheses
        check_and_modify_struct-->>try_parse_dtype: Error: Use angle brackets
    else STRUCT with angle brackets
        check_and_modify_struct->>check_and_modify_struct: Find matching brackets (depth tracking)
        check_and_modify_struct->>validate_colons: Validate inner tokens
        validate_colons->>validate_colons: Count colons and commas
        validate_colons->>validate_colons: Verify colon positions
        validate_colons-->>check_and_modify_struct: OK or Error
        check_and_modify_struct-->>try_parse_dtype: Modified string (colons replaced with spaces)
    else No STRUCT keyword
        check_and_modify_struct-->>try_parse_dtype: None (no modification)
    end
    
    try_parse_dtype->>Tokenizer: Tokenize final string
    Tokenizer-->>try_parse_dtype: Tokens
    try_parse_dtype->>Parser: Parse with sqlparser
    Parser-->>try_parse_dtype: DataType AST
    try_parse_dtype->>sql_dtype_to_dtype: Convert to Daft DataType
    sql_dtype_to_dtype-->>try_parse_dtype: DataType::Struct(fields)
    try_parse_dtype-->>User: Parsed DataType
1 file reviewed, 2 comments
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5449      +/-   ##
==========================================
+ Coverage   71.58%   71.74%   +0.16%     
==========================================
  Files         993      998       +5     
  Lines      125932   126956    +1024     
==========================================
+ Hits        90145    91085     +940     
- Misses      35787    35871      +84     
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
This reverts commit e55a0ba.
Changes Made
Related Issues
#4448
Checklist
`docs/mkdocs.yml` navigation