@max-sixty
Member

chumsky 0.10 is a total rewrite from 0.9, and much of our code also needs rewriting if we're to upgrade

this creates a feature that uses chumsky 0.10 only for the lexer, as a way of making progress. claude code helped a lot, and it does better with smaller problems.

before merging, we would keep only the chumsky 0.10 code for the lexer (and possibly upgrade the parser, I'm not sure)

so far, it mostly works, with only a couple of failing tests. I'm not sure how to get our "any number of odd quotes" lexing working correctly, and there's a small issue with the end of input. I'll ask about the first over at chumsky

(this isn't the most important thing for PRQL, but I'm trying to get back into it and wanted to spend some time on a moderately difficult project with LLMs, so this seemed like a reasonable case. though I also thought it would be much easier! very open to feedback on whether there's a way to make the changes more incrementally, rather than basically rewriting from scratch. chumsky's type errors don't make it easy to play whack-a-mole)
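
For readers unfamiliar with the "any number of odd quotes" case, here is a rough hand-written sketch of the rule as I understand it: a string opens with an odd run of quote characters and closes with a run of the same length. This is not chumsky code and not the PRQL lexer; `odd_quoted` is a hypothetical helper, and even-length runs (such as an empty string) are deliberately left out.

```rust
/// Hypothetical helper, not PRQL's lexer: reads a string delimited by an odd
/// run of `quote` characters and returns (content, remaining input).
fn odd_quoted(input: &str, quote: char) -> Option<(&str, &str)> {
    // Length of the opening run of quote characters.
    let open = input.chars().take_while(|&c| c == quote).count();
    if open % 2 == 0 {
        // Zero quotes is not a string; even runs are left out of this sketch.
        return None;
    }
    // The closing delimiter is the same number of quotes as the opening run.
    let delimiter: String = std::iter::repeat(quote).take(open).collect();
    let body = &input[delimiter.len()..];
    // `find` returns None when the string is never closed.
    let close = body.find(&delimiter)?;
    Some((&body[..close], &body[close + delimiter.len()..]))
}

fn main() {
    println!("{:?}", odd_quoted("'a' rest", '\''));
    println!("{:?}", odd_quoted("'''it's'''", '\''));
    println!("{:?}", odd_quoted("'unclosed", '\''));
}
```

The awkward part for a combinator library is that the closing delimiter depends on how many quotes were consumed at the start, so the parser needs some form of context from the opening run.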

max-sixty and others added 30 commits March 31, 2025 17:43
Add feature flag chumsky-10 that when active, makes the lexer use a different implementation.
For now, that implementation is a stub that throws an error, but this sets up the structure to allow
incrementally building the chumsky 0.10 lexer while keeping the 0.9 one working.
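
A minimal sketch of how such a feature-gated stub could be laid out, assuming a `chumsky-10` Cargo feature as described above. The `Token` type, the `imp` module, and the whitespace-splitting stand-in for the 0.9 lexer are hypothetical; only the `cfg` switching pattern is the point.

```rust
/// Hypothetical token type, for illustration only.
#[derive(Debug, PartialEq)]
pub struct Token(pub String);

// Default build: stand-in for the existing chumsky 0.9 lexer.
#[cfg(not(feature = "chumsky-10"))]
mod imp {
    use super::Token;

    pub fn lex(source: &str) -> Result<Vec<Token>, String> {
        // Placeholder logic: split on whitespace instead of real lexing.
        Ok(source
            .split_whitespace()
            .map(|s| Token(s.to_string()))
            .collect())
    }
}

// With `--features chumsky-10`: a stub that always reports an error.
#[cfg(feature = "chumsky-10")]
mod imp {
    use super::Token;

    pub fn lex(_source: &str) -> Result<Vec<Token>, String> {
        Err("chumsky 0.10 lexer is not implemented yet".to_string())
    }
}

/// Single entry point; callers do not need to know which lexer is active.
pub fn lex_source(source: &str) -> Result<Vec<Token>, String> {
    imp::lex(source)
}

fn main() {
    match lex_source("from employees") {
        Ok(tokens) => println!("{tokens:?}"),
        Err(e) => eprintln!("lexer error: {e}"),
    }
}
```

Because both modules expose the same `lex` signature, the stub can be filled in piece by piece while the default build keeps passing its tests.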
This commit completes Phase II of the chumsky 0.10 migration plan by:
1. Implementing a minimal lexer interface that compiles
2. Providing stub implementations for test compatibility
3. Setting up conditional test execution based on feature flags

Note that this is a minimal implementation that provides the API
structure but doesn't yet implement the actual lexer functionality.
Full implementation will be done in Phase III.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Create the basic combinator infrastructure that will be used in Phase III.
- Define Parser trait and essential combinators like map, then, etc.
- Set up basic parser combinators (just, any, end, etc.)
- Create token-specific combinators for lexing
- Maintain fallback imperative implementation to ensure tests pass
- Updated lexer implementation to work with Chumsky 0.10 API
- Modified token parsers to use Stream instead of raw strings
- Added proper test setup for the new Chumsky version
- Fixed issues with mapped values and error handling
- Implemented basic string parsing (more advanced features to come later)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
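
To make the combinator commits above more concrete, here is a small hand-rolled sketch of a `Parser` trait with `just`, `any`, `end`, `then` and `map`. It only illustrates the general shape of these combinators; it is not chumsky's API and not the code from this PR, and every signature here is an assumption.

```rust
/// A parser consumes a prefix of `input` and returns a value plus the rest.
trait Parser<'a, O> {
    fn parse(&self, input: &'a str) -> Option<(O, &'a str)>;
}

// Any closure with the right shape is a parser.
impl<'a, O, F> Parser<'a, O> for F
where
    F: Fn(&'a str) -> Option<(O, &'a str)>,
{
    fn parse(&self, input: &'a str) -> Option<(O, &'a str)> {
        self(input)
    }
}

/// Match an exact string.
fn just<'a>(expected: &'static str) -> impl Parser<'a, &'static str> {
    move |input: &'a str| input.strip_prefix(expected).map(|rest| (expected, rest))
}

/// Match any single character.
fn any<'a>() -> impl Parser<'a, char> {
    |input: &'a str| {
        let mut chars = input.chars();
        chars.next().map(|c| (c, chars.as_str()))
    }
}

/// Succeed only when no input is left.
fn end<'a>() -> impl Parser<'a, ()> {
    |input: &'a str| if input.is_empty() { Some(((), input)) } else { None }
}

/// Run `first`, then run `second` on whatever `first` left over.
fn then<'a, A, B>(
    first: impl Parser<'a, A>,
    second: impl Parser<'a, B>,
) -> impl Parser<'a, (A, B)> {
    move |input: &'a str| {
        let (a, rest) = first.parse(input)?;
        let (b, rest) = second.parse(rest)?;
        Some(((a, b), rest))
    }
}

/// Transform the output of a parser.
fn map<'a, A, B>(parser: impl Parser<'a, A>, f: impl Fn(A) -> B) -> impl Parser<'a, B> {
    move |input: &'a str| parser.parse(input).map(|(a, rest)| (f(a), rest))
}

fn main() {
    // "->" followed by end of input, mapped to a (hypothetical) token name.
    let arrow = map(then(just("->"), end()), |_| "ArrowThin");
    println!("{:?}", arrow.parse("->"));
    println!("{:?}", arrow.parse("->x"));
    println!("{:?}", any().parse("ab"));
}
```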
@vanillajonathan
Collaborator

Any update on this?

@max-sixty
Member Author

> Any update on this?

I found it really hard and got distracted by other things. I'd like to come back to it at some point; possibly when there are more docs for me & the LLMs to benefit from...

max-sixty and others added 18 commits September 25, 2025 12:27
Add inline snapshot tests showing lexer output alongside parser output
for changed error positions during Chumsky 0.10 migration:

- `interpolation_end`: Shows unclosed f-string error correctly reported at
  position 20 (end of input) by lexer
- Mississippi curly quotes: Shows lexer detecting both curly quotes at
  positions 22-23 and 35-36
- Converted three lexer tests to use inline `@` snapshots instead of
  external files

All tests pass with correct character offset error positions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
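
As a side note on why character offsets matter for the curly-quote case: a curly quote occupies three bytes in UTF-8, so byte and character positions drift apart after the first one. The sketch below uses a made-up input and hypothetical helper names, not the actual test source from this PR.

```rust
/// Character offsets of every occurrence of `target` in `source`.
fn char_offsets(source: &str, target: char) -> Vec<usize> {
    source
        .chars()
        .enumerate()
        .filter(|&(_, c)| c == target)
        .map(|(i, _)| i)
        .collect()
}

/// Byte offsets of every occurrence of `target` in `source`.
fn byte_offsets(source: &str, target: char) -> Vec<usize> {
    source
        .char_indices()
        .filter(|&(_, c)| c == target)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Made-up input containing two right curly quotes (U+2019).
    let source = "a ’b’";
    println!("char offsets: {:?}", char_offsets(source, '’')); // [2, 4]
    println!("byte offsets: {:?}", byte_offsets(source, '’')); // [2, 6]
}
```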
Add WASM-specific configuration for chumsky_0_10 to disable the
default `stacker` feature which causes compilation failures on
WASM targets (especially on macOS-14).

This mirrors the existing configuration for chumsky 0.9 and fixes
the "section too large" LLVM error when building for wasm32-unknown-unknown.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Delete chumsky_0_9.rs and all feature flags
- Remove chumsky-10 feature from Cargo.toml (now always enabled)
- Simplify test code to only use Chumsky 0.10
- Update ErrorSource to always use String for lexer errors
- All 101 parser tests passing

This completes the Chumsky 0.10 migration for the lexer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Simple::new() was added in chumsky 0.10.1, not 0.10.0.
This fixes the minimum versions test failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Rename prqlc/prqlc-parser/src/lexer/chumsky_0_10.rs to lexer.rs for canonical naming
- Remove migration header comments from lexer.rs
- Update all module references from chumsky_0_10 to lexer
- Simplify mod.rs and test.rs to use canonical module name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Remove "Chumsky 0.9 vs 0.10" comparison comments from lexer.rs
- Remove "works in both versions" comments from test.rs
- Remove "Note: Create snapshots without chumsky-10 feature" comments
- Simplify parser mod.rs comment about version usage
- Remove version reference from ErrorSource comment

All comments now describe current behavior, not migration history.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Use `use chumsky_0_10 as chumsky;` at the top of lexer files,
then use canonical `chumsky::` throughout the code instead of
`chumsky_0_10::`.

This makes the code read naturally while maintaining compatibility
with the parser that still uses chumsky 0.9.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
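
A tiny sketch of the aliasing pattern from the commit above. In the real crate `chumsky_0_10` would be a renamed dependency; here a local module with a hypothetical `version` function stands in for it so that the snippet compiles on its own.

```rust
// Stand-in for the renamed dependency (assumption for this sketch only).
mod chumsky_0_10 {
    pub fn version() -> &'static str {
        "0.10"
    }
}

// One alias at the top of the file...
use chumsky_0_10 as chumsky;

fn main() {
    // ...so the rest of the file can use the canonical `chumsky::` path.
    println!("{}", chumsky::version());
}
```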
Change "Chumsky 0.10's SimpleSpan" to just "SimpleSpan" since
we're describing current behavior, not comparing versions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Update integration test snapshots to reflect lexer changes:
- Error positions for unclosed strings now point to opening quote
- Token span positions adjusted for new lexer implementation

All snapshot changes are expected from the Chumsky 0.10 migration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Move chumsky and stacker dependencies to `cfg(not(target_family="wasm"))`
section to avoid PSM compilation issues on WASM targets.

This matches the pattern used on the main branch and prevents
"section too large" LLVM errors when building for wasm32.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Clippy complained about having `src/lexer/mod.rs` declare `mod lexer;`,
which creates the pattern `lexer::lexer`. Renamed to `lr_lexer` to fix.

Fixes clippy error:
  error: module has the same name as its containing module
   --> prqlc/prqlc-parser/src/lexer/mod.rs:1:1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
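
For context, the clippy lint involved here is `module_inception`. A minimal sketch of the renamed layout, using inline modules instead of separate files and a placeholder `lex_source` body (both are assumptions for illustration):

```rust
mod lexer {
    // Naming this inner module `lexer` would give the path `lexer::lexer`,
    // which is what clippy's `module_inception` lint reports.
    mod lr_lexer {
        /// Placeholder body: splits on whitespace instead of real lexing.
        pub fn lex_source(source: &str) -> Vec<String> {
            source.split_whitespace().map(str::to_string).collect()
        }
    }

    // Re-export so callers still write `lexer::lex_source`.
    pub use lr_lexer::lex_source;
}

fn main() {
    println!("{:?}", lexer::lex_source("from employees"));
}
```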
# Conflicts:
#	prqlc/prqlc-parser/Cargo.toml
Consolidates the lexer code by moving implementation from lr_lexer.rs
into the mod.rs file where it was originally located. This simplifies
the module structure and makes the code more canonical.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The debug::lex_debug function was just a thin wrapper around lex_source
with no added functionality. Tests now call lex_source directly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@max-sixty max-sixty enabled auto-merge (squash) October 7, 2025 23:18
@max-sixty max-sixty merged commit bb81cc9 into PRQL:main Oct 7, 2025
79 checks passed
@max-sixty max-sixty deleted the chumsky-10 branch October 7, 2025 23:34