internal: upgrade chumsky to 0.10 in lexer #5223

max-sixty · 2025-04-02T17:42:41Z

chumsky 0.10 is a total rewrite from 0.9, and much of our code also needs rewriting if we're to upgrade

this creates a feature that uses chumsky 0.10 only for the lexer, as a way of making progress. claude code helped a lot, and it benefits from smaller problems.

before merging, we would keep only the chumsky 0.10 code for the lexer (and possibly upgrade the parser, I'm not sure)

so far, it mostly works, with only a couple of failing tests. I'm not sure how to get our "any number of odd quotes" lexing to work correctly, and there's a small issue with the end of input. I'll ask about the first over at chumsky
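For readers unfamiliar with the "any number of odd quotes" rule, here is a rough, std-only sketch of the behavior being discussed. It is not the chumsky implementation from this PR: `lex_quoted` is a hypothetical helper, escapes are ignored, and treating an even run of quotes as an empty string is an assumption about the intended semantics.

```rust
/// Hypothetical helper: lex a string opened by a run of `quote` characters.
/// An odd run of N quotes must be closed by the same N quotes.
/// An even run is assumed to mean an empty string.
/// Returns the contents and the number of bytes consumed, or `None` if the
/// input does not start with a quote or the string is never closed.
fn lex_quoted(input: &str, quote: char) -> Option<(&str, usize)> {
    // Count the run of opening quotes.
    let open = input.chars().take_while(|&c| c == quote).count();
    if open == 0 {
        return None;
    }
    let open_bytes = open * quote.len_utf8();

    // Even run: assumed to be an empty string (half open, half close).
    if open % 2 == 0 {
        return Some(("", open_bytes));
    }

    // Odd run: the closing delimiter is the same number of quotes.
    let delimiter = quote.to_string().repeat(open);
    let rest = &input[open_bytes..];
    let close = rest.find(delimiter.as_str())?;
    Some((&rest[..close], open_bytes + close + open_bytes))
}
```

The part that is awkward to express with combinators is that the closing delimiter depends on how many quotes were consumed at the start, which is what the question to chumsky is about.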

(this isn't the most important thing for PRQL, but I'm trying to get back into it, and wanted to spend some time trying a moderately difficult project with LLMs, so this seemed like a reasonable case. though also I thought it would be much easier! very open to feedback on whether there's a way to more incrementally make the changes, rather than basically rewriting from scratch. chumsky's type errors don't make it easy to play whack-a-mole)

Add feature flag chumsky-10 that when active, makes the lexer use a different implementation. For now, that implementation is a stub that throws an error, but this sets up the structure to allow incrementally building the chumsky 0.10 lexer while keeping the 0.9 one working.
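A minimal sketch of how such a feature-gated stub might be laid out. Only the `chumsky-10` feature name and `lex_source` appear in this PR; the `lexer_impl` module, the signatures, and the whitespace-splitting "old" lexer are placeholders for illustration.

```rust
// Default build: stand-in for the existing chumsky 0.9 lexer.
#[cfg(not(feature = "chumsky-10"))]
mod lexer_impl {
    pub fn lex(source: &str) -> Result<Vec<String>, String> {
        // Placeholder tokenization, not the real lexer.
        Ok(source.split_whitespace().map(str::to_string).collect())
    }
}

// With `--features chumsky-10`: a stub that always errors, so the
// new implementation can be filled in incrementally.
#[cfg(feature = "chumsky-10")]
mod lexer_impl {
    pub fn lex(_source: &str) -> Result<Vec<String>, String> {
        Err("chumsky 0.10 lexer is not implemented yet".to_string())
    }
}

/// Callers use one entry point regardless of which lexer is compiled in.
pub fn lex_source(source: &str) -> Result<Vec<String>, String> {
    lexer_impl::lex(source)
}
```

Because both modules expose the same function, the rest of the crate does not need its own `cfg` attributes.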

This commit completes Phase II of the chumsky 0.10 migration plan by:

1. Implementing a minimal lexer interface that compiles
2. Providing stub implementations for test compatibility
3. Setting up conditional test execution based on feature flags

Note that this is a minimal implementation that provides the API structure but doesn't yet implement the actual lexer functionality. Full implementation will be done in Phase III.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Create the basic combinator infrastructure that will be used in Phase III.

- Define Parser trait and essential combinators like map, then, etc.
- Set up basic parser combinators (just, any, end, etc.)
- Create token-specific combinators for lexing
- Maintain fallback imperative implementation to ensure tests pass
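To give a sense of what a hand-rolled combinator layer like the one listed above looks like, here is a small std-only sketch. It is not the code from this commit and not chumsky's API: the trait shape and all signatures are assumptions, and only the combinator names (`just`, `any`, `end`, `map`, `then`) are taken from the commit message.

```rust
/// A parser consumes a prefix of the input and returns a value plus the rest.
trait Parser<'a, O> {
    fn parse(&self, input: &'a str) -> Option<(O, &'a str)>;
}

// Any closure with the right shape is a parser.
impl<'a, O, F> Parser<'a, O> for F
where
    F: Fn(&'a str) -> Option<(O, &'a str)>,
{
    fn parse(&self, input: &'a str) -> Option<(O, &'a str)> {
        self(input)
    }
}

/// Match an exact string.
fn just<'a>(expected: &'static str) -> impl Parser<'a, &'static str> {
    move |input: &'a str| input.strip_prefix(expected).map(|rest| (expected, rest))
}

/// Match any single character.
fn any<'a>() -> impl Parser<'a, char> {
    |input: &'a str| {
        let c = input.chars().next()?;
        Some((c, &input[c.len_utf8()..]))
    }
}

/// Succeed only at the end of input.
fn end<'a>() -> impl Parser<'a, ()> {
    |input: &'a str| if input.is_empty() { Some(((), input)) } else { None }
}

/// Transform the output of a parser.
fn map<'a, A, B>(p: impl Parser<'a, A>, f: impl Fn(A) -> B) -> impl Parser<'a, B> {
    move |input: &'a str| p.parse(input).map(|(a, rest)| (f(a), rest))
}

/// Run two parsers in sequence and pair their outputs.
fn then<'a, A, B>(p: impl Parser<'a, A>, q: impl Parser<'a, B>) -> impl Parser<'a, (A, B)> {
    move |input: &'a str| {
        let (a, rest) = p.parse(input)?;
        let (b, rest) = q.parse(rest)?;
        Some(((a, b), rest))
    }
}
```

For example, `then(just("let"), end())` accepts exactly the input `let` and rejects `lets`.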

- Updated lexer implementation to work with Chumsky 0.10 API
- Modified token parsers to use Stream instead of raw strings
- Added proper test setup for the new Chumsky version
- Fixed issues with mapped values and error handling
- Implemented basic string parsing (more advanced features to come later)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

vanillajonathan · 2025-07-01T23:52:19Z

Any update on this?

max-sixty · 2025-07-02T02:16:22Z

> Any update on this?

I found it really hard and got distracted by other things. I'd like to come back to it at some point; possibly when there are more docs for me & the LLMs to benefit from...

Co-authored-by: Claude <[email protected]>

Add inline snapshot tests showing lexer output alongside parser output for changed error positions during Chumsky 0.10 migration:

- `interpolation_end`: Shows unclosed f-string error correctly reported at position 20 (end of input) by lexer
- Mississippi curly quotes: Shows lexer detecting both curly quotes at positions 22-23 and 35-36
- Converted three lexer tests to use inline `@` snapshots instead of external files

All tests pass with correct character offset error positions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
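The distinction between character offsets and byte offsets matters for inputs like curly quotes, which take several bytes in UTF-8. The sketch below illustrates the difference with two hypothetical helpers (`char_spans`, `byte_spans`); it does not reproduce the test input or the lexer code from this commit.

```rust
/// Character-offset spans (start, end) of every `target` in `source`.
fn char_spans(source: &str, target: char) -> Vec<(usize, usize)> {
    source
        .chars()
        .enumerate()
        .filter(|&(_, c)| c == target)
        .map(|(i, _)| (i, i + 1))
        .collect()
}

/// Byte-offset spans (start, end) of every `target` in `source`.
fn byte_spans(source: &str, target: char) -> Vec<(usize, usize)> {
    source
        .char_indices()
        .filter(|&(_, c)| c == target)
        .map(|(i, c)| (i, i + c.len_utf8()))
        .collect()
}
```

For the made-up input `x = ’a’`, the two curly quotes sit at character spans 4-5 and 6-7 but at byte spans 4-7 and 8-11, so an error reporter has to be consistent about which unit its spans use.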

Add WASM-specific configuration for chumsky_0_10 to disable the default `stacker` feature which causes compilation failures on WASM targets (especially on macOS-14). This mirrors the existing configuration for chumsky 0.9 and fixes the "section too large" LLVM error when building for wasm32-unknown-unknown.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Delete chumsky_0_9.rs and all feature flags
- Remove chumsky-10 feature from Cargo.toml (now always enabled)
- Simplify test code to only use Chumsky 0.10
- Update ErrorSource to always use String for lexer errors
- All 101 parser tests passing

This completes the Chumsky 0.10 migration for the lexer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Simple::new() was added in chumsky 0.10.1, not 0.10.0. This fixes the minimum versions test failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Rename prqlc/prqlc-parser/src/lexer/chumsky_0_10.rs to lexer.rs for canonical naming
- Remove migration header comments from lexer.rs
- Update all module references from chumsky_0_10 to lexer
- Simplify mod.rs and test.rs to use canonical module name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Remove "Chumsky 0.9 vs 0.10" comparison comments from lexer.rs
- Remove "works in both versions" comments from test.rs
- Remove "Note: Create snapshots without chumsky-10 feature" comments
- Simplify parser mod.rs comment about version usage
- Remove version reference from ErrorSource comment

All comments now describe current behavior, not migration history.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Use `use chumsky_0_10 as chumsky;` at the top of lexer files, then use canonical `chumsky::` throughout the code instead of `chumsky_0_10::`. This makes the code read naturally while maintaining compatibility with the parser that still uses chumsky 0.9.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
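The aliasing pattern can be sketched without the real crate. Below, a local `chumsky_0_10` module stands in for the dependency (presumably pulled in under that name so it can coexist with chumsky 0.9); the `version` function and the `lexer_file` module are made up for illustration.

```rust
// Stand-in for the dependency that is available as `chumsky_0_10`.
mod chumsky_0_10 {
    pub fn version() -> &'static str {
        "0.10"
    }
}

mod lexer_file {
    // Alias once at the top of the file...
    use super::chumsky_0_10 as chumsky;

    pub fn parser_version() -> &'static str {
        // ...then use the canonical `chumsky::` path everywhere else.
        chumsky::version()
    }
}
```

When the parser eventually moves to the same version, only the `use` line needs to change.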

Change "Chumsky 0.10's SimpleSpan" to just "SimpleSpan" since we're describing current behavior, not comparing versions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Update integration test snapshots to reflect lexer changes:

- Error positions for unclosed strings now point to opening quote
- Token span positions adjusted for new lexer implementation

All snapshot changes are expected from the Chumsky 0.10 migration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Move chumsky and stacker dependencies to `cfg(not(target_family="wasm"))` section to avoid PSM compilation issues on WASM targets. This matches the pattern used on the main branch and prevents "section too large" LLVM errors when building for wasm32.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Clippy complained about having `src/lexer/mod.rs` declare `mod lexer;`, which creates the pattern `lexer::lexer`. Renamed to `lr_lexer` to fix.

Fixes clippy error:

error: module has the same name as its containing module
--> prqlc/prqlc-parser/src/lexer/mod.rs:1:1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
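The lint in question is `clippy::module_inception`. A compact illustration using inline modules is below; the function body is a placeholder and only the module names `lexer` and `lr_lexer` and the function name `lex_source` come from this PR.

```rust
// Before the rename, the layout was effectively:
//
//     mod lexer { mod lexer { /* ... */ } }   // path: lexer::lexer
//
// which clippy flags because the inner module repeats the outer name.
mod lexer {
    // After the rename, the inner module has a distinct name.
    mod lr_lexer {
        pub fn lex_source(source: &str) -> usize {
            // Placeholder body: count whitespace-separated tokens.
            source.split_whitespace().count()
        }
    }

    // Re-export so callers keep using `lexer::lex_source`.
    pub use lr_lexer::lex_source;
}
```

Re-exporting from the parent keeps the public path unchanged, so the rename stays internal to the module.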

# Conflicts: # prqlc/prqlc-parser/Cargo.toml

Consolidates the lexer code by moving implementation from lr_lexer.rs into the mod.rs file where it was originally located. This simplifies the module structure and makes the code more canonical.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

The debug::lex_debug function was just a thin wrapper around lex_source with no added functionality. Tests now call lex_source directly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

for more information, see https://pre-commit.ci

max-sixty and others added 30 commits March 31, 2025 17:43

stubs

7545809

compiles successfully

ba0e839

wip, currently big func we need to split up

3c1a23f

split up a bit

2559658

instructions for running tests

c322be2

midway through

9792ed9

remove chumsky 10 test

9bc69e9

start using actual combinators

d9a8cda

.

52fe1ac

.

a64ce5f

fix errors outside chumsky_0_10.rs

b13dc07

getting there

4f9c9af

Merge branch 'chumsky-10' into chumsky-orig

b43161b

more progress

284eb8d

tests pass on old chumsky

b67a2ec

Merge branch 'chumsky-orig' into chumsky-10

e556e2b

.

c022613

--check

171bc41

better span handling

51510e1

.

600d6a2

don't accept results with feature enabled

a149785

.

94cc596

pretty much working now!

f0a35fe

update instructions

eeaf8fe

remove final conditional compilation

7d2165a

possibly better annotations

eed2705

max-sixty added 5 commits April 3, 2025 15:10

Merge branch 'main' into chumsky-10

aeb2a55

Merge branch 'main' into chumsky-10

ccd00ee

wip quotes

ad5db91

commit the breaking string issue

6545890

Merge branch 'main' into chumsky-10

5018a5a

max-sixty mentioned this pull request Apr 4, 2025

replacing then_with in 0.10 zesterer/chumsky#750

Open

c7e450a

max-sixty and others added 18 commits September 25, 2025 12:27

Merge branch 'main' into chumsky-10

683b0b5

chore: Update Cargo.lock for chumsky 0.10.1 upgrade

e36c8a9

Co-authored-by: Claude <[email protected]>

Merge branch 'main' into chumsky-10

05b2450

fix: Bump chumsky_0_10 to 0.10.1 for Simple::new() API

2790848

Merge branch 'main' into chumsky-10

462306b

# Conflicts: # prqlc/prqlc-parser/Cargo.toml

[pre-commit.ci] auto fixes from pre-commit.com hooks

2390df2

for more information, see https://pre-commit.ci

max-sixty enabled auto-merge (squash) October 7, 2025 23:18

max-sixty merged commit bb81cc9 into PRQL:main Oct 7, 2025
79 checks passed

max-sixty deleted the chumsky-10 branch October 7, 2025 23:34
