internal: upgrade chumsky to 0.10 in lexer #5223
Merged
Add feature flag chumsky-10 that when active, makes the lexer use a different implementation. For now, that implementation is a stub that throws an error, but this sets up the structure to allow incrementally building the chumsky 0.10 lexer while keeping the 0.9 one working.
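A minimal sketch of how such a flag can select between two implementations at compile time. The `Token` and `LexError` types here are hypothetical stand-ins, and the default branch is a toy whitespace splitter rather than the real 0.9 lexer; only the `chumsky-10` feature name and the stub-that-errors behaviour come from the commit message above.

```rust
/// Hypothetical token type, used only for this sketch.
#[derive(Debug, PartialEq)]
pub struct Token(pub String);

/// Hypothetical error type, used only for this sketch.
#[derive(Debug, PartialEq)]
pub struct LexError(pub String);

// Compiled only when the `chumsky-10` feature is enabled:
// a stub that always returns an error.
#[cfg(feature = "chumsky-10")]
pub fn lex_source(_source: &str) -> Result<Vec<Token>, LexError> {
    Err(LexError(
        "chumsky 0.10 lexer is not implemented yet".to_string(),
    ))
}

// Compiled when the feature is off: stands in for the existing lexer.
// (A toy whitespace splitter, not the real implementation.)
#[cfg(not(feature = "chumsky-10"))]
pub fn lex_source(source: &str) -> Result<Vec<Token>, LexError> {
    Ok(source
        .split_whitespace()
        .map(|word| Token(word.to_string()))
        .collect())
}
```

Callers use `lex_source` either way, so the new implementation can be filled in incrementally behind the flag while the default build keeps working.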
This commit completes Phase II of the chumsky 0.10 migration plan by:

1. Implementing a minimal lexer interface that compiles
2. Providing stub implementations for test compatibility
3. Setting up conditional test execution based on feature flags

Note that this is a minimal implementation that provides the API structure but doesn't yet implement the actual lexer functionality. Full implementation will be done in Phase III.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Create the basic combinator infrastructure that will be used in Phase III.

- Define Parser trait and essential combinators like map, then, etc.
- Set up basic parser combinators (just, any, end, etc.)
- Create token-specific combinators for lexing
- Maintain fallback imperative implementation to ensure tests pass
- Updated lexer implementation to work with Chumsky 0.10 API
- Modified token parsers to use Stream instead of raw strings
- Added proper test setup for the new Chumsky version
- Fixed issues with mapped values and error handling
- Implemented basic string parsing (more advanced features to come later)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Collaborator
Any update on this?
Member
Author
I found it really hard and got distracted by other things. I'd like to come back to it at some point; possibly when there are more docs for me & the LLMs to benefit from...
Co-authored-by: Claude <[email protected]>
Add inline snapshot tests showing lexer output alongside parser output for changed error positions during Chumsky 0.10 migration:

- `interpolation_end`: Shows unclosed f-string error correctly reported at position 20 (end of input) by lexer
- Mississippi curly quotes: Shows lexer detecting both curly quotes at positions 22-23 and 35-36
- Converted three lexer tests to use inline `@` snapshots instead of external files

All tests pass with correct character offset error positions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
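As an illustration of why character offsets matter for inputs with curly quotes, here is a small sketch comparing character-offset spans with byte-offset spans. The helper names are hypothetical and this is not the lexer's code; it only shows that a curly quote occupies one character but three bytes in UTF-8, so the two kinds of span diverge after the first one.

```rust
/// True for the four common curly quote characters.
fn is_curly_quote(c: char) -> bool {
    matches!(c, '\u{2018}' | '\u{2019}' | '\u{201C}' | '\u{201D}')
}

/// Spans of curly quotes measured in characters (one per `char`).
fn curly_quote_char_spans(source: &str) -> Vec<(usize, usize)> {
    source
        .chars()
        .enumerate()
        .filter(|(_, c)| is_curly_quote(*c))
        .map(|(i, _)| (i, i + 1))
        .collect()
}

/// Spans of curly quotes measured in UTF-8 bytes.
fn curly_quote_byte_spans(source: &str) -> Vec<(usize, usize)> {
    source
        .char_indices()
        .filter(|(_, c)| is_curly_quote(*c))
        .map(|(i, c)| (i, i + c.len_utf8()))
        .collect()
}
```

For a hypothetical input such as `a ’b’`, the character spans are one wide each, while the byte spans are three wide and shifted for the second quote.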
Add WASM-specific configuration for chumsky_0_10 to disable the default `stacker` feature which causes compilation failures on WASM targets (especially on macOS-14). This mirrors the existing configuration for chumsky 0.9 and fixes the "section too large" LLVM error when building for wasm32-unknown-unknown. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Delete chumsky_0_9.rs and all feature flags
- Remove chumsky-10 feature from Cargo.toml (now always enabled)
- Simplify test code to only use Chumsky 0.10
- Update ErrorSource to always use String for lexer errors
- All 101 parser tests passing

This completes the Chumsky 0.10 migration for the lexer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Simple::new() was added in chumsky 0.10.1, not 0.10.0. This fixes the minimum versions test failure. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Rename prqlc/prqlc-parser/src/lexer/chumsky_0_10.rs to lexer.rs for canonical naming
- Remove migration header comments from lexer.rs
- Update all module references from chumsky_0_10 to lexer
- Simplify mod.rs and test.rs to use canonical module name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Remove "Chumsky 0.9 vs 0.10" comparison comments from lexer.rs
- Remove "works in both versions" comments from test.rs
- Remove "Note: Create snapshots without chumsky-10 feature" comments
- Simplify parser mod.rs comment about version usage
- Remove version reference from ErrorSource comment

All comments now describe current behavior, not migration history.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Use `use chumsky_0_10 as chumsky;` at the top of lexer files, then use canonical `chumsky::` throughout the code instead of `chumsky_0_10::`. This makes the code read naturally while maintaining compatibility with the parser that still uses chumsky 0.9. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Change "Chumsky 0.10's SimpleSpan" to just "SimpleSpan" since we're describing current behavior, not comparing versions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Update integration test snapshots to reflect lexer changes:

- Error positions for unclosed strings now point to opening quote
- Token span positions adjusted for new lexer implementation

All snapshot changes are expected from the Chumsky 0.10 migration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Move chumsky and stacker dependencies to `cfg(not(target_family="wasm"))` section to avoid PSM compilation issues on WASM targets. This matches the pattern used on the main branch and prevents "section too large" LLVM errors when building for wasm32. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Clippy complained about having `src/lexer/mod.rs` declare `mod lexer;`, which creates the pattern `lexer::lexer`. Renamed to `lr_lexer` to fix.

Fixes clippy error:

    error: module has the same name as its containing module
     --> prqlc/prqlc-parser/src/lexer/mod.rs:1:1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
# Conflicts: # prqlc/prqlc-parser/Cargo.toml
Consolidates the lexer code by moving implementation from lr_lexer.rs into the mod.rs file where it was originally located. This simplifies the module structure and makes the code more canonical. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
The debug::lex_debug function was just a thin wrapper around lex_source with no added functionality. Tests now call lex_source directly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
for more information, see https://pre-commit.ci
chumsky 0.10 is a total rewrite from 0.9, and much of our code also needs rewriting if we're to upgrade
this creates a feature that uses chumsky 0.10 only for the lexer, as a way of making progress. claude code helped a lot, and it benefits from smaller problems like this one.
before merging, we would keep only the chumsky 0.10 code for the lexer (and possibly upgrade the parser, I'm not sure)
so far, it mostly works, with only a couple of failing tests. I'm not sure how to get our "any number of odd quotes" lexing to work correctly, and there's a small issue with the end of input. I'll ask about the first over at chumsky
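for reference, a rough sketch of the "any number of odd quotes" rule in plain Rust, without chumsky. The function name `lex_odd_quoted` is hypothetical, escape sequences are ignored, and treating an even run of quotes as an empty string is an assumption of this sketch rather than something stated in this PR. It is not the lexer's implementation, just the behaviour a combinator version would presumably need to reproduce.

```rust
/// Lex one quoted string at the start of `input`.
///
/// Returns the string's contents and the number of bytes consumed, or
/// `None` if `input` does not start with `quote` or the string is unclosed.
fn lex_odd_quoted(input: &str, quote: char) -> Option<(String, usize)> {
    // Count the run of opening quotes.
    let open = input.chars().take_while(|&c| c == quote).count();
    if open == 0 {
        return None;
    }
    let quote_len = quote.len_utf8();

    // Assumption: an even run (e.g. `""`) is read as an empty string.
    if open % 2 == 0 {
        return Some((String::new(), open * quote_len));
    }

    // An odd run opens a string that is closed by the same number of quotes,
    // so shorter runs of quotes can appear inside the contents.
    let delimiter: String = std::iter::repeat(quote).take(open).collect();
    let body = &input[open * quote_len..];
    let end = body.find(&delimiter)?;
    Some((
        body[..end].to_string(),
        open * quote_len + end + delimiter.len(),
    ))
}
```

the awkward part for a combinator library is that the closing delimiter depends on how many quotes were consumed at the start, so the parser for the rest of the string has to be built from an earlier result.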
(this isn't the most important thing for PRQL, but I'm trying to get back into it, and wanted to spend some time trying a moderately difficult project with LLMs, so this seemed like a reasonable case. though also I thought it would be much easier! very open to feedback on whether there's a way to more incrementally make the changes, rather than basically rewriting from scratch. chumsky's type errors don't make it easy to play whack-a-mole)