
Conversation

@rsmmr rsmmr (Member) commented Dec 17, 2024

This is still experimental and WIP. The idea is to let error recovery
jump directly forward to specific offsets in the input for cases where
fixed-size units/fields allow us to compute where parsing can continue
(see the sketch after the list).

  • Fix ParserBuilder::advanceInput().
  • Remove ParserBuilder::setInput().
  • Enable productions to provide the number of bytes they parse, if known.
  • Support synchronization points at fixed offsets, without pattern search.
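
A hypothetical sketch of the idea (unit and field names invented for illustration; this assumes, as the final example in this PR suggests, that the new mechanism does not require the synchronization point to start with a searchable literal): every field preceding `version` has a fixed size, so `version` is known to start at offset 4, and after a parse error in `magic` recovery can jump straight there instead of searching the input for a token.

    type Header = unit {
        magic:   bytes &size=4;       # fixed size: 4 bytes
        version: uint16 &synchronize; # known to start at offset 4
        length:  uint32;
    };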

@rsmmr rsmmr changed the title from "Extend synchronization points for fixed-size units/fields" to "[WIP] Extend synchronization points to leverage fixed-size units/fields" on Dec 17, 2024
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch from 680a106 to a13eb04 on December 20, 2024 12:23
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch 2 times, most recently from 4a7b533 to f204027 on May 6, 2025 15:57
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch 3 times, most recently from 6cbb6ba to 2c575b8 on May 14, 2025 12:30
rsmmr added 8 commits May 27, 2025 10:23
The implementation tried to automatically detect whether it was passed
a view or something else. However, that check was unreliable because
the type of the argument could still be unresolved (i.e., `auto`).
This change removes that broken support for passing a view; it seems
that support wasn't actually used anywhere anyway (probably precisely
because it didn't work).

There was exactly one place using it, while many other places instead
just set `state().cur` directly, which is all that this method did as
well.

Includes a fix for printing chains with gaps.
There was a bug causing new data to end up with wrong offsets after
trimming to an offset larger than the end of the current chain. Now we
ensure that the chain's head offset cannot move beyond data we have
already seen.

For productions that parse a static amount of data, that size can now
be retrieved. This includes cases where the number of bytes is
determined through an expression evaluated at runtime, as long as
parsing the production is guaranteed to always consume exactly as many
bytes as the expression yields. The new functionality will be used,
and tested, in subsequent commits.

This also refactors the implementation of `skip` to make use of the
same new machinery for determining the number of bytes a production
consumes (sketched below).
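
A hedged sketch of what this covers (grammar invented for illustration): `payload`'s size comes from an expression evaluated at runtime, yet parsing it is still guaranteed to consume exactly `self.len` bytes, so the production can report its size; the anonymous `skip` field goes through the same machinery.

    type Record = unit {
        len:     uint8;                 # static size: 1 byte
        payload: bytes &size=self.len;  # size known only at runtime
        :        skip bytes &size=2;    # skipped via the same size machinery
    };
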
Support synchronization points at fixed offsets, without pattern search.

If a synchronization point (i.e., a field with `&synchronize`) can be
determined to reside at a fixed offset from either the start of a unit
or a previous synchronization point, we can now leverage that after an
error by jumping there directly to resume parsing.

Note that this may change semantics for existing units: if there's a
`&synchronize` that used to operate through token search but now
qualifies for offset-based synchronization, the latter is preferred.
We actually see this in a number of existing synchronization tests,
which we update to force token-based matching so that we continue to
test the same functionality (see the sketch below).
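
A hedged sketch of that semantic shift (grammar invented; it assumes that a variable-size field preceding the synchronization point is one way to make the offset uncomputable): `marker` starts with a literal and used to synchronize by searching for `b"MSG"`, but since it also sits at the fixed offset 8, the offset-based strategy now takes precedence.

    type Message = unit {
        # A variable-size field here, e.g.
        #   prefix: bytes &until=b"\x00";
        # would make the offset uncomputable and force token search again.
        hdr:    bytes &size=8;
        marker: b"MSG" &synchronize;  # fixed offset 8 -> offset-based sync
        body:   bytes &size=16;
    };
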
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch from 2c575b8 to 6f00d21 on May 27, 2025 14:52
This addresses a situation like this:

    type Chunks = unit {
        chunks: (Chunk &synchronize)[];
    };

    type Chunk = unit {
        content_size: bytes &until=b"\n" &convert=$$.to_uint();
        content: bytes &size=self.content_size;
    };

If an error happens while parsing `content` (e.g., due to a gap in the
input), the top-level `chunks` field should be able to just move on to
the subsequent chunk because it's clear where that starts (namely,
after `content_size` bytes).

Closes #1135.
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch from 6f00d21 to 84f94f9 on May 27, 2025 14:53