
Conversation

@rsmmr rsmmr (Member) commented Dec 17, 2024

This is still experimental and WIP. The idea is to let error recovery
jump directly forward to specific offsets in the input for cases where
fixed-size units/fields allow us to compute where parsing can continue
(see the sketch after the list).

  • Fix ParserBuilder::advanceInput().
  • Remove ParserBuilder::setInput().
  • Enable productions to provide the number of bytes they parse, if known.
  • Support synchronization points at fixed offsets, without pattern search.
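
A hypothetical sketch of the idea (unit and field names invented for illustration; this assumes, as the final example in this PR suggests, that the new mechanism does not require the synchronization point to start with a searchable literal): every field preceding `version` has a fixed size, so `version` is known to start at offset 4, and after a parse error in `magic` recovery can jump straight there instead of searching the input for a token.

    type Header = unit {
        magic:   bytes &size=4;       # fixed size: 4 bytes
        version: uint16 &synchronize; # known to start at offset 4
        length:  uint32;
    };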

@rsmmr rsmmr changed the title from "Extend synchronization points for fixed-size units/fields" to "[WIP] Extend synchronization points to leverage fixed-size units/fields" on Dec 17, 2024
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch from 680a106 to a13eb04 on December 20, 2024 12:23
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch 2 times, most recently from 4a7b533 to f204027 on May 6, 2025 15:57
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch 3 times, most recently from 6cbb6ba to 2c575b8 on May 14, 2025 12:30
rsmmr added 8 commits May 27, 2025 10:23
The implementation tried to automatically detect whether it was passed
a view or something else. However, that check was unreliable because
the type of the argument could still be unresolved (i.e., `auto`).
This change removes that broken support for passing a view; it seems
that support wasn't actually used anywhere anyway (probably precisely
because it didn't work).

There was exactly one place using it, while many other places instead
just set `state().cur` directly, which is all that this method did as
well.

Includes a fix for printing chains with gaps.
There was a bug causing new data to end up with wrong offsets after
trimming to an offset larger than the end of the current chain. Now we
ensure that the chain's head offset cannot move beyond data we have
already seen.

For productions that parse a static amount of data, that size can now
be retrieved. This includes cases where the number of bytes is
determined through an expression evaluated at runtime, as long as
parsing the production is guaranteed to always consume exactly as many
bytes as the expression yields. The new functionality will be used,
and tested, in subsequent commits.

This also refactors the implementation of `skip` to make use of the
same new machinery for determining the number of bytes a production
consumes (sketched below).
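
A hedged sketch of what this covers (grammar invented for illustration): `payload`'s size comes from an expression evaluated at runtime, yet parsing it is still guaranteed to consume exactly `self.len` bytes, so the production can report its size; the anonymous `skip` field goes through the same machinery.

    type Record = unit {
        len:     uint8;                 # static size: 1 byte
        payload: bytes &size=self.len;  # size known only at runtime
        :        skip bytes &size=2;    # skipped via the same size machinery
    };
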
Support synchronization points at fixed offsets, without pattern search.

If a synchronization point (i.e., a field with `&synchronize`) can be
determined to reside at a fixed offset from either the start of a unit
or a previous synchronization point, we can now leverage that after an
error by jumping there directly to resume parsing.

Note that this may change semantics for existing units: if there's a
`&synchronize` that used to operate through token search but now
qualifies for offset-based synchronization, the latter is preferred.
We actually see this in a number of existing synchronization tests,
which we update to force token-based matching so that we continue to
test the same functionality (see the sketch below).
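
A hedged sketch of that semantic shift (grammar invented; it assumes that a variable-size field preceding the synchronization point is one way to make the offset uncomputable): `marker` starts with a literal and used to synchronize by searching for `b"MSG"`, but since it also sits at the fixed offset 8, the offset-based strategy now takes precedence.

    type Message = unit {
        # A variable-size field here, e.g.
        #   prefix: bytes &until=b"\x00";
        # would make the offset uncomputable and force token search again.
        hdr:    bytes &size=8;
        marker: b"MSG" &synchronize;  # fixed offset 8 -> offset-based sync
        body:   bytes &size=16;
    };
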
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch from 2c575b8 to 6f00d21 on May 27, 2025 14:52
This addresses a situation like this:

    type Chunks = unit {
        chunks: (Chunk &synchronize)[];
    };

    type Chunk = unit {
        content_size: bytes &until=b"\n" &convert=$$.to_uint();
        content: bytes &size=self.content_size;
    };

If an error happens while parsing `content` (e.g., due to a gap in the
input), the top-level `chunks` field should be able to just move on to
the subsequent chunk because it's clear where that starts (namely,
after `content_size` bytes).

Closes #1135.
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch from 6f00d21 to 84f94f9 on May 27, 2025 14:53