@dag-erling
Collaborator

The effective limit on regular expression length today is slightly less than half the stack size, which is hardcoded to 10,240 entries. Due to a lack of input validation and persistent use of signed types, expressions much longer than that will cause TRE to overflow and either crash or misbehave instead of returning REG_ESPACE.

This pull request increases the maximum stack size and sets a hard limit on regular expression length so we always return REG_ESPACE long before we overflow.

Note that this does not address matching, where TRE will misbehave if given a string longer than INT_MAX. Fixing that will require significantly more work than fixing compilation did.

* Grow the stack exponentially instead of linearly.

* Use size_t instead of int for sizes.

* Rename stack_num_objects() to stack_num_items() to reduce confusion
  between stack objects and objects on the stack.
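A minimal sketch of the first two bullets (exponential growth, size_t sizes) combined with a hard cap might look like the following. It is not TRE's stack implementation: struct sketch_stack, sketch_push(), the initial capacity of 16 and the use of ENOMEM as the failure code are all hypothetical. The cap macro only mirrors the 1048576 value that the follow-up commit gives TRE_MAX_STACK.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap mirroring the TRE_MAX_STACK value (1048576). */
#define SKETCH_MAX_STACK ((size_t)1048576)

/* Hypothetical stack type; not TRE's real layout. All sizes are size_t. */
struct sketch_stack {
    void **items; /* storage for pushed pointers */
    size_t size;  /* allocated capacity, in items */
    size_t ptr;   /* number of items currently on the stack */
    size_t max;   /* hard upper bound on capacity */
};

/* Push one item. Returns 0 on success, or ENOMEM if the stack would have
 * to grow past max or realloc() fails; a caller would map that to REG_ESPACE. */
static int sketch_push(struct sketch_stack *s, void *item)
{
    assert(s->ptr <= s->size);
    if (s->ptr == s->size) {
        size_t new_size;
        void **new_items;

        if (s->size >= s->max)
            return ENOMEM;                        /* hard limit reached */
        if (s->size == 0)
            new_size = s->max < 16 ? s->max : 16; /* small initial capacity */
        else if (s->size > s->max / 2)
            new_size = s->max;                    /* clamp the final step */
        else
            new_size = s->size * 2;               /* grow exponentially */
        if (new_size > SIZE_MAX / sizeof(*s->items))
            return ENOMEM;                        /* byte count would overflow */
        new_items = realloc(s->items, new_size * sizeof(*s->items));
        if (new_items == NULL)
            return ENOMEM;                        /* old buffer stays valid */
        s->items = new_items;
        s->size = new_size;
    }
    s->items[s->ptr++] = item;
    return 0;
}
```

A caller would initialize it as { NULL, 0, 0, SKETCH_MAX_STACK }. Doubling keeps the number of reallocations logarithmic in the final size, and clamping the last step to max means a full-size stack is reachable without ever allocating past the limit.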
@dag-erling requested a review from laurikari on December 19, 2025 20:50
@dag-erling self-assigned this on Dec 19, 2025
@dag-erling added the bug label on Dec 19, 2025
@laurikari
Owner

laurikari commented Dec 20, 2025

This is a nice improvement!

Looks like tre_regwncomp doesn't enforce TRE_MAX_RE. I would add the same check there.

Also, tre_regcomp and tre_regwcomp now call strlen (or wcslen) unconditionally. Previously, NULL was treated as length 0. I'm not sure anyone relies on this, but it is a change in behavior, so I would keep the old guard (pun intended).

The effective limit on regular expression length today is slightly less
than half the stack size, which is hardcoded to 10,240 entries.  Due to
a lack of input validation and persistent use of signed types,
expressions much longer than that will cause TRE to overflow and
either crash or misbehave instead of returning REG_ESPACE.

* Define TRE_MAX_STACK to 1048576 and use that as our stack size
  limit.

* Define TRE_MAX_RE to 65536 and refuse to compile a regular expression
  longer than that (in characters, not bytes).

* Adjust the tests to handle the increased limit.

* Since we make an effort to not crash when asked to compile a null
  regular expression, add tests to verify that we don't.
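The two limits from this commit, together with the review's two requests (enforce the limit in the explicit-length tre_regwncomp path, and keep treating a NULL pattern as length 0), can be sketched as one shared length check. This is an illustration under assumptions, not TRE's code: sketch_check_w(), sketch_check_wn() and the SKETCH_OK/SKETCH_ESPACE codes (stand-ins for REG_OK and REG_ESPACE) are hypothetical. The static assertion encodes, as an assumption, the PR description's observation that the usable pattern length was slightly less than half the stack size: 2 × 65536 = 131072 entries, one eighth of 1048576.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Values from this commit, redefined locally so the sketch stands alone. */
#define TRE_MAX_STACK 1048576
#define TRE_MAX_RE    65536

/* Assumed rule of thumb: about two stack entries per pattern character. */
static_assert(2 * (size_t)TRE_MAX_RE < (size_t)TRE_MAX_STACK,
              "TRE_MAX_RE must leave headroom below TRE_MAX_STACK");

/* Hypothetical stand-ins for REG_OK / REG_ESPACE. */
enum { SKETCH_OK = 0, SKETCH_ESPACE = 1 };

/* Explicit-length entry point: the check a tre_regwncomp-style path needs. */
static int sketch_check_wn(size_t n)
{
    return n > TRE_MAX_RE ? SKETCH_ESPACE : SKETCH_OK;
}

/* NUL-terminated entry point: keep the old NULL guard, then reuse the
 * same check so both paths enforce an identical limit. */
static int sketch_check_w(const wchar_t *regex, size_t *lenp)
{
    size_t n = (regex == NULL) ? 0 : wcslen(regex);
    int err = sketch_check_wn(n);

    if (err == SKETCH_OK)
        *lenp = n;
    return err;
}
```

Counting with wcslen() measures wide characters, matching the commit's "in characters, not bytes"; a strlen() check on a multibyte pattern would count bytes instead and reject some patterns that are within the character limit.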
@dag-erling
Collaborator Author

Fixed

@laurikari merged commit 7df6b4f into laurikari:master on Dec 20, 2025
3 checks passed
