Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use assertion to ensure erroroffset return from pcre2_compile is valid #460

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

alexdowad
Copy link
Contributor

When testing the new pattern rewriting phase for regex compilation using a fuzzer, I had a scary experience. Due to a bug in my pattern rewriting code, pcre2_compile() could return a totally invalid erroroffset. If a library user tried to do something with the erroroffset without checking it for validity, in the worst case, this had the potential to lead to an RCE vulnerability.

In case something similar ever happens again, I've added an assertion which will make it easier to notice the problem.

@alexdowad
Copy link
Contributor Author

I'm surprised to see that one of these assertions fails when RunTest executes in CI. I haven't been able to duplicate this locally yet (all the tests pass for me) and am still trying to figure out the reason.

Copy link
Contributor

@carenas carenas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the output of the steps from CI could help you replicate locally, maybe didn't configure with --enable-debug?

@@ -11147,6 +11147,8 @@ an offset is available in the parsed pattern. */
ptr = pattern + cb.erroroffset;

HAD_EARLY_ERROR:
PCRE2_ASSERT(ptr >= pattern); /* Ensure we don't return invalid erroroffset */
PCRE2_ASSERT(ptr < (pattern + patlen));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/</<=/, since the NUL at the end of pattern is a perfectly valid erroroffset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carenas Thanks for mentioning that. I actually flip-flopped between < and <= when preparing this PR. However, I noticed that (if I have understood well) it seems that patterns which are passed to pcre2_compile are not always NUL-terminated. (When the pattern is NUL-terminated, one can pass PCRE2_ZERO_TERMINATED as the length argument to pcre2_compile.)

Is that correct? If so, should the assertion be predicated on length == PCRE2_ZERO_TERMINATED?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the assertion be predicated on length == PCRE2_ZERO_TERMINATED?

no; and it is unlikely to work as the length value is reset earlier AFAIK.

the documentation mentions:

Some errors are not detected until the whole pattern has been scanned; in these cases, the offset passed back is the length of the pattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the assertion be predicated on length == PCRE2_ZERO_TERMINATED?

no; and it is unlikely to work as the length value is reset earlier AFAIK.

I think I didn't express myself well. I was referring to the value of length on function entry, not at the point where the assertion occurs.

the documentation mentions:

Some errors are not detected until the whole pattern has been scanned; in these cases, the offset passed back is the length of the pattern

Hmm. If that is so, then I guess pattern + erroroffset is not guaranteed to be a valid pointer. Since PCRE2 users may presumably wish to use pattern + erroroffset to display the part of the pattern which caused an error, this feels like a defect in the API to me.

I guess if users read the documentation, then they should know to check for the special value erroroffset == patlength.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyways, regardless of the issues mentioned, it is apparent that the assertion should indeed use <= and not <. PR updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if users read the documentation, then they should know to check for the special value erroroffset == patlength.

sure hope they do, specially if trying to print the place where the pattern error was found, as they then would realize it might not be printable, if it happens to be in the middle of an UTF8 character, for example.

Copy link
Contributor

@carenas carenas Sep 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the assertion be predicated on length == PCRE2_ZERO_TERMINATED?

no; and it is unlikely to work as the length value is reset earlier AFAIK.

I think I didn't express myself well. I was referring to the value of length on function entry, not at the point where the assertion occurs.

What I meant was that there is no way for PCRE2 to know if a pattern is NUL terminated or not, PCRE2_ZERO_TERMINATED is an "optimization" so callers of this function can rely on PCRE2 doing strlen() for them) if they don't have that value at hand.

For additional context, '\0' in patterns used to be impossible before PCRE2 and I suspect most patterns to be NUL terminated C strings regardless of what was provided in length to pcre2_compile().

…in bounds

When testing a patch for PCRE2, I found that due to a bug in my code,
`pcre2_compile()` could return a totally invalid error offset. In case
something similar ever happens again, I've added an assertion which will
make it easier to notice the problem.

It should be noted that the pcre2api manpage states: "Some errors are
not detected until the whole pattern has been scanned; in these cases,
the offset passed back is the length of the pattern." Since patterns are
not always null-terminated, this means that `pattern + erroroffset` may
sometimes point to uninitialized (or even unmapped) memory. However,
it is still worthwhile to guard against other unexpected values being
returned in `erroroffset`.
@PhilipHazel PhilipHazel merged commit 317f339 into PCRE2Project:master Sep 4, 2024
11 checks passed
@alexdowad alexdowad deleted the guard branch September 4, 2024 11:41
@carenas
Copy link
Contributor

carenas commented Sep 8, 2024

Note that this assertion is buggy, since patlen was never updated when the pattern was rewritten, but more importantly, the offseterror from a rewritten pattern is invalid outside this function as it references an offset to a different pattern than the one that was provided.

@alexdowad
Copy link
Contributor Author

@carenas Good point. That would be true if pattern rewriting is done, which is not the case in master yet.

@carenas
Copy link
Contributor

carenas commented Sep 8, 2024

That would be true

if being pedantic, you are correct that this assert isn't buggy yet in master, but obviously the comment was meant to address it within the context of this change being part of that series (as explained above) and which was also linked.

I did it this way to avoid polluting the conversation in the other PR, not to make a false statement, but would follow whichever preferences you have to avoid further confusion.

PS: FWIW talking about prefences I prefer not to be "@" to avoid notification spam.

@alexdowad
Copy link
Contributor Author

Understood, I won't keep sending you "@" notifications then.

This PR is not simply part of a series with the other PR; it is a worthwhile change on its own, which is why I submitted it separately.

@alexdowad
Copy link
Contributor Author

Please see added comment in #464 on why the assertion added in this PR is still valid, even if pattern rewriting optimizations are later merged into master.

@carenas
Copy link
Contributor

carenas commented Sep 9, 2024

hy the assertion added in this PR is still valid

Glad to be wrong about needing to update patlen, since that was also a major sticking point as there is no way to communicate that back to the user by the current API.

This means though that ALL erroroffset (except for the few ones that were cached in parsed_pattern) are likely incorrect, eventhough you are right that they are also likely to be lower than patlen and therefore not trigger this assert.

@alexdowad
Copy link
Contributor Author

ALL erroroffset (except for the few ones that were cached in parsed_pattern) are likely incorrect

I don't believe that is the case, for the reasons explained in #464. However, it is good to be sure, so I will check this area over again.

@carenas
Copy link
Contributor

carenas commented Sep 9, 2024

don't believe that is the case

cb.erroroffset is set by all functions that participate in the compilation with code like:

if (bad)
  {
  *errorcodeptr = ERR
  cb->erroroffset = offset; /* or an expression and sometimes even a ptrdiff */
  return FALSE;
  }

how can that work when offset is based on the rewritten pattern?

@alexdowad
Copy link
Contributor Author

how can that work when offset is based on the rewritten pattern?

The point is that if it is set to a value other than patlen, erroroffset must be set to an offset within the pattern string, by code which is analyzing the pattern string. The pattern rewriter does not modify the pattern string. It only modifies the parsed_pattern vector. Hence, it has no effect on any code which analyzes the pattern string.

If there is any code in PCRE2 which currently sets erroroffset based on the offset of a parsed_pattern item within the parsed_pattern vector, that is very wrong and must be fixed. I don't believe that such code exists, but I would like to check and make sure.

We definitely do have code which sets erroroffset to 0, to patlen, and to PCRE2_UNSET. Setting it to PCRE2_UNSET seems wrong to me and I would like to look into that. However, it is a different issue.

@carenas
Copy link
Contributor

carenas commented Sep 9, 2024

I stand corrected, apologies for the confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants