Use assertion to ensure erroroffset return from pcre2_compile is valid #460
I'm surprised to see that one of these assertions fails when
The output of the CI steps could help you replicate this locally; maybe you didn't configure with --enable-debug?
src/pcre2_compile.c
@@ -11147,6 +11147,8 @@ an offset is available in the parsed pattern. */
ptr = pattern + cb.erroroffset;

HAD_EARLY_ERROR:
PCRE2_ASSERT(ptr >= pattern); /* Ensure we don't return invalid erroroffset */
PCRE2_ASSERT(ptr < (pattern + patlen));
s/</<=/, since the NUL at the end of the pattern is a perfectly valid erroroffset.
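To make the reviewer's point concrete, here is a small stand-alone sketch of the corrected bounds check. The helpers `erroroffset_is_valid()` and `check_error_ptr()` are hypothetical and are not PCRE2 code; they only illustrate why the upper bound has to be inclusive: an offset equal to the pattern length is a legitimate "one past the end" position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE2): an error offset is valid when it
   lies in the closed range [0, patlen]. The upper bound is inclusive because
   an error detected only after scanning the whole pattern is reported at
   offset patlen, i.e. one past the last code unit. */
static bool erroroffset_is_valid(size_t erroroffset, size_t patlen)
{
  return erroroffset <= patlen;
}

/* Hypothetical pointer form mirroring the two assertions in the diff, with
   the corrected upper bound. Computing pattern + patlen is allowed in C (one
   past the end of an array); dereferencing it is not. */
static void check_error_ptr(const unsigned char *pattern, size_t patlen,
                            const unsigned char *ptr)
{
  assert(ptr >= pattern);
  assert(ptr <= pattern + patlen); /* "<=", not "<" */
}
```

With `<`, an error reported at the very end of the pattern would trip the assertion even though PCRE2 documents that offset as valid.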
@carenas Thanks for mentioning that. I actually flip-flopped between < and <= when preparing this PR. However, if I have understood correctly, the patterns passed to pcre2_compile are not always NUL-terminated. (When the pattern is NUL-terminated, one can pass PCRE2_ZERO_TERMINATED as the length argument to pcre2_compile.)
Is that correct? If so, should the assertion be predicated on length == PCRE2_ZERO_TERMINATED?
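The concern about non-NUL-terminated patterns can be illustrated without calling PCRE2 at all. When a caller passes an explicit length, the pattern may be just the first part of a larger buffer. In that case the code unit at pattern + length belongs to the caller's other data and is not a terminator. The helper `nul_follows_pattern()` and the buffer contents below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: given a caller-owned buffer of bufsize bytes and a
   pattern occupying its first patlen bytes, report whether a NUL terminator
   immediately follows the pattern. The bounds check keeps the read inside
   the buffer even when the pattern fills it completely. */
static bool nul_follows_pattern(const char *buf, size_t bufsize, size_t patlen)
{
  assert(patlen <= bufsize);
  return patlen < bufsize && buf[patlen] == '\0';
}
```

For example, if the buffer holds "(abc)def" and only the first 4 bytes, "(abc", are passed as the pattern, then an error reported at offset 4 would make pattern + erroroffset point at ')', which is part of the caller's data rather than the pattern. If the pattern is a plain 4-byte array with nothing after it, the same pointer is one past the end of the object.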
should the assertion be predicated on length == PCRE2_ZERO_TERMINATED?
No; and it is unlikely to work, as the length value is reset earlier AFAIK.
The documentation mentions:
Some errors are not detected until the whole pattern has been scanned; in these cases, the offset passed back is the length of the pattern
should the assertion be predicated on length == PCRE2_ZERO_TERMINATED?
no; and it is unlikely to work as the length value is reset earlier AFAIK.
I think I didn't express myself well. I was referring to the value of length on function entry, not at the point where the assertion occurs.
the documentation mentions:
Some errors are not detected until the whole pattern has been scanned; in these cases, the offset passed back is the length of the pattern
Hmm. If that is so, then I guess pattern + erroroffset is not guaranteed to be a dereferenceable pointer. Since PCRE2 users may presumably wish to use pattern + erroroffset to display the part of the pattern which caused an error, this feels like a defect in the API to me.
I guess if users read the documentation, then they should know to check for the special value erroroffset == patlength.
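A caller that wants to show where compilation failed can handle the special value before forming pattern + erroroffset. The following is a minimal sketch under these assumptions: the pattern is available with its explicit length (as passed to pcre2_compile()), it uses 8-bit code units, and `describe_error_position()` is a hypothetical helper rather than part of the PCRE2 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write a human-readable description of an error
   position into out (of size outsize) and return snprintf's result. It never
   reads past pattern[patlen - 1], so it is safe for patterns that are not
   NUL-terminated. */
static int describe_error_position(char *out, size_t outsize,
                                   const char *pattern, size_t patlen,
                                   size_t erroroffset)
{
  assert(out != NULL && outsize > 0);
  if (erroroffset > patlen)
    /* Out of range: the condition the new assertion in pcre2_compile()
       is meant to rule out. Do not touch the pattern at all. */
    return snprintf(out, outsize, "invalid offset %zu", erroroffset);
  if (erroroffset == patlen)
    /* Error detected only after the whole pattern was scanned. */
    return snprintf(out, outsize, "at end of pattern");
  /* Print the rest of the pattern; the precision bounds the read, so no
     NUL terminator is required. Assumes the remainder fits in an int. */
  return snprintf(out, outsize, "at offset %zu: %.*s", erroroffset,
                  (int)(patlen - erroroffset), pattern + erroroffset);
}
```

The design choice is that the out-of-range and end-of-pattern checks come first, so pattern + erroroffset is only used as a read pointer when it is known to point at a real code unit.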
Anyway, regardless of the issues mentioned, it is apparent that the assertion should indeed use <= and not <. PR updated.
I guess if users read the documentation, then they should know to check for the special value erroroffset == patlength.
Sure hope they do, especially if trying to print the place where the pattern error was found, as they would then realize it might not be printable if it happens to be in the middle of a UTF-8 character, for example.
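If an offset does land inside a multi-byte UTF-8 character, one way to make the printed context start on a character boundary is to step back over continuation bytes (bytes of the form 10xxxxxx). This sketch assumes the pattern is UTF-8 held in 8-bit code units and that the offset has already been checked to be at most the pattern length. `utf8_char_start()` is a hypothetical helper, not a PCRE2 function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given a UTF-8 pattern of patlen bytes and an offset
   in [0, patlen], return the offset of the first byte of the character that
   contains pattern[offset], by stepping back over continuation bytes
   (10xxxxxx). An offset equal to patlen is returned unchanged, because no
   character starts there. */
static size_t utf8_char_start(const unsigned char *pattern, size_t patlen,
                              size_t offset)
{
  assert(offset <= patlen);
  if (offset == patlen)
    return offset;
  while (offset > 0 && (pattern[offset] & 0xC0) == 0x80)
    offset--;
  return offset;
}
```

For the bytes `a`, `0xC3 0xA9` ("é"), `b`, an offset of 2 (the continuation byte) is moved back to 1. Offsets 0, 1, 3 and 4 are returned unchanged.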
should the assertion be predicated on length == PCRE2_ZERO_TERMINATED?
no; and it is unlikely to work as the length value is reset earlier AFAIK.
I think I didn't express myself well. I was referring to the value of length on function entry, not at the point where the assertion occurs.
What I meant was that there is no way for PCRE2 to know whether a pattern is NUL-terminated or not; PCRE2_ZERO_TERMINATED is an "optimization" so that callers of this function can rely on PCRE2 doing strlen() for them if they don't have that value at hand.
For additional context, '\0' in patterns used to be impossible before PCRE2, and I suspect most patterns are NUL-terminated C strings regardless of what was passed as length to pcre2_compile().
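The "PCRE2 does strlen() for you" behaviour can be sketched as a length-resolution step at function entry. Once it has run, only the resolved length is used, so later code can no longer tell whether the caller passed the sentinel or an explicit length. This is in line with the remark above that the length value is reset early. `MY_ZERO_TERMINATED` and `resolve_pattern_length()` are hypothetical names, and this is not PCRE2's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel meaning "the pattern is NUL-terminated; compute its
   length for me". It is the largest size_t value, which no real pattern
   length can reach. */
#define MY_ZERO_TERMINATED (~(size_t)0)

/* Hypothetical helper: resolve the caller's length argument. After this
   point only the returned value is used, so the caller's original choice
   (sentinel or explicit length) is no longer visible. */
static size_t resolve_pattern_length(const char *pattern, size_t length)
{
  assert(pattern != NULL);
  return (length == MY_ZERO_TERMINATED) ? strlen(pattern) : length;
}
```

An explicit length is trusted as given, even when the buffer happens to be NUL-terminated further on. This is why an assertion at the end of compilation cannot rely on a terminator being present.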
…in bounds
When testing a patch for PCRE2, I found that due to a bug in my code, `pcre2_compile()` could return a totally invalid error offset. In case something similar ever happens again, I've added an assertion which will make it easier to notice the problem. It should be noted that the pcre2api manpage states: "Some errors are not detected until the whole pattern has been scanned; in these cases, the offset passed back is the length of the pattern." Since patterns are not always null-terminated, this means that `pattern + erroroffset` may sometimes point to uninitialized (or even unmapped) memory. However, it is still worthwhile to guard against other unexpected values being returned in `erroroffset`.
Note that this assertion is buggy, since
@carenas Good point. That would be true if pattern rewriting is done, which is not the case in
If being pedantic, you are correct that this assert isn't buggy yet in master, but obviously the comment was meant to address it within the context of this change being part of that series (as explained above), which was also linked. I did it this way to avoid polluting the conversation in the other PR, not to make a false statement, but I would follow whichever preferences you have to avoid further confusion. PS: FWIW, talking about preferences, I prefer not to be "
Understood, I won't keep sending you "@" notifications then. This PR is not simply part of a series with the other PR; it is a worthwhile change on its own, which is why I submitted it separately.
Please see the added comment in #464 on why the assertion added in this PR is still valid, even if pattern rewriting optimizations are later merged into
Glad to be wrong about needing to update This means though that ALL
I don't believe that is the case, for the reasons explained in #464. However, it is good to be sure, so I will check this area over again.
How can that work when the offset is based on the rewritten pattern?
The point is that if it is set to a value other than If there is any code in PCRE2 which currently sets We definitely do have code which sets
I stand corrected, apologies for the confusion.
When testing the new pattern rewriting phase for regex compilation using a fuzzer, I had a scary experience. Due to a bug in my pattern rewriting code, pcre2_compile() could return a totally invalid erroroffset. If a library user tried to do something with the erroroffset without checking it for validity, in the worst case, this had the potential to lead to an RCE vulnerability.
In case something similar ever happens again, I've added an assertion which will make it easier to notice the problem.