Commit

Fix pcre2_match() bug when a condition was a variable-length lookbehind that matched over the current position.
PhilipHazel committed Aug 20, 2024
1 parent 1415565 commit ead0828
Showing 4 changed files with 31 additions and 5 deletions.
6 changes: 6 additions & 0 deletions ChangeLog
@@ -66,6 +66,12 @@ that checks for such a lookbehind; it was looking only at the first branch,
 which is wrong because some branches can be fixed length when others are not,
 for example (?<=AB|CD?). Now all branches are checked for variability.

+11. Matching with pcre2_match() could give an incorrect result if a
+variable-length lookbehind was used as the condition in a conditional group.
+The condition could erroneously be treated as true if a branch matched but
+overran the current position. This bug was in the interpreter only; matching
+with JIT was correct.
+

 Version 10.44 07-June-2024
 --------------------------
16 changes: 11 additions & 5 deletions src/pcre2_match.c
@@ -5931,14 +5931,20 @@ fprintf(stderr, "++ %2ld op=%3d %s\n", Fecode - mb->start_code, *Fecode,
 (char *)P->eptr - (char *)mb->start_subject);
 #endif

-/* If we are at the end of an assertion that is a condition, return a
-match, discarding any intermediate backtracking points. Copy back the
-mark setting and the captures into the frame before N so that they are
-set on return. Doing this for all assertions, both positive and negative,
-seems to match what Perl does. */
+/* If we are at the end of an assertion that is a condition, first check
+to see if we are at the end of a variable-length branch in a lookbehind.
+If this is the case and we have not landed on the current character,
+return no match. Compare code below for non-condition lookbehinds. In
+other cases, return a match, discarding any intermediate backtracking
+points. Copy back the mark setting and the captures into the frame before
+N so that they are set on return. Doing this for all assertions, both
+positive and negative, seems to match what Perl does. */

 if (GF_IDMASK(N->group_frame_type) == GF_CONDASSERT)
 {
+if ((*bracode == OP_ASSERTBACK || *bracode == OP_ASSERTBACK_NOT) &&
+branch_start[1 + LINK_SIZE] == OP_VREVERSE && Feptr != P->eptr)
+RRETURN(MATCH_NOMATCH);
 memcpy((char *)P + offsetof(heapframe, ovector), Fovector,
 Foffset_top * sizeof(PCRE2_SIZE));
 P->offset_top = Foffset_top;
6 changes: 6 additions & 0 deletions testdata/testinput1
@@ -6671,4 +6671,10 @@ $/x
 /(?<=PQ|Pc.b?)(.?)(b?)/
 Pc.b

+/(?(?<=aa.b|ab)b).b/
+aaab
+
+/(?(?<=a(?:a.b|b))b).b/
+aaab
+
 # End of testinput1
8 changes: 8 additions & 0 deletions testdata/testoutput1
@@ -10528,4 +10528,12 @@ No match
 1: b
 2:

+/(?(?<=aa.b|ab)b).b/
+aaab
+0: ab
+
+/(?(?<=a(?:a.b|b))b).b/
+aaab
+0: ab
+
 # End of testinput1
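
The ChangeLog entry and the first new test can be illustrated with a minimal sketch. This is a toy model, not PCRE2's interpreter code: it hard-codes the pattern /(?(?<=aa.b|ab)b).b/, and the function names (match_literal, lookbehind, toy_match) and the check_end flag are hypothetical, introduced only for illustration. With check_end set, a lookbehind branch that matches but does not end exactly at the current position is rejected; with it cleared, the overrunning match is accepted and the condition is treated as true.

```c
#include <stddef.h>
#include <string.h>

/* Match a literal pattern ('.' matches any character) at subj[pos].
   Returns the end offset on success, or -1 on failure. */
static long match_literal(const char *pat, const char *subj,
                          size_t len, size_t pos)
{
    for (; *pat != '\0'; pat++, pos++) {
        if (pos >= len) return -1;
        if (*pat != '.' && *pat != subj[pos]) return -1;
    }
    return (long)pos;
}

/* Toy model of the lookbehind (?<=aa.b|ab) evaluated at offset cur.
   Assumption: both branches share one back-off range of 2..4 characters.
   If check_end is non-zero, a branch match only counts when it ends
   exactly at cur; otherwise any branch match counts (the buggy case). */
static int lookbehind(const char *subj, size_t len, size_t cur, int check_end)
{
    static const char *branches[] = { "aa.b", "ab" };
    const size_t min_back = 2, max_back = 4;
    size_t back = cur < max_back ? cur : max_back;

    for (; back >= min_back; back--) {
        size_t start = cur - back;
        for (size_t i = 0; i < 2; i++) {
            long end = match_literal(branches[i], subj, len, start);
            if (end < 0) continue;                 /* branch did not match */
            if (check_end && (size_t)end != cur)
                continue;                          /* matched, but overran cur */
            return 1;
        }
    }
    return 0;
}

/* Toy matcher for /(?(?<=aa.b|ab)b).b/. Returns 1 and sets the match
   offsets on success, 0 if no start position matches. */
int toy_match(const char *subj, int check_end, size_t *mstart, size_t *mend)
{
    size_t len = strlen(subj);

    for (size_t pos = 0; pos <= len; pos++) {
        size_t p = pos;
        if (lookbehind(subj, len, p, check_end)) {
            /* Condition true: the "yes" branch requires a literal 'b'. */
            if (p >= len || subj[p] != 'b') continue;
            p++;
        }
        /* Condition false: there is no "no" branch, so the group is empty. */
        long end = match_literal(".b", subj, len, p);
        if (end >= 0) {
            *mstart = pos;
            *mend = (size_t)end;
            return 1;
        }
    }
    return 0;
}
```

Under these assumptions, toy_match("aaab", 1, &s, &e) reports a match at offsets 2 to 4 ("ab"), agreeing with the expected output above, while toy_match("aaab", 0, &s, &e) finds no match in this model. The sketch does not attempt to reproduce how the real interpreter steps back through the subject or backtracks.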
