Fix bug in 'first code unit' and 'last code unit' optimization combined with lookahead assertion
alexdowad committed Aug 25, 2024
1 parent 96ed5fd commit 4b66716
Showing 4 changed files with 29 additions and 4 deletions.
15 changes: 13 additions & 2 deletions src/pcre2_compile.c
@@ -10895,8 +10895,19 @@ if ((re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0)
(these are not saved during the compile because they can cause conflicts with
actual literals that follow). */

-if (firstcuflags >= REQ_NONE)
-  firstcu = find_firstassertedcu(codestart, &firstcuflags, 0);
+if (firstcuflags >= REQ_NONE) {
+  uint32_t assertedcuflags = 0;
+  uint32_t assertedcu = find_firstassertedcu(codestart, &assertedcuflags, 0);
+  /* It would be wrong to use the asserted first code unit as `firstcu` for
+   * regexes which are able to match a 1-character string (e.g. /(?=a)b?a/)
+   * For that example, if we set both firstcu and reqcu to 'a', it would mean
+   * the subject string needs to be at least 2 characters long, which is wrong.
+   * With more analysis, we would be able to set firstcu in more cases. */
+  if (assertedcuflags < REQ_NONE && assertedcu != reqcu) {
+    firstcu = assertedcu;
+    firstcuflags = assertedcuflags;
+  }
+}

/* Save the data for a first code unit. The existence of one means the
minimum length must be at least 1. */
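
The reasoning in the added comment can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `cu_hints`, `hints_before_fix`, `hints_after_fix`, `length_lower_bound` and `prefilter_accepts` are hypothetical names invented for illustration, not PCRE2 internals, and the model ignores the flag values (`REQ_NONE` and friends) that the real code also checks. It assumes, as the comment describes, that a known first code unit and a known last code unit each contribute one unit to the subject length lower bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two hints; not PCRE2's real data structures. */
typedef struct {
  bool has_first; char first;  /* a match must begin with this code unit */
  bool has_last;  char last;   /* a match must contain this code unit */
} cu_hints;

/* Modelled old behaviour: the code unit found in the leading assertion is
   always adopted as the first code unit. */
cu_hints hints_before_fix(char asserted, char required) {
  cu_hints h = { true, asserted, true, required };
  return h;
}

/* Modelled new behaviour: adopt it only when it differs from the required
   code unit, mirroring the `assertedcu != reqcu` test in the diff. */
cu_hints hints_after_fix(char asserted, char required) {
  cu_hints h = { asserted != required, asserted, true, required };
  return h;
}

/* Lower bound as modelled here: one unit for "first", one more for "last". */
size_t length_lower_bound(const cu_hints *h) {
  assert(h != NULL);
  return (h->has_first ? 1u : 0u) + (h->has_last ? 1u : 0u);
}

/* A subject shorter than the lower bound is rejected without matching. */
bool prefilter_accepts(const cu_hints *h, const char *subject) {
  return strlen(subject) >= length_lower_bound(h);
}
```

For /(?=a)b?a/ both the asserted and the required code unit are 'a'. In this model the old rule yields a lower bound of 2, so the one-character subject "a" is rejected before matching even though the pattern matches it; the new rule leaves the first code unit unset and the bound drops to 1.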
6 changes: 6 additions & 0 deletions testdata/testinput1
@@ -6677,4 +6677,10 @@ $/x
/(?(?<=a(?:a.b|b))b).b/
aaab

+/(?=a)b?a/
+a
+
+/(?=a)b?a./
+ab
+
# End of testinput1
8 changes: 8 additions & 0 deletions testdata/testoutput1
@@ -10536,4 +10536,12 @@ No match
aaab
0: ab

+/(?=a)b?a/
+a
+0: a
+
+/(?=a)b?a./
+ab
+0: ab
+
# End of testinput1
4 changes: 2 additions & 2 deletions testdata/testoutput2
@@ -10747,9 +10747,9 @@ Subject length lower bound = 1
/(?=a{3})[bcd]/Ii
Capture group count = 0
Options: caseless
First code unit = 'a' (caseless)
Starting code units: A a
Last code unit = 'a' (caseless)
-Subject length lower bound = 2
+Subject length lower bound = 1

/(abc)\1+/
