Skip to content

Commit

Permalink
Add fix and tests for \x with no hex digits (#478)
Browse files Browse the repository at this point in the history
  • Loading branch information
NWilson authored Sep 17, 2024
1 parent b6d0554 commit 97a2937
Show file tree
Hide file tree
Showing 7 changed files with 58 additions and 14 deletions.
23 changes: 21 additions & 2 deletions src/pcre2_compile.c
Original file line number Diff line number Diff line change
Expand Up @@ -2093,10 +2093,29 @@ else

else
{
c = 0;
if (ptr >= ptrend || (cc = XDIGIT(*ptr)) == 0xff) break; /* Not a hex digit */
/* Perl has the surprising/broken behaviour that \x without following
hex digits is treated as an escape for NUL. Their source code laments
this but keeps it for backwards compatibility. A warning is printed
when "use warnings" is enabled, and it's forbidden when "use strict"

This comment has been minimized.

Copy link
@carenas

carenas Sep 18, 2024

Contributor

which version of Perl does this?, got the latest fedora beta and couldn't reproduce:

# cat w.pl 
#!/usr/bin/perl -w

use warnings;
use strict;
use re;

my $v = "\0";
print "MATCH\n" if $v =~ /\x/;
# ./w.pl
MATCH
# perl -v

This is perl 5, version 40, subversion 0 (v5.40.0) built for x86_64-linux-thread-multi
(with 15 registered patches, see perl -V for more detail)

Copyright 1987-2024, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at https://www.perl.org/, the Perl Home Page.
is enabled. Because we don't have warnings, we simply forbid it. */
if (ptr >= ptrend || (cc = XDIGIT(*ptr)) == 0xff)
{
/* Not a hex digit */
*errorcodeptr = ERR78;
break;
}
ptr++;
c = cc;

/* With "use re 'strict'" Perl actually requires exactly two digits (error
for both \xA and \xAAA). This seems overly strict, and there seems
little incentive to align with that, given the backwards-compatibility
cost.
For comparison, note that other engines disagree. For example:
- Java allows 1 or 2 hex digits. Error if 0 digits. No error if >2 digits
- .NET requires 2 hex digits. Error if 0, 1 digits. No error if >2 digits.
*/
if (ptr >= ptrend || (cc = XDIGIT(*ptr)) == 0xff) break; /* Not a hex digit */
ptr++;
c = (c << 4) | cc;
Expand Down
2 changes: 1 addition & 1 deletion src/pcre2_error.c
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,7 @@ static const unsigned char compile_error_texts[] =
"using UCP is disabled by the application\0"
"name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"
"character code point value in \\u.... sequence is too large\0"
"digits missing in \\x{} or \\o{} or \\N{U+}\0"
"digits missing after \\x or in \\x{} or \\o{} or \\N{U+}\0"
"syntax error or number too big in (?(VERSION condition\0"
/* 80 */
"internal error: unknown opcode in auto_possessify()\0"
Expand Down
12 changes: 9 additions & 3 deletions testdata/testinput1
Original file line number Diff line number Diff line change
Expand Up @@ -5610,9 +5610,15 @@ name)/mark
/^ (?:(?<A>A)|(?'B'B)(?<A>A)) (?('A')x) (?(<B>)y)$/x,dupnames
Ax
BAxy

/^A\xZ/
A\0Z

/^A\xBz/
A\x{0B}z

/^A\xABz/
A\x{AB}z

/^A\xABCz/
A\x{AB}Cz

/^A\o{123}B/
A\123B
Expand Down
4 changes: 4 additions & 0 deletions testdata/testinput2
Original file line number Diff line number Diff line change
Expand Up @@ -3983,6 +3983,10 @@

/\xthing/

/^A\xZ/

/^A\x/

/\x{}/

/\x{whatever}/
Expand Down
16 changes: 12 additions & 4 deletions testdata/testoutput1
Original file line number Diff line number Diff line change
Expand Up @@ -8919,10 +8919,18 @@ No match
1: <unset>
2: B
3: A

/^A\xZ/
A\0Z
0: A\x00Z

/^A\xBz/
A\x{0B}z
0: A\x0bz

/^A\xABz/
A\x{AB}z
0: A\xabz

/^A\xABCz/
A\x{AB}Cz
0: A\xabCz

/^A\o{123}B/
A\123B
Expand Down
13 changes: 10 additions & 3 deletions testdata/testoutput2
Original file line number Diff line number Diff line change
Expand Up @@ -13239,7 +13239,7 @@ Failed: error 167 at offset 5: non-hex character in \x{} (closing brace missing?
Failed: error 167 at offset 7: non-hex character in \x{} (closing brace missing?)

/^A\x{/
Failed: error 178 at offset 5: digits missing in \x{} or \o{} or \N{U+}
Failed: error 178 at offset 5: digits missing after \x or in \x{} or \o{} or \N{U+}

/[ab]++/B,no_auto_possess
------------------------------------------------------------------
Expand Down Expand Up @@ -13451,15 +13451,22 @@ Failed: error 133 at offset 7: parentheses are too deeply nested (stack check)
Failed: error 155 at offset 2: missing opening brace after \o

/\o{}/
Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+}
Failed: error 178 at offset 3: digits missing after \x or in \x{} or \o{} or \N{U+}

/\o{whatever}/
Failed: error 164 at offset 3: non-octal character in \o{} (closing brace missing?)

/\xthing/
Failed: error 178 at offset 2: digits missing after \x or in \x{} or \o{} or \N{U+}

/^A\xZ/
Failed: error 178 at offset 4: digits missing after \x or in \x{} or \o{} or \N{U+}

/^A\x/
Failed: error 178 at offset 4: digits missing after \x or in \x{} or \o{} or \N{U+}

/\x{}/
Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+}
Failed: error 178 at offset 3: digits missing after \x or in \x{} or \o{} or \N{U+}

/\x{whatever}/
Failed: error 167 at offset 3: non-hex character in \x{} (closing brace missing?)
Expand Down
2 changes: 1 addition & 1 deletion testdata/testoutput5
Original file line number Diff line number Diff line change
Expand Up @@ -4702,7 +4702,7 @@ Callout 0: last capture = 1
Failed: error 193 at offset 2: \N{U+dddd} is supported only in Unicode (UTF) mode

/\N{U+}/utf
Failed: error 178 at offset 5: digits missing in \x{} or \o{} or \N{U+}
Failed: error 178 at offset 5: digits missing after \x or in \x{} or \o{} or \N{U+}

/\N{U}/
Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u
Expand Down

0 comments on commit 97a2937

Please sign in to comment.