Skip to content

Commit

Permalink
Add check for ridiculous offsets to substring extraction functions.
Browse files Browse the repository at this point in the history
  • Loading branch information
PhilipHazel committed Nov 14, 2023
1 parent ce5b604 commit 183e8a9
Show file tree
Hide file tree
Showing 7 changed files with 14 additions and 0 deletions.
3 changes: 3 additions & 0 deletions ChangeLog
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,9 @@ patterns.
a recursion, pcre2_match() misbehaved and gave the wrong match. For example,
the pattern /|a(?0)/ matched against "aaaa".

39. Add a test for ridiculous ovector offset values to the substring extraction
functions.


Version 10.42 11-December-2022
------------------------------
Expand Down
1 change: 1 addition & 0 deletions src/pcre2.h.in
Original file line number Diff line number Diff line change
Expand Up @@ -405,6 +405,7 @@ released, the numbers must not be changed. */
#define PCRE2_ERROR_CONVERT_SYNTAX (-64)
#define PCRE2_ERROR_INTERNAL_DUPMATCH (-65)
#define PCRE2_ERROR_DFA_UINVALID_UTF (-66)
#define PCRE2_ERROR_INVALIDOFFSET (-67)


/* Request types for pcre2_pattern_info() */
Expand Down
1 change: 1 addition & 0 deletions src/pcre2_dfa_match.c
Original file line number Diff line number Diff line change
Expand Up @@ -4013,6 +4013,7 @@ for (;;)
match_data->ovector[0] = (PCRE2_SIZE)(start_match - subject);
match_data->ovector[1] = (PCRE2_SIZE)(end_subject - subject);
}
match_data->subject_length = length;
match_data->leftchar = (PCRE2_SIZE)(mb->start_used_ptr - subject);
match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
match_data->startchar = (PCRE2_SIZE)(start_match - subject);
Expand Down
1 change: 1 addition & 0 deletions src/pcre2_error.c
Original file line number Diff line number Diff line change
Expand Up @@ -274,6 +274,7 @@ static const unsigned char match_error_texts[] =
/* 65 */
"internal error - duplicate substitution match\0"
"PCRE2_MATCH_INVALID_UTF is not supported for DFA matching\0"
"INTERNAL ERROR: invalid substring offset\0"
;


Expand Down
1 change: 1 addition & 0 deletions src/pcre2_intmodedep.h
Original file line number Diff line number Diff line change
Expand Up @@ -659,6 +659,7 @@ typedef struct pcre2_real_match_data {
PCRE2_SPTR mark; /* Pointer to last mark */
struct heapframe *heapframes; /* Backtracking frames heap memory */
PCRE2_SIZE heapframes_size; /* Malloc-ed size */
PCRE2_SIZE subject_length; /* Subject length */
PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
PCRE2_SIZE startchar; /* Offset to starting code unit */
Expand Down
4 changes: 4 additions & 0 deletions src/pcre2_match.c
Original file line number Diff line number Diff line change
Expand Up @@ -6669,6 +6669,7 @@ if (use_jit)
match_data, mcontext);
if (rc != PCRE2_ERROR_JIT_BADOPTION)
{
match_data->subject_length = length;

This comment has been minimized.

Copy link
@jddurand

jddurand Dec 29, 2023

When the caller uses pcre2_jit_match directly, match_data->subject_length may contain uninitialized data as far as I understand. Shouldn' this be placed in pcre2_jit_match directly ? My use case is:

Current RC candidate shows that subject_length is uninitialized when run under DrMemory:

Error #362: UNINITIALIZED READ: reading register rax
# 0 pcre2_substring_length_bynumber_8                     [C:\Users\jddfr\git\c-marpaESLIF-build\_deps\pcre2-src\src\pcre2_substring.c:345]
# 1 pcre2_substitute_8                                    [C:\Users\jddfr\git\c-marpaESLIF-build\_deps\pcre2-src\src\pcre2_substitute.c:724]
# 2 _marpaESLIFRecognizer_terminal_matcherb               [C:\Users\jddfr\git\c-marpaESLIF\src\marpaESLIF.c:7449]
# 3 _marpaESLIFRecognizer_symbol_matcherb                 [C:\Users\jddfr\git\c-marpaESLIF\src\marpaESLIF.c:8277]
# 4 __marpaESLIFRecognizer_symbol_tryb                    [C:\Users\jddfr\git\c-marpaESLIF\src\marpaESLIF.c:25337]
# 5 _marpaESLIFSymbol_tryb                                [C:\Users\jddfr\git\c-marpaESLIF\src\marpaESLIF.c:24542]
# 6 marpaESLIFSymbol_tryb                                 [C:\Users\jddfr\git\c-marpaESLIF\src\marpaESLIF.c:24492]
# 7 main                                                  [C:\Users\jddfr\git\c-marpaESLIF\test\selfTester.c:973]
Note: @0:11:29.664 in thread 4676
Note: instruction: cmp    0x08(%rsp) %rax

I putted naively this change at the end of pcre2_jit_match.c to bypass this uninitialized memory access (sorry I did not forked the project - this was done locally):

if (rc >= 0 || rc == PCRE2_ERROR_PARTIAL) {
  match_data->subject = subject;
  match_data->subject_length = length;
} else {
  match_data->subject = NULL;
  match_data->subject_length = 0;
}

This comment has been minimized.

Copy link
@zherczeg

zherczeg Dec 29, 2023

Collaborator

I have no context about this suggestion. Is this a new feature? Is it needed?

This comment has been minimized.

Copy link
@PhilipHazel

PhilipHazel via email Dec 29, 2023

Author Collaborator

This comment has been minimized.

Copy link
@PhilipHazel

PhilipHazel via email Dec 29, 2023

Author Collaborator

This comment has been minimized.

Copy link
@PhilipHazel

PhilipHazel via email Dec 29, 2023

Author Collaborator

This comment has been minimized.

Copy link
@zherczeg

zherczeg Dec 29, 2023

Collaborator

This looks simpler:

match_data->subject = subject;
match_data->subject_length = length;

No need an if.

This comment has been minimized.

Copy link
@PhilipHazel

PhilipHazel via email Dec 29, 2023

Author Collaborator

This comment has been minimized.

Copy link
@zherczeg

zherczeg Dec 29, 2023

Collaborator

The test part is confusing for me (is or is not possible). If you have an idea for testing, you can land it as well.

This comment has been minimized.

Copy link
@PhilipHazel

PhilipHazel via email Dec 29, 2023

Author Collaborator
if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
{
length = CU2BYTES(length + was_zero_terminated);
Expand Down Expand Up @@ -7603,6 +7604,7 @@ if (rc == MATCH_MATCH)
{
match_data->rc = ((int)mb->end_offset_top >= 2 * match_data->oveccount)?
0 : (int)mb->end_offset_top/2 + 1;
match_data->subject_length = length;
match_data->startchar = start_match - subject;
match_data->leftchar = mb->start_used_ptr - subject;
match_data->rightchar = ((mb->last_used_ptr > mb->end_match_ptr)?
Expand All @@ -7617,6 +7619,7 @@ if (rc == MATCH_MATCH)
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
}
else match_data->subject = subject;

return match_data->rc;
}

Expand All @@ -7638,6 +7641,7 @@ PCRE2_ERROR_PARTIAL. */
else if (match_partial != NULL)
{
match_data->subject = subject;
match_data->subject_length = length;
match_data->ovector[0] = match_partial - subject;
match_data->ovector[1] = end_subject - subject;
match_data->startchar = match_partial - subject;
Expand Down
3 changes: 3 additions & 0 deletions src/pcre2_substring.c
Original file line number Diff line number Diff line change
Expand Up @@ -309,6 +309,7 @@ Returns: if successful: 0
PCRE2_ERROR_NOSUBSTRING: no such substring
PCRE2_ERROR_UNAVAILABLE: ovector is too small
PCRE2_ERROR_UNSET: substring is not set
PCRE2_ERROR_INVALIDOFFSET: internal error, should not occur
*/

PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
Expand Down Expand Up @@ -341,6 +342,8 @@ else /* Matched using pcre2_dfa_match() */

left = match_data->ovector[stringnumber*2];
right = match_data->ovector[stringnumber*2+1];
if (left > match_data->subject_length || right > match_data->subject_length)
return PCRE2_ERROR_INVALIDOFFSET;
if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;
return 0;
}
Expand Down

0 comments on commit 183e8a9

Please sign in to comment.