Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pcre2_jit_compile returns PCRE2_ERROR_NOMEMORY on patterns compiled with PCRE2_AUTO_CALLOUT #437

Closed
siegel opened this issue Aug 6, 2024 · 7 comments

Comments

@siegel
Copy link

siegel commented Aug 6, 2024

Version info from pcre2.h:

/* The current PCRE version information. */

#define PCRE2_MAJOR           10
#define PCRE2_MINOR           45
#define PCRE2_PRERELEASE      -DEV
#define PCRE2_DATE            2024-06-09

Platform: macOS (Darwin), 14.6, arm64.
(The OS version is inconsequential; I've not yet tested on Intel but it was happening there too, as of a long time ago.)

At runtime, I programmatically generate this pattern:

(?(?=\C|\z)(?=\C|\z)(?=\C|\z)|(?P<comment>#.*$)(?P<string>(?x:
			(?>	"	(?s: \\. | [^"] )*?		(?: " | $)	)	|
			(?>	'	(?s: \\. | [^'] )*?		(?: ' | $)	)	|
			(?>	`	(?s: \\. | [^`] )*?		(?: ` | $)	)
		))(?P<number>^\.PHONY:[^\\\r]*(\\\r[^\\\r]*)*))(?x:
			(?P>comment)	|
			(?P>number)	|
			(?P>string)
		)

The code which does this is fairly intertwined, but what gets run looks like this (NB: I am using pcre2_16):

	pcre2_compile_context
						*compileContext = NULL;
	pcre2_code			*pcre = NULL;

	compileContext = pcre2_compile_context_create(NULL);
	verify_noerr(pcre2_set_compile_extra_options(compileContext, PCRE2_EXTRA_ESCAPED_CR_IS_LF));

	pcre = pcre2_compile(pattern, patternLength, options, &errornum, &erroroffset, compileContext);
	if (NULL != pcre)
	{
		int	jitError = 0;
		
		jitError = pcre2_jit_compile(pcre, PCRE2_JIT_COMPLETE);
	}

At the time this runs, options may vary, but in this case its value is 0xc00a0404. Notably, this includes PCRE2_AUTO_CALLOUT.

When compiling the generated pattern above, the call to pcre2_jit_compile returns PCRE2_ERROR_NOMEMORY (-48). This isn't a new issue, and most of the time life goes on just fine without JIT. But it's been bothering me that I can't JIT these patterns, so I started to debug it.

To make a very long story short, I found that on line 14283 in pcre2_jit_compile.c it returns PCRE2_ERROR_NOMEMORY if check_opcode_types() returns FALSE:

if (!check_opcode_types(common, common->start, ccend))
  {
  SLJIT_FREE(common->optimized_cbracket, allocator_data);
  return PCRE2_ERROR_NOMEMORY;
  }

So I set breakpoints in the places in that function where there's a return FALSE;. And this one fires, on line 1167:

    case OP_COND:
    case OP_SCOND:
    /* Only AUTO_CALLOUT can insert this opcode. We do
       not intend to support this case. */
    if (cc[1 + LINK_SIZE] == OP_CALLOUT || cc[1 + LINK_SIZE] == OP_CALLOUT_STR)
      return FALSE;
    cc += 1 + LINK_SIZE;
    break;

My pattern is indeed compiled with PCRE2_AUTO_CALLOUT, so the callout opcodes will be in the compiled pattern.

This strikes me as odd, because pcre2_jit_compile() does support ordinary callouts:

    case OP_CALLOUT:
    case OP_CALLOUT_STR:
    if (common->capture_last_ptr == 0)
      {
      common->capture_last_ptr = common->ovector_start;
      common->ovector_start += sizeof(sljit_sw);
      }
    cc += (*cc == OP_CALLOUT) ? PRIV(OP_lengths)[OP_CALLOUT] : GET(cc, 1 + 2*LINK_SIZE);
    break;

Would it be possible to relax the restriction, so that pcre2_jit_compile() allows auto-callouts?

Additionally, the documentation isn't quite in agreement:

  • The man page for pcre2_jit_compile says:

The function can also return PCRE2_ERROR_NOMEMORY if JIT is unable to allocate executable memory for the compiler, even if it was because of a system security restriction.`

It doesn't mention that pcre2_jit_compile(3) will also return PCRE2_ERROR_NOMEMORY for other specific reasons, such as if the pattern uses auto-callouts.

  • The documentation for pcre2callout says that callouts are supported by JIT:

During matching, when PCRE2 reaches a callout point, if an external function is provided in the match context, it is called. This applies to both normal, DFA, and JIT matching.

This suggests that perhaps the behavior I'm currently encountering is a bug; but if it's intentional the documentation needs to clearly state that PCRE2_AUTO_CALLOUT may cause JIT compiles to fail.

So:

  1. I think it makes sense for pcre2_jit_compile() return specific codes for specific failures, not just "can't allocate memory" (which is a lie in this case; there was no allocation failure, just a bad opcode).

  2. It appears that pcre2_jit_match() supports callouts generally. So is it possible that the auto-callout test in check_opcode_types() is erroneous, and that auto-callouts ought to be allowed at that point?

  3. If auto-callouts are not to be supported in JIT-ed patterns, I'd like to have a flag to pass to pcre2_jit_compile() that says "I don't care whether this contains auto-callouts; compile it anyway and NOP them." (I don't know how practical that is, though.)

Thanks for reading. :-)

@zherczeg
Copy link
Collaborator

zherczeg commented Aug 7, 2024

I don't remember why the code returns with no memory for unsupported features. It does not make sense. Maybe there was no better error code. Adding new error codes will be a new feature.

Auto callouts are supported in general, but for conditions, we probably need to save/restore several registers. Implementing this would be a new feature as well.

@siegel
Copy link
Author

siegel commented Aug 7, 2024

I don't remember why the code returns with no memory for unsupported features. It does not make sense. Maybe there was no better error code. Adding new error codes will be a new feature.

Should I create a new issue for that specific feature? (I'm happy to make an argument that it's a bug, but if you're doing the work and consider it a feature, I will write it up as a feature. :-) )

Auto callouts are supported in general, but for conditions, we probably need to save/restore several registers. Implementing this would be a new feature as well.

Can you elaborate on this a bit? I'm not clear on what exactly is occuring at that point in the pattern compilation.

(Also, if you consider support for autocallouts after conditionals to be a feature to be added, should I likewise write up a new issue for it? How should I title it?)

@zherczeg
Copy link
Collaborator

zherczeg commented Aug 9, 2024

You can open reports if you wish, but it is not necessary.

The only thing I remember is that the conditionals with callouts was a complex case. Something was needed to do for conditionals before and after the callout happens. The interpreter also does some specialization:

https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_match.c#L5627

@zherczeg
Copy link
Collaborator

#441

@zherczeg
Copy link
Collaborator

I have checked the callout problem. The main issue is that the callout can escape from conditional before the condition is checked. Supporting cond/scond is already requires a lot of specialization in the code, and supporting iterated conditional with an extra escape is a rather complex case. Introducing bugs in the code is very likely.

@siegel
Copy link
Author

siegel commented Aug 22, 2024

I appreciate you taking the time to look into the feasibility.

I ended up resolving this issue by turning off PCRE2_AUTO_CALLOUT for these patterns and instead generating the pattern with explicit (?C) callouts (which are fine for my needs), and further by reworking the pattern to use (?s).|\z instead of \C (the latter which fails to JIT compile when the pattern was compiled with PCRE2_UTF, as mine are).

So I think this issue can probably be closed leaving #441. :-)

@PhilipHazel
Copy link
Collaborator

I'll be reviewing #441 later today. Closing this issue now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants