Add new API function pcre2_set_optimization() for controlling enabled optimizations

It is anticipated that, over time, more and more optimizations will be added to PCRE2, and we want to be able to switch optimizations off and on, both for testing purposes and to work around bugs in a released library version.

The number of free bits left in the compile options word is very small. Hence, we will start putting all optimization enable/disable flags in a separate word. To switch these off and on, the new API function pcre2_set_optimization() will be used.

The values which can be passed to pcre2_set_optimization() are different from the internal flag bit values. They are contiguous integers, so there is no danger of ever running out of them. This means that, in the future, the internal representation can be changed at any time without breaking backward compatibility.

Further, the 'directives' passed to pcre2_set_optimization() are not restricted to controlling a single, specific optimization. For example, passing PCRE2_OPTIMIZATION_FULL will turn on all optimizations supported by whatever version of PCRE2 the client program happens to be linked with.

Co-Authored-By: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Co-Authored-by: Zoltan Herczeg <hzmester@freemail.hu>
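The commit message describes public directive values that are contiguous integers, translated internally into flag bits. A minimal sketch of such a translation follows. Every name and value in it (OPT_*, FLAG_*, apply_directive()) is hypothetical and is not taken from PCRE2's headers or sources; the convention that each optimization gets an adjacent "on"/"off" pair of directive numbers is likewise an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical public directive values: contiguous integers that say
   nothing about the internal bit layout. NOT PCRE2's real constants. */
enum {
  OPT_NONE = 0,
  OPT_FULL = 1,
  OPT_AUTO_POSSESS = 2,        /* on/off pairs start here: even = on, odd = off */
  OPT_AUTO_POSSESS_OFF = 3,
  OPT_DOTSTAR_ANCHOR = 4,
  OPT_DOTSTAR_ANCHOR_OFF = 5,
  OPT_START_OPTIMIZE = 6,
  OPT_START_OPTIMIZE_OFF = 7
};

/* Hypothetical private flag bits; free to change between versions. */
#define FLAG_AUTO_POSSESS   0x01u
#define FLAG_DOTSTAR_ANCHOR 0x02u
#define FLAG_START_OPTIMIZE 0x04u
#define FLAG_ALL (FLAG_AUTO_POSSESS | FLAG_DOTSTAR_ANCHOR | FLAG_START_OPTIMIZE)

/* Apply one directive to a flags word. Returns 0 on success, or -1 if the
   directive is unknown (the flags are then left unchanged). */
int apply_directive(uint32_t *flags, uint32_t directive)
{
  /* Internal bit for each on/off pair, in directive order. */
  static const uint32_t bit_for_pair[] = {
    FLAG_AUTO_POSSESS, FLAG_DOTSTAR_ANCHOR, FLAG_START_OPTIMIZE
  };
  const uint32_t npairs = sizeof bit_for_pair / sizeof bit_for_pair[0];

  if (directive == OPT_NONE) {
    *flags = 0;
    return 0;
  }
  if (directive == OPT_FULL) {
    *flags = FLAG_ALL;
    return 0;
  }
  if (directive >= OPT_AUTO_POSSESS && directive < OPT_AUTO_POSSESS + 2 * npairs) {
    uint32_t pos = directive - OPT_AUTO_POSSESS;
    uint32_t bit = bit_for_pair[pos / 2];
    if (pos % 2 == 0)
      *flags |= bit;    /* even offset within the pair: switch on */
    else
      *flags &= ~bit;   /* odd offset within the pair: switch off */
    return 0;
  }
  return -1;
}
```

In this scheme, a new optimization needs only a new bit, one more entry in bit_for_pair, and the next two directive numbers. The values that existing client programs pass never change, while the private flag layout can be rearranged freely.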
1 parent: 5e75d9b; commit a346039
Showing 25 changed files with 713 additions and 84 deletions.
@@ -0,0 +1,121 @@
<html>
<head>
<title>pcre2_set_optimize specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2_set_optimize man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre2.h&#62;</b>
</P>
<P>
<b>int pcre2_set_optimize(pcre2_compile_context *<i>ccontext</i>,</b>
<b> uint32_t <i>directive</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function controls which performance optimizations will be applied
by <b>pcre2_compile()</b>. The permitted values of <i>directive</i> are as follows:
<pre>
PCRE2_OPTIMIZATION_NONE
</pre>
Disable all optional performance optimizations.
<pre>
PCRE2_OPTIMIZATION_FULL
</pre>
Enable all optional performance optimizations. This is the default value.
<pre>
PCRE2_AUTO_POSSESS
PCRE2_AUTO_POSSESS_OFF
</pre>
Enable/disable "auto-possessification" of variable quantifiers such as * and +.
For example, this optimization turns a+b into a++b in order to avoid
backtracks into a+ that can never be successful. However, if callouts are in
use, auto-possessification means that some callouts are never taken. You can
disable this optimization if you want the matching functions to do a full,
unoptimized search and run all the callouts.
<pre>
PCRE2_DOTSTAR_ANCHOR
PCRE2_DOTSTAR_ANCHOR_OFF
</pre>
Enable/disable an optimization that is applied when .* is the first significant
item in a top-level branch of a pattern, and all the other branches also start
with .* or with \A or \G or ^. Such a pattern is automatically anchored if
PCRE2_DOTALL is set for all the .* items and PCRE2_MULTILINE is not set for any
^ items. Otherwise, the fact that any match must start either at the start of
the subject or following a newline is remembered. Like other optimizations,
this can cause callouts to be skipped.
</P>
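The effect of the remembered condition can be modelled in a few lines. The hypothetical dotstar_start_candidates() below is a simplified illustration, not PCRE2's implementation: it assumes a single-LF newline convention and lists the offsets at which a pattern such as .*foo, compiled without PCRE2_DOTALL, could begin a match. Every other offset can be skipped without running the matcher, which is why callouts at those offsets are never taken. If PCRE2_DOTALL were set, the pattern would be anchored and only offset 0 would remain.

```c
#include <stddef.h>

/* Simplified model, not PCRE2 code. For a pattern such as .*foo compiled
   without PCRE2_DOTALL, a match can only begin at the start of the subject
   or immediately after a newline (assumed here to be a single '\n').
   Stores up to `max` candidate start offsets in `out` and returns the
   total number of candidates. */
size_t dotstar_start_candidates(const char *subject, size_t length,
                                size_t *out, size_t max)
{
  size_t count = 0;

  /* i == length is included: a pattern such as .* can match at the end. */
  for (size_t i = 0; i <= length; i++) {
    if (i == 0 || subject[i - 1] == '\n') {
      if (count < max)
        out[count] = i;
      count++;
    }
  }
  return count;
}
```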
<P>
The dotstar anchor optimization is automatically disabled for .* if it is inside
an atomic group or a capture group that is the subject of a backreference, or if
the pattern contains (*PRUNE) or (*SKIP).
<pre>
PCRE2_START_OPTIMIZE
PCRE2_START_OPTIMIZE_OFF
</pre>
Enable/disable optimizations which cause matching functions to scan the subject
string for specific code unit values before attempting a match. For example, if
it is known that an unanchored match must start with a specific value, the
matching code searches the subject for that value, and fails immediately if it
cannot find it, without actually running the main matching function. This means
that a special item such as (*COMMIT) at the start of a pattern is not
considered until after a suitable starting point for the match has been found.
Also, when callouts or (*MARK) items are in use, these "start-up" optimizations
can cause them to be skipped if the main matching function is never actually
run. The start-up optimizations are in effect a pre-scan of the subject that
takes place before the pattern is run.
</P>
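The pre-scan idea can be sketched as a single search step. The hypothetical next_start_candidate() below is a simplified model, not PCRE2's code: it assumes a pattern with one known first code unit and a known minimum match length (another start-up check, described on this page for the (*MARK) example), and uses memchr() to find the next offset at which a match attempt is worth making.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of two start-up checks, not PCRE2's implementation:
   a match must begin with the code unit `first`, and at least `min_length`
   code units must remain from the starting point. Returns the next
   candidate start offset at or after `from`, or -1 if there is none, in
   which case the matcher itself would never be run. */
long next_start_candidate(const char *subject, size_t length, size_t from,
                          char first, size_t min_length)
{
  const char *hit;
  size_t offset;

  if (from >= length)
    return -1;

  /* Scan for the required first code unit. */
  hit = memchr(subject + from, (unsigned char)first, length - from);
  if (hit == NULL)
    return -1;

  /* Too little subject left; any later occurrence would be shorter still. */
  offset = (size_t)(hit - subject);
  if (length - offset < min_length)
    return -1;

  return (long)offset;
}
```

With the subjects used on this page, next_start_candidate("DEFABC", 6, 0, 'A', 3) yields 3, so a (*COMMIT) at the start of the pattern is first seen at offset 3. For "XXBB" with first unit 'B' and minimum length 2, searching from offset 3 yields -1: only one character is left.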
<P>
Disabling start-up optimizations ensures that, in cases where the result is "no
match", the callouts do occur, and that items such as (*COMMIT) and (*MARK) are
considered at every possible starting position in the subject string.
</P>
<P>
Disabling start-up optimizations may change the outcome of a matching operation.
Consider the pattern
<pre>
(*COMMIT)ABC
</pre>
When this is compiled, PCRE2 records the fact that a match must start with the
character "A". Suppose the subject string is "DEFABC". The start-up
optimization scans along the subject, finds "A" and runs the first match
attempt from there. The (*COMMIT) item means that the pattern must match at the
current starting position, which, in this case, it does. However, if the same
match is run without start-up optimizations, the initial scan along the subject
string does not happen. The first match attempt is run starting from "D" and,
when this fails, (*COMMIT) prevents any further match attempts, so the
overall result is "no match".
</P>
<P>
Another start-up optimization makes use of a minimum length for a matching
subject, which is recorded when possible. Consider the pattern
<pre>
(*MARK:1)B(*MARK:2)(X|Y)
</pre>
The minimum length for a match is two characters. If the subject is "XXBB", the
"starting character" optimization skips "XX", then tries to match "BB", which
is long enough. In the process, (*MARK:2) is encountered and remembered. When
the match attempt fails, the next "B" is found, but there is only one character
left, so no further attempt is made, and "no match" is returned with the "last
mark seen" set to "2". Without start-up optimizations, however, matches are
tried at every possible starting position, including at the end of the subject,
where (*MARK:1) is encountered, but there is no "B", so the "last mark seen"
that is returned is "1". In this case, the optimizations do not affect the
overall match result, which is still "no match", but they do affect the
auxiliary information that is returned.
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>