diff --git a/ChangeLog b/ChangeLog
index e2e39bb45..667f70866 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -7,14 +7,18 @@ there is also the log of commit messages.
 Version 10.45 xx-xxx-2024
 -------------------------
 
-1. Change 6 of 10.44 broke 32-bit compiles because pcre2test's reporting of
-memory size was changed to the entire compiled data block, instead of just the
-pattern and tables data, so as to align with the new length restriction.
+1. Change 6 of 10.44 broke 32-bit tests because pcre2test's reporting of
+memory size was changed to the entire compiled data block, instead of just the
+pattern and tables data, so as to align with the new length restriction.
 Because the block's header contains pointers, this meant the pcre2test output
 was different in 32-bit mode. A patch by Carlo reverts to the preevious state
 and makes sure that any limit set by pcre2_set_max_pattern_compiled_length()
 also avoids the internal struct overhead.
 
+2. Add --posix-pattern-file to pcre2grep to allow processing of empty patterns
+through the -f option, as well as patterns that end in space characters for
+compatibility with other grep tools.
+
 Version 10.44 07-June-2024
 --------------------------
 
diff --git a/RunGrepTest b/RunGrepTest
index c38218710..6aaf1f5a7 100755
--- a/RunGrepTest
+++ b/RunGrepTest
@@ -861,6 +861,35 @@ echo "---------------------------- Test 153 -----------------------------" >>tes
 (cd $srcdir; $valgrind $vjs $pcre2grep -nA3 --no-group-separator 'four' ./testdata/grepinputx) >>testtrygrep
 echo "RC=$?" >>testtrygrep
 
+echo "---------------------------- Test 154 -----------------------------" >>testtrygrep
+>testtemp1grep
+(cd $srcdir; $valgrind $vjs $pcre2grep -f $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 155 -----------------------------" >>testtrygrep
+echo "" >testtemp1grep
+(cd $srcdir; $valgrind $vjs $pcre2grep -f $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 156 -----------------------------" >>testtrygrep
+echo "" >testtemp1grep
+(cd $srcdir; $valgrind $vjs $pcre2grep --posix-pattern-file --file $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 157 -----------------------------" >>testtrygrep
+echo "spaces " >testtemp1grep
+(cd $srcdir; $valgrind $vjs $pcre2grep -o --posix-pattern-file --file=$builddir/testtemp1grep ./testdata/grepinputv >testtemp2grep && [ `wc -c >testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 158 -----------------------------" >>testtrygrep
+echo "spaces." >testtemp1grep
+(cd $srcdir; $valgrind $vjs $pcre2grep -f $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 159 -----------------------------" >>testtrygrep
+printf "spaces.\015\012" >testtemp1grep
+(cd $srcdir; $valgrind $vjs $pcre2grep --posix-pattern-file -f$builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
 
 # Now compare the results.
 
diff --git a/doc/html/pcre2_set_max_pattern_compiled_length.html b/doc/html/pcre2_set_max_pattern_compiled_length.html
index ab570cf60..a40f41e45 100644
--- a/doc/html/pcre2_set_max_pattern_compiled_length.html
+++ b/doc/html/pcre2_set_max_pattern_compiled_length.html
@@ -27,9 +27,9 @@

pcre2_set_max_pattern_compiled_length man page


 This function sets, in a compile context, the maximum size (in bytes) for the
-memory needed to hold the compiled version of a pattern that is compiled with
-this context. The result is always zero. If a pattern that is passed to
-pcre2_compile() with this context needs more memory, an error is
+memory needed to hold the compiled version of a pattern that is using this
+context. The result is always zero. If a pattern that is passed to
+pcre2_compile() referencing this context needs more memory, an error is
 generated. The default is the largest number that a PCRE2_SIZE variable can
 hold, which is effectively unlimited.

diff --git a/doc/html/pcre2grep.html b/doc/html/pcre2grep.html
index bd12246ae..8b2fa541e 100644
--- a/doc/html/pcre2grep.html
+++ b/doc/html/pcre2grep.html
@@ -391,9 +391,10 @@

pcre2grep man page

 command line, no delimiters should be used. What constitutes a newline when
 reading the file is the operating system's default interpretation of \n. The
 --newline option has no effect on this option. Trailing white space is
-removed from each line, and blank lines are ignored. An empty file contains no
+removed from each line, and blank lines are ignored unless the
+--posix-pattern-file option is also provided. An empty file contains no
 patterns and therefore matches nothing. Patterns read from a file in this way
-may contain binary zeros, which are treated as ordinary data characters.
+may contain binary zeros, which are treated as ordinary character literals.

 If this option is given more than once, all the specified files are read. A
@@ -808,6 +809,15 @@


allowing \w to match Unicode letters and digits.

+--posix-pattern-file
+When patterns are provided with the -f option, do not trim trailing
+spaces or ignore empty lines, which matches the behaviour of other grep tools.
+To keep the behaviour consistent with older versions, if the pattern read was
+terminated with CRLF (as character literals) then neither character is
+included as part of it, so if you really need a pattern ending in '\r',
+use an escape sequence or provide it by a different method.
+

+

 -q, --quiet
 Work quietly, that is, display nothing except error messages. The exit status
 indicates whether or not any matches were found.
diff --git a/doc/pcre2grep.1 b/doc/pcre2grep.1
index ffe9d397b..020f45605 100644
--- a/doc/pcre2grep.1
+++ b/doc/pcre2grep.1
@@ -337,9 +337,10 @@ Read patterns from the file, one per line. As is the case with patterns on the
 command line, no delimiters should be used. What constitutes a newline when
 reading the file is the operating system's default interpretation of \en. The
 \fB--newline\fP option has no effect on this option. Trailing white space is
-removed from each line, and blank lines are ignored. An empty file contains no
+removed from each line, and blank lines are ignored unless the
+\fB--posix-pattern-file\fP option is also provided. An empty file contains no
 patterns and therefore matches nothing. Patterns read from a file in this way
-may contain binary zeros, which are treated as ordinary data characters.
+may contain binary zeros, which are treated as ordinary character literals.
 .sp
 If this option is given more than once, all the specified files are read. A
 data line is output if any of the patterns match it. A file name can be given
@@ -701,6 +702,14 @@ option settings within patterns that affect individual classes. For example,
 when in UCP mode, the sequence (?aP) restricts [:word:] to ASCII letters, while
 allowing \ew to match Unicode letters and digits.
 .TP
+\fB--posix-pattern-file\fP
+When patterns are provided with the \fB-f\fP option, do not trim trailing
+spaces or ignore empty lines, which matches the behaviour of other grep tools.
+To keep the behaviour consistent with older versions, if the pattern read was
+terminated with CRLF (as character literals) then neither character is
+included as part of it, so if you really need a pattern ending in '\er',
+use an escape sequence or provide it by a different method.
+.TP
 \fB-q\fP, \fB--quiet\fP
 Work quietly, that is, display nothing except error messages. The exit status
 indicates whether or not any matches were found.
diff --git a/src/config.h.in b/src/config.h.in
index 8249182de..3bb01c83d 100644
--- a/src/config.h.in
+++ b/src/config.h.in
@@ -145,7 +145,8 @@ sure both macros are undefined; an emulation function will then be used. */
 /* Define to 1 if you have the header file. */
 #undef HAVE_UNISTD_H
 
-/* Define to 1 if the compiler supports simple visibility declarations. */
+/* Define to 1 if the compiler supports GCC compatible visibility
+   declarations. */
 #undef HAVE_VISIBILITY
 
 /* Define to 1 if you have the header file. */
diff --git a/src/pcre2grep.c b/src/pcre2grep.c
index bb96067f0..ff7f0f0a9 100644
--- a/src/pcre2grep.c
+++ b/src/pcre2grep.c
@@ -290,6 +290,7 @@ static BOOL show_total_count = FALSE;
 static BOOL silent = FALSE;
 static BOOL utf = FALSE;
 static BOOL posix_digit = FALSE;
+static BOOL posix_pattern_file = FALSE;
 
 static uint8_t utf8_buffer[8];
 
@@ -428,6 +429,7 @@ used to identify them. */
 #define N_POSIX_DIGIT (-26)
 #define N_GROUP_SEPARATOR (-27)
 #define N_NO_GROUP_SEPARATOR (-28)
+#define N_POSIX_PATFILE (-29)
 
 static option_item optionlist[] = {
   { OP_NODATA, N_NULL, NULL, "", "terminate options" },
@@ -449,6 +451,7 @@ static option_item optionlist[] = {
   { OP_PATLIST, 'e', &match_patdata, "regex(p)=pattern", "specify pattern (may be used more than once)" },
   { OP_NODATA, 'F', NULL, "fixed-strings", "patterns are sets of newline-separated strings" },
   { OP_FILELIST, 'f', &pattern_files_data, "file=path", "read patterns from file" },
+  { OP_NODATA, N_POSIX_PATFILE, NULL, "posix-pattern-file", "use POSIX semantics for pattern files" },
   { OP_FILELIST, N_FILE_LIST, &file_lists_data, "file-list=path","read files to search from file" },
   { OP_NODATA, N_FOFFSETS, NULL, "file-offsets", "output file offsets, not text" },
   { OP_STRING, N_GROUP_SEPARATOR, &group_separator, "group-separator=text", "set separator between groups of lines" },
@@ -1448,7 +1451,34 @@ while ((c = fgetc(f)) != EOF)
 return yield;
 }
 
+/*************************************************
+*          Read one pattern from file            *
+*************************************************/
+/* Wrap around read_one_line() to make sure any terminating '\n' is not
+included in the pattern and empty patterns are correctly identified.
+
+Arguments:
+  buffer     the buffer to read into
+  length     maximum number of characters to read and report how many were
+  f          the file
+
+Returns:     TRUE if a pattern was read into buffer
+*/
+
+static BOOL
+read_pattern(char *buffer, PCRE2_SIZE *length, FILE *f)
+{
+*buffer = '\0';
+*length = read_one_line(buffer, *length, f);
+if (*length > 0 && buffer[*length-1] == '\n') *length = *length - 1;
+if (posix_pattern_file && *length > 0 && buffer[*length-1] == '\r')
+  {
+  *length = *length - 1;
+  if (*length == 0) return TRUE;
+  }
+return (*length > 0 || *buffer == '\n');
+}
 
 /*************************************************
 *         Find end of line                       *
@@ -3598,6 +3628,7 @@ switch(letter)
   case N_NOJIT: use_jit = FALSE; break;
   case N_ALLABSK: extra_options |= PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK; break;
   case N_NO_GROUP_SEPARATOR: group_separator = NULL; break;
+  case N_POSIX_PATFILE: posix_pattern_file = TRUE; break;
   case 'a': binary_files = BIN_TEXT; break;
   case 'c': count_only = TRUE; break;
   case N_POSIX_DIGIT: posix_digit = TRUE; break;
@@ -3808,11 +3839,15 @@ else
   filename = name;
   }
 
-while ((patlen = read_one_line(buffer, sizeof(buffer), f)) > 0)
+while ((patlen = sizeof(buffer)) && read_pattern(buffer, &patlen, f))
   {
-  while (patlen > 0 && isspace((unsigned char)(buffer[patlen-1]))) patlen--;
+  if (!posix_pattern_file)
+    {
+    while (patlen > 0 && isspace((unsigned char)(buffer[patlen-1]))) patlen--;
+    }
+
   linenumber++;
-  if (patlen == 0) continue; /* Skip blank lines */
+  if (!posix_pattern_file && patlen == 0) continue; /* Skip blank lines */
 
   /* Note: this call to add_pattern() puts a pointer to the local variable
   "buffer" into the pattern chain. However, that pointer is used only when
diff --git a/testdata/grepinputv b/testdata/grepinputv
index 366d4fb49..029e2bcdf 100644
--- a/testdata/grepinputv
+++ b/testdata/grepinputv
@@ -7,3 +7,4 @@ The word is cat in this line
 The caterpillar sat on the mat
 The snowcat is not an animal
 A buried feline in the syndicate
+trailing spaces
diff --git a/testdata/grepoutput b/testdata/grepoutput
index d9233c26a..b63a27900 100644
--- a/testdata/grepoutput
+++ b/testdata/grepoutput
@@ -464,6 +464,7 @@ The word is cat in this line
 The caterpillar sat on the mat
 The snowcat is not an animal
 A buried feline in the syndicate
+trailing spaces
 RC=0
 ---------------------------- Test 52 ------------------------------
 fox jumps
@@ -1169,6 +1170,7 @@ The word is cat in this line
 The caterpillar sat on the mat
 The snowcat is not an animal
 A buried feline in the syndicate
+trailing spaces
 RC=0
 ---------------------------- Test 146 -----------------------------
 (standard input):A123B
@@ -1253,3 +1255,27 @@ RC=0
 36-sixteen
 37-seventeen
 RC=0
+---------------------------- Test 154 -----------------------------
+RC=1
+---------------------------- Test 155 -----------------------------
+RC=1
+---------------------------- Test 156 -----------------------------
+The quick brown
+fox jumps
+over the lazy dog.
+This time it jumps and jumps and jumps.
+This line contains \E and (regex) *meta* [characters].
+The word is cat in this line
+The caterpillar sat on the mat
+The snowcat is not an animal
+A buried feline in the syndicate
+trailing spaces
+RC=0
+---------------------------- Test 157 -----------------------------
+RC=0
+---------------------------- Test 158 -----------------------------
+trailing spaces
+RC=0
+---------------------------- Test 159 -----------------------------
+trailing spaces
+RC=0
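
Reviewer note: the per-line trimming rules that this patch spreads across read_pattern() and the pattern-file loop in src/pcre2grep.c can be summarised in one standalone function. The sketch below is only an illustration of those rules as read from the diff; trim_pattern_line() and its posix_mode parameter are hypothetical names and do not exist in pcre2grep, and the sketch assumes it is handed one line exactly as read from the file, including any terminating '\n'.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper for illustration only; it is not part of pcre2grep.
   "line" holds one line as read from the pattern file (including any
   terminating '\n') and *len is its length. On return *len is the length
   of the resulting pattern. Returns 1 if the line yields a pattern and 0
   if it is skipped. */
static int
trim_pattern_line(const char *line, size_t *len, int posix_mode)
{
size_t n = *len;

if (n == 0) return 0;                 /* Nothing was read (end of file) */
if (line[n-1] == '\n') n--;           /* Drop the line terminator */

if (posix_mode)
  {
  /* Keep trailing spaces; drop only a '\r' left over from CRLF. */
  if (n > 0 && line[n-1] == '\r') n--;
  *len = n;
  return 1;                           /* Empty patterns are kept */
  }

/* Default mode: trim all trailing white space and skip blank lines. */
while (n > 0 && isspace((unsigned char)line[n-1])) n--;
*len = n;
return n > 0;
}
```

Under these assumptions a line holding only "\n" is skipped by default but becomes an empty pattern in POSIX mode, which is consistent with tests 155 and 156 above returning RC=1 and RC=0 respectively.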