Commit
pcre2grep: add --posix-pattern-file for compatibility with other grep (#428)

Historically, pcre2grep has done minor processing of the patterns that
were read through the `-f` option.

The end result is that for some patterns there are different results
depending if they were provided through `-e`, `-f` or as a parameter
in the command line.

Add a flag that could be provided to skip that processing so that the
same pattern file used with other grep implementations could be used
directly for the same result.
carenas authored Jun 18, 2024
1 parent 3b90149 commit c63d7c9
Showing 9 changed files with 129 additions and 14 deletions.
10 changes: 7 additions & 3 deletions ChangeLog
@@ -7,14 +7,18 @@ there is also the log of commit messages.
Version 10.45 xx-xxx-2024
-------------------------

-1. Change 6 of 10.44 broke 32-bit compiles because pcre2test's reporting of
-memory size was changed to the entire compiled data block, instead of just the
-pattern and tables data, so as to align with the new length restriction.
+1. Change 6 of 10.44 broke 32-bit tests because pcre2test's reporting of
+memory size was changed to the entire compiled data block, instead of just the
+pattern and tables data, so as to align with the new length restriction.
Because the block's header contains pointers, this meant the pcre2test output
was different in 32-bit mode. A patch by Carlo reverts to the previous state
and makes sure that any limit set by pcre2_set_max_pattern_compiled_length()
also avoids the internal struct overhead.

2. Add --posix-pattern-file to pcre2grep to allow processing of empty patterns
through the -f option, as well as patterns that end in space characters for
compatibility with other grep tools.


Version 10.44 07-June-2024
--------------------------
29 changes: 29 additions & 0 deletions RunGrepTest
@@ -861,6 +861,35 @@ echo "---------------------------- Test 153 -----------------------------" >>tes
(cd $srcdir; $valgrind $vjs $pcre2grep -nA3 --no-group-separator 'four' ./testdata/grepinputx) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 154 -----------------------------" >>testtrygrep
>testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep -f $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 155 -----------------------------" >>testtrygrep
echo "" >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep -f $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 156 -----------------------------" >>testtrygrep
echo "" >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep --posix-pattern-file --file $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 157 -----------------------------" >>testtrygrep
echo "spaces " >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep -o --posix-pattern-file --file=$builddir/testtemp1grep ./testdata/grepinputv >testtemp2grep && $valgrind $vjs $pcre2grep -q "s " testtemp2grep) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 158 -----------------------------" >>testtrygrep
echo "spaces." >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep -f $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 159 -----------------------------" >>testtrygrep
printf "spaces.\015\012" >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep --posix-pattern-file -f$builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

# Now compare the results.

6 changes: 3 additions & 3 deletions doc/html/pcre2_set_max_pattern_compiled_length.html
@@ -27,9 +27,9 @@ <h1>pcre2_set_max_pattern_compiled_length man page</h1>
</b><br>
<P>
This function sets, in a compile context, the maximum size (in bytes) for the
-memory needed to hold the compiled version of a pattern that is compiled with
-this context. The result is always zero. If a pattern that is passed to
-<b>pcre2_compile()</b> with this context needs more memory, an error is
+memory needed to hold the compiled version of a pattern that is using this
+context. The result is always zero. If a pattern that is passed to
+<b>pcre2_compile()</b> referencing this context needs more memory, an error is
generated. The default is the largest number that a PCRE2_SIZE variable can
hold, which is effectively unlimited.
</P>
14 changes: 12 additions & 2 deletions doc/html/pcre2grep.html
@@ -391,9 +391,10 @@ <h1>pcre2grep man page</h1>
command line, no delimiters should be used. What constitutes a newline when
reading the file is the operating system's default interpretation of \n. The
<b>--newline</b> option has no effect on this option. Trailing white space is
-removed from each line, and blank lines are ignored. An empty file contains no
+removed from each line, and blank lines are ignored unless the
+<b>--posix-pattern-file</b> option is also provided. An empty file contains no
 patterns and therefore matches nothing. Patterns read from a file in this way
-may contain binary zeros, which are treated as ordinary data characters.
+may contain binary zeros, which are treated as ordinary character literals.
<br>
<br>
If this option is given more than once, all the specified files are read. A
@@ -808,6 +809,15 @@ <h1>pcre2grep man page</h1>
allowing \w to match Unicode letters and digits.
</P>
<P>
<b>--posix-pattern-file</b>
When patterns are provided with the <b>-f</b> option, do not trim trailing
spaces or ignore empty lines, in a similar way to other grep tools. To keep
the behaviour consistent with older versions, if the pattern read was
terminated with CRLF (as character literals) then neither character is
included as part of it, so if you really need a pattern ending in '\r',
use an escape sequence or provide it by a different method.
</P>
<P>
<b>-q</b>, <b>--quiet</b>
Work quietly, that is, display nothing except error messages. The exit
status indicates whether or not any matches were found.
13 changes: 11 additions & 2 deletions doc/pcre2grep.1
@@ -337,9 +337,10 @@ Read patterns from the file, one per line. As is the case with patterns on the
command line, no delimiters should be used. What constitutes a newline when
reading the file is the operating system's default interpretation of \en. The
\fB--newline\fP option has no effect on this option. Trailing white space is
-removed from each line, and blank lines are ignored. An empty file contains no
+removed from each line, and blank lines are ignored unless the
+\fB--posix-pattern-file\fP option is also provided. An empty file contains no
 patterns and therefore matches nothing. Patterns read from a file in this way
-may contain binary zeros, which are treated as ordinary data characters.
+may contain binary zeros, which are treated as ordinary character literals.
.sp
If this option is given more than once, all the specified files are read. A
data line is output if any of the patterns match it. A file name can be given
@@ -701,6 +702,14 @@ option settings within patterns that affect individual classes. For example,
when in UCP mode, the sequence (?aP) restricts [:word:] to ASCII letters, while
allowing \ew to match Unicode letters and digits.
.TP
\fB--posix-pattern-file\fP
When patterns are provided with the \fB-f\fP option, do not trim trailing
spaces or ignore empty lines, in a similar way to other grep tools. To keep
the behaviour consistent with older versions, if the pattern read was
terminated with CRLF (as character literals) then neither character is
included as part of it, so if you really need a pattern ending in '\er',
use an escape sequence or provide it by a different method.
.TP
\fB-q\fP, \fB--quiet\fP
Work quietly, that is, display nothing except error messages. The exit
status indicates whether or not any matches were found.
3 changes: 2 additions & 1 deletion src/config.h.in
@@ -145,7 +145,8 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <unistd.h> header file. */
#undef HAVE_UNISTD_H

-/* Define to 1 if the compiler supports simple visibility declarations. */
+/* Define to 1 if the compiler supports GCC compatible visibility
+declarations. */
#undef HAVE_VISIBILITY

/* Define to 1 if you have the <wchar.h> header file. */
41 changes: 38 additions & 3 deletions src/pcre2grep.c
@@ -290,6 +290,7 @@ static BOOL show_total_count = FALSE;
static BOOL silent = FALSE;
static BOOL utf = FALSE;
static BOOL posix_digit = FALSE;
static BOOL posix_pattern_file = FALSE;

static uint8_t utf8_buffer[8];

@@ -428,6 +429,7 @@ used to identify them. */
#define N_POSIX_DIGIT (-26)
#define N_GROUP_SEPARATOR (-27)
#define N_NO_GROUP_SEPARATOR (-28)
#define N_POSIX_PATFILE (-29)

static option_item optionlist[] = {
{ OP_NODATA, N_NULL, NULL, "", "terminate options" },
@@ -449,6 +451,7 @@ static option_item optionlist[] = {
{ OP_PATLIST, 'e', &match_patdata, "regex(p)=pattern", "specify pattern (may be used more than once)" },
{ OP_NODATA, 'F', NULL, "fixed-strings", "patterns are sets of newline-separated strings" },
{ OP_FILELIST, 'f', &pattern_files_data, "file=path", "read patterns from file" },
{ OP_NODATA, N_POSIX_PATFILE, NULL, "posix-pattern-file", "use POSIX semantics for pattern files" },
{ OP_FILELIST, N_FILE_LIST, &file_lists_data, "file-list=path","read files to search from file" },
{ OP_NODATA, N_FOFFSETS, NULL, "file-offsets", "output file offsets, not text" },
{ OP_STRING, N_GROUP_SEPARATOR, &group_separator, "group-separator=text", "set separator between groups of lines" },
@@ -1448,7 +1451,34 @@ while ((c = fgetc(f)) != EOF)
return yield;
}

/*************************************************
* Read one pattern from file *
*************************************************/

/* Wrap around read_one_line() to make sure any terminating '\n' is not
included in the pattern and empty patterns are correctly identified.
Arguments:
  buffer     the buffer to read into
  length     on entry, the maximum number of characters to read;
             on return, the number of characters in the pattern
  f          the file
Returns:     TRUE if a pattern was read into buffer
*/

static BOOL
read_pattern(char *buffer, PCRE2_SIZE *length, FILE *f)
{
*buffer = '\0';
*length = read_one_line(buffer, *length, f);
if (*length > 0 && buffer[*length-1] == '\n') *length = *length - 1;
if (posix_pattern_file && *length > 0 && buffer[*length-1] == '\r')
  {
  *length = *length - 1;
  if (*length == 0) return TRUE;
  }
return (*length > 0 || *buffer == '\n');
}

/*************************************************
* Find end of line *
@@ -3598,6 +3628,7 @@ switch(letter)
case N_NOJIT: use_jit = FALSE; break;
case N_ALLABSK: extra_options |= PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK; break;
case N_NO_GROUP_SEPARATOR: group_separator = NULL; break;
case N_POSIX_PATFILE: posix_pattern_file = TRUE; break;
case 'a': binary_files = BIN_TEXT; break;
case 'c': count_only = TRUE; break;
case N_POSIX_DIGIT: posix_digit = TRUE; break;
@@ -3808,11 +3839,15 @@ else
filename = name;
}

-while ((patlen = read_one_line(buffer, sizeof(buffer), f)) > 0)
+while ((patlen = sizeof(buffer)) && read_pattern(buffer, &patlen, f))
   {
-  while (patlen > 0 && isspace((unsigned char)(buffer[patlen-1]))) patlen--;
+  if (!posix_pattern_file)
+    {
+    while (patlen > 0 && isspace((unsigned char)(buffer[patlen-1]))) patlen--;
+    }
+
   linenumber++;
-  if (patlen == 0) continue; /* Skip blank lines */
+  if (!posix_pattern_file && patlen == 0) continue; /* Skip blank lines */

/* Note: this call to add_pattern() puts a pointer to the local variable
"buffer" into the pattern chain. However, that pointer is used only when
1 change: 1 addition & 0 deletions testdata/grepinputv
@@ -7,3 +7,4 @@ The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
trailing spaces
26 changes: 26 additions & 0 deletions testdata/grepoutput
@@ -464,6 +464,7 @@ The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
trailing spaces
RC=0
---------------------------- Test 52 ------------------------------
fox jumps
@@ -1169,6 +1170,7 @@ The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
trailing spaces
RC=0
---------------------------- Test 146 -----------------------------
(standard input):A123B
@@ -1253,3 +1255,27 @@ RC=0
36-sixteen
37-seventeen
RC=0
---------------------------- Test 154 -----------------------------
RC=1
---------------------------- Test 155 -----------------------------
RC=1
---------------------------- Test 156 -----------------------------
The quick brown
fox jumps
over the lazy dog.
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
trailing spaces
RC=0
---------------------------- Test 157 -----------------------------
RC=0
---------------------------- Test 158 -----------------------------
trailing spaces
RC=0
---------------------------- Test 159 -----------------------------
trailing spaces
RC=0

5 comments on commit c63d7c9

@diizzyy
Contributor

@diizzyy diizzyy commented on c63d7c9 Jun 23, 2024

@carenas This commit causes pcre2_grep_test to start failing on FreeBSD 14.0 (amd64)

@carenas
Contributor Author

> start failing on FreeBSD 14.0

I don't see how that could be, and I can't reproduce the failure either.

@diizzyy: do you have any more details that might help explain the reported issue?

@diizzyy
Contributor

@diizzyy diizzyy commented on c63d7c9 Jun 24, 2024

Sure, 10.44 with the following PRs and commits backported (using CMake and amd64):

fba4a6a5e8a7895db7e0efcbcdb7087a04a00322 - PR 366
57906628d7babd27c01eb1c085d3e0cdd512189a
bb8e7e0c728ca10fa21fbc306a7f2ae4205e8618
a80920509e49865405477244c294fb3d8864375e
3b90149f3c9aca9fe8c2aa623a18e18c089dc449
c63d7c992ef1bbf64ef93e0e8e551ed29fd988e7

-- PCRE2-10.45 configuration summary:
--
--   Install prefix .................. : /usr/local
--   C compiler ...................... : /usr/bin/cc
--   C compiler flags ................ : -O2 -pipe -march=tigerlake  -fstack-protector-strong -fno-strict-aliasing -O2 -pipe -march=tigerlake  -fstack-protector-strong -fno-strict-aliasing  -DNDEBUG
--
--   Build 8 bit PCRE2 library ....... : ON
--   Build 16 bit PCRE2 library ...... : ON
--   Build 32 bit PCRE2 library ...... : ON
--   Enable JIT compiling support .... : ON
--   Use SELinux allocator in JIT .... : IGNORE
--   Enable Unicode support .......... : ON
--   Newline char/sequence ........... : LF
--   \R matches only ANYCRLF ......... : OFF
--   \C is disabled .................. : OFF
--   EBCDIC coding ................... : OFF
--   EBCDIC coding with NL=0x25 ...... : OFF
--   Rebuild char tables ............. : OFF
--   Internal link size .............. : 2
--   Maximum variable lookbehind ..... : 255
--   Parentheses nest limit .......... : 250
--   Heap limit ...................... : 20000000
--   Match limit ..................... : 10000000
--   Match depth limit ............... : MATCH_LIMIT
--   Build shared libs ............... : ON
--   Build static libs ............... : OFF
--      with PIC enabled ............. : OFF
--   Build pcre2grep ................. : ON
--   Enable JIT in pcre2grep ......... : ON
--   Enable callouts in pcre2grep .... : ON
--   Enable callout fork in pcre2grep. : ON
--   Buffer size for pcre2grep ....... : 20480
--   Build tests (implies pcre2test .. : ON
--                and pcre2grep)
--   Link pcre2grep with libz ........ : Library not found
--   Link pcre2grep with libbz2 ...... : Library not found
--   Link pcre2test with libeditline . : Library not found
--   Link pcre2test with libreadline . : Library not found
--   Support Valgrind .................: OFF
--   Use %zu and %td ..................: AUTO
2/4 Testing: pcre2_grep_test
2/4 Test: pcre2_grep_test
Command: "/bin/sh" "/usr/ports/devel/pcre2/work/.build/pcre2_grep_test.sh"
Directory: /usr/ports/devel/pcre2/work/.build
"pcre2_grep_test" start time: Jun 25 00:55 CEST
Output:
----------------------------------------------------------
Testing pcre2grep version 10.45-DEV 2024-06-09
Testing pcre2grep main features
--- /usr/ports/devel/pcre2/work/pcre2-10.44/testdata/grepoutput 2024-06-25 00:55:34.453341000 +0200
+++ testtrygrep 2024-06-25 00:55:50.448674000 +0200
@@ -495,11 +495,13 @@
 ./testdata/grepinput8:0
 ./testdata/grepinputM:0
 ./testdata/grepinputv:1
+./testdata/grepinputv.orig:1
 ./testdata/grepinputx:0
 RC=0
 ---------------------------- Test 57 -----------------------------
 ./testdata/grepinput:456
 ./testdata/grepinputv:1
+./testdata/grepinputv.orig:1
 RC=0
 ---------------------------- Test 58 -----------------------------
 PATTERN at the start of a line.
@@ -827,7 +829,7 @@
 37220,12
 RC=0
 ---------------------------- Test 113 -----------------------------
-480
+483
 RC=0
 ---------------------------- Test 114 -----------------------------
 testdata/grepinput:469
@@ -835,18 +837,20 @@
 testdata/grepinput8:0
 testdata/grepinputM:2
 testdata/grepinputv:3
+testdata/grepinputv.orig:3
 testdata/grepinputx:6
-TOTAL:480
+TOTAL:483
 RC=0
 ---------------------------- Test 115 -----------------------------
 testdata/grepinput:469
 testdata/grepinputM:2
 testdata/grepinputv:3
+testdata/grepinputv.orig:3
 testdata/grepinputx:6
-TOTAL:480
+TOTAL:483
 RC=0
 ---------------------------- Test 116 -----------------------------
-478
+481
 RC=0
 ---------------------------- Test 117 -----------------------------
 469
@@ -854,8 +858,9 @@
 0
 2
 3
+3
 6
-480
+483
 RC=0
 ---------------------------- Test 118 -----------------------------
 testdata/grepinput3
<end of output>
Test time =   0.27 sec
----------------------------------------------------------
Test Failed.
"pcre2_grep_test" end time: Jun 25 00:55 CEST
"pcre2_grep_test" time elapsed: 00:00:00
----------------------------------------------------------

Adding 7e14196 and 9a51f31 doesn't fix the failing test

@carenas
Contributor Author

> doesn't fix the failing test

Correct, because the issue was created by the backporting.

Does deleting testdata/grepinputv.orig fix it?

@diizzyy
Contributor

Ahh, cleaning up the orig files fixes it, thanks and sorry for the noise!
