[comp.lang.perl] Weird grep bug ?

pem@frankland-river.aaii.oz.au (pem) (01/19/90)

I have noticed what seems to be a strange bug in grep.
I am not quite sure what is going on -- the problem seems only to
occur when I use a 'do <file>' statement to include a header.

I was wondering if anyone else has seen something like this before.

Here is a small program which demonstrates the problem on my machine:
(a sun 3/60 running perl 3.0 pl8)

All it does is use the built-in grep command to match at the start of
an array of lines, returning a new array of matching lines.
If I include 'getopts.pl', for example, the grep (wrongly) matches
every line.  Otherwise it works as I would have expected.

-----------cut here-----------
#!/usr/bin/perl

# comment the next line and the program behaves fine!
do 'getopts.pl';

$" = "\n";

@dump_lines = ( 
    "f:/, 0", "f:/usr, 1",
    "h:/, 3", "h:/usr, 4",
    "f:/, 5", "f:/usr, 6",
    "h:/, 7", "h:/usr, 8"
    );

for (;;) {
    print "\n(note: if you type 'f' you would expect to get only 4 matching lines)\n";
    print "Which dump ? (name eg. 'f' or '?' or 'q' to quit) ";
    chop($_ = <STDIN>);
    if (/^[Qq]$/)	{exit 0;}
    elsif (/^\?$/)	{print("@dump_lines"); next;}
    /^\s*(\S+)/;
    @entry = grep(/^$1/, @dump_lines);
    print "found the following entries in the log:\n@entry\n";
}

------------------
Paul E. Maisano
Australian Artificial Intelligence Institute
1 Grattan St. Carlton, Vic. 3053, Australia
Ph: +613 663-7922  Fax: +613 663-7937
Email: pem@aaii.oz.au

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (01/19/90)

In article <845@frankland-river.aaii.oz.au> pem@frankland-river.aaii.oz.au (pem) writes:
: I have noticed what seems to be a strange bug in grep.
: I am not quite sure what is going on -- the problem seems only to
: occur when I use a 'do <file>' statement to include a header.
: 
: I was wondering if anyone else has seen something like this before.
: 
: Here is a small program which demonstrates the problem on my machine:
: (a sun 3/60 running perl 3.0 pl8)
: 
: All it does is use the built-in grep command to match at the start of
: an array of lines, returning a new array of matching lines.
: If I include 'getopts.pl', for example, the grep (wrongly) matches
: every line.  Otherwise it works as I would have expected.
: 
: -----------cut here-----------
: #!/usr/bin/perl
: 
: # comment the next line and the program behaves fine!
: do 'getopts.pl';
: 
: $" = "\n";
: 
: @dump_lines = ( 
:     "f:/, 0", "f:/usr, 1",
:     "h:/, 3", "h:/usr, 4",
:     "f:/, 5", "f:/usr, 6",
:     "h:/, 7", "h:/usr, 8"
:     );
: 
: for (;;) {
:     print "\n(note: if you type 'f' you would expect to get only 4 matching lines)\n";
:     print "Which dump ? (name eg. 'f' or '?' or 'q' to quit) ";
:     chop($_ = <STDIN>);
:     if (/^[Qq]$/)	{exit 0;}
:     elsif (/^\?$/)	{print("@dump_lines"); next;}
:     /^\s*(\S+)/;
:     @entry = grep(/^$1/, @dump_lines);
:     print "found the following entries in the log:\n@entry\n";
: }

This is a subtle little semantic difficulty caused by an optimization.

The immediate cause of your problem is the use of $1 inside a pattern
that may invalidate the meaning of $1.  It's always a little dangerous
to do that sort of thing, especially on a pattern that's evaluated
more than once.  What if $1 contained ()?
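To make the "what if it contained ()" worry concrete, here is a minimal sketch of one way to guard against metacharacters in interpolated text. It assumes a modern Perl 5 (quotemeta, my and strict do not exist in perl 3.0), and the sample input and the @lines list are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: one entry contains a literal parenthesis.
my @lines = ("f:/, 0", "f(x), 1", "f:/usr, 2");
my $input = "f(";               # hypothetical user input with a metacharacter

# quotemeta backslashes every non-word character, so "(" is matched
# literally instead of opening a group in the interpolated pattern.
my $safe = quotemeta($input);
my @hits = grep(/^$safe/, @lines);

print "@hits\n";
```

Interpolating the raw "f(" instead would not even compile as a pattern, since the parenthesis is unmatched.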

In this case, there's no () in $1, so ordinarily you'd get away with it.
But the decision in pattern matching whether to remember a new $1, $2, etc.
is tied (currently, anyway) to whether it will remember $&, $` and $'.
(The offsets for returning these are actually kept in retrieval info for
$0, of all places.  Which is why $0 gets clobbered by pattern matches.
Someday I'll fix that.)  Anyway, if perl sees a $&, $` or $' anywhere
in your program, it assumes that it has to recreate $0, $1, etc.

But wait, you say, those variables don't occur, even in getopts.pl.  True.
But if the program contains an eval, perl has to assume that a lot
of variables might be there that it hasn't seen yet.  Now "do FILENAME" is
a kind of eval.  So when you included that line, perl had to initialize
space for $& et al.  And because it did that, the /^$1/ figured it had
to set up for $& et al. to return the correct info.  So it clobbered $1.
Cute, eh?

In the ordinary run of things, you'd get away with that, even so, because
the old $1 would be interpolated before the pattern (a run-time pattern)
was compiled.  But grep evaluates its first argument repeatedly, and since
it's a run-time pattern, it has to recompile the pattern, so on the
second array element, $1 is no longer valid.

The obvious quick fix is to make perl reset the scope to the outer pattern
match before each iteration of grep.  In fact, that'll be in patch 9.

The obvious quick workaround is to put $1 into a temp variable and interpolate
that:

	($which) = /^\s*(\S+)/;
	@entry = grep(/^$which/, @dump_lines);

That's probably more readable anyway.
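For reference, the workaround can be sketched as a small self-contained script. This is only an illustration written for a modern Perl 5 (it uses my and strict, which perl 3.0 does not have); find_entries is a hypothetical helper name, and the input is hard-coded rather than read from STDIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dump_lines = (
    "f:/, 0", "f:/usr, 1",
    "h:/, 3", "h:/usr, 4",
    "f:/, 5", "f:/usr, 6",
    "h:/, 7", "h:/usr, 8",
);

# Hypothetical helper: copy the captured text into a named variable
# first, so the grep pattern never depends on $1.
sub find_entries {
    my ($input, @lines) = @_;
    my ($which) = $input =~ /^\s*(\S+)/;   # list assignment grabs the capture
    return () unless defined $which;       # nothing usable was typed
    return grep(/^$which/, @lines);
}

my @entry = find_entries("f", @dump_lines);
print join("\n", @entry), "\n";
```

With the input "f" this selects only the four lines that start with "f".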

If you know the grep is only going to happen once, it would behoove you
to add an 'o' modifier to avoid unnecessary recompilations of the pattern.
But since you have it in a loop, the possibility remains that you might
want to change it.  If you were going to be grepping many things, it might
be more efficient to use the 'o' modifier inside an eval:

	eval '@entry = grep(/^$which/o, @dump_lines)';

This compiles the pattern only once within each grep, but recompiles it
each time the eval (and hence the grep) is run.  Of course, for a small
list, who cares.
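A minimal sketch of the eval-plus-'o' idea inside a loop, again assuming a modern Perl 5 rather than perl 3.0. The list of prefixes and the %count hash are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dump_lines = (
    "f:/, 0", "f:/usr, 1",
    "h:/, 3", "h:/usr, 4",
    "f:/, 5", "f:/usr, 6",
    "h:/, 7", "h:/usr, 8",
);

my %count;                        # hypothetical tally of matches per prefix
for my $which ("f", "h") {
    my @entry;
    # The single-quoted string is compiled afresh on every pass, so the
    # /o pattern is built once per eval and picks up the current $which.
    eval '@entry = grep(/^$which/o, @dump_lines); 1' or die $@;
    $count{$which} = scalar @entry;
    print "$which: @entry\n";
}
```

Without the eval, /o would freeze the pattern at the first value of $which for the life of the program, which is exactly the "you might want to change it" problem.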

Larry