[comp.lang.perl] regexp and slice bugs

phillips@cs.ubc.ca (George Phillips) (06/29/90)

Under perl 3.0, patchlevel 18, the following script gives a wrong answer:

$block = "a\nd";
print $block =~ /^d/;		print "\n";
print $block =~ /^\144/;	print "\n";

The second regexp matches.  It seems like the bug is in scanconst() since
it doesn't understand octal escapes and therefore doesn't give the right
answer to scanpat().  On the other hand, scanconst() may not be to blame
but its minor failure lets a bug crawl out of somewhere else.

The other bug can come up in many ways, but you can reproduce it 
(and a core dump) with:

@out[0] + 1;

I haven't found a fix here either, but it seems that the array slice 
routine ends up returning a NULL pointer for the single element slice
('cause that's what afetch() returned).  Add blindly converts the top 
two stack elements to numbers and crashes when it hits the NULL.  Maybe
do_slice() shouldn't put Nullstr on the stack or maybe add (and others)
should be more careful.  Who knows?  Larry!  Guide us!

A challenge:  produce some useful code that gets killed because of
the array slice "bug" (bonus points if its output is you-know-what).
The best I could do was:

perl -D14 -e 'foreach $f (@out[1..3]) { print $f + 1; }'

But that runs if you turn off debugging.


George Phillips phillips@cs.ubc.ca {alberta,uw-beaver,uunet}!ubc-cs!phillips

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (06/30/90)

In article <8483@ubc-cs.UUCP> phillips@cs.ubc.ca (George Phillips) writes:
: Under perl 3.0, patchlevel 18, the following script gives a wrong answer:
: 
: $block = "a\nd";
: print $block =~ /^d/;		print "\n";
: print $block =~ /^\144/;	print "\n";
: 
: The second regexp matches.  It seems like the bug is in scanconst() since
: it doesn't understand octal escapes and therefore doesn't give the right
: answer to scanpat().  On the other hand, scanconst() may not be to blame
: but its minor failure lets a bug crawl out of somewhere else.

From the manual:

     By default, the ^ character is only guaranteed to  match  at
     the beginning of the string, the $ character only at the end
     (or before the newline at the end)  and  perl  does  certain
     optimizations  with  the assumption that the string contains
     only one line.  The behavior of ^ and $ on embedded newlines
     will  be  inconsistent.   You  may, however, wish to treat a
     string as a multi-line buffer, such that the  ^  will  match
     after any newline within the string, and $ will match before
     any newline.  At the cost of a little more overhead, you can
     do this by setting the variable $* to 1.  Setting it back to
     0 makes perl revert to its old behavior.
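
For anyone trying this on a later perl: a minimal sketch of the same
test, assuming a Perl 5 interpreter rather than the 3.0 discussed in
this thread. The per-pattern /m modifier is used here in place of
setting $* to 1 (that substitution is an assumption about your perl,
not something from the manual text quoted above). The variable names
other than $block are made up for the example.

```perl
use strict;
use warnings;

# Same string as in the report: "a", a newline, then "d".
my $block = "a\nd";

# Without /m, ^ anchors only at the start of the string.
my $default   = ($block =~ /^d/)    ? 1 : 0;

# With /m, ^ also matches just after any embedded newline.
my $multiline = ($block =~ /^d/m)   ? 1 : 0;

# \144 is the octal escape for "d", so it should behave like /^d/.
my $octal     = ($block =~ /^\144/) ? 1 : 0;

# prints: default=0 multiline=1 octal=0
print "default=$default multiline=$multiline octal=$octal\n";
```

The point of the comparison is that the literal and the octal escape
give the same answer, and that only the multi-line form finds the "d"
after the newline.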

: The other bug can come up in many ways, but you can reproduce it 
: (and a core dump) with:
: 
: @out[0] + 1;
: 
: I haven't found a fix here either, but it seems that the array slice 
: routine ends up returning a NULL pointer for the single element slice
: ('cause that's what afetch() returned).  Add blindly converts the top 
: two stack elements to numbers and crashes when it hits the NULL.  Maybe
: do_slice() shouldn't put Nullstr on the stack or maybe add (and others)
: should be more careful.  Who knows?  Larry!  Guide us!

I suppose I really should decide one way or the other.  It's one of
those stupid things I could argue either way.

Larry

phillips@cs.ubc.ca (George Phillips) (07/05/90)

In article <8547@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
:In article <8483@ubc-cs.UUCP> phillips@cs.ubc.ca (George Phillips) writes:
:: Under perl 3.0, patchlevel 18, the following script gives a wrong answer:
:: 
:: $block = "a\nd";
:: print $block =~ /^d/;		print "\n";
:: print $block =~ /^\144/;	print "\n";
:: 
:From the manual:
:
:     By default, the ^ character is only guaranteed to  match  at
:     the beginning of the string, the $ character only at the end
:     (or before the newline at the end)  and  perl  does  certain
:     optimizations  with  the assumption that the string contains
:     only one line.  The behavior of ^ and $ on embedded newlines
:     will  be  inconsistent.

Arg!  My apologies for not reading the manual more closely.  So
the only way to ensure an anchored match in the face of arbitrary
input is to do something like (/^foo/ && $` eq ""), right?  It
seems like a cheap hack like this could be applied internally to
guarantee that ^ and $ always, always anchor a pattern match.
Would it be preferable to make this the default behavior (i.e., no
manual page caveats) or should it be selectable by $* = 2 or
something?  I think that by default ^ and $ should only ever match
the beginning and the end of the string, but $* = 2 is fine by me.
I'll see if I can figure out how to fix things up.
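
A rough sketch of the two approaches, assuming a Perl 5 interpreter
(not the 3.0 under discussion). The first sub uses the \A assertion,
which Perl 5 provides for start-of-string only; the second is the
prematch check from above, with /m added to stand in for the worst
case where ^ is allowed to match after a newline. Both sub names are
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if $str begins with "foo".
# \A matches at the very start of the string, with or without /m.
sub starts_with_foo {
    my ($str) = @_;
    return ($str =~ /\Afoo/) ? 1 : 0;
}

# The idiom from the post: match, then require an empty prematch ($`).
# /m lets ^ match after embedded newlines, so the $` test does the work.
sub starts_with_foo_prematch {
    my ($str) = @_;
    return ($str =~ /^foo/m && $` eq "") ? 1 : 0;
}

# Compare the two on a few inputs, including an embedded newline.
for my $input ("foobar", "a\nfoo", "xfoo") {
    (my $shown = $input) =~ s/\n/\\n/g;
    printf "%-8s A=%d prematch=%d\n", $shown,
        starts_with_foo($input), starts_with_foo_prematch($input);
}
```

The two subs should agree on every input; the prematch version just
pays for an extra string comparison after each successful match.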


George Phillips phillips@cs.ubc.ca {alberta,uw-beaver,uunet}!ubc-cs!phillips