[comp.lang.perl] Is this a Bug?

worley@compass.com (Dale Worley) (07/14/90)

The program

	/\d{5,5}/;

gets the error message

	Can't do {n,0} at /compass/c/worley/time/aa line 1.

As far as I can tell from the manual, this is legal.

Dale Worley		Compass, Inc.			worley@compass.com
--
The living dead don't NEED to solve word problems.
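
The quantifier in the post above can be tried with a short sketch. It assumes a current Perl 5 interpreter rather than the 1990 release the thread is about, and the helper names are hypothetical, chosen only for illustration. It compares the {5,5} form with the shorter {5} form on a few sample strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 if $text contains five digits in a row,
# using the {5,5} form quoted in the post.
sub has_five_digits {
    my ($text) = @_;
    return $text =~ /\d{5,5}/ ? 1 : 0;
}

# Hypothetical helper: the same test written with the shorter {5} form.
sub has_five_digits_short {
    my ($text) = @_;
    return $text =~ /\d{5}/ ? 1 : 0;
}

# Print both results side by side for a few sample strings.
for my $s ("zip 02139", "only 1234", "123456") {
    printf "%-12s %d %d\n", $s, has_five_digits($s), has_five_digits_short($s);
}
```

On an interpreter that rejects {5,5}, writing {5} or spelling out \d\d\d\d\d is a possible workaround, though the thread does not say which one was used.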

worley@compass.com (Dale Worley) (08/09/90)

I tried to write a program with the following regexp:

	/(^\s*$)|(^---)/

That is, match any line containing only whitespace, or beginning with
'---'.  (Are ^ and $ allowed other than at the beginning or end of the
regexp?)  Perl gives the strange error message:

	/(^\s*|(^---)/: unmatched () in regexp at ss line 3.

Where did the missing ')' go?

Actually, it was probably assumed to be part of a '$)' variable.  (Can
one use '$/' as a variable in a regexp?)

What is going on here?  What *should* be going on here?

Dale Worley		Compass, Inc.			worley@compass.com
--
LA, truth to tell, is not much different from a pretty girl with the clap.

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (08/11/90)

In article <1990Aug9.155120.2703@uvaarpa.Virginia.EDU> worley@compass.com writes:
: I tried to write a program with the following regexp:
: 
: 	/(^\s*$)|(^---)/
: 
: That is, match any line containing only whitespace, or beginning with
: '---'.  (Are ^ and $ allowed other than at the beginning or end of the
: regexp?)  Perl gives the strange error message:
: 
: 	/(^\s*|(^---)/: unmatched () in regexp at ss line 3.
: 
: Where did the missing ')' go?
: 
: Actually, it was probably assumed to be part of a '$)' variable.  (Can
: one use '$/' as a variable in a regexp?)
: 
: What is going on here?  What *should* be going on here?

It was misinterpreting $) in patterns as a variable.  At patchlevel 27
it's interpreted correctly as an end of line check and a terminating paren.
Which means you can't interpolate $) into a pattern directly.

$/ has never been a problem.

By the way, it's more efficient to factor out the ^ to the front:

	/^(\s*$|---)/

The reason for this is that it then knows it doesn't have to start looking
at every single position of the input string.  I suppose I should make it
do this optimization itself...

It's probably also faster to put the literal string before the *:

	/^(---|\s*$)/

This will be less of a problem after patchlevel 27, but it still helps
some, unless almost all your strings are blank.

Larry
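
The three spellings of the pattern discussed above should accept the same lines. A minimal sketch of that check follows, assuming a current Perl 5 interpreter (where $) inside a pattern is read as an end-of-line test plus a closing paren, as described for patchlevel 27). The hash keys and the helper name are hypothetical. The sketch checks only equivalence, not the speed difference mentioned in the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three patterns from the thread, precompiled.
my %pattern = (
    original      => qr/(^\s*$)|(^---)/,
    factored      => qr/^(\s*$|---)/,
    literal_first => qr/^(---|\s*$)/,
);

# Hypothetical helper: 1 if the named pattern matches $line, else 0.
sub matches {
    my ($name, $line) = @_;
    return $line =~ $pattern{$name} ? 1 : 0;
}

# Blank, whitespace-only, '---' at the start, plain text, indented '---'.
my @samples = ("", "   \t", "--- cut here", "text", " ---");

for my $line (@samples) {
    my @results = map { matches($_, $line) } sort keys %pattern;
    printf "%-16s %s\n", "'$line'", join(" ", @results);
}
```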

worley@compass.com (Dale Worley) (08/13/90)

   X-Name: Larry Wall

   It was misinterpreting $) in patterns as a variable.  At patchlevel 27
   it's interpreted correctly as an end of line check and a terminating paren.
   Which means you can't interpolate $) into a pattern directly.

   $/ has never been a problem.

Well, what exactly are the rules for which variables can be used in
regexps and which can't?  That is, why is interpreting "$)" as e-o-l
and paren correct and interpreting it as a variable incorrect?

I guess I don't really need an answer here, but I hope that the book
will be enough of a language reference that all such questions will be
answered by it.

Dale Worley		Compass, Inc.			worley@compass.com
--
"I have the same insecurities as Woody Allen."
"Yes, but he's paid more for having them."

white@cg-atla.UUCP (Frank ) (10/05/90)

	I am using 'perl' PL18. Running the following script
causes "WORD WITH PARENS" to be printed. What's the difference?
----------------- cut here -------------------------------------------
#!/usr/local/bin/perl

$_ = "This line contains a\nword beginning a line";
if ( /^word/ ) {
	print "WORD NO PARENS\n";
} elsif ( /^(word)/ ) {
	print "WORD WITH PARENS\n";	# I get this message !!
}
----------------- cut here -------------------------------------------
				Chip White (Uunet!samsung!cg-atla!white)
				Principal Software Engineer
				AGFA Compugraphic Division
				200 Ballardvale Street
				Wilmington, Massachusetts 01887
				MS-200-3-7K
				Phone:     (508) 658-5600 (x5440)
				CompuDial: (508) 658-0200 (x5440)
-- 
Chip White 			      
AGFA Compugraphic	    ...!{decvax,samsung}!cg-atla!white
200 Ballardvale St.	               
Wilmington, Mass. 01887     		

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (10/06/90)

In article <9134@cg-atla.UUCP> white@cg-atla.UUCP (Frank ) writes:
: 
: 	I am using 'perl' PL18. Running the following script
: causes "WORD WITH PARENS" to be printed. What's the difference?
: ----------------- cut here -------------------------------------------
: #!/usr/local/bin/perl
: 
: $_ = "This line contains a\nword beginning a line";
: if ( /^word/ ) {
: 	print "WORD NO PARENS\n";
: } elsif ( /^(word)/ ) {
: 	print "WORD WITH PARENS\n";	# I get this message !!
: }

This is documented behavior at patchlevel 18--the man page says you can't
expect ^ to behave consistently in mid-string if $* isn't set.

As of the next patch, ^ should never match in mid string unless $* is set.

Larry

merlyn@iwarp.intel.com (Randal Schwartz) (10/06/90)

In article <9134@cg-atla.UUCP>, white@cg-atla (Frank ) writes:
| 
| 	I am using 'perl' PL18. Running the following script
| causes "WORD WITH PARENS" to be printed. What's the difference?
| ----------------- cut here -------------------------------------------
| #!/usr/local/bin/perl
| 
| $_ = "This line contains a\nword beginning a line";
| if ( /^word/ ) {
| 	print "WORD NO PARENS\n";
| } elsif ( /^(word)/ ) {
| 	print "WORD WITH PARENS\n";	# I get this message !!
| }

Quoting from perl(1):

     By default, the ^ character is only guaranteed to  match  at
     the beginning of the string, the $ character only at the end
     (or before the newline at the end)  and  perl  does  certain
     optimizations  with  the assumption that the string contains
     only one line.  The behavior of ^ and $ on embedded newlines
                     ============================================
     will  be  inconsistent.   You  may, however, wish to treat a
     =======================
     string as a multi-line buffer, such that the  ^  will  match
     after any newline within the string, and $ will match before
     any newline.  At the cost of a little more overhead, you can
     do this by setting the variable $* to 1.  Setting it back to
     0 makes perl revert to its old behavior.

The "Fine" Manual says all.

++$*; $_ = "\nJust another Perl hacker,"; /^J.*/; print $&
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/
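
The single-line versus multi-line distinction in the quoted manual passage can be sketched as follows. This assumes a current Perl 5 interpreter, where the per-pattern /m modifier plays the role that the thread assigns to $*, which is no longer available there. The helper name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 if /^(word)/ matches $text, else 0.
# When $multiline is true, the /m modifier is used so that ^ may also
# match just after an embedded newline.
sub caret_matches {
    my ($text, $multiline) = @_;
    if ($multiline) {
        return $text =~ /^(word)/m ? 1 : 0;
    }
    return $text =~ /^(word)/ ? 1 : 0;
}

my $text = "This line contains a\nword beginning a line";

print "without /m: ", caret_matches($text, 0), "\n";
print "with /m:    ", caret_matches($text, 1), "\n";
```

Using a modifier on the pattern keeps the choice local to one match instead of changing a global setting.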

phillips@cs.ubc.ca (George Phillips) (10/06/90)

In article <9134@cg-atla.UUCP> white@cg-atla.UUCP (Frank ) writes:
>
>	I am using 'perl' PL18. Running the following script
>causes "WORD WITH PARENS" to be printed. What's the difference?
>----------------- cut here -------------------------------------------
>#!/usr/local/bin/perl
>
>$_ = "This line contains a\nword beginning a line";
>if ( /^word/ ) {
>	print "WORD NO PARENS\n";
>} elsif ( /^(word)/ ) {
>	print "WORD WITH PARENS\n";	# I get this message !!
>}

Yep, this is a bug.  There's even a passage in the manual page which
says, more or less, that ^ does not necessarily work as advertised.
If you're using ^ in a regular expression and you're not sure if
the string has a newline in it, you'd better do something like:

if (/^regexp/ && $` eq "") { # yep, it really did do an anchored match

So here is a fixed version of your script:

$_ = "This line contains a\nword beginning a line";
if ( /^word/ ) {
   print "WORD NO PARENS\n";
} elsif ( /^(word)/ && $` eq "" ) {
   print "WORD WITH PARENS\n"; # no longer printed for this input
}

--
George Phillips phillips@cs.ubc.ca {alberta,uw-beaver,uunet}!ubc-cs!phillips
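
The guard suggested above, checking that $` (the text before the match) is empty, can be sketched as a small helper. This assumes a current Perl 5 interpreter. Because ^ no longer matches in mid-string by default there, the sketch uses /m to stand in for the mid-string matches described in the thread. The helper name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 only if "word" matched at the very start of $text.
# /m lets ^ match after an embedded newline, imitating the behavior
# reported in the thread; the $` test then rejects any such match.
sub starts_with_word {
    my ($text) = @_;
    return ($text =~ /^(word)/m && $` eq "") ? 1 : 0;
}

for my $text ("This line contains a\nword beginning a line",
              "word first\nword second") {
    print starts_with_word($text) ? "anchored\n" : "not anchored\n";
}
```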