[comp.lang.perl] matching word boundaries with regexps

piet@cs.ruu.nl (Piet van Oostrum) (01/15/90)

It seems that matching word boundaries with \b in regexps doesn't work
properly.
I get random results. The following script should illustrate the problem.
Feed it with lines of the form: vote yes (or: vote no)
Or am I doing something wrong???

---------------------------------------------------------------- 
#! /usr/bin/perl

$yes = 0;
$no = 0;
$vote = 0;

while (<STDIN>) {
	chop;
	print "=$_=\n";
	# Dots are escaped so that "." matches only a literal period.
	if (/vote/i || /comp\.text\.tex/i) {
		$vote++;
		$yes++ if /\byes\b/i;
		$no++ if /\bno\b/i;
	}
	print "vote = $vote, yes = $yes, no = $no\n";
}
---------------------------------------------------------------- 
Piet* van Oostrum, Dept of Computer Science, Utrecht University,
Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands.
Telephone: +31-30-531806   Uucp:   uunet!mcsun!hp4nl!ruuinf!piet
Telefax:   +31-30-513791   Internet:  piet@cs.ruu.nl   (*`Pete')

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (01/16/90)

In article <2295@ruuinf.cs.ruu.nl> piet@cs.ruu.nl (Piet van Oostrum) writes:
: It seems that matching word boundaries with \b in regexps doesn't work
: properly.
: I get random results. The following script should illustrate the problem.
: Feed it with lines of the form: vote yes (or: vote no)
: Or am I doing something wrong???

It's a bug.  I fixed it a week or two ago here, and it will be fixed in
patch 9.  There was a bad interaction between the code that handles
case insensitivity and the code that checks for \b-ness at the beginning
of a string.  I tried your test program and it works under the new version.

Soon.

Larry
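
The interaction described above (case-insensitive matching combined with
a \b at the very beginning of the string) can be exercised with a small
check like the one below. This is only a sketch, not the test Larry ran:
the helper name has_word and the sample strings are made up for
illustration, and it assumes a modern Perl 5 interpreter rather than the
Perl 3 release discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 if $word occurs
# as a whole word in $line, ignoring case, and 0 otherwise.
sub has_word {
    my ($line, $word) = @_;
    return $line =~ /\b\Q$word\E\b/i ? 1 : 0;
}

# Each case: input line, word to look for, expected result.
# The first few put the word at the very start of the string,
# which is where the reported bug showed up.
my @cases = (
    [ "yes",        "yes", 1 ],
    [ "Yes please", "yes", 1 ],
    [ "vote YES",   "yes", 1 ],
    [ "eyes",       "yes", 0 ],
    [ "yesterday",  "yes", 0 ],
    [ "NO, thanks", "no",  1 ],
    [ "know",       "no",  0 ],
);

for my $case (@cases) {
    my ($line, $word, $expected) = @$case;
    my $got = has_word($line, $word);
    printf "%-4s line=%-12s word=%-4s got=%d expected=%d\n",
        ($got == $expected ? "ok" : "FAIL"), "'$line'", $word, $got, $expected;
}
```

On an interpreter without the bug every line should report "ok"; a "FAIL"
on the start-of-string cases would point at the problem described above.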

jv@mh.nl (Johan Vromans) (01/16/90)

In article <2295@ruuinf.cs.ruu.nl> piet@cs.ruu.nl (Piet van Oostrum) writes:
> It seems that matching word boundaries with \b in regexps doesn't work
> properly.
[example deleted]

Knowing Piet* is running perl on HP-UX, I tried a little, and found
out that:
 - on VAX/Ultrix it behaves as expected
 - on HP-UX it fails.
 - it runs fine on HP-UX if the 'ignore case' spec is removed from the
   matches:
	$yes++ if /\byes\b/;
   So I think it has something to do with HP's NLS system (a wild
   guess, but - who knows?)

Johan
--
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------
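
Since the matches succeed once the 'ignore case' flag is removed, one
possible stopgap on an affected build is to fold the line to lower case
first and then match case-sensitively. The sketch below is an assumption
about how that could look, not something proposed in the thread: the
helper name count_line is hypothetical, and lc requires Perl 5 (on the
Perl 3 of this thread, tr/A-Z/a-z/ on a copy of the line would play the
same role).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list
# (vote, yes, no) of 0/1 flags for a single input line.
sub count_line {
    my ($line) = @_;
    my $lower = lc $line;    # fold case once, so no /i is needed below
    my ($vote, $yes, $no) = (0, 0, 0);
    if ($lower =~ /vote/ || $lower =~ /comp\.text\.tex/) {
        $vote = 1;
        $yes  = 1 if $lower =~ /\byes\b/;
        $no   = 1 if $lower =~ /\bno\b/;
    }
    return ($vote, $yes, $no);
}

# Running totals, mirroring the counters in the original script.
my ($votes, $yeses, $noes) = (0, 0, 0);
for my $line ("Vote YES", "vote no", "I know nothing") {
    my ($v, $y, $n) = count_line($line);
    $votes += $v;
    $yeses += $y;
    $noes  += $n;
}
print "vote = $votes, yes = $yeses, no = $noes\n";
```

This only sidesteps the case-insensitive code path; it does not fix the
underlying matcher.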

jand@kuling.UUCP (Jan Djärv) (01/18/90)

In article <JV.90Jan15220501@mhres.mh.nl> jv@mh.nl (Johan Vromans) writes:
:In article <2295@ruuinf.cs.ruu.nl> piet@cs.ruu.nl (Piet van Oostrum) writes:
:> It seems that matching word boundaries with \b in regexps doesn't work
:> properly.
:[example deleted]
:
:Knowing Piet* is running perl on HP-UX, I tried a little, and found
:out that:
: - on VAX/Ultrix it behaves as expected
: - on HP-UX it fails.
: - it runs fine on HP-UX if the 'ignore case' spec is removed from the
:   matches:
:	$yes++ if /\byes\b/;
:   So I think it has something to do with HP's NLS system (a wild
:   guess, but - who knows?)
:
:Johan

Well, I tried the example on our HP 835 running HP-UX 7.0 and it worked
just fine. (N.B. This is not to be taken as a defence of HP-UX. 7.0 was
installed last Friday, and the bugs keep coming in :-( )

There is probably something more subtle behind the error.
I agree that Johan's guess is a wild one; did you forget to put
a :-) somewhere, Johan?

	Jan D.