[comp.lang.perl] how long is a {0,m} match?

weisberg@hpccc.HP.COM (Len Weisberg) (08/02/90)

Here's another bit of confusing regexp behavior:

------------------------------------------------------------------------------
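## perl test program:

# Match $str against $pat and print the text that matched ($&).
# The "<<" marks where the match ends, so an empty match is visible.
sub try {
  local($pat, $str) = @_;
  $str =~ /$pat/ ;
  print "\$pat=/$pat/, \t\$str=$str, \t\$&=$&<<\n";
}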


&try ( "a+", "aaay");
&try ( "a+", "xaaay");
print "\n";

&try ( "a{1,}", "aaay");
&try ( "a{1,}", "xaaay");
print "-----------------\n";

&try ( "a{1,2}", "aaay");
&try ( "a{1,2}", "xaaay");
print "\n";

&try ( "a{1,4}", "aaay");
&try ( "a{1,4}", "xaaay");
print "-----------------\n";

&try ( "a*", "aaay");
&try ( "a*", "xaaay");
print "\n";

&try ( "a{0,}", "aaay");
&try ( "a{0,}", "xaaay");
print "-----------------\n";

&try ( "a?", "aaay");
&try ( "a?", "xaaay");
print "\n";

&try ( "a{0,1}", "aaay");
&try ( "a{0,1}", "xaaay");
print "-----------------\n";

&try ( "a{0,4}", "aaay");
&try ( "a{0,4}", "xaaay");
print "\n";




-----------------------------------------------------------------------------
results:

$pat=/a+/, 	$str=aaay, 	$&=aaa<<
$pat=/a+/, 	$str=xaaay, 	$&=aaa<<

$pat=/a{1,}/, 	$str=aaay, 	$&=aaa<<
$pat=/a{1,}/, 	$str=xaaay, 	$&=aaa<<
-----------------
$pat=/a{1,2}/, 	$str=aaay, 	$&=aa<<
$pat=/a{1,2}/, 	$str=xaaay, 	$&=aa<<

$pat=/a{1,4}/, 	$str=aaay, 	$&=aaa<<
$pat=/a{1,4}/, 	$str=xaaay, 	$&=aaa<<
-----------------
$pat=/a*/, 	$str=aaay, 	$&=aaa<<
$pat=/a*/, 	$str=xaaay, 	$&=<<

$pat=/a{0,}/, 	$str=aaay, 	$&=aaa<<
$pat=/a{0,}/, 	$str=xaaay, 	$&=<<
-----------------
$pat=/a?/, 	$str=aaay, 	$&=a<<
$pat=/a?/, 	$str=xaaay, 	$&=<<

$pat=/a{0,1}/, 	$str=aaay, 	$&=a<<
$pat=/a{0,1}/, 	$str=xaaay, 	$&=<<
-----------------
$pat=/a{0,4}/, 	$str=aaay, 	$&=aaa<<
$pat=/a{0,4}/, 	$str=xaaay, 	$&=<<

------------------------------------------------------------------------------

If there is some general principle about longest match, 
it seems to break when 0 repetitions match, but only when
the match is not at the start of the string.

Does this make sense?   Is it a bug?

( perl -v gives:
$Header: perly.c,v 3.0.1.5 90/03/27 16:20:57 lwall Locked $
Patch level: 18
     etc...)

Any enlightenment appreciated.
Thanks,

- Len Weisberg - HP Corp Computing & Services - weisberg@corp.HP.COM

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (08/04/90)

In article <12170008@hpccc.HP.COM> weisberg@hpccc.HP.COM (Len Weisberg) writes:
: If there is some general principle about longest match, 
: it seems to break when 0 repetitions match, but only when
: the match is not at the start of the string.
: 
: Does this make sense?   Is it a bug?

Yes, it makes sense, and no, it isn't a bug.

The longest match rule is subordinate to two other rules:

	1. Find the leftmost match.

	2. Given 1, find the first alternative that matches.

	3. Given 1 and 2, find the longest match.

(Rules 2 and 3 apply recursively inside parentheses.)
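
A minimal sketch of rules 2 and 3 in action, assuming a modern Perl 5
(`my`, `use strict`) rather than the Perl 3 used elsewhere in this thread.
The helper name `matched_text` is made up for this illustration.

```perl
use strict;
use warnings;

# Return the text matched by $pat in $str, or undef if there is no match.
sub matched_text {
    my ($pat, $str) = @_;
    return $str =~ /$pat/ ? $& : undef;
}

# Rule 2: at the leftmost position, the first alternative that matches wins,
# even when a later alternative would match more text.
print matched_text('a|ab', 'xab'), "\n";       # a
print matched_text('ab|a', 'xab'), "\n";       # ab

# Rule 3: within the chosen alternative, the quantifier takes all it can.
print matched_text('a{1,4}', 'xaaay'), "\n";   # aaa
```

Swapping the order of the alternatives changes the result, which shows that
"longest" only decides among matches that rules 1 and 2 have already allowed.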

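Any item that can match 0 times will do so at the beginning of a string
if the first thing in the string isn't a match: the empty match at
position 0 is the leftmost one, so by rule 1 it beats the longer match
that starts further right.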
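
To see where each match starts, here is a small sketch, again assuming
modern Perl 5 (the `@-` array of match offsets does not exist in Perl 3).
The helper name `first_match` is hypothetical.

```perl
use strict;
use warnings;

# Return [offset, text] for the first match of $pat in $str,
# or nothing if the pattern does not match at all.
sub first_match {
    my ($pat, $str) = @_;
    return unless $str =~ /$pat/;
    return [ $-[0], $& ];    # $-[0] is the offset where the match starts
}

for my $pat ('a{0,4}', 'a{1,4}') {
    my $m = first_match($pat, 'xaaay');
    printf "/%s/ matched '%s' at offset %d\n", $pat, $m->[1], $m->[0];
}
# /a{0,4}/ matched '' at offset 0
# /a{1,4}/ matched 'aaa' at offset 1
```

With a minimum of 0, the pattern succeeds at offset 0 without consuming
anything, so the engine never gets as far as the run of a's. Raising the
minimum to 1 makes offset 0 fail, and the leftmost match moves to offset 1.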

Larry

weisberg@hpccc.HP.COM (Len Weisberg) (08/04/90)

A few days ago I asked a question in a rather long form.
Let me try again a bit more concisely:


Here's another bit of confusing regexp behavior:

------------------------------------------------------------------------------
pattern:        applied to:     matches:

/a{1,4}/        'aaay'          'aaa'
/a{1,4}/        'xaaay'         'aaa'
/a{0,4}/        'aaay'          'aaa'
/a{0,4}/        'xaaay'         ''
( ... more examples that fit the same pattern)
------------------------------------------------------------------------------

If there is some general principle about longest match, 
it seems to break when 0 repetitions match, but only when
the match is not at the start of the string.

Does this make sense?   Is it a bug?

Any enlightenment appreciated.
Thanks,

- Len Weisberg - HP Corp Computing & Services - weisberg@corp.HP.COM