weisberg@hpccc.HP.COM (Len Weisberg) (08/02/90)
Here's another bit of confusing regexp behavior:
------------------------------------------------------------------------------
## perl test program:
sub try {
    # print the pattern, the subject string, and $& (the text last matched)
    local($pat, $str) = @_;
    $str =~ /$pat/ ;
    print "\$pat=/$pat/, \t\$str=$str, \t\$&=$&<<\n";
}
&try ( "a+", "aaay");
&try ( "a+", "xaaay");
print "\n";
&try ( "a{1,}", "aaay");
&try ( "a{1,}", "xaaay");
print "-----------------\n";
&try ( "a{1,2}", "aaay");
&try ( "a{1,2}", "xaaay");
print "\n";
&try ( "a{1,4}", "aaay");
&try ( "a{1,4}", "xaaay");
print "-----------------\n";
&try ( "a*", "aaay");
&try ( "a*", "xaaay");
print "\n";
&try ( "a{0,}", "aaay");
&try ( "a{0,}", "xaaay");
print "-----------------\n";
&try ( "a?", "aaay");
&try ( "a?", "xaaay");
print "\n";
&try ( "a{0,1}", "aaay");
&try ( "a{0,1}", "xaaay");
print "-----------------\n";
&try ( "a{0,4}", "aaay");
&try ( "a{0,4}", "xaaay");
print "\n";
-----------------------------------------------------------------------------
results:
$pat=/a+/, $str=aaay, $&=aaa<<
$pat=/a+/, $str=xaaay, $&=aaa<<
$pat=/a{1,}/, $str=aaay, $&=aaa<<
$pat=/a{1,}/, $str=xaaay, $&=aaa<<
-----------------
$pat=/a{1,2}/, $str=aaay, $&=aa<<
$pat=/a{1,2}/, $str=xaaay, $&=aa<<
$pat=/a{1,4}/, $str=aaay, $&=aaa<<
$pat=/a{1,4}/, $str=xaaay, $&=aaa<<
-----------------
$pat=/a*/, $str=aaay, $&=aaa<<
$pat=/a*/, $str=xaaay, $&=<<
$pat=/a{0,}/, $str=aaay, $&=aaa<<
$pat=/a{0,}/, $str=xaaay, $&=<<
-----------------
$pat=/a?/, $str=aaay, $&=a<<
$pat=/a?/, $str=xaaay, $&=<<
$pat=/a{0,1}/, $str=aaay, $&=a<<
$pat=/a{0,1}/, $str=xaaay, $&=<<
-----------------
$pat=/a{0,4}/, $str=aaay, $&=aaa<<
$pat=/a{0,4}/, $str=xaaay, $&=<<
------------------------------------------------------------------------------
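It helps to print where each match starts as well as what it contains. Below is a minimal sketch that assumes a Perl 5.6 or later interpreter, which provides the `@-` array of match start offsets. The helper `match_at` is hypothetical and not part of the original test. For `xaaay`, `/a*/`, `/a?/` and `/a{0,4}/` report an empty match at offset 0. `/a+/` and `/a{1,4}/` cannot match there, so they report a match starting at offset 1.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (matched text, start offset), or an empty
# list if $pat does not match $str. $-[0] is the start of the last match.
sub match_at {
    my ($pat, $str) = @_;
    return unless $str =~ /$pat/;
    return ($&, $-[0]);
}

for my $pat ('a+', 'a{1,4}', 'a*', 'a?', 'a{0,4}') {
    for my $str ('aaay', 'xaaay') {
        my ($m, $at) = match_at($pat, $str);
        printf "%-9s %-6s matched '%s' at offset %d\n", "/$pat/", $str, $m, $at;
    }
}
```

The offset column shows that the "short" matches are not shorter matches of the same run of a's. They are matches found earlier in the string.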
If there is some general principle about longest match,
it seems to break when 0 repetitions match, but only when
the match is not at the start of the string.
Does this make sense? Is it a bug?
( perl -v gives:
$Header: perly.c,v 3.0.1.5 90/03/27 16:20:57 lwall Locked $
Patch level: 18
etc...)
Any enlightenment appreciated.
Thanks,
- Len Weisberg - HP Corp Computing & Services - weisberg@corp.HP.COM
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (08/04/90)
In article <12170008@hpccc.HP.COM> weisberg@hpccc.HP.COM (Len Weisberg) writes:
: If there is some general principle about longest match,
: it seems to break when 0 repetitions match, but only when
: the match is not at the start of the string.
:
: Does this make sense? Is it a bug?
Yes, it makes sense, and no, it isn't a bug.
The longest match rule is subordinate to two other rules:
1. Find the leftmost match.
2. Given 1, find the first alternative that matches.
3. Given 1 and 2, find the longest match.
(Rules 2 and 3 apply recursively inside parentheses.)
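Each of the three rules can be seen in isolation with a short sketch. It assumes Perl 5.6 or later for `@-` and `qr//`. The helper `first_match` and the sample strings are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: what does $re match in $str, and at what offset?
sub first_match {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($&, $-[0]);
}

# Rule 1: leftmost wins. b+ could match "bbb", but "a" starts earlier.
my @r1 = first_match('abbb', qr/b+|a/);      # ('a', 0)

# Rule 2: at the same starting point, the first alternative that matches
# wins, even though the later alternative "abc" would match more text.
my @r2 = first_match('abc', qr/a|abc/);      # ('a', 0)

# Rule 3: only then does a greedy quantifier take as much as it can.
my @r3 = first_match('xaaay', qr/a{1,4}/);   # ('aaa', 1)

printf "%-10s matched '%s' at offset %d\n", @$_
    for ['rule 1', @r1], ['rule 2', @r2], ['rule 3', @r3];
```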
Any item that can match 0 or more times will match zero times at the beginning of
a string if the first thing in the string isn't a match, because an empty match
at the start of the string is already the leftmost match.
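The following sketch shows the zero-length match at the start of `xaaay` and two ways to reach the run of a's instead. It assumes Perl 5.6 or later. Requiring at least one `a` with `/a+/` makes offset 0 fail. Alternatively, collecting every `/a*/` match with `//g` and keeping the longest gives the same run.

```perl
use strict;
use warnings;

my $str = 'xaaay';

# /a*/ is satisfied by the empty string at offset 0, the leftmost place
# where any match exists, so the search stops there.
$str =~ /a*/ and printf "a*  : '%s' at offset %d\n", $&, $-[0];

# Requiring at least one 'a' makes offset 0 fail, so the engine moves on.
$str =~ /a+/ and printf "a+  : '%s' at offset %d\n", $&, $-[0];

# Collect every match of /a*/ with //g (empty matches included) and keep
# the longest one.
my @all = ($str =~ /a*/g);
my ($longest) = sort { length($b) <=> length($a) } @all;
print "longest a* run: '$longest'\n";
```

Using `/a+/` is usually simpler. The `//g` approach fits cases where the pattern itself must still allow zero repetitions.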
Larry
weisberg@hpccc.HP.COM (Len Weisberg) (08/04/90)
A few days ago I asked a question in a rather long form. Let me try again a bit more concisely:

Here's another bit of confusing regexp behavior:
------------------------------------------------------------------------------
pattern:     applied to:   matches:
/a{1,4}/     'aaay'        'aaa'
/a{1,4}/     'xaaay'       'aaa'
/a{0,4}/     'aaay'        'aaa'
/a{0,4}/     'xaaay'       ''
( ... more examples that fit the same pattern)
------------------------------------------------------------------------------
If there is some general principle about longest match,
it seems to break when 0 repetitions match, but only when
the match is not at the start of the string.
Does this make sense? Is it a bug?
Any enlightenment appreciated.
Thanks,
- Len Weisberg - HP Corp Computing & Services - weisberg@corp.HP.COM