[comp.lang.perl] Regexps

merlyn@iwarp.intel.com (Randal Schwartz) (07/04/90)

In article <1990Jul3.144552.5407@uvaarpa.Virginia.EDU>, worley@compass (Dale Worley) writes:
| Also, this illustrates one thing I don't like about regexps -- people
| write code which depends on the order in which the alternatives are
| matched.

Regexps are well-defined and extremely predictable about their
"leftmost wildcard matches the most possible iterations" behavior.
It's no more silly than presuming that "a" really matches "a", is it?

|	    For instance, in the regexp above, the case where [^\0]?
| matches the null string can always match, so it implicitly depends on
| the fact that the non-null match is tried first. 

Ugh.  It does this *by* *definition*.  No assumption necessary.

|						    On the other hand,
| it's hard (impossible?) to write a regexp which matches in only the
| right way without some way to specify context for the match (shades of
| \: and \;!!!).

It's probably durn near impossible, and an unnecessary burden on the
part of the programmer.  For example, what is the context for matching
/ab.*cd/ in the string "aaababfoocdcdce"?
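
To make that example concrete, here is a small sketch of what Perl reports for the pattern and string above. The variable names are purely illustrative; $`, $& and $' are Perl's standard pre-match, match and post-match variables.

```perl
use strict;
use warnings;

my $text = "aaababfoocdcdce";
my ($before, $matched, $after);

if ($text =~ /ab.*cd/) {
    # The match begins at the leftmost "ab"; .* then extends to the last "cd".
    ($before, $matched, $after) = ($`, $&, $');
    printf "before=%s matched=%s after=%s\n", $before, $matched, $after;
}
```

The match starts as far left as it can and then takes in as much as it can, so the shorter candidates ("abfoocd", say) are never reported.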

I frequently run up against * and + matching a bit too much, and want
to "back it off" a bit, but have found no general solution to the
problem.

For example, matching the first two-digit number in a line, discarding
all text before it.  I want to write:

	s/.*(\d\d)/\1/;

but instead am forced to do something like:

	/\d\d.*/; $_ = $&;

The first expression matches the *last* occurrence of two digits.  (I
know... the second one discards the newline... gimme a break.)
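
A minimal sketch of the difference, run on a made-up sample line (the line and the variable names are assumptions for illustration, and $1 is used in place of \1 on the replacement side, which is how current Perl prefers it spelled):

```perl
use strict;
use warnings;

my $line = "abc 12 def 34 ghi\n";

# Greedy .* swallows as much as it can, so the capture lands on the LAST pair.
(my $greedy = $line) =~ s/.*(\d\d)/$1/;

# Workaround: match from the first pair onward and keep $&.
# The . does not match "\n", so the trailing newline is dropped.
my $workaround = $line;
$workaround = $& if $workaround =~ /\d\d.*/;

print "greedy:     $greedy";
print "workaround: $workaround\n";
```

On this sample the substitution leaves "34 ghi" plus the newline, while the workaround leaves "12 def 34 ghi" without it.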

Regexps... your best friend... your worst enemy.  You decide. :-)

s//Just another Perl hacker,/; print
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/

tneff@bfmny0.BFM.COM (Tom Neff) (07/04/90)

In article <1990Jul3.184351.3820@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
>I frequently run up against * and + matching a bit too much, and want
>to "back it off" a bit, but have found no general solution to the
>problem.
>
>For example, matching the first two-digit number in a line, discarding
>all text before it.  I want to write:
>
>	s/.*(\d\d)/\1/;
>
>but instead am forced to do something like:
>
>	/\d\d.*/; $_ = $&;

Try

	/\d\d.*\n?$/ && ($_ = $&);

or my fave

	/\d{2}/ && (substr($_,0,length($`)) = '');

This doesn't lose the newline and you don't care what comes after.
(I wish Larry'd make $` and $' lvalues, hehe.)
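
Both suggestions can be sketched on a made-up sample line as follows. The line and the variable names are assumptions for illustration, and named variables are matched instead of $_. The assignments are parenthesized because && binds more tightly than =.

```perl
use strict;
use warnings;

my $line = "abc 12 def 34 ghi\n";

# First suggestion: anchor at end of line so the optional newline is part of $&.
my $anchored = $line;
$anchored =~ /\d\d.*\n?$/ && ($anchored = $&);

# Second suggestion: delete everything before the match via lvalue substr.
my $trimmed = $line;
$trimmed =~ /\d{2}/ && (substr($trimmed, 0, length($`)) = '');

print "anchored: $anchored";
print "trimmed:  $trimmed";
```

Both variables end up holding the text from the first two-digit number through the newline.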



-- 
"My God, Thiokol, when do you      \\    Tom Neff
want me to launch?  Next April?"   \\    tneff@bfmny0.BFM.COM

worley@compass.com (Dale Worley) (07/05/90)

   From: merlyn@iwarp.intel.com (Randal Schwartz)

   Regexps are well-defined and extremely predictable about their
   "leftmost wildcard matches the most possible iterations" behavior.

OK, where is it written down?  Also, you forgot to mention that |
attempts to match the left alternative before the right.  (I mention
that to show that it's not so easy to completely define dynamic
behavior.)
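
The ordering of alternatives can be observed directly. This is a small sketch of Perl's behavior only, with an illustrative string and variable names; tools built on other matching engines may not behave the same way.

```perl
use strict;
use warnings;

my $text = "foobar";

# The left alternative is tried first and succeeds, so "foobar" is never tried.
my ($left_first) = $text =~ /(foo|foobar)/;

# Swapping the alternatives changes what is captured.
my ($long_first) = $text =~ /(foobar|foo)/;

print "left first: $left_first\n";
print "long first: $long_first\n";
```

The first pattern captures "foo" and the second captures "foobar", so the result does depend on the order in which the alternatives are written.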

   It's no more silly than presuming that "a" really matches "a", is it?

Well, every exposition of "regular expressions" I've ever seen states
that "a" matches only "a", but I've never seen *in print* anyone
stating that regexps match the most possible first.

Also, I have a general preference for declarative definitions of
things above dynamic definitions.

Dale Worley		Compass, Inc.			worley@compass.com
--
It was peculiarly satisfying to watch the reactions at the truck stops
along the way to this bunch of men with hippie long hair, biker
leather jackets, and a nose ring who nonetheless were warm,
intelligent, friendly and polite, and paid with credit cards - we blew
all their possible-stereotype fuses.

merlyn@iwarp.intel.com (Randal Schwartz) (07/06/90)

In article <1990Jul5.135434.11673@uvaarpa.Virginia.EDU>, worley@compass (Dale Worley) writes:
|    Regexps are well-defined and extremely predictable about their
|    "leftmost wildcard matches the most possible iterations" behavior.
| 
| OK, where is it written down?  Also, you forgot to mention that |
| attempts to match the left alternative before the right.  (I mention
| that to show that it's not so easy to completely define dynamic
| behavior.)

Say "man egrep" on any sane machine.  Or "man ed" for an even older
reference.

I admit, I don't see any "left alternative before right" stuff in
there.  I guess I've been reading the code too much (the *ultimate*
spec, as any UNIX hacker knows...).

Just another RE hacker,

worley@compass.com (Dale Worley) (07/06/90)

   From: merlyn@iwarp.intel.com (Randal Schwartz)

   In article <1990Jul5.135434.11673@uvaarpa.Virginia.EDU>, worley@compass (Dale Worley) writes:
   | OK, where is it written down?

   Say "man egrep" on any sane machine.  Or "man ed" for an even older
   reference.

Well, so it does.  However, that's new with SunOS 4.0.  On 3.5 "man
egrep" said there was no such topic -- you had to say "man grep".
(Whether you consider SunOS "sane" is a matter of debate...)  I will
also note that the Perl manual page nowhere mentions the egrep man
page -- it mentions the "version 8 regexp routines", which I don't
have, whatever they are.  ("man ed" is useless, because ed's regular
expressions are only a tiny subset of Perl's.)

But how about putting this stuff into the manual page?  I'm forever
trying to figure out exactly what the rules are for regexps in program
foo, because they're rarely exactly documented, or given as "exactly
like program bar except when there's a new moon and you're wearing a
yellow necktie..."  It'd be nice if Perl did it right.

Gripe, gripe,

Dale Worley		Compass, Inc.			worley@compass.com
--
Be nice.
If you can't be nice, be good.
If you can't be good, be careful.
If you can't be careful, name it after me.