[comp.sys.3b1] egrep on the 3b1 is weird!

kak@hico2.UUCP (Kris A. Kugel) (03/16/91)

Am I confused, or does egrep on the 3b1
handle multiple regular expressions incorrectly?

	egrep 'foo|bar'

seems to match '[fb][oa][or]' patterns,
instead of what I wanted (matching lines with "foo" or "bar")

                               Kris A. Kugel
                             ( 908 ) 842-2707
                      uunet!tsdiag.ccur.com!hico2!kak
                        {daver,ditka,zorch}!hico2!kak
                      internet: kak@hico2.westmark.com

david@twg.com (David S. Herron) (03/21/91)

In article <1268@hico2.UUCP> kak@hico2.UUCP (Kris A. Kugel) writes:
>Am I confused, or does egrep on the 3b1
>handle multiple regular expressions incorrectly?
>
>	egrep 'foo|bar'
>
>seems to match '[fb][oa][or]' patterns,
>instead of what I wanted (matching lines with "foo" or "bar")

Try

	egrep '(foo)|(bar)'

And read the F'ing manual a little more closely next time ...


-- 
<- David Herron, an MMDF & WIN/MHS guy, <david@twg.com>
<- Formerly: David Herron -- NonResident E-Mail Hack <david@ms.uky.edu>
<-
<- "MS-DOS? Where we're going we don't need MS-DOS." --Back To The Future

kak@hico2.UUCP (Kris A. Kugel) (03/22/91)

In article <8783@gollum.twg.com>, david@twg.com (David S. Herron) writes:
> In article <1268@hico2.UUCP> kak@hico2.UUCP (Kris A. Kugel) writes:
> >Am I confused, or does egrep on the 3b1
> >handle multiple regular expressions incorrectly?
> >
> >	egrep 'foo|bar'
> >
> >seems to match '[fb][oa][or]' patterns,
> >instead of what I wanted (matching lines with "foo" or "bar")
> 
> Try
> 
> 	egrep '(foo)|(bar)'

I'll try this, thanks for the method.
  
> And read the F'ing manual a little more closely next time ...

No need to get hostile.  I did, after your message.  Carefully.

It says,

"3. Two regular expressions separated by | or by a
    new line match strings that are matched by either.
 4. A regular expression may be enclosed in parentheses () for grouping.

    The order of precedence of operators is [], then * ? + ,
    then concatenation, then | and new-line."
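
A minimal sketch of what rules 3 and 4 and the precedence line imply,
written against a current POSIX "grep -E" rather than the 3b1 egrep
(that substitution is an assumption, as are the sample lines and the
variable names):

```shell
# Sample input, one candidate string per line (made up for illustration).
sample=$(printf 'foo\nbar\nfar\nfooar\nf\n')

# Rule 3: a newline inside the pattern separates alternatives, like '|'.
nl_pattern='foo
bar'
by_bar=$(printf '%s\n' "$sample" | grep -E 'foo|bar')
by_newline=$(printf '%s\n' "$sample" | grep -E "$nl_pattern")

# Precedence: '*' binds tighter than concatenation, so 'fo*' means
# "f followed by zero or more o", not "zero or more copies of fo".
# -x makes the pattern match the whole line.
star=$(printf '%s\n' "$sample" | grep -E -x 'fo*')

printf 'with |:\n%s\nwith newline:\n%s\nfo* (whole line):\n%s\n' \
    "$by_bar" "$by_newline" "$star"
```

If the description in the manual page holds, the first two lists are
identical (foo, bar, fooar) and the last one contains only foo and f.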

Now, if you want to complain that I missed something obvious
and well documented, fine.  If what you describe is really how
the feature is designed, then I claim that that is NOT obvious
from the description on the manual page.

Now, I could see from this description that maybe the pattern I
tried would match  "fo[bo]ar", but that was NOT the behavior
I remember seeing.  

I claim that either the manual page is very poorly written,
or else what I tried should have given me CLEARLY different
results than I got.

Now, to find the egrep on osu-cis . . . .

                               Kris A. Kugel
                             ( 908 ) 842-2707
                      uunet!tsdiag.ccur.com!hico2!kak
                        {daver,ditka,zorch}!hico2!kak
                      internet: kak@hico2.westmark.com

andrew@alice.att.com (Andrew Hume) (03/22/91)

In article <8783@gollum.twg.com>, david@twg.com (David S. Herron) writes:
~ In article <1268@hico2.UUCP> kak@hico2.UUCP (Kris A. Kugel) writes:
~ >Am I confused, or does egrep on the 3b1
~ >handle multiple regular expressions incorrectly?
~ >
~ >	egrep 'foo|bar'
~ >
~ >seems to match '[fb][oa][or]' patterns,
~ >instead of what I wanted (matching lines with "foo" or "bar")
~ 
~ Try
~ 
~ 	egrep '(foo)|(bar)'
~ 
~ And read the F'ing manual a little more closely next time ...
~ 
~ 
~ -- 
~ <- David Herron, an MMDF & WIN/MHS guy, <david@twg.com>
~ <- Formerly: David Herron -- NonResident E-Mail Hack <david@ms.uky.edu>
~ <-
~ <- "MS-DOS? Where we're going we don't need MS-DOS." --Back To The Future


	with respect, mr herron, kris apparently did read the manual
and his/her expression is perfectly valid. the parentheses in your retort
are superfluous. if after typing
	egrep 'foo|bar'
	foo
	bar
	far
	<ctrl-d>
you get only the lines
	foo
	bar
echoed back, then all is well. if anything different happens,
your egrep is buggered.
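
the same check can be run without typing at the terminal. a minimal
sketch, assuming a POSIX "grep -E" in place of egrep (the variable name
is only for illustration):

```shell
# Feed the three sample lines through a pipe instead of typing them.
# grep -E stands in for egrep here; that substitution is an assumption.
matched=$(printf 'foo\nbar\nfar\n' | grep -E 'foo|bar')

# Show what came back; a working egrep should drop the "far" line.
printf '%s\n' "$matched"
```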


	andrew@research.att.com

guest@geech.ai.mit.edu (Guest Account) (03/23/91)

Since without any kind of quoting or special chars a regular
expression is 1 character:
	egrep 'foo|bar'
matches lines containing fooar and fobar.  That's why you need to use
parentheses.  You have to read the manual page for ed(1) to get the
rest of the story on regular expressions.  To quote the grep man page:
"Egrep accepts regular expressions as in ed(1), except..."

Daniel Guilderson
ryan@cs.umb.edu

tkacik@rphroy.ph.gmr.com (Tom Tkacik) (03/23/91)

In article <GUEST.91Mar22115440@geech.ai.mit.edu>, guest@geech.ai.mit.edu (Guest Account) writes:
|> Since without any kind of quoting or special chars a regular
|> expression is 1 character:
|> 	egrep 'foo|bar'
|> matches lines containing fooar and fobar.  That's why you need to use
|> parentheses.  You have to read the manual page for ed(1) to get the
|> rest of the story on regular expressions.  To quote the grep man page:

Sorry, try again.  Concatenation has higher precedence than '|'.

egrep 'foo|bar' will match either foo or bar, not fobar nor fooar.

egrep works like that here at work.  I have not yet tried it at home.  I will.
If it works as Kris A. Kugel says, then it must be busted.

-- 
Tom Tkacik				tkacik@clyde.cs.gmr.com
GM Research Labs			tkacik@kyzyl.mi.org
"I'm president of the United States, and I'm not going to eat anymore broccoli."
						--- George Bush

guy@cbnewsc.att.com (guy.r.berentsen) (03/23/91)

> |> 	egrep 'foo|bar'
> |> matches lines containing fooar and fobar.  That's why you need to use
> 
> egrep 'foo|bar' will match either foo or bar, not fobar nor fooar.
> 

On my unix pc at work (running 3.5) egrep 'foo|bar' matches foo or bar
(of course it also matches "fobar" and "fooar" since 
the latter contains the string "foo" and the former 
contains the string "bar")

tkacik@kyzyl.mi.org (Tom Tkacik) (03/23/91)

In article <1268@hico2.UUCP>, kak@hico2.UUCP (Kris A. Kugel) writes:
> Am I confused, or does egrep on the 3b1
> handle multiple regular expressions incorrectly?
> 
> 	egrep 'foo|bar'
> 
> seems to match '[fb][oa][or]' patterns,
> instead of what I wanted (matching lines with "foo" or "bar")
 

According to the man page   egrep 'foo|bar'   should match any line
containing either 'foo' or 'bar'.  I just tried it on kyzyl, and
it seems to work properly.  It does not match any line containing
either 'boo' or 'far' (or any of the other incorrect patterns mentioned above).

I suggest that Kris check again.

Someone stated that the | has higher precedence than concatenation,
and that the above is equivalent to    egrep 'fo(o|b)ar', and will
match  either 'fooar' or 'fobar'.
It will match these, but only because they are a subset of
the patterns it will match.
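
A rough sketch that puts the competing readings side by side, assuming
a POSIX "grep -E" as a stand-in for egrep (the sample lines and variable
names are mine, not from anyone's actual test file):

```shell
# Six sample lines: the two wanted words, two near misses, and the
# two strings from the "fo(o|b)ar" reading.
sample=$(printf 'foo\nbar\nboo\nfar\nfooar\nfobar\n')

# The original pattern and the parenthesized version.
plain=$(printf '%s\n' "$sample" | grep -E 'foo|bar')
parens=$(printf '%s\n' "$sample" | grep -E '(foo)|(bar)')

# The reading in which | would bind tighter than concatenation.
middle=$(printf '%s\n' "$sample" | grep -E 'fo(o|b)ar')

# The character-class behavior described in the original article.
class=$(printf '%s\n' "$sample" | grep -E '[fb][oa][or]')

printf 'foo|bar:\n%s\n\nfo(o|b)ar:\n%s\n\n[fb][oa][or]:\n%s\n' \
    "$plain" "$middle" "$class"
```

With a correct egrep, plain and parens should be the same four lines
(foo, bar, fooar, fobar), middle should be only fooar and fobar, and
class should be all six.  Lines such as boo and far are what separate
the character-class behavior from the documented one.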
-- 
Tom Tkacik                |
tkacik@kyzyl.mi.org       |     To rent this space, call 1-800-555-QUIP.
...!rphroy!kyzyl!tkacik   |

kak@hico2.UUCP (Kris A. Kugel) (03/23/91)

> > In article <1268@hico2.UUCP> kak@hico2.UUCP (Kris A. Kugel) writes:
> > >Am I confused, or does egrep on the 3b1
> > >handle multiple regular expressions incorrectly?
> > >
> > >	egrep 'foo|bar'
> > >
> > >seems to match '[fb][oa][or]' patterns,
> 
> Now, I could see from this [manual] description that maybe the
> pattern I tried would match  "fo[bo]ar", but that was NOT the
> behavior I remember seeing.  


Well, evidently the problem is not as simple as I described it.
I'd used the "foo|bar" example as a substitute phrase for what
I was really trying.  Bruce Lilly tried this literally with a
good sample file, and with this simple case egrep seems to work
correctly.  And his test works correctly on my machine, too.

If I recall correctly, I saw the (I say) incorrect behavior
when I was trying to find a news article about some subject
(I didn't remember the title or group exactly, so
I had an egrep -i "pat1|pat2|pat3|pat4|pat5|pat6|pat7|pat9"
kind of thing, probably running on stdin).
I saw the malfunction, looked at a couple of (false match) lines,
and thought "these could be explained by . . ." without
doing a comprehensive check.

It sounds like I'm going to have to find a test case
showing this problem exactly, because it's CLEARLY not
the straightforward malfunction that I described in my
original article.  (*sigh* when I get time . . . .)
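
One way such a test case might be hunted down is sketched below.  It
assumes a POSIX grep with -E and -F, and it uses three made-up words
in place of the real search terms.  The idea is to keep only those
lines that the alternation selects but that contain none of the
alternatives as a plain string.  It only works while the alternatives
are literal words with no regular expression operators in them.

```shell
# Hypothetical stand-ins for the real search terms.
alternation='foo|bar|baz'

# Read stdin; print lines selected by the alternation that contain
# none of the alternatives as a fixed string (case ignored).
false_matches() {
    grep -E -i "$alternation" |
        grep -F -i -v -e foo -e bar -e baz
}

# Made-up input.  grep -v exits non-zero when nothing is left over,
# which is the good case here, so the status is deliberately ignored.
suspects=$(printf 'Food\nBARN\nfar\nboo\n' | false_matches) || true

printf 'suspect lines: [%s]\n' "$suspects"
```

Anything that ends up in suspects would be a candidate for a minimal
test case; an empty result means this input did not trigger the
problem.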

I'd also conclude that I should have been more accurate
and complete about the conditions under which I saw the
problem, and presented the "[fb][oa][or]" explanation
as a hypothesis, rather than as the observed problem.
I'll try to be more accurate in the future.
                               Kris A. Kugel
                             ( 908 ) 842-2707
                      uunet!tsdiag.ccur.com!hico2!kak
                        {daver,ditka,zorch}!hico2!kak
                      internet: kak@hico2.westmark.com