[comp.unix.wizards] new grep

williams@nrl-css.arpa (06/21/88)

	Date: Fri, 27 May 88 14:39:34 EDT
	From: Root Boy Jim <rbj@icst-cmr.arpa>
	Subject: grep replacement
	
	        Al Aho and I are designing a replacement for grep, egrep
		and fgrep.  The question is what flags should it support and
		what kind of patterns ...
	 
	I have always thought it would be nice to print only the first match.
	
	        (Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>

I'll second this.  There have been many times when I simply wanted to
detect the presence of a pattern in a file, then stop searching.  Combining
this with the -n option can also be useful.

 ------------------------------------------------------------
"If you're going to skate on thin ice, you might as well dance."
	- Denis Owens, WGMS 103.5 FM

James W. Williams			williams@nrl-css.arpa
Systems Administrator, Code 5505
Information Technology Division		
Naval Research Laboratory		(202) 767-9035
Washington, DC 20375
 ------------------------------------------------------------

mohamed@hscfvax.harvard.edu (Mohamed Ellozy) (06/21/88)

In article <16237@brl-adm.ARPA> williams@nrl-css.arpa writes:
>	        Al Aho and I are designing a replacement for grep, egrep
>		and fgrep.  The question is what flags should it support and
>		what kind of patterns ...
>	 
>	I have always thought it would be nice to print only the first match.
>	
>	        (Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>
>
>I'll second this.  There have been many times when I simply wanted to
>detect the presence of a pattern in a file, then stop searching.  Combining
>this with the -n option can also be useful.

There are also times when I am searching for something that is close to
the start of a long file (e.g. a host that is close to the start of
/etc/hosts).  I can save time by doing a head -NNN | grep, but I must
guess a suitable value for NNN, whereas a grep that quits after the first
match does it for me.
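
With existing tools, one way to get the "quit after the first match"
behaviour without guessing NNN is to let sed print the first matching line
and then stop reading.  A minimal sketch, assuming the pattern is a basic
regular expression with no "/" in it; the function name first_match is made
up for illustration:

```shell
# first_match PATTERN [FILE...]
# Print the first line matching PATTERN, then stop reading.
# Assumes PATTERN is a basic regular expression with no "/" in it.
# With several files, sed treats them as one stream, so this stops
# after the first match overall, not the first match per file.
first_match() {
	pat=$1
	shift
	sed -n "/$pat/{
p
q
}" "$@"
}

# Example: look up a host near the top of /etc/hosts.
# first_match localhost /etc/hosts
```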

avr@mtgzz.att.com (XMRP50000[jcm]-a.v.reed) (06/22/88)

In article <16237@brl-adm.ARPA>, williams@nrl-css.arpa writes:
< 	        Al Aho and I are designing a replacement for grep, egrep
< 		and fgrep.  The question is what flags should it support and
< 		what kind of patterns ...
< 	 
< 	I have always thought it would be nice to print only the first match.
< 	        (Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>
< I'll second this.  There have been many times when I simply wanted to
< detect the presence of a pattern in a file, then stop searching.  Combining
< this with the -n option can also be useful.

This would be a good time to make the flags orthogonal and
consistent. Ideally, all standard output should be requested in
the same way, by specifying the corresponding flag. The default
should be something like the -s option today; the way to specify
one's own favorite options would be by defining an alias, e.g.

                mygrep="grep -i -l -n -m"

The options I'd like to see are:

	-l	List paths (names) of files with matches
	-n	Number: list numbers of lines that match
	-b	Block: list block number for each match
	-m	Matching lines: list matching lines
	-c	Count: display a count of matching lines in each file
	-o	One: Don't search beyond one match per file

plus the -v, -i, -e, and -f options as they are now. Ideally,
there should be a new name; if the new tool is called with one of
the names that exist today, then the behavior of the old tool of
that name ought to be emulated exactly.

				Adam Reed (mtgzz!avr)

williams@nrl-css.arpa (06/22/88)

	Date: Fri, 27 May 88 14:39:34 EDT
	From: Root Boy Jim <rbj@icst-cmr.arpa>
	Subject: grep replacement
	
	        Al Aho and I are designing a replacement for grep, egrep
		and fgrep.  The question is what flags should it support and
		what kind of patterns ...
	 
	I have always thought it would be nice to print only the first match.
	
	        (Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>

I'll second this.  There have been many times when I simply wanted to
detect the presence of a pattern in a file, then stop searching.  Various
ways of doing this with existing tools (head -1, etc.) are inefficient.

 ------------------------------------------------------------
"If you're going to skate on thin ice, you might as well dance."
	- Denis Owens, WGMS 103.5 FM

James W. Williams			williams@nrl-css.arpa
Systems Administrator, Code 5505
Information Technology Division		
Naval Research Laboratory		(202) 767-9035
Washington, DC 20375
 ------------------------------------------------------------

ford@elgar.UUCP (Mike "Ford" Ditto) (06/24/88)

In article <16237@brl-adm.ARPA>, williams@nrl-css.arpa writes:
> 	        Al Aho and I are designing a replacement for grep, egrep
> 		and fgrep.  The question is what flags should it support and
> 		what kind of patterns ...
> 	 
> 	I have always thought it would be nice to print only the first match.
> 	        (Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>
> I'll second this.  There have been many times when I simply wanted to
> detect the presence of a pattern in a file, then stop searching.  Combining
> this with the -n option can also be useful.

Everyone's forgetting an important option: "status only".  I have seen
a few greps that had the -s switch, which would cause grep to exit with
an appropriate status as soon as it "knew" what it should be.  This is
the same as the "only-print-first-match" with no output done at all.

So...

	if who | grep tty000 > /dev/null
	then
		echo "port in use"
		exit
	fi

becomes:

	if who | grep -s tty000
	then
		echo "port in use"
		exit
	fi

which is easier to understand and more efficient (since grep can quit
as soon as it sees a match).
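
On a grep without such a switch, the same status-only test can be
approximated with awk, which can stop at the first match and report the
result through its exit status.  This is only a sketch: the function name
has_match is made up, the pattern is taken as an awk (extended) regular
expression, and printf stands in for who so the example runs anywhere.

```shell
# has_match PATTERN [FILE...]
# Exit 0 as soon as some input line matches PATTERN, 1 if none does.
# Prints nothing.  PATTERN is an awk (extended) regular expression.
has_match() {
	pat=$1
	shift
	awk -v pat="$pat" '
		$0 ~ pat { found = 1; exit }	# stop reading at the first match
		END      { exit !found }	# 0 if a match was seen, else 1
	' "$@"
}

# printf stands in for "who" here.
if printf 'jim tty000\nal tty001\n' | has_match tty000
then
	echo "port in use"
fi
```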


					-=] Ford [=-

"Once there were parking lots,		(In Real Life:  Mike Ditto)
now it's a peaceful oasis.		ford@kenobi.cts.com
This was a Pizza Hut,			...!sdcsvax!crash!kenobi!ford
now it's all covered with daisies." -- Talking Heads

jgy@homxc.UUCP (J.YOUNG) (06/29/88)

Regarding the discussion of a grep option to halt processing
once the first match is found, a couple more suggestions:

	It seems you would want a flag which tells grep whether
	to stop all searching or only the search of the current file.

	How about an option (only applicable when patterns start
	with "^") to tell grep that the file is lexically sorted
	(in what order?) so it can terminate once the input line
	is lexically greater than the pattern(s)?
	This is useful for searching through sorted lists (files,
	dictionaries, etc.).
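
The sorted-file idea can be sketched with awk for the simplest case, a
literal key anchored at the start of the line.  The assumptions are
loud ones: the input is sorted in plain byte order (LC_ALL=C), the key is
a literal string with no backslashes rather than a general pattern, and
the function name sorted_lookup is made up.

```shell
# sorted_lookup KEY [FILE...]
# Print lines that begin with the literal string KEY, and stop reading
# once a line sorts after KEY.  Assumes the input is sorted in plain
# byte order (LC_ALL=C) and that KEY contains no backslashes.
sorted_lookup() {
	key=$1
	shift
	LC_ALL=C awk -v key="$key" '
		index($0, key) == 1 { print; next }	# line starts with key
		($0 "") > (key "")  { exit }		# sorted past key: give up
	' "$@"
}

# Example: sorted_lookup ban wordlist
```

The prefix test has to come first, because a line that starts with the key
also compares greater than the key.  Appending "" forces a string
comparison even when the key looks like a number.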

John Young
AT&T BL.
Red Hill Rd,
Middletown, NJ 07748
201-615-4412

hutch@lzaz.ATT.COM (R.HUTCHISON) (06/29/88)

> 	Date: Fri, 27 May 88 14:39:34 EDT
> 	From: Root Boy Jim <rbj@icst-cmr.arpa>
> 	Subject: grep replacement
> 	
> 	        Al Aho and I are designing a replacement for grep, egrep
> 		and fgrep.  The question is what flags should it support and
> 		what kind of patterns ...
> 	 
> 	I have always thought it would be nice to print only the first match.
> 	
> 	        (Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>
> 

I would like grep to be able to ignore case.

guy@gorodish.Sun.COM (Guy Harris) (06/30/88)

> I would like grep to be able to ignore case.

You already can, if you're running 4.0BSD or later (perhaps even earlier,
although it's not documented in V7) or System V Release 2.0 or later.  (I don't
know about "fgrep" or "egrep"; I could look it up, but then so could many of
you....)

hutch@lzaz.ATT.COM (R.HUTCHISON) (06/30/88)

> I would like grep to be able to ignore case.


Sorry.  I was wrong.  Thanks to all of you who told me about the -i
flag.  I also realized how nasty (some) people can be when someone
makes a mistake.  I got a few nice replies - to the others I would
suggest that you cool down a bit.

R. Hutchison

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/01/88)

In article <58443@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>> I would like grep to be able to ignore case.
>You already can, if you're running 4.0BSD or later (perhaps even earlier,
>although it's not documented in V7) or System V Release 2.0 or later.

Of course it didn't work right until you fixed the bug (which I picked
up for the BRL SysV emulation, but I haven't checked AT&T's current
"grep" sources).

By the way, this is another case where separate tools might be better.
There are languages (and character sets) for which "case" is not a
relevant concept.  And even the fast case mapping code from 4BSD (which
again Guy put into the SVR2 version and I picked it up for ours, but
don't know about AT&T's current version) still adds some degree of
overhead that is wasteful in the usual case.

guy@gorodish.UUCP (07/01/88)

> Of course it didn't work right until you fixed the bug...

To what bug are you referring?  I just built the standard "grep" and "egrep"
from the standard S5R2 source, and at least "grep -i aarhus /usr/dict/words"
worked.  There was a bug in "fgrep", fixed in the "V7 addendum tape" version
(and thus in the 4BSD version) but not in the S5 version, but I don't think it
had anything to do with case-mapping (if somebody has the original bug fix that
I posted, could they mail it to me - I no longer remember what the bug was, and
may need to know what it was).

There may have been performance problems, but at least they worked....

> By the way, this is another case where separate tools might be better.

> There are languages (and character sets) for which "case" is not a
> relevant concept.

So don't use "-i" with files containing text in those languages.

> And even the fast case mapping code from 4BSD (which again Guy put into the
> SVR2 version and I picked it up for ours, but don't know about AT&T's
> current version) still adds some degree of overhead that is wasteful in the
> usual case.

Wasteful, maybe, but noticeable?  "grep" and "egrep" lower-casify the pattern
argument, but that's cheap.  "grep" tests the "-i" flag once per line, and
"egrep" (done right) tests it only when building the FSM, not when reading the
file.  "fgrep" tests it more often, so that's the only case where I see any
value to worrying about overhead in the usual case.  I infer from your
reference to the "fast case mapping code from 4BSD", which appeared in "fgrep",
that you may be referring to "fgrep"; the S5R3 version has a different form of
fast case mapping.

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (07/02/88)

In article <162@lzaz.ATT.COM> hutch@lzaz.ATT.COM (R.HUTCHISON) writes:

>I would like grep to be able to ignore case.

  On BSD the option -i does this. I think it was added to later versions
of System V, but I don't have a handy way to test here. In Xenix it's
called -y (I have *NO* idea why, don't ask me).

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

root@hawkmoon.MN.ORG (Admin) (07/05/88)

In article <162@lzaz.ATT.COM>, hutch@lzaz.ATT.COM (R.HUTCHISON) writes:
> I would like grep to be able to ignore case.

grep -i pat files

will ignore case.
-- 
Derek Terveer	root@hawkmoon.MN.ORG	uunet!rosevax!elric!hawkmoon!root

andrew@alice.UUCP (07/05/88)

it is interesting how much ignoring case hurts. for boyer-moore, it costs
about 10% in practice. for egrep, it costs nothing as you simply duplicate
entries in a jump table. for fgrep, it costs only 3-5% as you only have to
map case for nodes other than the root which, at least in the new grep, is
a jump table.
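
The egrep case can be imitated at the pattern level: expand each letter of
the pattern into a two-character bracket expression once, up front, so that
scanning the file needs no per-character case mapping.  This is a rough
analogy rather than what any grep does internally; it assumes the pattern
is a plain word of letters and digits, and the function name fold_pattern
is made up.

```shell
# fold_pattern WORD
# Print WORD with every letter x rewritten as [xX].
# Assumes WORD holds only letters and digits (no regex operators).
fold_pattern() {
	printf '%s\n' "$1" | awk '{
		out = ""
		for (i = 1; i <= length($0); i++) {
			c = substr($0, i, 1)
			if (c ~ /[A-Za-z]/)
				out = out "[" tolower(c) toupper(c) "]"
			else
				out = out c
		}
		print out
	}'
}

# fold_pattern aarhus   prints [aA][aA][rR][hH][uU][sS]
# grep "$(fold_pattern aarhus)" /usr/dict/words
```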