[net.unix-wizards] grep

trb (03/15/83)

To quote from the 4.1bsd manual:

	Ideally there should be only one grep, but we don't know a
	single algorithm that spans a wide enough range of space-time
	tradeoffs.

The question remains, which grep should I use?  I run 4.1bsd.  I read the
4.1bsd manual and it says:

	Grep patterns are limited regular expressions in the style of
	ex (1); it uses a compact nondeterministic algorithm.

	Egrep patterns are full regular expressions; it uses a fast
	deterministic algorithm that sometimes needs exponential
	space.

	Fgrep patterns are fixed strings; it is fast and compact.
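
The "exponential space" remark in the egrep entry can be illustrated with the
textbook subset construction. What follows is only a sketch in Python, not a
description of how egrep itself is written: the function names are made up for
the illustration, and the pattern is the classic worst case "the n-th character
from the end is an a" (for n = 3, in egrep syntax, `(a|b)*a(a|b)(a|b)`). The
nondeterministic machine needs n+1 states; the script counts how many states a
deterministic machine built from it needs.

```python
def nfa_step(states, ch, n):
    """Advance the (n+1)-state NFA by one input character.

    State 0 loops on every character and also moves to state 1 on 'a';
    states 1..n-1 move to the next state on any character; state n is
    the accepting state and has no outgoing moves.
    """
    nxt = set()
    for s in states:
        if s == 0:
            nxt.add(0)
            if ch == "a":
                nxt.add(1)
        elif s < n:
            nxt.add(s + 1)
    return frozenset(nxt)


def count_dfa_states(n):
    """Count the DFA states reachable by subset construction."""
    start = frozenset([0])
    seen = {start}
    todo = [start]
    while todo:
        cur = todo.pop()
        for ch in "ab":
            nxt = nfa_step(cur, ch, n)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, "NFA states:", n + 1, "DFA states:", count_dfa_states(n))
```

The deterministic count doubles with each extra position (2**n states for
n+1 nondeterministic ones), which is the kind of space-for-time trade the
manual is warning about.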

I timed a grep through a uucp LOGFILE (277K+ bytes) on my somewhat loaded
(load average about 8) 780.  Here's what I found:

	% fgrep vax135 LOGFILE > /dev/null
	6.0u 1.7s 0:36 21% 5+6k 278+2io 1pf+0w

	% egrep vax135 LOGFILE > /dev/null
	3.7u 1.6s 0:40 13% 8+16k 284+1io 2pf+0

	% grep vax135 LOGFILE > /dev/null
	4.9u 1.5s 0:56 11% 5+5k 281+1io 1pf+0

Yes, judging by the user+system time figures, for the normal grep job
(a single literal string), "fast and compact" fgrep is slower than egrep
and not much more compact.
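
The user+system totals behind that comparison can be tallied with a short
script. This is a sketch in Python, added for illustration; `parse_cpu` and
`cpu_totals` are hypothetical helper names, and the three input lines are
copied as-is from the csh timings above.

```python
import re

# csh `time` lines copied from the runs above; the first two fields are
# user and system CPU seconds.
TIMINGS = {
    "fgrep": "6.0u 1.7s 0:36 21% 5+6k 278+2io 1pf+0w",
    "egrep": "3.7u 1.6s 0:40 13% 8+16k 284+1io 2pf+0",
    "grep":  "4.9u 1.5s 0:56 11% 5+5k 281+1io 1pf+0",
}


def parse_cpu(line):
    """Return (user, system) CPU seconds from one csh time line."""
    m = re.match(r"\s*([\d.]+)u\s+([\d.]+)s", line)
    if m is None:
        raise ValueError("not a csh time line: %r" % line)
    return float(m.group(1)), float(m.group(2))


def cpu_totals(timings):
    """Map each program name to its user+system CPU seconds."""
    return {name: round(sum(parse_cpu(line)), 1)
            for name, line in timings.items()}


if __name__ == "__main__":
    # List the programs from least to most CPU time used.
    ranked = sorted(cpu_totals(TIMINGS).items(), key=lambda kv: kv[1])
    for name, total in ranked:
        print("%-5s %.1f CPU seconds" % (name, total))
```

The totals come to 5.3 seconds for egrep, 6.4 for grep, and 7.7 for fgrep.
The elapsed times are left out because they depend on the machine's load.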

The three grep programs all have their graces in different situations:
grep is small and simple, fgrep handles a list of strings, and egrep
handles full regular expressions.

Grepping my entry out of a 150 line /etc/passwd showed no appreciable
timing difference between the greps, so you should not be afraid of
using egrep on small files.

My message is that you VAX-hacking speed daemons should start using
egrep for your pedestrian grep needs.

	Andy Tannenbaum   Bell Labs  Whippany, NJ   (201) 386-6491

swatt (03/16/83)

Andy's performed a valuable service showing USENET readers just what a
wonderful job "egrep" does.  I remember being completely blown away back
in 1979 when I compared the three "grep"s on multi-megabyte listing
files using an 11/70, and discovered "egrep" was the fastest of the lot,
even on very long expressions.  It demonstrates once again the power of
good algorithms.

However, I still use good ol' "grep" most of the time because (1) my
fingers are VERY good at typing it, and (2) it has fewer metacharacters
that I have to escape if I want them literally.

Also note that "egrep" takes a fair amount of time just to compile its
regular expression, whereas "grep" compiles very quickly.  Take a look at
/bin/calendar: it creates a file with two egrep-style regular
expressions, and then "egrep"s through calendar files using that file
as the set of regular expressions.  The time required for egrep to
actually process the input is trivial, but it can take a significant
fraction of a CPU second to compile the expression.
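
The split between compile cost and scan cost can be measured separately. The
sketch below uses Python's `re` module purely as a stand-in: it is a different
matching engine from egrep, so only the shape of the measurement carries over,
not the numbers. The helper name `time_compile_and_scan`, the pattern, and the
sample calendar line are all made up for the illustration and are not what
/bin/calendar actually generates.

```python
import re
import time


def time_compile_and_scan(pattern, text):
    """Return (compile_seconds, scan_seconds, matched) for one pattern."""
    re.purge()  # empty the module's pattern cache so the compile is real
    t0 = time.perf_counter()
    compiled = re.compile(pattern)
    t1 = time.perf_counter()
    matched = compiled.search(text) is not None
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, matched


if __name__ == "__main__":
    # Hypothetical date pattern and a one-line "calendar file".
    pattern = r"(^|[^a-z])(mar|march)[. ]+16([^0-9]|$)"
    text = "mar 16\tlunch with the uucp crowd\n"
    compile_s, scan_s, matched = time_compile_and_scan(pattern, text)
    print("compile: %.6fs  scan: %.6fs  matched: %s"
          % (compile_s, scan_s, matched))
```

On a short input the compile step can easily dominate, which is the situation
described above for small calendar files.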

Also, "grep" has some options which "egrep" lacks, notably the "ignore
case" flag.

Conclusion:  The grep man page authors are right -- there isn't yet one
grep for all seasons.

	- Alan S. Watt