[comp.sources.d] Fastest `grep'?? Really?

wagner@iaoobelix.UUCP (06/21/87)

A few weeks ago, the "fastest grep around" was posted thru comp.sources. Well,
reading this announcement I was really interested in the preformance figures
of this nice program. So I just moved it over to a Sun 3/260 (SunOS 3.2) and
compiled it as it came with 'notes'.

Ahemm! Well!	**SURPRISE**

The results I obtained from just little playing around with this reportedly
fastest grep program were a bit :-) :-) disappointing. Below you find the
figures I got from doing a
	time grep "[0-9][a-z] [a-z]" /usr/man/cat2/* > /dev/null
with several grep versions. I say several versions (not just two) because
there are three of them:
	(a) the standard /bin/grep coming with UNIX,
	(b) the grep from comp.sources, and
	(c) the grep from an older posting of Ozan S. Yigit (utzoo!yetti!oz)
	    coming with the author's own version of the regexp package
	    (here, it is called re_exp).

Ok, here are the timing figures:
	(a) time: 6.7u + 0.9s = 60% of 0:12
	(b) time: 29.7u + 1.3s = 89% of 0:34
	(c) time: 6.5u + 1.0s = 70% of 0:10
Remember: It is a Sun 3/260, and I am the only user; nobody's distracting my
machine from grepping thru /usr/man/cat2...

Is there anybody out there who knows why some people called the program (a)
the "fastest grep around"?? Apparently, it is the factor four (in average)
slower even than the standard `grep'...

Juergen Wagner,		     (USENET) ...seismo!unido!iaoobel!wagner
("Gandalf")			 Fraunhofer Institute IAO, Stut7=EENV

jaw@aurora.UUCP (James A. Woods) (06/25/87)

The concocted example indeed makes grep spend much time searching for nothing.

Now, if you want to play the "worst case" game, tell me why, on a
vax 11/750 (bsd 4.3, fujitsu eagles), we get

	% time /bin/grep '....stuff' /usr/dict/words
	foodstuff
	22.2u 1.1s 0:44 
but
	% time grep '....stuff' /usr/dict/words
	foodstuff
	0.4u 0.4s 0:01

using my code.  Reasons for such disparity can be gleaned from the writeups
pep4grep.doc[1-4], bundled into the comp.sources shar.  As stated there,
cases can be "cooked" to show any of the algorithms in a bad light.

The newer [ef]?grep remains nonpareil for a vast majority of typical search
patterns, including but not limited to:

	expressions containing few or no metacharacters.
	    (thus obsoleting 'bm', 'match', etc.)
	complicated patterns with closures, but containing
	    nontrivial "must match" substring residue.
	meta-free kanji input.
	fgrep-style alternations of low cardinality.

[ef]?grep users who get strange results from the distributed eg.shell
test script, or have other suggestions, may write to:

ames!jaw

jaw@ames.arpa (James A. Woods) (06/25/87)

the unusual case mentioned, which does point out a performance gaffe, can
be helped.  until i can submit a patch, changing CTHRESH to be lower, say

	#define CTHRESH 3000

and the line

	if (!grepflag && pctmatch > PUNTPERCENT && file != NULL) {

to
	if (pctmatch > PUNTPERCENT && file != NULL) {

should aid such searches.

amos@nsta.UUCP (Amos Shapir) (06/25/87)

It probably depends on what you are grepping for. I tried grepping for a
simple word (i.e. no magic characters) through *.c on a source directory
- 50  files, 22k lines, 448Kb  total size - on  a vax 750 using  all the
4.3BSD  standard grep's  and the  new one.  Sys time  was about  2.2-2.3
seconds for all; execution (user) times were: fgrep - 20.4 seconds, grep
- 15.1, egrep - 11.6, and - hold on  to your seats - the new grep: 1.2 !
(yes, one-decimal-two).

Naturally, I was  just as fast installing  it. It seems the  new grep is
faster on  simple (and common)  searches; for more complicated  cases it
just exec's the appropriate old grep.
-- 
	Amos Shapir
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel  Tel. (972)52-522261
amos%nsta@nsc.com @{hplabs,pyramid,sun,decwrl} 34 48 E / 32 10 N