wagner@iaoobelix.UUCP (06/21/87)
A few weeks ago, the "fastest grep around" was posted thru comp.sources. Well, reading this announcement I was really interested in the preformance figures of this nice program. So I just moved it over to a Sun 3/260 (SunOS 3.2) and compiled it as it came with 'notes'. Ahemm! Well! **SURPRISE** The results I obtained from just little playing around with this reportedly fastest grep program were a bit :-) :-) disappointing. Below you find the figures I got from doing a time grep "[0-9][a-z] [a-z]" /usr/man/cat2/* > /dev/null with several grep versions. I say several versions (not just two) because there are three of them: (a) the standard /bin/grep coming with UNIX, (b) the grep from comp.sources, and (c) the grep from an older posting of Ozan S. Yigit (utzoo!yetti!oz) coming with the author's own version of the regexp package (here, it is called re_exp). Ok, here are the timing figures: (a) time: 6.7u + 0.9s = 60% of 0:12 (b) time: 29.7u + 1.3s = 89% of 0:34 (c) time: 6.5u + 1.0s = 70% of 0:10 Remember: It is a Sun 3/260, and I am the only user; nobody's distracting my machine from grepping thru /usr/man/cat2... Is there anybody out there who knows why some people called the program (a) the "fastest grep around"?? Apparently, it is the factor four (in average) slower even than the standard `grep'... Juergen Wagner, (USENET) ...seismo!unido!iaoobel!wagner ("Gandalf") Fraunhofer Institute IAO, Stut7=EENV
jaw@aurora.UUCP (James A. Woods) (06/25/87)
The concocted example indeed makes grep spend much time searching for nothing. Now, if you want to play the "worst case" game, tell me why, on a vax 11/750 (bsd 4.3, fujitsu eagles), we get % time /bin/grep '....stuff' /usr/dict/words foodstuff 22.2u 1.1s 0:44 but % time grep '....stuff' /usr/dict/words foodstuff 0.4u 0.4s 0:01 using my code. Reasons for such disparity can be gleaned from the writeups pep4grep.doc[1-4], bundled into the comp.sources shar. As stated there, cases can be "cooked" to show any of the algorithms in a bad light. The newer [ef]?grep remains nonpareil for a vast majority of typical search patterns, including but not limited to: expressions containing few or no metacharacters. (thus obsoleting 'bm', 'match', etc.) complicated patterns with closures, but containing nontrivial "must match" substring residue. meta-free kanji input. fgrep-style alternations of low cardinality. [ef]?grep users who get strange results from the distributed eg.shell test script, or have other suggestions, may write to: ames!jaw
jaw@ames.arpa (James A. Woods) (06/25/87)
the unusual case mentioned, which does point out a performance gaffe, can be helped. until i can submit a patch, changing CTHRESH to be lower, say #define CTHRESH 3000 and the line if (!grepflag && pctmatch > PUNTPERCENT && file != NULL) { to if (pctmatch > PUNTPERCENT && file != NULL) { should aid such searches.
amos@nsta.UUCP (Amos Shapir) (06/25/87)
It probably depends on what you are grepping for. I tried grepping for a simple word (i.e. no magic characters) through *.c on a source directory - 50 files, 22k lines, 448Kb total size - on a vax 750 using all the 4.3BSD standard grep's and the new one. Sys time was about 2.2-2.3 seconds for all; execution (user) times were: fgrep - 20.4 seconds, grep - 15.1, egrep - 11.6, and - hold on to your seats - the new grep: 1.2 ! (yes, one-decimal-two). Naturally, I was just as fast installing it. It seems the new grep is faster on simple (and common) searches; for more complicated cases it just exec's the appropriate old grep. -- Amos Shapir National Semiconductor (Israel) 6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel Tel. (972)52-522261 amos%nsta@nsc.com @{hplabs,pyramid,sun,decwrl} 34 48 E / 32 10 N