daleske@cbdkc1.UUCP (02/19/87)
Ok.  I've looked through all of the various places I could try.  A patch
came through for ispell which implements "L", a lookup option.  Based on
some ifdefs it would use "look" or "egrep", both of which use a dictionary
in /usr/dict.  The dictionary could be called "words" or "web2".

What package does "look" come in?  Is it public domain or part of some
product?  How does it function?

Thanks for any help!  I'll post a summary of responses.
___________________________________________________________________________
John Daleske     Columbus, Ohio.  614-860-4335  | Default disclaimer...
UUCP: {ihnp4,cbosgd,desoto}!cbdkc1!daleske      |
________________________________________________|____________________________
"Now," said the butterfly, "look closer and tell me what you see."
"I see a tiny horse with wings upon its back," said Flutterby.
"Why, that's me I see!  But what am I?"
"You are you.  Just as I am me!" said the wise old butterfly.
"Nothing more, nothing less."
campbell@maynard.UUCP (02/23/87)
In article <1916@cbdkc1.UUCP> daleske@cbdkc1.UUCP (John Daleske x4335 3S324W FRAN) writes:
>
>What package does "look" come in?  Is it public domain or part of some
>product?  How does it function?

"look" (like "dbm") comes with V7-derived UNIX systems, which includes
Berkeley UNIX and derivatives (i.e., Ultrix).  It seems to be missing
from System V, so I presume it's yet another utility AT&T forgot to
include in System V.  An excerpt from the man page follows:

     SYNOPSIS
          look [-df] string [ file ]

     DESCRIPTION
          look consults a sorted file and prints all lines that begin
          with string.  It uses binary search.

Note that look searches for a literal string, not a regular expression.
You could use grep, but for a sizable file look can be hundreds of times
faster.
-- 
Larry Campbell                                The Boston Software Works, Inc.
Internet: campbell@maynard.uucp               120 Fulton Street, Boston MA 02109
uucp: {alliant,wjh12}!maynard!campbell        +1 617 367 6846
ARPA: campbell%maynard.uucp@harvisr.harvard.edu        MCI: LCAMPBELL
straka@ihlpf.UUCP (02/25/87)
> look consults a sorted file and prints all lines that begin with string.
> It uses binary search.
> Note that look searches for a literal string, not a regular expression.
> You could use grep, but for a sizable file look can be hundreds of times
> faster.

What about using fgrep?
Or possibly grep "^string"?
-- 
Rich Straka     ihnp4!ihlpf!straka
campbell@maynard.UUCP (02/27/87)
In article <1145@ihlpf.ATT.COM> straka@ihlpf.ATT.COM (Straka) writes:
>> look consults a sorted file and prints all lines that begin with string.
>> It uses binary search.
>> Note that look searches for a literal string, not a regular expression.
>> You could use grep, but for a sizable file look can be hundreds of times
>> faster.
>
>What about using fgrep?
>Or possibly grep "^string"?

What about it?  look is still hundreds of times faster.
-- 
Larry Campbell                                The Boston Software Works, Inc.
Internet: campbell@maynard.uucp               120 Fulton Street, Boston MA 02109
uucp: {alliant,wjh12}!maynard!campbell        +1 617 367 6846
ARPA: campbell%maynard.uucp@harvisr.harvard.edu        MCI: LCAMPBELL
ark@alice.UUCP (02/28/87)
In article <1145@ihlpf.ATT.COM>, straka@ihlpf.UUCP writes:
> > look consults a sorted file and prints all lines that begin with string.
> > It uses binary search.
> > Note that look searches for a literal string, not a regular expression.
> > You could use grep, but for a sizable file look can be hundreds of times
> > faster.
> 
> What about using fgrep?
> Or possibly grep "^string"?

The run time of any program in the grep family increases linearly with
the size of the file being searched (with some exceptions for complicated
patterns in grep).  Look, on the other hand, doesn't touch the entire
file.  That is, after all, the essence of binary search.  So the run time
increases only logarithmically with the size of the file.

That difference is nothing to sneeze at.  Consider a file with a million
records as an example: a program that can find any record with only 20
probes into the file is going to beat almost any linear search you can
imagine.
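To make the probe counting concrete, here is a rough sketch (written for
this posting, not the actual look source) of how a look-style search can
binary-search a sorted text file directly by byte offset: seek to the
middle, skip forward to the next line boundary, compare, and halve the
range.  It prints the run of lines beginning with a prefix and ignores
look's -d/-f folding options.

    /*
     * A sketch of the idea behind look(1): binary-search a sorted text
     * file by byte offset.  Not the real look source -- the -d and -f
     * folding options and many niceties are omitted.
     */
    #include <stdio.h>
    #include <string.h>

    /* Seek to off, skip ahead to the start of the next line (unless off
     * is 0), and read that line into buf.  Returns 0 at end of file. */
    static int line_at(FILE *fp, long off, char *buf, int len)
    {
        int c;

        if (fseek(fp, off, SEEK_SET) != 0)
            return 0;
        if (off != 0) {
            while ((c = getc(fp)) != EOF && c != '\n')
                ;
            if (c == EOF)
                return 0;
        }
        return fgets(buf, len, fp) != NULL;
    }

    int main(int argc, char **argv)
    {
        const char *file;
        char line[BUFSIZ];
        long lo = 0, hi, mid;
        size_t plen;
        int found = 0;
        FILE *fp;

        if (argc < 2) {
            fprintf(stderr, "usage: %s prefix [sorted-file]\n", argv[0]);
            return 2;
        }
        file = argc > 2 ? argv[2] : "/usr/dict/words";
        plen = strlen(argv[1]);
        if ((fp = fopen(file, "r")) == NULL) {
            perror(file);
            return 2;
        }
        fseek(fp, 0L, SEEK_END);
        hi = ftell(fp);

        /* Each probe is one seek plus one short read, so even a large
         * dictionary needs only a couple of dozen probes. */
        while (hi - lo > 1) {
            mid = lo + (hi - lo) / 2;
            if (!line_at(fp, mid, line, (int)sizeof line) ||
                strncmp(line, argv[1], plen) >= 0)
                hi = mid;       /* probe line is at or past the prefix */
            else
                lo = mid;       /* probe line is still before it */
        }

        /* The first candidate is either the very first line of the file
         * (if lo never moved) or the first full line past offset hi. */
        if (lo == 0 && line_at(fp, 0L, line, (int)sizeof line) &&
            strncmp(line, argv[1], plen) == 0)
            found = 1;
        if (!found && line_at(fp, hi, line, (int)sizeof line) &&
            strncmp(line, argv[1], plen) == 0)
            found = 1;

        /* Print the run of lines that begin with the prefix. */
        if (found)
            do
                fputs(line, stdout);
            while (fgets(line, sizeof line, fp) != NULL &&
                   strncmp(line, argv[1], plen) == 0);

        fclose(fp);
        return found ? 0 : 1;
    }

For the million-record file above, that is about 20 probes, each touching
a single line; a sequential grep has to read every record.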
straka@ihlpf.UUCP (03/04/87)
In article <846@maynard.BSW.COM>, campbell@maynard.BSW.COM (Larry Campbell) writes:
> >> It uses binary search.
> >> Note that look searches for a literal string, not a regular expression.
> >> You could use grep, but for a sizable file look can be hundreds of times
> >> faster.
> >What about using fgrep?
> >Or possibly grep "^string"?
> What about it?  look is still hundreds of times faster.

Yes, I apologize.  I agree; it was simply a case of being asleep at the
terminal.  Binary searches of large sorted data are certainly faster than
any simple sequential search.

Moral: Have enough caffeine in your veins before responding to postings!
-- 
Rich Straka     ihnp4!ihlpf!straka
solomon@crystal.UUCP (03/07/87)
Look is much faster than any of the greps, since it assumes the file is
sorted and uses binary search, for a run time proportional to the log of
the number of lines rather than the number of lines (16 vs 24K in the
case of /usr/dict/words).

What's more surprising is the relative speeds of the various greps.  I
ran a few tests (4.3BSD on an unloaded VAX 780) and reproduce the data
below.  Fgrep is always the slowest (it really should be called 'dgrep'
for 'dumb-grep').  Egrep is always the fastest.

A few years ago, I had a visit from Al Aho (author of egrep).  He was
quite proud of egrep (as well he should be).  He had me type on my
terminal

	time wc /usr/dict/words
	time egrep -c '^' /usr/dict/words

Egrep was the clear winner.  He said he considered writing the inner loop
in assembler, but decided he'd gone far enough.  In most cases, the
program is I/O bound anyhow.

If only egrep understood all the options of the others (-w and -i from
grep, and -x from fgrep), we could dump the others.  (The comment in the
man page about exponential behavior is a red herring.  The exponential
behavior is a theoretical worst case that never really comes up in
practice, and besides, it is the *size* of the recognizer that is
exponential in the length of the *pattern*, which is usually quite
short.)

Here are my test results.  Warning: these figures are for comparison
only; actual mileage may vary.

first word
+ time grep -l a /usr/dict/words
        4.8 real         4.1 user         0.3 sys
+ time fgrep -l a /usr/dict/words
        0.2 real         0.0 user         0.1 sys
+ time egrep -l a /usr/dict/words
        0.2 real         0.0 user         0.1 sys
+ time grep -l ^a /usr/dict/words
        4.7 real         3.7 user         0.4 sys
+ time egrep -l ^a /usr/dict/words
        0.2 real         0.0 user         0.1 sys
+ time look a
        1.1 real         0.4 user         0.2 sys

middle word
+ time grep -l jade /usr/dict/words
        4.1 real         3.4 user         0.3 sys
+ time fgrep -l jade /usr/dict/words
        3.0 real         2.4 user         0.3 sys
+ time egrep -l jade /usr/dict/words
        2.7 real         1.3 user         0.3 sys
+ time grep -l ^jade /usr/dict/words
        4.4 real         3.7 user         0.3 sys
+ time egrep -l ^jade /usr/dict/words
        1.8 real         1.2 user         0.3 sys
+ time look jade
        0.4 real         0.0 user         0.1 sys

last word
+ time grep -l zygote /usr/dict/words
        4.0 real         3.4 user         0.3 sys
+ time fgrep -l zygote /usr/dict/words
        5.7 real         4.8 user         0.4 sys
+ time egrep -l zygote /usr/dict/words
        3.6 real         2.5 user         0.5 sys
+ time grep -l ^zygote /usr/dict/words
        5.1 real         3.7 user         0.3 sys
+ time egrep -l ^zygote /usr/dict/words
        4.1 real         2.5 user         0.4 sys
+ time look zygote
        0.2 real         0.0 user         0.1 sys

missing word
+ time grep -l 2 /usr/dict/words
        4.1 real         3.3 user         0.4 sys
+ time fgrep -l 2 /usr/dict/words
        5.6 real         4.8 user         0.4 sys
+ time egrep -l 2 /usr/dict/words
        3.7 real         2.5 user         0.5 sys
+ time grep -l ^2 /usr/dict/words
        4.7 real         3.7 user         0.4 sys
+ time egrep -l ^2 /usr/dict/words
        3.3 real         2.5 user         0.4 sys
+ time look 2
        0.5 real         0.0 user         0.2 sys
-- 
Marvin Solomon
Computer Sciences Department
University of Wisconsin, Madison WI
solomon@gjetost.wisc.edu or seismo!uwvax!solomon
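A footnote on what makes the egrep inner loop so tight: an egrep-style
recognizer compiles the pattern into a deterministic automaton, so the
scan does one table transition per input byte.  The sketch below is an
illustration only, not egrep's code; it handles just a literal pattern
(egrep handles full regular expressions and does its own block
buffering), but it shows the shape of the loop.  The blowup the man page
warns about is in the number of states such an automaton can need for a
nasty pattern, not in the time spent scanning the input.

    /*
     * Sketch of a table-driven matcher for a literal pattern: the inner
     * loop is one table lookup per input byte, which is the general
     * flavor of what makes egrep-style searchers fast.  Illustrative
     * only; not taken from egrep.
     */
    #include <stdio.h>
    #include <string.h>

    #define ALPHA 256

    int main(int argc, char **argv)
    {
        static int dfa[ALPHA][128];   /* dfa[byte][state] -> next state */
        const char *pat;
        size_t m, j;
        int c, x, state, lineno = 1, hit = 0;

        if (argc != 2) {
            fprintf(stderr, "usage: %s pattern < file\n", argv[0]);
            return 2;
        }
        pat = argv[1];
        m = strlen(pat);
        if (m == 0 || m >= 128) {
            fprintf(stderr, "pattern length must be 1..127\n");
            return 2;
        }

        /* Knuth-Morris-Pratt style automaton construction: x tracks the
         * state the machine would be in had we restarted one character
         * later, so a mismatch never has to back up in the input. */
        x = 0;
        dfa[(unsigned char)pat[0]][0] = 1;
        for (j = 1; j < m; j++) {
            for (c = 0; c < ALPHA; c++)
                dfa[c][j] = dfa[c][x];        /* mismatch: restart state */
            dfa[(unsigned char)pat[j]][j] = (int)j + 1;  /* match: advance */
            x = dfa[(unsigned char)pat[j]][x];
        }

        /* The inner loop: one array reference per byte of input. */
        state = 0;
        while ((c = getchar()) != EOF) {
            if (c == '\n') {
                if (hit)
                    printf("match on line %d\n", lineno);
                hit = 0;
                state = 0;
                lineno++;
                continue;
            }
            state = dfa[c][state];
            if ((size_t)state == m) {
                hit = 1;
                state = 0;      /* keep scanning the rest of the line */
            }
        }
        return 0;
    }

Fed /usr/dict/words on stdin with the pattern as its argument, it reports
the lines containing the pattern; the point is only that the per-byte
work is a single table reference and a compare.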
reid@decwrl.DEC.COM (Brian Reid) (03/08/87)
In article <271@crys.WISC.EDU> solomon@crys.WISC.EDU (Marvin Solomon) writes:
>
>Look is much faster than any of the greps, since ...
>Fgrep is always the slowest ...
>Egrep is always the fastest.

Hi, Marvin.  I'm always glad to see actual measured data on the network.
But clearly you don't have the James Woods (ames!jaw) grep.  At our site
we call it ngrep.  Here are the times for ngrep.

Your machine CRYS.WISC.EDU is a VAX 780; I have a 785.  Therefore I
re-ran all of your tests, to avoid having to convert the numbers between
dissimilar machines.  Here is my output of your tests.  Note that in
every case, ngrep is faster than egrep by a huge amount in CPU time and
by a significant amount in real time.

$ time grep -l a /usr/dict/words
        4.2 real         2.9 user         0.4 sys
$ time fgrep -l a /usr/dict/words
        0.5 real         0.0 user         0.1 sys
$ time egrep -l a /usr/dict/words
        0.2 real         0.0 user         0.1 sys
$ time egrep -l \^a /usr/dict/words
        0.2 real         0.0 user         0.1 sys
$ time ngrep -l a /usr/dict/words
        0.3 real         0.0 user         0.1 sys
$ time ngrep -l a /usr/dict/words
        0.2 real         0.0 user         0.1 sys
$
$
$ time grep -l jade /usr/dict/words
        3.7 real         2.3 user         0.4 sys
$ time fgrep -l jade /usr/dict/words
        2.5 real         1.5 user         0.3 sys
$ time egrep -l jade /usr/dict/words
        1.8 real         0.8 user         0.3 sys
$ time egrep -l \^jade /usr/dict/words
        2.5 real         0.9 user         0.3 sys
$ time ngrep -l jade /usr/dict/words
        1.0 real         0.1 user         0.2 sys
$ time ngrep -l \^jade /usr/dict/words
        2.3 real         0.3 user         0.8 sys
$
$
$ time grep -l zygote /usr/dict/words
        3.9 real         2.2 user         0.5 sys
$ time fgrep -l zygote /usr/dict/words
        4.5 real         3.1 user         0.5 sys
$ time egrep -l zygote /usr/dict/words
        3.3 real         1.8 user         0.4 sys
$ time egrep -l \^zygote /usr/dict/words
        3.1 real         1.7 user         0.4 sys
$ time ngrep -l zygote /usr/dict/words
        1.9 real         0.1 user         0.4 sys
$ time ngrep -l \^zygote /usr/dict/words
        2.7 real         0.2 user         0.8 sys
$

However, that's just a VAX.  If you want to see ngrep really sing, watch
while I run it on DECWRL's "Titan" machine, which is about 15 times the
speed of a 780:

$ time ngrep -l \^Zurich /usr/dict/words
        3.7 real         0.0 user         1.5 sys

The real time is slower because my Titan has only one disk, and the disk
arms must move back and forth between /usr/dict and /bin/ngrep.  I can
reduce the effect of having only one disk arm by using a longer file to
search:

$ ls -l /usr/dict/words
-rw-r--r--  1 root       201314 Nov  5 10:43 /usr/dict/words
$ ls -l /usr/dict/web2
-rw-r--r--  1 reid      2498185 Aug 13  1985 /usr/dict/web2
$ tail -1 /usr/dict/web2
Zyzzogeton
$ time egrep -l Zyzzogeton /usr/dict/web2
/usr/dict/web2
       10.3 real         5.1 user         2.5 sys
$ time ngrep -l Zyzzogeton /usr/dict/web2
/usr/dict/web2
        7.8 real         0.4 user         2.3 sys

Conclusion: be careful with claims about how fast egrep is, since ngrep
is almost always significantly faster.
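As I understand it, the Woods grep gets its speed on literal patterns
from a Boyer-Moore style scan, which skips over most of the input rather
than examining every byte; that is how user time can drop to nearly zero
while the job stays I/O bound.  A bare-bones Horspool variant, written
here only to show the shape of the idea and not taken from ngrep, looks
like this:

    /*
     * Bare-bones Boyer-Moore-Horspool literal search over a buffer.
     * Illustrative only: the fast greps layer several algorithms (and
     * full regular expressions) on top of this kind of scan.
     */
    #include <stdio.h>
    #include <string.h>

    /* Return a pointer to the first occurrence of pat in buf[0..n-1],
     * or NULL.  The skip table lets the scan jump ahead by up to the
     * pattern length on a mismatch, so most bytes are never examined. */
    static const char *bmh_search(const char *buf, size_t n,
                                  const char *pat, size_t m)
    {
        size_t skip[256], i, j;

        if (m == 0 || m > n)
            return NULL;
        for (i = 0; i < 256; i++)
            skip[i] = m;                  /* default: jump a whole pattern */
        for (i = 0; i + 1 < m; i++)
            skip[(unsigned char)pat[i]] = m - 1 - i;

        i = 0;
        while (i + m <= n) {
            for (j = m; j > 0 && buf[i + j - 1] == pat[j - 1]; j--)
                ;                         /* compare right to left */
            if (j == 0)
                return buf + i;
            i += skip[(unsigned char)buf[i + m - 1]];
        }
        return NULL;
    }

    int main(void)
    {
        const char text[] = "abecedarian\nabeyance\nzygote\n";
        const char *hit = bmh_search(text, strlen(text), "zygote", 6);

        if (hit != NULL)
            printf("found at offset %ld\n", (long)(hit - text));
        return 0;
    }

The right-to-left compare plus skip table is why most bytes of a 200KB
dictionary are never even looked at, and the search becomes almost
entirely I/O.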
ajs@hpfcdt.HP.COM (Alan Silverstein) (03/13/87)
Re: look(1)

If you want a nice routine to do the same thing look does -- bsearchstr --
I had it posted to mod.sources a while back.  It also understands
case-sensitive (sort) versus case-insensitive (sort -f) searches.

Alan Silverstein
cire@hpihoah.UUCP (03/17/87)
the key assumption is the file is already sorted so a binary search
works.  this is generally not true.

to be true one then needs to include the time needed to sort files
of the size we are talking about.  fgrep wins.
chris@mimsy.UUCP (03/21/87)
In article <4670001@hpihoah.HP.COM> cire@hpihoah.HP.COM (Eric B. Decker) writes:
>the key assumption is the file is already sorted so a binary search
>works.  this is generally not true.

`Look' operates on sorted dictionaries.  It is not `generally not true'
*in the cases for which the program was designed*.

>to be true one then needs to include the time needed to sort files
>of the size we are talking about.  fgrep wins.

Only if the file changes often: If not, it is better first to sort it.
In any case, fgrep does not win; see James A. Woods's article
<706@ames.UUCP>, `Get Hep to Kanji-Ready Five-Algorithm [ef]?grep' for
details.  If you have only the standard family of greps, use egrep.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu