mlandau@bbn.com (Matt Landau) (12/30/88)
In comp.sys.sun (<2359@kalliope.rice.edu>), Our Moderator, in response to
some questions about find, says:

>Uhhh, I think you both came from the twilight zone. Neither "find
>filename" nor "find '*filename*'" will have much effect. The first
>argument(s) to find are the directories in which to start the recursive
>search.

Actually, if you give find a single non-option argument, it will look in
/usr/lib/find/find.codes for strings that contain that argument as a
substring. The find.codes file is essentially a pre-cached view of the
filesystem space, arranged for very fast lookup.

The trick is knowing how to build /usr/lib/find/find.codes, since there is
absolutely no documentation on it! However, if you look in the
/usr/lib/find directory, you'll find a csh script called "updatedb", which
builds find.codes for you. I arrange to have it run every night from cron,
after which I can go to some server and say "find core" to find every file
whose name contains the string "core" without waiting for find to traverse
all the filesystems.

Updatedb only works on type 4.2 filesystems, so you have to run it on each
of your servers, and it only builds a cache for 4.2 filesystems, so you
have to do "find string" on each server to find all instances of what
you're looking for. In spite of that, it's a big win over waiting for find
to walk 3 gigabytes of disk every time you want to hunt something down.

[[ Yeah, I kind of blew it. I was unaware of the "fast find" code that had
been incorporated in 4.3. I've been so busy using Suns that I have not
been able to keep up with the rest of the Unix world. Sun did indeed
incorporate the fast find code in their distributions of "find", but they
could not come up with an easy way of making the updatedb stuff work
properly in a distributed (NFS) environment. So, they never documented
that usage of find and never set up "updatedb" in crontab. That's why I
got confused. --wnl ]]
jaw@eos.arc.nasa.gov (James A. Woods) (12/30/88)
Under BSD 4.3 Unix, "find filename" is short for

    find / -name '*filename*' -print

(by far the most common usage of "find"), except that it runs in seconds
rather than minutes. It is documented in the manual page.

I wrote this code years ago; it was actually part of BSD 4.2, but got lost
in the shuffle at Sun when they went the SVID route. Though undocumented
in the manual page, it half-way works under SunOS 4.0, after first
installing a nightly call to 'updatedb' from crontab.

Unfortunately, as distributed, the compressed filename database is not
portable across architectures because of byte-order problems with calls to
putw()/getw(). [In the days of the DEC PDP, these word writes permitted
the code to work on both DEC architectures -- NFS, with disparate machines
reading one database, didn't yet exist.] However, the Sun 4 / Sun 3
'fastfind' problem is easily fixed by replacing the reference to

    c = getw(fp)

in find.c with something like:

    c = getc (fp);
    c = ((unsigned char) c << 8) | getc(fp);

plus a similar change to putw() in code.c.

As to the syntax issues, it could be argued that filename matching "glob
style" should be like 'egrep' rather than 'sh' -- this takes somewhat more
work. Another area for improvement is database build time, now slow partly
due to the use of 'awk'.

Finally, the largest crime committed in my design of five-year-old ffind
is that it is not "eight-bit clean" for international character sets. I
may remedy this someday (at the slight expense of compression efficiency),
and donate the resulting code to the GNU project, unless someone has
already done such.

James A. Woods (ames!jaw)
NASA Ames Research Center

[[ My thanks to other readers who have also pointed out my gaffe. Good to
see that everyone is still on their toes! --wnl ]]
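The byte-order fix described above can be sketched as a pair of helpers
that always move the high byte first, so a database written on one
architecture reads back the same on another. This is only an illustration
under stated assumptions: the names put_be16 and get_be16 are hypothetical
and do not appear in find.c or code.c, and the sketch assumes that a 16-bit
value is all the database needs per word, as the two-getc() replacement
suggests.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helpers (not from find.c or code.c).
 * Write the low 16 bits of v, high byte first.
 * Returns 0 on success, EOF on a write error.
 */
int put_be16(int v, FILE *fp)
{
    assert(fp != NULL);
    if (putc((v >> 8) & 0xff, fp) == EOF)
        return EOF;
    if (putc(v & 0xff, fp) == EOF)
        return EOF;
    return 0;
}

/*
 * Read two bytes, high byte first, independent of host byte order.
 * Returns a value in 0..65535, or EOF if either byte is missing.
 */
int get_be16(FILE *fp)
{
    int hi, lo;

    assert(fp != NULL);
    hi = getc(fp);
    if (hi == EOF)
        return EOF;
    lo = getc(fp);
    if (lo == EOF)
        return EOF;
    return (hi << 8) | lo;
}
```

Unlike the two-line replacement quoted above, this version checks each
getc() for EOF before combining the bytes, so a truncated database is
reported rather than folded into a bogus value.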
ndd@sunbar.mc.duke.edu (Ned Danieley) (12/30/88)
(To the theme from "The Twilight Zone") Do de do do, do de do do.

It turns out that, at least under 3.5 [[ as early as 3.2, actually --wnl ]],
'find' allowed (but Sun did not document) the 4.3 behaviour of

    find filename

which depends on running /usr/lib/find/updatedb periodically; this sets up
a database of names, allowing 'find' to work VERY quickly. I found this
about a year ago, and told Sun; I think it even made it into an STB.

Note that /usr/lib/find exists under 4.0, but that the man page still
doesn't mention it, and that

    find filename

only works as

    find '*filename*'

As jbm says, Sun knows about it, and has acknowledged that it is a bug. It
seems to work under Sys4-3.2, so you probably could get 'find' from that
release and have it work.

Ned Danieley (ndd@sunbar.mc.duke.edu)
Basic Arrhythmia Laboratory
Box 3140, Duke University Medical Center
Durham, NC 27710
(919) 684-6807 or 684-6942
guy@uunet.uu.net (Guy Harris) (01/12/89)
>I wrote this code years ago; it was actually part of BSD 4.2, but got lost
>in the shuffle at Sun when they went the SVID route.

4.2 or 4.3? It was added into SunOS 3.2 when the 4.3BSD "find" stuff was
merged into the S5R2 "find" to make the 3.2 "find". (I know that for a
fact; I did the merging.)

>Though undocumented in the manual page,

...because the claim that

    find <file>

will find all files whose names match "<file>" is an assertion about the
local system administrator's policy as much as it is a statement about the
behavior of "find"; Sun wasn't in a position to control the former,
especially given that the "fast find" stuff doesn't scale in an
immediately obvious way for NFS. (Has anybody actually tried

    1) putting the appropriate "crontab" entry in on *all* machines on a
       network with many diskless workstations and

    2) changing "updatedb" not to stop at NFS mount points?

If so, how much of a load does it impose?)

[[ Which is precisely why Sun neither documented fast find nor put
updatedb in crontab for their distributions. --wnl ]]

It's also not clear how it should work if you use the automounter....

>Finally, the largest crime committed in my design of five-year old
>ffind is that it is not "eight-bit clean" for international
>character sets.

Which is a problem in SunOS 4.0, which supports 8-bit characters in file
names (such as the symlink "/UNIX(R)" that I had to "/vmunix", where "(R)"
is the ISO Latin #1 "registered trademark symbol" character).

[[ The Unix kernel has always supported 8-bit characters in file names (I
*know* that BSD 4.1 did, and I think that pretty much every version of BSD
and Bell Unix did). It's just that certain shells stepped on the eighth
bit for their own devious reasons. But in C you've always been able to do
'creat("A\302C\304");'. --wnl ]]
david@sun.com (01/13/89)
In article <12397@silica.BBN.COM> mlandau@bbn.com (Matt Landau) writes:
>Updatedb only works on type 4.2 filesystems, so you have to run it on each
>of your servers, and it only builds a cache for 4.2 filesystems, so you
>have to do "find string" on each server to find all instances of what
>you're looking for.

Well, not really. Updatedb is a (pretty simple) shell script and you can
make it do whatever you want. For example, I have a diskful workstation
but my home directory is on a server. Here's the updatedb I use; I only
run it once a week, but actually it isn't that big a load on the server...

#!/bin/csh -f
#
# @(#)updatedb.csh 1.1 86/07/08 SMI; from UCB 4.6 85/04/22
#
set SRCHPATHS = ( / /usr )                # directories to be put in the database
set EXCLUDE = '^/tmp|^/dev|^/usr/tmp'     # directories to exclude
set NFSPATHS = ~david                     # NFS directories
set NFSUSER = daemon                      # userid for NFS find
set LIBDIR = /usr/lib/find                # for subprograms
set FINDHONCHO = root                     # for error messages
set FCODES = $LIBDIR/find.codes           # the database

set path = ( $LIBDIR /usr/ucb /bin /usr/bin )
set bigrams = /tmp/f.bigrams$$
set filelist = /tmp/f.list$$
set errs = /tmp/f.errs$$

# Make a file list and compute common bigrams.
# Alphabetize '/' before any other char with 'tr'.
# If the system is very short of sort space, 'bigram' can be made
# smarter to accumulate common bigrams directly without sorting
# ('awk', with its associative memory capacity, can do this in several
# lines, but is too slow, and runs out of string space on small machines).

nice +6
( find ${SRCHPATHS} -xdev -print ; \
  su $NFSUSER -c "find ${NFSPATHS} -xdev -print" -f ) | \
    egrep -v "$EXCLUDE" | \
    tr '/' '\001' | \
    (sort -f; echo $status > $errs) | \
    tr '\001' '/' > $filelist
$LIBDIR/bigram <$filelist | \
    (sort; echo $status >> $errs) | uniq -c | sort -nr | \
    awk '{ if (NR <= 128) print $2 }' | tr -d '\012' > $bigrams

if { grep -s -v 0 $errs } then
    echo "Subject: updatedb failed on `hostname`" | \
        /bin/mail $FINDHONCHO
    exit 1
endif

# code the file list
$LIBDIR/code $bigrams < $filelist > $FCODES
chmod 644 $FCODES
rm -f $bigrams $filelist $errs
exit 0

--
David DiGiacomo, Sun Microsystems, Mt. View, CA
sun!david david@sun.com