[comp.unix.questions] Dealing with large directories

cosell@PROPHET.BBN.COM (Bernie Cosell) (06/29/87)

I'd appreciate some shell-programming techniques for dealing with
large directories.  What would be really optimal would be techniques
that are effective enough that one could use them all the time and so
"bulletproof" one's shell files against running into a directory with
500 files in it or something like that.  Things like "[a-m]*/[n-z]*"
in two runs only work AFTER you get blown out of the water and you go
back and tune to the particular distribution to divide the dir up
correctly.  Are there tricks for making this sort of thing not be a
problem? 

Thanks
  /Bernie\

Bernie Cosell                       Internet:  cosell@bbn.com
Bolt, Beranek & Newman, Inc         USENET:    harvard!bbn.com!cosell
Cambridge, MA   02238               Telco:     (617) 497-3503

wescott@sauron.UUCP (06/29/87)

In article <8079@brl-adm.ARPA> cosell@PROPHET.BBN.COM (Bernie Cosell) writes:
> I'd appreciate some shell-programming techniques for dealing with large
> directories ...  Things like "[a-m]*/[n-z]*" in two runs only work AFTER you
> get blown out of the water.  Are there tricks for making this sort of thing
> not be a problem? 

Try xargs(1).  You'll find it in SysV and in mod.sources archives.  xargs(1)
takes a list of whatever on stdin and passes them as arguments to a program
(supplied as the first arg to xargs).

For example:

	find . -type f -print | xargs dosomething -x

ends up executing

	dosomething -x file1 file2 ...

on the files in the current working directory and all subdirectories,
passing as many names per invocation as will fit in one argument list.
This is much more efficient than

	find . -type f -exec dosomething -x "{}" ";"

which starts a separate dosomething process for each file.
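
A minimal sketch of the batching, for anyone who wants to see it happen.
Here echo stands in for dosomething, and the names a through e are made up.
The -n 2 option caps each invocation at two arguments only so that the
grouping is visible; without it xargs packs in as many names as it can.

```shell
# echo stands in for the hypothetical "dosomething -x".
# -n 2 limits each invocation to two arguments so the batches show up.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo run:)
echo "$batches"
```

This prints three lines: "run: a b", "run: c d", and "run: e", that is,
three invocations for five names instead of five.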

Filtering the output of find through sed or grep and/or using ls to
generate filenames can let you be more specific in the criteria used.
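
A rough sketch of that kind of filtering, applied to a single directory.
The scratch directory, the file names, and the use of mktemp(1) are all
assumptions made for illustration, and echo stands in for the real command.
Because the names travel through a pipe rather than a "*" on the command
line, the size of the directory no longer matters.  It does assume that no
file name contains blanks, quotes, or newlines, which xargs would split on.

```shell
# Hypothetical scratch directory with a few files, for illustration only.
dir=$(mktemp -d)
touch "$dir/a.c" "$dir/b.c" "$dir/notes.txt"

# ls generates the names, grep keeps only the .c files, and xargs
# hands the survivors to the command (echo here) in batches.
selected=$(cd "$dir" && ls | grep '\.c$' | xargs echo)
echo "$selected"

# Clean up the scratch directory.
rm -rf "$dir"
```

With the three files above this prints "a.c b.c".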

Another example of xargs use:

	find . -type f -print | xargs file |  \
		grep 'text$' | sed 's/:^I.*//' > flist

This makes a list of all files that file(1) considers to be "text" of
one form or another.  (The ^I in the sed command is a literal tab.)

-- 
	-Mike Wescott
	 wescott@ncrcae.Columbia.NCR.COM