[comp.unix.internals] Why is find so slow

flee@guardian.cs.psu.edu (Felix Lee) (10/10/90)

>I squirreled away a little program called 'descend' that does the 
>moral equivalent of a 'find . -print', except rather fast.

"descend" is fast because it recognizes leaf directories and avoids
stat()ing the files in that directory.  This is usually a big win,
since most files tend to be in leaf directories.

"find" can't do this in general, since most of its predicates require
stat()ing each file, but it wouldn't be too hard to add lazy stat()ing
to find.  And it may even be worth it.
--
Felix Lee	flee@cs.psu.edu

chris@mimsy.umd.edu (Chris Torek) (10/10/90)

In article <F6.rhxm2@cs.psu.edu> flee@guardian.cs.psu.edu (Felix Lee) writes:
>"descend" is fast because it recognizes leaf directories and avoids
>stat()ing the files in that directory.  This is usually a big win,
>since most files tend to be in leaf directories.
>
>"find" can't do this in general, since most of its predicates require
>stat()ing each file, but it wouldn't be too hard to add lazy stat()ing
>to find.  And it may even be worth it.

Quite likely.

4.3BSD-reno's `find' reimplementation (a redistributable version) uses
the C library `fts' routines (from POSIX) which take flags indicating
whether `stat's are desired.  If you say `no', it stats only when this
is required to search the directory.  The new find sets the `I need
stat information' only when at least one predicate requires it.  The
result:

	% cd /usr/src/local/games
	% time find . name obj
	./umoria/obj
	0.3u 0.8s 0:01 91% 35+56k 0+0io 2pf+0w
	% time find . type l
	./umoria/obj
	0.5u 5.2s 0:09 63% 42+66k 94+1io 3pf+0w
	% 

It makes a pretty big difference.  If find said `no stat' and did the
stats only when some predicate actually required it, that would help
make things like

	find . \( name foo or name bar \) and type l

run much faster.  Uncommon?  Not really:

	find / \( fstype local or prune \) \( \
		   \( name '[#,]*'				atime +1 \) \
		or \( \( name '*.bak' or name '.*.bak' \)	atime +3 \) \
		or \( name '.emacs_[0-9]*'			atime +7 \) \
		or \( name core						 \) \
	\) print exec /bin/rm -f {} \;

Incidentally, yes, find still accepts the old syntax (find . -name ...).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

pt@geovision.uucp (Paul Tomblin) (10/10/90)

flee@guardian.cs.psu.edu (Felix Lee) writes:
>"descend" is fast because it recognizes leaf directories and avoids
>stat()ing the files in that directory.  This is usually a big win,
>since most files tend to be in leaf directories.

>"find" can't do this in general, since most of its predicates require
>stat()ing each file, but it wouldn't be too hard to add lazy stat()ing
>to find.  And it may even be worth it.

What happened to "ftw", that was discussed at Usenix a few years ago?

Is it getting into unix distributions?  Are programs like "find" and 
"ls -R" being re-implemented using it, like they're supposed to?

It sure sounded like it was a cure for what ails find when it was
described.
-- 
Paul Tomblin, Department of Redundancy Department.    ! My employer probably 
I'm not fat..... I'm metabolically challenged.        ! does not agree with my
I'm not underpaid... I'm financially challenged.      ! opinions.... 
nrcaer!cognos!geovision!pt or uunet!geovision!pt      ! Me neither.

bhoughto@cmdnfs.intel.com (Blair P. Houghton) (10/11/90)

In article <1240@geovision.UUCP> pt@geovision.uucp (Paul Tomblin) writes:
>
>What happened to "ftw", that was discussed at Usenix a few years ago?

Someone mentioned it several days ago, but you know the net...

It exists under Ultrix 3.1, so it's been around for a while.

				--Blair
				  "Gnu?"

flee@dictionopolis.cs.psu.edu (Felix Lee) (10/12/90)

>What happened to "ftw", that was discussed at Usenix a few years ago?

ftw() as spec'd in System V.3 and SunOS 4.1 doesn't special-case leaf
directories, so you can't implement lazy stat()ing.
--
Felix Lee	flee@cs.psu.edu

mercer@npdiss1.StPaul.NCR.COM (Dan Mercer) (10/16/90)

In article <26919@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
:In article <F6.rhxm2@cs.psu.edu> flee@guardian.cs.psu.edu (Felix Lee) writes:
:>"descend" is fast because it recognizes leaf directories and avoids
:>stat()ing the files in that directory.  This is usually a big win,
:>since most files tend to be in leaf directories.
:>
:>"find" can't do this in general, since most of its predicates require
:>stat()ing each file, but it wouldn't be too hard to add lazy stat()ing
:>to find.  And it may even be worth it.
:
:Quite likely.
:
:4.3BSD-reno's `find' reimplementation (a redistributable version) uses
:the C library `fts' routines (from POSIX) which take flags indicating
:whether `stat's are desired.  If you say `no', it stats only when this
:is required to search the directory.  The new find sets the `I need
:stat information' only when at least one predicate requires it.  The
:result:

Maybe I'm dense,  but how does find know whether a file is a directory
(and should be descended) or another type of file without stat()ing the
file?



-- 

Dan Mercer
Reply-To: mercer@npdiss1.StPaul.NCR.COM (Dan Mercer)
"MAN - the only one word oxymoron in the English Language"

mercer@npdiss1.StPaul.NCR.COM (Dan Mercer) (10/16/90)

In article <F&gya5o2@cs.psu.edu> flee@dictionopolis.cs.psu.edu (Felix Lee) writes:
:>What happened to "ftw", that was discussed at Usenix a few years ago?
:
:ftw() as spec'd in System V.3 and SunOS 4.1 doesn't special-case leaf
:directories, so you can't implement lazy stat()ing.
:--
:Felix Lee	flee@cs.psu.edu


Sorry,  missed the beginning of this thread - what is lazy stat()ing?


-- 

Dan Mercer
Reply-To: mercer@npdiss1.StPaul.NCR.COM (Dan Mercer)
"MAN - the only one word oxymoron in the English Language"

dattier@ddsw1.MCS.COM (David W. Tamkin) (10/19/90)

mercer@npdiss1.StPaul.NCR.COM (Dan Mercer) wrote in
<638@npdiss1.StPaul.NCR.COM>:

| Maybe I'm dense,  but how does find know whether a file is a directory
| (and should be descended) or another type of file without stat()ing the
| file?

I think the idea is that after find does stat() the file and sees that it is
a directory with only two links, it shouldn't bother stat()ting the entries
in that directory because none of them will be subdirectories.  There will
be only one more level to descend.

David Tamkin  Box 7002  Des Plaines IL  60018-7002  708 518 6769  312 693 0591
MCI Mail: 426-1818  GEnie: D.W.TAMKIN  CIS: 73720,1570   dattier@ddsw1.mcs.com

kimcm@diku.dk (Kim Christian Madsen) (10/22/90)

dattier@ddsw1.MCS.COM (David W. Tamkin) writes:

=>I think the idea is that after find does stat() the file and sees that it is
=>a directory with only two links, it shouldn't bother stat()ting the entries
=>in that directory because none of them will be subdirectories.  There will
=>be only one more level to descend.

If some nasty user hasn't removed one of the "." and/or ".." entry and
created directories to replace them.


Kim Chr. Madsen