[comp.unix.shell] Leaf directories

gt0178a@prism.gatech.EDU (Jim Burns) (10/12/90)

in article <F&gya5o2@cs.psu.edu>, flee@dictionopolis.cs.psu.edu (Felix Lee) says:

> ftw() as spec'd in System V.3 and SunOS 4.1 doesn't special-case leaf
> directories, so you can't implement lazy stat()ing.

I'm confused - what's a leaf directory, and how does it relate to this
discussion?
-- 
BURNS,JIM
Georgia Institute of Technology, Box 30178, Atlanta Georgia, 30332
uucp:	  ...!{decvax,hplabs,ncar,purdue,rutgers}!gatech!prism!gt0178a
Internet: gt0178a@prism.gatech.edu

flee@guardian.cs.psu.edu (Felix Lee) (10/13/90)

Leaf directories are directories with no subdirectories.  A leaf
directory has a link count <= 2.

Consider running "find" on a directory /tex/lib/fonts which contains
8472 plain files.  Since /tex/lib/fonts is a leaf directory, you can't
descend any deeper, so you don't need to stat() any of the 8472 files
to look for subdirectories.

The difference in speed is similar to the difference between saying
	ls /tex/lib/fonts
and
	ls -F /tex/lib/fonts
--
Felix Lee	flee@cs.psu.edu

cpcahil@virtech.uucp (Conor P. Cahill) (10/13/90)

In article <15052@hydra.gatech.EDU> gt0178a@prism.gatech.EDU (Jim Burns) writes:
>in article <F&gya5o2@cs.psu.edu>, flee@dictionopolis.cs.psu.edu (Felix Lee) says:
>> ftw() as spec'd in System V.3 and SunOS 4.1 doesn't special-case leaf
>> directories, so you can't implement lazy stat()ing.
>
>I'm confused - what's a leaf directory, and how does it relate to this
>discussion?

A leaf directory is a directory that has no subdirectories.  If you special
case these directories, you can just read the file names within the
directory without having to stat each entry (since you know that they are
all just files).

You know it is a leaf directory when the number of links to the directory
itself is 2 ("." and the entry in the parent directory).

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

chris@mimsy.umd.edu (Chris Torek) (10/13/90)

In article <F&0u96p2@cs.psu.edu> flee@guardian.cs.psu.edu (Felix Lee) writes:
>Leaf directories are directories with no subdirectories.  A leaf
>directory has a link count <= 2.

It is easier than that (and harder than this, which leaves out error
checking):

	d = opendir(dir);
	fstat(dirfd(d), &st);
	nsubdirs = st.st_nlink - 2;
	while ((dp = readdir(d)) != NULL) {
		if (need_stat_anyway || nsubdirs > 0) {
			get_stat(dir, dp->d_name, &st);
			if (S_ISDIR(st.st_mode))
				nsubdirs--;
		}
		...
	}

(pre-POSIX programmers pwill pneed, er, will need to write
	(st.st_mode & S_IFMT) == S_IFDIR
rather than S_ISDIR(st.st_mode)).

This is susceptible to error if the file system is active, but that
is no change; find has always had this problem.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (10/14/90)

In article <26981@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
: In article <F&0u96p2@cs.psu.edu> flee@guardian.cs.psu.edu (Felix Lee) writes:
: >Leaf directories are directories with no subdirectories.  A leaf
: >directory has a link count <= 2.
: 
: It is easier than that (and harder than this, which leaves out error
: checking):
: 
: 	d = opendir(dir);
: 	fstat(dirfd(d), &st);
: 	nsubdirs = st.st_nlink - 2;
: 	while ((dp = readdir(d)) != NULL) {
: 		if (need_stat_anyway || nsubdirs > 0) {
: 			get_stat(dir, dp->d_name, &st);
: 			if (S_ISDIR(st.st_mode))
: 				nsubdirs--;
: 		}
: 		...
: 	}

If find is done this way, it would behoove us to sort subdirectories
before filenames in each directory so that nsubdirs goes to 0 as soon
as possible.

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov

chris@mimsy.umd.edu (Chris Torek) (10/18/90)

>In article <26981@mimsy.umd.edu> I demonstrate how to avoid extra
`stat' calls on Unix systems that happen also to be POSIX conformant.
(I am told that POSIX requires links, but not `..' links, so the code
that says `nsubdirs = st.st_nlink - 2' is Unix-specific.)  An important
feature of the algorithm is that stat()s stop as soon as all
subdirectories have been found.

In article <9959@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV
(Larry Wall) writes:
>If find is done this way, it would behoove us to sort subdirectories
>before filenames in each directory so that nsubdirs goes to 0 as soon
>as possible.

Yes.  Note that `restore'ing a file system Just Happens to do this.
(Actually, it does it for an obvious reason---the data `dump' writes
is in this order---but it turns out to be convenient.)  You can also
write a program that marches through a directory tree, moving all
subdirectories to the front---it looks something like this:

	# untested shell script to move files after directories
	# does not work in / or at mount points
	# does not handle file names with embedded shell metacharacters
	dirs=
	files=
	for i in `ls -a | tail +2`; do
		if [ -d $i ]; then
			dirs="$dirs $i"
			(cd $i; sh move_them_around)
		else
			files="$files $i"
		fi
	done
	mkdir ../move_tmp.$$
	case "$dirs" in "");; *) mv $dirs ../move_tmp.$$;; esac
	case "$files" in "");; *) mv $files ../move_tmp.$$;; esac
	here=`pwd | sed -e 's,.*/,,`
	cd ..
	rmdir $here; mv move_tmp.$$ $here

This can all be done better in perl. :-)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris