[comp.unix.wizards] How to write a current directory finder?

mcgrath@paris.Berkeley.EDU (Roland McGrath) (04/02/89)

I've written a `getcwd' function (finds the absolute pathname of the current
working directory), and it works; but it's immensely slow!

The method I've used is this:

Starting with `.', get the device and i-node numbers of each directory, back
up to `..', search the directory, checking each file for matching numbers,
and when one is found, add its name to the beginning of the path.
Then, get the device and i-node numbers of that directory, back up to its
parent, and search for matching numbers, repeating until a directory and its
parent have the same device and i-node numbers (because /.. is /).

Since my function is about 10 times slower than /bin/pwd, this is obviously
not the optimum method.

Since the manpage for `getwd' on my system (Sun Unix 3.2) warns that a
failing `getwd' call may leave the current directory in the wrong place, I
assume that `getwd' moves around in the process of figuring it out.
But I don't see how this helps, using the same basic logic as my method.
You trade time constructing a big `../../../../..' name and using that in
all the lookups for moving around and cutting out all the `..'s.
But can that really make such a difference?
--
	Roland McGrath
	Free Software Foundation, Inc.
roland@wheaties.ai.mit.edu, mit-eddie!wheaties.ai.mit.edu!roland
Copyright 1989 Roland McGrath, under the GNU General Public License, version 1.

guy@auspex.auspex.com (Guy Harris) (04/02/89)

>Starting with `.', get the device and i-node numbers of each directory, back
>up to `..', search the directory, checking each file for matching numbers,
>and when one is found, add its name to the beginning of the path.
>Then, get the device and i-node numbers of that directory, back up to its
>parent, and search for matching numbers, repeating until a directory and its
>parent have the same device and i-node numbers (because /.. is /).

That's more-or-less what the 4.3BSD "getwd" does.

>Since my function is about 10 times slower than /bin/pwd, this is obviously
>not the optimum method.
>
>Since the manpage for `getwd' on my system (Sun Unix 3.2) warns that a
>failing `getwd' call may leave the current directory in the wrong place, I
>assume that `getwd' moves around in the process of figuring it out.

The 4.2BSD version did that (the SunOS 3.2 version is based on the
4.2BSD one).  "/bin/pwd" in 4.[23]BSD and SunOS (and probably in other
systems that picked it up from 4.[23]BSD) just calls "getwd" and prints
the result.

>But I don't see how this helps, using the same basic logic as my method.
>You trade time constructing a big `../../../../..' name and using that in
>all the lookups for moving around and cutting out all the `..'s.
>But can that really make such a difference?

Yes, since it avoids several levels of lookup; looking up "..", for
example, is cheaper than looking up "../../../../..".  Directory-name
lookup caching (which 4.3BSD provides) helps, but it can't eliminate the
performance hit entirely (you still have to copy longer pathnames into
the kernel on typical UNIX systems, and still have to do some work on
each component, even if you don't have to scan the directory).

mcgrath@paris.Berkeley.EDU (Roland McGrath) (04/03/89)

It did occur to me that looking up five levels of `..' could be quite slow,
and perhaps changing directories really would make a significant difference.
So I rewrote my function to change directories and stat "" (which past
experience tells me is a real quick way to get the current directory with no
name lookup).  It may have been somewhat faster than before, but it was
still orders of magnitude slower the the system `getwd' (Sun Unix 3.2).
--
	Roland McGrath
	Free Software Foundation, Inc.
roland@wheaties.ai.mit.edu, mit-eddie!wheaties.ai.mit.edu!roland
Copyright 1989 Roland McGrath, under the GNU General Public License, version 1.

flee@shire.cs.psu.edu (Felix Lee) (04/03/89)

Doing a "trace" shows that getwd() does not chdir().

Here is speculation on Sun's getwd() after trying to implement my own
and watching a "trace".

* It caches its result.  Repeated calls to getwd() just make two
stat() calls.  Trying to time the performance of getwd() by putting it
in a loop is meaningless.  This is the big win.

* It uses /tmp/.getwd to figure out where it is when it crosses a
mount point.  This is a minor gain.  In my version of getwd(), I
stat'ed around to find the mount point; commenting out the stat()
calls saved about .005s system time (on a Sun4 running SunOS4.0).

Any other differences in algorithm (using fstat() instead of stat(),
using chdir(), etc.) have little effect on performance.
--
Felix Lee	flee@shire.cs.psu.edu	*!psuvax1!shire!flee