mcgrath@paris.Berkeley.EDU (Roland McGrath) (04/02/89)
I've written a `getcwd' function (finds the absolute pathname of the current working directory), and it works; but it's immensely slow! The method I've used is this: Starting with `.', get the device and i-node numbers of each directory, back up to `..', search the directory, checking each file for matching numbers, and when one is found, add its name to the beginning of the path. Then, get the device and i-node numbers of that directory, back up to its parent, and search for matching numbers, repeating until a directory and its parent have the same device and i-node numbers (because /.. is /). Since my function is about 10 times slower than /bin/pwd, this is obviously not the optimum method. Since the manpage for `getwd' on my system (Sun Unix 3.2) warns that a failing `getwd' call may leave the current directory in the wrong place, I assume that `getwd' moves around in the process of figuring it out. But I don't see how this helps, using the same basic logic as my method. You trade time constructing a big `../../../../..' name and using that in all the lookups for moving around and cutting out all the `..'s. But can that really make such a difference? -- Roland McGrath Free Software Foundation, Inc. roland@wheaties.ai.mit.edu, mit-eddie!wheaties.ai.mit.edu!roland Copyright 1989 Roland McGrath, under the GNU General Public License, version 1.
guy@auspex.auspex.com (Guy Harris) (04/02/89)
>Starting with `.', get the device and i-node numbers of each directory, back >up to `..', search the directory, checking each file for matching numbers, >and when one is found, add its name to the beginning of the path. >Then, get the device and i-node numbers of that directory, back up to its >parent, and search for matching numbers, repeating until a directory and its >parent have the same device and i-node numbers (because /.. is /). That's more-or-less what the 4.3BSD "getwd" does. >Since my function is about 10 times slower than /bin/pwd, this is obviously >not the optimum method. > >Since the manpage for `getwd' on my system (Sun Unix 3.2) warns that a >failing `getwd' call may leave the current directory in the wrong place, I >assume that `getwd' moves around in the process of figuring it out. The 4.2BSD version did that (the SunOS 3.2 version is based on the 4.2BSD one). "/bin/pwd" in 4.[23]BSD and SunOS (and probably in other systems that picked it up from 4.[23]BSD) just calls "getwd" and prints the result. >But I don't see how this helps, using the same basic logic as my method. >You trade time constructing a big `../../../../..' name and using that in >all the lookups for moving around and cutting out all the `..'s. >But can that really make such a difference? Yes, since it avoids several levels of lookup; looking up "..", for example, is cheaper than looking up "../../../../..". Directory-name lookup caching (which 4.3BSD provides) helps, but it can't eliminate the performance hit entirely (you still have to copy longer pathnames into the kernel on typical UNIX systems, and still have to do some work on each component, even if you don't have to scan the directory).
mcgrath@paris.Berkeley.EDU (Roland McGrath) (04/03/89)
It did occur to me that looking up five levels of `..' could be quite slow, and perhaps changing directories really would make a significant difference. So I rewrote my function to change directories and stat "" (which past experience tells me is a real quick way to get the current directory with no name lookup). It may have been somewhat faster than before, but it was still orders of magnitude slower the the system `getwd' (Sun Unix 3.2). -- Roland McGrath Free Software Foundation, Inc. roland@wheaties.ai.mit.edu, mit-eddie!wheaties.ai.mit.edu!roland Copyright 1989 Roland McGrath, under the GNU General Public License, version 1.
flee@shire.cs.psu.edu (Felix Lee) (04/03/89)
Doing a "trace" shows that getwd() does not chdir(). Here is speculation on Sun's getwd() after trying to implement my own and watching a "trace". * It caches its result. Repeated calls to getwd() just make two stat() calls. Trying to time the performance of getwd() by putting it in a loop is meaningless. This is the big win. * It uses /tmp/.getwd to figure out where it is when it crosses a mount point. This is a minor gain. In my version of getwd(), I stat'ed around to find the mount point; commenting out the stat() calls saved about .005s system time (on a Sun4 running SunOS4.0). Any other differences in algorithm (using fstat() instead of stat(), using chdir(), etc.) have little effect on performance. -- Felix Lee flee@shire.cs.psu.edu *!psuvax1!shire!flee