jik@athena.mit.edu (Jonathan I. Kamens) (11/10/89)
In article <2586@unisoft.UUCP> greywolf@unisoft.UUCP (The Grey Wolf) writes:
>Portable, almost.  Usable by anyone, probably not.
>
>It would require being run as a super-user, and it would be fairly quick --
>whether it's faster than a stat or not, I'm not sure.  What you could
>do is, once you get the entry, try and chdir() to it.  If it works, it's
>a directory, otherwise it's a file.  CAVEAT (obvious): this is not
>foolproof if you're running a system with symbolic links.

Bad idea for several reasons.  First of all, after you've used chdir()
several times (or even once) to go down a directory tree, you have to
use chdir("..") to get back up to the top.  In general, I find that it
is a bad idea to change the current working directory of a process
unless you are *sure* that you can get back to where you started.
You're not sure in this case.

Now, I know that you said "it would require being run as a
super-user", by which I assume that you meant to imply (among other
things) that the program would have read access to all directories and
therefore be able to get back to where it started no matter what.
This is not necessarily true, now that we're in the age of remote
filesystems (NFS, AFS, etc.).  Root on my workstation does not have
root access to NFS filesystems I have mounted.

>I think, IMHO, you're better off going with stat().

Yup, I think so too.  That's how I do it in the code I've written.

One final note: an interesting question is whether it's faster to
(a) stat() a file and use opendir() on it only if it's a directory, or
(b) just do the opendir() on it and keep going if the opendir()
succeeds.  I've found that (a) is much faster because opendir() always
does an open() on the file, even when it's not a directory.
Therefore, when you try to opendir() a non-directory, it's got to do
the open, then realize that it's not a directory using fstat, then
close the file.
Jonathan Kamens USnail: MIT Project Athena 11 Ashford Terrace jik@Athena.MIT.EDU Allston, MA 02134 Office: 617-253-8495 Home: 617-782-0710
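A minimal sketch of approach (a) from the post above, assuming a POSIX
system.  The names is_directory() and count_entries() are illustrative
and do not come from the thread.  Note that stat() follows symbolic
links, so a link to a directory is reported as a directory; lstat()
would be the call to use if that is not wanted.

```c
#include <dirent.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 1 if path names a directory, 0 if it names something else,
 * and -1 if stat() fails (errno is left as stat() set it). */
int is_directory(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    return S_ISDIR(sb.st_mode) ? 1 : 0;
}

/* Approach (a): stat() first, opendir() only when the entry really is
 * a directory.  Returns the number of entries other than "." and "..",
 * or -1 if path is not a directory or cannot be opened. */
long count_entries(const char *path)
{
    DIR *dp;
    struct dirent *de;
    long n = 0;

    if (is_directory(path) != 1)
        return -1;      /* skip the open/fstat/close done by opendir() */
    if ((dp = opendir(path)) == NULL)
        return -1;
    while ((de = readdir(dp)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(dp);
    return n;
}
```

A caller would test is_directory(name) on each entry it reads and
descend only when the result is 1, so plain files never reach opendir().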
dbin@norsat.UUCP (Dave Binette) (11/10/89)
The consensus for the question "Anything faster than stat(S)?" is "No".

Now for plan "B" (ignore me if I am trying your patience):
"Is there anything faster than opendir(S) and readdir(S)?"
I can't and don't want to use ftw(S).

Or maybe I should just ask....
WHY does it take so LONG to count the number of directories and files
below "."?

I've written a little program called "ldir" that counts files and
subdirs to the level you specify, as in "ldir -d2".  It's not THAT
slow, but in my application it is used a lot and speed is... well, you
know the story.

Tell ya what... I'll post it if you think you can do it faster,
smaller and/or better.

Here is what happens from vi when I do
!!(cd /usr/spool/news/comp; time ldir -d2 [ab]*; w)

------------------------------------------------------------
ai        (  30 Files   6 Dirs)
arch      (  64 Files   0 Dirs)
archives  (   0 Files   0 Dirs)
binaries  (+  9 Files   8 Dirs)
bugs      (+  5 Files   5 Dirs)

real        4.0
user        0.0
sys         0.5
 11:05pm  up 2 days, 4 mins,  2 users,  load average: 0.19, 0.14, 0.01
------------------------------------------------------------

Subsequent invocations produce: (obviously cached)

...
real        0.8
user        0.0
sys         0.3
...
real        0.5
user        0.0
sys         0.3

Does the user time of 0.0 mean there is no hope?
Where do the other 3.5, 0.5 and 0.2 seconds go? (real - sys)

Oh yeah, we are running: (Compaq 386-20, 5Meg Ram, 2 28ms ST4096)
sysname=XENIX
nodename=norsat
release=2.3.1
version=SysV
machine=i80386
-- 
My girlfriend is a screamer..., my computer just HUMMMMs
uucp:  {uunet,ubc-cs}!van-bc!norsat!dbin | 302-12886 78th Ave
bbs:   (604)597-4361     24/12/PEP/3     | Surrey BC CANADA
voice: (604)597-6298     (Dave Binette)  | V3W 8E7
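The ldir source was not posted in this part of the thread, so the
following is only a sketch of the kind of counting described above, not
Dave's program.  It assumes a POSIX system; struct tally and
tally_tree() are hypothetical names.  It uses lstat() so that symbolic
links are counted as files and never followed.

```c
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096           /* fallback, an assumption for this sketch */
#endif

struct tally {
    long files;                 /* entries that are not directories */
    long dirs;                  /* entries that are directories */
};

/* Count entries below `path`, descending at most `depth` levels
 * (1 means only the directory's own entries).  Returns 0 on success,
 * -1 if depth < 1 or the top directory cannot be opened. */
int tally_tree(const char *path, int depth, struct tally *t)
{
    DIR *dp;
    struct dirent *de;
    struct stat sb;
    char child[PATH_MAX];
    int n;

    if (depth < 1 || (dp = opendir(path)) == NULL)
        return -1;
    while ((de = readdir(dp)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        n = snprintf(child, sizeof child, "%s/%s", path, de->d_name);
        if (n < 0 || n >= (int)sizeof child)
            continue;           /* name too long for the buffer: skip it */
        if (lstat(child, &sb) == -1)
            continue;           /* vanished or unreadable: skip it */
        if (S_ISDIR(sb.st_mode)) {
            t->dirs++;
            if (depth > 1)      /* unreadable subdirectories are ignored */
                (void)tally_tree(child, depth - 1, t);
        } else {
            t->files++;
        }
    }
    closedir(dp);
    return 0;
}
```

Each entry costs one lstat() on top of the readdir(), which is where a
cold cache hurts: every inode that is not already in memory has to be
read from disk.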
jfh@rpp386.cactus.org (John F. Haugh II) (11/10/89)
In article <15769@bloom-beacon.MIT.EDU> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>In article <2586@unisoft.UUCP> greywolf@unisoft.UUCP (The Grey Wolf) writes:
>>I think, IMHO, you're better off going with stat().
>
>  Yup, I think so too.  That's how I do it in the code I've written.

Just for the sake of disagreeing, what about other system calls that
are able to distinguish between a file being a directory or not?

The error return entries for access(2) tell me something useful -

	A component of the path prefix is not a directory.  [ENOTDIR]

	The named file does not exist.  [ENOENT]

How about code like this -

	#include <errno.h>
	#include <limits.h>	/* PATH_MAX */
	#include <string.h>
	#include <unistd.h>	/* access() */

	int
	isadir (char *path)
	{
		char	dir[PATH_MAX];

		if (strlen (path) + 3 > sizeof dir)
			return 0;	/* no room for "/x" and the NUL */
		if (access (path, 0))
			return 0;	/* path itself does not exist */

		strcpy (dir, path);
		strcat (dir, "/x");

		if (access (dir, 0) == 0)
			return 1;	/* path/x exists, so path is a directory */
		return errno == ENOENT;	/* ENOTDIR: path is not a directory */
	}

We know all of the initial path exists because of the first access()
call.  And with the second access() call we can discover if the
last component of `path' isn't a directory since errno would be ENOTDIR
rather than ENOENT.

Ain't perfect either, but maybe better?
-- 
John F. Haugh II                        +-Things you didn't want to know:------
VoiceNet: (512) 832-8832   Data: -8835  | The real meaning of EMACS is ...
InterNet: jfh@rpp386.cactus.org         |   ... EMACS makes a computer slow.
UUCPNet:  {texbell|bigtex}!rpp386!jfh   +--<><--<><--<><--<><--<><--<><--<><---
cpcahil@virtech.uucp (Conor P. Cahill) (11/11/89)
In article <17264@rpp386.cactus.org>, jfh@rpp386.cactus.org (John F. Haugh II) writes:
> Just for the sake of disagreeing, what about other system calls that
> are able to distinguish between a file being a directory or not?
>
> How about code like this -
> 	[sample of using access() deleted]
>
> We know all of the initial path exists because of the first access()
> call.  And with the second access() call we can discover if the
> last component of `path' isn't a directory since errno would be ENOTDIR
> rather than ENOENT.

So you want to replace a single call to stat() with multiple calls to
access().  That doesn't make any sense, since the major overhead of both
the stat and access system calls is that the path must be traversed, and
since you are calling access twice, you have to traverse the path twice.

stat() is the most efficient mechanism that can be used to obtain
information about a file system entry, since it just looks up an inode
and copies the data to the user's data area.

If you are stat()ing all entries in a directory on a very heavily loaded
system, you could probably get some performance gain by chdir()ing to the
directory and then stat()ing the entries with just the basename (thereby
not having to parse the path every time).  This shouldn't have much of
an effect on a lightly loaded system due to caching.
-- 
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil      703-430-9247           !
| Virtual Technologies Inc.,    P. O. Box 876,   Sterling, VA 22170     |
+-----------------------------------------------------------------------+
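The chdir()-then-stat() idea in the last paragraph above can be sketched
as follows.  This is an illustration under assumptions, not code from
the thread: count_subdirs() is a hypothetical name, and the sketch
relies on fchdir(), which may not exist on the systems discussed here.
Keeping a descriptor on the starting directory lets the process return
there without having to resolve a path name again.

```c
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the subdirectories of `dir` by chdir()ing into it and calling
 * stat() on bare entry names.  Returns the count, or -1 on failure.
 * The working directory is restored before returning. */
long count_subdirs(const char *dir)
{
    DIR *dp;
    struct dirent *de;
    struct stat sb;
    long n = 0;
    int saved = open(".", O_RDONLY);    /* handle on where we started */

    if (saved == -1)
        return -1;
    if (chdir(dir) == -1) {
        close(saved);
        return -1;
    }
    if ((dp = opendir(".")) != NULL) {
        while ((de = readdir(dp)) != NULL) {
            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            /* One-component lookup: no long path to parse each time. */
            if (stat(de->d_name, &sb) == 0 && S_ISDIR(sb.st_mode))
                n++;
        }
        closedir(dp);
    } else {
        n = -1;
    }
    if (fchdir(saved) == -1)            /* go back by descriptor, not by name */
        n = -1;
    close(saved);
    return n;
}
```

If the starting directory cannot even be opened for reading, the
function gives up before changing directory, so it never strands the
process somewhere it cannot leave.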
jfh@rpp386.cactus.org (John F. Haugh II) (11/12/89)
In article <1989Nov11.154312.6675@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes:
>So you want to replace a single call to stat() with multiple calls to
>access().  That doesn't make any sense since the major overhead to both
>the stat and access system calls is that the path must be traversed and since
>you are calling access twice, you have to traverse the path twice.

The objective was to take advantage of path-name caching on BSD
systems.  Of course, if you know "/path/name" exists, you only need
-one- call to access() with "/path/name/foo" and you save the
mumbo-jumbo required to get data from kernel to user space.

>stat() is the most efficient mechanism that can be used to obtain information
>about a file system entry since it just looks up an inode and copies
>the data to the user's data area.

Probably true.  Now, go off and actually run the benchmarks.
=Always= question everything.

On some machines copies from system to user address space are cheap.
On others it can be =very= difficult.  There is a big difference
between a Vax where the supervisor and user have separate address
spaces which can be directly addressed one from the other, and a
PDP-11 where the supervisor and user occupy the same address space
and have no easy way of communicating short of mapping memory all
over God's creation.  [ MTPD and MFPD aren't implemented on all
PDP-11 CPUs! ]

Anyway, it was only meant to stimulate discussion.  The only portable
and clean solution =is= to use stat().  I can't stand clever hacks,
unless I write them myself ;-)
-- 
John F. Haugh II                        +-Things you didn't want to know:------
VoiceNet: (512) 832-8832   Data: -8835  | The real meaning of EMACS is ...
InterNet: jfh@rpp386.cactus.org         |   ... EMACS makes a computer slow.
UUCPNet:  {texbell|bigtex}!rpp386!jfh   +--<><--<><--<><--<><--<><--<><--<><---