rokicki@rocky.STANFORD.EDU (Tomas Rokicki) (09/03/87)
Let's first calculate how many blocks must be read to locate an executable in a particular directory. On the Amiga, directories are organized as bucket hash tables of size, say, N. (I think N is something like 74 or 90 or so, but I don't have my manuals handy.) Under such a scheme, you have to read the directory block, and then M additional header blocks, where the file you want is M-th in its bucket's chain. If you have F files in the directory, randomly distributed, then

    M_mean = a * (F/2N + 1/2)

where a is some number greater than, but close to, 1. (This factor accounts for the fact that the buckets do not fill exactly evenly.) If the file isn't in the directory at all, you need to probe on average

    M_mean = a * F/N

before you realize that.

So let us say you have f total files, distributed over d directories, each holding f/d files. On average, you will search d/2 - 1/2 directories before searching the correct one. Thus, your total probes are:

    P = (d/2 - 1/2)(1 + af/dN) + 1 + a(f/2dN + 1/2)
      = a/2 + 1/2 + d/2 + af/2N

Since f, a, and N are positive constants, you need to reduce d. Thus, you want to have fewer directories. Let's put some typical numbers in there: with a = 1.1, f = 150, and N = 100, the above reduces to 1.875 + d/2.

Now we start adding some real-world considerations. First of all, searching a directory such as ram: or vd0: is very nearly free, so put any files there that you can. Secondly, directories might be on different disks or different partitions of the same disk, so there is some time involved in going from one directory to the next. Thirdly, you want to put the most likely directory first in the search path. Thus, if we can assume that you will only need to search d/5 - 1/5 directories on average before finding the correct one (which is much more realistic), the equation becomes

    P = (d/5 - 1/5)(1 + af/dN) + 1 + a(f/2dN + 1/2)
      = a/2 + 4/5 + d/5 + af/5N + 3af/10dN

which is slightly more complex.
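[ The M_mean term above is easy to check empirically. The following is a small Monte Carlo sketch (mine, not Tom's) that hashes f files uniformly into N buckets and measures the average chain position of a file; dividing by f/2N + 1/2 gives an empirical value for Tom's fudge factor a for those particular f and N. ]

```python
import random

def mean_probes(F, N, trials=2000):
    """Monte Carlo estimate of the mean number of header-block reads
    needed to find a file: F files hash uniformly into N buckets, and a
    file that is m-th on its bucket's chain costs m reads (on top of
    reading the directory block itself)."""
    total = 0.0
    for _ in range(trials):
        chain_len = [0] * N
        cost = 0
        for _ in range(F):
            b = random.randrange(N)
            chain_len[b] += 1
            cost += chain_len[b]   # this file's position in its chain
        total += cost / F          # mean over the F files in this trial
    return total / trials

F, N = 150, 100                    # the f and N from the worked example
sim = mean_probes(F, N)
model = F / (2 * N) + 0.5          # Tom's M_mean with a = 1
print(sim, model, sim / model)     # the last number is an empirical 'a'
```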
We can minimize this function with a little bit of calculus: P is minimized when d = sqrt(1.5af/N), which, for our example numbers, is about 1.57. So again, few large directories. There may be errors in this analysis, as I did it extemporaneously (and all of the math in my head), but I believe it's fairly accurate. -tom
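[ A quick numeric check of the minimization above (an editorial sketch, not part of the original post): setting dP/dd = 1/5 - 3af/(10 d^2 N) = 0 gives d = sqrt(1.5af/N), and a brute-force scan over d agrees. ]

```python
import math

A, F, N = 1.1, 150, 100   # the example values a, f, N from the post

def P(d):
    """Expected probes when roughly d/5 directories are searched first."""
    return A / 2 + 4 / 5 + d / 5 + A * F / (5 * N) + 3 * A * F / (10 * d * N)

d_star = math.sqrt(1.5 * A * F / N)   # closed form from dP/dd = 0
d_grid = min((i / 100 for i in range(100, 1001)), key=P)  # scan d in [1, 10]
print(round(d_star, 2), round(d_grid, 2))   # both come out near 1.57
```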
ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) (09/03/87)
In article <550@rocky.STANFORD.EDU> rokicki@rocky.STANFORD.EDU (Tomas
Rokicki) writes an industrial-strength article about relative efficiency in
scanning many directories versus fewer directories.
This should not be taken to mean that you should not use directories
at all. Witness Activision's "Portal". Do a 'dir' on it some time. You
should get a file list in about three minutes.
There are over 300 files in that root directory. When the program
tries to open one, the DOS takes about five hashing collisions before it
finds it. Evidently, the programmer never heard of a directory before.
So use the bloody things, and cut down on hashing collisions.
[ Note: The "Portal" disk is an extreme case, but still a real-world one. ]
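[ Plugging the Portal numbers into Tom's model bears this out. The hash-table size N is assumed here to be 72, the figure commonly cited for a standard 512-byte AmigaDOS directory block; the exact value doesn't change the conclusion. ]

```python
N = 72     # assumed hash slots per AmigaDOS directory block (512-byte block)
F = 300    # roughly the number of files in Portal's root directory

hit = F / (2 * N) + 0.5    # extra header reads for a successful lookup (a = 1)
miss = F / N               # average probes before concluding a file is absent
print(round(hit, 2), round(miss, 2))
```

[ That's a few extra reads per successful open and four-plus per failed one, consistent with the "about five hashing collisions" above -- and trivially avoided by splitting the files into subdirectories. ]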
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Leo L. Schwab -- The Guy in The Cape ihnp4!ptsfa -\
\_ -_ Bike shrunk by popular demand, dual ---> !{well,unicom}!ewhac
O----^o But it's still the only way to fly. hplabs / (pronounced "AE-wack")
"Work FOR? I don't work FOR anybody! I'm just having fun." -- The Doctor
jdm@pnet02.CTS.COM (John Mesiavech) (09/04/87)
This is just an aside to Tomas Rokicki's excellent breakdown of which to use, large directories or small... BUT there is a kicker, and that is the special case of an Amiga user with a hard disk. When you get >= 100 files per directory, things become slow, simply because of the large number of files on the disk. So, on a hard disk, keeping directories under 100 files would be best. 'Sides, with truly large directories (I have a directory with 212 files in it), there's not a DirUtil in the known universe that will handle it! 8^>

John
UUCP: {ihnp4!crash, hplabs!hp-sdd!crash}!gryphon!pnet02!jdm
INET: jdm@pnet02.CTS.COM
hcm@hpclla.HP.COM (Harry Muttart) (09/05/87)
There actually is a (generic) "DirUtil" that will handle large directories... the one written in Modula-2! If you happen to have a TDI compiler, you can trivially bump it up from its default 300-files-per-directory limit. Incidentally, the author (sorry, can't remember the name) indicated that he wrote this tool, in part, to be able to list the M2: (module library) directory for TDI, which contains over 200 files. Other dirutils seem to quietly stop reading entries! Oh yeah... the program, with source, can be found on one of the Fish disks in the 50's or 60's. The name is DuM2, which is probably why few have heard of it (who would bother browsing a directory whose name suggests "dumb, too"?).

Harry Muttart
hpda!hcmutt
(Just my opinions above, not my employer's.)