kazar+@andrew.cmu.EDU (Mike Kazar) (09/21/87)
The Andrew (a.k.a. Vice) file system developed here at Carnegie-Mellon U. (as well as Multics, by the way) uses hashed directories. We have found that they work very well. The performance improvement is especially visible with applications that stat every file in a directory.

Our directories are divided into 2K-byte pages, and each page is divided into 64 32-byte file entries. We put a hash table at the beginning of the directory (128 hash pointers), and chain file entries off from this hash table. Representing an entry for a file may require more than one 32-byte file entry (if the file's name is a long one), in which case a contiguous block of file entries is allocated, always within one page, for that file. Because the blocks of file entries allocated for a particular file never cross a page boundary, we've had very little trouble with crashes corrupting directories.

The biggest cost to this scheme, of course, is allocating 400-600 bytes of directory header for every directory. The advantage is speed. When using hashed directories instead of normal Unix ones, we've found that the cost of searching a directory essentially becomes a constant, independent of directory size. There are still fixed costs associated with searching any directory (about 1 ms per dir on an RT/PC running 4.2, using either hashed or linearly-searched dirs), but the per-entry search cost has dropped by a factor of 250. In short, we found that the disk space represents a small price to pay for the improved performance obtained.

Michael Kazar
Information Technology Center
Carnegie-Mellon University
Pittsburgh, PA 15213
Kazar@andrew.cmu.edu (ARPA)
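[A sketch of the layout described above, in C. The constants (2K pages, 32-byte slots, 128 chain heads) come from the posting; the field names, the hash function, and the exact split of header vs. name bytes within a slot are my own guesses, not the actual Vice on-disk format.]

```c
#include <assert.h>
#include <stddef.h>

/* Constants from the posting: 2 KB pages, 64 32-byte slots per page,
 * 128 hash-chain heads at the front of each directory. */
#define PAGE_SIZE      2048
#define SLOT_SIZE      32
#define SLOTS_PER_PAGE (PAGE_SIZE / SLOT_SIZE)   /* 64 */
#define HASH_BUCKETS   128

/* One 32-byte directory slot: a small header plus as much of the
 * name as fits.  Long names spill into additional contiguous slots,
 * always on the same page.  Field layout is hypothetical. */
struct dir_slot {
    unsigned int   vnode;   /* file identifier (4 bytes)           */
    unsigned short next;    /* next slot on this hash chain, 0=end */
    char           name[SLOT_SIZE - 6]; /* 26 bytes of the name    */
};

/* Hypothetical string hash reduced to one of the 128 chain heads;
 * a lookup walks only the slots on that one chain, which is why
 * search cost is roughly independent of directory size. */
static unsigned dir_hash(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % HASH_BUCKETS;
}

/* Number of contiguous 32-byte slots a name of length len occupies:
 * the first slot holds the header plus 26 name bytes; each extra
 * slot contributes 32 more.  The allocator must place the whole
 * block within a single 2K page. */
static int slots_needed(size_t len)
{
    const size_t first = SLOT_SIZE - 6;  /* name bytes in slot one */
    if (len <= first)
        return 1;
    return 1 + (int)((len - first + SLOT_SIZE - 1) / SLOT_SIZE);
}
```

Note that the 128 four-byte-ish chain heads plus bookkeeping are what account for the 400-600 bytes of fixed directory header the posting mentions.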