[comp.unix.wizards] hashed directories

kazar+@andrew.cmu.EDU (Mike Kazar) (09/21/87)

The Andrew (a.k.a. Vice) file system developed here at Carnegie-Mellon U. uses
hashed directories (as does Multics, by the way).  We have found that they work
very well.  The performance improvement is especially visible with applications
that stat every file in a directory.

Our directories are divided up into 2K byte pages, and each page is divided
into 64 32-byte file entries.  We put a hash table at the beginning of the
directory (128 hash pointers), and chain file entries off from this hash
table.  Representing an entry for a file may require more than one 32-byte
file entry (if the file's name is a long one), in which case a contiguous
block of file entries is allocated, always within one page, for that file.
Because the blocks of file entries allocated for a particular file never
cross a page boundary, we've had very little trouble with crashes corrupting
directories.
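
For concreteness, something like the following C declarations gives the flavor
of the layout.  The names and exact field widths here are illustrative guesses
chosen to add up to the sizes described above, not our actual source.

    /*
     * Illustrative sketch only: field names and widths are guesses,
     * not the real declarations.  Sizes assume a 32-bit machine
     * (2-byte shorts, 4-byte longs).
     */
    #define DIRPAGESIZE 2048
    #define ENTRYSIZE   32
    #define EPP     (DIRPAGESIZE / ENTRYSIZE)   /* 64 entries per page */
    #define NHASH   128                         /* hash chains per directory */

    struct fentry {                 /* one 32-byte slot */
        unsigned short  next;       /* next slot on this hash chain; 0 = end */
        unsigned short  flags;      /* in use, continuation slot, etc. */
        long            vnode;      /* file identifier */
        char            name[24];   /* name; a long name spills into the
                                     * immediately following slots, never
                                     * across a page boundary */
    };

    struct dirheader {              /* front of the first page */
        unsigned short  hash[NHASH];    /* 256 bytes of chain heads */
        /* allocation bitmaps etc. account for the rest of the
         * 400-600 byte header */
    };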

The biggest cost to this scheme, of course, is allocating 400-600 bytes of
directory header for every directory.

The advantage is speed.  When using hashed directories instead of normal Unix
ones, we've found that the cost of searching a directory essentially becomes
a constant, independent of directory size.  There are still fixed costs
associated with searching any directory (about 1 ms per dir on an RT/PC
running 4.2, using either hashed or linearly-searched dirs), but the search
cost per directory entry has dropped by a factor of 250.
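
To see why the per-entry cost collapses, consider what a lookup has to do under
the layout sketched above.  Again this is purely illustrative, building on the
hypothetical declarations from the earlier sketch rather than our real code:

    #include <string.h>

    /*
     * Hypothetical lookup: hash the name into one of the 128 chains,
     * then walk only that chain of 32-byte slots.  The work done is
     * proportional to the chain length, not to the total number of
     * entries in the directory.  For simplicity this treats the
     * directory as one flat array of slots and ignores multi-slot
     * (long-name) entries, so names are NUL-terminated in the slot.
     */
    static unsigned int
    namehash(char *name)
    {
        unsigned int h = 0;

        while (*name)
            h = h * 31 + *name++;
        return (h);
    }

    struct fentry *
    dirlookup(struct dirheader *dh, struct fentry *slots, char *name)
    {
        unsigned short i;

        for (i = dh->hash[namehash(name) % NHASH]; i != 0; i = slots[i].next)
            if (strcmp(slots[i].name, name) == 0)
                return (&slots[i]);
        return ((struct fentry *) 0);
    }

With 128 chains, even a directory of a few hundred entries averages only a
handful of slots per chain, which is where the roughly constant search time
comes from.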

In short, we found that the disk space represents a small price to pay for
the improved performance obtained.

Michael Kazar
Information Technology Center
Carnegie-Mellon University
Pittsburgh, PA 15213

Kazar@andrew.cmu.edu		(ARPA)