[comp.lang.c] Dynamic Hashing

unhd (Jonathan W Miner) (10/27/90)

The interest in dynamic hashing was so great, and half my return e-mail
bounced.  Here is a brief numeric example:


 Dynamic Hashing ::: Fagin's Approach

Fagin, R., J.Nievergelt, N.Pippenger, and H.R.Strong. "Extendible
hashing -- a fast access method for dynamic files." ACM _Transactions
on Database Systems_ 4, no.3 (September 1979): 315-344


This method avoids collisions by expanding both the data storage
area and the hash function.  This explaination was used in a database
course at the University of New Hampshire.

Each database record has a key field, in this case a number, which will
be converted into binary.  The keys and their hashed values for this
example are:

    Key    Binary
    34     100010
    17     010001   These records will be inserted in this order.
     5     000101
    28     011100
    25     011001
    50     110010
    55     110111
     8     001000

Data will be stored as pages, with two records per page.

            Start Condition:

       Directory        Pages
         +---+
         | 0 | ------>(    ,    )      Two pages allocated.
         +---+
         | 1 | ------>(    ,    )
         +---+
The data is inserted, using the leftmost bit as the hash value.
After insertion, it looks like this:
      
       Directory        Pages
         +---+
         | 0 | ------>(  5 , 17 )  
         +---+
         | 1 | ------>( 34 ,    )
         +---+
The next value, 28 can not be put in the first page, so the directory
size is doubled, and a new page is allocated:

       Directory        Pages
         +----+
         | 00 | ------>(  5 ,    )  
         +----+
         | 01 | ------>( 17 , 28 )
         +----+
         | 10 | ------>( 34 ,    )
         | 11 |
         +----+
The next value, 25 causes the same problem, and the directory is split
again:

       Directory        Pages
         +-----+
         | 000 | ------>(  5 ,    )  
         | 001 |
         +-----+
         | 010 | ------>( 17 ,    )
         +-----+
         | 011 | ------>( 25 , 28 )
         +-----+
         | 100 | ------>( 34 ,    )
         | 101 |
         | 110 |
         | 111 |
         +-----+
The 50 can be inserted with the 34.

       Directory        Pages
         +-----+
         | 000 | ------>(  5 ,    )  
         | 001 |
         +-----+
         | 010 | ------>( 17 ,    )
         +-----+
         | 011 | ------>( 25 , 28 )
         +-----+
         | 100 | ------>( 34 , 50 )
         | 101 |
         | 110 |
         | 111 |
         +-----+
The 55 causes the directory to remain the same, but a new page
must be allocated.  After that the 8 is inserted with the 5.

       Directory        Pages
         +-----+
         | 000 | ------>(  5 ,  8 )  
         | 001 |
         +-----+
         | 010 | ------>( 17 ,    )
         +-----+
         | 011 | ------>( 25 , 28 )
         +-----+
         | 100 | ------>( 34 ,    )
         | 101 |
         +-----+
         | 110 | ------>( 50 , 55 )
         | 111 |
         +-----+
The example is easier than a textual description.  And this should give
you a good idea as to what goes on.


-- 
-----------------------------------------------------------------
Jonathan Miner        | I don't speak for UNH, and UNH does not 
jwm775@unhd.unh.edu   | speak for me! 
(603)868-3416         | Rather be downhill skiing... or hacking...