[comp.lang.c] Hashing

U23405@UICVM (Michael J. Steiner) (08/27/88)

I hate to interupt the hashing discussion with something so trivial, but
I don't know what hashing is. Could someone point me to a reference about
it and/or tell me a little about what it is. (All that I know about it is
that it is a way of searching and storing information.)

                                                 Michael Steiner
                                                 Email: U23405@UICVM.BITNET

stacey@hcr.UUCP (Stacey Campbell) (09/11/88)

Michael Steiner writes:
>I don't know what hashing is. Could someone tell me a little about what it is.

I remember when hashing was first explained to me at university...

Lecturer: As you are all aware searching for an item in a linear list
  is an order N operation.

Students: Groan.  Kids stuff.

Lecturer: I am going to discuss today a method of storing and retrieving
  information that is much faster than this.  Storing and retrieving an
  item from a binary tree can be done as a order log N operation.

Captive Audience: Not trees again!

Lecturer: The method I will discuss can be achieved with order one
  operations.

Audience (agog): What? In your dreams perhaps!

Lecturer: First we select a field from the item to be stored, it
  doesn't really matter what the data type of the field is, but for
  the purposes of this discussion it will be an integer...

Audience: (cries of 'Rigged!')

Lecturer: ...then we use it as a seed for a pseudo uniform random
  number generator.  The random number that is generated is used
  as an index into a table where the item is then stored.  Providing
  the index fields are different for each item then the items will be
  stored evenly throughout the table.

Students: What's the use of that?  The data is now stored at some random
  spot in the table and it will need an order N search to get to it
  again.

Lecturer: To access an item in order 1 you merely give its index
  field to the random number generator, which returns the same index
  returned when you originally stored the item.  You then look up the
  table using the index.

Students: (getting very excitable) What happens if the table fills up?
  What if the random number generator returns the same value for two
  different index fields? Where does hashing come into this?
  What a stupid idea.

Lecturer: This whole process is called 'hashing'.  The table is called
  a 'hash table' and the random number generator is called a 'hash
  function', other types of strategies can be used to generate hash
  numbers.  Obviously it is going to be hard to design a hash function
  that will generate completely uniform hash numbers across the hash
  table.  At some stage an item will be hashed to a spot that is already
  used.  This is called a 'collision' and a number of strategies can
  be used for 'collision resolution'.
      If you are sure that there are more empty entries in the table than
  items to be stored you can keep re-hashing the index field until you
  reach an empty spot in the table.
      If it is possible to completely fill up the table then you can
  store items in a linear list at the table entry they hash to.  So
  looking up an entry would involve hashing the index field, then
  doing a linear search for the correct item in the list stored at
  that table entry.  You are not limited to using a linear list either,
  you can implement the storage for items that have collided as a
  binary tree, or even as another hash table.

Students: Unbelievable.  What will they think of next?
-- 
{lsuc,utzoo,utcsri}!hcr!hcrvax!stacey Stacey Campbell, HCR Corporation,
130 Bloor St W, Toronto, Ontario, Canada. +1 416 922 1937 X48

PAWKA@NOSC-TECR.ARPA (09/14/88)

> I hate to interupt the hashing discussion with something so trivial, but
> I don't know what hashing is. Could someone point me to a reference about
> it and/or tell me a little about what it is. (All that I know about it is
> that it is a way of searching and storing information.)

>                                                  Michael Steiner
>                                                  Email: U23405@UICVM.BITNET


     char ref[] = "ALGORITHMS by Robert Sedgewick, Chapter 16, pp. 201";

     						Mike