U23405@UICVM (Michael J. Steiner) (08/27/88)
I hate to interupt the hashing discussion with something so trivial, but I don't know what hashing is. Could someone point me to a reference about it and/or tell me a little about what it is. (All that I know about it is that it is a way of searching and storing information.) Michael Steiner Email: U23405@UICVM.BITNET
stacey@hcr.UUCP (Stacey Campbell) (09/11/88)
Michael Steiner writes: >I don't know what hashing is. Could someone tell me a little about what it is. I remember when hashing was first explained to me at university... Lecturer: As you are all aware searching for an item in a linear list is an order N operation. Students: Groan. Kids stuff. Lecturer: I am going to discuss today a method of storing and retrieving information that is much faster than this. Storing and retrieving an item from a binary tree can be done as a order log N operation. Captive Audience: Not trees again! Lecturer: The method I will discuss can be achieved with order one operations. Audience (agog): What? In your dreams perhaps! Lecturer: First we select a field from the item to be stored, it doesn't really matter what the data type of the field is, but for the purposes of this discussion it will be an integer... Audience: (cries of 'Rigged!') Lecturer: ...then we use it as a seed for a pseudo uniform random number generator. The random number that is generated is used as an index into a table where the item is then stored. Providing the index fields are different for each item then the items will be stored evenly throughout the table. Students: What's the use of that? The data is now stored at some random spot in the table and it will need an order N search to get to it again. Lecturer: To access an item in order 1 you merely give its index field to the random number generator, which returns the same index returned when you originally stored the item. You then look up the table using the index. Students: (getting very excitable) What happens if the table fills up? What if the random number generator returns the same value for two different index fields? Where does hashing come into this? What a stupid idea. Lecturer: This whole process is called 'hashing'. The table is called a 'hash table' and the random number generator is called a 'hash function', other types of strategies can be used to generate hash numbers. Obviously it is going to be hard to design a hash function that will generate completely uniform hash numbers across the hash table. At some stage an item will be hashed to a spot that is already used. This is called a 'collision' and a number of strategies can be used for 'collision resolution'. If you are sure that there are more empty entries in the table than items to be stored you can keep re-hashing the index field until you reach an empty spot in the table. If it is possible to completely fill up the table then you can store items in a linear list at the table entry they hash to. So looking up an entry would involve hashing the index field, then doing a linear search for the correct item in the list stored at that table entry. You are not limited to using a linear list either, you can implement the storage for items that have collided as a binary tree, or even as another hash table. Students: Unbelievable. What will they think of next? -- {lsuc,utzoo,utcsri}!hcr!hcrvax!stacey Stacey Campbell, HCR Corporation, 130 Bloor St W, Toronto, Ontario, Canada. +1 416 922 1937 X48
PAWKA@NOSC-TECR.ARPA (09/14/88)
> I hate to interupt the hashing discussion with something so trivial, but > I don't know what hashing is. Could someone point me to a reference about > it and/or tell me a little about what it is. (All that I know about it is > that it is a way of searching and storing information.) > Michael Steiner > Email: U23405@UICVM.BITNET char ref[] = "ALGORITHMS by Robert Sedgewick, Chapter 16, pp. 201"; Mike