joel@decwrl.UUCP (Joel McCormack) (01/31/85)
Hashing IS NOT an O(1) operation! Hashing is an O(N)/buckets operation, where many times you can use enough buckets to make it very fast. -- - Joel McCormack {ihnp4 decvax ucbvax allegra sequent utcsrgv}!decwrl!joel joel@decwrl.arpa
ee163acp@sdcc13.UUCP (DARIN JOHNSON) (02/01/85)
> > Hashing IS NOT an O(1) operation! > > Hashing is an O(N)/buckets operation, where many times you can use enough > buckets to make it very fast. > True, but the beauty of the whole thing is that on the average, hash seems like O(1), and only rarely will you get a bad case. Darin Johnson @UCSD
gvcormack@watdaisy.UUCP (Gordon V. Cormack) (02/04/85)
> > Hashing IS NOT an O(1) operation! > > Hashing is an O(N)/buckets operation, where many times you can use enough > buckets to make it very fast. > > -- > - Joel McCormack {ihnp4 decvax ucbvax allegra sequent utcsrgv}!decwrl!joel > joel@decwrl.arpa Thus, if we use O(N) buckets, we have O(1) (average case) retrieval time. If O(1) worst-case retrieval time is required, a perfect hash scheme is necessary. I have written about one such scheme in Computer Journal 28:1 (Feb. 85). Complexity considerations aside, there are almost no circumstances where binary search is superior to hashing for unique-key unordered retrievals. It seems there should be a better newsgroup in which to continue discussions such as this, but I can't think of one. -- Gordon V. Cormack CS Department, University of Waterloo gvcormack@watdaisy.uucp gvcormack%watdaisy@waterloo.csnet
jeff@gatech.UUCP (Jeff Lee) (02/07/85)
> > Hashing IS NOT an O(1) operation! > > Hashing is an O(N)/buckets operation, where many times you can use enough > buckets to make it very fast. > I would sure like to know how you came up with this information. It seems to me that you must include the collision handling into this before you can come up with a blanket statement about what the results are. If fact, I think that you'll see that with the appropriate collision handling, you come up with an algorithm that is essentially O(1). Remember, if one operation takes 3 seconds and another takes 7 seconds they may both be O(1) if they take the same amount of time no matter what size of input you are dealing with. Before I go into the different collision schemes, what about a minimal perfect hash ?? The operation is guaranteed to be O(1) and the table is completely full. In the following, let (a = number of items to hash/number of buckets) which is the loading density expressed as a percentage-type thing (eg, .4). Hash tables operate a little differently in that there are two times associated with them: 1) the time to locate an item in the table; 2) the time to find out that an item is not in the table. With linear open addressing (use the next open slot) we get 1) the time to locate an item in the table is .5 * (1 + (1 / (1 - a))) 2) the time to find out it isn't in the table is .5 * (1 + (1 / (1 - a)**2)) With rehashing, random probing, and quadratic probing 1) 1 / (1 - a) 2) - (1 / a) * ln(1 - a) And finally, for linear chaining (which is normally the best) 1) a 2) 1 + (a / 2) You may also chain the collision elements in a binary tree and end up with some sort of log to the base 2 instead of the linear element on 'a' and get an even greater increase in speed which increases logorithmically instead of linearly (like most of the rest of them). Some work a friend of mine did was with fixed size tables using a portion of the table as an overflow chain area (the hash table was 85% of total space allocated and the overflow area reserved was 15%). When the entire table was full, the total number of probes was never over about 4.1. This was with a totally full table of about 75 to 100 elements. It is essentially chained linear hashing with a fixed amount of space (unlike above which assumes that you have infinite overflow space available). When the overflow section fills up, you start allocating space from the hash table so that when the table is full, there are no empty slots. Hope this helps, -- Jeff Lee CSNet: Jeff @ GATech ARPA: Jeff.GATech @ CSNet-Relay uucp: ...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!jeff
jeff@gatech.UUCP (Jeff Lee) (02/07/85)
Oh,... I forgot. You can get this information out of Knuth (searching and sorting) or Horowitz and Sahni (Fundamentals of Data Structures). -- Jeff Lee CSNet: Jeff @ GATech ARPA: Jeff.GATech @ CSNet-Relay uucp: ...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!jeff