[net.lang.pascal] Pointers and hashing

joel@decwrl.UUCP (Joel McCormack) (01/31/85)

    Hashing IS NOT an O(1) operation!

    Hashing is an O(N)/buckets operation, where many times you can use enough
buckets to make it very fast.

-- 
- Joel McCormack {ihnp4 decvax ucbvax allegra sequent utcsrgv}!decwrl!joel
		 joel@decwrl.arpa

ee163acp@sdcc13.UUCP (DARIN JOHNSON) (02/01/85)

> 
>     Hashing IS NOT an O(1) operation!
> 
>     Hashing is an O(N)/buckets operation, where many times you can use enough
> buckets to make it very fast.
> 
   True, but the beauty of the whole thing is that on the average, hash
   seems like O(1), and only rarely will you get a bad case.


   Darin Johnson @UCSD

gvcormack@watdaisy.UUCP (Gordon V. Cormack) (02/04/85)

> 
>     Hashing IS NOT an O(1) operation!
> 
>     Hashing is an O(N)/buckets operation, where many times you can use enough
> buckets to make it very fast.
> 
> -- 
> - Joel McCormack {ihnp4 decvax ucbvax allegra sequent utcsrgv}!decwrl!joel
> 		 joel@decwrl.arpa

Thus, if we use O(N) buckets, we have O(1) (average case) retrieval
time.  If O(1) worst-case retrieval time is required, a perfect hash
scheme is necessary.  I have written about one such scheme in
Computer Journal 28:1 (Feb. 85).

Complexity considerations aside, there are almost no circumstances 
where binary search is superior to hashing for unique-key unordered
retrievals.

It seems there should be a better newsgroup in which to continue 
discussions such as this, but I can't think of one.
-- 
 
Gordon V. Cormack      CS Department, University of Waterloo
   gvcormack@watdaisy.uucp     gvcormack%watdaisy@waterloo.csnet

jeff@gatech.UUCP (Jeff Lee) (02/07/85)

> 
>     Hashing IS NOT an O(1) operation!
> 
>     Hashing is an O(N)/buckets operation, where many times you can use enough
> buckets to make it very fast.
> 

I would sure like to know how you came up with this information. It seems to
me that you must include the collision handling into this before you can come
up with a blanket statement about what the results are. If fact, I think that
you'll see that with the appropriate collision handling, you come up with an
algorithm that is essentially O(1). Remember, if one operation takes 3 seconds
and another takes 7 seconds they may both be O(1) if they take the same amount
of time no matter what size of input you are dealing with. Before I go into
the different collision schemes, what about a minimal perfect hash ?? The
operation is guaranteed to be O(1) and the table is completely full.

In the following, let (a = number of items to hash/number of buckets) which
is the loading density expressed as a percentage-type thing (eg, .4). Hash
tables operate a little differently in that there are two times associated
with them:
	1) the time to locate an item in the table;
	2) the time to find out that an item is not in the table.

With linear open addressing (use the next open slot) we get

	1) the time to locate an item in the table is
		.5 * (1 + (1 / (1 - a)))
	2) the time to find out it isn't in the table is
		.5 * (1 + (1 / (1 - a)**2))

With rehashing, random probing, and quadratic probing

	1)	1 / (1 - a)
	2)	- (1 / a) * ln(1 - a)

And finally, for linear chaining (which is normally the best)

	1)	a
	2)	1 + (a / 2)

You may also chain the collision elements in a binary tree and end up with
some sort of log to the base 2 instead of the linear element on 'a' and get
an even greater increase in speed which increases logorithmically instead
of linearly (like most of the rest of them).

Some work a friend of mine did was with fixed size tables using a portion of
the table as an overflow chain area (the hash table was 85% of total space
allocated and the overflow area reserved was 15%). When the entire table was
full, the total number of probes was never over about 4.1. This was with
a totally full table of about 75 to 100 elements. It is essentially chained
linear hashing with a fixed amount of space (unlike above which assumes that
you have infinite overflow space available). When the overflow section fills
up, you start allocating space from the hash table so that when the table is
full, there are no empty slots.

	Hope this helps,
-- 
Jeff Lee
CSNet:	Jeff @ GATech		ARPA:	Jeff.GATech @ CSNet-Relay
uucp:	...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!jeff

jeff@gatech.UUCP (Jeff Lee) (02/07/85)

Oh,... I forgot. You can get this information out of Knuth (searching and
sorting) or Horowitz and Sahni (Fundamentals of Data Structures).

-- 
Jeff Lee
CSNet:	Jeff @ GATech		ARPA:	Jeff.GATech @ CSNet-Relay
uucp:	...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!jeff