[comp.lang.prolog] indexing of dynamic code with hashing or binary search

bimbart@kulcs.uucp (Bart Demoen) (10/27/89)

At NACLP'89, some people asked why we choose hashing for dynamic indexing over
binary search: they argued that binary trees are easier to update and
that hashing could outperform binary search only for really big tables. The
former is difficult to prove or disprove, but I felt compelled to test the
latter. I implemented a builtin predicate bin_index/1, similar to the rehash/2
builtin (of BIM_Prolog), but which 'binary indexes' a chosen - dynamic -
predicate and I did some testing. The test program looks essentially like this

test(N) :-
	M is 2 ^ N - 1 ,
	assert_facts(M)) ,	% asserts fact(1) . fact(2) . ... fact(M) .
	rehash(fact/1,M) ,	% no collisions because of this choice
	time(call_fact(M),Thash) ,
	bin_index(fact/1) ,	% the binary tree is perfectly balanced
	time(call_fact(M),Tbin) ,
	write(Thash) , nl , write(Tbin) .

call_fact(0) :- ! .
call_fact(N) :- fact(N)  , M is N - 1 , call_fact(M) .

The results for different N are summarised in the table below. On purpose,
neither the machine, nor the repetition factor of the test is given.


N	#facts	hash-success	bin-success	overhead of the test harness

2	3	0.17		0.17		0.1
3	7	0.37		0.36		0.19
4	15	0.75		0.78		0.37
5	31	1.51		1.62		0.72
6	63	3.03		3.43		1.44
7	127	6.08		7.12		2.87

The table shows - as expected - that hash-success and overhead double as the
number of facts doubles, but that bin-success more than doubles.
But it also shows that hashing is a reasonable choice even for a small number of
facts, while binary indexing might be unreasonable for a large number of facts.
So, I feel we have made the correct choice. I only worry about possible comments
coming from the southern hemisphere ...

I have similar results for  hash-fail and bin-fail - where calls to fact(0.5),
fact(1.5), ...  , fact(N+0.5) are made (yes, the type of an integer and a float
are the same internally in BIM_Prolog, so that this was a reasonable test).

Just to take away some suspicion you might have: hashing and binary search were
both implemented in C, with the same relevant variables declared as registers
and in the same style of coding.

I am interested in similar or contradicting experiences.

This mail expresses my personal opinion and experience, not BIM's nor KUL's !


bimbart@kulcs.uucp