[comp.lang.prolog] Re : Elegance=efficiency, the continuing story

bimbart@kulcs.uucp (Bart Demoen) (07/28/88)

In <177@quintus.UUCP>, 20 Jul 88, ok compares three versions of
permute.
As a reminder :
	permute_1([], []).
	permute_1(X, [A|Z]) :-
		select(A, X, Y),
		permute_1(Y, Z).


	permute_2([], []).
	permute_2([X|Xs], Ys1) :-
		permute_2(Xs, Ys),
		select(X, Ys1, Ys).

	permute_3(Xs, Ys) :-
		permute_3(Xs, Ys, Ys).

	permute_3([], [], []).
	permute_3([X|Xs], Ys1, [_|Zs]) :-
		permute_3(Xs, Ys, Zs),
		select(X, Ys1, Ys).

	select(X, [X|L], L) .
	select(X, [Y|L], [Y|SL]) :-
		select(X,L,SL) .

We added 3 more versions :

	permute_4([], []).
	permute_4([X|L1], [A|Z]) :-
		select(A, [X|L1], Y),
		permute_4(Y, Z).

	permute_5([], []).
	permute_5([X|L1], [A|Z]) :-
		select(A, Y, X, L1),
		permute_5(Y, Z).

	select(X, L1, X, L1) .
	select(A, [Y|L2], Y, [X|L1]) :-
		select(A, L2, X, L1) .

	permute_6(X, Y) :-
		same_length(X,Y),
		permute_2(X,Y) .

	same_length([], []) .
	same_length([X|L1],[Y|L2]) :-
			same_length(L1,L2) .

Then we compared these versions.

I. permute_3 <--> permute_6

permute_3:

	permute_3(Xs, Ys) :-
		permute_3(Xs, Ys, Ys).

	permute_3([], [], []).
	permute_3([X|Xs], Ys1, [_|Zs]) :-
		permute_3(Xs, Ys, Zs),
		select(X, Ys1, Ys).

permute_6:

	permute_6(X, Y) :-
		same_length(X,Y),
		permute_2(X,Y) .

	same_length([], []) .
	same_length([X|L1],[Y|L2]) :-
			same_length(L1,L2) .

They execute in the same time.
But :
 . permute_3 is intended to be symmetrical in use,
   so one would hope for a symmetrical definition, but permute_3 isn't
 . permute_6 is symmetric, so it is easier to understand that it can
   be used in a symmetric way
 . programs should never be obscured by tricks unless one is sure
   it pays ... : elegance=efficiency

Therefore : permute_6 is superior to permute_3 .

II. permute_1 <--> permute_2

The first comparison is a time comparison between permute_1 and permute_2.
In his article, ok writes :

	... (permute_2 is) 5 times faster (than permute_1) for
	10 element lists ...

The same difference is found using BIM_Prolog.
ok does not explain this difference, we'll try to do that here.
permute_4 is derived from permute_1, but the first argument of the second
clause of permute was explicitly written as a list. The motivation
is that this way indexing will be possible. The BIM_prolog compiler
would generate indexing code for the second argument of permute_1, which
is never usefull if called with pattern : permute_1(+,-) .
This indexing can be turned off with a directive.

As noted in other articles in this newsgroup, common subterm elimination
while using the type info for indexing, would be nice.

This reduces the gap between permute_1 and permute_2, but it remains big.

Permute_5 is derived from permute_4 in an attemp to reduce the overhead
of analyzing the argument and then just reconstructing it.
Permute_5 is about twice as fast as the original permute_1.
Another improvement is made, but the time difference is still big :
permute_5 is 4 times slower than permute_2.

So, there must be a more fundamental problem with permute_1.
The reason for the time difference can be completely explained
by the backtracking behaviour.

permute_1 proceeds as follows :
	call select (choicepoint)
	select succeeds
	call permute_1 (environment)
	call select (choicepoint)
	select succeeds
	call permute_1 (environment)
	...
	call select
	select succeeds
	permute_1 succeeds

Upon backtracking, all calls and choicepoint creations have to be repeated.
A recursive description of number of Environment frames and Choicepoint
frames can be obtained (N is the length of the list to be permuted):

Environment frames :

	E_1(N) = 1 + N * E_1(N-1)

Choicepoints :

	C_1(N) = N + N * C_1(N-1)

permute_2 proceeds as follows :
	call permute_2 (environment)
	call permute_2 (environment)
	call permute_2 (environment)
	...
	permute_2 succeeds
	exec select (choicepoint)
	exec select (choicepoint)
	exec select (choicepoint)
	...
	select succeeds

Environment frames :

	E_2(N) = N

Choicepoints :

	C_2(N) = C_2(N-1) + N!

Now there is a difference in weigth between E's and C's.

the labels 1,2,4,8 are weight factors (f): 1 C ~ 1,2,4,8 E
the labels 6,7,8,9,10 are list lenghts N
the table entries are (E_1(N) + f * C_1(N)) / (E_2(N) + f * C_2(N))

	1		2		4		8
 6	3.632537	2.938927	2.590337	2.415594
 7	3.776858	3.047241	2.682108	2.499461
 8	3.868450	3.119590	2.745112	2.557861
 9	3.935107	3.173108	2.792102	2.601597
10	3.987050	3.214961	2.828916	2.635894

This suggest a factor of 2.5 - 3.0 between the two methods.

Timings :

length(L)	permute_2	permute_5	permute_5/permute_2

 7		  0.200000	  0.740000	3.7
 8		  1.600000	  6.100000	3.8
 9		 14.040001	 54.080002	3.8
10		139.559998	541.219971	3.8

the estimation indicates that the cost associated with E is comparable to the
cost of C. Notice that the cost of E is not just the cost of pushing an
environment, it also includes the passing of the arguments, the call ...

The remarkable thing in this case is that the natural
compulsion to write tail recursive procedures is wrong in
this kind of procedures !

Tail recursive
	process([],Res,Res) .
	process([X|L],ResIn,ResOut) :-
		action(X,ResIn,ResBetw),
		process(L,ResBetw,ResOut) .

Not tail recursive
	process([],Res,Res) .
	process([X|L],ResIn,ResOut) :-
		process(L,ResIn,ResBetw),
		action(X,ResBetw,ResOut) .

If action/3 is deterministic, one should clearly write the tail recursive
version. If it is not, the best way to write process/3 is not tail recursive.


Andre' Marien Bart Demoen

BIM-KUL

US : sun!sunuk!sunbim!kulcs!bimandre	sun!sunuk!sunbim!kulcs!bimbart
Europe : prlb2!kulcs!bimandre		prlb2!kulcs!bimbart