[comp.lang.prolog] Parallel: lists or trees ?

ok@quintus (07/22/88)

I was looking at some FCP code recently, and I was irritated by the way
it was using lists all over the place.  If you have a collection of N
elements represented as a list, it is going to take you N reductions to
visit them all, do what you will.  Of course, the N visits can then
proceed in parallel, but it takes so long to get started.

So I decided to try a small experiment.  The task isn't very interesting:
it is simply to make a collection of the integers from 1 to N and determine
how many of them are divisible by 3.  Here's the version using lists:

three_list(N, Count) :-
	make_list(N, Numbers),
	remainder_list(Numbers?, Remainders),
	zero_count_list(Remainders?, 0, Count).

make_list(0, []).
make_list(N, [N|Rest]) :- N > 0, M := N-1 |
	make_list(M, Rest).

remainder_list([], []).
remainder_list([N|Ns], [R|Rs]) :- R := N \ 3 |
	remainder_list(Ns?, Rs).

zero_count_list([], C, C).
zero_count_list([R|Rs], C0, C) :- C1 := (3-R)/3 + C0 |
	zero_count_list(Rs?, C1, C).

The make_list, remainder_list, and zero_count_list processes run
in parallel, but make_list is sequential because it is counting N down,
zero_count_list is sequential because it is adding the 1s (R=0) and 0s
(R!=0) up in order, and remainder_list is sequential because it is waiting
for make_list.  

For the other representation, I used trees of the form
	{SingleItem}
or	[LeftSon | RightSon]
as this seemed to minimise the space requirements.  I surmise that
a collection represented this way takes about twice the space of a
collection represented as a list: N-1 interior cells plus N {_} wrappers,
against N list cells.  The code is:

three_tree(N, Count) :-
	make_tree(1, N, Numbers),
	remainder_tree(Numbers?, Remainders),
	zero_count_tree(Remainders?, Count).

make_tree(N, N, {N}).
make_tree(L, U, [LM|NU]) :- L < U, M := (L+U)/2, N := (L+U)/2 + 1 |
	make_tree(L, M, LM),
	make_tree(N, U, NU).

remainder_tree({N}, {R}) :- R := N \ 3 | true.
remainder_tree([N1|N2], [R1|R2]) :-
	remainder_tree(N1?, R1),
	remainder_tree(N2?, R2).

zero_count_tree({R}, C) :- C := (3-R)/3 | true.
zero_count_tree([T1|T2], C) :-
	zero_count_tree(T1?, C1),
	zero_count_tree(T2?, C2),
	plus(C1?, C2?, C).

In this scheme, the depth between the top level call of make_tree/3
and one of its "leaves" is only lg(N) rather than N, e.g. the generation
of 1..N/2 can proceed in parallel with the generation of N/2+1..N.

The next step was to actually measure these things.
Fortunately, we have a(n old) copy of Logix.  I used the "time" and "rcp"
builtins to make the measurements, and compiled the code in mode(failsafe).

	three_list(300,_)
	cputime = 0.860 sec, reductions = 1201, creations = 74
	reductions = 993, cycles = 30, reductions_per_cycle = 33

	three_tree(300,_)
	cputime = 1.500 sec, reductions = 2355, creations = 1266
	reductions = 2186, cycles = 27, reductions_per_cycle = 80

	three_list(1000,_)
	cputime = 2.160 sec, reductions = 3108, creations = 19
	reductions = 3093, cycles = 57, reductions_per_cycle = 54

	three_tree(1000,_)
	cputime = 4.280 sec, reductions = 7101,
	reductions = 7086, cycles = 28, reductions_per_cycle = 253

The tree version runs slower on a uniprocessor.
Oddly enough, the time ratio is about the same as my estimate of the
space ratio for the data structures, and the same ratio is obtained in
plain Prolog.

Since the tree version appears to have more parallelism available,
it would be very interesting to run this code on a multiprocessor Logix.

(By the way, I am only a beginner with FCP,
 and would welcome suggestions for improving my code.)

-- 
quintus!ok@SUN.COM;  The proper study of man is *everything* -- C.S.Lewis

abc@cs.nott.ac.uk (Andy Cheese) (08/05/88)

This is a response to Richard O'Keefe's message of some time ago; I meant
to follow it up ages ago but have only just got around to it.

The reason lists are so prevalent in FCP, Parlog, FGHC etc is that they are
all concurrent logic programming languages (I do wish people wouldn't use
the term "logic language"; that has a clear definition I understand from my
mathematics days and doesn't have much relation to what people mean when
they use the term, so please say "logic programming language").  This
paradigm more or less forces you to think in terms of communicating
processes and hence streams, hence lists.
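
To make that concrete, here is the sort of producer/consumer pair the
stream style leads to (a sketch of my own with made-up names, untested):
the producer binds the tail of a shared list one cell at a time, and the
consumer suspends on its read-only copy of that list until the next cell
appears.

producer(0, []).
producer(N, [N|Ns]) :- N > 0 |
	M := N-1,
	producer(M?, Ns).

consumer([], Sum, Sum).
consumer([X|Xs], Sum0, Sum) :-
	Sum1 := Sum0 + X,
	consumer(Xs?, Sum1?, Sum).

run(N, Sum) :-
	producer(N, Ns),
	consumer(Ns?, 0, Sum).

The two goals in run/2 execute as concurrent processes, and the shared
list Ns is the channel between them, i.e. a stream.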

> make_list(0, []).
> make_list(N, [N|Rest]) :- N > 0, M := N-1 |
>         make_list(M, Rest).

I don't understand why you have the unification "M := N-1" in the guard;
if N is an integer this will always succeed.  Surely the clause should be
make_list(N, [N|Rest]) :- N > 0 |
	M := N-1,
	make_list(M?, Rest).

You want as little computation in the guard as possible: only what is
necessary to decide that this is the clause you want, much as with cut in
Prolog, but you know this anyway.

> remainder_list([], []).
> remainder_list([N|Ns], [R|Rs]) :- R := N \ 3 |
>       remainder_list(Ns?, Rs).

Looking at the code for make_list/2, you can see that the second argument of
a call to remainder_list/2 will always be an unbound variable.
remainder_list/2 is constructing its second argument, so you want the
output unification "R := N \ 3" in the body of the second clause; the fact
that you've got non-overlapping constructors in the first argument
is enough to ensure the correct clause is chosen.  The second clause
should be

remainder_list([N|Ns], [R|Rs]) :-
	R := N \ 3,
	remainder_list(Ns?, Rs).

This same argument also holds for the second clause of zero_count_list/3,
put output unification in the body of clauses.
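
For concreteness, its recursive clause would become something like this
(untested, following exactly the same pattern as remainder_list/2 above):

zero_count_list([R|Rs], C0, C) :-
	C1 := (3-R)/3 + C0,
	zero_count_list(Rs?, C1?, C).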

Exactly the same goes for the tree version: tests in the guard, output
unification in the body.  The correct code is, I think, as follows (I don't
have Logix online at the moment so I can't test it, and I have been known to
bungle things before):

three_tree(N, Count) :-
	make_tree(1, N, Numbers),  % Numbers guaranteed unbound at call
	remainder_tree(Numbers?, Remainders), % Remainders guaranteed unbound
	zero_count_tree(Remainders?, Count).

make_tree(N, N, {N}).
make_tree(L, U, [LM|NU]) :- L < U |
	M := (L+U)/2, make_tree(L, M?, LM),
	N := (L+U)/2 + 1, make_tree(N?, U, NU).

remainder_tree({N}, {R}) :- R := N \ 3.
remainder_tree([N1|N2], [R1|R2]) :-
	remainder_tree(N1?, R1),
	remainder_tree(N2?, R2).

zero_count_tree({R}, C) :- C := (3-R)/3.
zero_count_tree([T1|T2], C) :-
	zero_count_tree(T1?, C1),
	zero_count_tree(T2?, C2),
	plus(C1?, C2?, C).
    
What we really need is a system which knows about the underlying architecture
and can transform a program to use the most appropriate data structures to
give the right grain of parallelism, and which allows me to debug it as if it
were still my original program.
---------------------------------------------------------------------

JANET : abc@uk.ac.nott.cs              Andy Cheese
ARPA  : abc%nott.cs@ucl-cs.arpa        Department of Computer Science,
                                       University of Nottingham,
Functional vs. Logic Programming       University Park,
If You Can't Decide Between Em,        Nottingham.
-- Join Em                             NG7 2RD.
                                       England.
 
                                       Tel. No. : (0602) 484848 ext. 2765

From October 1988 :-

ECRC
European Computer-Industry Research Centre GMBH
(Forschungszentrum)
Arabellastrasse 17
D-8000 Muenchen 81
West Germany

-- 
Andy Cheese

ok@quintus.uucp (Richard A. O'Keefe) (08/10/88)

In article <2939@robin.cs.nott.ac.uk> abc@cs.nott.ac.uk (Andy Cheese) writes:
>This is a response to Richard O'Keefe's message of some time ago, i meant
>to follow it up ages ago but have only just got around to it.

>The reason lists are so prevalent in FCP, Parlog, FGHC etc is that they are
>all concurrent logic programming languages .... This paradigm more
>or less forces you to think in terms of communicating processes and hence
>streams, hence lists.

But as my example showed, "this paradigm" does nothing of the kind.  It is
just as easy to write code using binary trees as it is to write code using
lists (this is hardly a novel observation; I got the idea from a 1985
parallel functional language article).  Not only that, the experiment showed
that writing code using trees *can* increase the available parallelism.

>> make_list(0, []).
>> make_list(N, [N|Rest]) :- N > 0, M := N-1 |
>>         make_list(M, Rest).

>I don't understand why you have the unification "M := N-1" in the guard,

BECAUSE IT WORKS.  I read through the LOGIX manual several times to try
to find out where the arithmetic should be put.  I was having enough
trouble with read-only annotations as it was.  When I tried putting the
arithmetic in the bodies I ran into trouble.  I never did figure out
what was going on (we have an early copy of LOGIX, and the debugger
crashed after about 10 minutes), but putting the arithmetic in the
guards fixed it.  I did say in my message that I was a beginner with
FCP, so having found that putting the arithmetic in the guard gave me
absolutely no trouble, I stuck with it.

That's the real reason.  But there is an excuse.  What I would *really*
like to do in a "logic" programming language is write

	make_list(0, []).
	make_list(s(M), [s(M)|Rest]) :-
		make_list(M, Rest).

So we see that the "N > 0, M := N-1" stuff is _conceptually_ the
pattern match N = s(M), so it is arguable that the pieces belong together
in the guard.
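
Annotated with that reading (the comments are mine, for illustration only;
the clause itself is unchanged from the original):

	% conceptually, "N > 0, M := N-1" plays the role of matching
	% N against s(M): the test says N is a positive numeral, and
	% the subtraction extracts the M inside s(M).
	make_list(N, [N|Rest]) :- N > 0, M := N-1 |
		make_list(M, Rest).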

abc@cs.nott.ac.uk (Andy Cheese) (08/14/88)

In article <267@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In article <2939@robin.cs.nott.ac.uk> abc@cs.nott.ac.uk (Andy Cheese) writes:
>>This is a response to Richard O'Keefe's message of some time ago, i meant
>>to follow it up ages ago but have only just got around to it.
>
>>The reason lists are so prevalent in FCP, Parlog, FGHC etc is that they are
>>all concurrent logic programming languages .... This paradigm more
>>or less forces you to think in terms of communicating processes and hence
>>streams, hence lists.
>
>But as my example showed, "this paradigm" does nothing of the kind.  It is
>just as easy to write code using binary trees as it is to write code using
>lists (this is hardly a novel observation, I got the idea from a 1985
>parallel functional language article).  Not only that, the experiment showed
>that writing code using trees *can* increase the available parallelism.
>
OK, what it should have said was:
This paradigm more or less forces you to think in terms of communicating
processes, and thus the communication model that first springs to mind is
streams, and hence lists.
I didn't say it was daft to use binary trees; in most cases it would be better.

>>> make_list(0, []).
>>> make_list(N, [N|Rest]) :- N > 0, M := N-1 |
>>>         make_list(M, Rest).
>
>>I don't understand why you have the unification "M := N-1" in the guard,
>
>BECAUSE IT WORKS.  I read through the LOGIX manual several times to try
>to find out where the arithmetic should be put.  I was having enough
>trouble with read-only annotations as it was.  When I tried putting the
>arithmetic in the bodies I ran into trouble.  I never did figure out
>what was going on (we have an early copy of LOGIX, and the debugger
>crashed after about 10 minutes), but putting the arithmetic in the
>guards fixed it.  I did say in my message that I was a beginner with
>FCP, so having found that putting the arithmetic in the guard gave me
>absolutely no trouble, I stuck with it.

BECAUSE IT WORKS - not a good reason for doing it, but if your Logix wasn't
working properly then I suppose you didn't have much choice; maybe you had
the read-only annotations wrong.
Unnecessary goals in guards increase the time spent in selecting a clause,
and can mean more time unnecessarily spent reducing the guards of the other
clauses for a relation.  If N < 0 in the above, M := N-1 still gets evaluated.
-- 
Andy Cheese