[comp.parallel] Embedding Linda and other joys of concurrent logic programming

udi%wisdom.weizmann.ac.il@CUNYVM.CUNY.EDU (Udi Shapiro) (04/15/89)

In the recent Communications of the ACM (April 1989, pp. 444-458)
Carriero and Gelernter compare Linda to "the Big Three" families of
parallel languages: concurrent object oriented languages, concurrent
logic languages, and functional languages.

While I accept many of their statements concerning concurrent object
oriented languages and functional languages, I find the comparison
with concurrent logic languages misguided.

The dining philosophers
-----------------------

Carriero and Gelernter did not have to set up a straw man for the
purpose of comparison --- Ringwood, in the January 1988 issue of
Communications, has done a fine job for them.  With its 70 lines of
(incomplete) code, six procedures and four illustrative figures,
Ringwood's Parlog86 program for solving the dining philosophers
problem is indeed an easy target.  This embarrassingly cumbersome
program is not an appropriate basis for comparison with concurrent
logic languages.  The reason is that a dining philosopher can be
specified by a much simpler concurrent logic program:

phil(Id,[eating(LeftId,done)|Left],Right) :-    % Left is eating, wait till
        phil(Id,Left,Right).                    % he is done.
phil(Id,Left,[eating(RightId,done)|Right]) :-   % Right is eating, wait till
        phil(Id,Left,Right).                    % he is done.
phil(Id,[eating(Id,Done)|Left]^,[eating(Id,Done)|Right]^) :-
                                                % Atomically grab both forks
        ... eat, when done, unify Done=done ...,
        phil(Id,Left,Right).

The program is independent of the number of philosophers dining. A
dinner of n philosophers can be specified by the goal:

phil(1,Fork1,Fork2), phil(2,Fork2,Fork3),... , phil(n,Forkn,Fork1).

whose execution results in each of the Fork variables being incrementally
instantiated to a stream of terms eating(Id,done), with the Id's on
each Fork reflecting the order in which its two adjacent philosophers
use it.

The program is written in the language FCP(^) (a variant of Saraswat's
FCP(\downarrow)).  The unannotated head arguments are matched against
the corresponding terms in the goal atom, as in Parlog86 and GHC.  In
contrast, the ^-annotated terms are unified with the corresponding
terms, as in Prolog. A goal atom can be reduced with a clause if both
the input matching and the unification succeed.  The reduction of
a goal with a clause is an atomic action: it either suspends or fails
without leaving a trace of the attempt, or it succeeds with some
unifying substitution, which is applied atomically to the variables it
shares with other processes.

The key to the simplicity of the program is indeed the ability of
FCP(^) to specify atomic test unification:  the philosopher atomically
tries to grab both forks, excluding other philosophers from grabbing
them. The mutual exclusion is obtained by unifying the head of the
Fork stream with a term containing the unique Id of the philosopher.
(Atomic test unification is incorporated in all the languages in the
FCP family, but not in Parlog86 or GHC.  Programs exploiting atomic test
unification cannot be easily written in the latter two languages.)

The deadlock-freedom of the program is guaranteed by the language
semantics; specifically, if two or more processes compete for unifying
the same variable(s), one succeeds and the others will see the
variable(s) instantiated (deadlock-free atomic unification can be
implemented using a variant of the two-phase locking protocol; see
Taylor et al., Chapter 38 in the Concurrent Prolog book).  However,
like the Parlog86 and the Linda programs, this program is not
starvation-free. It can be further enhanced to achieve this as well.
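For readers without an FCP system at hand, the effect of the atomic
two-fork grab can be approximated in a conventional language by putting
all fork state behind a single monitor, so a philosopher either takes
both forks in one step or takes nothing.  The Python sketch below is my
own paraphrase (the Table class and its method names are hypothetical,
not part of any FCP or Linda implementation); like the FCP program, it
is deadlock-free but not starvation-free:

```python
import threading

class Table:
    """All fork state behind ONE monitor, so a philosopher's two-fork grab is
    a single atomic step -- a rough analogue of atomic test unification."""
    def __init__(self, n):
        self.n = n
        self.cond = threading.Condition()
        self.taken = [False] * n    # taken[i]: fork between phil i and (i+1)%n

    def grab_both(self, i):
        left, right = i, (i + 1) % self.n
        with self.cond:
            # Either both forks are taken together or neither is touched,
            # so hold-and-wait (and hence deadlock) cannot arise.
            while self.taken[left] or self.taken[right]:
                self.cond.wait()
            self.taken[left] = self.taken[right] = True

    def put_both(self, i):
        left, right = i, (i + 1) % self.n
        with self.cond:
            self.taken[left] = self.taken[right] = False
            self.cond.notify_all()  # wake neighbours waiting on these forks

def dinner(n, meals):
    """Run n philosophers, each eating `meals` times; return the eating log."""
    table, log, log_lock = Table(n), [], threading.Lock()
    def phil(i):
        for _ in range(meals):
            table.grab_both(i)
            with log_lock:
                log.append(i)       # the 'eating' event
            table.put_both(i)
    threads = [threading.Thread(target=phil, args=(i,)) for i in range(n)]
    for t in threads: t.start()
    for t in threads: t.join()
    return log
```

The log plays the role of the Fork streams: its order records which
philosopher ate when, just as the eating(Id,done) terms do.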

The need for assigning unique identifiers to philosophers can be
eliminated if the language is strengthened with read-only variables or
with a test-and-set primitive.  We consider the second extension: in
FCP(^,!), an !-annotated term in the clause head denotes an output
assignment of the term to a variable in the corresponding position in
the goal. The assignment fails if the corresponding position is a
non-variable term. It can be mixed (but not nested) with
^-annotations, and all assignments and unifications should be done
atomically. In FCP(^,!), an anonymous dining philosopher can be
specified by:


phil([eating(done)|Left],Right) :-      % Left is eating, wait till
        phil(Left,Right).               % he is done.
phil(Left,[eating(done)|Right]) :-      % Right is eating, wait till
        phil(Left,Right).               % he is done.
phil([eating(Done)|Left]!,[eating(Done)|Right]!) :-
                                        % Atomically grab both forks
        ... eat, when done, unify Done=done ...,
        phil(Left,Right).

A dinner of n anonymous philosophers:

phil(Fork1,Fork2), phil(Fork2,Fork3),... , phil(Forkn,Fork1).

is of course not as interesting.  Each Fork variable is bound to a
stream of eating(done) terms.


Embedding Linda
---------------

Sure, the picture of Linda vs. concurrent logic languages looks different
after rehabilitating the dining philosophers.  But is this the whole story?

Carriero and Gelernter like to think of Linda as very different from
all three models they explore.  I don't think this is so, and
would like to claim that Linda incorporates fundamental aspects of the
concurrent logic programming model, although perhaps in a degenerate
form. One aspect is the matching that takes place in an 'in' or 'rd'
operation: it is a degenerate form of unification.  Another aspect is
shared data structures. Both in Linda and in concurrent logic programs
processes communicate by creating a shared data structure.  A stream
of messages between two concurrent logic processes is just a shared
data structure constructed incrementally from binary tuples. However,
a shared structure in Linda cannot contain the equivalent of logical
variables (although the proposed extension to Linda of nesting Tuple
Spaces is a step in this direction); the ability to construct and
communicate structures with logical variables (incomplete terms,
incomplete messages) is the source of the most powerful logic
programming techniques.  Another difference is that in a concurrent
logic language a shared structure can only be increased monotonically
(by further instantiating variables in it), and can only be reclaimed
by garbage collection when unreferenced. In Linda, tuples can be
explicitly deleted from the shared Tuple Space.
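The contrast between the two binding mechanisms can be made concrete.
The Python sketch below (all names are mine, chosen for illustration)
shows Linda-style matching as one-way: formals occur only in the
template, the tuple must be ground, and nothing is left unbound.  Full
unification, by contrast, binds variables on either side and may leave
a shared structure containing unbound variables -- an incomplete term:

```python
class Formal:
    """A Linda formal ('? x' in a template): matches any actual of its type."""
    def __init__(self, typ):
        self.typ = typ

def linda_match(template, tup):
    """One-way matching: a degenerate unification.  Returns the actuals bound
    to the formals, or None if the match fails."""
    if len(template) != len(tup):
        return None
    bindings = []
    for t, a in zip(template, tup):
        if isinstance(t, Formal):
            if not isinstance(a, t.typ):
                return None
            bindings.append(a)
        elif t != a:
            return None
    return bindings

class Var:
    """A logical variable; may remain unbound, giving an incomplete term."""
    def __init__(self):
        self.val = None                 # None means 'still unbound'

def unify(x, y):
    """Two-way unification: variables on EITHER side get bound, and the
    result may still contain unbound Vars -- what a Linda tuple cannot hold."""
    if isinstance(x, Var):
        if x.val is None:
            x.val = y
            return True
        return unify(x.val, y)
    if isinstance(y, Var):
        return unify(y, x)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        return all(unify(a, b) for a, b in zip(x, y))
    return x == y
```

Unifying ("msg", V) with ("msg", ("reply", W)) binds V to a term that
itself contains the unbound W -- the incomplete-message technique that
Linda tuples cannot express.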

The similarity between the cooperative incremental construction of
logical terms and Linda Tuple Space operations is perhaps best
illustrated by showing a concurrent logic program implementation of
Linda's primitives.  For simplicity, the concurrent logic program
represents the Tuple Space as an incomplete list of terms of the form
tuple(OutId,InId,T). (The same ideas can be applied with a more
efficient data structure, such as a search tree or a hash table.) The
operations are defined, of course, so that several processes can
perform them concurrently on the same Tuple Space.  Each 'in' and
'out' operation must be given a unique identifier for the purpose of
mutual exclusion.  An 'out' operation with tuple T and identifier OutId
inserts the term  tuple(OutId,_,T) at the tail of the Tuple Space list.

% out(Id,T,Ts) :-  Tuple T inserted to Tuple Space Ts with an 'out'
%               operation with identifier Id.

out(Id,T,[tuple(Id,_,T)|Ts]^).          % Found tail of Ts, insert T.
out(Id,T,[_|Ts]) :-                     % Slot taken,
        out(Id,T,Ts).                   % keep looking.

'out' can be enhanced to return the new tail of the Ts stream. A
subsequent 'out' can be applied to this tail, thus avoiding searching
the entire Ts for each 'out'.

An 'in' operation with tuple template T and identifier InId finds a term
tuple(_,InId',T') in the Tuple Space such that T' unifies with T and InId'
unifies with InId.

% in(Id,T,Ts) :- Tuple T in Tuple Space Ts deleted by an 'in' operation with
%               identifier Id.

in(Id,T,[tuple(_,Id^,T^)|Ts]).          % Found unifiable tuple, delete it.
in(Id,T,[tuple(_,Id',T')|Ts]) :-
        (Id,T)=\=(Id',T') |             % Tuple deleted or doesn't unify,
        in(Id,T,Ts).                    % keep looking.

A third argument, which returns a Tuple Space from which deleted tuples
have been removed, can be added as an optimization.

The precise interrelation between 'in' and 'rd' is not specified in
the Linda papers.  Assuming it is defined so that a tuple deleted from
the tuple space by 'in' should not be visible to any subsequent 'rd',
the corresponding concurrent logic program is:

rd(T,[tuple(_,InId,T^)|Ts]) :-          % Found tuple
        var(InId) | true.               % that has not been deleted yet.
rd(T,[tuple(_,InId,T')|Ts]) :-
        (foo,T)=\=(InId,T') |           % Tuple deleted or doesn't unify,
        rd(T,Ts).                       % keep looking.


The program assumes that 'foo' is not used as an identifier.  It uses
the 'var' guard construct to check that InId is still a variable,
i.e. the tuple has not been deleted by a concurrent 'in' process. If
weaker synchronization is allowed between 'in' and 'rd', then the
corresponding logic program may use weaker synchronization as well.
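The same out/in/rd behaviour can also be paraphrased in a conventional
shared-memory language.  The sketch below (Python; TupleSpace, WILD and
the method names are mine, and it mimics only the blocking semantics
discussed here, not Linda's actual C interface) serializes scans with
one monitor: 'in' deletes the matched tuple, 'rd' only observes it, and
a suspended operation wakes when a later 'out' supplies a match:

```python
import threading

WILD = object()                         # stands in for a Linda formal

def matches(template, tup):
    return len(template) == len(tup) and all(
        t is WILD or t == a for t, a in zip(template, tup))

class TupleSpace:
    def __init__(self):
        self.cond = threading.Condition()
        self.tuples = []                # live (not yet deleted) tuples

    def out(self, tup):
        with self.cond:
            self.tuples.append(tup)
            self.cond.notify_all()      # wake suspended in_/rd scans

    def _find(self, template, remove):
        with self.cond:
            while True:
                for tup in self.tuples:
                    if matches(template, tup):
                        if remove:      # 'in' deletes; 'rd' only observes
                            self.tuples.remove(tup)
                        return tup
                self.cond.wait()        # suspend until some 'out' arrives

    def in_(self, template):
        return self._find(template, remove=True)

    def rd(self, template):
        return self._find(template, remove=False)
```

As in the FCP version, an 'rd' posted before any matching tuple exists
simply remains suspended until a subsequent 'out' instantiates one.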

These programs can be executed 'as is' under the Logix system (a
concurrent logic programming environment developed at the Weizmann
Institute).  A Logix run of the following goal, consisting of several
concurrent operations on an initially empty Tuple Space Ts:

out(1,hi(there),Ts), in(2,hi(X),Ts), rd(hi(Y),Ts), in(3,hi(Z),Ts),
        out(4,hi(ho),Ts)

results in the following substitution:

X = there
Z = ho
Ts = [tuple(1, 2, hi(there)), tuple(4, 3, hi(ho)) | _]

indicating that hi(there) was inserted into Ts with operation 1 and deleted
with 2, and hi(ho) was inserted with 4 and deleted with 3.  In this
particular run, in(3,hi(Z),Ts) reached the second tuple before
rd(hi(Y),Ts), so the 'rd' process remains suspended.

If we execute an additional 'out' operation:

out(5,hi(bye),Ts)

then both 'out' and the suspended 'rd' terminate, resulting in the
substitution:

Y = bye
Ts = [tuple(1, 2, hi(there)), tuple(4, 3, hi(ho)), tuple(5, _,hi(bye)) | _]

Note that the last tuple inserted has not been deleted yet, so its
InId is still uninstantiated.

Using test-and-set, 'in' and 'out' can be implemented even more simply
in FCP(^,!), without using unique identifiers.  A Tuple Space is
represented by an incomplete list of terms tuple(T,Gone), where Gone
is instantiated to 'gone' if T was deleted.

out(T,[tuple(T,Gone)|Ts]!).             % Found tail of Ts, insert T.
out(T,[_|Ts]) :-                        % Slot taken,
        out(T,Ts).                      % keep looking.

in(T,[tuple(T^,gone!)|Ts]).             % Found unifiable tuple, delete it.
in(T,[tuple(T',Gone)|Ts]) :-
        (T,foo)=\=(T',Gone) |           % Tuple deleted or doesn't unify,
        in(T,Ts).                       % keep looking.

Supporting 'eval' in a language that already supports dynamic
process creation seems redundant, so we do not address it.

Embeddings of concurrent object oriented languages and of functional
languages in concurrent logic languages have been reported elsewhere.
The trivial embeddings of Linda in FCP(^) and in FCP(^,!) are further
evidence of the generality and versatility of concurrent logic
programming.


The DNA example
---------------

Another example in Carriero and Gelernter's paper is the DNA sequence
analyzer program.  They show a functional program and a Linda program
for solving it, and argue that the functional program is
only slightly preferable on readability and elegance grounds, and that
the Linda program is better in giving explicit control of parallelism
to the programmer, and in not relying on a sophisticated compiler to
decide on its parallel execution.

We show below a concurrent logic program solving this problem that, we
believe, is uniformly better than the other two: it is shorter, clearer
(to us at least), and specifies a more efficient algorithm.

The program specifies a "soft-systolic" algorithm, along the lines of
the verbal description of the problem by Carriero and Gelernter (See
Chapter 7 in the Concurrent Prolog book for the concept of systolic
programs, and Chapter 8 for an analysis of their complexity). An array
process network is spawned, one 'cell' process for each pair (i,j), with
near-neighbour connections.  Each 'cell' process receives the
right H and P values from its right neighbour and the bottom H and Q
values, as well as the diagonal (bottom right) H value from the bottom
neighbour, computes the local values of the functions p, q, and h, and
sends them to its top and right neighbours.  The top neighbour also
receives the H value from the right.

% h(Xv,Yv,Hm) :- Hm is the similarity matrix of vectors Xv and Yv.

h(Xv,Yv,Hm) :-
        matrix(Xv,Yv,Hm,Ts).

matrix([A|Xv],Yv,[Hv|Hm]^,Ts) :-
        row(A,Yv,Hv,Left,Ts,Bs),
        matrix(Xv,Yv,Hm,Bs).
matrix([],Yv,[]^,Bs) :-
        initialize_bottom(Bs).

row(X,[Y|Yv],[H|Hv]^,Left,[Top|Ts]^,[Bottom|Bs]^) :-
        cell(X,Y,H,Left,Right,Top,Bottom),
        row(X,Yv,Hv,Right,Ts,Bs).
row(X,[],[]^,(0,0)^,[]^,[]^).

initialize_bottom([(0,0,0)^|Ts]) :-
        initialize_bottom(Ts).
initialize_bottom([]).

cell(X,Y,H^,(H,P)^,(Hr,Pr),(Hr,H,Q)^,(Hbr,Hb,Qb)) :-
        compare(X,Y,D),
        P := max(Hr-1.333,Pr-0.333),
        Q := max(Hb-1.333,Qb-0.333),
        H := max(Hbr+D,max(P,max(Q,0))).

compare(X,X,1^).
compare(X,Y,-0.3333^) :- X=\=Y | true.


The program is more efficient than the other two in two respects.
First, it avoids recomputation of the auxiliary functions p and q
(although some future Crystal compiler may automatically transform the
functional program into a more intricate functional program that
avoids the recomputation).  Second, it specifies a regular process
structure with local communication only.  This structure can be
mapped --- manually, using Turtle programs, or automatically --- to a
fine-grain concurrent multicomputer so that nearby processes
reside in nearby processors.  Efficiently mapping the other two
programs, which are not specified by process networks and have
non-local communication, is more difficult.
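For concreteness, the recurrence each 'cell' computes can also be
written down sequentially.  The Python sketch below is my own
paraphrase, using the same constants as the FCP program (match 1,
mismatch -0.3333, gap penalties 1.333 and 0.333) and the zero
boundaries of initialize_bottom; the row/column orientation of the
process network may differ from this indexing:

```python
def similarity_matrix(xs, ys):
    """Affine-gap similarity recurrence with the same constants as the FCP
    'cell' process.  Each p and q value is computed exactly once per cell,
    which is the recomputation the process network avoids."""
    n, m = len(xs), len(ys)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]   # boundaries are all zero,
    P = [[0.0] * (m + 1) for _ in range(n + 1)]   # as in initialize_bottom
    Q = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 1.0 if xs[i - 1] == ys[j - 1] else -0.3333
            P[i][j] = max(H[i - 1][j] - 1.333, P[i - 1][j] - 0.333)
            Q[i][j] = max(H[i][j - 1] - 1.333, Q[i][j - 1] - 0.333)
            H[i][j] = max(H[i - 1][j - 1] + d, P[i][j], Q[i][j], 0.0)
    return H
```

In the process network, each H[i][j], P[i][j], Q[i][j] is computed by
one 'cell' from the values its neighbours stream to it.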

Conclusions
-----------

Carriero and Gelernter conclude that "the tools in concurrent logic
languages are too policy-laden and inflexible to serve as a good basis
for most parallel programs".  Knowing that the policy of concurrent
logic languages results in the "more elegant alternative" (i.e. Linda)
being just 6 clauses away, I wonder if Carriero and Gelernter still
stand by this statement.

The major question is what price one has to pay for the generality of
concurrent logic languages. Implementations of concurrent logic
languages on sequential computers show that, although they can't
compete with conventional languages like C, they are certainly usable,
and score well when compared with efficient Prolog implementations.
(This is in spite of the large number of processes created during a
computation. In the short Logix session required to compile, test and
debug the programs shown in this note, 145,611 processes were created
and 6 minutes and 29 seconds of Sun/3 CPU were consumed.)  Parallel
implementations show good speedups, but as yet low absolute speeds.  A fair
estimate is that more than 30,000 lines of FCP code were developed
using the Logix system, including the Logix system itself.  So
concurrent logic languages are certainly for real.  However, the time
when the loss in performance from using concurrent logic languages
is small enough to be easily compensated by the gain they offer
in expressiveness has yet to come.

carriero@YALE.EDU (Nicholas Carriero) (04/17/89)

We've gotten a few email messages about the CACM article.  I suspect
we will be getting a few more.  So expect a response to Udi's
interesting comments in a forthcoming note that will attempt to
address his and others' remarks.  We will probably send it sometime
towards the end of next week.

Udi, I wouldn't want to keep you in suspense:

  Carriero and Gelernter conclude that "the tools in concurrent logic
  languages are too policy-laden and inflexible to serve as a good basis
  for most parallel programs". ... I wonder if Carriero and Gelernter still
  stand by this statement.

Yes, foursquare.

Some time ago someone posted a request for a bibliography.  I sent
something (probably to the wrong address) which hasn't appeared, so
I've appended it below.

-Nick Carriero


In response to a request for a bibliography on Linda, I've cut and
pasted from four sources to produce this somewhat strange collection
of citations.  Sorry about the variety of formats, but I haven't time
to clean this up.  While I believe I have eliminated duplicate
citations, I have not tried to rationalize the citation labels used,
so there might be some duplicates or inconsistencies in those.

Two refs with incomplete data:

1) U Tokyo paper in OOPSLA 88 on Objects and Tuple Space.

2) Lothar Borrmann and Martin Herdieckerhoff, Parallel Processing
   Performance in a Linda System.  Siemens AG, Munich, Germany

Three new Yale TRs:
1) The Implementation and Performance of Hypercube Linda
 Robert Bjornson, Nicholas Carriero and David Gelernter
 March, 1989

2) Multiple tuple spaces in Linda
 David Gelernter
 March, 1989

3) Information Management in Linda
 David Gelernter
 April, 1989

Now for TeX-like citations:

\bibitem{BoHeKl88} Borrmann, Herdieckerhoff, Klien,
{\it Implementation of the Linda Concept on a Hierarchical
Multiprocessor}\/, Conpar Proceedings, 1988.

% To appear in CACM:
\bibitem{CaGele88} Carriero and Gelernter,
{\it Linda in Context}\/, Yale, April 1988

% To appear in Computing Surveys
\bibitem{CaGe88} Carriero and Gelernter,
{\it How to write parallel Programs}\/, Yale, May 1988.

\bibitem{Gel88} David Gelernter,
{\it Parallelism to the People}\/, BYTE, Jan. 1989.

\bibitem{KrAhCaGe88} Krishnaswamy, Ahuja, Carriero, Gelernter,
{\it The Architecture of a Linda Coprocessor}\/, Proceedings Int.
Architecture conference, 1988.

\bibitem{Xu88} Xu, {\it A fault-tolerant network kernel for Linda}\/,
MIT/LCS/TR-424, MIT Lab. for Comp. Sci., Aug. 1988.


% BIBTEX file for the Linda publications.
%
% Entries are alphabetical with respect to the first author's
% last name.  An author's works are ordered chronologically
% with the most recent article first.  The key for an entry
% is the author's last name concatenated with the last two
% digits of the publication year.  A lower case character is
% added if an author has more than one publication in a single year.
%
% DJ Berndt

@article{Ahuja88,
author = {Ahuja, Sudhir and Carriero, Nicholas J. and
Gelernter, David H. and Krishnaswamy, Venkatesh},
title = {Matching Language and Hardware for Parallel Computation
in the {L}inda Machine},
journal = {IEEE Transactions on Computers},
volume = {37},
number = {8},
pages = {921--929},
month = {August},
year = {1988}
}

@article{Ahuja86a,
author = {Ahuja, Sudhir and Carriero, Nicholas J. and Gelernter, David H.},
title = {{L}inda and Friends},
journal = {IEEE Computer},
volume = {19},
number = {8},
pages = {26--34},
month = {August},
year = {1986},
annote = {A good presentation of Linda stressing its support
for an uncoupled programming style.  MA}
}

@inproceedings{Ahuja86b,
author = {Ahuja, Sudhir and Carriero, Nicholas J. and Gelernter, David H.},
title = {Progress Towards a {L}inda Machine},
booktitle = {Proceedings of the International Conference on
Computer Design},
month = {October},
year = {1986},
pages = {97--101}
}

@manual{Berndt89,
author = {Berndt, Donald J.},
title = {{C}-{L}inda Reference Manual},
organization = {Scientific Computing Associates},
month = {January},
year = {1989}
}

@techreport{Bjornson88,
author = {Bjornson, Robert and Carriero, Nicholas J. and
Gelernter, David H. and Leichter, Jerrold S.},
title = {{L}inda, the Portable Parallel},
institution = {Yale University Department of Computer Science},
month = {January},
year = {1988},
type = {Research Report},
number = {520}
}

@inproceedings{Carriero88,
author = {Carriero, Nicholas J. and Gelernter, David H.},
title = {Applications Experience with {L}inda},
booktitle = {Proceedings of the ACM Symposium on
Parallel Programming},
month = {July},
year = {1988}
}

@phdthesis{Carriero87,
author = {Carriero, Nicholas J.},
title = {Implementing Tuple Space Machines},
school = {Yale University},
address = {New Haven, Connecticut},
year = {1987},
note = {Department of Computer Science}
}

@inproceedings{Carriero86a,
author = {Carriero, Nicholas J. and Gelernter, David H. and
Leichter, Jerrold S.},
title = {Distributed Data Structures in {L}inda},
booktitle = {Proceedings of the ACM Symposium on
Principles of Programming Languages},
month = {January},
year = {1986},
address = {St. Petersburg, Florida}
}

@article{Carriero86b,
author = {Carriero, Nicholas J. and Gelernter, David H.},
title = {The {S}/{N}et's {L}inda Kernel},
journal = {ACM Transactions on Computer Systems},
pages = {110--129},
month = {May},
year = {1986}
}

@inproceedings{Gelernter87,
author = {Gelernter, David H. and Jagannathan, S. and London, T.},
title = {Environments as First Class Objects},
booktitle = {Proceedings of the ACM Symposium on
Principles of Programming Languages},
month = {January},
year = {1987}
}

@article{Gelernter86,
author = {Gelernter, David H.},
title = {Domesticating Parallelism},
journal = {IEEE Computer},
volume = {19},
number = {8},
month = {August},
year = {1986},
pages = {12--16},
note = {Guest editor's introduction.}
}

@article{Gelernter85a,
author = {Gelernter, David H.},
title = {Generative Communication in {L}inda},
journal = {ACM Transactions on Programming Languages and Systems},
volume = {7},
number = {1},
month = {January},
year = {1985},
pages = {80--112}
}

@inproceedings{Gelernter85b,
author = {Gelernter, David H. and Carriero, Nicholas J. and Chandran, S. and
Chang, S.},
title = {Parallel Programming in {L}inda},
booktitle = {Proceedings of the International Conference on
Parallel Processing},
month = {August},
year = {1985}
}

@phdthesis{Gelernter82,
author = {Gelernter, David H.},
title = {An Integrated Microcomputer Network for Experiments in
Distributed Processing},
school = {State University of New York at Stony Brook},
address = {Stony Brook, New York},
year = {1982},
note = {Department of Computer Science}
}

@inproceedings{Lucco86,
author = {Lucco, S.},
title = {A Heuristic {L}inda Kernel for Hypercube Multiprocessors},
booktitle = {Proceedings of SIAM Conference on
Hypercube Multiprocessors},
month = {September},
year = {1986}
}

@inproceedings{Whiteside88,
author = {Whiteside, Robert A. and Leichter, Jerrold S.},
title = {Using {L}inda for Supercomputing on a Local Area Network},
booktitle = {Proceedings of Supercomputing 88},
year = {1988}
}