[comp.parallel] Linda: specific sharing vs. generative communication

kale@kale.cs.uiuc.edu (L. V. Kale') (05/05/89)

Regarding Carriero and Gelernter's "Linda in Context" article in a
recent CACM issue:

I like various aspects of Linda for various reasons.
However, I had trouble swallowing their arguments showing the SUPERIORITY
of Linda over ALL the other approaches.

Here, I want to talk about one particular aspect of Linda:
the tuple space provides a uniform abstraction.
One uses the same tuple-space operations to specify different kinds
of sharing of information. Is this good or bad?

To set up the contrast, let me briefly introduce Chare Kernel, our own
machine independent parallel programming system.
At run time, the system consists of (a large number of) processes
(called chares) which communicate with each other via messages AND
share data in specific ways.
Programmers write the chare definitions, and use calls for
creating a chare and sending a message to another chare whose instance ID
is known to the sending chare.
In addition, they specify different types of variables ("pseudo global variables")
that are used for specific types of sharing of information between chares.
For (the simplest) example,
if there is information that is fixed at initialization time, and
many chare instances need access to it, it is declared as a read-only variable.
There are currently about 6-7 such pseudo-global variable
types (**).
Among them is a type called dynamic table.
Variables of this type have functionality that subsumes the
functionality of tuple-spaces.
(Take my word for it, for now, for the sake of argument)

The questions are: Which one is a better programming model? Which one is
more efficient to implement? The chare kernel provides
all the ways of sharing information that Linda can provide.
However, it requires programmers to specify the type of sharing.

It may be argued that tuple-space is conceptually easier because, say, the
programmers do not have to think about who the recipient of a message is.
However, in most (or at least some?) cases, the programmers ALREADY know that.
Linda simply forbids them from stating it!
For example, in a divide-and-conquer application, when a child
process produces a result, it can simply send it as a message meant
for its parent, (whose address it had received in the creation message, say).
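To make the pattern concrete, here is a toy simulation (in Python, purely for illustration; the names `create_chare`, `send`, and the mailbox scheme are my own invention, not Chare Kernel API calls): each "chare" owns a mailbox, the child receives its parent's ID in the creation message, and it addresses the result directly to that parent.

```python
import queue
import threading

mailboxes = {}  # instance ID -> message queue

def create_chare(chare_fn, creation_msg):
    """Register a mailbox for a new chare and run it in a thread."""
    my_id = len(mailboxes)
    mailboxes[my_id] = queue.Queue()
    t = threading.Thread(target=chare_fn, args=(my_id, creation_msg))
    t.start()
    return my_id, t

def send(dest_id, msg):
    """Deliver a message to a chare whose instance ID is known."""
    mailboxes[dest_id].put(msg)

def child_chare(my_id, creation_msg):
    # The parent's ID arrived in the creation message, so the
    # result goes straight back to the parent -- no shared space.
    parent_id, n = creation_msg
    send(parent_id, ("result", n * n))

# Parent: register its own mailbox, create a child (passing its
# own ID along), then wait for the reply.
parent_id = 0
mailboxes[parent_id] = queue.Queue()
_, t = create_chare(child_chare, (parent_id, 6))
t.join()
tag, value = mailboxes[parent_id].get()
print(tag, value)  # -> result 36
```

The point of the sketch is that the destination is explicit: the child names its recipient, rather than depositing the result somewhere anonymous.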

I think this is CONCEPTUALLY simpler than just depositing the response
in the tuple space with some tag for identification.
In the situations when the programmers really would find it difficult
to think about whom they are communicating with, or when they are
creating genuinely globally shared data, sure, they could use the more
general primitives.
However, it seems pointless to force an abstraction when it does not fit.

From the efficiency standpoint, the tuple-space abstraction
takes its toll on non-shared-memory machines. The compiler has to
figure out information that the programmer was willing to provide,
and the compiler may not be able to figure it out in all cases.

So, compared to "specific sharing" in the chare kernel, "generative
communication" in Linda is potentially a source of inefficiency,
and (more important) is not conceptually better either.

**: in the Chare Kernel design. The core language implementation
exists on shared-memory machines as well as non-shared-memory ones (the iPSC/2).
Most of the pseudo-global variables are implemented on shared-memory
machines, and will be implemented on the iPSC/2 soon.
The algorithms for all the pseudo-global variables have been designed.
From the point of view of the discussion above, all that matters is
that such primitives can be efficiently supported (without any compile-time
analysis).

braner@tcgould.tn.cornell.edu (Moshe Braner) (05/08/89)

[responding to the claim that Linda enforces excessive abstraction]

With Chares: the parent process tells the child its ID, and the child is
programmed to send its result to that destination (the parent).

With Linda: the parent process looks for a result with a specific tag, e.g.,
a specific string in the first element of the tuple.  The child is
programmed to send its result in a tuple with that tag.
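The same exchange can be sketched with a toy in-memory tuple space (again in Python, for illustration only; `TupleSpace`, `out`, and `in_` are stand-ins mimicking Linda's out/in operations, not a real Linda implementation): the child deposits its result under an agreed-upon tag, and the parent withdraws by matching on that tag, never naming the child.

```python
import threading

class TupleSpace:
    """Minimal blocking tuple space: out() deposits, in_() withdraws."""
    def __init__(self):
        self.tuples = []
        self.cond = threading.Condition()

    def out(self, *tup):
        with self.cond:
            self.tuples.append(tup)
            self.cond.notify_all()

    def in_(self, *prefix):
        """Block until a tuple whose leading fields match `prefix`
        exists, then remove and return it."""
        with self.cond:
            while True:
                for tup in self.tuples:
                    if tup[:len(prefix)] == prefix:
                        self.tuples.remove(tup)
                        return tup
                self.cond.wait()

ts = TupleSpace()

def child(n):
    # Child deposits its result tagged with an agreed-upon string.
    ts.out("square-result", n * n)

t = threading.Thread(target=child, args=(6,))
t.start()
# Parent withdraws by tag alone; the child's identity never appears.
tag, value = ts.in_("square-result")
t.join()
print(tag, value)  # -> square-result 36
```

Note that the tag plays the role the parent's ID played in the message-passing version; which of the two reads better is exactly the question under debate in this thread.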

I don't see how Linda is restrictive there.  In fact, it is less restrictive,
but easily allows what is needed here.  The tag, suitably chosen, can make
the program more readable.  The fact that the tuple is put into a common
space allows a debugger to examine it before the parent gets it.  Etc.

As for efficiency, only actual benchmarks can tell.  All the approaches
for global data are easier to implement on shared-memory machines, of
course.  It's tough with distributed memory, and will always be less
efficient than straight message-passing.  But the goal is (for some of
us, anyway) an 80%-efficient application created and maintained with
one-third the human work as compared with the 90%-efficient optimized
implementation...

- Moshe

Exclaimer:  Wow!