kale@m.cs.uiuc.edu (05/05/89)
Cross-posted from comp.parallel.

Regarding Carriero and Gelernter's "Linda in Context" article in a recent CACM issue: I like various aspects of Linda for various reasons. However, I had trouble swallowing their arguments for the SUPERIORITY of Linda over ALL other approaches. Here I want to talk about one particular aspect of Linda: the tuple space provides a uniform abstraction. One uses the same tuple-space operations to specify different kinds of sharing of information. Is this good or bad?

To set up the contrast, let me briefly introduce the Chare Kernel, our own machine-independent parallel programming system. At run time, the system consists of (a large number of) processes (called chares) which communicate with each other via messages AND share data in specific ways. Programmers write the chare definitions, and use calls for creating a chare and for sending a message to another chare whose instance ID is known to the sender. In addition, they declare different types of variables ("pseudo-global variables") that are used for specific kinds of sharing of information between chares. For the simplest example, if there is information that is fixed at initialization time and that many chare instances need to access, it is declared as a read-only variable. There are currently about 6-7 such pseudo-global variable types (**). Among them is a type called the dynamic table. Variables of this type have functionality that subsumes the functionality of tuple spaces. (Take my word for it, for now, for the sake of argument.)

The questions are: Which one is a better programming model? Which one is more efficient to implement?

The Chare Kernel provides all the ways of sharing information that Linda can provide; however, it requires programmers to specify the type of sharing. It may be argued that tuple space is conceptually easier because, say, programmers do not have to think about who the recipient of a message is. However, in most (or at least some?) cases, the programmers ALREADY know that. Linda simply forbids them from stating it! For example, in a divide-and-conquer application, when a child process produces a result, it can simply send it as a message meant for its parent (whose address it received in the creation message, say). I think this is CONCEPTUALLY simpler than depositing the response in the tuple space with some tag for identification. In situations where programmers really would find it difficult to think about whom they are communicating with, or when they are creating genuinely globally shared data, sure, they could use the more general primitives. However, it seems pointless to force an abstraction where it does not fit.

From the efficiency standpoint, the tuple-space abstraction takes its toll on non-shared-memory machines. The compiler has to figure out information that the programmer was willing to provide, and it may not be able to figure it out in all cases. So, compared to the "specific sharing" of the Chare Kernel, the "generative communication" of Linda is potentially a source of inefficiency, and (more important) is not conceptually better either.

**: in the Chare Kernel design. The core language implementation exists on shared-memory machines as well as non-shared-memory ones (iPSC/2). Most of the pseudo-global variables are implemented on shared-memory machines, and will be implemented on the iPSC/2 soon. The algorithms for all the pseudo-global variables have been designed. From the point of view of the discussion above, all that matters is that such primitives can be efficiently supported (without any compile-time analysis).
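To make the divide-and-conquer contrast concrete, here is a toy sketch in modern Python. This is NOT real Linda or Chare Kernel code; the class, tag, and mailbox names are all illustrative. The first half mimics generative communication (the child deposits a tagged tuple into a shared space, without naming a recipient); the second half mimics specific sharing (the child was handed its parent's ID at creation time and sends the result straight to it).

```python
import queue

# --- Style 1: generative communication (toy Linda-like tuple space) ---
class TupleSpace:
    """Toy tuple space: out() deposits a tuple, in_() removes a match by tag."""
    def __init__(self):
        self.tuples = []

    def out(self, *tup):
        self.tuples.append(tup)

    def in_(self, tag):
        # Linear search for a tuple whose first field matches the tag.
        for i, tup in enumerate(self.tuples):
            if tup[0] == tag:
                return self.tuples.pop(i)
        raise LookupError("no matching tuple")

ts = TupleSpace()
# The child deposits its result with a tag; it cannot say whom it is for.
ts.out("result-of-subproblem-7", 42)
# The parent must share the tag convention to retrieve it.
tag, value = ts.in_("result-of-subproblem-7")

# --- Style 2: specific sharing (toy directed message, chare-style) ---
# Each "chare" owns a mailbox; the child received its parent's ID in the
# creation message, so it states the recipient explicitly.
mailboxes = {"parent": queue.Queue()}

def child(parent_id, result):
    mailboxes[parent_id].put(result)  # recipient named directly

child("parent", 42)
value2 = mailboxes["parent"].get()

assert value == value2 == 42
```

Both halves deliver the same result; the difference is only in whether the addressing information the programmer already had (the parent's identity) can be stated or must be encoded in a matching convention.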