baldwin@cs.rochester.edu (Douglas Baldwin) (06/02/88)
[ I picked this off comp.lang.misc -Steve ]

Having thought a fair bit about parallel programming, why it's hard to do, and how it ought to be done, here are my comments (for what they're worth).

Coordinating a multiprocessor to do something IS harder than getting a uniprocessor to do it - someone has to indicate how the overall application is broken up into processes, how and when those processes communicate, and how they have to be synchronized in order to avoid a whole host of problems (deadlock, starvation, violation of data dependencies, etc.). When programmers have to describe all of this explicitly, in addition to describing the basic function they want computed, it's no wonder that they have a hard time. This suggests that IMPLICIT parallelism (i.e., parallelism detected and managed by a compiler or run-time system) is preferable to explicit parallelism.

Conventional imperative languages essentially describe state machines - not necessarily machines with a finite number of states, but still machines whose current actions depend on the state they are currently in. This means that programs (i.e., state machines) written in these languages had better sequence through their states in just the right order if they are to produce the results that programmers expect. This dependence on the order of states makes all imperative languages (i.e., the whole Algol family and its close relatives, including object-oriented languages) inherently sequential, and any attempt to extract parallelism from these languages fights a losing battle against this sequentiality. I'd even argue that attempts to reason about the correctness of programs written in explicitly parallel imperative languages face the same battle.

Declarative languages (functional, dataflow, logic, etc.) do seem better suited to implicitly parallel programming, although they have problems of their own.
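To make the order-of-states point concrete, here is a minimal sketch (mine, not from the original post; the function names are made up for illustration): the same two state transitions yield different final states depending on the order in which they run, so a parallelizing compiler must first prove the steps independent before it can run them concurrently.

```python
# Two state updates on a single piece of state. Because each update
# reads the state the other one writes, their order matters - which
# is exactly the dependence on state order that makes imperative
# programs inherently sequential.

def run(order):
    """Apply the named transitions to state x = 0 in the given order."""
    x = 0
    steps = {
        "a": lambda x: x + 1,   # transition A: increment
        "b": lambda x: x * 2,   # transition B: double
    }
    for name in order:
        x = steps[name](x)
    return x

print(run("ab"))  # -> 2  (increment, then double)
print(run("ba"))  # -> 1  (double, then increment)
```

Only when an analysis can show two such steps touch disjoint state may they be scheduled in parallel; here they cannot.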
The main one is that, to varying degrees, all of the declarative languages I am aware of have trouble expressing general data-parallel computations. The cleanest way to describe a computation over all elements of a data structure is via some sort of recursive traversal of that structure, and the recursion introduces apparent data dependencies that serialize the computation. This problem could probably be fixed by proper language design, and in fact that's what my research is about - things called "constraint languages" that I think avoid a lot of the problems other languages have in supporting implicitly parallel programming.

Just to put in a plug for myself, more details on the above argument can be found in "Why We Can't Program Multiprocessors the Way We're Trying to Do It Now", Technical Report 224, University of Rochester Computer Science Department, by me. I'll be glad to have copies sent to anyone who asks.

One final comment on Mark's "would loops help GHC?" question. I suspect not. I haven't used GHC (Guarded Horn Clauses) myself, although I've dabbled in Prolog and followed some of the literature on parallel logic languages. To use any language well you really need to think in its idioms and paradigms, and if GHC is anything like other logic languages the right paradigms are recursion, backtracking, simultaneous search for all solutions to a goal, etc., not explicit loops. Maybe as you get more used to GHC you'll stop wishing for loops so much.
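The recursion-induced serialization described above can be sketched as follows (again my own illustration, not the author's): a linear recursive traversal chains every step on the result of the next, while a divide-and-conquer traversal of the same data splits it into independent halves that could, in principle, be evaluated in parallel.

```python
# Linear recursion: each call waits on the one below it, so the
# apparent data dependencies force O(n) sequential depth.
def linear_sum(xs):
    if not xs:
        return 0
    return xs[0] + linear_sum(xs[1:])

# Divide and conquer: the two halves share no data, so they are
# independent subcomputations with O(log n) depth - the shape an
# implicitly parallel implementation could exploit.
def tree_sum(xs):
    if len(xs) <= 1:
        return xs[0] if xs else 0
    mid = len(xs) // 2
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])

print(linear_sum([1, 2, 3, 4]))  # -> 10
print(tree_sum([1, 2, 3, 4]))    # -> 10
```

Both compute the same sum; only the second exposes the data parallelism, which is why the choice of traversal shape, and not just the language, determines what a compiler can extract.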