larsa@nada.kth.se (Lars Andersson) (12/02/89)
(sorry if this appears twice, but the first posting didn't seem to make it)

Dear NET! I want to develop (or rather: I want someone to give me :-) ) a few SIMPLE tools to use for building "compute servers" running on a set of machines connected in a network. I'm talking about coarse-grained parallelism here, but I guess this is relevant also for shared-memory and hypercube-type machines. I'm sure there are a number of ways to approach this problem, and even a number of solutions around, but faced with the rather bewildering fauna of "parallel operating systems", models for parallel computation, etc. (LINDA, ISIS, Cosmic Environment, MACH (?) ...) one becomes reluctant to invest work in any one of these without knowing which will have the broadest base in the near future.

There is a large class of problems that can require a huge number of similar, SCALAR DOMINATED calculations to be carried out, with perhaps a periodic gathering and analysis of the solutions. Hence for the parallel subproblems, vectorization is not interesting, nor is communication speed between the processors critical. I believe this is called trivial parallelism. In this situation, one doesn't really care what machine the thing runs on, as long as one is able to access as many cheap CPU cycles as possible, such as the joint resources of the local net. What I have in mind is therefore solutions which allow one to run the subproblems on machines with different architectures, not necessarily with a common NFS server.

Consider the following: One "master" puts "tasks" in a batch, there to be picked up by the "slaves" as each completes its current task. When a slave completes a task, it puts the solution in an appropriate batch, there to be picked up and stored by the master (depending on the problem, one might want to write directly to a file ...). Within my limited knowledge, the only system that has something like this "built in" is ISIS (the NEWS service).
However, it's not clear to me that this is correct or suited for this kind of application, or if it's the only or best (most portable) solution. Any comments on the above would be appreciated. In particular, examples or code fragments pertaining to this situation would be gratefully received. Where lies the future?

Yours, Lars Andersson
larsa@math.kth.se

disclaimer: I'm a mathematician, not a computer scientist.
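[Editor's note: the master/slave "task batch" scheme Lars describes can be sketched in a few lines. This is a minimal single-machine sketch using Python threads as stand-ins for slave machines; every name in it is illustrative, and it is not the API of ISIS, Linda, or any other system mentioned in this thread.]

```python
# Minimal sketch of the master/slave batch idea: the master puts tasks
# in a shared batch, idle slaves pick them up, and solutions flow back
# through a second batch. Threads stand in for networked machines.
import queue
import threading

def slave(tasks, results):
    """Slave loop: pick up a task, compute, post the solution back."""
    while True:
        task = tasks.get()
        if task is None:                   # sentinel: no more work
            tasks.task_done()
            return
        results.put((task, task * task))   # stand-in for a scalar computation
        tasks.task_done()

def run_master(n_tasks=20, n_slaves=4):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=slave, args=(tasks, results))
               for _ in range(n_slaves)]
    for t in threads:
        t.start()
    for i in range(n_tasks):               # master fills the batch
        tasks.put(i)
    for _ in threads:                      # one sentinel per slave
        tasks.put(None)
    tasks.join()
    # master gathers and stores the solutions
    return dict(results.get() for _ in range(n_tasks))
```

In a real network version, the two queues would be replaced by some distributed mechanism (a tuple space, a process group, a batch directory on a shared file system), which is exactly what the rest of the thread is about.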
ken@gvax.cs.cornell.edu (Ken Birman) (12/08/89)
In article <2455@draken.nada.kth.se> larsa@math.kth.se (Lars Andersson) writes:
>
>I want to develop a few SIMPLE tools to use for building "compute servers"...
>...coarse grained parallelism ... shared memory and hypercube machines....
>... faced with the rather bewildering fauna of "parallel operating systems",
> models for parallel computations etc. (LINDA, ISIS, Cosmic Environment, MACH)
>one becomes reluctant to invest work in any one of these, without knowing
>which will have the broadest base in the near future....
>
>Consider the following: One "master" puts "tasks" in a batch, there to be
>picked up by the "slaves" as each completes its current task. When a slave
>completes a task it puts the solution in an appropriate batch, there to be
>picked up and stored by the master (depending on the problem, one might want
>to write directly to a file ...).
>
>Any comments on the above would be appreciated... Where lies the future?
>

This is quite an interesting posting, and it seems to touch on several different issues at once:

1) Distinguishing between ISIS/MACH/etc.
2) Predicting where future standards might live
3) Understanding the technology needed to build this class of compute servers

1. Categorizing systems, predicting the future

A lot of people have asked us about ISIS in this context; examples include some signal processing groups like the one at SAIC (maybe they could post something on their system?), the various financial trading developers (we hear from them, but mostly on technical issues lately), NASA, and now your group. As I see it, there are several issues here.

1.1 Kernel technologies

This is the question of "which kernel should I build on?". (ISIS tends to live at a higher level than this.) Example kernels that I hear about include SUN/ATT, OSF/MACH, AIX, OS/2, CHORUS, VMS, TROLLIUS, and maybe COSMIC. If the issues are purely commercial, I think the choice narrows to one of the interface the kernel presents and then the implementation that works best.
Personally, I think that MACH is a better system than the SUN/ATT system at the kernel level, but that many aspects of the MACH interface are less mature than for SUN/ATT. If I had to bet on the future, I guess I would estimate based on current market share.... If I had a chance to impose my preferences on people, I would probably weakly favor MACH or CHORUS. A little down the road, SUN/ATT will certainly offer a competitive new system as well, though.

VMS has a huge market that won't go away soon, but locks you into one vendor. Although VMS is in some ways a better system than any of the UNIX systems, I would probably think twice before putting a major new software development on that kernel -- if nothing else, because DEC's philosophy seems unclear at this stage. OS/2 is a crummy system for distributed computing, but has quite a hefty market share.

It would probably help if SUN/ATT, IBM and OSF could agree on the kernel interface to UNIX; but because these companies are in a very shortsighted mood lately, they all seem to prefer bleeding to death slowly to agreeing on anything at all. What this all adds up to is that if you need to do something new, you should probably base it on MACH... but if you expect to ship a commercial version, better figure out how you'll port it to SUN/ATT UNIX later.

1.2 Computing abstractions

I don't see the various UNIX systems as offering fundamentally different computing abstractions. In the above list, I do see a few abstractions:

UNIX -- Basically, the abstraction here is that the systems almost all give you lightweight threads and ways for heavyweight processes to share memory. For parallelism, they expect you to design conventional-looking code based on this concept. The idea makes a lot of sense, actually, but doesn't give you fault-tolerance or any particularly elegant synchronization mechanism.

LINDA -- I think of this as the "relational database" of parallelism techniques.
As with a relational database, you get an update, a kind of query, and a way to compile arbitrary data structures into a really awful, unnatural tuple representation. The effect is that Linda seems super if you want to do just what the original model intended, and seems inappropriate for other things. The model intended that processes be NUMA machines (fast local memory, slow access to remote memory) or message-passing systems, that fault-tolerance is NOT the issue, that the set of processes not change dynamically, and that the application be based on interactions through a shared "bag" of tuples. Presumably, you all know the papers, so you have seen plenty of examples where this model fits well. The Linda people have written papers arguing that Linda can be made fault-tolerant, used on other types of machines, can deal with arbitrary data structures, etc. I personally am unconvinced by this argument -- that is, if I had those kinds of problems, Linda isn't what I would choose first.

TROLLIUS/COSMIC -- These are kernels for the hypercube machines. They tend to be fairly specialized to that kind of parallelism (many machines, neighbors are faster to talk to, file system is remote, etc.). I like them both, and both have nice UNIX interfaces. I hear that Trollius is more specialized in development/debugging and that Cosmic has more packaged numerical routines and so forth -- but I am no expert on this. Neither system seems oriented towards a "compute server" abstraction at all, and neither seems to do much for fault-tolerance.

ISIS, CAMELOT, etc. -- These are examples of higher-level software packages that live over UNIX or MACH and provide higher-level features. ISIS, in particular, focuses on what servers need "inside" to be consistent and dynamically reconfigurable after failures; CAMELOT or ARGUS are more oriented towards "persistent" objects and data, such as files that many programs update concurrently.
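[Editor's note: the "shared bag of tuples" model Ken attributes to Linda can be illustrated with a toy tuple space. This single-process sketch shows only the matching model -- Linda's classic operations are roughly out() to add a tuple, rd() to read a matching one, and in() to read-and-remove one; a real Linda runtime adds blocking semantics and distribution. The class and method names here are illustrative.]

```python
# Toy tuple space: a shared "bag" of tuples with associative matching.
# None in a pattern acts as a wildcard, standing in for Linda's typed
# formal parameters.
class TupleSpace:
    def __init__(self):
        self.tuples = []

    def out(self, tup):
        """Deposit a tuple into the shared bag."""
        self.tuples.append(tup)

    def _match(self, pattern, tup):
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def rd(self, pattern):
        """Read (without removing) the first tuple matching pattern."""
        for tup in self.tuples:
            if self._match(pattern, tup):
                return tup
        return None

    def in_(self, pattern):            # "in" is a Python keyword
        """Read and remove the first tuple matching pattern."""
        tup = self.rd(pattern)
        if tup is not None:
            self.tuples.remove(tup)
        return tup
```

The master/slave scheme maps onto this directly: the master does `ts.out(("task", i))` for each subproblem, each slave loops on `ts.in_(("task", None))` and deposits `("result", i, answer)`, and the master collects results the same way.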
That is, ISIS focuses on the in-core, on-line aspect of a server, and the transactional systems focus on the external, shared-data side of things. This leads to an argument that ISIS is much more appropriate for a class of applications in which dynamic reconfiguration or fault-tolerance matters and where processes are loosely coupled and don't communicate a great deal by comparison to the amount of computation they do for each message. The problem, of course, is that ISIS is pretty heavyweight compared to a lightweight threads package. Communication is slow; even the faster stuff we are adding in ISIS V2.0 is nowhere near the costs of spin-locks and RPC in a multi-threaded shared address space with real parallelism. So, for our model of LAN-type systems, ISIS is probably the best choice; but for, say, a Butterfly or something, ISIS might be nice to have around but LINDA or COSMIC or TROLLIUS make much more sense -- and MACH threads make the most sense of all.

1.3 Standards?

Hard to say what the standards will be in the stuff I do. I would love to see ISIS "standardized", but not before we use it long enough to be fully satisfied with the system model. Frankly, I would like to see a few other ISIS-like systems from elsewhere (DEC CRL, say, or Arizona's Psync group) to let us compare interfaces, performance, etc. Then, it will make sense to talk standards. But the reality is that ISIS exists and you can have it; those systems are all years off and might never be publicly usable. This might turn ISIS into a de facto standard. For now, at any rate, there seems to be no alternative to ISIS that could be proposed as a standard for the sorts of things ISIS does. Similarly, unless the vast majority of the world starts arguing that Linda is the solution, I can't conceive of a meaningful standard in that arena. As for threads, POSIX threads seem to be the closest thing to an accepted standard so far. The DEC implementation is said to be pretty good.
I like Cthreads, but that's because the ISIS threads model matches well with Cthreads. SUN LWP is a real pain in many ways; my guess is that I am one of the few users of the package. Both Cthreads and SUN LWP are slow compared to ISIS "native" threads, and this worries me too. And, they are big: 65k of code gets hauled in when you link an ISIS client to SUN LWP -- my native stuff was about 1k of code.

2) Technology needed to build servers

With that cleared up :->, let me comment on this technology issue. To me, the issue here is whether the service is static or dynamic; the remaining problems partition accordingly.

Static servers are cases where the processes involved are known at the time the server is spawned and don't change over time. E.g., you code a cobegin or fork/join for each column of a matrix. For this, I suggest you go with threads; if processes have disjoint address spaces, Linda might make more sense. In a single application that doesn't need to be fault-tolerant, this is probably the model to go with.

Dynamic problems involve variable numbers of server processes, processes that fail or join while the application is up, etc. A solution to this depends on a strong consistency model. If you can't tell me how to understand "correct" behavior, there won't be any way to solve problems in this class. I tend to live in the world that wishes servers of this sort were state machines -- an idea proposed by Lamport and developed by him and Fred Schneider over a long period. I don't have a reference handy. ISIS implements this sort of model through its virtual synchrony mechanisms. We strongly believe that the cost of our scheme is tied to providing properties needed in ANY solution to this class of problems, and that once one is in this model, ISIS is the fastest and cleanest available structuring for it.
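[Editor's note: the state-machine idea Ken credits to Lamport and Schneider is simple to demonstrate: if every replica starts in the same state and applies the same deterministic commands in the same order, all replicas stay identical. Delivering commands in a consistent order across machines is the hard part that virtual synchrony addresses; in this sketch a plain Python list stands in for the ordered multicast, and all names are illustrative.]

```python
# Sketch of state-machine replication: identical deterministic replicas
# plus an agreed-upon command order implies identical final states.
class CounterReplica:
    """A trivial deterministic state machine."""
    def __init__(self):
        self.value = 0

    def apply(self, command):
        op, amount = command
        if op == "add":
            self.value += amount
        elif op == "mul":
            self.value *= amount

def deliver_in_order(replicas, log):
    # The ordered log plays the role of a totally ordered multicast:
    # every replica sees every command, in the same order.
    for command in log:
        for r in replicas:
            r.apply(command)

replicas = [CounterReplica() for _ in range(3)]
deliver_in_order(replicas, [("add", 5), ("mul", 3), ("add", 1)])
# every replica now holds (0 + 5) * 3 + 1 = 16
```

Note that the argument fails the moment commands are non-deterministic or delivery order differs between replicas -- which is why the ordering guarantee, not the replica code, is where the cost lives.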
(You quorum people can argue otherwise, but the bottom line is that ISIS operations are asynchronous and yours are synchronous, and as we speed our stuff up more and more, we're gonna beat the pants off you...)

3. The old man of the mountain speaks...

So, what's the answer? I'm afraid there isn't a single answer yet, nor will there be a single answer any time soon. For loosely coupled non-shared-memory machines where a server stays up a long time and application programs come and go, ISIS is very nice. Ditto for applications where you build a "control" program that monitors some other subsystem and now and then steps in to tell it what to do, restart a failed piece, or whatever. ISIS has its drawbacks too, though, and one would hope the standards stuff could hold off until we have a version of ISIS that makes everyone happy: e.g., very fast multicast (ISIS V2.0), tolerance of LAN partitions (next summer), hierarchical process groups (spring?).

For the closely coupled shared-memory case, I guess that I would go with Mach using lightweight threads directly, or with Linda. By and large these both assume a "static" model. I don't know of anything for "dynamic" servers in this model, and this probably explains why many of the parallel machine types have been interested in ISIS -- on the face of it, a poor match, but perhaps the best solution around anyhow...

Hope this helps.

Ken
ken@gvax.cs.cornell.edu (Ken Birman) (12/09/89)
This is a followup on my posting responding to larsa@nada.kth.se (Lars Andersson):

>Consider the following: One "master" puts "tasks" in a batch, there to be
>picked up by the "slaves" as each completes its current task. When a slave
>completes a task it puts the solution in an appropriate batch, there to be
>picked up and stored by the master (depending on the problem, one might want
>to write directly to a file ...).
>Within my limited knowledge, the only system that has something like this
>"built in" is ISIS (the NEWS service). However, it's not clear to me that this
>is correct or suited for this kind of application, or if it's the only or best
>(most portable) solution. Any comments on the above would be appreciated.

My prior message ("where lies the future") didn't respond to this narrower question. Using a process group, we would normally implement this Linda-style of system directly. In our ISIS manual, discussion of this technique can be found in Chapter 8. Essentially, we recommend that one cbcast both requests and slave solution messages to a shared group; if you want to go further and actually make sure that everyone knows who is doing what, a slight embellishment suffices. The latter, with a code example, is included in the chapter on replicated data in the recent textbook that Addison-Wesley's ACM Press produced from the Arctic 88/Fingerlakes 89 short course. The book is called "Distributed Computing" and is now available; the chapter is also available from us in TR form. The basic idea is the same: multicast the request and multicast what the slaves do.

Dave George of Cornell's graphics group has a neat variation on this in a real application that does scene rendering; he has a single process that farms out work to do and gets a fancier and more efficient, but not fault-tolerant, solution. Unfortunately, he doesn't get this newsgroup and can only be contacted by email (dwg@graphics.cornell.edu).

The NEWS service implements this same mechanism and can be used if you prefer its higher-level interface. The advantage of NEWS is that it even works if the processes come and go without all being up at the same time. The disadvantage is that this imposes some overhead, because the requests get flushed to a disk file that NEWS uses to maintain its persistent state.

Let me know if you are still unclear on this and I will be happy to post something more detailed...

Ken

(The TR version of that chapter is called "Exploiting replication" and is available on request from croft@cs.cornell.edu. It assumes that you know something about CBCAST and ABCAST.)
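[Editor's note: the scheme Ken recommends -- cbcast both the requests and the slaves' answers to one shared process group, so every member learns who is doing what -- can be simulated in-process. A synchronous local delivery loop stands in for ISIS's cbcast (real cbcast provides causally ordered delivery across machines, which a local append provides trivially); the class names here are illustrative, not the ISIS toolkit interface.]

```python
# Single-process simulation of "multicast the request and multicast
# what the slaves do" through one shared group: every member, master
# included, sees the full stream of tasks and results.
class Group:
    def __init__(self):
        self.members = []

    def join(self, member):
        self.members.append(member)

    def cbcast(self, msg):          # name borrowed from ISIS; simulated here
        for m in self.members:
            m.deliver(msg)

class Member:
    def __init__(self, name, group):
        self.name = name
        self.log = []               # everything this member has seen
        self.group = group
        group.join(self)

    def deliver(self, msg):
        self.log.append(msg)

group = Group()
master = Member("master", group)
slaves = [Member("slave%d" % i, group) for i in range(2)]

group.cbcast(("task", 7))               # master announces work
group.cbcast(("result", "slave0", 49))  # a slave announces its answer
# at this point every member's log holds both messages, in the same order
```

The "slight embellishment" for knowing who is doing what falls out for free in this style: since claims and results are broadcast to everyone, each member can reconstruct the assignment state from its own log.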