[comp.sys.isis] network machines as compute servers

larsa@nada.kth.se (Lars Andersson) (12/02/89)

(sorry if this appears twice, but the first posting didn't seem to make it)
Dear NET!

I want to develop (or rather: I want someone to give me :-) )
a few SIMPLE tools to use for building "compute servers"
running on a set of machines connected in a network. I'm talking about
coarse-grained parallelism here, but I guess this is also relevant for
shared-memory and hypercube-type machines. I'm sure there are a number
of ways to approach this problem and even a number of solutions around, but 
faced with the rather bewildering fauna of "parallel operating systems", models
for parallel computations etc. (LINDA, ISIS, Cosmic Environment, MACH (?) ...)
one becomes reluctant to invest work in any one of these, without knowing 
which will have the broadest base in the near future.

There is a large class of problems that require a huge number of similar,
SCALAR-DOMINATED calculations to be carried out, with perhaps a periodic 
gathering and analysis of the solutions. Hence for the parallel subproblems,
vectorization is not interesting, nor is communication speed between the
processors critical. I believe this is called trivial parallelism.

In this situation, one doesn't really care what machine the thing runs on as
long as one is able to access as many cheap CPU cycles as possible, such as the
joint resources of the local net.

What I have in mind are therefore solutions that allow one to run the
subproblems on machines of different architectures, not necessarily
sharing a common NFS server.

Consider the following: one "master" puts "tasks" in a batch, there to be
picked up by the "slaves" as each completes its current task. When a slave
completes a task, it puts the solution in an appropriate batch, there to be
picked up and stored by the master (depending on the problem, one might want
to write directly to a file ...).
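
To make the shape of this concrete: on a single machine the pattern is
easy to write with plain UNIX fork() and pipes, as in the toy below.
What I am hoping exists is a tool that gives this same pattern across a
network of unlike machines.

/* Toy single-machine version of the master/slave batch idea,
 * using plain UNIX fork() and pipes. */
#include <stdio.h>
#include <unistd.h>

#define NSLAVES 4
#define NTASKS  20

int main()
{
    int task[2], result[2], i, t, r;

    pipe(task);                   /* master -> slaves: the task batch   */
    pipe(result);                 /* slaves -> master: the result batch */

    for (i = 0; i < NSLAVES; i++) {
        if (fork() == 0) {                /* a slave process            */
            close(task[1]);               /* so read() sees EOF when    */
            close(result[0]);             /* the master is done         */
            while (read(task[0], &t, sizeof t) == sizeof t) {
                r = t * t;                /* stands in for a long,      */
                                          /* scalar-dominated job       */
                write(result[1], &r, sizeof r);
            }
            _exit(0);
        }
    }
    for (i = 0; i < NTASKS; i++)          /* master fills the batch...  */
        write(task[1], &i, sizeof i);
    close(task[1]);                       /* ...then closes it          */
    for (i = 0; i < NTASKS; i++) {        /* gather and store solutions */
        read(result[0], &r, sizeof r);
        printf("got %d\n", r);
    }
    return 0;
}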

Within my limited knowledge, the only system that has something like this 
"built in" is ISIS (the NEWS service). However, it's not clear to me that this 
is correct or suited for this kind of application, or if it's the only or best 
(most portable) solution. Any comments on the above would be appreciated. 
In particular, examples or code fragments pertaining to this situation would 
be gratefully received. Where lies the future?

Yours,

Lars Andersson

larsa@math.kth.se

disclaimer: I'm a mathematician, not a computer scientist.

ken@gvax.cs.cornell.edu (Ken Birman) (12/08/89)

In article <2455@draken.nada.kth.se> larsa@math.kth.se (Lars Andersson) writes:
>
>I want to develop a few SIMPLE tools to use for building "compute servers"...
>...coarse grained parallelism ... shared memory and hypercube machines....
>... faced with the rather bewildering fauna of "parallel operating systems",
> models for parallel computations etc. (LINDA, ISIS, Cosmic Environment, MACH)
>one becomes reluctant to invest work in any one of these, without knowing 
>which will have the broadest base in the near future....
>
>Consider the following: One "master" puts  "tasks" in a batch, there to be
>picked up by the "slaves" as each completes its current task. When a slave
>completes a task it puts the solution in an appropriate batch, there to be
>picked up and stored by the master (depending on the problem, one might want
>to write directly to a file ...). 
>
>Any comments on the above would be appreciated...  Where lies the future?
>

This is quite an interesting posting, and seems to touch on several
different issues at once:
1) Distinguishing between ISIS/MACH/etc.
2) Predicting where future standards might lie
3) Understanding the technology needed to build this class of compute servers

1.  Categorizing systems, predicting future
A lot of people have asked us about ISIS in this context; examples
include some signal processing groups like the one at SAIC (maybe they
could post something on their system?), the various financial trading
developers (we hear from them, but mostly on technical issues lately),
NASA, and now your group.

As I see it, there are several issues here.

1.1 Kernel technologies

This is the question of "which kernel should I build on?".  (ISIS tends to
live at a higher level than this).  Example kernels that I hear about
include SUN/ATT, OSF/MACH, AIX, OS/2, CHORUS, VMS, TROLLIUS, and maybe COSMIC.

If the issues are purely commercial, I think the choice narrows to which
interface the kernel presents and then which implementation works best.
Personally, I think that MACH is a better system than the SUN/ATT system
at the kernel level, but that many aspects of the MACH interface are less
mature than for SUN/ATT.  If I had to bet on the future, I guess I would
estimate based on current market share.... if I had a chance to impose
my preferences on people, I would probably weakly favor MACH or CHORUS.
A little down the road, SUN/ATT will certainly offer a competitive new
system as well, though.

VMS has a huge market that won't go away soon, but locks you into one vendor.
Although VMS is in some ways a better system than any of the UNIX systems,
I would probably think twice before putting a major new software development
on that kernel -- if nothing else, because DEC's philosophy seems
unclear at this stage.

OS/2 is a crummy system for distributed computing, but has quite a hefty
market share.  It would probably help if SUN/ATT, IBM and OSF could agree on
the kernel interface to UNIX; but because these companies are in a very
shortsighted mood lately, they all seem to prefer bleeding to death slowly 
to agreeing on anything at all.  

What this all adds up to is that if you need to do something new, you should
probably base it on MACH... but if you expect to ship a commercial version,
better figure out how you'll port it to SUN/ATT UNIX later.

1.2 Computing abstractions

I don't see the various UNIX systems as offering fundamentally different
computing abstractions.  In the list above, I do see a few abstractions:

UNIX -- basically, the abstraction here is that the systems almost all
  give you lightweight threads and ways for heavy processes to share memory.
  For parallelism, they expect you to design conventional looking code based
  on this concept.  The idea makes a lot of sense, actually, but doesn't
  give you fault-tolerance or any particularly elegant synchronization
  mechanism.
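
  For the heavy-process half of that abstraction, the System V calls are
  all it takes; a minimal sketch (error checking omitted):

  /* Two heavyweight UNIX processes sharing a memory segment via the
   * System V IPC calls. */
  #include <sys/types.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>
  #include <sys/wait.h>
  #include <unistd.h>
  #include <stdio.h>

  int main()
  {
      int  id     = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
      int *shared = (int *) shmat(id, 0, 0);

      *shared = 0;
      if (fork() == 0) {            /* child: updates the shared page  */
          *shared = 42;
          _exit(0);
      }
      wait((int *) 0);              /* parent: waits, sees the update  */
      printf("shared value: %d\n", *shared);
      shmctl(id, IPC_RMID, (struct shmid_ds *) 0);
      return 0;
  }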

LINDA -- I think of this as the "relational database" of parallelism techniques.
  As with a relational database, you get an update, a kind of query, and a way
  to compile arbitrary data structures into a really awful, unnatural tuple
  representation.  The effect is that Linda seems super if you want to do just
  what the original model intended, and seems inappropriate for other things.

  The model assumes that the processors form a NUMA machine (fast local
  memory, slow access to remote memory) or a message-passing system, that
  fault-tolerance is NOT the issue, that the set of processes does not
  change dynamically, and that the
  application be based on interactions through a shared "bag" of tuples.
  Presumably, you all know the papers, so you have seen plenty of examples
  where this model fits well.  The Linda people have written papers arguing
  that Linda can be made fault-tolerant, used on other types of machines,
  can deal with arbitrary data structures, etc.  I personally am unconvinced
  by this argument -- that is, if I had those kinds of problems, Linda isn't
  what I would choose first.
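
  For concreteness, the master/slave exchange comes out roughly like this
  in C-Linda (syntax recalled from the Carriero/Gelernter papers; NTASKS,
  NSLAVES, compute() and store() are stand-ins):

  /* Task bag in C-Linda: out() adds a tuple, in() removes a matching
   * one, eval() spawns a live tuple (a process), and ?x marks a field
   * to be matched and bound. */

  real_main()
  {
      int i, id, val;

      for (i = 0; i < NSLAVES; i++)
          eval("slave", slave());          /* spawn the slaves          */
      for (i = 0; i < NTASKS; i++)
          out("task", i);                  /* drop tasks into the bag   */
      for (i = 0; i < NTASKS; i++) {
          in("result", ?id, ?val);         /* gather, in any order      */
          store(id, val);
      }
  }

  slave()
  {
      int i;

      for (;;) {
          in("task", ?i);                  /* atomically grab one task  */
          out("result", i, compute(i));
      }
  }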

TROLLIUS/COSMIC -- These are kernels for the hypercube machines.  They
  tend to be fairly specialized to that kind of parallelism (many machines,
  neighbors are faster to talk to, file system is remote, etc).  I like them
  both, and both have nice UNIX interfaces.  I hear that Trollius is more
  specialized in development/debugging and that Cosmic has more packaged
  numerical routines and so forth -- but I am no expert on this.

  Neither system seems oriented towards a "compute server" abstraction at all,
  and neither seems to do much for fault-tolerance.

ISIS, CAMELOT, etc -- These are examples of higher level software packages
  that live over UNIX or MACH and provide higher level features.  ISIS,
  in particular, focuses on what servers need "inside" to be consistent
  and dynamically reconfigurable after failures; CAMELOT or ARGUS are 
  more oriented towards "persistent" objects and data, such as files that
  many programs update concurrently.

  That is, ISIS focuses on the in-core, on-line aspect of a server, and
  the transactional systems focus on the external, shared data side of things.

  This leads to an argument that ISIS is much more appropriate for a class
  of applications in which dynamic reconfiguration or fault-tolerance matters
  and where processes are loosely coupled and don't communicate a great deal
  by comparison to the amount of computation they do for each message.

  The problem, of course, is that ISIS is pretty heavyweight compared to
  a lightweight threads package.  Communication is slow; even the faster
  stuff we are adding in ISIS V2.0 is nowhere near the costs of spin-locks
  and RPC in a multi-threaded shared address space with real parallelism.
  So, for our model of LAN-type systems, ISIS is probably the best choice;
  but for, say, a Butterfly, ISIS might be nice to have around while LINDA
  or COSMIC or TROLLIUS make much more sense -- and MACH threads make the
  most sense of all.

1.3 Standards?

Hard to say what the standards will be in the stuff I do.  I would love
to see ISIS "standardized", but not before we use it long enough to be
fully satisfied with the system model.  Frankly, I would like to see a
few other ISIS-like systems from elsewhere (DEC CRL, say, or Arizona's
Psync group) to let us compare interfaces, performance, etc.  Then, it
will make sense to talk standards.

But, the reality is that ISIS exists and you can have it; those systems
are all years off and might never be publicly usable.  This might turn
ISIS into a de facto standard.  For now, at any rate, there seems to be
no alternative to ISIS that could be proposed as a standard for the
sorts of things ISIS does.

Similarly, unless the vast majority of the world starts arguing that
Linda is the solution, I can't conceive of a meaningful standard in that
arena.

As for threads, POSIX threads seem to be the closest thing to an accepted
standard so far.  The DEC implementation is said to be pretty good.  I
like Cthreads, but that's because the ISIS threads model matches well with
Cthreads.  SUN LWP is a real pain in many ways; my guess is that I am
one of the few users of the package.  Both Cthreads and SUN LWP are slow
compared to ISIS "native" threads, and this worries me too.  And, they
are big: 65k of code gets hauled in when you link an ISIS client to
SUN LWP -- my native stuff was about 1k of code.

2.  Technology needed to build servers

With that cleared up :->, let me comment on this technology issue.

To me, the issue here is whether the service is static or dynamic; the
remaining problems partition accordingly.

Static servers are cases where the processes involved are known at the
time the server is spawned and don't change over time.  E.g., you 
code a cobegin or fork/join for each column of a matrix.  For this,
I suggest you go with threads; if processes have disjoint address
spaces, Linda might make more sense.  In a single application that
doesn't need to be fault-tolerant, this is probably the model to go with.
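
With Cthreads, for instance, the per-column fork/join comes out as just a
few lines (a sketch from memory of the Cthreads interface; do_column()
stands in for the real per-column computation):

/* Static fork/join over the columns of a matrix, using Mach Cthreads. */
#include <cthreads.h>

#define NCOLS 16
double matrix[NCOLS][NCOLS];

any_t do_column(arg)
  any_t arg;
{
    int col = (int) arg;
    /* ... operate on column col of matrix ... */
    return (any_t) 0;
}

void solve()
{
    cthread_t t[NCOLS];
    int c;

    for (c = 0; c < NCOLS; c++)          /* "cobegin": fork one        */
        t[c] = cthread_fork(do_column, (any_t) c);  /* thread/column   */
    for (c = 0; c < NCOLS; c++)
        cthread_join(t[c]);              /* join: wait for them all    */
}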

Dynamic problems involve variable numbers of server processes, processes
that fail or join when the application is up, etc.  A solution to this
depends on a strong consistency model.  If you can't tell me how to understand
"correct" behavior, there won't be any way to solve problems in this class.

I tend to live in the world that wishes servers of this sort were
state machines -- an idea proposed by Lamport and developed by him and
Fred Schneider over a long period (each server is a deterministic
automaton, and if every copy sees the same commands in the same order,
the copies stay identical).  I don't have a reference handy.

ISIS implements this sort of model through its virtual synchrony mechanisms.
We strongly believe that the cost of our scheme is tied to providing
properties needed in ANY solution to this class of problems, and that
once one is in this model, ISIS is the fastest and cleanest available
structuring for it.  (You quorum people can argue otherwise, but
the bottom line is that ISIS operations are asynchronous and yours are
synchronous, and as we speed our stuff up more and more, we're gonna
beat the pants off you...)

3.  The old man of the mountain speaks...

So, what's the answer?  I'm afraid there isn't a single answer yet,
nor will there be a single answer any time soon.  For loosely coupled
non-shared memory machines where a server stays up a long time
and application programs come and go, ISIS is very nice.  Ditto for
applications where you build a "control" program that monitors some
other subsystem and now and then steps in to tell it what to do, restart
a failed piece, or whatever.

ISIS has its drawbacks too, though, and one would hope the standards
stuff can hold off until we have a version of ISIS that makes everyone
happy: e.g., very fast multicast (ISIS V2.0), tolerance of LAN partitions
(next summer), and hierarchical process groups (spring?).

For the closely coupled shared memory case, I guess that I would go
with Mach using lightweight threads directly, or with Linda.  By and
large these both assume a "static" model.  I don't know of anything
for "dynamic" servers in this model, and this probably explains why
many of the parallel machine types have been interested in ISIS -- on
the face of it, a poor match, but perhaps the best solution around
anyhow...

Hope this helps.

Ken

ken@gvax.cs.cornell.edu (Ken Birman) (12/09/89)

This is a followup on my posting responding to larsa@nada.kth.se (Lars Andersson)

>Consider the following: One "master" puts  "tasks" in a batch, there to be
>picked up by the "slaves" as each completes its current task. When a slave
>completes a task it puts the solution in an appropriate batch, there to be
>picked up and stored by the master (depending on the problem, one might want
>to write directly to a file ...). 

>Within my limited knowledge, the only system that has something like this 
>"built in" is ISIS (the NEWS service). However, it's not clear to me that this 
>is correct or suited for this kind of application, or if it's the only or best 
>(most portable) solution. Any comments on the above would be appreciated. 

My prior message ("where lies the future") didn't respond to this
narrower question.

Using a process group, we would normally implement this Linda style of
system directly.  In our ISIS manual, discussion of this technique can
be found in Chapter 8.  Essentially, we recommend that one cbcast both
requests and slave solution messages to a shared group; if you want to
go further and actually make sure that everyone knows who is doing what,
a slight embellishment suffices.  The latter, with code example, is 
included in the chapter on replicated data in the recent textbook that
Addison-Wesley's ACM Press produced from the Arctic 88/Fingerlakes 89
short course.  The book is called "Distributed Computing" and is now
available; the chapter is also available from us in TR form.
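
In outline, and with the entry numbers, formats and signatures
paraphrased from memory rather than copied out of the manual (Chapter 8
has the real interface), the scheme looks like this:

/* Master/slave over a shared ISIS process group, as sketched above.
 * Every group member receives every broadcast; how the slaves divide
 * the tasks among themselves is the "embellishment" mentioned in the
 * text and is elided here. */
#include "isis.h"

#define TASK    1                /* entry number for task messages    */
#define RESULT  2                /* entry number for result messages  */
#define NTASKS  100

address *gaddr;
int     is_master;               /* set from argv in a real program   */

void got_task(mp)                /* runs in each slave                */
  message *mp;
{
    int t, r;
    if (is_master) return;       /* master ignores task messages      */
    msg_get(mp, "%d", &t);
    r = t * t;                               /* the scalar job        */
    cbcast(gaddr, RESULT, "%d,%d", t, r, 0); /* 0 = no replies wanted */
}

void got_result(mp)              /* runs in the master                */
  message *mp;
{
    int t, r;
    if (!is_master) return;
    msg_get(mp, "%d,%d", &t, &r);
    /* ... store the solution, or write it to a file ... */
}

void start(arg)
{
    int i;
    gaddr = pg_join("compute-server", 0);
    if (is_master)
        for (i = 0; i < NTASKS; i++)         /* fill the task batch   */
            cbcast(gaddr, TASK, "%d", i, 0);
}

main(argc, argv)
  char **argv;
{
    is_master = (argc > 1);
    isis_init(0);
    isis_entry(TASK,   got_task,   "got_task");
    isis_entry(RESULT, got_result, "got_result");
    isis_mainloop(start, 0);
}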

The basic idea is the same: multicast the request and multicast what 
the slaves do.  Dave George of Cornell's graphics group has a neat
variation on this in a real application that does scene rendering; he
has a single process that farms out work to do and gets a fancier and
more efficient, but not fault-tolerant, solution.  Unfortunately, he
doesn't get this newsgroup and can only be contacted by email
(dwg@graphics.cornell.edu).

The NEWS service implements this same mechanism and can be used if you
prefer its higher level interface.  The advantage of NEWS is that it 
even works if the processes come and go without all being up at the
same time.  The disadvantage is that this imposes some overhead because
the requests get flushed to a disk file that NEWS uses to maintain its
persistent state.

Let me know if you are still unclear on this and I will be happy to
post something more detailed...

Ken

(The TR version of that chapter is called "Exploiting replication" and is
available on request from croft@cs.cornell.edu.  It assumes that you
know something about CBCAST and ABCAST.)