[comp.parallel] Looking for a message passing package to run on the Sun

baden@lbl-csam.arpa (Scott Baden [CSR/Math]) (09/13/89)

I'd like to tie together our local bevy of Suns into a multicomputer
for solving scientific problems. I need simple facilities to
fork off processes and to handle message passing.
Are there any software packages out there in netland to do this?
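
To make the request concrete: on a single machine I can get the effect
I want with fork() and a socketpair, as in the sketch below. What I am
looking for is a package that gives me the same thing across machines.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

/* Single-machine sketch: fork a worker and exchange one message
 * over a socketpair.  What I want is this, but across Suns. */
int
main(void)
{
    int sv[2];
    char buf[64];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        exit(1);
    }
    if (fork() == 0) {                  /* child: the worker */
        close(sv[0]);
        read(sv[1], buf, sizeof buf);   /* receive the task */
        write(sv[1], "result", 7);      /* send the answer back */
        _exit(0);
    }
    close(sv[1]);                       /* parent: the master */
    write(sv[0], "task", 5);
    read(sv[0], buf, sizeof buf);
    printf("worker says: %s\n", buf);
    wait((int *)0);
    return 0;
}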


	Scott B. Baden

	Lawrence Berkeley Laboratory
	Berkeley, California
	baden@csam.lbl.gov
	...!ucbvax!csam.lbl.gov!baden

okrieg@eecg.toronto.edu (Orran Y. Krieger) (09/16/89)

   I'd like to tie together our local bevy of Suns into a multicomputer
   for solving scientific problems. I need simple facilities to
   fork off processes and to handle message passing.
   Are there any software packages out there in netland to do this?


	   Scott B. Baden

	   Lawrence Berkeley Laboratory
	   Berkeley, California
	   baden@csam.lbl.gov
	   ...!ucbvax!csam.lbl.gov!baden

Since you specified message passing this is probably not of
interest to you. However, for my MASc I developed an
implementation of distributed shared data on a network of Suns.
A brief abstract from a talk I am giving is included below (in
LaTeX format). The code is free and (hopefully) mostly debugged.

\centerline{\Large{\bf Distributed Shared Data:}}
\centerline{\Large{\bf An Alternative Approach to Interprocess Communication}}
\centerline{\Large{\bf for Distributed Systems}}
\vspace*{.1 in}
\centerline{Orran Krieger and Michael Stumm}
\centerline{University of Toronto}
\vspace*{.1 in}

In this talk we will describe algorithms that implement the {\em
shared data model} of interprocess communication for distributed
systems.

Traditionally, communication among processes in a distributed
system is based on the {\em data passing} model, which includes
message passing mechanisms.  Although the data passing model is a
logical extension of the underlying communication hardware, the
{\em shared data} model has a number of advantages, which have
made it the focus of recent study\footnote{For example, work by
Li, Bisiani and Forin, Cheriton, and Gelernter.}. The shared data
model provides a simpler abstraction to the application
programmer. Processes use shared data in the same way they use
normal local memory; that is, shared data is accessed through
{\tt read} and {\tt write} operations.  Since the access protocol
is consistent with the way sequential applications access data,
there is a more natural transition from sequential to distributed
applications.  Another advantage of the shared data model is that
it hides the remote communication mechanism from the processes,
again substantially simplifying the programming of distributed
applications.  We have found that programs\footnote{as measured by
lines of source code} using the shared data model are typically half
the size of equivalent programs that use the data passing model.
With a shared data facility, complex
structures can be passed by reference. In contrast, with the data
passing model complex data structures that are passed between
processes must be explicitly packed and unpacked.

In this talk we describe how the shared data model can be
realized and then focus our attention on a new algorithm we have
developed called TORiS. The most original feature of TORiS is
that it makes {\em optimistic} assumptions in order to reduce the
amount of communication required between processes.

All of the algorithms we describe, including TORiS, have been
implemented on a network of Sun workstations. We compare
performance results of the TORiS implementation with other
implementations of the shared data model.
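
To make the pack/unpack point concrete, here is a small illustrative
sketch (made-up code, not the actual TORiS interface): the same update
done against shared data, and done by flattening the structure into a
message buffer.

#include <stdio.h>
#include <string.h>

struct particle { double x, y, vx, vy; };

/* Shared data model: the structure is used like ordinary local
 * memory, so the update is a plain function on plain pointers. */
void
push_shared(struct particle *p, double dt)
{
    p->x += p->vx * dt;
    p->y += p->vy * dt;
}

/* Data passing model: every transfer of the structure between
 * processes needs an explicit pack and unpack (representation
 * conversion between machine types is omitted here). */
void pack(struct particle *p, char *buf)   { memcpy(buf, p, sizeof *p); }
void unpack(struct particle *p, char *buf) { memcpy(p, buf, sizeof *p); }

int
main(void)
{
    struct particle a = { 0.0, 0.0, 1.0, 2.0 }, b;
    char msg[sizeof(struct particle)];

    push_shared(&a, 0.5);   /* shared data: one plain call */

    pack(&a, msg);          /* data passing: flatten ... */
    unpack(&b, msg);        /* ... and rebuild on the far side */
    printf("b = (%g, %g)\n", b.x, b.y);
    return 0;
}

For anything with pointers or variable-sized parts the pack/unpack side
gets much worse, which is where the factor of two in program size comes
from.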

childers@decwrl.dec.com (Richard Childers) (09/18/89)

baden@lbl-csam.arpa (Scott Baden [CSR/Math]) writes:

>I'd like to tie together our local bevy of Suns into a multicomputer
>for solving scientific problems. I need simple facilities to
>fork off processes and to handle message passing.
>Are there any software packages out there in netland to do this?

I recall reading about a remote execution daemon that had been developed
experimentally a few years ago ... at UC Berkeley.

I recall reading about it in the USENIX journal, oh, about three years ago.

I'll bet a few hours skimming through old USENIX journals might benefit
you. As I recall, the daemon, when given a job, polled the network to see
what machines weren't busy and passed the job out to appropriate servers,
also running this daemon. When they were done, the results were returned.

I think it didn't fly because the overhead was too high, both in networking and
on the machines that ran the daemon, to make it cost-effective in the environment
it was conceived in. But it would still seem like the ideal core for a loose
approach to multiprocessing, provided you had a front end that was capable
of dividing the task up into independent pieces that could be processed w/o
dependencies ....

>	Scott B. Baden
>
>	Lawrence Berkeley Laboratory
>	Berkeley, California
>	baden@csam.lbl.gov
>	...!ucbvax!csam.lbl.gov!baden

-- richard

-- 
 *                                                                            *
 *          Intelligence : the ability to create order out of chaos.          *
 *                                                                            *
 *      ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho     *

kyw@cs.purdue.edu (Ko-Yang Wang) (09/18/89)

In article <6489@hubcap.clemson.edu> avsd!childers@decwrl.dec.com (Richard Childers) writes:
>baden@lbl-csam.arpa (Scott Baden [CSR/Math]) writes:
>
>>I'd like to tie together our local bevy of Suns into a multicomputer
>>for solving scientific problems. I need simple facilities to
>>fork off processes and to handle message passing.
>>Are there any software packages out there in netland to do this?
You may want to look at ISIS from Cornell.
ISIS provides some useful functions that allow you to implement distributed
and fault-tolerant systems relatively easily.
It contains utilities for lightweight tasks, message passing, broadcast,
synchronization, etc.
They have a beta version (V1.3.1) for an array of different architectures
including Suns (SunOS), VAXen (4.3BSD), Gould, HP Spectrum 300 (HP-UX),
PC/RT (AIX), MACH, NeXT, and Apollo.
Better yet, this package is free and is actively supported by the ISIS group.
Contact:
	ISIS Project (Attn: M. Schimizzi)
	Department of Computer Science
	Upson Hall
	Cornell University
	Ithaca, New York 14853
	Phone: 607-255-9198
	E-mail: schiz@cs.cornell.edu
	Contact person: Margaret Schimizzi

>I recall reading about a remote execution daemon that had been developed
>experimentally a few years ago ... at UC Berkeley.
>
>I recall reading about it in the USENIX journal, oh, about three years ago.
>
>I'll bet a few hours skimming through old USENIX journals might benefit
>you. As I recall, the daemon, when given a job, polled the network to see
>what machines weren't busy and passed the job out to appropriate servers,
>also running this daemon. When they were done, the results were returned.
It is not very difficult to implement a callback RPC scheme to do this.
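
A skeleton of the idea in classic Sun RPC (the program numbers are made
up, and the matching server code is omitted): the client fires the job
off with callrpc(), then registers a small service of its own so that
the server can call back when the job completes.

#include <rpc/rpc.h>
#include <stdio.h>

#define JOBPROG  0x20000099     /* made-up transient program numbers */
#define DONEPROG 0x2000009a

/* Callback service on the client: the job server calls this when the
 * job it was handed has finished. */
char *
job_done(char *argp)
{
    static int ack = 1;

    printf("job %d finished\n", *(int *)argp);
    return (char *)&ack;
}

int
main(int argc, char **argv)
{
    char *cmd = "cc -c part1.c";        /* the job to run remotely */
    int res;

    /* Hand the job to the server named on the command line ... */
    callrpc(argv[1], JOBPROG, 1, 1,
            (xdrproc_t)xdr_wrapstring, (char *)&cmd,
            (xdrproc_t)xdr_int, (char *)&res);

    /* ... then offer a callback service and wait; the server learns
     * the caller's address from the request and calls back later. */
    registerrpc(DONEPROG, 1, 1, job_done,
                (xdrproc_t)xdr_int, (xdrproc_t)xdr_int);
    svc_run();
    return 0;
}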

>
>I think it didn't fly because the overhead was too high, both in networking and
>on the machines that ran the daemon, to make it cost-effective in the environment
>it was conceived in. But it would still seem like the ideal core for a loose
This depends on your application.
I designed a project for a graduate course on computer networks in which
the students split a very long inner product into a set of independent
vector operations and ran them on a set of Suns. If the vectors are long
enough the overhead is tolerable. This applies to matrix multiplication
and similar computations as well.
However, for more general problems, data dependences make splitting the
program into tasks very difficult.
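
The shape of the exercise, sketched with forked children standing in
for the Suns (on the network each child becomes a process on a
different machine):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define N       100000          /* vector length, divisible by WORKERS */
#define WORKERS 4

/* Split a long inner product into independent chunks; each chunk is
 * computed by a separate process and the partial sums are combined. */
int
main(void)
{
    static double x[N], y[N];
    int fd[WORKERS][2];
    double total = 0.0, part;
    int w, i;

    for (i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    for (w = 0; w < WORKERS; w++) {
        pipe(fd[w]);
        if (fork() == 0) {              /* worker: one chunk */
            double s = 0.0;
            for (i = w * (N / WORKERS); i < (w + 1) * (N / WORKERS); i++)
                s += x[i] * y[i];
            write(fd[w][1], (char *)&s, sizeof s);
            _exit(0);
        }
    }
    for (w = 0; w < WORKERS; w++) {     /* master: sum the parts */
        read(fd[w][0], (char *)&part, sizeof part);
        total += part;
    }
    printf("inner product = %g\n", total);
    while (wait((int *)0) > 0)
        ;
    return 0;
}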
>approach to multiprocessing, provided you had a front end that was capable
>of dividing the task up into independent pieces that could be processed w/o
>dependencies ....
Well, this front end is much, much more difficult to build than the
distributed daemons. In general, the technology for
generating code (such as compilers) for distributed machines is still in
its infancy.
The last requirement (w/o dependences) is too strong for most applications.
You should be willing to handle a few dependences with message passing.
I don't think there are any tools out there that can handle this yet.
>
>>	Scott B. Baden
>>
>>	Lawrence Berkeley Laboratory
>>	Berkeley, California
>>	baden@csam.lbl.gov
>>	...!ucbvax!csam.lbl.gov!baden
>
>-- richard
>
> *      ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho     *

Ko-Yang
-------------------------------------------------------------------------------
Ko-Yang Wang                    | kyw@cs.purdue.edu    |  -----  ------ ___`___
Department of Computer Sciences | Parallel/Distributed |  __|__   [] |    ---
Purdue University               | Processing, CAPO     |    |        |    ---
West Lafayette, IN 47907        | (317) 494-0813       |`-------,   \|   [===]
-------------------------------------------------------------------------------

brian@june.cs.washington.edu (Brian Bershad) (09/19/89)

avsd!childers@decwrl.dec.com (Richard Childers) writes:

>I recall reading about a remote execution daemon that had been developed
>experimentally a few years ago ... at UC Berkeley.

>I recall reading about it in the USENIX journal, oh, about three years ago.

>I'll bet a few hours skimming through old USENIX journals might benefit
>you. As I recall, the daemon, when given a job, polled the network to see
>what machines weren't busy and passed the job out to appropriate servers,
>also running this daemon. When they were done, the results were returned.

>I think it didn't fly because the overhead was too high, both in networking and
>on the machines that ran the daemon, to make it cost-effective in the environment
>it was conceived in. But it would still seem like the ideal core for a loose
>approach to multiprocessing, provided you had a front end that was capable
>of dividing the task up into independent pieces that could be processed w/o
>dependencies ....

The system was called Maitrd and it 'flew' quite well.  I think it is still
in use on a cluster of 750's and a 785 at Berkeley for balancing compiles
and text formatting.

The majority of the overhead came from not having any kind of network file
system.  File preprocessing was done locally and shipped to whichever server
was currently available.  For C programs, this was easy.  It was a bit hairier
for Pascal code, which had intermodule type checking (with files being treated
as types).

There was almost no overhead to running the daemons.  Servers decided whether
they were willing to accept or not, and that information was only broadcast
when it changed.  Clients round-robined the jobs off to the available servers
only when their own load was above threshold.  
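
The client-side policy amounted to something like the following sketch
(names invented here, not the actual maitrd source):

#include <stdio.h>
#include <string.h>

#define MAXSRV 32

/* Table kept up to date by the willingness broadcasts: an entry
 * changes only when a server announces a change of state. */
struct server { char host[32]; int willing; };
static struct server srv[MAXSRV];
static int nsrv, next;

/* Pick a destination for a job: keep it local while our own load is
 * below threshold, otherwise round-robin over the willing servers. */
static int
pick_server(double localload, double threshold)
{
    int i, s;

    if (localload < threshold)
        return -1;                      /* not busy: keep the job */
    for (i = 0; i < nsrv; i++) {
        s = (next + i) % nsrv;
        if (srv[s].willing) {
            next = (s + 1) % nsrv;
            return s;
        }
    }
    return -1;                          /* nobody willing: keep it */
}

int
main(void)
{
    int s;

    nsrv = 2;
    strcpy(srv[0].host, "vax750a");  srv[0].willing = 0;
    strcpy(srv[1].host, "vax785");   srv[1].willing = 1;

    s = pick_server(4.2, 1.5);          /* we are loaded: ship it */
    printf("send job to: %s\n", s < 0 ? "(local)" : srv[s].host);
    return 0;
}

Because willingness is only broadcast on change, an idle network costs
essentially nothing, which is why the daemons added almost no overhead.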

The biggest drawback was the effort required to build a new remote service
which had to act as a shell for the job on the remote end.    But, since
there were only a few tasks that had to be load-balanced,  the one-time
investment was worth it.  

RPC would not have made this any easier.


A paper showed up in a Usenix newsletter in early 1986: 
	"Load Balancing With Maitrd"

People have been doing this kind of stuff for years.

    Brian

jevans@uunet.UU.NET (David Jevans) (09/25/89)

The JIPC system from the University of Calgary is an inter-Sun
IPC system with remote process creation etc.  One of the features of
JIPC is its transparent support of non-homogeneous networks
(i.e. data type conversion if required).  It is simple to
get distributed programs running under JIPC, and a network
traffic monitor/debugger is available.
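
The data-type conversion JIPC hides is the sort of thing you otherwise
write by hand around every send and receive, for example:

#include <stdio.h>
#include <netinet/in.h>         /* htonl() and ntohl() */

/* Sending an integer between a Sun (big-endian) and a VAX
 * (little-endian) without such support means converting to and from
 * network byte order by hand on both ends; floating point needs a
 * common external format (e.g. XDR) on top of that. */
int
main(void)
{
    unsigned long value = 42, wire;

    wire = htonl(value);        /* sender: host to network order */
    /* ... write(sock, &wire, sizeof wire) on one machine,
     *     read(sock, &wire, sizeof wire) on the other ... */
    value = ntohl(wire);        /* receiver: network to host order */
    printf("value = %lu\n", value);
    return 0;
}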

A number of sites use JIPC on small to large networks of
Suns and VAXen, including several universities and
the Naval Research Lab in DC.  Contact the University of
Calgary, Department of Computer Science.

-dj



David Jevans
University of Calgary, Department of Computer Science
Calgary, Alberta, Canada T2N 1N4