[mod.os] Why no "real" distributed systems?

darrell@sdcsvax.UUCP (02/10/87)

I see that there hasn't been any traffic in this newsgroup for  a
while.   I  hope  it's  the  snow  back  east,  and not a lack of
interest!

Let  me  pose  a  question:  Why  haven't  we  seen  any   "real"
distributed  systems?   I know of many academic projects, but few
of them are widely used.  Many never reach fruition.  The closest
thing  in  wide  use  is  4.3BSD,  but  that  is  certainly NOT a
distributed system -- it's a conventional operating  system  with
network  facilities.   On  the  industry  side  we  have the VAX-
Cluster, but that is  more  akin  to  a  multi-processor  than  a
distributed system.

What, exactly, IS a distributed system?  How does it differ  from
a  system such as, say, 4.3BSD?  Will any of the current academic
projects see wide acceptance, like MACH or the V-System?

Your opinions, views and comments are most welcome!

DL

darrell@sdcsvax.UUCP (02/10/87)

Okay, I'll bite.

In the early 1970s, when I started hacking across the ARPAnet, the idea
of a dataset (I was in an IBM environment; we did not call them files)
existing on some machine, with you not caring where that dataset was, was
a really neat idea.  The thought that a running program might migrate for
load balancing or fault tolerance was also there.  Ten years later, I
find Dick Watson, working first at SRI and now LLNL, still working on
these ideas.

The problem of distributed computing is not well defined.  DL mentions
multiprocessors but does not attempt to define a boundary.  Another
problem is that putting a lot of heterogeneous stuff together is hard.

We have a tendency to solve easy problems and leave the hard stuff as
"exercises for the reader ;-)."  To put together really heterogeneous
systems (and I don't mean VAXen and PDP-11s, but Crays, Suns, Britton-Lee
boxes, Evans and Sutherland CT6 systems, etc.) is
1) expensive, 2) hard, 3) lacking in pay-off, appreciation, etc.  Why
work on it when you can make a buck selling PC software (sorry, too
cynical).  Look at Ethernet: it has an incredible number of detractors, but
everyone is using it.  Developing a new network system from hardware to
protocols will probably be as difficult as developing new operating
systems or programming languages.  I wonder how long Boggs/Metcalfe's
Ethernet will be around?  Modification of a joke I wrote for Usenix 1986
Winter:

	What type of networking technology will be around in the 22nd
	Century?
	I don't know, but it will be called Ethernet. . . .

I would hope things like Proteon would catch on, but Network Systems
and Xerox technology are out there.  They exist.  People want what
exists.  It's easy to be a success and get support, and this is only the
lowest (hardware) level.  I'm glad NCP is basically gone, but it shocks
me how entrenched DECnet has become.  I also wonder about the
bureaucratic minds behind DECnet (not in DEC, but, for example, in NASA's
communications schemes at a Center that shall remain nameless).
The idea that some young scientist's (future administrator's) concepts about
the use of computing are being formed now with FORTRAN, DECnet, and
other things sends shivers up my spine.  I can see him decreeing
something solely because it was in "his experience" not because he
studied the issues.

James Martin in his infinite wisdom (sic) noted somewhere that there
need to be at least two more layers over the ISO model:  I think they
were Accounting and Administration.  Real network people gawked.  "Real"
world people nodded "Yes."

Oh, back to Dick.  He has a neat diagram of the problems:

                            Communications
                                 /  \
                                /    \
                               /      \
                              /        \
             Operating Systems -------- Programming Languages

These three communities don't talk to one another.  He harps that the OS
people are too infatuated with Unix (with relatively poor network support,
brewed too early on too small a machine), the PL people are too infatuated
with Ada, and the networking people, well, they have not been around as
long as either community.  But is communications all there is to
distributed systems?  I would hope not.  But so long as we get bogged down
in things like character set mapping, byte and bit order, word size,
instruction sets, and reliability, we won't see the light at the end of
the tunnel.
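
To make the byte-order item concrete, here is a minimal Python sketch.
It assumes a single 32-bit unsigned field with an arbitrary value; nothing
in it is taken from any of the machines named above.

```python
import struct

value = 0x0A0B0C0D  # arbitrary 32-bit example value

# The same integer as laid out by a big-endian and a little-endian machine.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

# A receiver that assumes the wrong order silently reads a different number.
misread = struct.unpack("<I", big)[0]

# Agreeing on one wire order (network order is big-endian) removes the ambiguity.
wire = struct.pack("!I", value)
decoded = struct.unpack("!I", wire)[0]
```

Every heterogeneous pair of hosts has to settle this, plus word size and
character set, before any higher-level work can begin.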

Note (as a standards person): standards alone won't help.
More experimental work is needed in all areas.

Another anecdote:
There are two ways to make a mistake in building a computer communications net:
	The first mistake is to build it such that it looks like a
	computer: this is SNA.  It's when computer people try to build a
	communications network without really understanding
	communications.
	The second mistake is to build it such that it looks like a
	phone system: this is X.25.  It's when phone people try to build a
	communications network without really understanding computers.

Marty Fouts, John Mashey, others, want to add anything?

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

Ethernet is a trademark of Xerox Corp.
DECnet is a trademark of Digital Equip. Corp.
Ada is a trademark of the US DOD AJPO.
and Star Wars is a trademark of Lucasfilm, Ltd.

darrell@sdcsvax.UUCP (02/13/87)

In article <2693@sdcsvax.UCSD.EDU> darrell@sdcsvax.uucp (Darrell Long) writes:
>Let  me  pose  a  question:  Why  haven't  we  seen  any   "real"
>distributed  systems?   I know of many academic projects, but few
>of them are widely used.  Many never reach fruition.  The closest
>thing  in  wide  use  is  4.3BSD,  but  that  is  certainly NOT a
>distributed system -- it's a conventional operating  system  with
>network  facilities.   On  the  industry  side  we  have the VAX-
>Cluster, but that is  more  akin  to  a  multi-processor  than  a
>distributed system.

Tolerant Systems makes a fault-tolerant, distributed system.  It is
composed of tightly coupled, multiple processor nodes called SBBs (2
main processors with other processors to manage I/O channels and
communications functions).  These SBBs are in turn loosely coupled to
each other in configurations ranging up to 40 in a *single* system (a
single system image contained within a global name space).  Users and
processes "see" *one* system.

The operating system, called TX, is derived from Unix.  Internally it
is vastly different; however, the system calls and utilities remain
compatible.  Many systems are in use at customer sites around the
world, with the largest single system installed at a customer site so
far being composed of 33 SBBs managing about 100 GB of disk.

Of course, these systems may be (and sometimes are) networked together
with each other as well as with other systems in general via Ethernet,
TCP/IP, FTP, telnet, etc. and with BSD systems in particular via
rlogin, rexec, rwho, rsh, and friends over the basic networking
facilities.

[I'd like to see some papers about this.  Got any? -DL]
-- 
UUCP:	...ihnp4!akgua!rebel!george
	...{hplabs,seismo}!gatech!rebel!george
Phone:	(404) 662-1533
Snail:	Tolerant Systems, 6961 Peachtree Industrial, Norcross, GA  30071

-- 
Darrell Long
Department of Computer Science & Engineering, UC San Diego, La Jolla CA 92109
ARPA: Darrell@Beowulf.UCSD.EDU  UUCP: darrell@sdcsvax.uucp
Operating Systems submissions to: mod-os@sdcsvax.uucp

darrell@sdcsvax.UUCP (02/14/87)

What about Apollo's OS (don't know its name)?  What about Stanford's
V system?  I've never used either one, but from what I've read they
both seem to be truly distributed.  One could also argue that the VAX/VMS
cluster facility represents a distributed OS - it provides a truly
transparent distributed file system.

[ I would argue that calling Apollo's Domain a "distributed system" ]
[ would  be close to calling 4.3BSD a "distributed system."  I must ]
[ confess that I do not have an Apollo on which  to  base  this  -- ]
[ it's just folk-lore.                                              ]
[                                                                   ]
[ As for the V-System, I would call that  a  "distributed  system," ]
[ but it is an extremely experimental system from what I can gather ]
[ from the papers.  From what I have heard from folks who have used ]
[ it, there is little or no protection for data.                    ]
[                                                                   ]
[ In summary, when I asked "Why are  there  no  `real'  distributed ]
[ systems?",  I  was  asking why aren't any commercially available? ]
[ It is interesting that this discussion has evolved into the  very ]
[ appropriate question: "What is a distributed system anyhow?"      ]
[                                                                   ]
[ --DL  darrell@beowulf.ucsd.edu                                    ]

-- 
Larry Campbell                                The Boston Software Works, Inc.
Internet: campbell@maynard.uucp             120 Fulton Street, Boston MA 02109
uucp: {alliant,wjh12}!maynard!campbell              +1 617 367 6846
ARPA: campbell%maynard.uucp@harvisr.harvard.edu      MCI: LCAMPBELL

darrell@sdcsvax.UUCP (02/17/87)

Apollo's native operating system (AEGIS) supports a truly distributed
file server, which is completely transparent to the user.

It does not do any automatic load-balancing. Manually, you can spawn
processes on any processor in the network.

darrell@sdcsvax.UUCP (02/18/87)

In article <2693@sdcsvax.UCSD.EDU> darrell@sdcsvax.uucp (Darrell Long) writes:

>What, exactly, IS a distributed system?  How does it differ  from
>a  system such as, say, 4.3BSD?  Will any of the current academic
>projects see wide acceptance, like MACH or the V-System?

A good description of distributed operating systems can be found in:
Tanenbaum, A. S. and van Renesse R. 1985.  Distributed Operating Systems.
ACM Computing Surveys 17, 4 (Dec.) 419-470.

Different people have different definitions for what defines a distributed 
system.  The primary criteria for establishing a distributed system are:
    1) transparency of file access / naming
    2) transparency of process execution
    3) transparency of protection / system accounting

The key concept is the transparency of operations across the machines that
constitute the distributed system.

4.3BSD is not a distributed system.  It does not meet any of the criteria
noted above:
    1) File access is not transparent.  To access a remote file, a
        special command (e.g., rcp, ftp) must be executed to transfer the
        file to the local host.

    2) Process execution is not transparent.  Special commands must be executed
        in order to get a process running on a remote host (e.g., rsh, rlogin).
        Execution of these commands further requires the user to be aware of
        which host they wish to access.

    3) System accounting is not transparent.  Each machine administers its own
        passwd file.
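
The difference in criterion 1 can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the host names,
paths, and the HOSTS and MOUNTS tables are made up, and in-memory
dictionaries stand in for real machines and networks.

```python
# Simulated per-host file stores; host names and paths are hypothetical.
HOSTS = {
    "alpha": {"/usr/src/main.c": "int main() { return 0; }"},
    "beta": {"/home/dl/notes": "distributed systems"},
}


def explicit_read(host, path):
    """Non-transparent access: the caller must name the host (as with rcp or ftp)."""
    try:
        return HOSTS[host][path]
    except KeyError:
        raise FileNotFoundError(f"{host}:{path}") from None


# One global name space: each path prefix is served by some host.
MOUNTS = {"/usr/src": "alpha", "/home": "beta"}


def transparent_read(path):
    """Transparent access: the caller gives only a path; the system picks the host."""
    matches = [p for p in MOUNTS if path == p or path.startswith(p + "/")]
    if not matches:
        raise FileNotFoundError(path)
    prefix = max(matches, key=len)  # longest matching prefix wins
    return explicit_read(MOUNTS[prefix], path)
```

With transparent_read("/home/dl/notes") the user never learns that the
file lives on "beta"; with explicit_read the host name is part of every
request.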

Apollo's Domain system is much closer to a commercial distributed system.
It supports transparent file naming and transparent system accounting.  
Ports of both 4.2BSD and System V UNIX are co-resident with Apollo's Aegis
kernel.  This allows any Unix program to take advantage of the distributed
resources naturally.

At this time Apollo's system does not yet support transparent process 
execution, but its implementation of the Network Computing System provides
considerable leverage for the development of fully distributed applications.
The Network Computing System (NCS) is a portable system composed of an
RPC mechanism (complete with an interface description language and compiler
for the automatic generation of client- and server-side stubs) and a
location broker, which allows applications to dynamically determine the
location of transient objects.
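
The general pattern of a generated client stub plus a location broker can
be sketched as follows.  This is a hedged illustration in Python, not the
NCS interface: LocationBroker, CounterServer, and CounterStub are
hypothetical names, and a direct method call stands in for marshalling a
request and sending it over the network.

```python
class LocationBroker:
    """Maps object names to whichever server currently hosts them."""

    def __init__(self):
        self._registry = {}

    def register(self, object_name, server):
        self._registry[object_name] = server

    def lookup(self, object_name):
        try:
            return self._registry[object_name]
        except KeyError:
            raise LookupError(f"no server registered for {object_name!r}") from None


class CounterServer:
    """Server-side implementation of a tiny hypothetical interface."""

    def __init__(self):
        self.count = 0

    def increment(self, by):
        self.count += by
        return self.count


class CounterStub:
    """Client-side stub of the kind an interface compiler would generate."""

    def __init__(self, broker, object_name):
        self._broker = broker
        self._name = object_name

    def increment(self, by):
        # Resolve the location on every call, so the object may move between calls.
        server = self._broker.lookup(self._name)
        # A real stub would marshal the arguments and send them to the server here.
        return server.increment(by)
```

The client holds only the stub and a name.  If the object is re-registered
on a different server, the next call follows it without any change to the
client code.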

- Joe Pato
  Apollo Computer Inc.
  apollo!pato@mit-eddie.arpa

* Network Computing System and NCS are trademarks of Apollo Computer Inc.
-------

darrell@sdcsvax.UUCP (02/19/87)

Joe Pato makes a case for distributed systems based on Tanenbaum's
recent ACM CS survey.  The problem, again, is degree of coupling.
Consider another, recent text:

%A M. Ajmone Marsan
%A G. Balbo
%A G. Conte
%T Performance Models of Multiprocessor Systems
%S Computer Systems Series
%I MIT Press
%D 1986
%K Book, text, Torino Multiprocessor, TOMP, Markov modeling,
queueing network models, MVA, Stochastic Petri Networks, common memory,
shared memory, bus architectures,
%X As pointed out by the authors, the book does not cover more advanced
multiprocessor interconnection networks (MINs).

Marsan, et al. put distributed systems into three classes (p. 101):
computer networks, multiprocessors, and `special' parallel machines
(quotes are mine).  I find it hard to try and separate some of these
issues when I maintain The Parallel Processing (multiprocessor/distributed
processing) Bibliography [ACM CAN, March 1985].  Many of the issues
are identical, only the timing is off to protect the names of the innocent.  

Note: I am not recommending this book, only noting that a diversity of
definitions exists, including Philip Enslow's book, Kuck's papers,
Satyanarayanan's book, numerous IEEE Tutorials and so forth.
Distributed systems should be more than just file systems (Newcastle,
Clusters, LOCUS, etc.).  Additionally, it appears the application is
also important: Tandem, Parallel, Stratus, 3B20Ds, etc. etc. have
different topology requirements than follow-ons to Crays, etc.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

darrell@sdcsvax.UUCP (02/20/87)

A very pleasant system which was 2/3 of a distributed system existed about 8
years ago on DEC-10's (TOPS-10 with extensive non-DEC modifications).  It had:
  Transparent distributed file & peripheral system
  Transparent distributed accounting & privilege system
It did not load-balance automatically; however, processes could be
spawned automatically on other CPUs (including machine-code-incompatible
PDP-11s).  This was NOT DECnet: ISCnet predated DECnet and to the end had
considerably more functionality as a moderately tightly coupled system.

This was implemented by a small company (Interactive Sciences Corp., Braintree,
MA) with an average of five programmers over four years.  Many of its features
still equal the best currently available.  (I can bend your ear for hours :-) )
	Geoff Steckel (steckel@alliant.UUCP)

darrell@sdcsvax.UUCP (02/25/87)

In article <2721@sdcsvax.UCSD.EDU> darrell@sdcsvax.UUCP writes:
>[ In summary, when I asked "Why are  there  no  `real'  distributed ]
>[ systems?",  I  was  asking why aren't any commercially available? ]
>[ .... ]
>[ --DL  darrell@beowulf.ucsd.edu                                    ]

The LOCUS operating system appears to be a true distributed system, running
across heterogeneous CPUs.  Unfortunately, it is also vaporware, as they
have not released it and don't seem to be planning to any time soon.

(To be fair, part of the problem seems to be that they want to track both
BSD and S5, and apparently by the time they get LOCUS up to compatibility
with one version of {BSD, S5}, {UCB, ATT} goes and releases the next one!
From what I understand, the initial version(s) of LOCUS were based on 4.1BSD.)

Non-flaming rebuttals from someone at LOCUS are welcome.
-- 
Arnold Robbins
CSNET:	arnold@emory	BITNET:	arnold@emoryu1
ARPA:	arnold%emory.csnet@csnet-relay.arpa
UUCP:	{ akgua, decvax, gatech, sb1, sb6, sunatl }!emory!arnold

darrell@sdcsvax.UUCP (03/05/87)

Geoff, do you have any information on the system available by either email
or US mail?  I am working on a distributed system and would be interested in
literature describing the features and philosophy of that system.  Were you
one of the five programmers on the project?

Scott M. Hinnrichs
NetServices, Inc.
212 Oak Grove
Atherton, CA 94025