darrell@sdcsvax.UUCP (02/10/87)
I see that there hasn't been any traffic in this newsgroup for a while. I hope it's the snow back east, and not a lack of interest! Let me pose a question: Why haven't we seen any "real" distributed systems? I know of many academic projects, but few of them are widely used. Many never reach fruition. The closest thing in wide use is 4.3BSD, but that is certainly NOT a distributed system -- it's a conventional operating system with network facilities. On the industry side we have the VAX-Cluster, but that is more akin to a multi-processor than a distributed system. What, exactly, IS a distributed system? How does it differ from a system such as, say, 4.3BSD? Will any of the current academic projects see wide acceptance, like MACH or the V-System? Your opinions, views and comments are most welcome! DL
darrell@sdcsvax.UUCP (02/10/87)
Okay, I'll bite.
In the early 1970s, when I started hacking across the ARPAnet, the idea
of a dataset (I was in an IBM environment, we did not call them files)
existing on a machine, with you not caring where that DS was, was a really neat
idea. The thought that a running program might migrate for load
balancing or fault-tolerant reasons was also there. Ten years later, I
find Dick Watson, working first at SRI and now LLNL, still working on
these ideas.
The problem of distributed computing is not well defined. DL mentions
multiprocessors but does not attempt to define a boundary. Another
problem is that putting a lot of heterogeneous stuff together is hard.
We have a tendency to solve easy problems and leave the hard stuff as
"exercises for the reader ;-)." To put together really heterogeneous
systems (and I don't mean VAXen and PDP-11s, but Crays, Suns, Britton-Lee
boxes, Evans and Sutherland CT6 systems, etc.) is
1) expensive, 2) hard, 3) lacking in pay-off, appreciation, etc. Why
work on it when you can make a buck selling PC software? (Sorry, too
cynical.) Look at Ethernet: it has an incredible number of detractors, but
everyone is using it. Developing a new network system from hardware to
protocols will probably be as difficult as developing new operating
systems or programming languages. I wonder how long Boggs/Metcalfe's
Ethernet will be around? Modification of a joke I wrote for Usenix 1986
Winter:
What type of networking technology will be around for the 22nd
Century?
I don't know, but it will be called Ethernet. . . .
I would hope things like Proteon would catch on, but Network Systems
and Xerox technology are out there. They exist. People want what
exists. It's easy to be a success and get support, and this is only the
lowest (hardware) level. I'm glad NCP is basically gone, but it shocks
me how entrenched DECnet has become. I also wonder about the
bureaucratic minds behind DECnet (not in DEC, but like in NASA's
communications schemes at a Center that shall remain nameless).
The idea that some young scientist (future administrator)'s concepts about
the use of computing are being formed now with FORTRAN, DECnet, and
other things sends shivers up my spine. I can see him decreeing
something solely because it was in "his experience" not because he
studied the issues.
James Martin in his infinite wisdom (sic) noted somewhere that there
needs to be at least two more layers over the ISO model: I think they
were Accounting, and Administration. Real network people gawked. "Real"
world people nodded "Yes."
Oh, back to Dick. He has a neat diagram of the problems:
                  Communications
                      /  \
                     /    \
                    /      \
                   /        \
Operating Systems -------- Programming Languages
These three communities don't talk to one another. He harps that the OS
people are too infatuated with Unix (with relatively poor network support,
brewed too early, on too small a machine), the PL people are infatuated
with Ada, and the networking people, well, they have not been around as
long as either community. But is communications all there is to distributed
systems? I would hope not. But so long as we get bogged down in things
like character set mapping, byte and bit order, word size,
instruction sets, and reliability, we won't see the light at the end of
the tunnel.
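To make the byte-order point concrete, here is a minimal sketch (in Python, purely for illustration; the value and the helper names `encode` and `decode` are made up for this example) of how the same 32-bit integer is laid out by big-endian and little-endian machines, and how agreeing on a single wire order avoids the misreading:

```python
import struct

value = 0x0A0B0C0D

# The same 32-bit integer as laid out by a big-endian and a little-endian machine.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

# A receiver that assumes the wrong order silently gets a different number.
misread = struct.unpack("<I", big)[0]


def encode(n):
    """Hypothetical helper: always send in network (big-endian) order."""
    return struct.pack("!I", n)


def decode(data):
    """Hypothetical helper: always read network (big-endian) order."""
    return struct.unpack("!I", data)[0]


roundtrip = decode(encode(value))
```

Both ends calling `encode` and `decode` get the original value back regardless of their native order. Character sets and word sizes need the same kind of explicit agreement.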
Note: Standards alone won't help (and I say this as a standards person).
More experimental work is needed in all areas.
Another anecdote:
There are two ways to make a mistake in building a computer communications net:
The first mistake is to build it such that it looks like a
computer: this is SNA. It's when computer people try to build a
communications network without really understanding
communications.
The second mistake is to build it such that it looks like a
phone system: this is X.25. It's when phone people try to build a
communications network without really understanding computers.
Marty Fouts, John Mashey, others, want to add anything?
From the Rock of Ages Home for Retired Hackers:
--eugene miya
NASA Ames Research Center
eugene@ames-aurora.ARPA
"You trust the `reply' command with all those different mailers out there?"
"Send mail, avoid follow-ups. If enough, I'll summarize."
{hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene
Ethernet is a trademark of Xerox Corp.
DECnet is a trademark of Digital Equip. Corp.
Ada is a trademark of the US DOD AJPO.
and Star Wars is a trademark of Lucasfilm, Ltd.
darrell@sdcsvax.UUCP (02/13/87)
In article <2693@sdcsvax.UCSD.EDU> darrell@sdcsvax.uucp (Darrell Long) writes:
>Let me pose a question: Why haven't we seen any "real"
>distributed systems? I know of many academic projects, but few
>of them are widely used. Many never reach fruition. The closest
>thing in wide use is 4.3BSD, but that is certainly NOT a
>distributed system -- it's a conventional operating system with
>network facilities. On the industry side we have the
>VAX-Cluster, but that is more akin to a multi-processor than a
>distributed system.

Tolerant Systems makes a fault-tolerant, distributed system. It is composed of tightly coupled, multiple processor nodes called SBBs (2 main processors with other processors to manage I/O channels and communications functions). These SBBs are in turn loosely coupled to each other in configurations ranging up to 40 in a *single* system (a single system image contained within a global name space). Users and processes "see" *one* system.

The operating system, called TX, is derived from Unix. Internally it is vastly different, however the system calls and utilities remain compatible.

Many systems are in use at customer sites around the world, with the largest single system installed at a customer site so far being composed of 33 SBBs managing about 100 GB of disk.

Of course, these systems may be (and sometimes are) networked together with each other as well as with other systems in general via ethernet, TCP/IP, FTP, telnet, etc. and with BSD systems in particular via rlogin, rexec, rwho, rsh, and friends over the basic networking facilities.

[I'd like to see some papers about this. Got any? -DL]

--
UUCP: ...ihnp4!akgua!rebel!george
      ...{hplabs,seismo}!gatech!rebel!george
Phone: (404) 662-1533
Snail: Tolerant Systems, 6961 Peachtree Industrial, Norcross, GA 30071

--
Darrell Long
Department of Computer Science & Engineering, UC San Diego, La Jolla CA 92109
ARPA: Darrell@Beowulf.UCSD.EDU
UUCP: darrell@sdcsvax.uucp
Operating Systems submissions to: mod-os@sdcsvax.uucp
darrell@sdcsvax.UUCP (02/14/87)
What about Apollo's OS (don't know its name)? What about Stanford's V system? I've never used either one, but from what I've read they both seem to be truly distributed. One could also argue that the VAX/VMS cluster facility also represents a distributed OS - it provides a truly transparently distributed file system.

[ I would argue that calling Apollo's Domain a "distributed system" ]
[ would be close to calling 4.3BSD a "distributed system." I must ]
[ confess that I do not have an Apollo on which to base this -- ]
[ it's just folk-lore. ]
[ ]
[ As for the V-System, I would call that a "distributed system," ]
[ but it is an extremely experimental system from what I can gather ]
[ from the papers. From what I have heard from folks who have used ]
[ it, there is little or no protection for data. ]
[ ]
[ In summary, when I asked "Why are there no `real' distributed ]
[ systems?", I was asking why aren't any commercially available? ]
[ It is interesting that this discussion has evolved into the very ]
[ appropriate question: "What is a distributed system anyhow?" ]
[ ]
[ --DL darrell@beowulf.ucsd.edu ]

--
Larry Campbell
The Boston Software Works, Inc.
120 Fulton Street, Boston MA 02109
+1 617 367 6846
Internet: campbell@maynard.uucp
uucp: {alliant,wjh12}!maynard!campbell
ARPA: campbell%maynard.uucp@harvisr.harvard.edu
MCI: LCAMPBELL
darrell@sdcsvax.UUCP (02/17/87)
Apollo's native operating system (AEGIS) supports a truly distributed file server, which is completely transparent to the user. It does not do any automatic load-balancing. Manually, you can spawn processes on any processor in the network.
darrell@sdcsvax.UUCP (02/18/87)
In article <2693@sdcsvax.UCSD.EDU> darrell@sdcsvax.uucp (Darrell Long) writes:
>What, exactly, IS a distributed system? How does it differ from
>a system such as, say, 4.3BSD? Will any of the current academic
>projects see wide acceptance, like MACH or the V-System?

A good description of distributed operating systems can be found in:

    Tanenbaum, A. S. and van Renesse, R. 1985. Distributed Operating
    Systems. ACM Computing Surveys 17, 4 (Dec.), 419-470.

Different people have different definitions of what constitutes a distributed system. The primary criteria for establishing a distributed system are:

    1) transparency of file access / naming
    2) transparency of process execution
    3) transparency of protection / system accounting

The key concept is the transparency of operations across the machines that constitute the distributed system.

Bsd4.3 is not a distributed system. It does not meet any of the criteria noted above.

    1) File access is not transparent. To access a remote file, a special command must be executed to transfer the file to the local host (e.g., rcp, ftp).
    2) Process execution is not transparent. Special commands must be executed to get a process running on a remote host (e.g., rsh, rlogin). Executing these commands further requires the user to be aware of which host they desire to access.
    3) System accounting is not transparent. Each machine administers its own passwd file.

Apollo's Domain system is much closer to a commercial distributed system. It supports transparent file naming and transparent system accounting. A Bsd4.2 and a System 5 UNIX port are co-resident with Apollo's Aegis kernel. This allows any unix program to take advantage of the distributed resources naturally. At this time Apollo's system does not yet support transparent process execution, but its implementation of the Network Computing System provides considerable leverage for the development of fully distributed applications.

The Network Computing System (NCS) is a portable system composed of an RPC mechanism (complete with an interface description language and compiler for the automatic generation of client and server side stubs) and a location broker which allows applications to dynamically determine the location of transient objects.

- Joe Pato
  Apollo Computer Inc.
  apollo!pato@mit-eddie.arpa

* Network Computing System and NCS are trademarks of Apollo Computer Inc.
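The location-broker idea in the post above can be illustrated with a toy, in-process sketch. This is not the NCS API: the class names `LocationBroker` and `ClientStub`, the host names, and the "adder" object are all hypothetical, and a plain function call stands in for the network transport. It only shows why a caller that resolves an object's location through a broker never has to name a host:

```python
class LocationBroker:
    """Toy registry mapping an object name to wherever it currently lives."""

    def __init__(self):
        self._where = {}

    def register(self, name, host, handler):
        # A server announces (or re-announces) where an object can be reached.
        self._where[name] = (host, handler)

    def lookup(self, name):
        return self._where[name]


class ClientStub:
    """Stands in for a generated client stub: callers never name a host."""

    def __init__(self, broker, name):
        self._broker = broker
        self._name = name

    def call(self, *args):
        # Resolve the location on every call, so a moved object is still found.
        host, handler = self._broker.lookup(self._name)
        return host, handler(*args)


broker = LocationBroker()
broker.register("adder", "node-a", lambda x, y: x + y)
stub = ClientStub(broker, "adder")
first = stub.call(2, 3)

# The object migrates to another node; the caller's code does not change.
broker.register("adder", "node-b", lambda x, y: x + y)
second = stub.call(2, 3)
```

Contrast this with rsh or rcp, where the host name appears in every command the user types.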
darrell@sdcsvax.UUCP (02/19/87)
Joe Pato makes a case for distributed systems based on Tanenbaum's recent ACM CS survey. The problem, again, is degree of coupling. Consider another, recent text:

%A M. Ajmone Marsan
%A G. Balbo
%A G. Conte
%T Performance Models of Multiprocessor Systems
%S Computer Systems Series
%I MIT Press
%D 1986
%K Book, text, Torino Multiprocessor, TOMP, Markov modeling, queueing network models, MVA, Stochastic Petri Networks, common memory, shared memory, bus architectures,
%X As pointed out by the authors, the book does not cover more advanced multiprocessor interconnection networks (MINs).

Marsan, et al. put distributed systems into three classes (p. 101): computer networks, multiprocessors, and `special' parallel machines (quotes are mine). I find it hard to try and separate some of these issues when I maintain The Parallel Processing (multiprocessor/distributed processing) Bibliography [ACM CAN, March 1985]. Many of the issues are identical, only the timing is off to protect the names of the innocent.

Note: I am not recommending this book, only noting that a diversity of definitions exists, including Philip Enslow's book, Kuck's papers, Satyanarayanan's book, numerous IEEE Tutorials and so forth. Distributed systems should be for more than just file systems (Newcastle, Clusters, LOCUS, etc.). Additionally, it appears the application is also important: Tandem, Parallel, Stratus, 3B20Ds, etc. have different topology requirements than follow-ons to Crays, etc.

From the Rock of Ages Home for Retired Hackers:
--eugene miya
NASA Ames Research Center
eugene@ames-aurora.ARPA
"You trust the `reply' command with all those different mailers out there?"
"Send mail, avoid follow-ups. If enough, I'll summarize."
{hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene
darrell@sdcsvax.UUCP (02/20/87)
A very pleasant system which was 2/3 of a distributed system existed about 8 years ago on DEC-10's (TOPS-10 with extensive non-DEC modifications). It had:

    Transparent distributed file & peripheral system
    Transparent distributed accounting & privilege system

It did not automatically load-balance, however, processes could be automatically spawned on other CPUs (including machine-code-incompatible PDP11's). This was NOT DECnet - ISCnet predated DECnet and to the end had considerably more functionality as a moderately tightly coupled system.

This was implemented by a small company (Interactive Sciences Corp., Braintree, MA) with an average of five programmers over four years. Many features still equal to the best currently available. (I can bend your ear for hours :-) )

Geoff Steckel (steckel@alliant.UUCP)
darrell@sdcsvax.UUCP (02/25/87)
In article <2721@sdcsvax.UCSD.EDU> darrell@sdcsvax.UUCP writes:
>[ In summary, when I asked "Why are there no `real' distributed ]
>[ systems?", I was asking why aren't any commercially available? ]
>[ .... ]
>[ --DL darrell@beowulf.ucsd.edu ]

The LOCUS operating system appears to be a true distributed system, running across heterogeneous cpu's. Unfortunately, it is also vaporware, as they have not released it, and don't seem to be planning to any time real soon. (To be fair, part of the problem seems to be that they want to track both BSD and S5, and apparently by the time they get LOCUS up to compatibility with one version of {BSD, S5}, {UCB, ATT} goes and releases the next one! From what I understand, the initial version(s) of LOCUS were based on 4.1BSD.)

Non-flaming rebuttals from someone at LOCUS are welcome.

--
Arnold Robbins
CSNET: arnold@emory
BITNET: arnold@emoryu1
ARPA: arnold%emory.csnet@csnet-relay.arpa
UUCP: { akgua, decvax, gatech, sb1, sb6, sunatl }!emory!arnold
darrell@sdcsvax.UUCP (03/05/87)
Geoff, do you have any info on the system available either by email or usmail? I am working on a distributed system and would be interested in lit describing features/philosophy of that system. Were you one of the 5 programmers on the project? Scott M. Hinnrichs NetServices, Inc. 212 Oak Grove Atherton, CA 94025