fouts@orville%ames.arpa (Marty Fouts) (02/04/87)
Direct measurement is very useful, as Gene has pointed out, but suffers from drawbacks: 1) measurement interferes with the system being measured; 2) measurement is subject to sampling-frequency-related distortion; 3) measurement can miss transients; 4) measurement still requires analysis and interpretation; 5) sometimes you need to know whether it will work before you can build it.

In fact, measurement by itself doesn't solve most of Gene's concerns. It is easy to cook a set of measurements to obtain the desired results; after all, many vendor-supplied benchmarks are merely cooked measurements. You do trade off in reality: you ignore events that you can't process quickly enough, you ignore data you don't understand, and you ignore events that you can't measure adequately. Direct measurement has as many drawbacks as modeling.

Measurement is only the first step in understanding. Building models to predict future behavior is the second. Comparing the results of models to real measurements is the third. You only understand something if you can predict its behavior and demonstrate the validity of your predictions. In the context of computer performance analysis, a set of measured results will frequently lead to ambiguous interpretations. It is sometimes not possible to find a measurement which will distinguish between the alternatives without first producing at least an analytic model of the system.
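The three steps above can be made concrete with a small sketch. This is only an illustration by way of example (an M/M/1 queue is one hypothetical choice of analytic model, not anything proposed in this discussion): the model yields a prediction, which is then compared against measured behavior.

```python
# Step two above: build a model that predicts future behavior.
# An M/M/1 queue (Poisson arrivals, one exponential server) predicts
# the mean time a request spends in the system as 1 / (mu - lambda).
def mm1_response_time(arrival_rate, service_rate):
    """Predicted mean response time of an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrivals outpace service")
    return 1.0 / (service_rate - arrival_rate)

# Step three: compare the prediction against measured response times.
predicted = mm1_response_time(arrival_rate=8.0, service_rate=10.0)
print(predicted)  # 0.5 time units; a large gap between this and the
                  # measurements means the model (or the measurement
                  # technique) needs revisiting
```

The point is the methodology, not the particular model: a prediction that disagrees with measurement is exactly the ambiguity-resolving signal the paragraph above describes.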
darrell@sdcsvax.UUCP (02/11/87)
One of Gene's throwaway lines seems to me to be a key point. The idea behind a lot of distributed systems seems to be that you can load balance and gain fault tolerance by distributing. Unfortunately, you can do both more easily with a multiprocessor than with a distributed system.

The NAS project here at Ames has taken an approach to heterogeneous systems which stops short of a distributed system. We are taking workstations, minis, mainframes, and supercomputers and building a computer system around "a common user interface" by having them all run Un*x variants. This system isn't distributed, because the individual systems are autonomous, but it allows much code to be easily transportable, and it allows us to develop distributed applications.

And the results we are seeing are reinforcing something I believed before I got here: distributed heterogeneous systems don't work, because by being heterogeneous they prevent you from being able to load balance and be fault tolerant. This isn't surprising: if the Cray 2 is down for two hours, there isn't enough time on all of our workstations combined to make up for the lost CPU cycles, and besides, the binaries won't run on a 68K anyway.

To me the answer is to run the machines with separate operating systems, but to provide user and programmer interfaces with good communications primitives so that distributed applications can be developed. I believe that the relative cost of communications versus multiprocessors is always going to make multiprocessors more attractive for homogeneous applications than distributed systems, and that the difficulty of programming heterogeneous systems is always going to make autonomous systems more attractive than distributed homogeneous systems.

BTW, Gene, aren't you ashamed to be badly plagiarizing Einstein?
darrell@sdcsvax.UUCP (02/11/87)
I am looking for a good, generally acceptable definition of the notion "distributed system". The literature contains many definitions of this notion (see below), but they are all different and all seem to be unsatisfactory in some way.

Most definitions focus on high communication costs and/or the absence of shared memory. With the current state of the art in computer communications it seems strange to insist on high communication costs. For example, John Limb of Bell Labs has built a 200 megabit/second network that could be extended over many miles and ran on fiber optics. Should a system using such a communication medium be excluded as a distributed system because it is too fast? Neither is it clear why shared memory is so essential. There are several distributed systems and languages that try to present some form of conceptual shared memory (e.g., David Gelernter's Linda, David Cheriton's "problem-oriented shared memory," and Kai Li's "shared virtual memory"). So why is the absence of shared memory so important?

At least, most people seem to agree that network-based systems ought to be included and that, for example, vector computers and dataflow machines are to be excluded. Yet there are some cases that are less clear, for example:

- a single board with lots of Transputers
- a system consisting of several nodes connected by a LAN, where every node contains 5 processors and a single shared memory

Does anyone know of an accurate, generally acceptable definition?
-----------------------------------------------------------------------
Here are a few of the definitions I found:

Leslie Lamport (CACM, July 1978): A distributed system consists of a collection of distinct processes which are spatially separated, and which communicate with one another by exchanging messages. ... A system is distributed if the message transmission delay is not negligible compared to the time between events in a single process.

Crookes & Elder (SPE, Sept. '81): We thus define a distributed system as a system comprising a number of processors (each with its own private memory space), interconnected in a way which does not provide shared memory.

Stammers (in: Concurrent Languages in Distributed Systems, North-Holland, 1984): In this paper, a distributed system is one that has several memories, some or all of which cannot be directly addressed by every processor. Thus the distribution need not be geographical. Similarly, the hardware components could be loosely coupled, for example by communication lines, or tightly coupled, for example by memories accessible from more than one bus.

M.L. Scott (PhD thesis, Univ. of Wisconsin at Madison, May '85): I use the adjective 'distributed' to describe any hardware or software involving interacting computations on processors that share no physical memory.

Filman & Friedman (Coordinated Computing: Tools and Techniques for Distributed Software, McGraw-Hill, 1984): We classify multiple processor systems by their communications bandwidth. Systems that allow sharing of much information are multiprocessors. Such systems can be thought of as providing shared memory to their processes. ... Systems that incur higher communication costs are distributed systems.

Henri Bal
darrell@sdcsvax.UUCP (02/11/87)
When is an application ``distributed?''
=======================================
To answer this question, we will try to find the borders of the
domain of distributed applications. It seems to us that this
is better than describing an arbitrary point within the domain
by summing up some properties that distributed applications
usually have. We will start by naming four extreme examples,
each running on two processors. We take it as self-evident
that there must be at least two processors involved in a dis-
tributed application.
Example 1. One processor calculates the odd-numbered decimals
of pi; another the even-numbered in parallel. This is a
distributed application. Note that there is no communica-
tion involved, although there might be some in the end to
merge the results.
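Example 1 can be sketched in a few lines. This is an illustrative toy only (summing the Leibniz series stands in for really computing decimals of pi): two workers each handle half of the terms with no communication until the final merge.

```python
# Toy version of Example 1: two processes independently compute the
# terms of the Leibniz series for pi/4 whose index has a given parity;
# the only communication is the merge of the two results at the end.
from multiprocessing import Pool

def partial_sum(parity):
    # parity 0 -> even-indexed terms, parity 1 -> odd-indexed terms
    return sum((-1) ** k / (2 * k + 1) for k in range(parity, 100_000, 2))

if __name__ == "__main__":
    with Pool(2) as pool:
        even, odd = pool.map(partial_sum, [0, 1])
    print(4 * (even + odd))  # approximately pi
```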
Example 2. We have a client process on one processor and a
server process on another, communicating using RPC. There
is no parallelism, since the client blocks while the
server runs, but this still is a distributed application.
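Example 2 can likewise be sketched in miniature, here with Python's stock XML-RPC machinery purely for illustration (the "remote" server runs in a thread on localhost so the sketch is self-contained; in a real distributed application it would be on another processor).

```python
# Example 2 in miniature: the client blocks on each call until the
# server has computed and returned the result -- distribution without
# parallelism.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(lambda x, y: x + y, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy(f"http://localhost:{port}")
result = client.add(2, 3)   # the client blocks here while the server runs
print(result)               # -> 5
server.shutdown()
```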
Example 3. One processor calculates decimals of pi; another is
doing a compilation. This is not a distributed applica-
tion, although the processors are running in parallel.
Example 4. Two unrelated processes communicate while contend-
ing for a resource, such as a shared disk. This is not a
distributed application either, although there is communi-
cation between the processes.
The results can be summarized as follows:
example | distributed | parallel | communication
--------|-------------|----------|--------------
1 | yes | yes | no
2 | yes | no | yes
3 | no | yes | no
4 | no | maybe | yes
Examples 1 and 4 show that communication is neither a necessary
nor a sufficient property of distributed applications. Exam-
ples 2 and 3 show that distribution and parallelism are not
directly related either.
Definition. A distributed application is an application
carried out by two or more processors.
This leaves the terms ``application'' and ``processor'' to be
defined. For example, a data-flow machine is a distributed
system on a low-level, although on the user level it is neither
an application nor distributed. Another example is a distri-
buted application running on a uniprocessor that is simulating
a distributed system. A distributed application running on
this simulator is still a distributed application, although in
reality there is only one processor and no communication
involved at all. We believe that everybody has intuitive ideas
of what an application and what a processor is, and will not
try to obscure these ideas with a formal definition.
What is a distributed system?
=============================
We have defined a distributed application, but as we have seen
in the example of a distributed application running on a
uniprocessor, a distributed application does not imply a dis-
tributed system. What is a distributed system, then? Again,
we try to define the borders of the domain. We assume that
there must be two or more processors.
Example 1. There are two computers in the same room which are
not physically connected in any way. Whether they are
both computing parts of the same application or not, they
do not make up a distributed system.
Example 2. Two computers are connected by an Ethernet. They
are not communicating over the network, and never have;
still, the possibility for communication exists. They are
a distributed system.
Example 3. Two processing units, each with their own local
memory, have access to a global bus and common shared
memory. They compute independently, but can communicate
through the shared memory. They are a distributed system.
Example 4. A computer is made up, among other components, of a
CPU and a disk drive. The disk drive is a slave to the
CPU, but once it has received a request from the process-
ing unit, it does some processing independent of and in
parallel with the CPU. On the level of a systems program-
mer, this is a distributed system. There is no real
difference in driving a disk controller or a network dev-
ice, especially if the network is reliable. On the user's
level, the system is not distributed.
Example 5. A processor is a dedicated file server. It is a
slave to the requests of other processors connected to it
through a network, but after receiving a request, it does
some processing independent of and in parallel with the
requesting processor. On the level of a systems program-
mer, this is a distributed system. On the user's level,
it is not.
| distributed | shared |
example | system | memory | communication
--------|-------------|---------------|--------------
1 | no | no | no
2 | yes | no | possible
3 | yes | yes | yes
4 | yes | yes (sort of) | yes
5 | yes | no | yes
The examples suggest the following. The possibility for
communication is necessary for a distributed system. The
method of communication is irrelevant; whether the processors
communicate via a network, shared memory, or device registers
(in the case of the disk drive), they still make up a distri-
buted system. A given system can be distributed on one level,
and not distributed on another, higher, level. Most, if not
all systems are distributed on some level, hardly ever on the
user's level.
Definition. A distributed system consists of two or more
processors which have the ability to commun-
icate with one another.
What is a distributed operating system?
=======================================
The set of distributed systems includes not only distributed
operating systems, but also distributed database systems, dis-
tributed airline reservation systems, and systems with multiple
operating systems, such as the UNIX 4.3BSD operating system.
This last system, however, is not a distributed operating sys-
tem, since it does not support distributed applications
directly. So what features should an operating system for mul-
tiple computers have before it deserves the title ``distri-
buted?''
This is where we get into the fuzzy areas. ``When is a system
an operating system?'' is a similar question. A system that
simplifies disk access in the form of files is an operating
system. A system that supports multi-tasking is an operating
system. A distributed operating system must include at least
these features in combination with the ability to communicate
between machines.
But this is not enough. A distributed application needs sup-
port for starting processes on different processors, support
for communication between the processes independent of where
they run, load balancing and fault tolerance mechanisms, and
control for distributed processes (signaling, etc.). An
operating system that supports some of these mechanisms
directly can be called distributed. So, although the UNIX
4.3BSD system could be made into a distributed operating system
by adding these mechanisms to it, the way it currently exists,
it is not.
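One of the mechanisms listed above, communication between processes independent of where they run, can be sketched roughly as follows. This is a hypothetical illustration, not code from 4.3BSD or any real distributed operating system: the sender addresses a logical process name, and a name service (here just a dictionary) resolves it to an actual location.

```python
# Location-independent communication, sketched: senders name the
# destination process; a registry maps names to (host, port), so the
# destination could move without the sender's code changing.
import socket
import threading

registry = {}               # name -> (host, port); stands in for a name service
ready = threading.Event()

def serve(name):
    # The server registers itself under a logical name.
    listener = socket.socket()
    listener.bind(("localhost", 0))
    listener.listen(1)
    registry[name] = listener.getsockname()
    ready.set()
    conn, _ = listener.accept()
    conn.sendall(b"echo: " + conn.recv(1024))
    conn.close()
    listener.close()

def send(name, msg):
    # The sender never mentions a host or port -- only the name.
    sock = socket.create_connection(registry[name])
    sock.sendall(msg)
    reply = sock.recv(1024)
    sock.close()
    return reply

t = threading.Thread(target=serve, args=("echo-server",))
t.start()
ready.wait()
reply = send("echo-server", b"hello")
t.join()
print(reply)  # b'echo: hello'
```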
Jennifer Steiner (jennifer@cwi.nl)
Robbert van Renesse (cogito@cs.vu.nl)
The Amoeba Project
Amsterdam
darrell@sdcsvax.UUCP (02/12/87)
In article <2699@sdcsvax.UCSD.EDU> cogito@cs.vu.nl (Jennifer Steiner and Robbert van Renesse) suggest the following definitions:

> Definition. A distributed application is an application
>             carried out by two or more processors.
>
> Definition. A distributed system consists of two or more
>             processors which have the ability to commun-
>             icate with one another.

Other recent submissions have made an additional point which I would like to second: in order to be "distributed", an application or system must not rely on shared memory. At the application level, this means that processes use a communication mechanism based on some form of message passing. (I consider RPC to be a form of message passing.) At the hardware level, it means that physical memory is local. I agree with Steiner and van Renesse that a non-distributed application can run on a distributed system that simulates shared memory.

It is of course possible to cite examples where the line is hard to draw. Cm*, for example, might or might not be considered to support shared memory, depending on whether you count the microcode of the Kmap as "hardware". At a higher level, the tuple space of Linda might or might not be considered a message-passing abstraction, depending on one's point of view.

I find it useful to keep distinct the meanings of the words "concurrent", "parallel", and "distributed". In the interests of broadening the discussion, I suggest the following definitions:

-- "Concurrent" implies the simultaneous existence of more than one thread of control.

-- "Parallel" implies the simultaneous *execution* of more than one thread of control.

-- "Distributed" implies interaction between threads of control on processors that share no physical memory.

Parallel and distributed both imply concurrent. Most distributed computations are parallel. Coroutines are concurrent but not parallel.

-- Michael L. Scott
University of Rochester  (716) 275-7745
scott@rochester.arpa  scott%rochester@CSNET-RELAY
{decvax, allegra, seismo, cmcl2}!rochester!scott
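Scott's last point, that coroutines are concurrent but not parallel, can be illustrated with Python generators (a stand-in for coroutines, used here only because it is easy to run): two threads of control exist at once, but control is explicitly handed back and forth on a single processor, so they never execute simultaneously.

```python
# Two coroutine-style threads of control.  Both "exist" throughout the
# run (concurrent), but only one ever executes at a time (not parallel):
# the interleaving below alternates between them explicitly.
def worker(label, n):
    for i in range(n):
        yield f"{label} {i}"

ping, pong = worker("ping", 3), worker("pong", 3)
trace = [step for pair in zip(ping, pong) for step in pair]
print(trace)
# ['ping 0', 'pong 0', 'ping 1', 'pong 1', 'ping 2', 'pong 2']
```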
darrell@sdcsvax.UUCP (02/13/87)
I think that trying to define a "distributed system" is a lot like trying to define an "artificially intelligent" system. The definitive definition remains elusive because these terms refer more to collections of techniques and models of problems than to actual physical properties of systems. A "distributed system" is a system that makes good use of "distributed programming techniques." Of course, defining "distributed programming technique" is no easier, but there are various characteristic features:

-- The technique is designed to perform well in the presence of high communications latency, often being optimized for this property at the expense of other costs that would be relatively more significant in a system having lower communications latencies.

-- The technique may also be optimized for low communications bandwidth. This is similar to, but should not be confused with, high latency. Even high-bandwidth fiber-optic networks don't have to be all that long for latency to be significantly greater than memory access time.

-- Techniques for improving reliability and availability via replication are often considered to be "distributed computing techniques", both because distributed systems *can* be partially available due to being physically distributed, and because they often *are* partially available due to their complexity.

-- Techniques for achieving security with insecure communications are "distributed techniques", since distributed systems often have insecure communications.

-- Techniques for locating services or objects given some symbolic or indirect "name" are often "distributed" techniques, since the fluid configuration of distributed systems makes indirect references necessary.

If you agree that distributedness is a matter of degree rather than a boolean attribute, then it is clearly hopeless to firmly discriminate between a "distributed system" and a "multiprocessor", since a system can be built at any level of "distributedness".
Going back to the A.I. analogy, I think that these complaints about there not "really being any commercial distributed systems" are similar to the complaints from A.I. researchers about all these expert-systems companies not doing "real A.I.". The problem in both fields is that once a technique really works well, it tends to be co-opted by ordinary programmers trying to get work done. A.I. started with heuristic search and logical inference algorithms. Probably distributed systems started the day someone ran a serial line across the room to the next machine. Basic applications of these techniques are not very sexy research anymore, and thus are not "real" A.I. or distributed systems to academics.

A conflicting pressure comes from the hype factor in industry: "Buy our system because it's distributed." Even ignoring the vagueness of "distributed", this is a bogus claim, since a rational consumer should make their choice based on how well the system solves their problem, rather than on how the system is implemented.

Rob
darrell@sdcsvax.UUCP (03/09/87)
I'm hardly an expert on these things, but I thought my two cents might be interesting to some. Anyway, the moderator can refuse to post this if I don't say anything substantive.

In article <2819@sdcsvax.UCSD.EDU>, steve@basser.oz (Stephen Russell) writes:

> One approach is to provide no protection at all, as in the V
> kernel. Since process id's are small integers, any process can
> send to any other process. This means that any validation of the
> right of the sender to communicate with the receiver must be done
> by the receiver process. This seems to have the advantage of
> simplifying the kernel, and makes IPC faster, and protection is
> optional, depending on the paranoia of the receiver or the
> criticality of the operation requested. However, what are the
> disadvantages?

One possible disadvantage is that although a "paranoid" receiver (one which carefully screens the pids of its incoming messages) can ensure that no unauthorized process will receive its particular service, it may be simple to deluge such a receiver with unauthorized messages, thereby bringing it (and maybe important parts of the system) to its figurative knees. The sys admin would have to keep an eye out for this kind of maliciousness.

> Many other systems rely on secure kernels, and establish more
> tightly controlled links between processes (a virtual circuit
> approach). The disadvantage seems to be the overhead of creating,
> using, and destroying such links.

True, this can be a significant performance disadvantage. I am of the opinion that if the system is not required to be "absolutely" secure, then security should be enforced on a process-by-process basis, as needed, using permission checking (possibly based on pids), encryption, and/or whatever is necessary to ensure the desired level of security. If the entire system is required to be secure, it makes more sense to build the security into the kernel, whether you implement the tighter coupling or not. I would expect all communicating processes to take a hit under this scheme, but you pays your money, you takes your choice...

> The system I am currently developing protects process id's by
> adding a large (64 bit) random number to the normal pid. This
> `signature' makes it much more unlikely that you can send to a
> process without permission, in the same way as provided by the
> sparseness of Amoeba's port numbers.

I assume this is the receiving process's pid you're talking about here, and that the sending process is told the correct "virtual pid" upon establishing a connection with the receiver? (I'm not quite clear on this.)

> On a related issue, consider a process acting as an intermediary
> between a root owned process and some other server. How does the
> intermediary gain root privileges for its requests to the other
> server? That is, how does the server verify that the request from
> the intermediary is on behalf of a privileged process?

The process would have to be given sufficient privilege. I would probably make it setuid root (or the equivalent non-UNIX idiom if available). If a process has to do root-type things, it might as well run that way - I don't see the sense in having it run as any other user.

...!decvax!decuac \                          Phil Kos
...!seismo!mimsy   - -> !aplcen!osiris!phil  The Johns Hopkins Hospital
...!allegra!mimsy /                          Baltimore, MD
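The `signature' scheme Russell describes can be sketched as follows. All names here are illustrative, not actual kernel code: the point is just that a 64-bit random number appended to a small pid makes the space of valid process identifiers sparse enough that guessing fails, much like Amoeba's sparse port numbers.

```python
# Hypothetical sketch: a kernel hands out "virtual pids" of the form
# (small pid, 64-bit random signature).  Permission checking reduces
# to membership in this sparse identifier space.
import secrets

class TinyKernel:
    def __init__(self):
        self._next_pid = 1
        self._mailboxes = {}          # virtual pid -> delivered messages

    def create_process(self):
        vpid = (self._next_pid, secrets.randbits(64))
        self._next_pid += 1
        self._mailboxes[vpid] = []
        return vpid

    def send(self, vpid, msg):
        # A wrong signature means the (pid, signature) pair is unknown;
        # the sparseness of the 64-bit space is doing the protection.
        if vpid not in self._mailboxes:
            raise PermissionError("bad (pid, signature) pair")
        self._mailboxes[vpid].append(msg)

kernel = TinyKernel()
receiver = kernel.create_process()
kernel.send(receiver, "hello")        # sender was told the full virtual pid
try:
    kernel.send((receiver[0], 12345), "forged")  # right pid, wrong signature
except PermissionError as e:
    print("rejected:", e)
```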
darrell@sdcsvax.UUCP (03/11/87)
In article <2838@sdcsvax.UCSD.EDU> ron@BRL.ARPA writes:

> The S1 operating system is a (claimed) product of Multi Solutions Inc.
> [...]
> What is especially distressing is that there is an architecture research
> project developing a computer called "S1" which is much more creditable.

Yes indeed. While I am only peripherally involved in the development of the S-1 computer, and while I am quite sure they are capable of "defending" themselves, I would like to mention that *their* (or should I say "our") computer is real, and works. I have seen it run with my own eyes. They have *nothing* to do with Multi Solutions Inc. S-1 computers run either Unix or Amber -- a local OS.

--
"I was led through a beaded curtain, and across a floor so cunningly laid that, no matter where you stood, it was always under your feet..."
-- Berry Kercheval -- berry@mordor.s1.gov -- {ucbvax!decwrl,seismo}!mordor!berry
Lawrence Livermore National Laboratory, Special Studies Program ("O" division)
darrell@sdcsvax.UUCP (03/26/87)
[I expect this will spark some controversy. -DL]

As I understand the history, Multics first inspired PRIMOS, which was sometimes called "Multics in a Matchbox." The technical talent behind this inspiration left Prime to form Apollo when Prime's management decided that workstations would never catch on. Of course, they continued with the basic good ideas at Apollo.

I arrived at Prime shortly thereafter, and took a while to realize that I'd "missed the bus" technologically. Primos is a shadow of what it could have been with that heritage, and it sounds like Aegis is also trying to hide the gold with lead-colored paint.

-- Bob Munck