[mod.os] Submission for mod-os

fouts@orville%ames.arpa (Marty Fouts) (02/04/87)

Direct measurement is very useful, as Gene has pointed out, but suffers
from drawbacks:

1) Measurement interferes with the system being measured.
2) Measurement is subject to sampling-frequency-related distortion.
3) Measurement can miss transients (see the sketch after this list).
4) Measurement still requires analysis and interpretation.
5) Sometimes you need to know whether it will work before you can build it.
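
To make points 2 and 3 concrete, here is a minimal sketch, with made-up
timings, of how a periodic sampler can step right over a short transient:

    /* Sketch: a monitor sampling every 10 ms never sees a 1 ms burst
       that falls between samples.  All numbers are invented. */
    #include <stdio.h>

    int main(void)
    {
        double period = 10.0;           /* ms between samples */
        double burst  = 3.0;            /* ms: transient begins */
        double blen   = 1.0;            /* ms: transient duration */
        double t;
        int hits = 0;

        for (t = 0.0; t < 1000.0; t += period)   /* one second of samples */
            if (t >= burst && t < burst + blen)
                hits++;

        printf("samples landing in the burst: %d\n", hits);  /* prints 0 */
        return 0;
    }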

In fact, measurement by itself doesn't solve most of Gene's concerns.  It is
easy to cook a set of measurements to obtain the desired results.  After all,
many vendor supplied benchmarks are merely cooked measurements.

You make trade-offs in measurement, too.  You ignore events that you
can't process quickly enough, you ignore data you don't understand, and
you ignore events that you can't measure adequately.  Direct measurement
has as many drawbacks as modeling.

Measurement is only the first step in understanding.  Building models to
predict future behavior is the second.  Comparing the results of models to
real measurements is the third.  You only understand something if you can
predict its behavior and demonstrate the validity of your predictions.

In the context of computer performance analysis, a set of measured results
will frequently lead to ambiguous interpretation.  It is sometimes not
possible to find a measurement which will distinguish between the alternatives
without first producing at least an analytic model of the system.

darrell@sdcsvax.UUCP (02/11/87)

One of Gene's throwaway lines seems to me to be a key point.  The idea
behind a lot of distributed systems seems to be that you can load balance
and gain fault tolerance by distributing.  Unfortunately, you can do both
more easily with a multiprocessor than with a distributed system.

The NAS project here at Ames has taken an approach to heterogeneous systems
which stops short of a distributed system.  We are taking workstations,
minis, mainframes, and supercomputers and building a computer system around
"a common user interface" by having them all run Un*x variants.

This system isn't distributed, because the individual systems are
autonomous, but it makes much code easily transportable, and it allows us
to develop distributed applications.  And the results we are seeing are
reinforcing something I believed before I got here: distributed
heterogeneous systems don't work, because being heterogeneous prevents
you from load balancing and being fault tolerant.

This isn't surprising: if the Cray 2 is down for two hours, there isn't
enough time on all of our workstations combined to make up for the lost
CPU cycles, and besides, the binaries won't run on a 68K anyway.

To me the answer is to run the machines with separate operating systems,
but to provide user and programmer interfaces with good communications
primitives so that distributed applications can be developed.
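
For illustration, here is a minimal sketch of the kind of primitive I
mean, built on Berkeley datagram sockets.  The names msg_send and
msg_recv are invented, error handling is omitted, and inet_pton is the
modern spelling of what would have been inet_addr:

    /* Sketch: simple message primitives over UDP sockets. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    static int msg_send(int sock, const char *host, int port,
                        const void *buf, int len)
    {
        struct sockaddr_in to;

        memset(&to, 0, sizeof to);
        to.sin_family = AF_INET;
        to.sin_port = htons(port);
        inet_pton(AF_INET, host, &to.sin_addr);
        return sendto(sock, buf, len, 0, (struct sockaddr *)&to, sizeof to);
    }

    static int msg_recv(int sock, void *buf, int len)
    {
        return recvfrom(sock, buf, len, 0, NULL, NULL);   /* blocks */
    }

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        char msg[] = "hello";

        msg_send(sock, "127.0.0.1", 7000, msg, sizeof msg);
        close(sock);
        return 0;
    }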

I believe that the relative cost of communications versus multiprocessors
is always going to make multiprocessors more attractive than distributed
systems for homogeneous applications, and that the difficulty of
programming heterogeneous systems is always going to make autonomous
systems more attractive than distributed homogeneous systems.

BTW Gene, aren't you ashamed to be badly plagiarizing Einstein?

darrell@sdcsvax.UUCP (02/11/87)

I am looking for a good, generally acceptable definition of
the notion "distributed system".
The literature contains many definitions of this notion (see below),
but they all are different and all seem to be unsatisfactory in some way.

Most definitions focus on high communication costs and/or the absence of
shared memory.  With the current state of the art in computer
communications it seems strange to insist on high communication costs.
For example, John Limb of Bell Labs has built a 200 megabit/second net
that runs over fiber optics and could be extended over many miles.
Should a system using such a communication medium be excluded as a
distributed system because it is too fast?
Neither is it clear why the absence of shared memory is so essential.
There are several distributed systems and languages that try to present
some form of conceptual shared memory (e.g., David Gelernter's Linda,
David Cheriton's "problem-oriented shared memory," and Kai Li's "shared
virtual memory").  So why is the absence of shared memory so important?
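
For what it's worth, the flavor of Linda's conceptual shared memory can
be sketched as a few operations on a tuple space.  This toy,
single-process version only mimics that flavor; real Linda is a language
extension with rd(), eval(), blocking in(), and matching across machines:

    /* Toy tuple space: out() deposits a (tag, value) pair, in()
       removes a matching one.  The names follow Linda; nothing else does. */
    #include <stdio.h>
    #include <string.h>

    #define MAXTUP 64
    static struct { char tag[16]; int val; int used; } space[MAXTUP];

    static void out(const char *tag, int val)
    {
        int i;
        for (i = 0; i < MAXTUP; i++)
            if (!space[i].used) {
                strncpy(space[i].tag, tag, sizeof space[i].tag - 1);
                space[i].val = val;
                space[i].used = 1;
                return;
            }
    }

    static int in(const char *tag, int *val)   /* nonblocking variant */
    {
        int i;
        for (i = 0; i < MAXTUP; i++)
            if (space[i].used && strcmp(space[i].tag, tag) == 0) {
                *val = space[i].val;
                space[i].used = 0;
                return 1;
            }
        return 0;
    }

    int main(void)
    {
        int v;
        out("count", 42);
        if (in("count", &v))
            printf("in(count) -> %d\n", v);
        return 0;
    }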

At least, most people seem to agree that network-based systems ought to be
included and that, for example, vector computers and dataflow machines
are to be excluded.  Yet there are some cases that are less clear, for example:
- a single board with lots of Transputers
- a system consisting of several nodes connected by a LAN, where
  every node contains 5 processors and a single shared memory

Does anyone know of an accurate, generally acceptable definition?

-----------------------------------------------------------------------

Here are a few of the definitions I found:

Leslie Lamport (CACM July 1978):
	A distributed system consists of a collection of distinct processes
	which are spatially separated, and which communicate with one another
	by exchanging messages.
	....
	A system is distributed if the message transmission delay is not
	negligible compared to the time between events in a single process.

Crookes&Elder (SPE Sept. '81):
	We thus define a distributed system as a system comprising a number
	of processors (each with its own private memory space), interconnected
	in a way which does not provide shared memory.

Stammers (in: Concurrent Languages in Distr. Systems, North-Holland, 1984) :
	In this paper, a distributed system is one that has several
	memories, some or all of which cannot be directly addressed by
	every processor. Thus the distribution need not be geographical.
	Similarly the hardware components could be loosely coupled,
	for example by communication lines, or tightly coupled, for example
	by memories accessible from more than one bus.

M.L. Scott (PhD thesis Univ. of Wisconsin at Madison,  May '85):
	I use the adjective 'distributed' to describe any hardware or
	software involving interacting computations on processors that
	share no physical memory.

Filman & Friedman (Coordinated Computing: Tools and Techniques for
		   Distributed Software, McGraw-Hill, 1984):
	We classify multiple processor systems by their communications
	bandwidth. Systems that allow sharing of much information are
	multiprocessors. Such systems can be thought of as providing
	shared memory to their processes.
	....
	Systems that incur higher communication costs are distributed systems.

Henri Bal

darrell@sdcsvax.UUCP (02/11/87)

When is an application ``distributed?''
=======================================

To answer this question, we will try to find the borders of the
domain of distributed applications.  It seems to us that this is
better than describing an arbitrary point within the domain by
summing up some properties that distributed applications usually
have.  We will start by naming four extreme examples, each
running on two processors.  We take it as self-evident that
there must be at least two processors involved in a distributed
application.

Example 1.  One processor calculates the odd-numbered decimals
     of pi; another the even-numbered, in parallel.  This is a
     distributed application.  Note that there is no communication
     involved, although there might be some at the end to merge
     the results.

Example 2.  We have a client process on one processor and a
     server process on another, communicating using RPC.  There
     is no parallelism, since the client blocks while the server
     runs, but this still is a distributed application.

Example 3.  One processor calculates decimals of pi; another is
     doing a compilation.  This is not a distributed application,
     although the processors are running in parallel.

Example 4.  Two unrelated processes communicate while contending
     for a resource, such as a shared disk.  This is not a
     distributed application either, although there is
     communication between the processes.

The results can be summarized as follows:

       example | distributed | parallel | communication
       --------|-------------|----------|--------------
          1    |     yes     |   yes    |      no
          2    |     yes     |   no     |      yes
          3    |     no      |   yes    |      no
          4    |     no      |   maybe  |      yes

Examples 1 and 4 show that communication is neither a necessary
nor a sufficient property of distributed applications.  Examples
2 and 3 show that distribution and parallelism are not directly
related either.
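
The shape of example 1 can be sketched with fork(2), letting two Unix
processes on a multiprocessor stand in for the two processors, and a
split series sum stand in for the pi calculation (illustrative only):

    /* Sketch: each process takes alternate terms of a series; no
       communication (a real version would merge results at the end). */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    static double partial(int start)          /* sum every other term */
    {
        double s = 0.0;
        int k;

        for (k = start; k < 1000000; k += 2)
            s += ((k & 1) ? -1.0 : 1.0) / (2.0 * k + 1.0);
        return s;
    }

    int main(void)
    {
        if (fork() == 0) {                    /* child: odd-numbered terms */
            printf("odd partial:  %.6f\n", partial(1));
            _exit(0);
        }
        printf("even partial: %.6f\n", partial(0));   /* parent: even */
        wait(NULL);
        return 0;
    }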

    Definition.  A distributed application is an application
                 carried out by two or more processors.

This leaves the terms ``application'' and ``processor'' to be
defined.  For example, a data-flow machine is a distributed
system on a low level, although on the user level it is neither
an application nor distributed.  Another example is a
distributed application running on a uniprocessor that is
simulating a distributed system.  A distributed application
running on this simulator is still a distributed application,
although in reality there is only one processor and no
communication involved at all.  We believe that everybody has
intuitive ideas of what an application and what a processor is,
and we will not try to obscure these ideas with a formal
definition.

What is a distributed system?
=============================

We have defined a distributed application, but as we have seen
in the example of a distributed application running on a
uniprocessor, a distributed application does not imply a
distributed system.  What is a distributed system, then?  Again,
we try to define the borders of the domain.  We assume that
there must be two or more processors.

Example 1.  There are two computers in the same room which are
     not physically connected in any way.  Whether they are
     both computing parts of the same application or not, they
     do not make up a distributed system.

Example 2.  Two computers are connected by an Ethernet.  They
     are not communicating over the network, and never have;
     still, the possibility for communication exists.  They are
     a distributed system.

Example 3.  Two processing units, each with their own local
     memory, have access to a global bus and common shared
     memory.  They compute independently, but can communicate
     through the shared memory.  They are a distributed system.

Example 4.  A computer is made up, among other components, of a
     CPU and a disk drive.  The disk drive is a slave to the
     CPU, but once it has received a request from the processing
     unit, it does some processing independent of and in
     parallel with the CPU.  On the level of a systems
     programmer, this is a distributed system.  There is no real
     difference between driving a disk controller and driving a
     network device, especially if the network is reliable.  On
     the user's level, the system is not distributed.

Example 5.  A processor is a dedicated file server.  It is a
     slave to the requests of other processors connected to it
     through a network, but after receiving a request, it does
     some processing independent of and in parallel with the
     requesting processor.  On the level of a systems
     programmer, this is a distributed system.  On the user's
     level, it is not.

               | distributed |    shared     |
       example |   system    |    memory     | communication
       --------|-------------|---------------|--------------
          1    |     no      |      no       |      no
          2    |     yes     |      no       |   possible
          3    |     yes     |      yes      |      yes
          4    |     yes     | yes (sort of) |      yes
          5    |     yes     |      no       |      yes

The examples suggest the following.  The possibility for
communication is necessary for a distributed system.  The
method of communication is irrelevant; whether the processors
communicate via a network, shared memory, or device registers
(in the case of the disk drive), they still make up a
distributed system.  A given system can be distributed on one
level, and not distributed on another, higher level.  Most, if
not all, systems are distributed on some level, though hardly
ever on the user's level.

    Definition.  A distributed system consists of two or more
                 processors which have the ability to
                 communicate with one another.

What is a distributed operating system?
=======================================

The set of distributed systems includes not only distributed
operating systems, but also distributed database systems,
distributed airline reservation systems, and systems with
multiple operating systems, such as UNIX 4.3BSD.  This last
system, however, is not a distributed operating system, since it
does not support distributed applications directly.  So what
features should an operating system for multiple computers have
before it deserves the title ``distributed?''

This is where we get into the fuzzy areas.  ``When is a system
an operating system?'' is a similar question.  A system that
simplifies disk access in the form of files is an operating
system.  A system that supports multi-tasking is an operating
system.  A distributed operating system must include at least
these features, in combination with the ability to communicate
between machines.

But this is not enough.  A distributed application needs support
for starting processes on different processors, support for
communication between the processes independent of where they
run, load balancing and fault tolerance mechanisms, and control
for distributed processes (signaling, etc.).  An operating
system that supports some of these mechanisms directly can be
called distributed.  So, although the UNIX 4.3BSD system could
be made into a distributed operating system by adding these
mechanisms to it, the way it currently exists, it is not.
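
For concreteness, the interface implied by that list might look roughly
like the declarations below.  Every name is hypothetical; no existing
system, 4.3BSD included, provides these calls:

    /* Hypothetical distributed-OS primitives, sketching the mechanisms
       named above.  Declarations only, not any real system's interface. */
    typedef struct { long host; long pid; } gpid_t;  /* location-independent id */

    gpid_t dos_spawn(char *prog, char *host_hint);   /* start a process anywhere */
    int    dos_send(gpid_t to, char *msg, int len);  /* location-transparent IPC */
    int    dos_recv(gpid_t *from, char *buf, int len);
    int    dos_signal(gpid_t to, int sig);           /* control of remote processes */
    int    dos_migrate(gpid_t proc, long host);      /* load-balancing hook */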

			Jennifer Steiner (jennifer@cwi.nl)
			Robbert van Renesse (cogito@cs.vu.nl)
			The Amoeba Project
			Amsterdam

darrell@sdcsvax.UUCP (02/12/87)

In article <2699@sdcsvax.UCSD.EDU> cogito@cs.vu.nl
(Jennifer Steiner and Robbert van Renesse) suggest the following definitions:
>
>    Definition.  A distributed application is an application
>        carried out by two or more processors.
>
>    Definition.  A distributed system consists of two or more
>        processors which have the ability to communicate
>        with one another.

Other recent submissions have made an additional point which I would like
to second:  in order to be "distributed" an application or system must not
rely on shared memory.  At the application level, this means that processes
use a communication mechanism based on some form of message passing.
(I consider RPC to be a form of message passing.)  At the hardware level,
it means that physical memory is local.  I agree with Steiner and van Renesse
that a non-distributed application can run on a distributed system that
simulates shared memory.

It is of course possible to cite examples where the line is hard to draw.
Cm*, for example, might or might not be considered to support shared memory,
depending on whether you count the microcode of the Kmap as "hardware".
At a higher level, the tuple space of Linda might or might not be considered
a message-passing abstraction, depending on one's point of view.

I find it useful to keep distinct the meanings of the words "concurrent",
"parallel", and "distributed."  In the interests of broadening the discussion,
I suggest the following definitions:

    "Concurrent" implies the simultaneous existence of more than one
    thread of control.

    "Parallel" implies the simultaneous *execution* of more than one
    thread of control.

    "Distributed" implies interaction between threads of control on
    processors that share no physical memory.

Parallel and distributed both imply concurrent.
Most distributed computations are parallel.
Coroutines are concurrent but not parallel.
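
To illustrate the coroutine case with an anachronism (this sketch uses
POSIX ucontext, which postdates this discussion): two threads of control
exist simultaneously, each with its own stack, but only one ever executes:

    /* Sketch: concurrency without parallelism.  Control is handed back
       and forth explicitly; nothing ever runs at the same time. */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, co_ctx;
    static char co_stack[16384];

    static void coroutine(void)
    {
        int i;
        for (i = 0; i < 3; i++) {
            printf("coroutine step %d\n", i);
            swapcontext(&co_ctx, &main_ctx);    /* yield to main */
        }
    }

    int main(void)
    {
        int i;

        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp = co_stack;
        co_ctx.uc_stack.ss_size = sizeof co_stack;
        co_ctx.uc_link = &main_ctx;
        makecontext(&co_ctx, coroutine, 0);

        for (i = 0; i < 3; i++) {
            swapcontext(&main_ctx, &co_ctx);    /* resume the coroutine */
            printf("main step %d\n", i);
        }
        return 0;
    }
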
-- 
Michael L. Scott
University of Rochester    (716) 275-7745
scott@rochester.arpa       scott%rochester@CSNET-RELAY
{decvax, allegra, seismo, cmcl2}!rochester!scott

darrell@sdcsvax.UUCP (02/13/87)

I think that trying to define a "distributed system" is a lot like trying
to define an "artificially intelligent" system.  The definitive definition
remains elusive because these terms refer more to collections of techniques
and models of problems than to actual physical properties of systems.

A "distributed system" is a system that makes good use of "distributed
programming techniqes."  Of course, defining "distributed programming
technique" is no easier, but there are various characteristic features:
 -- The technique is designed to perform well in the presence of high
    communications latency, often being optimized for this property
    at the expense of other costs that would be relatively more significant
    in a system having lower communications latencies.
 -- The technique may also be optimized for low communications bandwidth.
    This is similar to, but should not be confused with, high latency.
    Even high-bandwidth fiber-optic networks don't have to be all that
    long for latency to be significantly greater than memory access time
    (see the arithmetic after this list).
 -- Techniques for improving reliability and availability via replication
    are often considered to be "distributed computing techniques", both
    because distributed systems *can* be partially available due to being
    physically distributed, and because they often *are* partially available
    due to their complexity.
 -- Techniques for achieving security with insecure communications are
    "distributed techniques", since distributed systems often have insecure
    communications.
 -- Techniques for locating services or objects given some symbolic or
    indirect "name" are often "distributed" techniques, since the fluid
    configuration of distributed systems makes indirect references necessary.
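
To put numbers on the latency point above (straightforward physics, not a
measurement of any particular net): light in fiber travels at roughly
2x10^8 meters/second, so a 10 km link costs about 50 microseconds each
way, or about 100 microseconds per round trip.  Against a memory access
time on the order of 100 nanoseconds, that is a factor of a thousand, and
no increase in bandwidth reduces it.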

If you agree that distributedness is a matter of degree rather than a
boolean attribute, then it is clearly hopeless to firmly discriminate
between a "distributed system" and a "multiprocessor", since a system can be
built at any level of "distributedness".

Going back to the A.I. analogy, I think that these complaints about there not
"really being any commercial distributed systems" are similar to the
complaints from A.I. researchers about all these expert systems companies
not doing "real A.I.".  The problem with both these fields is that once a
technique really works well, it tends to be co-opted by ordinary programmers
trying to get work done.

A.I. started with heuristic search and logical inference algorithms.
Probably distributed systems started the day that someone ran a serial line
across the room to the next machine.  Basic applications of these techniques
are not very sexy research anymore, and thus not "real" A.I. or distributed
systems to academics.  

A conflicting pressure comes from the hype factor in industry.  "Buy our
system because it's distributed."  Even ignoring the vagueness of
"distributed", this is a bogus claim, since a rational consumer should make
their choice based on how well the system solves their problem, rather than
on how the system is implemented.

Rob

darrell@sdcsvax.UUCP (03/09/87)

I'm hardly an expert on these things, but I thought my two cents
might be interesting to some. Anyway, the moderator can refuse to
post this if I don't say anything substantive.

In article <2819@sdcsvax.UCSD.EDU>, steve@basser.oz (Stephen Russell) writes:
> One approach is to provide no protection at all, as in the V
> kernel. Since process id's are small integers, any process can
> send to any other process. This means that any validation of the
> right of the sender to communicate with the receiver must be done
> by the receiver process. This seems to have the advantage of
> simplifying the kernel, and makes IPC faster, and protection is
> optional, depending on the paranoia of the receiver or the
> criticality of the operation requested. However, what are the
> disadvantages?

One possible disadvantage is that although a "paranoid" receiver
(one which carefully screens the pids of its incoming messages) can
ensure that no unauthorized process will receive its particular
service, it may be simple to deluge such a receiver with
unauthorized messages, thereby bringing it (and maybe important
parts of the system) to its figurative knees.  The sys admin would
have to keep an eye out for this kind of maliciousness.
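
A sketch of such a paranoid receiver (message layout and pids invented);
note that the screening happens after the message has already been
delivered, which is exactly why a deluge still hurts:

    /* Sketch: the receiver checks sender pids against an access list.
       Rejected messages are dropped, but receiving them cost cycles. */
    #include <stdio.h>

    struct msg { int sender_pid; char body[64]; };

    static const int allowed[] = { 104, 233 };   /* invented authorized pids */

    static int authorized(int pid)
    {
        int i, n = sizeof allowed / sizeof allowed[0];
        for (i = 0; i < n; i++)
            if (allowed[i] == pid)
                return 1;
        return 0;
    }

    static void serve(const struct msg *m)
    {
        if (!authorized(m->sender_pid))
            return;                  /* dropped, but work was already done */
        printf("serving pid %d: %s\n", m->sender_pid, m->body);
    }

    int main(void)
    {
        struct msg good = { 104, "open file" };
        struct msg bad  = { 666, "flood" };
        serve(&good);
        serve(&bad);
        return 0;
    }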

> Many other systems rely on secure kernels, and establish more
> tightly controlled links between processes (a virtual circuit
> approach). The disadvantage seems to be the overhead of creating,
> using, and destroying such links.

True, this can be a significant performance disadvantage.  I am of
the opinion that if the system is not required to be "absolutely"
secure, then security should be enforced on a process-by-process
basis, as needed, using permission checking (possibly based on
pids), encryption, and/or whatever is necessary to ensure the
desired level of security.  If the entire system is required to be
secure, it makes more sense to build the security into the kernel,
whether you implement the tighter coupling or not.  I would expect
all communicating processes to take a hit under this scheme, but
you pays your money, you takes your choice...

> The system I am currently developing protects process id's by
> adding a large (64 bit) random number to the normal pid. This
> `signature' makes it much more unlikely that you can send to a
> process without permission, in the same way as provided by the
> sparseness of Amoeba's port numbers.

I assume this is the receiving process's pid you're talking about
here, and the sending process is told the correct "virtual pid"
upon establishing a connection with the receiver?  (I'm not quite
clear on this.)
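
If I've understood it, the scheme amounts to something like this sketch
(structure and names invented; presumably the kernel does the comparison):

    /* Sketch: a "protected pid" is the small pid plus a 64-bit random
       signature handed out when the connection is set up.  A send is
       honored only if both halves match; forging one means guessing
       64 random bits, like Amoeba's sparse ports. */
    #include <stdio.h>
    #include <stdint.h>

    struct ppid { int pid; uint64_t sig; };

    static int send_ok(struct ppid claimed, struct ppid actual)
    {
        return claimed.pid == actual.pid && claimed.sig == actual.sig;
    }

    int main(void)
    {
        struct ppid server = { 42, 0x9e3779b97f4a7c15ULL };  /* made up */
        struct ppid forger = { 42, 0xdeadbeefULL };

        printf("legitimate: %d, forged: %d\n",
               send_ok(server, server), send_ok(forger, server));
        return 0;
    }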

> On a related issue, consider a process acting as an intermediary
> between a root owned process and some other server. How does the
> intermediary gain root privileges for its requests to the other
> server? That is, how does the server verify that the request from
> the intermediary is on behalf of a privileged process?

The process would have to be given sufficient privilege.  I would
probably make it setuid root (or equivalent non-UNIX idiom if
available).  If a process has to do root-type things, it might as
well run that way - I don't see the sense in having it run as any
other user.


                              ...!decvax!decuac -
Phil Kos                                          \
The Johns Hopkins Hospital    ...!seismo!mimsy  - -> !aplcen!osiris!phil
Baltimore, MD                                     /
                              ...!allegra!mimsy -

darrell@sdcsvax.UUCP (03/11/87)

In article <2838@sdcsvax.UCSD.EDU> ron@BRL.ARPA writes:
>The S1 operating system is a (claimed) product of Multi Solutions Inc.
>[...]
>What is especially distressing is that there is an architecture research
>project developing a computer called "S1" which is much more credible.

Yes indeed.  While I am only peripherally involved in the development
of the S-1 computer, and while I am quite sure they are capable of
"defending" themselves, I would like to mention that *their* (or
should I say "our") computer is real, and works.  I have seen it run
with my own eyes.  They have *nothing* to do with Multi Solutions Inc.
S-1 computers run either Unix or Amber  -- a local OS.


--
"I was led through a beaded curtain, and across a floor so cunningly laid that,
no matter where you stood, it was always under your feet..."
--
Berry Kercheval -- berry@mordor.s1.gov -- {ucbvax!decwrl,seismo}!mordor!berry
Lawrence Livermore National Laboratory, Special Studies Program ("O" division)

darrell@sdcsvax.UUCP (03/26/87)

[I expect this will spark some controversy. -DL]

  As I understand the history, Multics first inspired PRIMOS, which
was sometimes called "Multics in a Matchbox."  The technical talent
behind this inspiration left Prime to form Apollo when Prime's
management decided that workstations would never catch on.  Of course,
they continued with the basic good ideas at Apollo.

  I arrived at Prime shortly thereafter, and took a while to realize
that I'd "missed the bus" technologically.  PRIMOS is a shadow of
what it could have been with that heritage, and it sounds like Aegis
is also trying to hide the gold with lead-colored paint.
                   -- Bob Munck