[mod.os] Why no "real" distributed systems

darrell@sdcsvax.UUCP (02/20/87)

In article <2733@sdcsvax.UCSD.EDU>, Joe Pato writes:
> Different people have different definitions of what a distributed
> system is.  The primary criteria for establishing a distributed system are:
>     1) transparency of file access / naming
>     2) transparency of process execution
>     3) transparency of protection / system accounting
> 
> The key concept is the transparency of operations across the machines that
> constitute the distributed system.

/* WARNING -- dogma and personal opinion ahead -- WARNING */

I disagree.  Transparency is a real convenience, and may have a lot to
do with the commercial acceptance of distributed systems, but has
little, if anything, to do with "being a distributed system."

What, then, IS the key concept (or concepts)?  Loose coupling.

Along with "loose coupling" go things like:  Autonomy.  Division of
labor (processing) and of authority (administration).  Exploitation of
locality.  Notice that those last two are pretty much at odds with
transparency.

Friedberg's Conjecture:
+====================================================================+
| The only time to use a distributed system is when the distinctions |
| between sites are more valuable than the similarities between them.|
+====================================================================+

Reasonable distinctions include:
  What                 (Why)
  ====                 =====
  Failure Independence (different power grids, disaster zones)
  Specialized Hardware (peripherals, accelerators, number-crunchers)
  Specialized Software (per-machine licenses, standalone servers, experiments)
  Local Autonomy       (privacy, paranoia)
  Geographic Location  (communication latency, noise)
  
Corollaries:
  If you want lots of processing, all alike,
    use a multiprocessor.
  If you want lots of data entry, all consistent,
    use a central site with long lines.
  If you want lots of files, all consistent,
    use a file server.
  Most systems built on a local area network aren't "real"
    distributed systems.
  Homogeneous distributed systems make sense only if you are
    willing to tolerate enough inconsistency between sites to
    allow each site to run at full function while the others are
    unavailable.  (see LOCUS as an example, and the sketch below)
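
To make that last point concrete, here is a minimal sketch (plain C;
the site count and all names are my invention, not LOCUS code) of the
kind of bookkeeping that lets every site keep working while its peers
are unreachable: each site increments only its own slot, and a merge
on reconnection takes the elementwise maximum, so no update is lost
and no site ever waits on another.

  #include <stdio.h>

  #define NSITES 3   /* hypothetical: three cooperating sites */

  /* One slot per site; a site increments only its own slot, so
     sites can diverge while partitioned and reconcile later. */
  struct counter { long slot[NSITES]; };

  void bump(struct counter *c, int self) { c->slot[self]++; }

  /* Total events seen system-wide = sum of all slots. */
  long total(const struct counter *c)
  {
      long t = 0;
      for (int i = 0; i < NSITES; i++) t += c->slot[i];
      return t;
  }

  /* On reconnection, take the elementwise max: every increment
     recorded anywhere survives, and merging twice is harmless. */
  void merge(struct counter *dst, const struct counter *src)
  {
      for (int i = 0; i < NSITES; i++)
          if (src->slot[i] > dst->slot[i]) dst->slot[i] = src->slot[i];
  }

  int main(void)
  {
      struct counter a = {{0}}, b = {{0}};
      bump(&a, 0); bump(&a, 0);  /* site 0 works through a partition */
      bump(&b, 1);               /* so does site 1 */
      merge(&a, &b); merge(&b, &a);
      printf("both sites now count %ld events\n", total(&a));
      return 0;
  }

Note that the sites never agree at every instant; they only agree
once they can talk again.  That is exactly the inconsistency you
must be willing to tolerate.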

Let me give you an example of a real, live distributed computation.
The Arpanet routing algorithm runs at sites scattered across North
America and various spots elsewhere around the globe, solving graph
shortest-path problems in pseudo-real-time.  Each site communicates
directly only with its immediate neighbors.  There is no remote
access other than the communication links that are being managed.
No remote file access, no remote execution.  Administration and
configuration are managed from a central site in Boston, which does
NOT have transparent anything.  They explicitly want to know what
physical location they are dealing with (for remote program load,
remote diagnostics, etc.) at all times.
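
For flavor, here is a toy sketch of that style of computation (plain
C; the four-node topology is invented, and this is of course nothing
like the actual IMP software): each node repeatedly takes the distance
vectors of its immediate neighbors and relaxes them against its own,
i.e. a distributed Bellman-Ford.  The loop simulates the exchange
rounds on one machine; in the real network each row of dist lives at
a different site and travels only over the links themselves.

  #include <stdio.h>

  #define N   4      /* hypothetical four-node net */
  #define INF 9999   /* "unreachable" */

  /* cost[i][j]: cost of the direct line between i and j (0 = none).
     A real node knows only its own row -- its immediate neighbors. */
  int cost[N][N] = {
      { 0, 1, 0, 0 },
      { 1, 0, 1, 5 },
      { 0, 1, 0, 1 },
      { 0, 5, 1, 0 },
  };

  int dist[N][N];    /* dist[i][d]: node i's current estimate to d */

  int main(void)
  {
      int i, n, d, changed;

      for (i = 0; i < N; i++)
          for (d = 0; d < N; d++)
              dist[i][d] = (i == d) ? 0 : INF;

      /* Each round, a node consults only the vectors of its
         immediate neighbors.  No global state, no remote files. */
      do {
          changed = 0;
          for (i = 0; i < N; i++)
              for (n = 0; n < N; n++) {
                  if (cost[i][n] == 0) continue;   /* not a neighbor */
                  for (d = 0; d < N; d++)
                      if (dist[n][d] + cost[i][n] < dist[i][d]) {
                          dist[i][d] = dist[n][d] + cost[i][n];
                          changed = 1;
                      }
              }
      } while (changed);

      printf("node 0's route to node 3 costs %d\n", dist[0][3]);
      return 0;
  }

Every site does useful, different work on purely local information,
and the whole thing converges anyway.  That, not transparent file
access, is what makes it a distributed computation.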

Stu Friedberg  {seismo,allegra}!rochester!stuart  stuart@cs.rochester.edu