[comp.os.misc] Test for distributedness

ast@cs.vu.nl (Andy Tanenbaum) (03/25/90)

In article <6548@becker.UUCP> bdb@becker.UUCP (Bruce Becker) writes:
>	More or less. All resources can be allocated
>	to a job, including cpu, memory, files, disk
>	drives, modems, etc. The difference is that
>	you can tell (until the new version this
>	year perhaps) where the resource lives, but
>	you treat it locally (except for diskette
>	drives - inserting a diskette halfway across
>	a continent is a bit inconvenient 8^). One can
>	get an executable from one machine, execute
>	it on another, and have the screen/kybd i/o
>	sent to your own machine - all simple to do,
>	built in to the system. The name space isn't
>	as global as the way you describe, in the current
>	version - I imagine better name space management
>	might come with the POSIX version due out this
>	spring...
The key issue in a distributed system is transparency.  Are the users
aware of where things are? I propose a kind of Turing test for distributedness:

You take two teams of users and give each person a terminal.
The people on team A all have terminals on a standard timesharing system.
The people on Team B all have terminals attached to a distributed system.
The object of the game is for each team to figure out which system it has.
The object of the game for the system designers is to make them fail.
If nobody can tell, you have a distributed system.  In a funny way, the old
CDC Cybers are very distributed.  They have one or more CPUs and a whole
bunch of PPUs that do lots of work.  The users do not rlogin to PPUs or
tell them to do anything.  The average user doesn't even know they exist.
It looks like one system.  However, the Cybers are not really distributed,
because an additional requirement should be that the processors need not
all be in the same room, which with the Cybers they must be.

It is legitimate for person 1 to start up a huge CPU-bound computation to
see if person 2 notices any performance degradation.  On a timesharing
system, person 2 would see this (Butler Lampson once called this a covert
channel.)  If the system being tested has the property that your jobs always
run on your "home" machine, so that the other team members who happen
to be logged in elsewhere don't notice anything, it is not distributed.
There must be a common pool of resources shared equally among all users,
just like in a timesharing system.  Remember, the goal of the system designers 
is to make a system where the team can't tell.  This implies (as a bare
minimum) that the *system* does process scheduling, e.g. assigning processes
to processors at random, using round-robin, using lowest load, or something
like that.  The user should have nothing to do with it.
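The placement policies just listed can be sketched in a few lines.  This is
a toy illustration only (the `Scheduler` name and interface are made up for
this sketch, not taken from Amoeba or any real system):

```python
import itertools
import random

class Scheduler:
    """Toy system-level process placement: the *system* picks the host,
    the user never does.  Policies: random, round-robin, lowest load."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._rr = itertools.cycle(self.hosts)  # round-robin state

    def place(self, policy, load=None):
        """Return the host chosen for a new process."""
        if policy == "random":
            return random.choice(self.hosts)
        if policy == "round_robin":
            return next(self._rr)
        if policy == "lowest_load":
            # `load` maps host -> current load figure, reported by the system
            return min(self.hosts, key=lambda h: load[h])
        raise ValueError("unknown policy: %s" % policy)
```

Any of the three satisfies the bare-minimum requirement above; the point is
only that the choice is invisible to the user.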

I don't think Amoeba would fully pass at the moment, but I don't think
any other existing system would either.  Nevertheless, that is clearly the
goal we are aiming at, and I think we are not doing all that much worse than
everyone else.  Could QNX pass the test?  How about Mach?  Chorus?  Sprite?
Others?

Andy Tanenbaum (ast@cs.vu.nl)

gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) (03/26/90)

In article <6117@star.cs.vu.nl> ast@cs.vu.nl (Andy Tanenbaum) writes:

>The key issue in a distributed system is transparency.  Are the users
>aware of where things are? I propose a kind of Turing test for
>distributedness:
[...]
>It is legitimate for person 1 to start up a huge CPU-bound computation to
>see if person 2 notices any performance degradation.  On a timesharing
>system, person 2 would see this (Butler Lampson once called this a covert
>channel.)

Unfortunately, the terms of this test can easily be abused. If the
distributed system has more processors than processes, and the 2 users
start up "truly" CPU-bound programs (e.g. no paging, no I/O except to
print "I'm done after consuming 57 hours of CPU time", etc.), then the
distributed system should show no slowdown while the timesharing
system will see a large slowdown.

This abuse of the test depends on the observation that, under sane
operating systems (I exclude VM/CMS CPU partitions), a single
CPU-bound program running on an idle single-processor timesharing
system can consume essentially all of the CPU time. And that's all
there is, so multiple programs can't consume twice as much.
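The probe itself is trivial to write.  A minimal sketch (modern Python,
iteration count arbitrary): time a fixed amount of pure integer busy-work,
run it once alone, then again while a teammate runs the same thing from
another terminal, and compare the two timings.

```python
import time

def cpu_probe(iters=200_000):
    """Pure integer busy-work: no paging, no I/O, no floating point.
    Returns elapsed wall-clock seconds.  On a timesharing system the
    timing roughly doubles under a competing probe; on a distributed
    system with a spare processor per user it should barely move."""
    start = time.perf_counter()
    x = 0
    for i in range(iters):
        x = (x * 31 + i) % 65521  # keep the work in small integers
    return time.perf_counter() - start
```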

You might think that a "normal" program, which depends on several
resources, would provide a better test. In that case, you might expect
that several normal programs would be able to consume more total
resources than one normal program on both the timesharing and the
distributed system.

Of course, since this is ast's test, my CPU-bound program would solve
a problem which could be solved using integers between 0 and 255, such
as finding minimal Golomb rulers. None of this floating point stuff. 

:-)
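For the curious: a Golomb ruler is a set of integer marks in which every
pairwise difference is distinct, so checking one needs nothing but small
integers.  A sketch (function name is my own, not from any cited code):

```python
def is_golomb(marks):
    """True iff all pairwise differences between the marks are distinct,
    i.e. the marks form a Golomb ruler.  Integers only; no floating point."""
    diffs = [b - a for i, a in enumerate(marks) for b in marks[i + 1:]]
    return len(diffs) == len(set(diffs))
```

A brute-force search for minimal rulers built on a check like this is
exactly the sort of long-running, all-integer, CPU-bound job the argument
above calls for.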

Greg Lindahl
gl8f@virginia.edu                  I gave my lunch for space-sickness research.

feustel@well.sf.ca.us (David Alan Feustel) (03/26/90)

Regarding the Amoeba system: I don't remember whether you said you
would make the source code available. I will be interested if you do.
-- 
Phone:	 (home) 219-482-9631 
E-mail:	feustel@well.sf.ca.us	{ucbvax,apple,hplabs,pacbell}!well!feustel	
USMAIL: Dave Feustel, 1930 Curdes Ave, Fort Wayne, IN 46805

douglis@ALLSPICE.BERKELEY.EDU (Fred Douglis) (03/26/90)

Sprite would not pass Andy's test for distributedness, though it, too,
would come close.  Originally we had the notion that workstations
wouldn't even have hostnames -- a workstation would just be a display
onto a distributed time-sharing system.  The problem is that making
everything anonymous, including machines on people's desks, would make
it harder to distinguish between machines. The rest of the world
believes in separate hosts.  For example, what if someone from Sprite
wanted to rlogin to a Unix machine?  If rlogin just claimed to be from
a single internet address "sprite.Berkeley.EDU", then all packets
would have to be routed via that machine, even though my own
workstation is capable of running TCP/IP and communicating directly.
But "finger" on Sprite lists all users on all Sprite workstations,
including which workstation each user is most recently active on, and
mail goes out as "sprite".  There's a single shared file system.  The
real distinction between Sprite and Andy's distributed system is that
we name our workstations and can rlogin between them.

With respect to load sharing, we provide transparent remote execution
(users on one host can run "pmake" and other programs to use multiple
hosts in parallel) but we also provide performance guarantees to the
owner of a workstation.  Andy suggests that in a true distributed
system, all processors are available to all users, and someone on one
machine would notice a performance degradation when someone else on
another machine started a large program.  This is true, and to us it
suggests that we don't necessarily want a "true" distributed system,
only something close.  In Sprite, anyone can use idle hosts, but if a
user comes back to a host that's running someone else's processes, the
foreign processes get migrated back to the originator's machine.
People who are now using Sprite seem to uniformly appreciate the
ability to control their own machine.
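The eviction policy described above can be sketched in a few lines.  This
is only an illustration of the idea, not Sprite's actual implementation;
the `Process` record and field names are invented for the sketch:

```python
from collections import namedtuple

# Each process remembers the host it originated on ("home").
Process = namedtuple("Process", ["pid", "home"])

def on_owner_return(host, processes):
    """When the owner of `host` becomes active again, foreign processes
    (those whose home is elsewhere) are migrated back to their home
    machines; the owner's own processes stay put."""
    staying = [p for p in processes if p.home == host]
    evicted = [p for p in processes if p.home != host]
    return staying, evicted
```

The design choice this encodes is the one the post argues for: idle cycles
are shared, but the workstation's owner always gets the machine back.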

I'd be interested in hearing how other people address this issue.  Do
you think workstation ownership is a reasonable notion, or is it
outdated?

============		===========================	==============
Fred Douglis		douglis@sprite.Berkeley.EDU	ucbvax!douglis
============		===========================	==============

blarson@dianne.usc.edu (bob larson) (03/26/90)

In article <9003260052.AA262747@sprite.Berkeley.EDU> douglis@ALLSPICE.BERKELEY.EDU (Fred Douglis) writes:
>Sprite would not pass Andy's test for distributedness, though it, too,
>would come close.
[...]
>For example, what if someone from sprite
>wanted to rlogin to a unix machine?  If rlogin just claimed to be from
>a single internet address "sprite.Berkeley.EDU", then all packets
>would have to be routed via that machine, even though my own
>workstation is capable of running IP/TCP and communicating directly.

Couldn't the sprite network be configured to look like a single machine
with multiple network interfaces on the same network?  Although unusual,
I think this would be legal.  Would redirects be sufficient to allow
packets to be sent to the host that (currently) wants them?

>With respect to load sharing, we provide transparent remote execution
>(users on one host can run "pmake" and other programs to use multiple
>hosts in parallel) but we also provide performance guarantees to the
>owner of a workstation.

Tops-20 also had a way to guarantee performance to users, so a minimum
guarantee of performance really isn't a way to distinguish a distributed
system from a single-processor one.  (I think this was normally used
for groups, but a group could consist of a single user.)  What's the
difference between a guarantee of 10% of a 10-MIPS machine and a
guarantee of 100% of a 1-MIPS machine?
Bob Larson (blars)	blarson@usc.edu			usc!blarson
I do enjoy receiving money.  -- Richard Stallman
--**	To join Prime computer mailing list send mail to	**---
info-prime-request@ais1.usc.edu	  or	usc!ais1!info-prime-request

lws@comm.wang.com (Lyle Seaman) (04/03/90)

ast@cs.vu.nl (Andy Tanenbaum) writes:
>There must be a common pool of resources shared equally among all users,
>just like in a timesharing system.  Remember, the goal of the system designers 

I'm not sure this is an ideal goal of a distributed system.  The concept
of priorities has been around for a long time from the centralized system
approach, and I think a reasonable approach to these concepts would be
something like what Sprite does.  The distributed priority would then be
the composition of the priorities on the individual machines.  This would 
allow me to have, at a minimum, the resources available on my own machine.
If someone in my environment is wasteful of resources, there will be a limit
to that person's ability to impair my access to resources.  This is one
of the central arguments in the ongoing "single-user vs. shared" debate.
One of the nice things about using many small systems instead of one
large system is that I don't have to share my machine with any of a 
small group of people who never delete old files...  And I don't have to
share with some of the other people, who (instead of keeping track of
what is where) grep the whole directory hierarchy several times a day.
Or with those who recompile every piece of source code, instead of 
using even a dumb make...

But I see your point about the test.



-- 
Lyle                      Wang             lws@comm.wang.com
508 967 2322         Lowell, MA, USA       uunet!comm.wang.com!lws