[comp.sys.isis] Questions about ISIS

ken@cs.cornell.edu (Ken Birman) (04/16/91)

> From: sc268116@seas.gwu.edu (Kavosh Soltani)
> Subject: ISIS Clarification
> Date: Tue, 16 Apr 91 7:45:04 EDT

> Hi,

> I have sent a copy of the following to comp.sys.isis group; I am not certain 
> if you are more equipped to answer some of the questions.

For some reason, the message bounced and I am posting it on your
behalf...

> Thank you for the help with installation!  I have run the system on 3 SUN 3/80,
> 2 SUN 3/60 and 2 HP 9000 concurrently.  It works fine!

>      I am preparing a presentation for our advanced operating system course
> at George Washington University.  We are evaluating distributed system kits
> available on the workstations and recommending one for future use in the
> institution.  I am evaluating ISIS, but have a few questions I could not
> find the answer to in the manual.  Could you take a few minutes and fill in
> the answers, if possible?  Your help is greatly appreciated....



> PS.  We are interested very much in underlying algorithms (not in detail!)
>      but it will take more time than I presently have to go through all of
>      the source code.  You do not need to go into too much detail, for instance
>      you could say that you use a UNIX semaphore to implement such and so..

We don't use UNIX semaphores at all.  In fact, UNIX doesn't have
semaphores.  Pretty much all ISIS uses from UNIX is the ability
to send UDP messages and to find out what time it is!

> - What does ISIS mean? (What's the reason behind the name?)

The name is a reference to an Egyptian myth in which the Goddess Isis
restored Osiris to health after he was torn to pieces by Seth in a battle.
I've posted the whole story in the past -- if you skim old postings
to this group (on ftp.cs.cornell.edu in pub/comp.sys.isis) you should
be able to find it.

Basically, we liked the image.  Prof. Amr El Abbadi (U.C. Santa Barbara)
suggested the name and gets full credit for a very appropriate choice!

> - Are the developers of ISIS the same names appearing on the cover page of
> V 2.0 of the manual?

Many people have contributed but most of the code in the core of the system
was originally written by myself and Tommy Joseph.  By now, Tommy has been
at SUN for some time and a lot of his code has been rewritten, so I probably
wrote more than 80% of the actual lines of code in the current V3.0 release.
The rest was written by Robert Cooper, Brad Glade, Frank Schmuck, 
Alex Siegel, Mark Steiglitz, Pat Stephenson, Mark Wood, and others.

> - Is the package owned by Cornell?

An early version of ISIS was public domain; not much of the code
remains, but any lines that survive unchanged from ISIS V1.2 are
still "public domain".  V1.3 was the first to include a copyright
notice.  Version 2.1 of ISIS is the current distribution and is about 75%
different from V1.2 (a rough estimate).  For example, all of the
bypass code is new and the entire message library was rewritten;
the gbcast protocol was rewritten, etc.

Copyright for ISIS V2.1 is jointly held by the authors; Cornell has
relinquished all rights and the US Government requires only that we
"disclose" our work, which we obviously do.  The new V3.0 release
contains a lot of proprietary code that was developed by ISIS Distributed
Systems and is not available except under license to the company.
So, IDS "owns" V3.0, or at least the changes and extensions it made
to turn V2.1 into V3.0.  

These changes are very extensive.

The ISIS Group at Cornell has been granting all reasonable requests to
use ISIS V2.1 in any manner at all, commercial, public, or private.
However, we reserve the right to review any requests that do not
fall strictly under the terms of the copyright notice in the system
source.  There are no fees for ISIS V2.1; fees for V3.0 are (we
think) reasonable.

> - Does the scheduler take over after isis_start_done?

Actually, the ISIS scheduler takes over whenever all other tasks are
blocked.  It also runs when you call isis_accept_events.  The code
is in clib in cl_isis.c: run_tasks()

> - Why are these tasks lightweight? (how is overhead small and does UNIX
> provide features for saving the state of a task before switching?)

They share an address space and basically consist of a stack and a set
of registers.  UNIX only knows about the "process" and even if you
have several tasks running, UNIX will think there is only one program
active and won't distinguish between them.  The tasks switch using
a co-routine mechanism.  UNIX does provide the feature for saving state:
we usually do this with setjmp/longjmp, which does not involve any
system calls on most machines.  So, context switching just involves
saving and loading the registers -- roughly the cost of a subroutine 
call.

> - If our system does pre-emptive processing, and isis grants isis_mutex to
> the currently active task, does it mean that, if several instances of an isis
> application (say teller) are run, resources are being wasted?

I don't understand this question.

> - In short, how can you write tasks in UNIX?  (What I mean is, does UNIX
> provide system calls so you can tell it which area to use as stack?)

We do this by using malloc to allocate a big stack area, computing
an initial stack frame, using setjmp to figure out the caller's context,
changing the SP and other registers to point to the new frame, then
calling longjmp to jump into the new frame and immediately call the
user's top-level routine.

If there is an existing threads package, like cthreads, we just
map our calls into their calls.

Our scheme is such that an active task never "exits".  Instead, if the
user's code returns, we put the task on an idle queue to be recycled
on the next request (saves an extra free/malloc).

See cl_task.c for details.  It is really pretty simple code.
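
To make the idea concrete, here is a minimal sketch of two tasks
running on malloc'ed stacks inside one UNIX process.  It is not the
code from cl_task.c: instead of patching a jmp_buf by hand (which is
machine dependent) it uses the POSIX ucontext calls getcontext,
makecontext and swapcontext as a stand-in.  The names run_two_tasks,
task_yield, trace_at and STACK_BYTES are made up for the illustration,
and the idle queue used for recycling tasks is left out.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_BYTES (64 * 1024)   /* hypothetical per-task stack size */

static ucontext_t scheduler_ctx;  /* saved scheduler state; tasks switch back to it */
static ucontext_t task_ctx[2];    /* saved state of each task */
static int trace[8];              /* records the order in which task steps ran */
static int trace_len;

/* Save this task's registers and resume the scheduler. */
static void task_yield(int id) { swapcontext(&task_ctx[id], &scheduler_ctx); }

/* Top-level routine of each task: one step, yield, one more step. */
static void task_body(int id)
{
    trace[trace_len++] = id;
    task_yield(id);
    trace[trace_len++] = id + 10;
}

/* Accessor so callers can inspect the recorded order. */
int trace_at(int i) { return trace[i]; }

/* Create two tasks on malloc'ed stacks and run them round-robin.
   Returns the number of steps recorded, or -1 if malloc fails. */
int run_two_tasks(void)
{
    void *stacks[2];
    trace_len = 0;
    for (int i = 0; i < 2; i++) {
        stacks[i] = malloc(STACK_BYTES);
        if (stacks[i] == NULL) {
            while (i-- > 0) free(stacks[i]);
            return -1;
        }
        getcontext(&task_ctx[i]);
        task_ctx[i].uc_stack.ss_sp = stacks[i];
        task_ctx[i].uc_stack.ss_size = STACK_BYTES;
        task_ctx[i].uc_link = &scheduler_ctx;  /* where to go when task_body returns */
        makecontext(&task_ctx[i], (void (*)(void))task_body, 1, i);
    }
    /* Two passes: the first runs each task up to its yield, the second
       resumes each task and lets it finish. */
    for (int round = 0; round < 2; round++)
        for (int i = 0; i < 2; i++)
            swapcontext(&scheduler_ctx, &task_ctx[i]);
    for (int i = 0; i < 2; i++) free(stacks[i]);
    return trace_len;
}
```

As far as UNIX is concerned this is still a single process.  One
difference from the setjmp/longjmp approach: on some systems
swapcontext also saves and restores the signal mask, which may cost
a system call per switch.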

> - You mentioned semaphores in introduction to chapter 11 of the manual but
> no further references in the documentation.  Do you support them?

ISIS supports a "token" tool, which is described in detail at the end
of Chapter 11.  The token tool is basically a distributed binary semaphore.
We don't support general semaphores, but this would make a nice exercise
if you are trying to get the hang of using ISIS.

> - Using bcast, what is the principle of the underlying algorithm that enforces
> virtual synchrony?
> - How is atomicity guarantee implemented (all or nothing delivery)?

You will need to read the papers.    The versions of the protocols in
the ISIS protocols servers correspond pretty closely to the paper on
"Reliable communication in the presence of failures" (ACM TOCS 1987).
The more recent work comes from Pat Stephenson's thesis and is based
on two papers, one by Aleta Ricciardi on Failure detection using process
groups and one by Pat, myself and Andre Schiper that will appear in ACM
TOCS later this year (Lightweight Process Groups and Group Multicast).

In the later paper the "principle" is an extension of Lamport's
causal clocks, something called a "vector clock" or "vector timestamp".
The atomicity guarantee is implemented using a flushing technique.
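
As a rough illustration of the vector timestamp idea (a sketch of the
delivery test as it is usually described for causal multicast, not
the ISIS source), each process keeps one counter per group member and
delays an incoming message until it is the next one from its sender
and everything the sender had already seen has been delivered
locally.  The function names can_deliver and note_delivery are
invented for this example, and the flush protocol is not shown.

```c
#include <stddef.h>

/* Return 1 if a message stamped msg_vt by process `sender` can be
   delivered at a process whose vector time is local_vt, 0 if it must
   be delayed.  n is the number of group members. */
int can_deliver(const unsigned *msg_vt, const unsigned *local_vt,
                size_t sender, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        if (k == sender) {
            /* Must be the very next message from the sender. */
            if (msg_vt[k] != local_vt[k] + 1) return 0;
        } else if (msg_vt[k] > local_vt[k]) {
            /* Sender had seen a message from k that we have not. */
            return 0;
        }
    }
    return 1;
}

/* After delivering a message, merge its timestamp into the local
   clock by taking the element-wise maximum. */
void note_delivery(const unsigned *msg_vt, unsigned *local_vt, size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (msg_vt[k] > local_vt[k]) local_vt[k] = msg_vt[k];
}
```

For example, with three members, a message stamped {1,1,0} from
process 1 is held back at a process whose clock is {0,0,0} until the
message stamped {1,0,0} from process 0 has been delivered.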

> - What is the exact format of messages? (Are they linked lists or packed
> lists and how do you maintain pointers to each field, name, and type?)

More or less: lists stored within bigger blocks of memory, but no "links".
They look like packed symbol tables: each field has a name, type, length
and is followed by the data or, if the data is out of line (%*C, for
example), by a pointer to the body.  Because messages can share data
blocks and can contain other messages, the implementation is somewhat
more complex than this sounds.
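
A toy version of the "packed symbol table" idea might look like the
sketch below.  The header layout, the 7-character name limit and the
functions msg_put and msg_get are all assumptions made for the
illustration; the real ISIS format differs and also handles
out-of-line data, shared blocks and nested messages, none of which
is shown here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-field header; not the real ISIS layout. */
struct field_hdr {
    char     name[8];   /* field name, at most 7 chars, NUL padded */
    uint8_t  type;      /* e.g. 'd' for integer, 's' for string */
    uint32_t length;    /* number of data bytes following the header */
};

/* Append one field (header plus inline data) at offset `used` in buf.
   Returns the new used count, or 0 if the field does not fit. */
size_t msg_put(uint8_t *buf, size_t cap, size_t used,
               const char *name, uint8_t type, const void *data, uint32_t len)
{
    struct field_hdr h;
    if (used > cap || cap - used < sizeof h || cap - used - sizeof h < len)
        return 0;
    memset(&h, 0, sizeof h);
    strncpy(h.name, name, sizeof h.name - 1);
    h.type = type;
    h.length = len;
    memcpy(buf + used, &h, sizeof h);
    memcpy(buf + used + sizeof h, data, len);
    return used + sizeof h + len;
}

/* Walk the packed fields looking for `name`.  Returns a pointer to the
   field's data and stores its length in *len, or NULL if not found. */
const uint8_t *msg_get(const uint8_t *buf, size_t used,
                       const char *name, uint32_t *len)
{
    size_t off = 0;
    while (off + sizeof(struct field_hdr) <= used) {
        struct field_hdr h;
        memcpy(&h, buf + off, sizeof h);   /* copy out to avoid alignment traps */
        if (strncmp(h.name, name, sizeof h.name) == 0) {
            *len = h.length;
            return buf + off + sizeof h;
        }
        off += sizeof h + h.length;        /* skip to the next header */
    }
    return NULL;
}
```

There are no link pointers: the next field is found by adding the
header size and the stored length to the current offset.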

> - We know UNIX calls can block a program.  How can we avoid blocking in say
> I/O commands?

You need to call isis_select and in this way avoid doing any IO call that
might block.
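
The general pattern is "ask first, then read".  Since the calling
sequence of isis_select is not given in this posting, the sketch
below uses the plain POSIX select() call with a zero timeout instead;
readable_now and read_if_ready are hypothetical helper names.

```c
#include <sys/select.h>
#include <unistd.h>

/* Return 1 if a read() on fd would not block right now, 0 if it
   would, -1 on error.  A zero timeout turns select() into a poll. */
int readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = {0, 0};
    if (fd < 0 || fd >= FD_SETSIZE) return -1;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Read only when data is ready.  Returns the byte count from read(),
   0 if nothing is ready yet (the caller can go do other work), or -1
   on error.  Note that read() itself also returns 0 at end of file. */
ssize_t read_if_ready(int fd, void *buf, size_t len)
{
    int r = readable_now(fd);
    if (r <= 0) return r;
    return read(fd, buf, len);
}
```

In an ISIS program you would want the wait to suspend only the
calling task so that other tasks keep running, which is the reason
to go through isis_select rather than blocking in UNIX.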

> - 2-Phase flushing of transactions seems to be interesting.  But how do you
> guarantee the data is actually saved? (That is following the second phase
> one of the processes involved has not crashed before saving.)

We don't.  YOU force the data when the prepare-to-commit routine is called.

But, note that the transaction tool is not really a "common" interface
to ISIS.  Performance is slow compared to non-transactional data
management, because of the log flush the tool does.  At any rate,
the prepare routine is supposed to do the flush and then say "yes"
if the data is secure and "no" if not.  It is probably a good idea
to flush ahead of time, during idle periods, so that the delay at this
point will be as short as possible.
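
In outline, a prepare routine along these lines would do.  This is
only a sketch: the names prepare_to_commit and idle_flush, and the
idea of passing the log's file descriptor directly, are assumptions
for the example and not the actual interface of the transaction tool.

```c
#include <unistd.h>

/* Hypothetical prepare callback: force the log to disk, then vote.
   Returns 1 ("yes", the data is secure) or 0 ("no"). */
int prepare_to_commit(int log_fd)
{
    return fsync(log_fd) == 0 ? 1 : 0;
}

/* Optional: call during idle periods so that little remains to be
   written when prepare_to_commit runs.  Errors are ignored here
   because prepare_to_commit will flush again and report them. */
void idle_flush(int log_fd)
{
    (void)fsync(log_fd);
}
```

The point is that the "yes" vote is only sent after fsync has
reported success, so a crash after the vote cannot lose the data.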

The interface we used is based on something that X/Open is developing,
called XA.  We intend to support the XA interfaces, in fact, in
a future release of ISIS.

Actually, I only know of one or two groups that have even played
with ISIS transactions, and although they work, my guess is that IDS will
have to develop a commercial strength version before many people use
ISIS this way, e.g. to implement replicated databases.  
-- 
Kenneth P. Birman                              E-mail:  ken@cs.cornell.edu
4105 Upson Hall, Dept. of Computer Science     TEL:     607 255-9199 (office)
Cornell University Ithaca, NY 14853 (USA)      FAX:     607 255-4428