[comp.sys.isis] ISIS V1.3 finally ready to go

ken@gvax.cs.cornell.edu (Ken Birman) (09/30/89)

We have completed an exhaustive (exhausting!) checkout of ISIS V1.3
on the following systems:
	SUN 3,4's on OS 3.5, 4.0.1 (**NOT 4.0.3!)
	DEC VAX and 3100 systems under Ultrix, 4.3bsd, Mt.Xinu 
	MIPS version of same
	HP 300 and HP 800 systems under HPUX
	Apollo native UNIX 10.1 (see notes below)
	IBM PC/RT under AIX (recent port)

It compiles both under GCC and normal C, and can also be used
from C++, FORTRAN, Common LISP (Lucid, Allegro).  A C-Prolog
interface is in use at Cornell but not included on the tar image.

ISIS V1.3 is available by anonymous FTP from Cornell or can be copied
from UUNET.  We will also be happy to copy it onto a tape for you, for
a handling fee of $20, or to make a tape for you (contact Helene Croft
for details).

This version of the system seems to work extremely well and I
think it would be advisable for ISIS users to switch over to it now.
Among other things, we have been running tests in which we bring
large numbers of sites up and down while running very active
selftest programs that change group membership while broadcasting
heavily, and the system seems robust enough to survive this.
Under V1.2, this used to provoke various problems...

Something to be aware of:

**  ISIS V1.3 uses a different message "header" format than ISIS V1.2.
**  This means that old ISIS programs will need to be relinked; otherwise
**  they will print a message about "msg_reconstruct inconsistency" and
**  fail to connect to ISIS.

**  This also means that old ISIS "log" files can't be read without
**  being put into the new format.  We can provide a utility for this
**  if it represents a problem for you.  The current scheme is such
**  that ISIS will silently ignore files containing messages created
**  under ISIS V1.2.

**  We made this change to increase the limit on the number of fields in
**  a message from 255 to a "very large number".  Sorry if it causes
**  you any inconvenience.

Other changes relative to ISIS V1.2 are as follows:

	o You can now use ANSI standard C compilers and "gcc".
	  An advantage to using GCC is that it allows both the -g and -O
	  options to be specified at once (a disadvantage is that the
	  object files get very big when you do this).
	o I implemented the extension to "pg_leave" mentioned in a prior
	  message; namely, that once pg_leave is called, a process will
	  no longer EVER receive messages to the group.  It used to be
	  the case that leaving took a significant amount of time during
	  which messages could still arrive.
	o protos supports a startup option "-f<n>", e.g. (in isis.rc)
		../bin/protos -f10 -d#.logdir
	  This causes the failure detector to detect failures more
	  aggressively -- after about 2*10 seconds in this case, rather
	  than after 2*60 in the normal case.  The factor of 2 enters
	  because ISIS only says hello once every <n> seconds, and then
	  will detect a failure if no reply is received after <n> more
	  seconds.  With <t> multiple failures, the delay could be as much
	  as <t*n*2> seconds, but would usually be faster than this.
	o The BYPASS stuff now works well enough for an ALPHA release.
	  Note that this also means you can add "new" broadcast protocols
	  to ISIS (see the V1.2 or V1.3 ISIS manual for details).  At
	  Cornell, we are adding an ethernet multicast protocol, for example.
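
As a rough illustration of the failure-detector arithmetic described for
the protos -f option above, here is a minimal C sketch.  The function names
and the DEFAULT_HELLO_INTERVAL constant are made up for this example and
are not part of ISIS or protos; the only facts taken from this note are the
factor of 2, the normal 2*60 second case, and the t*n*2 bound.

```c
#include <assert.h>

/* Assumed default hello interval in seconds when no -f option is given
 * (this note gives 2*60 seconds as the normal detection time). */
#define DEFAULT_HELLO_INTERVAL 60

/* Approximate time to detect one failure: up to n seconds until the next
 * hello is sent, plus n more seconds waiting for a reply. */
int detect_delay(int n)
{
    assert(n > 0);
    return 2 * n;
}

/* Upper bound with t multiple failures, as stated in the note: t*n*2. */
int worst_case_delay(int t, int n)
{
    assert(t > 0 && n > 0);
    return t * n * 2;
}
```

With -f10, detect_delay(10) gives 20 seconds, against
detect_delay(DEFAULT_HELLO_INTERVAL) = 120 seconds in the normal case, and
three overlapping failures are bounded by worst_case_delay(3, 10) = 60
seconds.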

Other ISIS "news":

	o We have submitted ISIS to the Open Software Foundation under its
	  recent RFT for a distributed computing environment.  OSF won't
	  decide on the technologies that will be included in this until
	  late this fall, so if you would like to see a widely available
	  and supported version of ISIS, keep your fingers crossed.  I
	  have no idea exactly how they will make this decision or whether
          the recent OSF decision to go with MACH works for or against ISIS.

          Whether OSF officially licenses ISIS or not, we plan to continue 
          supporting it on all platforms, including the one that OSF produces.

	o ISIS V1.3 contains a directory called "meta1.0", in which the
	  sensor part of a beta version of the META sensor/actuator 
	  platform can be found.  Mark Wood and Keith Marzullo are adding
	  actuators and so forth, and by ISIS V2.0 a more complete release
	  is planned.  Meanwhile, documentation for the meta1.0 code is
          located in doc/fd.tex, and (compressed "dvi") doc/fd.dvi.Z.

	o We are thinking seriously about a commercial version of ISIS --
	  extended in various ways and supported by our company.  I would
	  be very interested to know if your group might be willing to buy
	  such a system.  Our pricing would probably emulate the database
	  systems.  IDS will also offer a number of ISIS based products in
	  mid 1990.  Meanwhile, we would continue to have a public version
	  of ISIS, but might stop making such an effort to keep ISIS running
	  on every version of every system.

Good luck with ISIS V1.3 and META 1.0...

The remainder of this note gives details for readers who might
want to use the BYPASS feature experimentally, or who need to run on
Apollo systems.  

[B] Regarding the BYPASS feature, we have been working on this over the past
[B] week and it now runs more solidly.  BYPASS is a compile time option when
[B] building clib; if you want it, you specify MCHDEPCFLAGS=-DBYPASS and
[B] then "make clib".  For any given process group, either ALL THE MEMBERS should
[B] use the BYPASS code, or none should.  However, you can mix BYPASS-using
[B] applications with non-BYPASS applications if you follow this rule.
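
[B] To make the all-or-none rule concrete, here is a small C sketch of a
[B] check one might run over a list of group members.  The struct and the
[B] function name are hypothetical and exist only for this example; ISIS
[B] itself does not (as far as this note says) expose such a check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one entry per group member, noting whether that
 * member was linked against a clib built with -DBYPASS.
 * This is not an ISIS data structure. */
struct member {
    const char *name;
    int uses_bypass;    /* nonzero if built with -DBYPASS */
};

/* Returns 1 if the group obeys the rule (all members use BYPASS, or none
 * do) and 0 if it mixes the two.  An empty group trivially passes. */
int group_is_consistent(const struct member *m, size_t n)
{
    size_t i;

    assert(m != NULL || n == 0);
    for (i = 1; i < n; i++) {
        /* Compare each member against the first one. */
        if ((m[i].uses_bypass != 0) != (m[0].uses_bypass != 0))
            return 0;
    }
    return 1;
}
```

[B] The check is per group: two different applications, one built with
[B] BYPASS and one without, each pass as long as neither group mixes members.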

[B] We consider this to be an "alpha" release of the BYPASS code.  Don't use it
[B] if bugs and problems will upset you... a more solid and more extensively
[B] tested version will be available in ISIS V2.0 later this fall.  At this
[B] time, it seems not to crash, but sometimes delivers messages out of
[B] order.  The case we are seeing is one where a process is joining
[B] or leaving while broadcasts are occurring, and some broadcasts that
[B] should have been delivered before the view change get delivered after it.
[B] The problem arises out of the structure of our state transfer code and
[B] may take a week to fix -- we didn't want to hold V1.3 up waiting for it.
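
[B] If you want to watch for this misordering in your own tests, one
[B] possible approach is sketched below in C.  It assumes your test program
[B] records a trace in which every broadcast carries the number of the view
[B] it was sent in; the trace format, the type names and the function are
[B] all invented for this example and are not part of ISIS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace entry recorded by a test program; not an ISIS type. */
enum ev_kind { EV_VIEW_CHANGE, EV_BROADCAST };

struct event {
    enum ev_kind kind;
    int view;   /* EV_VIEW_CHANGE: number of the view being installed;
                   EV_BROADCAST:   view in which the message was sent */
};

/* Counts broadcasts that were delivered after a later view had already
 * been installed.  Views are assumed to be numbered upward from 1. */
int count_late_deliveries(const struct event *trace, size_t n)
{
    size_t i;
    int current_view = 0;
    int late = 0;

    assert(trace != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (trace[i].kind == EV_VIEW_CHANGE)
            current_view = trace[i].view;
        else if (trace[i].view < current_view)
            late++;     /* sent in an older view than the one installed */
    }
    return late;
}
```

[B] A nonzero count on a trace from a BYPASS group would be an instance of
[B] the problem described above.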

[A] Next, a note to Apollo users.  I was unable (even with some help
[A] from Apollo) to fix the hangs noted in a previous message.  Instead,
[A] a work-around was introduced.

[A] In brief, fork, exec and exit all "hang" when called from programs using
[A] lightweight processes on Apollo, which essentially wants all the processes
[A] to exit (except the main one) before such calls are issued -- otherwise,
[A] the locks they hold basically cause a deadlock.  This is (in my opinion,
[A] not shared by Apollo) an Apollo bug.

[A] The patch for these problems is enabled by a compile time flag
[A] -DFORK_BROKEN, which I have added to the APOLLO system makefile.

[A] With the fix in place, ISIS uses a somewhat roundabout way of starting
[A] processes up, one that seems to work well and never hangs.  exit() is
[A] redefined as "kill(getpid(), 9)", which is the only thing that works.
[A] ISIS then restarts normally and runs normally.
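
[A] For readers curious what the exit() workaround amounts to, here is a
[A] minimal C sketch.  It is not the code in the ISIS sources: the wrapper
[A] name isis_exit and the helper child_termination_signal are hypothetical,
[A] FORK_BROKEN is defined inline here rather than by the makefile, and the
[A] fork-based helper is meant for an ordinary UNIX system where fork works.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define FORK_BROKEN 1   /* normally supplied as -DFORK_BROKEN when compiling */

/* Hypothetical wrapper standing in for the redefined exit(). */
void isis_exit(int status)
{
#ifdef FORK_BROKEN
    (void)status;               /* the exit status is lost with signal 9 */
    assert(SIGKILL == 9);       /* the note writes the signal as plain 9 */
    kill(getpid(), SIGKILL);    /* i.e. kill(getpid(), 9) */
#else
    exit(status);
#endif
}

/* Forks a child that terminates through isis_exit() and returns the signal
 * that ended it, 0 if it exited normally, or -1 on error. */
int child_termination_signal(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        isis_exit(0);
        _exit(0);               /* not reached when FORK_BROKEN is defined */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

[A] One consequence worth knowing: a process that ends this way reports
[A] "killed by signal 9" to its parent rather than an exit status, and
[A] handlers registered with atexit() do not run.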

[A] One warning is that the Apollo UNIX still has problems garbage collecting
[A] UDP ports that are no longer in use.  You may find that if you shut ISIS
[A] down, it won't restart with the same port numbers unless you reboot the
[A] machine.  Of course, you can also change the port numbers. Apollo
[A] agrees with me that this is a bug -- they say that 10.2 UNIX fixes this.

[A] The effect of all this is that most applications should find that ISIS on
[A] the Apollo UNIX is now perfectly useable.