ken@gvax.cs.cornell.edu (Ken Birman) (12/15/89)
Here at SAIC in San Diego, we are using ISIS to build a worldwide
monitoring system that locates and identifies seismic events.  The
system is intended to determine if particular events could be the
results of nuclear weapons testing.  Data come in from a variety of
sensors and several seismic arrays on a continuous basis.  One of our
design goals is to provide for continuous operation of the system
without requiring a large full-time staff.

We were attracted to ISIS because of its strong support for fault
tolerance and redundant computation.  It has also proved flexible
enough to allow us to bring together pre-existing pieces of code in a
natural fashion.

However, a couple of minor nits:

1) Executable size

Currently, the size of an executable after linking in ISIS is quite
large.. ~300K.  I suspect the libraries might include some unnecessary
code.

2) Program exit if ISIS is not running

Some of our programs can run stand alone.  It would be nice if they
could attempt to connect to ISIS and get an error code back when it
is not running so that they could take appropriate action.  Another
possibility would be some sort of "autostart" feature that would
start up ISIS when a connection was requested.

Thanks for the support.

+-----------------------------------------------------------------------------+
|   Jerry Jackson                       UUCP:  seismo!esosun!jackson          |
|   Geophysics Division, MS/12          ARPA:  jackson@esosun.css.gov         |
|   SAIC                                SOUND: (619)458-4924                  |
|   10210 Campus Point Drive                                                  |
|   San Diego, CA 92121                                                       |
+-----------------------------------------------------------------------------+
ken@gvax.cs.cornell.edu (Ken Birman) (12/15/89)
In article <35194@cornell.UUCP> jackson@gymer.css.gov (Jerry Jackson) writes:
>... a couple of minor nits:
>1) Executable size

As mentioned earlier, Jerry and I ended up sitting down and working out
the causes for this; the result is that I rebuilt the ISIS V2.0 alpha
release here at Cornell in a way that corrects this problem.

>2) Program exit if ISIS is not running
>
>Some of our programs can run stand alone.  It would be nice if they
>could attempt to connect to ISIS and get an error code back when it
>is not running so that they could take appropriate action.  Another
>possibility would be some sort of "autostart" feature that would
>start up ISIS when a connection was requested.

This is a good idea.  I am going to extend the isis_init interface
as follows:

I'll add a new call
        isis_init_l(client_port, flags)
where the flags are initially:
        ISIS_PANIC      /* Panic if connect fails; else returns -1 */
        ISIS_AUTOSTART  /* Auto restart ISIS if not already running */

The current isis_init becomes a synonym for isis_init_l(port, ISIS_PANIC)

The autorestart scheme is that if ISIS is not able to connect, it will
try running "/bin/csh" on the file called "/usr/bin/startisis"; if this
fails or if a second attempt to connect fails after a delay of 90 seconds,
the system will panic/return -1 depending on whether ISIS_PANIC is
specified.

Thanks for the suggestion...

By the way, I am adding a few other extensions along these lines:

1) ISIS_MONITOR_ENTER(count,cond)
   ISIS_MONITOR_EXIT(count,cond)
   For lightweight tasks that want a monitor-style of critical section
   Arguments are an integer counter and a condition variable

2) THREAD_LEAVE_ISIS() and THREAD_REENTER_ISIS()
   Lets a lightweight thread run concurrently with the ISIS system
   and later re-enter it.
   Useful for systems with real parallelism

3) cc_terminate_l(dest, <message format and args>)
   Currently, cc_terminate sends a message just to the cohorts of a
   coord-cohort computation and "reply" in such a computation sends
   a message to the sender of the original request and to the cohorts.
   cc_terminate_l sends a message to <dest> and to the cohorts,
   atomically.  This is useful if the coordinator is supposed to send
   some message exactly once, and it isn't a reply.

4) group addresses now are preserved even when you leave the group

5) In the BYPASS stuff, you can now get at the protocol at several
   levels, giving you:
        a raw interface to a multicast transport, with little reliability
        an atomic "fbcast" interface (fifo from sender)
        a cbcast interface
        an abcast interface (not yet implemented)
   the scheme is such that you pay an incremental cost as you move up
   the hierarchy.  The raw interface is fastest, fbcast is about 2ms
   slower (constant overhead w.r.t. the raw protocol regardless of
   # dests), abcast slowest of all.  Figures will be out on this
   shortly...

6) More flexible group addressing (multicast to
   members/members+clients/clients)

7) New message formats and types, faster and smaller mlib

8) a fast pg_join for use in special cases (added for Deceit file system)

9) A way to monitor a group for total failure even if you aren't a member

10) a version of pg_lookup that caches results and uses (9) to run
    very fast

11) automatic switch to MACH IPC for local communication in MACH settings

I think this covers the whole thing.  We will be alpha testing this
version of ISIS early in January, Beta testing by late January, and
hope for a release around March 1.

Ken
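
For readers who want to see the isis_init_l flag behavior from the reply
above in concrete form, here is a rough sketch in C.  It is not ISIS
source code.  Only the flag names ISIS_PANIC and ISIS_AUTOSTART, the
90 second delay, and the panic/return -1 outcome come from the post; the
flag values, the init_hooks structure, sketch_init_l, and the sim_*
stand-ins are all hypothetical names made up for illustration.  The
sketch also assumes a single retry after a fixed wait, which is only one
reading of "a second attempt to connect fails after a delay of 90
seconds".

```c
#include <assert.h>
#include <stdio.h>

/* Flag names are from the post; the numeric values are assumptions. */
#define ISIS_PANIC     0x1  /* panic if connect fails; else return -1 */
#define ISIS_AUTOSTART 0x2  /* try to start ISIS if it is not running */

/* Hypothetical hooks standing in for the real connect/startup steps. */
struct init_hooks {
    int  (*try_connect)(int client_port); /* 0 on success, -1 on failure */
    int  (*run_startup)(void);            /* 0 on success, -1 on failure */
    void (*wait_seconds)(unsigned secs);  /* delay before the retry */
    void (*panic)(const char *why);       /* would not return in real use */
};

/* Sketch of the described control flow; returns 0 or -1. */
int sketch_init_l(const struct init_hooks *h, int client_port, int flags)
{
    assert(h != NULL);

    if (h->try_connect(client_port) == 0)
        return 0;                         /* ISIS was already running */

    if ((flags & ISIS_AUTOSTART) && h->run_startup() == 0) {
        h->wait_seconds(90);              /* delay named in the post */
        if (h->try_connect(client_port) == 0)
            return 0;                     /* second attempt succeeded */
    }

    if (flags & ISIS_PANIC)
        h->panic("cannot connect to ISIS");
    return -1;                            /* caller may run stand alone */
}

/* The old entry point as a synonym, as the post describes. */
#define sketch_init(h, port) sketch_init_l((h), (port), ISIS_PANIC)

/* Simulated environment so the sketch can be exercised without ISIS. */
static int sim_isis_up, sim_startup_works, sim_panicked, sim_waited;

static int sim_connect(int port)
{
    (void)port;
    return sim_isis_up ? 0 : -1;
}

static int sim_startup(void)
{
    if (!sim_startup_works)
        return -1;
    sim_isis_up = 1;                      /* pretend the script worked */
    return 0;
}

static void sim_wait(unsigned secs)
{
    sim_waited += (int)secs;              /* record the wait, do not sleep */
}

static void sim_panic(const char *why)
{
    fprintf(stderr, "panic: %s\n", why);
    sim_panicked = 1;                     /* record it instead of aborting */
}

static const struct init_hooks sim_hooks = {
    sim_connect, sim_startup, sim_wait, sim_panic
};
```

A program that can run stand alone would call this with flags of 0 or
ISIS_AUTOSTART and check for -1 before deciding how to proceed; passing
ISIS_PANIC corresponds to the behavior of the existing isis_init.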