[comp.sys.isis] New ISIS spool facility -- comments?

ken@gvax.cs.cornell.edu (Ken Birman) (07/27/89)

Below is a "man" page for a new "spooler" facility that we are adding
to ISIS.  The facility doubles as a long-haul interface for
communication between LAN's on which ISIS is running independently.

We would be very interested in suggestions/comments/questions on this.
Direct your remarks to me or to Messac Makpangou: mak@cs.cornell.edu,
or post them to comp.sys.isis if you think they might be generally
interesting.  Thanks...

.TH SPOOL 3  "1 February 1986" ISIS "ISIS LIBRARY FUNCTIONS"
.SH NAME
spool, spool_replay, spool_and_discard \-- ISIS spooling and long-haul functions.
.SH SYNOPSIS
.B 
#include "isis.h"
.PP
.B 
id = spool(sname, entry, fmt, args, SP_KEY, args, ... , 0);
.br
.B char *sname, *fmt;
.br
.B int entry;
.PP
.B 
id = spool_m(sname, entry, msg, SP_KEY, args, ... , 0);
.br
.B char *sname;
.br
.B message *msg;
.br
.B int entry;
.PP
.B
spseqn = spool_getseqn(msg);
.br
.B int spseqn;
.br
.B message *msg;
.PP
spool_replay(sname, SP_PAT, args, ..., 0);
.PP
spool_discard(sname, SP_PAT, args, ..., 0);
.PP
int spool_in_replay;
.br
.B char *sname;
.PP
spool_and_discard(sname, \fIspool-request\fR... 0, \fIdiscard-pattern\fR, 0);
.PP
spool_m_and_discard(sname, \fIspool_m-request\fR... 0, \fIdiscard-pattern\fR, 0);
.br
.B char *sname;
.PP
spool_set_replay_pointer(sname, spseqn);
.PP
spool_play_through(sname, SP_OFF/SP_ON);
.PP
spool_cancel(id);
.PP
spool_inquire(id);
.PP
spool_wait(id);
.PP
spool_advise(sname, options, 0);
.NP
.SH DESCRIPTION
.NP
An ISIS spool is used for \fIasynchronous\fR communication with
a process group that is known to be down, or that may need to have
its input spooled for fault-tolerance reasons.
The interface is somewhat restricted by comparison to the remainder of
ISIS, and is intended to be used in a stylized manner
that explicitly recognizes the long delays that typically occur
between the time such messages are sent and the time they are received.
These delays make it impractical to support, for example, a ``spooled broadcast''
that would spool a request until the destination service becomes available,
then perform the broadcast and return whatever replies are received.
(A user who wants equivalent functionality can build it using
a ``call-back'' approach.)
.NP
ISIS spools are also used for communication with remote networks.
In this mode, the network name (see below) is specified using the
spooler option SP_NETWORK.
The network name must be one defined in the ``networks'' configuration file.
.NP
The spooler can be contrasted with the ISIS logging facility, which is
concerned with the recovery of individual processes (associated with
specific nodes in the network) into the state that they held at the time of
a failure.  Spooling is used when the destination is an entire process group,
and when the group may be offline at the time a message is sent.
By communicating through the intermediary of the spool, the sender
need not be concerned with whether or not the destination group is
operational.
The spooler is thus directly visible to the sender of a message.
Logging, in contrast, is transparent to the caller, which is
coded to deal only with ``operational'' process groups.
.NP
The standard use for a spool in ISIS involves a collection of processes
that send messages to a destination process group via the spooler, without
waiting for replies.  During periods when the destination group is operational,
these messages are spooled and promptly forwarded,
in the order that they were spooled.
During periods when the destination group is down, messages are spooled but
not forwarded.
Upon recovery, the process group initiates replay of the spooled messages.
When the replay terminates, newly arriving messages and any messages that had not
previously been fully executed are delivered in spool order.
.NP
The spooler has no way to know when execution of a given spooled
message has completed, and this raises the question of how it can distinguish
between \fIreplay\fR of a message that has already been executed and
\fIfirst-time delivery\fR of a message that may, in fact, have been
delivered previously but has not yet been ``executed''.
This is done by associating a \fIspool pointer\fR with each spool,
which the application controls by calling spool_set_replay_pointer.
The value supplied to spool_set_replay_pointer
should be a spooler sequence number, obtained by calling spool_getseqn().
It is illegal to set the spool pointer back; it can only be advanced.
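.NP
The only-advance rule for the spool pointer, and the way the pointer bounds replay, can be sketched in C.
This is an illustration only: \fIstruct spool_ptr\fR, \fIspool_ptr_set\fR and \fIspool_ptr_replayable\fR are hypothetical names, not ISIS library routines.
The sketch assumes that spooler sequence numbers start at 1, so a pointer initialized to 0 makes nothing eligible for replay.
.nf
```c
/* Hypothetical model of a per-spool replay pointer (not the ISIS API). */
struct spool_ptr {
    long replay_upto;   /* highest spooler sequence number eligible for replay */
};

/* Move the pointer to spseqn.  Moving it backwards is rejected,
   since the spool pointer may only be advanced. */
int spool_ptr_set(struct spool_ptr *sp, long spseqn)
{
    if (spseqn < sp->replay_upto)
        return -1;              /* illegal: pointer can only advance */
    sp->replay_upto = spseqn;
    return 0;
}

/* A message is a replay candidate only if its spooler sequence
   number is less than or equal to the current replay pointer. */
int spool_ptr_replayable(const struct spool_ptr *sp, long spseqn)
{
    return spseqn <= sp->replay_upto;
}
```
.fi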
.NP
The spooler interface is as follows:
.NP
.I spool
puts a message in the \fIspool\fR for a named process group.  Normally, this
group would be one that is believed to not be operational.
The 
.I spool_m
interface is used when the message to be spooled has been
precomputed and is analogous to
calling bcast_l and specifying the `m' option.
.NP
On recovery, a group triggers spool replay either by invoking
.I spool_replay
or by specifying the 
.B PG_REPLAYSPOOL
argument to pg_join.
Notice that spool replay is not automatic in ISIS; it must
always be activated explicitly.
During replay, the flag \fIspool_in_replay\fR will be non-zero.
Only messages with spooler sequence numbers smaller than or equal to the current
spooler replay pointer will be replayed.
Moreover, replay allows messages to be replayed selectively, using a replay pattern.
For example, say that an application spools all types of messages, but that
only some messages are needed to recover after a failure.
A replay pattern can be specified that will inhibit replay of the ``irrelevant''
messages.  On the other hand, their presence in the spool may be useful in other
ways, for example to exactly recreate a scenario that has been causing a process to crash.
After replay has finished, any remaining messages in the spool,
and any new messages received by the spooler, are ``played through''
immediately, and this continues
for as long as the process group remains operational.
Play through can be disabled by calling spool_play_through(), but
is activated by default.
Unlike messages being replayed, play-through
messages are NOT subject to any sort of pattern-matching process.
.NP
When spool_play_through() is used to disable play-through, the procedure must
be called \fIbefore\fP calling
spool_replay() (or pg_join).
Otherwise, some play-through may occur during the interval after the replay
completes and before your program is informed of it.
Play-through messages are not delivered until after isis_start_done() is
called in cases where replay is initiated during startup.
.NP
Programs must explicitly discard the contents of a spool.
This is done using 
.I
spool_discard.
.NP
Finally, the procedure 
.I
spool_and_discard
atomically discards some of the messages in a spool and prepends a new
message (e.g. a checkpoint) to the head of the spool.
(The caller can specify that the new message should instead be appended at the tail
of the spool if desired, but this is not the default.)
.NP
The following additional spooler functions are not yet implemented.
.I spool_cancel(id)
provides a way to cancel a pending request.
.I spool_wait(id)
blocks until a specified request has been replayed.
.I spool_inquire(id)
returns 0 if the request is still spooled and 1 if it has been replayed.
.I spool_advise(sname, options, 0)
provides an interface with which the caller can create spools
having special characteristics (non-standard resilience, size limits, etc).
Currently, all spools have the same degree of resiliency to failures and
no size limit is enforced.
.NP
.SH DESCRIPTIONS
.I spool
puts a message in the \fIspool\fR for a named process group and delivers it
promptly (``plays it through'') if the process group is operational.
The 
.I sname
argument is the name under which the group will run when it restarts.
The 
.I entry 
argument tells what entry point this message should be delivered to upon replay.
The
.I fmt
argument is a format from which the message should be created; the arguments that follow it are
as for \fBmsg_put\fR.
.NP
A zero-terminated series of optional keywords describing this message follows.
Each keyword in the series consists of a name (we define a basic set,
but you can extend it) and possibly arguments associated with that name.
There are currently three sorts of keywords: numeric ones, which take an
integer value; timer keywords, which take a long integer argument of the
sort returned in the \fIseconds\fR (tv_sec) field of the timeval structure by
gettimeofday(2); and SP_KEYWORDS,
which takes a null-terminated list of strings as its argument.
.NP
The type of broadcast used for actually transmitting to the group will
normally be \fIcbcast\fR.  This is certain to work correctly if all messages to the group
are sent via the spooler.  However, if a group receives some of its messages directly,
you may need to specify the broadcast
type.  This is done by including the key SP_FBCAST, SP_CBCAST, SP_ABCAST or SP_GBCAST,
with no argument.
.NP
The spooler currently does not predefine any numeric message keys.
Instead, the user is permitted to define up to 9 such keys.
This should be done using \fB#define\fR, with values in the range 1-9
inclusive.
A numeric key should be immediately followed by its value in the call to spool.
.NP
There is currently only one timer key that the user would explicitly specify
in a call to spool: SP_EXPIRES.  The argument to SP_EXPIRES is an absolute time
at which this message ``expires''.
The argument should be computed by calling gettimeofday(&now, 0) and then
computing now.tv_sec+delay, where delay is the delay in seconds
between the time of the call and the time when the message expires.
An expired message will never be delivered to a client, but neither will it
actually be deleted from the spool
until the next call to spool_discard.
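.NP
Computing an SP_EXPIRES value, and testing whether it has passed, might look like this.
The helpers \fIsp_expiry_after\fR and \fIsp_is_expired\fR are hypothetical; the sketch assumes a message counts as expired once the current time reaches its SP_EXPIRES value, which this page does not state precisely.
.nf
```c
#include <stddef.h>
#include <sys/time.h>

/* Absolute expiry time, in tv_sec units as SP_EXPIRES expects, for a
   message that should expire delay_sec seconds from now. */
long sp_expiry_after(long delay_sec)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    return (long) now.tv_sec + delay_sec;
}

/* Hypothetical check: a message is treated as expired once the
   current time has reached its SP_EXPIRES value. */
int sp_is_expired(long expires)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    return (long) now.tv_sec >= expires;
}
```
.fi
.NP
A caller could then pass sp_expiry_after(60*60*12) as the SP_EXPIRES argument of a spool call to make a message expire after twelve hours.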
.NP
A spooled message can also have a list of ascii strings associated with it.
Such a list, null-terminated, should follow the keyword SP_KEYWORDS.
.NP
The following illustrates a fairly complex call to the spool routine as it might
be done from C; the corresponding interface is also supported from FORTRAN and LISP.
.NP
.nf
#define SP_SEQN       1
#define SP_EPOCH      2
.NP
       struct timeval now;
       ....
       gettimeofday(&now, 0);
       id = spool("dbserver", ADD_RECORD, "%s,%d", "Richard Nixon", 68,
         SP_SEQN, db_seqn++,
         SP_EPOCH, current_epoch,
         SP_EXPIRES, now.tv_sec+60*60*12,    /* expires in 12 hours */
         SP_KEYWORDS, "add", 0,
         0);
.fi
.NP
.I spool
returns a spooled-message-id that can be used in subsequent queries concerning this
message or to cancel this message.
.NP
The above example uses a ``sequence'' number and
an ``epoch'' number, but the reader should be aware that
these have no special meaning to the spooler.
On the other hand,
the spooler \fIdoes\fR assign every
spooled message a sequence number on a per-spool basis, incremented
for each message received.
The spooler
delivers messages sequentially in order of increasing sequence number.
The exception is replay: messages from the start of the spool up to and
including the current spool pointer are checked against the replay pattern
and are replayed only if it matches.
The spooler sequence number for a message can be obtained by calling
.I spool_getseqn(mp).
This function returns 0 when applied to a message that was never spooled.
.NP
The destination group is considered to be on the local network
of the caller unless the
keyword SP_NETWORK is specified.  This keyword takes a single
argument, which should be a network name defined in the ``networks''
configuration file for your installation.
In this case, delivery will be on the indicated network.
The network name ``local'' can be used to obtain
a loop-back effect if desired for debugging.
.NP
.I spool_replay
triggers replay of a spool.
Replay can be selective; for example, one can replay just the
messages from a particular sender or just the messages with spooler sequence
numbers larger than a specified value.
A pattern is specified very much as the set of keys for a message,
but where a key typically specifies a value, a replay pattern
typically specifies a rule that the value must satisfy for the message
to be replayed.  If several replay constraints (patterns) are given,
all must be satisfied for a given message to be replayed.
.NP
In the case of a numeric key, a low and a high bound are given (either can be
SP_INFINITY, however).
Only messages that include the designated key, with a value greater
than or equal to the low bound and less than or equal to the high bound, are replayed.
For example, spool_replay(sname, SP_SEQN, 55, SP_INFINITY, 0); replays
all messages in the spool \fIsname\fR that carry the user-defined numeric
key SP_SEQN with a value of 55 or greater.
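.NP
The matching rule for numeric keys, including the requirement that every constraint hold, can be sketched as follows.
The names are hypothetical, and SKETCH_INFINITY is only a stand-in for SP_INFINITY, whose actual value is not given on this page; the sketch also assumes a message's keys are held in small arrays indexed by key number.
.nf
```c
#include <limits.h>

/* Stand-in for SP_INFINITY; the real ISIS value is not documented here. */
#define SKETCH_INFINITY LONG_MAX

/* One numeric replay constraint: key number plus inclusive bounds. */
struct num_pattern {
    int  key;    /* user-defined numeric key, 1-9 */
    long low;    /* lower bound, or SKETCH_INFINITY for none */
    long high;   /* upper bound, or SKETCH_INFINITY for none */
};

/* present[k] is nonzero if the message carried key k, and keyvals[k]
   holds its value; both are indexed 0-9, with index 0 unused. */
int num_pattern_match(const struct num_pattern *p,
                      const int present[10], const long keyvals[10])
{
    long v;

    if (!present[p->key])
        return 0;                 /* message lacks the key: no match */
    v = keyvals[p->key];
    if (p->low != SKETCH_INFINITY && v < p->low)
        return 0;
    if (p->high != SKETCH_INFINITY && v > p->high)
        return 0;
    return 1;
}

/* Every constraint must be satisfied for the message to be replayed. */
int all_patterns_match(const struct num_pattern *ps, int n,
                       const int present[10], const long keyvals[10])
{
    int i;

    for (i = 0; i < n; i++)
        if (!num_pattern_match(&ps[i], present, keyvals))
            return 0;
    return 1;
}
```
.fi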
.NP
The 
spooler's internal sequence number can be treated as a numeric pattern using the predefined
keyword SP_SPSEQN.  Note, however, that
replay will only be applied to messages between the start of the spool and the
current spool pointer.
.NP
The time at which a message was spooled can be used as part of a pattern.
SP_ATIME places bounds on this time in absolute time units.
SP_RTIME places bounds on this time relative to the time at which
spool_replay was called.
.NP
The process that sent or spooled a message can also be part of the pattern.
SP_SENDER takes a single address which is the address of the sender whose
messages are to be replayed.
SP_SPOOLER works the same way, but takes the address of the process that invoked
spool.  Note that unless spool_m is being used, the sender is by definition the
same process as the spooler.  In the case of spool_m, however, the
message could be one that was received from some other source.
.NP
If string keywords were specified, the pattern SP_KEYWORDS can be used to
enforce a 1-1 exact match.  The number of strings and their values must
match for the message to be replayed.
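.NP
A sketch of the SP_KEYWORDS exact-match test follows, under the assumption (not stated above) that the strings are compared position by position; \fIkeywords_match\fR is a hypothetical helper, not an ISIS routine.
.nf
```c
#include <string.h>

/* Exact 1-1 match of two null-terminated string lists: the lists must
   have the same number of strings, with equal strings in each position. */
int keywords_match(const char *const *pat, const char *const *msg)
{
    while (*pat != NULL && *msg != NULL) {
        if (strcmp(*pat, *msg) != 0)
            return 0;           /* a string differs */
        pat++;
        msg++;
    }
    return *pat == NULL && *msg == NULL;   /* both lists must end together */
}
```
.fi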
.NP
To replay all the messages in a spool, call spool_replay(sname, 0) with no pattern.
.NP
After a spool_replay is done, the spooler plays through any
messages that are received and that match the ``current'' replay
pattern, with the single exception of any message received from a
spool_and_discard request (in this case, the spooled message normally is a checkpoint,
and hence playing it through would cause confusion).
It will also spool these messages upon reception.
This play-through behavior continues 
so long as the destination process group remains accessible, or
until spool_play_through is called to inhibit further playthrough.
.NP
.I spool_discard
is called just like 
.I spool_replay.
It deletes those spooled messages that match the specified pattern,
retaining in the spool any messages that DO NOT match the pattern.
It is important to specify the spooler sequence number up to which messages
should be discarded, as it would be an error to discard messages that have
not yet been played through.  Although such messages would still be
played through, the effect would be to delete them from the spool
``prematurely'' -- before the application has actually received and executed
them.
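.NP
The sequence-number bound on discarding can be modeled on a plain array of spooler sequence numbers.
This is a hypothetical in-memory model, not the spooler's storage format, and it ignores the other pattern keys; it shows why the bound matters by refusing any discard that would reach past the last played-through message.
.nf
```c
#include <stddef.h>

/* seqns holds spooler sequence numbers in increasing order.  Discard
   every entry with seqn <= upto, keeping the rest in order.  Refuse,
   by returning -1, if upto lies beyond the last played-through
   sequence number, since that would delete messages the application
   has not yet received.  Otherwise return the number of entries kept. */
long spool_discard_upto(long *seqns, size_t n, long upto, long played_through)
{
    size_t i, kept = 0;

    if (upto > played_through)
        return -1;
    for (i = 0; i < n; i++)
        if (seqns[i] > upto)
            seqns[kept++] = seqns[i];   /* compact survivors to the front */
    return (long) kept;
}
```
.fi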
.NP
.I spool_and_discard
combines a call to
.I spool
and a call to
.I spool_discard
into one atomic operation.
In the arguments associated with the message to be spooled one may specify
SP_APPEND, in which case the new message will be stored at the end of the spool.
Otherwise, the new message is prepended to the spool, which is the appropriate
place to store a checkpoint.
.NP
For example, say that a checkpoint is made after receiving a spooled message
for which spool_getseqn(mp) returns 66.
A good way to record it would be to call
.nf
spool_m_and_discard(spname, entry, chkpt_msg, 0, SP_SPSEQN, SP_INFINITY, 66, 0);
.fi
This modifies the spool by prepending the checkpoint message to it and
deleting the messages that the checkpoint ``covers'', while retaining all others.
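.NP
The effect of such a call on the spool's contents can be modeled on an in-memory array.
\fIstruct entry\fR and \fIcheckpoint_and_discard\fR are hypothetical; the sketch does not model the atomicity that spool_m_and_discard provides, only the resulting order, with the SP_APPEND case selected by the \fIappend\fR flag.
.nf
```c
#include <stddef.h>

/* Hypothetical in-memory spool entry (not the ISIS representation). */
struct entry {
    long seqn;         /* spooler sequence number */
    int  checkpoint;   /* nonzero for a checkpoint record */
};

/* Drop entries with seqn <= covered, then place a checkpoint at the
   head of the spool (the default) or at its tail (the SP_APPEND case).
   sp must have room for n+1 entries.  Returns the new length.  The
   checkpoint slot records the last sequence number it covers; how
   ISIS numbers the checkpoint itself is not described on this page. */
size_t checkpoint_and_discard(struct entry *sp, size_t n, long covered, int append)
{
    size_t i, kept = 0;

    for (i = 0; i < n; i++)            /* discard covered messages */
        if (sp[i].seqn > covered)
            sp[kept++] = sp[i];
    if (append) {
        sp[kept].seqn = covered;
        sp[kept].checkpoint = 1;
        return kept + 1;
    }
    for (i = kept; i > 0; i--)         /* shift right to free the head */
        sp[i] = sp[i - 1];
    sp[0].seqn = covered;
    sp[0].checkpoint = 1;
    return kept + 1;
}
```
.fi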
.NP
.SH SET UP
.PP
To set up the spooler, you should add a line to the isis.rc files
on the machines where you wish to have spool files created.
This line should run ../bin/spooler under the name <isis-spooler>.
The spooler takes a single optional argument, \fB-l\fR \fInetwork-config-file\fR,
which names a ``networks'' configuration file in the format
described in the next section.
Spools are currently replicated over the full set of spoolers; in
future versions of the program they will be replicated only to the extent
needed for fault-tolerance and availability purposes.
.NP
.SH LONG-HAUL COMMUNICATION
.NP
A \fInetwork\fR is a set of sites (typically a LAN) on which a single,
independent instance of ISIS runs.
When an application spans multiple networks, the recommended way
to do inter-network communication is through the
long-haul spooler facility.
Each network is defined by a name, assigned in a configuration file,
and by a set of sites on which its spoolers run, and which can
be contacted to establish an inter-network link.
Network names look like group names in all respects.
Each spooler site is defined
by an internet host name/address, plus a tcp port number
used to accept connection requests from remote networks.
.NP
The \fInetwork configuration file\fR is used to communicate this information
to the spooler program.
It is specified
by supplying the spooler command with the option
.I -l network-config-file.
.NP
A network configuration file is formatted as follows.
The first line of the file
contains a default tcp port number for contacting spooler sites;
this must not be the same as any port number already in use by
your applications, or already defined in your ISIS sites file.
Each subsequent line of the file
describes one specific network.
Such a line consists of
the network's name followed by a null-terminated
list of host descriptors.
Each host is specified either by its internet host name (in which case
the name is prefixed by ``N:''), or by its internet address (in which case
the address is prefixed by ``A:'').
A host descriptor also contains the tcp port number; the host's name (or address)
is separated from the port number by a slash (`/') character.
If the port number is zero, the long-haul package uses the
default value.
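.NP
A parser for a single host descriptor, following the format just described, might look like this.
\fIstruct host_desc\fR and \fIparse_host_desc\fR are hypothetical; the 64-byte limit on the host text is an arbitrary choice for the sketch, and the way descriptors are separated within a line is not modeled.
.nf
```c
#include <stdlib.h>
#include <string.h>

/* Parsed form of one host descriptor from the networks file. */
struct host_desc {
    int  by_name;       /* 1 for "N:" (host name), 0 for "A:" (address) */
    char host[64];      /* host name or address text */
    int  port;          /* tcp port; the file's default if written as 0 */
};

/* Parse "N:name/port" or "A:address/port".  default_port is the value
   from the first line of the file.  Returns 0 on success, -1 if the
   descriptor is malformed. */
int parse_host_desc(const char *s, int default_port, struct host_desc *out)
{
    const char *slash;
    char *end;
    size_t len;
    long port;

    if ((s[0] != 'N' && s[0] != 'A') || s[1] != ':')
        return -1;                      /* missing N: or A: prefix */
    out->by_name = (s[0] == 'N');
    s += 2;
    slash = strchr(s, '/');
    if (slash == NULL || slash == s)
        return -1;                      /* no port, or empty host */
    len = (size_t) (slash - s);
    if (len >= sizeof out->host)
        return -1;
    memcpy(out->host, s, len);
    out->host[len] = 0;
    port = strtol(slash + 1, &end, 10);
    if (end == slash + 1 || *end != 0 || port < 0 || port > 65535)
        return -1;
    out->port = (port == 0) ? default_port : (int) port;
    return 0;
}
```
.fi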
.NP
.SH DESCRIPTION
The long-haul package
establishes connections between the local and remote networks.
For each remote network described in the networks configuration file,
one of the running hosts is designated as the manager of the connection
with this partner.
Each designated manager tries to connect to one of the remote
network's hosts, trying the hosts in turn
until one accepts the connection request.
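.NP
The manager's strategy of trying the remote network's hosts in turn can be sketched with a connection callback.
\fIconnect_any\fR, the \fIconnect_fn\fR type and \fIstub_connect\fR are hypothetical; the stub stands in for a real tcp connect so that the sketch can run without a network.
.nf
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical connect callback: returns a handle (>= 0) if the host
   accepts the connection, or -1 if it refuses or cannot be reached. */
typedef int (*connect_fn)(const char *host, int port);

/* Try a remote network's spooler hosts in turn, starting at index
   start and wrapping around, until one accepts.  Returns the handle
   and stores the accepting host's index in *which, or -1 if none do. */
int connect_any(const char *const hosts[], const int ports[], size_t n,
                size_t start, connect_fn try_connect, size_t *which)
{
    size_t k;

    for (k = 0; k < n; k++) {
        size_t i = (start + k) % n;   /* wrap around the host list */
        int h = try_connect(hosts[i], ports[i]);

        if (h >= 0) {
            *which = i;
            return h;
        }
    }
    return -1;
}

/* Demonstration stub for a real tcp connect: refuses any host named
   "down" and otherwise returns the port number as the handle. */
int stub_connect(const char *host, int port)
{
    return strcmp(host, "down") == 0 ? -1 : port;
}
```
.fi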
.NP
Each long-haul process may be in charge of more than one long-haul connection.
The long-haul package ensures automatic reconnection in the case
of failure of an existing connection.
It also preserves the state of a failed connection and makes it available
to the new manager.
This provides
.I at-most-once delivery semantics
in the presence of connection failures and host crashes.
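.NP
One common way to obtain at-most-once delivery across reconnections is a receiver-side filter on per-link sequence numbers, sketched below.
This is a generic technique shown for illustration; this page does not say how the long-haul package implements it, and \fIstruct link_state\fR and \fIlink_accept\fR are hypothetical.
The filter relies on messages arriving in sequence order, as they do over a single tcp connection.
.nf
```c
/* Receiver-side state for one long-haul link.  This is the kind of
   connection state a new manager would need to inherit. */
struct link_state {
    long last_delivered;   /* highest sequence number delivered so far */
};

/* Returns 1 if the message should be delivered, or 0 if it is a
   duplicate (for example, a retransmission after reconnection). */
int link_accept(struct link_state *ls, long seqn)
{
    if (seqn <= ls->last_delivered)
        return 0;              /* already delivered: suppress */
    ls->last_delivered = seqn;
    return 1;
}
```
.fi
.NP
Because a suppressed message is never redelivered, this gives at-most-once rather than exactly-once behavior.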
.NP
To 
trigger long-haul communication,
you should specify the remote network name using
the SP_NETWORK option in a call to the
.I spool
or
.I spool_m
procedures.
The messages will be transmitted the next time the spooler
makes contact with the specified network.
.NP
.SH CONSTANTS
.NP
The maximum number of networks you can define is MAXNETS,
and the maximum number of sites in a network is MAXSITES.
.SH BUGS
.NP
The current version of the spooler loses spooled data if the
spooler itself experiences a total failure.  (On recovery,
existing spool files are deleted).
This will be corrected in the near future.
.SH "SEE ALSO"
isis_logentry(3), gettimeofday(2),
ISIS(3)

ken@gvax.cs.cornell.edu (Ken Birman) (07/27/89)

>> From jlevy@arisia.xerox.com Thu Jul 27 11:51:01 1989
>> ....

>> Have I understood correctly that this in effect makes it possible
>> for two independent networks of ISIS-connected sites to
>> communicate? This means that in effect the limitation on the
>> number of sites is removed?

Yes, the new spooler/long-haul facility is intended to help
you link "clusters" of ISIS sites over long-haul links.  The idea
is that if you are running ISIS at Cornell and ISIS at Xerox,
you might want to build applications that span the two LAN's.
ISIS doesn't normally allow this, but the spool mechanism offers
a way to cobble something together explicitly.

Basically, you spool messages for the remote LAN (currently
you need to use the SP_NETWORK option, but eventually we will move
to a new group naming convention like
	/cardiology/ccu/bed11/alarm@columbia.new-york
where "new-york" would be taken as the network name).  ISIS
needs to know how to talk to "new-york", and you tell it by
means of the spooler's interlan configuration file, which
lists some machines in new-york and the ports that they use to
do this type of communication.  Whenever it gets the chance,
ISIS makes a connection and forwards the spool using a scheme that
gives at most once delivery semantics.  As long as the line is up,
overhead is low -- your message goes to the spooler, out the line,
into the spooler remotely, and is re-broadcast on arrival.
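
Under the proposed naming convention, extracting the network name
from a group name could be as simple as the following C sketch.
group_network is a hypothetical helper, and it assumes the network
name is whatever follows the last '.' after the '@'; names without
such a suffix are treated as local.

```c
#include <stddef.h>
#include <string.h>

/* Return the network part of "/group/path@site.network", or NULL if
   the name has no such suffix (i.e. the group is on the local LAN). */
const char *group_network(const char *gname)
{
    const char *at = strrchr(gname, '@');
    const char *dot;

    if (at == NULL)
        return NULL;             /* no "@site.network": local group */
    dot = strrchr(at, '.');
    if (dot == NULL || dot[1] == 0)
        return NULL;             /* no network part after the site */
    return dot + 1;
}
```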

Because the scheme is completely asynchronous, you would have to
layer any sort of reply mechanism on top of this as part of your
application.  There were technical obstacles to doing something
reasonable with replies as part of our mechanism -- basically,
failure cases that made it very hard to recover lost replies.

We recognize that the initial facility is pretty primitive.
Messac Makpangou is working on this and would be interested in
feedback or suggestions.  Contact him at mak@cs.cornell.edu.
He is working on extending the long-haul code, specifically, while
I am responsible for the spooling/replay mechanisms.

Ken Birman

wunder@hp-ses.SDE.HP.COM (Walter Underwood) (07/28/89)

Well, for starters, don't use .NP (or .LP) for paragraphs.  Those
macros don't exist in System V man macros.  Use .PP.

Also, only some of the lines in the SYNOPSIS section are bold, and many
don't have the args declared.  Looks kinda funny.  

wunder