[comp.sys.isis] lazy replication

ken@gvax.cs.cornell.edu (Ken Birman) (03/15/91)

> From: jyl@Eng.Sun.COM (Jacob Levy)
> Message-Id: <9103131904.AA01088@noam.Eng.Sun.COM>
> To: ken@cs.cornell.edu
> Subject: Interesting Paper
> Status: R

> Barbara Liskov's paper about Lazy Replication in the 1990 PODC
> proceedings seems to contain protocols which are very similar
> to ISIS.  They compare their approach to ISIS and seem to claim
> that theirs is more efficient both in the normal case (no failures)
> and for failure detection, and also that it is more robust under
> network partitions.

> Can you post a comparison to comp.sys.isis?

A comparison between our work and the lazy replication protocols appears
in the most recent TR describing our new protocols ("Lightweight Causal
and Atomic Group Multicast").

Basically, their paper compared the Lazy Replication scheme with
the old ISIS protocols.  Our new protocols are at least as efficient
as Lazy Replication in all of the respects mentioned above, and they
include some performance enhancements that the LR scheme lacked (at the
time they wrote the PODC paper, at any rate).  For example, our new paper includes
faster group view management protocols and a VT compression scheme that
appear to be big wins.  As it happens, the measured latency of our
OLD protocols was comparable to the LR figures in the PODC paper (our cbcast
protocol, used asynchronously, runs at the same speed as the protocol they
evaluate in section 3.6.1 of their paper).   Our recent paper reports 
the performance of the NEW cbcast protocol, which runs about 5 times
faster than either our old protocols or the LR scheme.
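
For those who haven't followed the mechanism: delivery in a
vector-timestamp cbcast boils down to a test roughly like the one
sketched below.  This is a generic illustration of the idea, not our
actual code, and all the names in it are made up; the compression
scheme mentioned above changes what actually goes on the wire.

/* Generic sketch of the vector-timestamp (VT) delivery test used by
 * causal multicast protocols of this kind.  This is not the actual
 * ISIS or LR code; N and all names here are made up for illustration. */
#define N 8                        /* number of group members            */
typedef unsigned int vt_t[N];

/* A message from member `sender` carries the sender's VT.  It can be
 * delivered at a process whose local VT is `local` only if it is the
 * next message expected from that sender and every message it causally
 * depends on has already been delivered here.                          */
int deliverable(const vt_t msg_vt, const vt_t local, int sender)
{
    int k;
    for (k = 0; k < N; k++) {
        if (k == sender) {
            if (msg_vt[k] != local[k] + 1)
                return 0;          /* not the next message from sender   */
        } else {
            if (msg_vt[k] > local[k])
                return 0;          /* a causally prior message is missing */
        }
    }
    return 1;
}

/* On delivery the receiver advances its own VT and then retests any
 * messages it had buffered as undeliverable.                           */
void deliver(vt_t local, const vt_t msg_vt, int sender)
{
    local[sender] = msg_vt[sender];
}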

Regarding the robustness issue, LR uses a stable storage/logging scheme,
while ISIS only uses logging if you explicitly enable it.  When you do
use logging in ISIS (or long-haul spooling), you can get the same 
level of availability out of our system as you can from theirs.  We
prefer not to log in the normal case because we want to keep the
overhead of the average cbcast as low as possible -- this is part of the
reason our stuff is faster.
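
To make the cost concrete: a stable-storage logging scheme implies
something on the order of one synchronous disk write per multicast
before the sender can treat the message as safe, roughly as sketched
below (purely illustrative; this is not code from either system), and
that synchronous write easily dominates the cost of pushing a small
packet onto the network.

/* Purely illustrative sketch of per-message stable-storage logging;
 * not code from ISIS or from the LR implementation.                   */
#include <fcntl.h>
#include <unistd.h>

int log_fd;                        /* log file, opened O_WRONLY|O_APPEND */

int log_message(const void *msg, int len)
{
    if (write(log_fd, msg, len) != len)
        return -1;                 /* log write failed                   */
    return fsync(log_fd);          /* force it to disk before the message
                                      is treated as stable -- this is the
                                      synchronous cost that ISIS avoids
                                      by not logging in the normal case  */
}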

In more general terms, it is worth noting that the basic LR protocol and
the ISIS "lightweight causal cbcast" protocol are really very similar in
the single-group case.  The main differences, unless you focus on the
details, are that we support large numbers of overlapping groups, and that
our handling of client/server structures is somewhat more general than theirs.
We do this because we need both features in ISIS; one implication is that
our scheme scales well, even to very large networks.  Their scheme could
be extended the same way, of course -- perhaps even using our methods.

I can provide more details on the fine points if people request them.

Ken Birman

----------------------------------
Kenneth P. Birman                              E-mail:  ken@cs.cornell.edu
4105 Upson Hall, Dept. of Computer Science     TEL:     607 255-9199 (office)

liuba@proton.lcs.mit.edu (Liuba Shrira) (03/26/91)

Ken Birman compares the performance of ISIS with that of the Lazy Replication
scheme developed by Ladin, Liskov and Shrira at MIT.  I would like to make a
couple of points about Ken's note.  First, the Lazy Replication paper that
appeared in PODC 90 does not contain any performance data.  The data appear in
MIT technical report MIT/LCS/TR-484 authored by Ladin, Liskov, Shrira, and
Ghemawat.  People interested in Lazy Replication and its performance can send
me a note and I will send a hard copy of the paper.  Alternatively, a
postscript version of the paper can be obtained via anonymous ftp from
mintaka.lcs.mit.edu (18.26.0.36) as pub/pm/lazyrep.PS.

The second point concerns the performance data themselves.  Our paper describes
an experiment designed to evaluate the cost of Lazy Replication: it compares
the response time and request processing capacity of a prototype replicated
server implemented using our replication scheme with those of a comparable
nonreplicated server.  The point of our data is the comparison between the two
implementations rather than how fast the replicated implementation ran.

We used the Argus system as the implementation platform for the experiments, so
our system does not perform as well as it could.  For example, all client
interactions with the server occur within atomic transactions even though
transactions are not needed in our method.  We were pleasantly surprised,
therefore, to hear that our response times compare so favorably with those of
the highly engineered ISIS system!


Liuba Shrira
liuba@lcs.mit.edu

ken@CS.Cornell.EDU (Ken Birman) (03/27/91)

In article <1991Mar26.033004.25436@mintaka.lcs.mit.edu> liuba@proton.lcs.mit.edu (Liuba Shrira) writes:
>
>... First, the Lazy Replication paper that appeared in PODC 90 does not
>contain any performance data....

Actually, my comments were based on Sec. 3.6.1 of the PODC paper, which
did give some numbers.  

>... the highly engineered ISIS system!

Thanks...  but it is funny to read this even as I type in a revision
to our TOCS paper explaining why ISIS multicast is so much slower than
Amoeba or X-kernel multicasts!  Both systems are losing heavily by
running over UNIX -- my guess is that Argus is only a minor problem.

The situation we see in ISIS -- I wonder if you run into this too -- is
that UNIX tends to throw away a lot of UDP messages if you present it
with lots of data at once.  For example, we have been looking at how
ISIS performance varies as you stream data to larger and larger process
groups.  What we find is that the dominant effect occurs when UNIX
suddenly decides that there isn't enough mbuf memory left and starts
to discard both outgoing messages from the sender and incoming ones
(in this kind of test, mostly acknowledgement packets).

Unfortunately, UNIX usually doesn't indicate that it is doing this, and
it never indicates when it discards an incoming packet, except to keep
a count in some internal, undocumented kernel data structure.

Packet loss is a performance disaster, and to make matters worse, there
seems to be no way for a program to find out that UNIX is doing this.  You
can tell after the fact that it is happening -- by running netstat -m, for
example -- but at runtime there is no UNIX system interface to indicate
that the kernel is running out of buffer space.  So, you give it the data,
it thanks you and discards it, and then you sit and wait to see if it went
anywhere.
Needless to say, this causes a lot of multi-second delays, waiting for
acks that don't make it, retransmitting packets, etc.

Of course, there are flow control heuristics that help a little, but
my belief is that when we measure ISIS or LR or Argus over UNIX, we
are so far from what the hardware can do, and running over such a complex
and quirky black box, that the situation is really pretty hopeless.
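
By "flow control heuristics" I mean things like simple sender-side
windowing: never hand the kernel more unacknowledged packets than some
small bound, and let the application-level acks pace the sender.  A
rough sketch follows (all names and constants are made up, and this is
not our actual code, which also retransmits on timeout):

/* Rough sketch of a sender-side flow-control heuristic: cap the number
 * of unacknowledged packets in flight so the kernel's mbuf pool never
 * gets flooded.  Names and constants are illustrative only.            */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define WINDOW 8                   /* max packets awaiting an ack        */

static struct sockaddr_in dest;    /* destination, filled in elsewhere   */
static int unacked;                /* packets sent but not yet acked     */

static void await_ack(int sock)    /* block until one ack arrives        */
{
    char ack[32];
    if (recv(sock, ack, sizeof ack, 0) > 0)
        unacked--;
}

void send_packet(int sock, const void *pkt, int len)
{
    while (unacked >= WINDOW)      /* don't hand the kernel more data    */
        await_ack(sock);           /* than it can buffer                 */

    /* sendto() can report success even if the kernel later throws the
       datagram away for lack of mbufs, so the ack/timeout machinery is
       the only real indication of whether it went anywhere.             */
    (void) sendto(sock, pkt, len, 0,
                  (struct sockaddr *) &dest, sizeof dest);
    unacked++;
}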

For ISIS, the plan is to rebuild our system over the X-kernel inside
Mach, or inside Chorus (a comparable environment), as close to the
hardware as possible.  This should give us direct access to information
like memory usage in the kernel and to hardware multicast, and will
make a huge difference to our system performance.

Is anyone aware of successful multicast mechanisms layered over UNIX?
I would be very interested to hear about other experiences...

-- Ken

-- 
Kenneth P. Birman                              E-mail:  ken@cs.cornell.edu
4105 Upson Hall, Dept. of Computer Science     TEL:     607 255-9199 (office)
Cornell University Ithaca, NY 14853 (USA)      FAX:     607 255-4428