[comp.sys.isis] A bypass mode bug and fix

ken@gvax.cs.cornell.edu (Ken Birman) (02/14/91)
The group at Los Alamos ran into a BYPASS problem that kills
performance, introducing 1-second delays into certain types of
BYPASS multicasts.

There is a fairly simple fix.  In cl_inter.c, add the following
code right before the procedure called "intercl_deliver":

/* Insert around line 1030, right before intercl_deliver. */
void
check_wantack()
{
        register qnode *qp, *nqp;

        /* Nothing to do unless an ACK has been requested. */
        if(isis_state&ISIS_WANTACK)
        {
            for(qp = IOQS->qu_next; qp != IOQS; qp = nqp)
            {
                register ioq *ioqp = qp->qu_ioq;
                nqp = qp->qu_next;
                if(ioqp->io_wantack)
                {
                    ++ioqp->io_nacks;
                    intercl_xmit(ioqp, ackiclpkt);
                }
            }
            isis_state &= ~ISIS_WANTACK;
        }
}

Then at the bottom of intercl_deliver look for the line that says

    else        /* <------ Line 1148, before patching */
    {
        if(ioqp->io_wantack)
            isis_state |= ISIS_WANTACK;
        ttid = isis_timeout_reschedule(ttid, 1000, intercl_sweep, 0, 0);
    }

Change this to:
    else
    {
        if(ioqp->io_wantack)
        {                                      /* <----- new "{" */
            isis_state |= ISIS_WANTACK;
            isis_ondrain(check_wantack, 0);    /* <----- new code */
        }                                      /* <----- new "}" */
        ttid = isis_timeout_reschedule(ttid, 1000, intercl_sweep, 0, 0);
    }

The problem was that in some situations ACK packets were not sent until
the intercl_sweep routine ran, and as you can see, that happens only after
a 1-second delay.  Normally the delay didn't matter, but under some
circumstances it could cause the sender of a new multicast to jam up
before sending it.  With the change, the ACK is sent after 1 second or as
soon as the recipient goes idle, whichever comes first.  So you can STILL
see the 1-second delays, but only if the recipient of a multicast sends no
messages back to the sender and stays busy for 1 second or so after
receiving the message.  Presumably, in such a busy situation, communication
costs are not the bottleneck in any case.
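
To make the timing concrete, here is a rough model of when the ACK goes
out before and after the patch.  It is only a sketch, not ISIS code: the
function ack_send_time_ms and its parameters are hypothetical, and it
assumes the recipient sends nothing back that could carry the ACK and
that the sweep timer fires exactly 1000 ms after receipt.

```c
#include <assert.h>

/* Sweep interval in milliseconds, taken from the 1000 passed to
 * isis_timeout_reschedule() in the patch. */
#define SWEEP_DELAY_MS 1000L

/* Hypothetical model, not ISIS code: returns the time (ms) at which the
 * ACK for a message received at recv_ms is sent, given that the recipient
 * next goes idle at idle_ms and sends no reply that could carry the ACK. */
long ack_send_time_ms(long recv_ms, long idle_ms, int patched)
{
    long sweep_ms = recv_ms + SWEEP_DELAY_MS;

    assert(idle_ms >= recv_ms);   /* cannot go idle before the receipt */

    if (!patched)
        return sweep_ms;          /* only the sweep sends the ACK */

    /* Patched: the on-drain callback fires first if the recipient goes
     * idle before the sweep timer expires. */
    return idle_ms < sweep_ms ? idle_ms : sweep_ms;
}
```

In this model a recipient that goes idle 5 ms after receipt sends the ACK
at 5 ms when patched, while one that stays busy for 2.5 seconds still
waits for the sweep at 1000 ms.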

The core issue here is related to what we now call message stability.
We'll be putting a revised copy of the paper on BYPASS cbcast on cu-arpa
within a few days.  If you are interested in understanding what is going
on at this level of ISIS, I recommend that you read that section of the
revised paper.  Very briefly, the problem is one of a cost tradeoff: ISIS
can send lots of acks and this makes a multicast "stable" quickly, allowing
the sender to send in a different process group safely or to garbage collect
the message, but performance may suffer due to the extra messages.  Or,
ISIS can wait for a while before sending the ack in the hope that a message
from the recipient to the sender can piggyback this information.  Our ideal
goal would be rapid stability but few acks... 
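
The tradeoff can be sketched as a toy cost model.  This is an
illustration under simplifying assumptions, not how ISIS decides
anything: ack_policy and struct ack_outcome are hypothetical names,
network latency is ignored, and "stable" here just means the sender has
been told about the receipt.

```c
#include <assert.h>

/* Hypothetical cost model, not ISIS code. */
struct ack_outcome {
    long stable_after_ms;  /* delay until the sender learns of receipt */
    int  extra_msgs;       /* dedicated ACK packets sent (0 or 1) */
};

/* wait_ms:        how long the recipient holds the ACK hoping to
 *                 piggyback it (0 means ACK immediately).
 * reply_after_ms: when the recipient next sends a message of its own to
 *                 the sender, or a negative value if it never does. */
struct ack_outcome ack_policy(long wait_ms, long reply_after_ms)
{
    struct ack_outcome out;

    assert(wait_ms >= 0);

    if (reply_after_ms >= 0 && reply_after_ms <= wait_ms) {
        /* A reply goes out within the wait window: piggyback the ACK. */
        out.stable_after_ms = reply_after_ms;
        out.extra_msgs = 0;
    } else {
        /* No reply in time: send a dedicated ACK when the wait ends. */
        out.stable_after_ms = wait_ms;
        out.extra_msgs = 1;
    }
    return out;
}
```

With a reply 300 ms after receipt, ack_policy(0, 300) gives immediate
stability at the cost of one extra packet, while ack_policy(1000, 300)
costs no extra packet but delays stability to 300 ms.  If no reply ever
comes, waiting buys nothing: ack_policy(1000, -1) pays both the full
delay and the extra packet.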

Let me also comment that by now I have fixed several things in ISISV3.0
that would show up as bugs in the BYPASS mechanisms under ISISV2.1.
These will all be fixed in ISISV2.2 (and have been fixed in ISISV3.0 for
those who plan to switch over).

Ken