clyde@ut-ngp.UTEXAS (Clyde W. Hoover) (10/29/85)
Index:	/sys/vaxif/if_acc.c 4.2BSD

APOLOGIA:
	The previous fix posted was **** ALL WRONG. ****

	My colleague who tracked down this bug did not (by his own
	admission) explain the nature of the bug sufficiently, hence the
	wrong 'fix'. This person, who will remain nameless, has suffered
	and will continue to suffer the pains of the damned because *I*
	ended up looking stupid on USENET. (I know, USENET is full of
	stupid-looking people, but I was saving that for net.singles).

	Thanks to Art Berggreen <Art@ACC.ARPA> for his analysis of the
	problem (included below) and to my nameless colleagues for
	spending hours poring over logic diagrams to figure out just how
	this bloody box works.

NOTE:
	This is **not applicable** unless the modifications from Chris
	Kent (cak@purdue.ARPA, posted 21 March 1984) have been made to
	/sys/netinet/tcp_output.c. These modifications advertise a
	maximum TCP segment size that is tuned per network interface.

Description:
	Connections to certain hosts on the ARPAnet will start failing
	with "out of buffer space" messages. Doing a 'netstat -h' shows
	that the host (or the gateway to it) has a RFNM count of 8. The
	RFNM count never drops below 8, and so the network path is
	unusable until the system is rebooted.

	The problem lies in the LH/DH-11 IMP interface. Sometimes, most
	likely always, it will not set the <END OF MESSAGE> flag in the
	control & status register if the input buffer is filled at the
	same time that <LAST BIT SIGNAL> from the IMP comes up.

	This causes the LH/DH driver to append the next incoming message
	from the IMP to the previous message. This process (appending of
	messages) will continue until a message SHORTER than the input
	buffer size is sent -- a RFNM response does nicely.

	This results in the LOSS of the succeeding messages (e.g. RFNMs),
	since the 1822 protocol handling code expects to get only <ONE>
	message from the LH/DH at a time.
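	To make the concatenation effect concrete, here is a toy model in C.
	It is NOT the driver and NOT a description of the LH/DH hardware;
	the function name, its parameters, and the rule "a message that
	exactly fills the buffer loses its end-of-message flag" are all
	assumptions made for illustration, based only on the description
	above.

```c
#include <stddef.h>

/*
 * Hypothetical model (illustrative names, not from if_acc.c).
 *
 * lens[]   lengths in bytes of the messages the IMP actually sent
 * n        number of entries in lens[]
 * bufsize  size in bytes of the receive buffer
 * buggy    nonzero to model the lost end-of-message flag
 *
 * Returns how many separate messages the protocol code would be handed.
 * Assumed rule: with the bug, a message that exactly fills the buffer
 * does not terminate, so the next message is appended to it.  Any other
 * message terminates normally and is delivered (carrying whatever was
 * appended before it).  A run that never terminates is never delivered.
 */
int messages_seen(const int *lens, size_t n, int bufsize, int buggy)
{
    int seen = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (buggy && lens[i] == bufsize)
            continue;   /* EOM lost: keep appending to the same message */
        seen++;         /* EOM seen: one (possibly glued) message delivered */
    }
    return seen;
}
```

	For example, with lens = {1018, 1018, 12} and bufsize = 1018, the
	model returns 1 when buggy is set and 3 when it is not: three
	messages from the IMP arrive as one, and the trailing short message
	(standing in for a RFNM here; its 12-byte length is assumed) is
	swallowed.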
	This problem happens when the IMP MTU is advertised as the TCP
	maximum segment size (as is done by the TCP changes from
	cak@purdue). This allows an incoming message to be 1006 + 12
	bytes long, which equals the size of the 1018-byte input buffer
	in the IMP (I believe), and so exercises the bug in the LH/DH.

	The described problem would appear to happen ONLY if a message
	from the IMP is one word longer than the buffer being read into.
	When the buffer fills, leaving the data that contains the Last
	Bit in the LH/DH data buffer, the Receive DMA terminates and the
	EOM flag is NOT ON (because the user has not yet DMA'd the
	End-of-Message into memory). What should happen when the Receive
	DMA is restarted is that the remaining word is read into memory
	and the DMA terminates with the EOM flag ON. If the internal EOM
	status is lost when the DMA is restarted, the following message
	is concatenated onto the end of the previous message.

	A better solution than reducing IMPMTU (which doesn't really fix
	the problem) would be to use I/O buffers that are slightly larger
	than IMPMTU (and, of course, to set the Receive Byte Counter
	larger than any expected message).

Fix:
	/sys/vaxif/if_acc.c:

163c164
< (int)btoc(IMPMTU)) == 0) {
---
> (int)btoc(IMPMTU+2)) == 0) {
190c191
< addr->iwc = -(IMPMTU >> 1);
---
> addr->iwc = -((IMPMTU + 2) >> 1);
328,330c329,331
< len = IMPMTU + (addr->iwc << 1);
< if (len < 0 || len > IMPMTU) {
< printf("acc%d: bad length=%d\n", len);
---
> len = IMPMTU+2 + (addr->iwc << 1);
> if (len < 0 || len > IMPMTU+2) {
> printf("acc%d: bad length=%d\n", unit, len);
362c363
< addr->iwc = -(IMPMTU >> 1);
---
> addr->iwc = -((IMPMTU + 2)>> 1);

	This fix really does the job properly.
--
Shouter-To-Dead-Parrots @ Univ. of Texas Computation Center; Austin, Texas
"All life is a blur of Republicans and meat." -Zippy the Pinhead
	clyde@ngp.UTEXAS.EDU, clyde@sally.UTEXAS.EDU
	...!ihnp4!ut-ngp!clyde, ...!allegra!ut-ngp!clyde