mann@Navajo.ARPA (11/14/84)
I've discovered a documentation bug that may cause some grief for anyone who uses 3Com Multibus ethernet boards.

These boards rely on software to implement part of the exponential backoff algorithm. Whenever a collision is detected, the board generates a JAM interrupt and waits for a random number to be written into the backoff register by software. It then delays the specified number of slot times and tries again.

Section 4.4 of the manual explains how to choose the random numbers: "a uniformly distributed random integer greater than or equal to zero and less than 2^k, where k is either the number of retransmission attempts for the packet being transmitted or 10, whichever is less." Section 4.2 says: "software must write the two's complement of the number of slot times to delay into MEBACK."

These statements would lead one to believe that it is legitimate to write a 0 into MEBACK. In fact, following this procedure, one would choose to delay 0 slots and write a 0 into MEBACK 50% of the time on the first collision, 25% on the second, etc. The problem is that the ethernet board interprets a zero as "delay 65536 slot times"!! (This was determined by timing how long it took to try transmitting again: about 3.36 seconds.) Thus, it seems the driver must write -(s+1) into MEBACK, where s is the number of slot times to delay, determined as above.

This sort of bug is likely to go unnoticed for a very long time since collisions are quite rare on an ethernet. In fact, the 3Com driver in the V kernel was in use for over a year with several bugs in collision handling. These were not detected until recently, when they showed up as occasional unexpected timeouts during a stress test of our file server. (The timeout was less than 3 seconds.)

People who maintain ethernet drivers may want to recheck their code.

--Tim
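
For illustration, the workaround described above might look roughly like this in C. This is only a sketch under stated assumptions, not the V kernel driver or 3Com's code: the helper names backoff_slots and meback_value are made up for the example, MEBACK is assumed to be a 16-bit register (which is what "0 means 65536 slot times" suggests), and rand() stands in for whatever random source a real driver would use. Writing the result to the board is left out, since the register address depends on the installation. With the standard 51.2 microsecond slot time of 10 Mb/s ethernet, 65536 slots comes to about 3.36 seconds, which matches the measured delay.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* k is capped at 10, per the Section 4.4 text quoted in the post. */
#define MAX_BACKOFF_EXP 10u

/*
 * Hypothetical helper: choose the number of slot times s to delay,
 * with 0 <= s < 2^k and k = min(attempts, 10).
 * Masking rand() is uniform only if RAND_MAX + 1 is a power of two,
 * which is common but not guaranteed; a real driver may want a
 * better random source.
 */
static unsigned backoff_slots(unsigned attempts)
{
    unsigned k = attempts < MAX_BACKOFF_EXP ? attempts : MAX_BACKOFF_EXP;
    return (unsigned)rand() & ((1u << k) - 1u);
}

/*
 * Hypothetical helper: the value to write into MEBACK for a delay of
 * s slot times, using -(s+1) instead of the documented -s.
 * Assumes a 16-bit register. For s in 0..1023 the result is never 0,
 * so the "0 means 65536 slots" case cannot occur.
 */
static uint16_t meback_value(unsigned s)
{
    assert(s < 65535u);              /* s = 65535 would wrap to 0 */
    return (uint16_t)(0u - (s + 1u));
}
```

In a JAM interrupt handler one would call meback_value(backoff_slots(attempts)) and store the result in the board's backoff register. With these definitions s = 0 yields 0xFFFF and s = 1023 yields 0xFC00.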