[fa.tcp-ip] 4.2BSD tftp doesn't retransmit acks

stanonik@NPRDC.ARPA (Ron Stanonik) (10/01/85)

Description:
	During a read request (RRQ), tftp doesn't retransmit the last
	ack when it receives a duplicate data packet.
Repeat-By:
	This would happen intermittently between our vax and a pc,
	but the problem can be reproduced by hacking tftpd.c to
	not advance its block count and then tftp'ing to yourself.
Fix:
	Move the retransmit code into the inner loop in recvfile().
	This actually causes tftp to retransmit on receiving anything
	but the next expected packet or an error packet.  I believe
	that's in keeping with RFC783, but at any rate it makes tftp
	"generous in what it accepts".
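	A rough C sketch of the receive policy just described follows. It is not code from tftp.c, and the enum and function names are hypothetical. The opcode values are the ones RFC783 assigns.

```c
#include <stdint.h>

/* TFTP opcodes as assigned in RFC783. */
enum { OP_RRQ = 1, OP_WRQ = 2, OP_DATA = 3, OP_ACK = 4, OP_ERROR = 5 };

/* What the receiving side does with a packet (hypothetical names). */
enum rx_action { RX_ACCEPT, RX_ABORT, RX_RESEND_ACK };

/*
 * Patched behaviour: accept only the next expected DATA block, give up
 * on an ERROR packet, and resend the last ack on anything else
 * (duplicate DATA, stray ACKs, unknown opcodes).
 */
enum rx_action
rrq_on_packet(uint16_t opcode, uint16_t block, uint16_t expected)
{
	if (opcode == OP_ERROR)
		return RX_ABORT;
	if (opcode == OP_DATA && block == expected)
		return RX_ACCEPT;
	return RX_RESEND_ACK;
}
```

	In the patched recvfile() the RX_RESEND_ACK case roughly corresponds to falling back to the top of the do-loop, where the sendto() of the ack now sits.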
	
	We haven't really observed the corresponding WRQ problem with
	duplicate acks, but the logic is the same, so we fixed(?) it too.

	Oh, the diff will probably only make sense if you've already
	installed the fixes from mogul@gregorio and satz@joyce.

Ron Stanonik
stanonik@nprdc.arpa

RCS file: RCS/tftp.c,v
retrieving revision 1.3
diff -c -r1.3 tftp.c
*** /tmp/,RCSt1001512	Tue Oct  1 09:43:20 1985
--- tftp.c	Tue Oct  1 09:43:00 1985
***************
*** 73,85
  		}
  		timeout = 0;
  		(void) setjmp(timeoutbuf);
- 		if (trace)
- 			tpacket("sent", stp, size + 4);
- 		n = sendto(f, sbuf, size + 4, 0, (caddr_t)&sin, sizeof (sin));
- 		if (n != size + 4) {
- 			perror("tftp: sendto");
- 			goto abort;
- 		}
  		do {
  			alarm(rexmtval);
  			do {

--- 73,78 -----
  		}
  		timeout = 0;
  		(void) setjmp(timeoutbuf);
  		do {
  			if (trace)
  				tpacket("sent", stp, size + 4);
***************
*** 81,86
  			goto abort;
  		}
  		do {
  			alarm(rexmtval);
  			do {
  				fromlen = sizeof (from);

--- 74,86 -----
  		timeout = 0;
  		(void) setjmp(timeoutbuf);
  		do {
+ 			if (trace)
+ 				tpacket("sent", stp, size + 4);
+ 			n = sendto(f, sbuf, size + 4, 0, (caddr_t)&sin, sizeof (sin));
+ 			if (n != size + 4) {
+ 				perror("tftp: sendto");
+ 				goto abort;
+ 			}
  			alarm(rexmtval);
  			do {
  				fromlen = sizeof (from);
***************
*** 144,157
  		}
  		timeout = 0;
  		(void) setjmp(timeoutbuf);
- 		if (trace)
- 			tpacket("sent", stp, size);
- 		if (sendto(f, sbuf, size, 0, (caddr_t)&sin,
- 		    sizeof (sin)) != size) {
- 			alarm(0);
- 			perror("tftp: sendto");
- 			goto abort;
- 		}
  		do {
  			alarm(rexmtval);
  			do

--- 144,149 -----
  		}
  		timeout = 0;
  		(void) setjmp(timeoutbuf);
  		do {
  			if (trace)
  				tpacket("sent", stp, size);
***************
*** 153,158
  			goto abort;
  		}
  		do {
  			alarm(rexmtval);
  			do
  				n = recvfrom(f, rbuf, sizeof (rbuf), 0,

--- 145,158 -----
  		timeout = 0;
  		(void) setjmp(timeoutbuf);
  		do {
+ 			if (trace)
+ 				tpacket("sent", stp, size);
+ 			if (sendto(f, sbuf, size, 0, (caddr_t)&sin,
+ 			    sizeof (sin)) != size) {
+ 				alarm(0);
+ 				perror("tftp: sendto");
+ 				goto abort;
+ 			}
  			alarm(rexmtval);
  			do
  				n = recvfrom(f, rbuf, sizeof (rbuf), 0,

lwa@apollo.UUCP (10/13/85)

THAT IS NOT A BUG!!

The ORIGINAL tftp spec required retransmission of acks, as well
as of data packets, upon receiving any old packet.  This algorithm
was shown to be faulty by Michael Greenwald at MIT.  It has the
following problem:

    Suppose host A is talking to host B.  Suppose further that host
    A's retransmit timeout is too short (a common case, since there is
    no good way to determine an initial retransmit timeout).  Now
    observe what happens:

    Host A sends packet 1

    Host A's retransmit timer goes
    off, and host A retransmits packet
    1.
                                        Host B receives packet 1, sends
                                        ack 1.
    Host A receives ack 1, sends
    packet 2. 
                                        Host B receives retransmitted packet
                                        1, retransmits ack 1.
    Host A receives retransmitted
    ack 1, retransmits packet 2.
                                        Host B receives packet 2, sends
                                        ack 2.
    Host A receives ack 2, sends
    packet 3.
                                        Host B receives retransmitted packet
                                        2, retransmits ack 2.
    Host A receives retransmitted
    ack 2, retransmits packet 3.
                .                                       .
                .                                       .
                .                                       .

Note that what has now happened is that every tftp packet is being transmitted
twice.  Furthermore, if Host A's retransmit timer goes off too early again,
every packet will be transmitted three times, and so forth.  This quickly
causes tftp performance to degrade to zero, and connections eventually time
out.
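The following toy C simulation makes the doubling concrete. It illustrates the exchange in the diagram and is not anyone's tftp code; `simulate` and its helpers are hypothetical. It assumes a lossless channel that delivers packets in order. It also assumes that host A's timer fires too early exactly once (on block 1) and that A resends its current DATA block whenever it sees an ack for the previous one. The `ack_dups` flag switches host B between re-acking every duplicate DATA packet and silently dropping duplicates.

```c
#include <stdbool.h>

/* One in-flight message: DATA to host B, or an ACK to host A. */
struct msg { bool is_ack; int block; };

/* Tiny FIFO; only a few messages are ever in flight in this model. */
#define QCAP 16
struct fifo { struct msg m[QCAP]; int head, count; };

static void
push(struct fifo *q, bool is_ack, int block)
{
	q->m[(q->head + q->count) % QCAP] = (struct msg){ is_ack, block };
	q->count++;
}

static struct msg
pop(struct fifo *q)
{
	struct msg m = q->m[q->head];
	q->head = (q->head + 1) % QCAP;
	q->count--;
	return m;
}

/* Returns the total number of DATA packets host A transmits. */
int
simulate(int nblocks, bool ack_dups)
{
	struct fifo q = { 0 };
	int sent, last_sent = 1, acked = 0, expected = 1;

	push(&q, false, 1);	/* A sends block 1 ...             */
	push(&q, false, 1);	/* ... and retransmits it too soon */
	sent = 2;

	while (q.count > 0) {
		struct msg m = pop(&q);
		if (!m.is_ack) {			/* B receives DATA */
			if (m.block == expected) {
				push(&q, true, m.block);
				expected++;
			} else if (ack_dups) {
				push(&q, true, expected - 1); /* re-ack dup */
			}
		} else if (m.block == last_sent && m.block > acked) {
			acked = m.block;		/* A: fresh ack */
			if (m.block < nblocks) {
				push(&q, false, ++last_sent);
				sent++;
			}
		} else if (m.block == last_sent - 1) {
			push(&q, false, last_sent);	/* A: old ack, resend */
			sent++;
		}
	}
	return sent;
}
```

Under these assumptions, `simulate(n, true)` counts 2n DATA transmissions: one duplicate per block for the rest of the transfer. `simulate(n, false)` counts n + 1, because the single early timeout costs one extra packet and then dies out.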

The current tftp spec avoids this problem by specifying that only data packets
are to be retransmitted in response to receipt of an old ack, and then only if
the ack is for the previously transmitted data packet.  Acks are retransmitted
ONLY when the retransmit timer expires.  I believe that this is the way the
Berkeley tftp currently (and correctly) behaves, and hence that Ron's "bug fix"
is in fact unnecessary and incorrect.
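The rule described above can be sketched as two event handlers in C. This is one reading of the text, not the spec's wording. "The ack is for the previously transmitted data packet" is taken to mean an ack for the block just before the one currently outstanding, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum { OP_DATA = 3, OP_ACK = 4, OP_ERROR = 5 };	/* RFC783 opcodes */

/* Possible reactions to an event (hypothetical names). */
enum action { IGNORE, ACCEPT, SEND_NEXT, RESEND_LAST, ABORT };

/*
 * Receiving side (sends acks): an ack is resent only when the
 * retransmit timer expires; duplicate DATA is simply dropped.
 */
enum action
receiver_event(bool timer_expired, uint16_t opcode, uint16_t block,
    uint16_t expected)
{
	if (timer_expired)
		return RESEND_LAST;
	if (opcode == OP_ERROR)
		return ABORT;
	if (opcode == OP_DATA && block == expected)
		return ACCEPT;
	return IGNORE;
}

/*
 * Sending side (sends DATA): a fresh ack advances the transfer; an old
 * ack triggers a resend only if it is for the previous block.
 */
enum action
sender_event(bool timer_expired, uint16_t opcode, uint16_t block,
    uint16_t current)
{
	if (timer_expired)
		return RESEND_LAST;
	if (opcode == OP_ERROR)
		return ABORT;
	if (opcode == OP_ACK && block == current)
		return SEND_NEXT;
	if (opcode == OP_ACK && block == (uint16_t)(current - 1))
		return RESEND_LAST;
	return IGNORE;
}
```

The key difference from Ron's patch is the final `return IGNORE` in `receiver_event`. A duplicate DATA packet produces no ack, so the sender's resend-on-old-ack rule has nothing to react to unless the receiver's own timer fires.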

There are several other problems with the 4.2bsd tftp as distributed, including:
    1) No support for netascii mode.
    2) Relies on signals breaking through read() calls; this no longer
       happens in 4.2 (instead the read() call is restarted after a signal).
    3) Uses the same buffer for transmit and receive, thereby clobbering the
       packet to be retransmitted if an old packet arrives.
    4) Several other problems in the retransmit code.
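The following C sketch shows one way around problems 2 and 3. It uses a swapped-in technique, not the 4.2bsd or MIT code. It waits with select(), which takes an explicit timeout, instead of counting on SIGALRM to break out of a blocked read. It also keeps separate transmit and receive buffers. `sbuf` and `rbuf` echo the names in the diff above; `recv_timeout` and `PKTSIZE` are hypothetical. The 516-byte size is the 4-byte TFTP header plus 512 data bytes.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

#define PKTSIZE 516	/* 4-byte TFTP header + 512 data bytes */

/* Separate buffers (problem 3): a stray packet read into rbuf can no
 * longer clobber the packet in sbuf that may need retransmitting. */
char sbuf[PKTSIZE];
char rbuf[PKTSIZE];

/*
 * Wait at most `secs` seconds for a datagram on socket `f` (problem 2).
 * select() has its own timeout, so nothing depends on a signal
 * interrupting a blocked recvfrom().
 * Returns the datagram length, 0 on timeout (or an empty datagram,
 * which is not a valid TFTP packet either), or -1 on error.
 */
ssize_t
recv_timeout(int f, char *buf, size_t len, int secs,
    struct sockaddr *from, socklen_t *fromlen)
{
	fd_set rfds;
	struct timeval tv;
	int r;

	FD_ZERO(&rfds);
	FD_SET(f, &rfds);
	tv.tv_sec = secs;
	tv.tv_usec = 0;

	r = select(f + 1, &rfds, NULL, NULL, &tv);
	if (r < 0)
		return -1;	/* includes EINTR; caller may retry */
	if (r == 0)
		return 0;	/* timed out: caller resends sbuf */
	return recvfrom(f, buf, len, 0, from, fromlen);
}
```

On a timeout, the caller would resend the packet still held in `sbuf`. Because stray packets land in `rbuf`, that copy is intact.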

I believe that the PC/IP people at MIT are shipping a completely new tftp implementation
for 4.2bsd as part of their PC/IP package.  I suggest contacting John Romkey at MIT
(romkey@mit-borax.mit.edu, I think) for further information.
                                                -Larry Allen
                                                 Apollo Computer

stanonik@NPRDC.ARPA (Ron Stanonik) (10/18/85)

Thanks.  You refer to a "current tftp spec" which says "Acks
are retransmitted ONLY when the retransmit timer expires".
The most recent tftp spec I'm aware of is rfc783 which says
"If a packet gets lost in the network, the intended recipient
will timeout and may retransmit his last packet (which may be
data or an acknowledgement)".  Is there some later tftp spec?
Where?

Also, the problem we encountered was that 4.2bsd tftp never
retransmitted the ack.  It didn't retransmit in response to
repeated data packets, and it didn't time out because each
repeated data packet reset the timer.  Given that "acks are only
retransmitted when the timer expires", the real problem appears
to be an incorrectly reset timer.
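Here is a toy C model of that timer problem, with hypothetical names. Duplicate DATA packets arrive every `gap` seconds, and `gap` is assumed to be shorter than the ack's retransmit interval `rexmt`. If each arrival restarts the timer, the retransmission never happens. If the deadline is fixed when the ack is sent, it fires on schedule.

```c
#include <stdbool.h>

/*
 * The ack is sent at t = 0; duplicate DATA arrives at t = gap, 2*gap, ...
 * Returns the time of the first ack retransmission, or -1 if none
 * happens before `horizon`.  If `rearm_on_packet` is true, every
 * arrival restarts the timer for a full interval.
 */
long
first_rexmt(long rexmt, long gap, long horizon, bool rearm_on_packet)
{
	long deadline = rexmt;

	for (long t = gap; t < horizon; t += gap) {
		if (t >= deadline)
			return deadline;	/* timer fired first */
		if (rearm_on_packet)
			deadline = t + rexmt;	/* duplicate resets timer */
	}
	return deadline < horizon ? deadline : -1;
}
```

With `rexmt` = 5 and `gap` = 2, re-arming on every packet starves the timer for as long as duplicates keep coming. A deadline fixed at send time still fires at t = 5.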

Yep, we're bagging 4.2bsd's tftp, mostly.  A couple of PCs here
are still running a version of tftp that assumes 4.2bsd's damaged
netascii.

Thanks again,

Ron
stanonik@nprdc.arpa