stanonik@NPRDC.ARPA (Ron Stanonik) (10/01/85)
Description:
  A RRQ on receiving a duplicate data packet doesn't retransmit the last ack.

Repeat-By:
  This would happen intermittently between our vax and a pc, but the problem can be reproduced by hacking tftpd.c to not advance its block count and then tftp'ing to yourself.

Fix:
  Move the retransmit code into the inner loop in recvfile(). This actually causes tftp to retransmit on receiving anything but the next expected packet or an error packet. I believe that's in keeping with RFC783, but at any rate it makes tftp "generous in what it accepts". We haven't really observed the corresponding WRQ problem with duplicate acks, but the logic is the same, so we fixed(?) it too.

  Oh, the diff will probably only make sense if you've already installed the fixes from mogul@gregorio and satz@joyce.

Ron Stanonik
stanonik@nprdc.arpa

RCS file: RCS/tftp.c,v
retrieving revision 1.3
diff -c -r1.3 tftp.c
*** /tmp/,RCSt1001512 Tue Oct 1 09:43:20 1985
--- tftp.c Tue Oct 1 09:43:00 1985
***************
*** 73,85
  }
  timeout = 0;
  (void) setjmp(timeoutbuf);
- if (trace)
-     tpacket("sent", stp, size + 4);
- n = sendto(f, sbuf, size + 4, 0, (caddr_t)&sin, sizeof (sin));
- if (n != size + 4) {
-     perror("tftp: sendto");
-     goto abort;
- }
  do {
      alarm(rexmtval);
      do {
--- 73,78 -----
  }
  timeout = 0;
  (void) setjmp(timeoutbuf);
  do {
      if (trace)
          tpacket("sent", stp, size + 4);
***************
*** 81,86
      goto abort;
  }
  do {
      alarm(rexmtval);
      do {
          fromlen = sizeof (from);
--- 74,86 -----
  timeout = 0;
  (void) setjmp(timeoutbuf);
  do {
+     if (trace)
+         tpacket("sent", stp, size + 4);
+     n = sendto(f, sbuf, size + 4, 0, (caddr_t)&sin, sizeof (sin));
+     if (n != size + 4) {
+         perror("tftp: sendto");
+         goto abort;
+     }
      alarm(rexmtval);
      do {
          fromlen = sizeof (from);
***************
*** 144,157
  }
  timeout = 0;
  (void) setjmp(timeoutbuf);
- if (trace)
-     tpacket("sent", stp, size);
- if (sendto(f, sbuf, size, 0, (caddr_t)&sin,
-     sizeof (sin)) != size) {
-     alarm(0);
-     perror("tftp: sendto");
-     goto abort;
- }
  do {
      alarm(rexmtval);
      do
--- 144,149 -----
  }
  timeout = 0;
  (void) setjmp(timeoutbuf);
  do {
      if (trace)
          tpacket("sent", stp, size);
***************
*** 153,158
      goto abort;
  }
  do {
      alarm(rexmtval);
      do
          n = recvfrom(f, rbuf, sizeof (rbuf), 0,
--- 145,158 -----
  timeout = 0;
  (void) setjmp(timeoutbuf);
  do {
+     if (trace)
+         tpacket("sent", stp, size);
+     if (sendto(f, sbuf, size, 0, (caddr_t)&sin,
+         sizeof (sin)) != size) {
+         alarm(0);
+         perror("tftp: sendto");
+         goto abort;
+     }
      alarm(rexmtval);
      do
          n = recvfrom(f, rbuf, sizeof (rbuf), 0,
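The behaviour this patch aims for on the receiving side ("retransmit on receiving anything but the next expected packet or an error packet") can be written down as a small decision function. This is only an illustrative sketch, not code from tftp.c: the names recv_action, ACCEPT_BLOCK, ABORT_TRANSFER and RESEND_LAST are hypothetical, and only the opcode numbers come from RFC 783.

```c
#include <assert.h>

/* TFTP opcodes as numbered in RFC 783. */
enum { OP_RRQ = 1, OP_WRQ = 2, OP_DATA = 3, OP_ACK = 4, OP_ERROR = 5 };

/* Hypothetical names for the three outcomes the patch distinguishes. */
enum action { ACCEPT_BLOCK, ABORT_TRANSFER, RESEND_LAST };

/*
 * Decide what a receiver following the patched logic does with an
 * incoming packet while it is waiting for data block `expected`.
 */
static enum action recv_action(int opcode, int block, int expected)
{
    assert(expected >= 1);              /* TFTP data blocks start at 1 */

    if (opcode == OP_ERROR)
        return ABORT_TRANSFER;          /* error packet: give up */
    if (opcode == OP_DATA && block == expected)
        return ACCEPT_BLOCK;            /* the block we were waiting for */
    return RESEND_LAST;                 /* anything else: resend last packet */
}
```

Under this sketch a duplicate of block 2 arriving while block 3 is expected yields RESEND_LAST, which is exactly the case the report describes.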
lwa@apollo.UUCP (10/13/85)
THAT IS NOT A BUG!!

The ORIGINAL tftp spec required retransmission of acks, as well as of data packets, upon receiving any old packet. This algorithm was shown to be faulty by Michael Greenwald at MIT. It has the following problem:

Suppose host A is talking to host B. Suppose further that host A's retransmit timeout is too short (a common case, since there is no good way to determine an initial retransmit timeout). Now observe what happens:

  Host A sends packet 1.
  Host A's retransmit timer goes off, and host A retransmits packet 1.
  Host B receives packet 1, sends ack 1.
  Host A receives ack 1, sends packet 2.
  Host B receives retransmitted packet 1, retransmits ack 1.
  Host A receives retransmitted ack 1, retransmits packet 2.
  Host B receives packet 2, sends ack 2.
  Host A receives ack 2, sends packet 3.
  Host B receives retransmitted packet 2, retransmits ack 2.
  Host A receives retransmitted ack 2, retransmits packet 3.
  . . .

Note that what has now happened is that every tftp packet is being transmitted twice. Furthermore, if Host A's retransmit timer goes off too early again, every packet will be transmitted three times, and so forth. This quickly causes tftp performance to degrade to zero, and connections eventually time out.

The current tftp spec avoids this problem by specifying that only data packets are to be retransmitted in response to receipt of an old ack, and then only if the ack is for the previously transmitted data packet. Acks are retransmitted ONLY when the retransmit timer expires. I believe that this is the way the Berkeley tftp currently (and correctly) behaves, and hence that Ron's "bug fix" is in fact unnecessary and incorrect.

There are several other problems with the 4.2bsd tftp as distributed, including:

  1) No support for netascii mode.
  2) Relies on signals breaking through read() calls; this no longer happens in 4.2 (instead the read() call is restarted after a signal).
  3) Uses the same buffer for transmit and receive, thereby clobbering the packet to be retransmitted if an old packet arrives.
  4) Several other problems in the retransmit code.

I believe that the PC/IP people at MIT are shipping a completely new tftp implementation for 4.2bsd as part of their PC/IP package. I suggest contacting John Romkey at MIT (romkey@mit-borax.mit.edu, I think) for further information.

-Larry Allen
Apollo Computer
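The exchange between host A and host B described above can be checked with a toy simulation. The sketch below is an assumption-laden model, not tftp code: it uses a single FIFO for all packets in flight (so the interleaving differs slightly from the trace), it lets exactly one premature timeout happen on block 1, and the names simulate, push and reack_duplicates are made up for illustration. The sender follows the rule quoted in the post (retransmit the current block on an old ack for the previous block); the receiver either re-acks duplicate data or silently drops it.

```c
#include <assert.h>

#define MAX_EVENTS 1024

enum kind { DATA, ACK };
struct pkt { enum kind kind; int block; };

/* Append one packet to the in-flight FIFO. */
static void push(struct pkt *q, int *tail, enum kind kind, int block)
{
    assert(*tail < MAX_EVENTS);         /* toy model: fixed capacity */
    q[*tail].kind = kind;
    q[*tail].block = block;
    (*tail)++;
}

/*
 * Return the number of DATA packets host A puts on the wire to move
 * `nblocks` blocks, given that block 1 is sent twice because of one
 * premature timeout. `reack_duplicates` selects host B's policy.
 */
static int simulate(int nblocks, int reack_duplicates)
{
    struct pkt q[MAX_EVENTS];
    int head = 0, tail = 0;
    int data_sent = 0;
    int current = 1;                    /* A: block awaiting its ack */
    int expected = 1;                   /* B: next block wanted */

    push(q, &tail, DATA, 1);            /* original transmission */
    push(q, &tail, DATA, 1);            /* premature retransmission */
    data_sent = 2;

    while (head < tail) {
        struct pkt p = q[head++];

        if (p.kind == DATA) {           /* delivered to host B */
            if (p.block == expected) {
                expected++;
                push(q, &tail, ACK, p.block);
            } else if (p.block == expected - 1 && reack_duplicates) {
                push(q, &tail, ACK, p.block);   /* the faulty re-ack */
            }
        } else {                        /* delivered to host A */
            if (p.block == current) {
                current++;
                if (current <= nblocks) {
                    push(q, &tail, DATA, current);
                    data_sent++;
                }
            } else if (p.block == current - 1 && current <= nblocks) {
                push(q, &tail, DATA, current);  /* old ack: resend data */
                data_sent++;
            }
        }
    }
    return data_sent;
}
```

In this model simulate(10, 1) returns 20, i.e. every block crosses the wire twice, while simulate(10, 0) returns 11: only the one premature retransmission is wasted once host B stops re-acking duplicates.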
stanonik@NPRDC.ARPA (Ron Stanonik) (10/18/85)
Thanks.

You refer to a "current tftp spec" which says "Acks are retransmitted ONLY when the retransmit timer expires". The most recent tftp spec I'm aware of is RFC783, which says "If a packet gets lost in the network, the intended recipient will timeout and may retransmit his last packet (which may be data or an acknowledgement)". Is there some later tftp spec? Where?

Also, the problem we encountered was that 4.2bsd tftp never retransmitted the ack. It didn't retransmit in response to repeated data packets, and it didn't time out because the data packets reset the timer. Given that "acks are only retransmitted when the timer expires", I can see that the problem appears to be an incorrectly reset timer.

Yep, we're bagging 4.2bsd's tftp, mostly. A couple of pc's here are still running a version of tftp assuming 4.2bsd's damaged netascii.

Thanks again,

Ron
stanonik@nprdc.arpa
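The timer problem described here (duplicate data packets re-arming the timer so that it never expires) can be illustrated with a toy model in whole seconds. This is a sketch under stated assumptions rather than the 4.2bsd code: the function first_ack_rexmt and its parameters are hypothetical, a duplicate data packet is assumed to arrive every dup_interval seconds, and the ack is assumed to have been sent at time 0.

```c
#include <assert.h>

/*
 * Return the time (in seconds) at which the receiver's retransmit
 * timer first expires and the ack is resent, or -1 if that never
 * happens within `horizon` seconds.
 *
 * rearm_on_duplicate != 0 models a receiver that calls alarm(rexmtval)
 * again every time a duplicate data packet arrives; 0 models a
 * receiver that leaves the original deadline alone.
 */
static int first_ack_rexmt(int rexmtval, int dup_interval, int horizon,
                           int rearm_on_duplicate)
{
    int deadline, t;

    assert(rexmtval > 0 && dup_interval > 0);

    deadline = rexmtval;                /* timer armed when ack sent at t = 0 */
    for (t = 1; t <= horizon; t++) {
        if (t >= deadline)
            return t;                   /* timer expired: ack is resent */
        if (t % dup_interval == 0 && rearm_on_duplicate)
            deadline = t + rexmtval;    /* duplicate pushed the deadline out */
    }
    return -1;                          /* timer never fired */
}
```

With a 5 second timeout and a duplicate every 3 seconds, first_ack_rexmt(5, 3, 60, 1) returns -1 (the ack is never resent), whereas first_ack_rexmt(5, 3, 60, 0) returns 5. Keeping the deadline fixed until the expected block arrives is one way to get "acks are retransmitted only when the timer expires" without starving the timer.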