JBVB@AI.AI.MIT.EDU ("James B. VanBokkelen") (10/28/87)
One has to be at least a little suspicious of layering violations, if for no other reason than that if you go blithely installing hooks and odd interdependencies, you will wind up with a tangled morass of code that can't be enhanced, or ported, or even maintained. Layering usually exists because the designers wanted modularity, and relatively clean interface specifications on module boundaries.

Handling RIF-cache flush on TCP timeouts means adding another hook, and dummy routines to the other low-level routing layers. Maybe some of the other routing layers could use it, too. It certainly represents more code, and more complexity (and work, and money for initial purchase and support).

Even something like ICMP illustrates this: Essentially all TCPs will return an error to their caller when they receive a Reset. An ICMP Destination Unreachable message implies much the same thing, but many TCP/IPs won't return an error to the caller. I won't defend this, but I certainly understand why: Handling ICMP Destination Unreachable requires a 2nd, parallel demultiplexing path through IP and into the TCP, and it is not absolutely required during the initial rush to get on the air.

A seer employed by one large network user I know of has pronounced that 1 Mb of memory will be necessary to implement ISO. I don't know if he/she/it is right, but the pronouncement certainly made more than one manufacturer jump...

jbvb
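To make the "2nd, parallel demultiplexing path" concrete, here is a minimal sketch of what such a path has to do: take an ICMP Destination Unreachable, dig the echoed IP header and first 8 bytes of TCP header out of its body, and look up the affected connection. The function name, the shape of the `connections` table, and the decision to return the ICMP code are assumptions made for illustration; they are not taken from any implementation discussed in this thread.

```python
import struct

ICMP_DEST_UNREACH = 3   # ICMP type for Destination Unreachable
IPPROTO_TCP = 6         # IP protocol number for TCP


def demux_icmp_unreachable(icmp_msg, connections):
    """Find the TCP connection that an ICMP Destination Unreachable refers to.

    icmp_msg    -- bytes of the ICMP message, starting at the ICMP type field
    connections -- hypothetical table keyed by
                   (local_ip, local_port, remote_ip, remote_port)
    Returns (connection, icmp_code), or None if nothing matches.
    """
    # 8 bytes of ICMP header + at least a 20-byte IP header + 8 bytes of TCP.
    if len(icmp_msg) < 36 or icmp_msg[0] != ICMP_DEST_UNREACH:
        return None
    icmp_code = icmp_msg[1]

    # The ICMP body echoes the offending datagram: IP header + first 8 bytes.
    inner = icmp_msg[8:]
    ihl = (inner[0] & 0x0F) * 4          # inner IP header length in bytes
    if ihl < 20 or len(inner) < ihl + 8 or inner[9] != IPPROTO_TCP:
        return None

    # We sent the echoed datagram, so its source is our end of the connection.
    local_ip, remote_ip = inner[12:16], inner[16:20]
    local_port, remote_port = struct.unpack("!HH", inner[ihl:ihl + 4])

    conn = connections.get((local_ip, local_port, remote_ip, remote_port))
    return None if conn is None else (conn, icmp_code)
```

Note that the normal receive path keys on the addresses in the outer IP header, while this path has to key on the echoed inner header, which is why it ends up as separate code.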
karn@faline.bellcore.com (Phil R. Karn) (10/29/87)
> Even something like ICMP illustrates this: Essentially all TCPs will
> return an error to their caller when they receive a Reset. An ICMP
> Destination Unreachable message implies much the same thing, but many
> TCP/IPs won't return an error to the caller....

There is something to be said for this. ICMP unreachable messages are often generated under transitory conditions. Many TCPs (e.g., BSD UNIX) bomb out when they get an ICMP unreachable message, even if a connection has already handled many packets successfully. I find this *most* annoying; TCP is supposed to provide reliable end-to-end communications, not bomb out at the slightest provocation from the underlying Internet.

There are basically two classes of network applications: interactive users and automatic daemons. An interactive network command should always leave it up to the user to decide whether he or she wants to give up or be patient and see if the problem goes away. An automatic daemon is much more patient, so as long as the TCP is careful to avoid wasting network resources (e.g., by backing off) I see little reason it shouldn't just keep trying "forever".

ICMP unreachable messages are very useful debugging tools (or at least they would be if they actually contained the source address of the complaining gateway instead of some nonsense IP address like that of the original destination). But they shouldn't affect TCP's operation without the consent of the user.

Even worse are TCP "keep alive" timers. Not only do they waste network resources, as far as I'm concerned they serve no useful purpose at all. If I have a long-idle TCP connection, why should I care if the path goes down temporarily? I certainly don't want my connection aborted because of it.

Phil
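The "backing off" idea can be sketched as a retransmission schedule that doubles its interval up to a ceiling and then keeps trying at that rate indefinitely. The initial interval, the doubling factor, and the 64-second ceiling are illustrative assumptions, not values taken from any TCP mentioned in this thread.

```python
import itertools


def backoff_intervals(initial=1.0, factor=2.0, ceiling=64.0):
    """Yield retransmission intervals in seconds, forever.

    Each interval is `factor` times the previous one until `ceiling` is
    reached; after that the sender keeps trying at the ceiling rate, so an
    idle path costs only one packet per `ceiling` seconds.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, ceiling)


# First eight intervals a patient daemon would wait between attempts.
first_eight = list(itertools.islice(backoff_intervals(), 8))
```

With these assumed defaults the schedule is 1, 2, 4, 8, 16, 32, 64, 64 seconds, and it never terminates on its own; giving up is left to whoever is consuming the generator.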
CLYNN@G.BBN.COM.UUCP (10/29/87)
Phil,

ICMP messages DO contain an IP source address of the gateway which sent the message - look in the IP header's Source Address field. If the available software does not include such basic information when the ICMP message is passed by IP to the ICMP "layer", complain to the vendor. I would also complain if the subnet address were not passed up, or if the info wasn't available at the TCP or higher layers; I hope that such additional information doesn't fall into the category which is the subject of this message.

While you may find reset connections "annoying", I suspect that others might add it to their list of "denial of service attacks". Remember ... Validate your input before you process it!
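One possible form of "validate your input" for ICMP errors, offered only as a sketch: before letting an ICMP message influence a connection, check that the TCP sequence number echoed in its body refers to data the connection has actually sent and not yet had acknowledged. This particular check is an assumption chosen for illustration; the message above does not say which validation its author had in mind. The names `snd_una` and `snd_nxt` follow the usual TCP send-sequence variables.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def icmp_refers_to_outstanding_data(echoed_seq, snd_una, snd_nxt):
    """Return True if echoed_seq lies in [snd_una, snd_nxt), with wraparound.

    echoed_seq -- sequence number copied from the TCP header inside the ICMP body
    snd_una    -- oldest unacknowledged sequence number on the connection
    snd_nxt    -- next sequence number the connection will send
    """
    outstanding = (snd_nxt - snd_una) % SEQ_MOD   # bytes currently in flight
    offset = (echoed_seq - snd_una) % SEQ_MOD     # distance past snd_una
    return offset < outstanding
```

A message that fails the check cannot have been triggered by a segment still in flight, so an implementation could reasonably discard it without touching the connection.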
karn@FALINE.BELLCORE.COM (Phil R. Karn) (10/29/87)
>Phil, ICMP messages DO contain an IP source address of the gateway
>which sent the message - look in the IP header's Source Address field.

Yes, I know they're *supposed* to. I meant to say that I've seen many gateways use the destination IP address of the original datagram as the source address for the IP datagram containing the ICMP message, and this makes it impossible to discern where the problem is.

One implementation around here returns every broadcast it sees with a "port unreachable" ICMP message and puts the IP broadcast address in the IP source field!

Phil
karels@OKEEFFE.BERKELEY.EDU (Mike Karels) (11/01/87)
Two comments on your recent message.

First, about TCP behavior when ICMP unreachables are received: I definitely agree that TCP ought not to quit when it receives an unreachable. However, in Unix and probably most other systems, it's hard to report "soft" errors to a network client. In 4.3, I chose to return a single error on the next send or receive, but the TCP connection remains open. Unfortunately, most network applications carefully check for errors on each send/receive, and they give up on the first error. (4.2 aborted the connection when ICMP errors were received, and thus the application had no chance to keep trying.)

I also agree that you're right to distinguish between interactive network users and automatic daemons. However, it's precisely for the daemons that are willing to wait patiently forever that "keep alive" messages are needed. Although the telnet client will give up and close the connection manually, there needs to be a way to prevent systems from accumulating useless, disconnected telnet servers and other such trash. Most application-level programs don't have their own keep-alive or are-you-there to detect network failure. For those reasons, we use TCP-level keepalives (which are also not well provided-for at this level) only on network servers that don't have their own time-out scheme.

Mike
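The 4.3 behavior described above (report one error on the next send or receive, leave the connection open) can be modeled in a few lines. This is a toy model under stated assumptions, not BSD code: the class names, the exception type, and the `patient_send` helper are all hypothetical, and the "connection" just records what was sent.

```python
class SoftNetworkError(OSError):
    """Hypothetical exception for a one-shot, non-fatal network error."""


class Connection:
    def __init__(self):
        self.open = True          # stays True across soft errors
        self.soft_error = None    # pending ICMP code, reported at most once
        self.sent = []

    def note_icmp_unreachable(self, code):
        # Remember the error for the next send, but do not abort.
        self.soft_error = code

    def send(self, data):
        if self.soft_error is not None:
            code, self.soft_error = self.soft_error, None
            raise SoftNetworkError(f"ICMP unreachable, code {code}")
        self.sent.append(data)


def patient_send(conn, data, attempts=3):
    """An application that retries instead of giving up on the first error."""
    for _ in range(attempts):
        try:
            conn.send(data)
            return True
        except SoftNetworkError:
            continue
    return False
```

An application that treats the first exception from `send` as fatal gets the same outcome as under 4.2, which is the complaint in the message above; `patient_send` shows the retry that the 4.3 scheme makes possible.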
braden@VENERA.ISI.EDU (11/02/87)
> Yes, I know they're *supposed* to. I meant to say that I've seen many gateways use the destination IP address of the original datagram as the source address for the IP datagram containing the ICMP message, and this makes it impossible to discern where the problem is.
>
> One implementation around here returns every broadcast it sees with a "port unreachable" ICMP message and puts the IP broadcast address in the IP source field!
>
> Phil

Phil, I wish you would name names. Being coy about whose box is screwing up isn't doing anyone a favor. Since there is no Internet Conformance testing service, we have to collaborate as a group to "encourage" conformance from the vendors.

Bob Braden
PADLIPSKY@A.ISI.EDU (Michael Padlipsky) (11/02/87)
Phil--

A sidelight on keeping connections open "forever" seems appropriate, just in case anybody doesn't attach enough strength to the quotation marks you rightly used:

In the early days of the Multics "NCP" [sic], we discovered that we were sending "RST"s (the old Host-Host Protocol Reset command, which was sent whenever an NCP came back up, to "everybody" --well, you could do that sort of thing when there weren't four dozen Hosts in the world) without end to a particular Host. It turned out the problem was that we were getting "Incomplete Transmission" from our IMP, so we tried again, since that code was supposed to mean that a temporary problem had prevented successful transmission; however, the Host in question had somehow jumpered their IMP interface in such a fashion as to convince their IMP that they really were up when they weren't and so we got the code in a circumstance where we really shouldn't have. Naturally, we put a limit on the retransmissions after an Incomplete Transmission was encountered after that (and we probably should have had one in the first place).

The moral does seem worth pointing out, though: keep connections open for appropriately small values of forever. (For example, if you happen to get a Host Down, you might as well close even if you're only a daemon, since the other side should come up again out of Sequence Number synch--shouldn't it?)

cheers, map
-------
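The fix described above, a cap on retransmissions after an Incomplete Transmission, amounts to a bounded retry loop. The sketch below assumes a hypothetical `transmit` callable that returns the string "ok" or "incomplete"; the status strings and the default limit of five retries are illustrative and do not come from the Multics NCP.

```python
def send_with_retry_limit(transmit, max_retries=5):
    """Call transmit() until it reports "ok" or the retry limit is reached.

    transmit    -- hypothetical callable returning "ok" or "incomplete"
    max_retries -- retransmissions allowed after the first attempt
    Returns the number of transmissions used on success, or None on giving up.
    """
    for attempt in range(1 + max_retries):
        if transmit() == "ok":
            return attempt + 1
    # "Appropriately small values of forever": stop instead of looping.
    return None
```

Against a peer whose interface always answers "incomplete", this makes six attempts and stops, where an unbounded loop would send without end.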
karn@faline.bellcore.com (Phil R. Karn) (11/03/87)
In article <8711010759.AA08856@okeeffe.Berkeley.EDU>, karels@OKEEFFE.BERKELEY.EDU (Mike Karels) writes:
> Two comments on your recent message. First, about TCP behavior
> when ICMP unreachables are received: I definitely agree that TCP
> ought not to quit when it receives an unreachable. However, in Unix
> and probably most other systems, it's hard to report "soft" errors
> to a network client. In 4.3, I chose to return a single error
> on the next send or receive, but the TCP connection remains open.
> Unfortunately, most network applications carefully check for errors
> on each send/receive, and they give up on the first error.
> (4.2 aborted the connection when ICMP errors were received,
> and thus the application had no chance to keep trying.)

TCP doesn't return an error to the application when it retransmits, so why should it do so when it receives a sporadic ICMP unreachable message? I think a better approach would be for TCP to ignore these messages, except to keep the last one or two around in case the application specifically wanted to see them (e.g., by doing a special ioctl on the socket).

> I also agree that you're right to distinguish between interactive
> network users and automatic daemons. However, it's precisely for
> the daemons that are willing to wait patiently forever that "keep alive"
> messages are needed. Although the telnet client will give up and close
> the connection manually, there needs to be a way to prevent systems
> from accumulating useless, disconnected telnet servers and other such
> trash. Most application-level programs don't have their own keep-alive
> or are-you-there to detect network failure. For those reasons, we use
> TCP-level keepalives (which are also not well provided-for at this level)
> only on network servers that don't have their own time-out scheme.

I strongly disagree that this should be done at the TCP level. I took keepalives out of most of our systems some time ago. It's really nice not to have to recreate a half dozen rlogin windows on my Sun each time my SLIP link drops and has to be redialed. It's also nice not to have a steady stream of useless traffic on my amateur packet radio channel when somebody logs in but remains idle for long periods.

The only way you accumulate useless, disconnected telnet servers is when the client machines crash. If you *really* want to get rid of them, just have a shell script do a write to anybody idle for more than X days -- the data will trigger a TCP Reset which will close the connection for you. On the other hand, while they are aesthetically unpleasing, idle sessions really don't hurt anything -- main memory is cheap and paging memory even cheaper.

A year or two ago I thought as you do on keepalives, but a discussion with Jon Postel turned me around. :-)

Phil
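The proposal to ignore ICMP unreachables but "keep the last one or two around" for applications that ask can be sketched with a small bounded buffer. Everything here is a toy model under stated assumptions: the class and method names are hypothetical, and `last_icmp_errors` merely stands in for the "special ioctl" mentioned above; no such call is claimed to exist in any real socket API.

```python
from collections import deque


class QuietConnection:
    """Connection that records ICMP unreachables without acting on them."""

    def __init__(self, keep=2):
        self.open = True
        # A bounded deque silently discards the oldest entry when full.
        self.recent_icmp = deque(maxlen=keep)

    def note_icmp_unreachable(self, code, reporter):
        # No error is raised and the connection is not closed.
        self.recent_icmp.append((code, reporter))

    def last_icmp_errors(self):
        """Stand-in for the hypothetical ioctl: oldest first, newest last."""
        return list(self.recent_icmp)
```

Ordinary sends and receives never see these messages; only an application that explicitly calls `last_icmp_errors` learns that the path has been complaining.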
PADLIPSKY@A.ISI.EDU (Michael Padlipsky) (11/06/87)
Although you probably make a valuable distinction (given confirmation that X.25 actually does have a "Host Down" return--I remember something like Virtual Circuit Failure, or the like, myself), I was not, of course, talking about X.25 explicitly. (Nor was I talking about daemons explicitly.) Always delighted to learn of still another X.25 faux pas, though. (I'd confirm the Host Down question myself, but I don't have an X.25 spec handy, even though I acknowledge that it's a seminal fascicle.)

Just so the principle doesn't get lost in the worrying over the example, let me rephrase: Just as there's no need to close connections before their time, there's no need to keep them open beyond their time. Good judgment is expected to be applied to the issue of what time it is in the life of given connections in given contexts. OK?

cheers, map

P.S. Maybe it's a quibble, but wouldn't X.25 call it DTE Down, anyway? (Or is it DCE? Well, DxE, at least.)
-------
g-tasman@GUMBY.WISC.EDU (11/08/87)
You suggest that if a "host down" indication is received, a daemon should immediately close the associated TCP connection. With an 1822 Distant Host connection to DDN, this may be a fairly reasonable approach. However, a typical DDN connection of late has been X.25 or HDH. Here, "host down" may have a more transitory meaning: simply that there was noise on the host access line. The remote host may well reappear with all TCP connections intact.

Consider in particular the case of a Telnet server. If connections are cleared prematurely/incorrectly, extremely annoyed users will result. On the other hand, I understand all too well the importance of eventually detecting and closing "half-open" connections which result from an actual crash (since these will eventually inhibit new remote terminal sessions).

The issue of distinguishing between a dead host and an "unhealthy" host access line is likely to become increasingly serious over time, as more DDN hosts switch to synchronous access protocols. For a network client, remote host status can simply be reported to the (human) user. For a server, however, I don't see a straightforward solution.

Mitchell Tasman