[comp.protocols.tcp-ip] ..."layering violations"

JBVB@AI.AI.MIT.EDU ("James B. VanBokkelen") (10/28/87)

One has to be at least a little suspicious of layering violations, if
for no other reason than that if you go blithely installing hooks and odd
interdependencies, you will wind up with a tangled morass of code that
can't be enhanced, or ported, or even maintained.  Layering usually
exists because the designers wanted modularity, and relatively clean
interface specifications on module boundaries.

Handling RIF-cache flush on TCP timeouts means adding another hook,
and dummy routines to the other low-level routing layers.  Maybe
some of the other routing layers could use it, too.  It certainly
represents more code, and more complexity (and work, and money for
initial purchase and support).

Even something like ICMP illustrates this:  Essentially all TCPs will
return an error to their caller when they receive a Reset.  An ICMP
Destination Unreachable message implies much the same thing, but many
TCP/IPs won't return an error to the caller.  I won't defend this, but
I certainly understand why:  Handling ICMP Destination Unreachable requires
a 2nd, parallel demultiplexing path through IP and into the TCP, and it
is not absolutely required during the initial rush to get on the air.
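
As a rough illustration of that second demultiplexing path, here is a
minimal Python sketch. It assumes the standard ICMP Destination
Unreachable layout (8-byte ICMP header, then the offending datagram's IP
header plus its first 8 data bytes). The function name and the
`connections` table are hypothetical, not taken from any particular
TCP/IP implementation.

```python
import struct

ICMP_DEST_UNREACH = 3   # ICMP type: Destination Unreachable
IPPROTO_TCP = 6         # IP protocol number for TCP


def demux_icmp_unreachable(icmp_bytes, connections):
    """Return the connection an ICMP Destination Unreachable refers to, or None.

    `connections` is a hypothetical table keyed by
    (local_addr, local_port, remote_addr, remote_port), with addresses
    stored as 4-byte strings.
    """
    if len(icmp_bytes) < 8 + 20 + 8:
        return None                        # too short to carry the quoted datagram
    icmp_type, _code = struct.unpack("!BB", icmp_bytes[:2])
    if icmp_type != ICMP_DEST_UNREACH:
        return None
    inner = icmp_bytes[8:]                 # quoted IP header + first 8 data bytes
    ihl = (inner[0] & 0x0F) * 4            # quoted IP header length in bytes
    if ihl < 20 or len(inner) < ihl + 4 or inner[9] != IPPROTO_TCP:
        return None
    # The quoted datagram is one we sent, so its source is our own address.
    src, dst = inner[12:16], inner[16:20]
    sport, dport = struct.unpack("!HH", inner[ihl:ihl + 4])
    return connections.get((src, sport, dst, dport))
```

The point of the sketch is only that the lookup key has to be dug out of
the quoted datagram rather than the arriving one, which is why it ends up
as a separate path from normal segment input.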

A seer employed by one large network user I know of has pronounced that
1 Mb of memory will be necessary to implement ISO.  I don't know if he/she/it
is right, but the pronouncement certainly made more than one manufacturer
jump...

jbvb

karn@faline.bellcore.com (Phil R. Karn) (10/29/87)

> Even something like ICMP illustrates this:  Essentially all TCPs will
> return an error to their caller when they receive a Reset.  An ICMP
> Destination Unreachable message implies much the same thing, but many
> TCP/IPs won't return an error to the caller....

There is something to be said for this. ICMP unreachable messages are
often generated under transitory conditions.  Many TCPs (e.g., BSD UNIX)
bomb out when they get an ICMP unreachable message, even if a connection
has already handled many packets successfully. I find this *most*
annoying; TCP is supposed to provide reliable end-to-end communications,
not bomb out at the slightest provocation from the underlying Internet.

There are basically two classes of network applications: interactive
users and automatic daemons.  An interactive network command should always
leave it up to the user to decide whether he or she wants to give up or
be patient and see if the problem goes away. An automatic daemon is much
more patient, so as long as the TCP is careful to avoid wasting network
resources (e.g., by backing off) I see little reason it shouldn't just
keep trying "forever".  ICMP unreachable messages are very useful
debugging tools (or at least they would be if they actually contained
the source address of the complaining gateway instead of some nonsense
IP address like that of the original destination). But they shouldn't
affect TCP's operation without the consent of the user.

Even worse are TCP "keep alive" timers.  Not only do they waste network
resources, as far as I'm concerned they serve no useful purpose at all.
If I have a long-idle TCP connection, why should I care if the path
goes down temporarily? I certainly don't want my connection aborted
because of it.

Phil

CLYNN@G.BBN.COM.UUCP (10/29/87)

Phil,	ICMP messages DO contain an IP source address of the gateway
which sent the message - look in the IP header's Source Address field.
If the available software does not include such basic information
when the ICMP message is passed by IP to the ICMP "layer", complain to
the vendor.  I would also complain if the subnet address were not passed
up, or if the info wasn't available at the TCP or higher layers; I hope
that such additional information doesn't fall into the category which is
the subject of this message.  While you may find reset connections
"annoying", I suspect that others might add it to their list of "denial
of service attacks".  Remember ... Validate your input before you
process it!

karn@FALINE.BELLCORE.COM (Phil R. Karn) (10/29/87)

>Phil,	ICMP messages DO contain an IP source address of the gateway
>which sent the message - look in the IP header's Source Address field.

Yes, I know they're *supposed* to. I meant to say that I've seen many
gateways use the destination IP address of the original datagram as the
source address for the IP datagram containing the ICMP message, and this
makes it impossible to discern where the problem is.

One implementation around here returns every broadcast it sees with
a "port unreachable" ICMP message and puts the IP broadcast address in
the IP source field!
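
A receiver can at least notice source addresses like these before
trusting the message. The following Python sketch shows one possible
sanity check. The function name, its parameters, and the choice of which
unreachable codes are expected to come from a gateway (0 and 1) are
illustrative assumptions, not a rule from any specification.

```python
import ipaddress

# Assumption: net unreachable (0) and host unreachable (1) come from a
# gateway, so a source equal to the original destination is suspicious.
GATEWAY_CODES = (0, 1)


def icmp_source_is_plausible(icmp_src, original_dst, local_network, code):
    """Return False when the ICMP source address cannot identify the sender."""
    src = ipaddress.IPv4Address(icmp_src)
    net = ipaddress.IPv4Network(local_network)
    if src == ipaddress.IPv4Address("255.255.255.255"):
        return False                       # limited broadcast as a source
    if src == net.broadcast_address:
        return False                       # directed broadcast of our own net
    if src.is_multicast or src.is_unspecified:
        return False
    if code in GATEWAY_CODES and src == ipaddress.IPv4Address(original_dst):
        return False                       # gateway copied the destination address
    return True
```

A message that fails the check is still worth logging for debugging; the
sketch only decides whether its source field says anything about where
the problem is.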

Phil

karels@OKEEFFE.BERKELEY.EDU (Mike Karels) (11/01/87)

Two comments on your recent message.  First, about TCP behavior
when ICMP unreachables are received: I definitely agree that TCP
ought not to quit when it receives an unreachable.  However, in Unix
and probably most other systems, it's hard to report "soft" errors
to a network client.  In 4.3, I chose to return a single error
on the next send or receive, but the TCP connection remains open.
Unfortunately, most network applications carefully check for errors
on each send/receive, and they give up on the first error.
(4.2 aborted the connection when ICMP errors were received,
and thus the application had no chance to keep trying.)
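
To make the "single error, connection stays open" behavior concrete,
here is a toy Python model. It is not 4.3BSD code; the class, its
methods, and the `patient_send` helper are hypothetical names used only
to show how an application could ride out a soft error instead of
giving up on the first one.

```python
import errno

SOFT_ERRORS = (errno.EHOSTUNREACH, errno.ENETUNREACH)


class SoftErrorConnection:
    """Toy connection that reports a soft error once and stays open."""

    def __init__(self):
        self.is_open = True
        self.pending_error = None

    def icmp_unreachable(self):
        # Remember the error; do not abort the connection.
        self.pending_error = errno.EHOSTUNREACH

    def send(self, data):
        if self.pending_error is not None:
            err, self.pending_error = self.pending_error, None
            raise OSError(err, "soft error: destination unreachable")
        return len(data)


def patient_send(conn, data, max_soft_errors=3):
    """Retry a send across soft errors; re-raise anything else."""
    for _ in range(max_soft_errors + 1):
        try:
            return conn.send(data)
        except OSError as exc:
            if exc.errno not in SOFT_ERRORS:
                raise
    raise OSError(errno.EHOSTUNREACH, "still unreachable after retries")
```

An application written like `patient_send` keeps going after the one
reported error, whereas one that exits on any failed send behaves as if
the connection had been aborted anyway.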

I also agree that you're right to distinguish between interactive
network users and automatic daemons.  However, it's precisely for
the daemons that are willing to wait patiently forever that "keep alive"
messages are needed.  Although the telnet client will give up and close
the connection manually, there needs to be a way to prevent systems
from accumulating useless, disconnected telnet servers and other such
trash.  Most application-level programs don't have their own keep-alive
or are-you-there to detect network failure.  For those reasons, we use
TCP-level keepalives (which are also not well provided-for at this level)
only on network servers that don't have their own time-out scheme.
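
For reference, this is roughly what opting in per socket looks like
through the sockets interface, shown here with Python's standard
`socket` module rather than the original C. `SO_KEEPALIVE` is the real
socket option; the helper name is just for illustration, and probe
timing is left at whatever the system default is.

```python
import socket


def enable_keepalive(sock):
    """Turn on TCP keepalive probes for one socket and confirm the setting."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

A server without its own time-out scheme would call this on each
accepted connection; servers that already time out idle peers would
leave the option off.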

		Mike

braden@VENERA.ISI.EDU (11/02/87)

	Yes, I know they're *supposed* to. I meant to say that I've seen many
	gateways use the destination IP address of the original datagram as the
	source address for the IP datagram containing the ICMP message, and this
	makes it impossible to discern where the problem is.
	
	One implementation around here returns every broadcast it sees with
	a "port unreachable" ICMP message and puts the IP broadcast address in
	the IP source field!
	
	Phil
	
Phil,  I wish you would name names.  Being coy about whose box is
screwing up isn't doing anyone a favor.  Since there is no Internet
Conformance testing service, we have to collaborate as a group to
"encourage" conformance from the vendors.

Bob Braden

PADLIPSKY@A.ISI.EDU (Michael Padlipsky) (11/02/87)

Phil--
   A sidelight on keeping connections open "forever" seems appropriate,
just in case anybody doesn't attach enough strength to the quotation
marks you rightly used:  In the early days of the Multics "NCP" [sic],
we discovered that we were sending "RST"s (the old Host-Host Protocol
Reset command, which was sent whenever an NCP came back up, to "everybody"
--well, you could do that sort of thing when there weren't four dozen
Hosts in the world) without end to a particular Host.  It turned out the
problem was that we were getting "Incomplete Transmission" from our IMP,
so we tried again, since that code was supposed to mean that a temporary
problem had prevented successful transmission; however, the Host in
question had somehow jumpered their IMP interface in such a fashion as
to convince their IMP that they really were up when they weren't and so
we got the code in a circumstance where we really shouldn't have.
Naturally, we put a limit on the retransmissions after an Incomplete
Transmission was encountered after that (and we probably should have
had one in the first place).  The moral does seem worth pointing out,
though: keep connections open for appropriately small values of forever.
(For example, if you happen to get a Host Down, you might as well close
even if you're only a daemon, since the other side should come up again
out of Sequence Number synch--shouldn't it?)
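
One way to picture "appropriately small values of forever" is a retry
schedule that backs off and then runs out. The Python sketch below is
only an illustration; the function names and the default numbers (base
delay, doubling factor, cap, attempt limit) are assumptions, not values
from the Multics NCP or any TCP.

```python
def retry_delays(base=1.0, factor=2.0, cap=64.0, max_attempts=8):
    """Yield the wait before each retry; the schedule ends after max_attempts."""
    delay = base
    for _ in range(max_attempts):
        yield delay
        delay = min(delay * factor, cap)   # back off, but never beyond the cap


def send_with_limit(transmit, delays, sleep):
    """Call transmit() until it succeeds or the delay schedule is exhausted."""
    if transmit():
        return True
    for delay in delays:
        sleep(delay)
        if transmit():
            return True
    return False                           # give up: "forever" has run out
```

Passing `time.sleep` as `sleep` gives real waits; passing a list's
`append` method, as a test might, just records the schedule.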
   cheers, map
-------

karn@faline.bellcore.com (Phil R. Karn) (11/03/87)

In article <8711010759.AA08856@okeeffe.Berkeley.EDU>, karels@OKEEFFE.BERKELEY.EDU (Mike Karels) writes:
> Two comments on your recent message.  First, about TCP behavior
> when ICMP unreachables are received: I definitely agree that TCP
> ought not to quit when it receives an unreachable.  However, in Unix
> and probably most other systems, it's hard to report "soft" errors
> to a network client.  In 4.3, I chose to return a single error
> on the next send or receive, but the TCP connection remains open.
> Unfortunately, most network applications carefully check for errors
> on each send/receive, and they give up on the first error.
> (4.2 aborted the connection when ICMP errors were received,
> and thus the application had no chance to keep trying.)

TCP doesn't return an error to the application when it retransmits, so
why should it do so when it receives a sporadic ICMP unreachable
message? I think a better approach would be for TCP to ignore these
messages, except to keep the last one or two around in case the
application specifically wanted to see them (e.g., by doing a special
ioctl on the socket).
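
A minimal sketch of that idea, in Python for brevity: the connection
keeps only the most recent ICMP errors and hands them over when asked.
The class and method names are hypothetical, and `recent()` merely
stands in for whatever special ioctl an implementation might provide.

```python
from collections import deque


class IcmpErrorLog:
    """Remember only the last few ICMP errors seen for one connection."""

    def __init__(self, keep=2):
        self._recent = deque(maxlen=keep)  # older entries fall off the front

    def record(self, icmp_type, code, source):
        # Called on ICMP input; the connection itself is left untouched.
        self._recent.append((icmp_type, code, source))

    def recent(self):
        """Return the retained errors, oldest first."""
        return list(self._recent)
```

Nothing is reported unless the application calls `recent()`, so a
sporadic unreachable has no more effect on it than a retransmission.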

> I also agree that you're right to distinguish between interactive
> network users and automatic daemons.  However, it's precisely for
> the daemons that are willing to wait patiently forever that "keep alive"
> messages are needed.  Although the telnet client will give up and close
> the connection manually, there needs to be a way to prevent systems
> from accumulating useless, disconnected telnet servers and other such
> trash.  Most application-level programs don't have their own keep-alive
> or are-you-there to detect network failure.  For those reasons, we use
> TCP-level keepalives (which are also not well provided-for at this level)
> only on network servers that don't have their own time-out scheme.

I strongly disagree that this should be done at the TCP level. I took
keepalives out of most of our systems some time ago. It's really nice
not to have to recreate a half dozen rlogin windows on my Sun each time
my SLIP link drops and has to be redialed.  It's also nice not to have
a steady stream of useless traffic on my amateur packet radio channel
when somebody logs in but remains idle for long periods.

The only way you accumulate useless, disconnected telnet servers is when
the client machines crash. If you *really* want to get rid of them, just
have a shell script do a write to anybody idle for more than X days --
the data will trigger a TCP Reset which will close the connection for
you. On the other hand, while they are aesthetically unpleasing, idle
sessions really don't hurt anything -- main memory is cheap and paging
memory even cheaper.

A year or two ago I thought as you do on keepalives, but a discussion
with Jon Postel turned me around. :-)

Phil

PADLIPSKY@A.ISI.EDU (Michael Padlipsky) (11/06/87)

Although you probably make a valuable distinction (given confirmation
that X.25 actually does have a "Host Down" return--I remember something
like Virtual Circuit Failure, or the like, myself), I was not, of course,
talking about X.25 explicitly.  (Nor was I talking about daemons explicitly.)
Always delighted to learn of still another X.25 faux pas, though.
(I'd confirm the Host Down question myself, but I don't have an X.25
spec handy, even though I acknowledge that it's a seminal fascicle.)

Just so the principle doesn't get lost in the worrying over the example,
let me rephrase: Just as there's no need to close connections before
their time, there's no need to keep them open beyond their time.  Good
judgment is expected to be applied to the issue of what time it is
in the life of given connections in given contexts.  OK?

cheers, map

P.S.  Maybe it's a quibble, but wouldn't X.25 call it DTE Down, anyway?
(Or is it DCE?  Well, DxE, at least.)
-------

g-tasman@GUMBY.WISC.EDU (11/08/87)

     You suggest that if a "host down" indication is received, a daemon
should immediately close the associated TCP connection.

     With an 1822 Distant Host connection to DDN, this may be a fairly
reasonable approach.  However, a typical DDN connection of late has been
X.25 or HDH.  Here, "host down" may have a more transitory meaning:  simply 
that there was noise on the host access line.  The remote host may well
reappear with all TCP connections intact.

     Consider in particular the case of a Telnet server.  If connections 
are cleared prematurely/incorrectly, extremely annoyed users will result.
On the other hand, I understand all too well the importance of eventually 
detecting and closing "half-open" connections which result from an actual
crash (since these will eventually inhibit new remote terminal sessions).

     The issue of distinguishing between a dead host and an "unhealthy"
host access line is likely to become increasingly serious over time, as more
DDN hosts switch to synchronous access protocols.  For a network client,
remote host status can simply be reported to the (human) user.  For a server,
however, I don't see a straightforward solution.


   						Mitchell Tasman