[mod.protocols.tcp-ip] network horror stories

karn@FLASH.BELLCORE.COM.UUCP (03/24/87)

Anyone who believes that connection-oriented networks are inherently immune
to congestion should consider one of the following events:

1. The US phone network on the night of the Carter-Reagan debates in 1980.
AT&T conducted their first large-scale trial of their 900 DIAL-IT service
which was used to poll viewers on their opinions regarding the debate. Although
AT&T placed strict limits on the number of long distance trunks that could be
used for 900 service, there were so many people attempting to call it that in
most places in the country there were delays of at least 2 minutes in getting
dial tone from the local office.

2. The phone system in the state of Arizona the day the Tucson Amateur Packet
Radio group decided to take phone orders (one day only!) for their new packet
radio controller box. They only had one phone line. Not only was Tucson
cut off from the rest of the world for several hours, but most of the rest of
the state as well.

3. The ticket-buying frenzy accompanying any Bruce Springsteen concert.

The problem isn't connectionless vs connection-oriented, it's that the network
is a shared resource. If there aren't enough facilities to go around in times
of peak demand, some people are going to be denied service. The only difference
is in the details of how that's done.  The real issue when trying to assure
network stability is how the users react to being denied service. If they
use Demon Dialers to hammer away at the phone network, you're going to have
trouble. The Internet is in trouble because there are so many broken TCPs
out there that do the equivalent thing when the network load picks up.
I wouldn't be surprised if a few nodes could bring down an X.25 network
pretty easily by just pummeling it with CALL REQUEST packets to unreachable
addresses.

The solution is more likely economic than technical. Since failed attempts
still use network resources, one answer is to charge for them.  I suspect
charging for failed phone calls would put a stop to the abuse of demon dialers.
Since the Internet doesn't charge for each packet, the solution here would
be to require certification of a host's TCP retransmission behavior before
it attached to the network in the same way that a radio transmitter has to
be type certified before it can be placed in operation.
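The retransmission behavior such certification would test is, at bottom, whether a sender backs off when the network slows down. A minimal sketch of backed-off versus "broken" retransmission timing (function names and constants are invented for illustration, not taken from any real TCP):

```python
# Sketch: a well-behaved sender doubles its retransmission timeout (RTO)
# after each loss; a "broken" one hammers away on a fixed short timer.
# Constants are illustrative only.

def backoff_intervals(rto=1.0, retries=6, factor=2.0, cap=64.0):
    """Seconds a backed-off sender waits before each retransmission."""
    out = []
    for _ in range(retries):
        out.append(rto)
        rto = min(rto * factor, cap)  # exponential backoff, capped
    return out

def fixed_intervals(rto=1.0, retries=6):
    """A broken sender retries on the same short timer every time."""
    return [rto] * retries

# Six retries spread over 1+2+4+8+16+32 = 63 s versus six in 6 s:
# under congestion the fixed-timer host multiplies the offered load.
```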

Phil

jon@CS.UCL.AC.UK.UUCP (03/25/87)

Exactly my point. TCP can be made to work fine. IP can be made
to work fine, but currently has no (workable) mechanism for
congestion control. If the internet is overengineered in terms
of bandwidth, this works OK. If some congestion control
mechanism is put in the gateways, it can be made to work better.
Just fixing everyone's TCP does not fix things, because even
fixed TCPs cannot make optimal use of the paths.

X.25 networks give you one particular kind of handle on hop-by-hop
congestion control if you implement it right, whilst TP4/TCP
(especially with selacks) buys you efficient end-to-end control.
JANET happens to do this, which is why it works well.

The resource reservation explicit in X.25 means you don't get
hit by misbehaving end points - it's fair. A DTE that floods
the DCE with CALL REQUEST packets just gets ignored by any
reasonable DCE, and does not impinge on the network at all,
after all X.25 is an interface spec more than a protocol.

Asking people to certify TCPs before attaching to a research
network like the internet is like asking your friend to certify
that the car he's selling you cheap is going to run trouble
free.

We can't prove programs yet.

jon

Rudy.Nedved@H.CS.CMU.EDU.UUCP (03/26/87)

Phil,

I agree with your points; alas, certification seems to be something like
program verification: it only works on small test cases. With comments
from things like the Jan 1987 ACM SIGOPS section on MIT Project Athena
("firewalls in gateways are necessary") and my own experiences, I believe
it is up to the routers, bridges and gateways to control congestion and
ignore brain-damaged hosts.

I would suggest that an implementation be beaten on for some type of
certification before being released, but experience has shown that the
imagination of the attackers/testers is more conservative than the ever
changing network environment....something always shows up later. Therefore,
the two-pronged approach of doing constructive/definitive tests and putting
firewalls into gateways is the way to go.

For firewalls, adding hysteresis to gateways, bridges and routers tied
in with the volume of datagrams from a host or network should help even
though it would penalize highly used paths...these paths are having
severe problems as it is...this will encourage more efficient use of those
paths....especially if every relaying agent does it. For relay agents on
"dedicated" networks the hysteresis would be very heavy for datagrams not
to or from dedicated network clients.

When congestion occurs, the clients that want to send the occasional
important message would succeed, but the clients that generally send lots
of communication in an inefficient manner would be penalized....this is
more desirable behaviour than penalizing everyone who tries.
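In later terminology, Rudy's per-host hysteresis is close to per-source rate limiting with a token bucket: a host that sends only occasionally always has credit, while a flooding host drains its own bucket and gets dropped first. A sketch under that reading (the class name and constants are invented, not from any real gateway):

```python
import time

class SourceLimiter:
    """Per-source token bucket: light senders always pass, heavy ones drop."""
    def __init__(self, rate=10.0, burst=20.0):
        self.rate, self.burst = rate, burst   # tokens/sec, bucket depth
        self.buckets = {}                     # source addr -> (tokens, last time)

    def allow(self, src, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(src, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[src] = (tokens - 1.0, now)
            return True                       # forward the datagram
        self.buckets[src] = (tokens, now)
        return False                          # congested: drop this source

# The occasional sender never exhausts its bucket; a host flooding
# datagrams drains its own bucket and is penalized while others pass.
```

Running this in every relaying agent, as Rudy suggests, makes the penalty compound along heavily used paths.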

Lastly, communicating back to entry gateways that some client is being nasty
and should be ignored would reduce gateway-to-gateway congestion, just like most
of the telephone companies have the prefixes for remote areas stored locally to
reduce trunk line usage from wrong numbers...if you dial 412 333 XXXX in the 201
area then the 201 area will not even try the connection; it will indicate that
the number is incorrect and a telephone book should be consulted. Alas,
propagation of this information has the same problems as propagating routing
information....sigh.

Cheers,
-Rudy

haas%utah-gr@UTAH-CS.ARPA.UUCP (03/28/87)

One of the more interesting but little-discussed events in the
history of network engineering occurred when Telenet converted
from an unreliable-datagram internal architecture to VC internally.
For a good discussion of the whys and wherefores of the Telenet internal
architecture, see a paper entitled "An X.75 Based Network Architecture"
by D. F. Weir, J. B. Holmblad and A. C. Rothberg, published in the
"Proceedings of the Fifth International Conference on Computer
Communications", 1980.  If you don't feel like chasing the Proceedings
around the library, Telenet will probably give you a free copy of the
paper.

The justification for ripping out the datagram code and replacing it
with VC code was economic.  There is less waste and better management
of the resource in a VC network.  I quote from the cited paper:

 "... in the late 70's it became economically attractive to incur
  additional processing and storage costs in order to reduce
  communications costs...

  ... By establishing fixed paths, virtual circuit routing can
  better balance load as compared with routing in a datagram based
  network ... congestion at a transit point in a virtual circuit
  network can be reflected back to the endpoint nodes to restrict
  flow into the network.  This capability results from access to
  the virtual circuit at transit nodes using the logical channel
  number.  In a datagram network, knowledge of virtual calls does
  not exist at transit nodes and flow control cannot easily be
  applied to the virtual circuits at the endpoints."

Cheers  -- Walt

Mills@UDEL.EDU.UUCP (03/28/87)

Rudy,

You are referring to what are called "hard-to-reach" (HTR) numbers. Clever
switches remember HTR prefixes in order to back congestion up towards the
source. Now consider doing the same thing in an IP gateway. All it has to
do is wiretap ICMP messages on the way back to the sender and cache the
information for a while. You may remember the general reaction (horror)
in response to this suggestion some time back. Many consider wiretapping
ICMP messages something only a little less sinful than forwarding
redirects across gateways. Considering the almost universal practice
of ignoring ICMP error messages, perhaps your comment may spark a minor
change in that thinking.
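Mechanically, the gateway-side analogue of HTR caching is a negative cache keyed on destination, filled from overheard ICMP errors and aged out after a while. A sketch of that idea (the class name and TTL are invented; real ICMP parsing is omitted):

```python
class UnreachableCache:
    """Sketch: a gateway 'wiretaps' ICMP destination-unreachable messages
    and briefly refuses traffic to those destinations locally, instead of
    forwarding doomed packets upstream. Purely illustrative."""
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self.bad = {}                    # destination -> expiry time

    def saw_icmp_unreachable(self, dest, now):
        self.bad[dest] = now + self.ttl  # cache like an HTR prefix

    def should_forward(self, dest, now):
        expiry = self.bad.get(dest)
        if expiry is not None and now < expiry:
            return False                 # answer 'unreachable' locally
        self.bad.pop(dest, None)         # entry expired: try the path again
        return True
```

The TTL is the delicate part: too long and a recovered destination stays black-holed, too short and the congestion backs up toward the source again.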

Dave

ms6b#@ANDREW.CMU.EDU.UUCP (03/30/87)

>most of the telephone companies have the prefix for remote areas stored
>locally to reduce trunk line usage from wrong numbers...if you dial
>412 333 XXXX in 201 area then 201 area will not even try the connection,
>it will indicate that number is incorrect and a telephone book should be
>consulted. Alas, propagation of this information has the same problems as
>propagating routing information....sigh.

When CMU's exchange code was changed last year (to 268 = CMU), that fact was
not properly propagated to the CCSA switches used by the FTS network.  As a
consequence, for two days after the change, no one at NSF, DoD, etc. could
call CMU over the FTS network!  They were told 268 was not a working exchange.

Marvin Sirbu
CMU

Rudy.Nedved@H.CS.CMU.EDU.UUCP (03/31/87)

I am not aware of the details behind the transition from one exchange to
another...maybe it could have been done over a longer time period with
some type of exchange forwarding at the local TelCo...

As far as I know, the major users were not affected, only private and small
telephone companies. This is a problem in the ARPA Internet as well, and it
is an issue of managerial inertia, not a technical problem. If you don't
play the game correctly, then what can you do?

Anyhow, two days once every few years is not a major reason not to do what
they do.

-Rudy