jbn@wdl1.UUCP (08/28/85)
Datagram systems have some serious problems.  Here are a few of them.

1.  In a pure datagram system, with no link-level retransmission, the
    probability of successfully forwarding a packet through N nodes
    declines exponentially with the number of nodes.  Ham users of
    digipeaters and UNIX users of async links for IP datagrams are
    painfully aware of this phenomenon.  You really do need link-level
    retransmission in any sizable datagram system, unless the medium
    has very low error rates.

2.  Congestion is a serious problem in datagram systems.  No really
    good general solutions are known.  I've solved some problems
    associated with some of the simpler cases (IP/TCP via Ethernet to
    slow-link gateways), but a general solution is still elusive.
    There are tough theoretical problems here; there may be a way to
    organize an arbitrarily large datagram network, but it hasn't been
    discovered yet.  Telephony has been around long enough that we
    know how to build very large virtual circuit networks.

3.  Datagram networks tend to break down when fully loaded; this is a
    consequence of (2) above.  There are ways around this, but they
    involve running the system in a derated mode, where keeping all
    links as busy as possible is not attempted.  The ARPANET
    technology really works only because the ARPANET has substantially
    more link bandwidth than it needs for its traffic volume; this is
    a well-known problem.  TELENET started out with ARPANET technology
    but has since gone to virtual circuits internally to get better
    link utilization.  In any case, the IMP system of the ARPANET is
    not a true datagram system internally, although it exports a
    datagram interface.

4.  Datagram systems have some serious vulnerabilities.  One bad guy
    can hog the network and clog up the links.  Datagram systems tend
    to rely on hosts being well-behaved.  With virtual circuits, the
    network has a positive throttle over host traffic generation, and
    can keep bad hosts from interfering with other traffic.
    In networks with no central administrative authority over hosts,
    this is a serious problem in practice.  The ARPANET/MILNET
    gateways are already under serious strain because of this exact
    problem.  Tight standards and anti-bad-guy queuing algorithms in
    the nodes can solve this problem; unfortunately the Internet lacks
    both.

5.  Accounting is difficult in datagram systems.  What should a phone
    bill for a datagram net look like?  Histograms of traffic by time
    and destination?  Just a total amount?  The network may need to
    recognize clumps of packets for similar destinations and treat
    them as a ``call'' for billing purposes.

This may sound odd coming from me, as a builder of datagram gateways.
But datagram systems are useful in the military environment, where the
important thing is to keep going despite serious failures, not to
achieve maximum throughput under optimal conditions.  They may be
useful for other purposes, if the problems above are addressed.  But a
simple virtual circuit network (a la Tymnet) behaves better than a
simple datagram network (a la Internet) given the same bandwidth.

					John Nagle
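Point (1) is simple arithmetic: with independent per-hop delivery
probability p, the end-to-end success probability through N hops is
p^N.  A minimal sketch (the 5% loss figure is illustrative, not from
the post):

```python
def end_to_end_success(p_hop: float, n_hops: int) -> float:
    """Probability that a packet survives n_hops independent hops,
    each delivering it with probability p_hop (no retransmission)."""
    return p_hop ** n_hops

# A 5% per-hop loss rate is tolerable over one hop...
print(round(end_to_end_success(0.95, 1), 3))   # 0.95
# ...but through 10 digipeaters roughly 40% of packets are lost:
print(round(end_to_end_success(0.95, 10), 3))  # 0.599
```

With link-level retransmission each hop's effective delivery
probability approaches 1, so the exponential decay disappears; that is
the trade described above.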
karn@petrus.UUCP (Phil R. Karn) (08/30/85)
> Datagram systems have some serious problems.  Here are a few of them.
>
> 1. In a pure datagram system, with no link-level retransmission, the
>    probability of successfully forwarding a packet through N nodes
>    declines exponentially with the number of nodes.  Ham users of
>    digipeaters and UNIX users of async links for IP datagrams are
>    painfully aware of this phenomenon.  You really do need link-level
>    retransmission in any sizable datagram system, unless the medium
>    has very low error rates.

Portions of a datagram network are free to use link-level
retransmission whenever they consider it necessary.  However, one of
the beauties of datagram networks is that they don't HAVE to use
link-level retransmission where it doesn't make any sense (Ethernets,
DDS lines with 1 in 10^9 error rates, etc.).  Packet radio (amateur or
otherwise) is one of the few places where link-level acknowledgments
really do make sense.

> 2. Congestion is a serious problem in datagram systems.  No really
>    good general solutions are known.  I've solved some problems
>    associated with some of the simpler cases (IP/TCP via Ethernet to
>    slow-link gateways), but a general solution is still elusive.
>    There are tough theoretical problems here; there may be a way to
>    organize an arbitrarily large datagram network, but it hasn't
>    been discovered yet.  Telephony has been around long enough that
>    we know how to build very large virtual circuit networks.

Congestion is a serious problem in any network that depends on
well-behaved user statistics, be it virtual circuit, datagram or
simple circuit switched.  I could make a virtual circuit network go
into "congestion collapse" just like an IP network, assuming that I
have a transport protocol in each case: I merely set the
reset-and-re-establish VC timer in my transport protocol short enough
that it frequently clears and re-establishes the underlying VC,
preferably ten or twenty times for each successful packet delivery.
Considering that most VC networks assume virtual circuit setups to be
rare events (some even have central nodes performing all circuit setup
and teardown operations), I think this could cause a lot of havoc.

Telephony has been around a long time, but we still don't know how to
build a large circuit-switched network (virtual or otherwise) that
isn't susceptible to congestion collapse.  Just see the notes on
net.ham-radio about what happened to phone service in Tucson, AZ
during the recent TAPR TNC sale.  The only guaranteed way out is to
have enough network resources for the absolute worst possible case.
In most long-haul networks this is clearly out of the question, so you
just try to deal with it as best you can.

> 3. Datagram networks tend to break down when fully loaded; this is a
>    consequence of (2) above.  There are ways around this, but they
>    involve running the system in a derated mode, where keeping all
>    links as busy as possible is not attempted.  The ARPANET
>    technology really works only because the ARPANET has substantially
>    more link bandwidth than it needs for its traffic volume; this is
>    a well-known problem.  TELENET started out with ARPANET technology
>    but has since gone to virtual circuits internally to get better
>    link utilization.  In any case, the IMP system of the ARPANET is
>    not a true datagram system internally, although it exports a
>    datagram interface.

I don't understand this comment.  It is at least possible to re-route
excess traffic around a congested area when datagrams are used.  If
you have N virtual circuits established through a given fixed route,
there's not much you can do if all N users decide to send
simultaneously, overloading the links along the route.  Of course, you
could statically allocate link bandwidth and buffer space for each
virtual circuit, something that is difficult to do in a datagram
network.  However, this defeats the whole point of packet switching,
namely the statistical sharing of resources.
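The collapse scenario sketched above is easy to quantify: if the
transport's reset/retransmission timer is shorter than the time the
network needs to deliver and confirm a packet, roughly rtt/timer
copies of each packet enter the network before the first response
comes back.  A rough sketch (the numeric values are illustrative
assumptions, not from the posts):

```python
import math

def offered_load_multiplier(rtt_s: float, timer_s: float) -> int:
    """Approximate number of copies of each packet injected before the
    first acknowledgement returns, when the retransmission (or VC
    reset-and-re-establish) timer fires every timer_s seconds and the
    round trip takes rtt_s seconds.  Floored at 1 (no spurious resends
    when the timer is longer than the round trip)."""
    return max(1, math.ceil(rtt_s / timer_s))

print(offered_load_multiplier(2.0, 5.0))   # 1: conservative timer
print(offered_load_multiplier(10.0, 0.5))  # 20: the ten-or-twenty-copies case
```

The extra copies add load, which lengthens the round trip, which
raises the multiplier in turn: the positive feedback loop behind
"congestion collapse".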
If you really want to guarantee throughput once a connection is
established, build a pure circuit-switched network; if you want the
guaranteed ability to establish a connection at any time, put in a
leased line.

TELENET went to VCs internally for two reasons: a) they only had to
provide a virtual circuit service, X.25; b) the bulk of their traffic
consists of single-character packets from people typing on dumb
terminals.  In this case the larger datagram headers were the deciding
factor.

> 4. Datagram systems have some serious vulnerabilities.  One bad guy
>    can hog the network and clog up the links.  Datagram systems tend
>    to rely on hosts being well-behaved.  With virtual circuits, the
>    network has a positive throttle over host traffic generation, and
>    can keep bad hosts from interfering with other traffic.  In
>    networks with no central administrative authority over hosts,
>    this is a serious problem in practice.  The ARPANET/MILNET
>    gateways are already under serious strain because of this exact
>    problem.  Tight standards and anti-bad-guy queuing algorithms in
>    nodes can solve this problem; unfortunately the Internet lacks
>    both.

Not unlike virtual circuit networks.  The IP "source quench" is a
protocol; unfortunately many hosts refuse to adhere to it.  I could
also refuse to adhere to X.25 and send traffic outside of my
agreed-upon window, for example, or I could (and do!) establish
additional virtual circuits to my destination to circumvent the
much-touted per-VC network flow control ability.  The only answer in
either case is to cut off hosts that don't play by the rules, but this
is an implementation problem, not a problem with the protocols.

> 5. Accounting is difficult in datagram systems.  What should a phone
>    bill for a datagram net look like?  Histograms of traffic by time
>    and destination?  Just a total amount?  The network may need to
>    recognize clumps of packets for similar destinations and treat
>    them as a ``call'' for billing purposes.
PDNs already charge for both connect time and for packets sent.  (I've
sometimes suggested, only half in jest, that the real reason they
don't like datagram services is that they'd no longer be able to
charge for connect time.)  Since most datagram traffic would continue
to be "clustered" to a small set of destinations, I don't see any
problem with billing by per-destination packet counts in the local
switch.  TELENET punts the issue anyway, since their charges are
distance-independent.

I guess we don't really disagree on what needs to be done to make
datagram networks like the Internet behave well under load as they
grow.  My suggestions are as follows:

1. Implement mechanisms to "punish" hosts that misbehave by ignoring
   ICMP source quench messages.

2. Make sure that each packet switch has more than enough buffer
   memory to handle all but extremely unusual peak traffic bursts.
   The older IMPs and IP gateways are probably the major offenders in
   this regard.  I suspect that memory-starved IP gateways account for
   the vast majority of dropped datagrams (ignoring causes such as
   unreachable destinations, of course).  Regardless of the protocol,
   the laws of queuing theory still apply.  If you use an internal
   flow control mechanism to avoid dropping packets in a
   memory-starved packet switch, you won't be able to utilize your
   outgoing link as efficiently.  The larger the queue on your
   outgoing link, the closer you'll be able to approach 100%
   utilization.

3. Use link-level acknowledgments only on those paths (radio, dialup
   modems) that are unreliable enough to justify them.  Better yet, do
   something to improve the raw error rate on the links.  Get rid of
   link acknowledgments on all other paths to improve link efficiency.

4. Once the above steps are taken, the dropped-packet rate should fall
   to a very low value.
Once this happens, it should be possible to convince TCP implementers
to lengthen their retransmission timers significantly, to avoid
congestion collapse when round-trip delays jump because of sudden
load.  If you can send a datagram with a very high degree of
confidence that it'll get there (eventually), people won't be tempted
to use such trigger-happy retransmission timers.

					Phil Karn
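Suggestion (2)'s appeal to queuing theory can be made concrete with
the standard M/M/1 result L = rho / (1 - rho): the mean queue grows
without bound as link utilization rho approaches 1, which is why
running links near 100% busy demands either large buffers or dropped
packets.  A minimal sketch of that formula:

```python
def mm1_mean_queue(rho: float) -> float:
    """Mean number of packets in the system for an M/M/1 queue at
    utilization rho (Poisson arrivals, exponential service times)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return rho / (1.0 - rho)

for rho in (0.5, 0.9, 0.99):
    print(rho, round(mm1_mean_queue(rho), 1))
# 0.5 -> 1.0 packet, 0.9 -> 9.0, 0.99 -> 99.0
```

Going from 50% to 99% utilization raises the expected queue from 1
packet to 99, so a switch's memory requirement is set by the peak
utilization it intends to sustain, not the average.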
karn@petrus.UUCP (Phil R. Karn) (09/04/85)
Here's some more information about Telenet that might be interesting.
As I understood an explanation given by one of their employees at an
amateur packet radio conference last fall, each of their packet
switches appears as a self-contained X.25 "network", and these
switches speak to each other with X.75.  (X.75 was intended as an
"internetwork" protocol for the interconnection of X.25 networks owned
by different operators, but it, like the DoD Internet Protocol, can
also be used internally within individual networks.)  Once an initial
connection is established, the packet switch translates the VC
identifier in each arriving packet to the proper identifier for the
correct outgoing link, and sends the packet.  This operation applies
to flow control (RR/RNR) packets as well as to data packets.

What this means is that regardless of the setting of the D-bit, flow
control in Telenet is done on an end-to-end basis.  (The "D", or
"Delivery Confirmation", bit is supposed to control whether X.25 DCE
packet-level acknowledgements indicate that the packet has been
accepted by the network or that it has actually been delivered to the
other DTE.)  The result is that you can never have more than W (the
window size, typically 2) packets in flight at any one time on each
virtual circuit.  The carrier likes this, since it alleviates his
congestion problems, but the user hates it because it puts an upper
bound on his throughput.  This is also the reason why the CSNET
software is forced to open multiple, parallel virtual circuits in
order to get reasonable throughput.  Of course, only those of us who
run datagram protocols above X.25 are able to pull this trick;
everybody else has to put up with lousy throughput.

Once this is done, I don't see any easy way for the network to control
potential congestion if resources are limited.  Maximizing user
throughput and preventing network congestion seem to be in fundamental
opposition, and virtual circuit network protocols are no panacea.

					Phil
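The W-packet window described above caps per-circuit throughput at W
packets per round trip, independent of link speed.  A back-of-the-
envelope sketch (the packet size and round-trip time are assumed
figures for illustration, not from the post):

```python
def vc_throughput_bps(window: int, packet_bytes: int, rtt_s: float) -> float:
    """Upper bound on one virtual circuit's throughput when end-to-end
    flow control permits at most `window` packets in flight per
    round trip of rtt_s seconds."""
    return window * packet_bytes * 8 / rtt_s

one_vc = vc_throughput_bps(2, 128, 1.0)  # W=2, 128-byte packets, 1 s RTT
print(one_vc)        # 2048.0 bits/s, no matter how fast the links are
print(4 * one_vc)    # 8192.0 bits/s over four parallel VCs -- the CSNET trick
```

Opening parallel circuits multiplies the bound, which is exactly why
datagram protocols layered above X.25 can "pull this trick" while a
single-VC user cannot.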
jbn@wdl1.UUCP (09/05/85)
Karn's comments are good.  There are ways to abuse virtual circuit
networks too, but they tend not to happen by accident, and existing
calls are usually not interfered with when the call-setup process is
bottlenecked.  You can build virtual circuit systems which provide an
assured level of service if you can get any service at all (i.e., can
place a call); we don't know how to do this for datagrams yet.

I'm writing a paper on queueing in datagram networks which will
contain a new solution to part of the congestion problem, one good
enough to keep things going in the presence of obnoxious hosts.  When
I get it done, I will post it here, as well as submitting it for
publication.

					John Nagle
ch@gipsy.UUCP (Christian Huitema) (09/10/85)
Please don't try to revive the old VC/datagram polemic!  Refer to the
literature instead.

The big difference between VCs and datagrams, from a commercial point
of view, is the possibility of guaranteeing a certain level of
"quality of service".  During the call set-up phase, buffers can be
reserved in the intermediate nodes for the virtual circuit; it is even
possible to reserve some part of the transmission resource (an extreme
case is the simulation of "physical circuits" by TDMA satellite).
Obviously, the "per duration" charge derives from the amount of
resources that were reserved.  This charge is generally *small*
(Transpac charges FF0.02, i.e. $0.0018, per minute of connection for a
1200 bit/s national connection).  The worst "per duration" charges are
encountered on international connections.

The other difference is the ability to avoid congestion due to
"transmission" overload.  A typical PSDN operates at a load of 40-50%
per link during the peak hours.  It is possible to block the incoming
packets; a user who tried to ignore the window limitations would get
his packets rejected (reinitialisation cause "remote protocol error").
It is also possible to choose the "best route" at call set-up time,
thus avoiding the "congested areas".  The same behaviour is not
recommended on a datagram network, as it tends to propagate
congestion.

Still, VC networks can be congested, just like telephone networks, by
an excess of calls.  That was the reason for Transpac's "black Friday"
last June.  However, flow control procedures have already been
implemented on some telephone networks, and could be ported onto
PSDNs.

At INRIA we experimented with LAN-satellite connections; the first
design used a datagram-based gateway for "transparent"
interconnection.  However, we found that the efficiency was poor, as
the gateway had to throw away packets when the load increased.
Thus, in the next design, we have implemented X.25 on top of Ethernet,
which allows for an easy interconnection to the outside world, and for
a much more efficient use of the external connection.  This does not
cause an undue overload for local communications, as we can use the
"class 0" transport protocol, i.e. no end-to-end acknowledgments.
X.25 allows you to optimize windows and packet sizes on each
subnetwork.

The last, but not the least, advantage of X.25 is that all public data
networks are interconnected, and that one can establish a direct
connection with virtually any computer in the world.
karn@petrus.UUCP (Phil R. Karn) (09/17/85)
> Please don't try to revive the old VC/datagram polemic!  Refer to
> the literature instead.

Actually, I think the real issue here is whether you should have some
form of resource preallocation in your network.  This affects the
datagram/VC choice, because VC networks allow for preallocation at
circuit setup time while datagram nets can't.

I've done a lot of thinking about this issue lately and am fast coming
to the conclusion that if you REALLY need guaranteed bandwidth after
"circuit setup", then you want an ordinary circuit-switched network,
not packet switching.  If your traffic has a peak-to-average ratio
near 1 for long periods of time, or if you're willing to pay extra to
reserve idle bandwidth, then circuit switching is clearly superior to
any form of packet switching, be it datagram or virtual circuit.
Packet switching is meant to statistically multiplex traffic from a
collection of users whose individual requirements are unpredictably
bursty.  As you pool more and more users, however, the law of large
numbers means that the aggregate traffic becomes more and more
predictable.  Therefore, as public data networks grow and their links
increase in bandwidth, I think the need for "preallocation" will
decrease considerably.

Preallocation of buffer space is, I think, an obsolete issue.  Maybe
this was important at one time when memory was expensive, but now
there's little reason why packet switches cannot be given so much
memory that they almost never have to drop packets or invoke
congestion control.  Delays may get large, but only if there isn't
enough transmission bandwidth to go around.

> At INRIA we experimented with LAN-satellite connections; the first
> design used a datagram-based gateway for "transparent"
> interconnection.  However, we found that the efficiency was poor, as
> the gateway had to throw away packets when the load increased.

Only because you didn't have enough buffer space in your gateways.
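The law-of-large-numbers argument above can be illustrated with a
small Monte Carlo sketch.  The burst model and all numbers here are
invented for illustration: each user independently sends a 10-unit
burst with probability 0.1 per interval, and we watch the aggregate's
peak-to-average ratio fall as users are pooled.

```python
import random

def peak_to_average(n_users: int, trials: int = 2000, seed: int = 1) -> float:
    """Peak-to-average ratio of aggregate traffic from n_users bursty
    sources, each sending 10 units with probability 0.1 per interval,
    estimated over `trials` intervals with a seeded RNG."""
    rng = random.Random(seed)
    totals = [sum(10 if rng.random() < 0.1 else 0 for _ in range(n_users))
              for _ in range(trials)]
    return max(totals) / (sum(totals) / trials)

print(round(peak_to_average(1), 1))    # about 10: one user is all-or-nothing
print(round(peak_to_average(100), 1))  # nearer 2: the aggregate is far smoother
```

As the pool grows, preallocating each user's peak bandwidth wastes
nearly all of it, which is the case against preallocation being made
here.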
> Thus, in the next design, we have implemented X.25 on top of
> Ethernet, which allows for an easy interconnection to the outside
> world, and for a much more efficient use of the external connection.
> This does not cause an undue overload for local communications, as
> we can use the "class 0" transport protocol, i.e. no end-to-end
> acknowledgments.  X.25 allows you to optimize windows and packet
> sizes on each subnetwork.

I'm not willing to live dangerously and trust assurances of
reliability from any network, even Ethernet.  Therefore I'm not
willing to give up a transport protocol with end-to-end
acknowledgements, like TCP.  Worrying about protocol overhead on a
10 Mb/s LAN, where the minimum packet size is 60 bytes anyway, is
silly.  The best way to improve efficiency (i.e., reduce the effect of
protocol header overhead) on ANY network is to send fewer, larger
packets.  This will have a much greater effect than trying to trim
down the header sizes, because it will also reduce packet-switch CPU
requirements.

> The last, but not the least, advantage of X.25 is that all public
> data networks are interconnected, and that one can establish a
> direct connection with virtually any computer in the world.

The same is true of the public telephone network, but that doesn't
mean it's ideal for my purposes.

					Phil
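The fewer-larger-packets point reduces to the ratio of payload to
payload-plus-header.  Assuming a 40-byte combined TCP/IP header (the
usual option-free figure; an assumption, not stated in the post):

```python
def link_efficiency(payload_bytes: int, header_bytes: int = 40) -> float:
    """Fraction of transmitted bytes that are user data, for one
    packet carrying payload_bytes behind header_bytes of headers."""
    return payload_bytes / (payload_bytes + header_bytes)

print(round(link_efficiency(1), 3))     # 0.024: one typed character per packet
print(round(link_efficiency(1024), 3))  # 0.962: a full-sized segment
```

This is why Telenet's single-character terminal traffic made header
size the deciding factor, and why batching data into larger packets
dwarfs any savings from shaving a few header bytes.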
jbn@wdl1.UUCP (10/01/85)
``Lumpiness'' is a sign of proper adaptation to overload.  The
alternative, given the same bandwidth resources, is falling further
and further behind as you send more and more tiny packets.  Try two
4.2BSD systems connected via an overloaded net for comparison.
Obviously it's better to have the bandwidth, but lumpiness is far
better than continually losing ground.  Or would you rather have the
keyboard lock when you get too far ahead, as with the old IBM 2741?

					John Nagle