berry@zehntel.UUCP (01/17/84)
#R:allegra:-220500:zinfandel:12400044:000:283
zinfandel!berry    Jan 16 12:47:00 1984

4.2 UNIX domain datagram service is 'unreliable' in the sense that
datagrams are not GUARANTEED to get to their recipient.  Overall, they
are in fact quite reliable until your Ethernet gets really loaded.

Berry Kercheval    Zehntel Inc.    (ihnp4!zehntel!zinfandel!berry)    (415)932-6900
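[For readers who haven't seen the interface in question, a rough sketch
of sending one UNIX domain datagram, written in modern POSIX spelling
rather than the literal 4.2 calls; the socket path is made up, and
nothing here retransmits, so a dropped datagram is simply gone.]

    /* Minimal UNIX domain datagram send -- illustrative only. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int
    main(void)
    {
        int s = socket(AF_UNIX, SOCK_DGRAM, 0);
        struct sockaddr_un to;

        if (s < 0) {
            perror("socket");
            return 1;
        }
        memset(&to, 0, sizeof(to));
        to.sun_family = AF_UNIX;
        strcpy(to.sun_path, "/tmp/demo.sock");  /* hypothetical rendezvous name */

        if (sendto(s, "hello", 5, 0, (struct sockaddr *)&to, sizeof(to)) < 0)
            perror("sendto");    /* an error may or may not show up here */
        close(s);
        return 0;
    }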
andree@uokvax.UUCP (01/22/84)
#R:allegra:-220500:uokvax:6200010:000:374
uokvax!andree    Jan 20 21:38:00 1984

Gee, that's interesting.  Non-guaranteed datagram service in Unix would
fit well with the Unix philosophy ("all the rope you need, and you can
hang yourself if you want to"), and it has been done before (in the
Amoeba system).  I *hope* they did it that way in the kernel.  I also
hope they provided a library that DOES DO reliable transmission - in
multiple flavors.

					<mike
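[For concreteness, here is a sketch of the simplest "reliable flavor"
such a library might offer: a stop-and-wait send over an unreliable
datagram socket.  The ack convention (the peer echoes back the sequence
number carried at the front of the message) and all names are invented
for the illustration.]

    #include <string.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <sys/types.h>

    int
    reliable_send(int s, const char *msg, int len, unsigned long seq)
    {
        char ack[64];
        int tries;

        /* "msg" is assumed to begin with "seq"; the peer is assumed to
         * echo that number back as its acknowledgement. */
        for (tries = 0; tries < 5; tries++) {
            fd_set rfds;
            struct timeval tv;
            ssize_t n;

            if (send(s, msg, len, 0) < 0)       /* socket assumed connected */
                return -1;

            FD_ZERO(&rfds);
            FD_SET(s, &rfds);
            tv.tv_sec = 2;                      /* retransmission timeout */
            tv.tv_usec = 0;

            if (select(s + 1, &rfds, NULL, NULL, &tv) <= 0)
                continue;                       /* timed out: send it again */

            n = recv(s, ack, sizeof(ack), 0);
            if (n >= (ssize_t)sizeof(seq) && memcmp(ack, &seq, sizeof(seq)) == 0)
                return 0;                       /* matching ack: done */
            /* wrong or damaged ack: fall through and retransmit */
        }
        return -1;                              /* gave up; caller decides what next */
    }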
rpw3@fortune.UUCP (01/31/84)
#R:allegra:-220500:fortune:11600049:000:4908
fortune!rpw3 Jan 31 02:26:00 1984
[This lengthy tutorial probably belongs in net.arch, but the discussion
has been here so far.]
O.k., nobody has come forth to defend "UNIX domain datagrams", so here it is...
>>> Why datagrams SHOULD be "unreliable". <<<
The internet datagram "style" is based on the observation that
the end processes in any communication have to be ultimately
responsible for "transaction integrity" so they might as well be
responsible for all of it. No amount of intermediate error checking and
retransmission can GUARANTEE reliable synchronization if the ultimate
producer and consumer do not do the handshake. The layers on layers
of protocols don't hack it, if the critical state is outside the end
process. Nodes can crash; links can crash; nodes and links can go down
and up. Servers (e.g. mail) still have to do their own ultimate lost
message and duplication checking. (I will not argue that point
further. If you disagree, go see your local communications wizard and
get him/her to explain.) (Also, a moment of silence for anyone who
thinks X.25 is a "reliable" protocol.)
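[A toy illustration, in C, of the "ultimate duplicate checking" an end
process ends up doing anyway: the server remembers the last sequence
number it acted on per client and refuses to re-execute anything it has
already seen.  The request layout and the table are invented for the
example; real code would key on the full client address rather than a
small hash.]

    struct request {
        unsigned long client_id;
        unsigned long seq;        /* chosen by the client, increases by one */
        char          data[512];
    };

    #define NCLIENTS 64

    static unsigned long last_seq[NCLIENTS];  /* last seq handled, per client slot */

    /* Returns 1 if the request is new and should be executed,
     * 0 if it is a duplicate and only the old acknowledgement
     * should be retransmitted. */
    int
    is_new_request(const struct request *req)
    {
        unsigned long *last = &last_seq[req->client_id % NCLIENTS];

        if (req->seq <= *last)
            return 0;             /* already seen: a retransmitted duplicate */
        *last = req->seq;
        return 1;
    }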
Given that the responsibility for ultimate error correction lies in the
end-point processes, the transmission and switching portion of the net
can get A LOT cheaper and simpler. Instead of trying (vainly) to GUARANTEE
that no data is lost (with the attendant headaches of very careful buffer
management, flow-control, load shedding, load-balancing, re-routing,
synchronizing, etc.), in the internet datagram style (DoD IP, Xerox NS, etc.)
the transmission system makes a "good effort" to get your packet from
here to there. The only thing that IS demanded is that the probability
of receiving a bad (damaged) packet that is claimed to be good should
be VERY small. (Since that is a one-way requirement, it's fairly easy.)
So if the packet has a bit error, throw it away; if the outgoing queue
won't hold the packet, throw it away (that line's overloaded anyway);
if the route's not valid anymore, toss it. Somebody (the end process)
will try again soon anyway. (Two notes: 1. It is considered polite
BUT NOT NECESSARY to send an error packet back, if you know where "back"
is; and 2. if the system is to be generally considered usable, the
long-term error rate should be less than 1%, although short-term losses
of 10% or more don't hurt anything.)
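[The "good effort" policy above, sketched as an intermediate node might
apply it.  The checksum routine, queue primitives, and the courtesy
error message are stand-ins assumed to exist elsewhere, not any real
router's code.]

    struct packet {
        unsigned short cksum;
        int            len;
        char           data[1500];
    };

    extern unsigned short compute_cksum(const struct packet *); /* assumed elsewhere */
    extern int  queue_full(int link);
    extern void enqueue(int link, struct packet *);
    extern void maybe_send_error_back(struct packet *);         /* polite, not required */

    void
    forward(struct packet *p, int outlink)
    {
        if (compute_cksum(p) != p->cksum)
            return;               /* bit error: throw it away, the end process retries */
        if (outlink < 0) {
            maybe_send_error_back(p);
            return;               /* route no longer valid: toss it */
        }
        if (queue_full(outlink))
            return;               /* that line is overloaded anyway: throw it away */
        enqueue(outlink, p);      /* no per-connection state, no timers */
    }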
This seemingly cavalier attitude results in ENORMOUS savings in complexity,
memory, and CPU ticks for the intermediate nodes, which merely make a
(good but not perfect) attempt to throw the packet out the next link.
Packet switching rates of several hundred to several thousand per second
are easily attainable with cheap micros. The routers don't have to have
any "memory" (other than the routing tables). They are not responsible
for "connections", or "re-transmissions", or "timeouts". They don't know
a terminal from a file (since they don't know either!).
Secondly, the CPU/memory load of handling the connections/retransmissions/etc.
is spread out where there are lots of resources -- at the end points. The
backbone nodes just move data, so they can move lots of it. (Think of a
hundred IBM PC's using your VAX to move files back and forth. Who do you
want to do the busy work, the VAX or the PC's?)
Thirdly, the end process always had to do about 70-90% of the work anyway,
duplicating the work the network was doing (and sometimes triplicating the
work that the kernel was duplicating, on top of that); the added 30-10%
is easily justified by the savings in the net (or in the kernel, if we are
talking about process-to-process on a single host -- I didn't forget).
The total number of CPU ticks on an end-point processor can even go DOWN,
because of the smaller number of encapsulations (layers) packets have to go
through. (In the simplest case, there are only three layers: client, datagram
router or internet, and physical.)
Lastly, there are some applications (voice, time-of-day) where you do not
want the network trying to "help" you. A voice packet that is perfect but
is late because it got retransmitted might as well have been lost -- it's
useless. Ditto time-of-day.
(whew! is it soup yet?)
So "unreliable" when talking about datagrams means "not perfect",
and is a desirable attribute. Desirable, since the cost of "reliability"
is very high and the goal illusory in any case. On a single processor,
it makes sense sometimes to have other (reliable) inter-process primitives
besides datagrams, if (1) throughput is paramount and (2) the set of
cooperating processes will NEVER be distributed. But the overhead of
handling the "retransmission" can be made small (and processes DO die
sometimes, even on uni-processors), so the argument for "reliable" IPC
is weaker than most people think.
Rob Warnock
UUCP: {sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD: (415)595-8444
USPS: Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065
ka@hou3c.UUCP (Kenneth Almquist) (02/04/84)
In response to Rob Warnock's defense of unreliable datagrams:  I have
nothing against networks (like the ARPANET) that are based upon
unreliable datagrams, but I don't like the idea of lots of user programs
using unreliable datagrams.  They force each user to provide for
retransmission of lost packets, so there will be a lot of reinventing
the wheel.  And since testing the effectiveness of error recovery
schemes is rather difficult, many of these retransmission schemes will
probably be tested inadequately or not at all.  The result is likely to
be programs that work most of the time but suffer occasional mysterious
failures.

Ideally I would like an interprocess communication mechanism that would
handle all errors for the user.  It is not possible for any
communications protocol to recover if either the data network or the
program being communicated with dies, so the application program must
be informed of these errors and must be programmed to deal with them;
but I would require the protocol to recover from all other types of
errors.  Three possible objections to this:

1)  It is unimplementable.  No.  Stream sockets meet these requirements.
    (A minimal sketch of a stream-socket connection follows this
    article.)

2)  It is too inefficient to implement.  I believe that some of the most
    heavily used protocols on the Arpanet, such as smtp (mail) and ftp
    (file transfer), are built on top of tcp rather than directly on ip
    (unreliable datagrams), so presumably the Arpanet developers find
    the cost of doing so acceptable.  When the two processes are on one
    machine it is easy to make communication reliable, and it's also
    more efficient to do so than to have programs constantly setting
    alarm signals to determine whether the programs they are talking to
    have gone away.  The cost of setting up a connection probably makes
    stream sockets unacceptable for some applications, but that is a
    problem with streams rather than with the concept of reliable
    communication.  The V operating system, for example, provides
    efficient interprocess communication without requiring a
    "connection" to be established first.

3)  Retransmission is not good for real time applications such as voice
    or time of day.  True, but UNIX is not good for real time
    applications either.  Nor are unreliable datagrams necessarily good
    for them; they may arrive out of order after arbitrary delays.

					Kenneth Almquist

P. S.  While I'm not an expert on X.25, I've never heard anyone call it
unreliable before.  It doesn't have dynamic routing on a per-message
basis, so if a link goes down anybody who has virtual circuits over the
link loses them, if that's what you mean.  However, a protocol does not
need end-to-end acknowledgements to avoid losing data.  The level 2
protocols on each link ensure that no data is lost.  If a node crashes
then of course any data in its memory is lost, but that is irrelevant
since the virtual circuit has been lost anyway.
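[The stream-socket alternative mentioned in objection 1, sketched in
modern POSIX spelling; the path name is made up.  Once connect()
succeeds, lost or reordered data is the kernel's problem, and a dead
peer shows up as an error (or SIGPIPE) on write.]

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int
    main(void)
    {
        int s = socket(AF_UNIX, SOCK_STREAM, 0);
        struct sockaddr_un srv;

        if (s < 0) {
            perror("socket");
            return 1;
        }
        memset(&srv, 0, sizeof(srv));
        srv.sun_family = AF_UNIX;
        strcpy(srv.sun_path, "/tmp/demo.sock");   /* hypothetical server name */

        if (connect(s, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
            perror("connect");                    /* connection setup is the costly part */
            return 1;
        }
        if (write(s, "hello", 5) < 0)
            perror("write");                      /* e.g. the peer has gone away */
        close(s);
        return 0;
    }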
rpw3@fortune.UUCP (02/07/84)
#R:allegra:-220500:fortune:11600052:000:2831
fortune!rpw3    Feb 6 20:41:00 1984

In response to Ken Almquist:

1. As Eric said, if you want reliable UNIX domain communication, use
streams, which provide that service.  IP datagrams are by definition
unreliable.  [But... if you are using UNIX domain as IPC, see below.]

2. In general, what makes good IPC within a single (tightly-coupled
multi-)processor does not make a good (loosely-coupled network) message
system, and vice-versa.  The tradeoffs are too different.  Trying to use
one where the other fits is awkward at best.  (Microsecond busy-waits
are often appropriate in the former case; sleeps of seconds or minutes
in the latter.)

A good fast low-overhead IPC can sometimes be one of the mechanisms on
which networking is built (see the CMU "Hydra/C.mmp" book), but even so
one must be careful about synchronization and queueing.  (It is not yet
clear to me that S-5 IPC is fast/cheap enough.)  Simple/clean IPC is
rarely achieved on top of networking (but see the literature on "Remote
Procedure Calls").

When the reason for having multiple processes in the first place was to
"solve" some synchronization/event-waiting problem with multi-programming
WITHIN a single process, the lack of adequate sync/event mechanisms
often comes back to bite the IPC user.  Whether it's "software
interrupts on events" or the 4.2 "select" (sketched after this article),
SOME form of "tell-me-about-the-first-interesting-event" seems necessary
for real concurrency.  Having a forest of processes, each waiting on one
specific event, is useful only if the process sleep/wake time is VERY
small, and efficient fine-grained locks on shared-memory data are
available.  (Again, see the "Hydra" book; conversely, also see Holt's
"Concurrent-Euclid/UNIX/Tunis".)

I guess I'm trying to say, "Look again at why you're using UNIX domains
at all."  Preparation for future networking?  [O.k., fine] ... or as a
type of IPC?  If IPC, would you rather have some other kind?  [Nothing
wrong with "making do", but one needn't celebrate the crutch.]

3. A close reading of the X.25 standard will reveal that a "RESET"
message is permitted from the network or either party at any time, with
no synchronization with the other party.  (Remember, X.25 is a network
ACCESS protocol, not a peer-peer protocol.)  "RESET" does NOT drop the
connection, it just resets ("drops") the sequence numbers.  This can
cause data to be lost and/or duplicated unless a higher-level stream
protocol (with its own sequence numbers) is used on top of X.25
connections.  Networks may issue "RESET" at any time, such as when
load-shedding data to relieve buffer congestion.

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065
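[A sketch of the "tell-me-about-the-first-interesting-event" style
mentioned in point 2, using select() in modern POSIX spelling: one
process waits on several descriptors at once instead of a forest of
processes each blocked on one.  Names and sizes are invented; a real
server would also drop closed descriptors from the array.]

    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/types.h>
    #include <unistd.h>

    void
    serve(int fds[], int nfds)
    {
        for (;;) {
            fd_set rfds;
            int i, maxfd = -1;

            FD_ZERO(&rfds);
            for (i = 0; i < nfds; i++) {
                FD_SET(fds[i], &rfds);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
                perror("select");
                return;
            }
            for (i = 0; i < nfds; i++)
                if (FD_ISSET(fds[i], &rfds)) {
                    char buf[512];
                    ssize_t n = read(fds[i], buf, sizeof(buf));

                    /* handle the event; n == 0 means that peer closed */
                    if (n <= 0)
                        close(fds[i]);
                }
        }
    }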
rpw3@fortune.UUCP (02/07/84)
#R:allegra:-220500:fortune:11600053:000:1040
fortune!rpw3    Feb 7 00:45:00 1984

+--------------------
| From: Peter Vanderbilt <pv.usc-cse@rand-relay>
|
| It would be convenient to be able to reliably send one message
| without going through the trouble (and overhead) of setting up
| a stream.
+--------------------

That's what the Xerox NS "Packet Exchange" protocol is all about.

Even so, you should be careful to set up your services to be idempotent.
Since any given packet may be lost/duplicated, it should be o.k. for a
given request (with the same tag/cookie/handle) to be acted on more than
once.  Note that UNIX read's and write's have this property, if the file
offset ["lseek" pointer] is included in the request, and you don't do a
second operation 'til the first has succeeded.  (A sketch of such a
request follows this article.)  (Unfortunately, "open" and "close" are
harder, without additional state in the server that the simple protocol
was designed to avoid in the first place.)

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065
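[A toy layout for the kind of idempotent read request described above:
the offset and a client-chosen tag travel with the request, so serving
a duplicated request just produces the same answer again.  The field
names and sizes are invented for the example.]

    #include <sys/types.h>
    #include <unistd.h>

    struct read_request {
        unsigned long tag;        /* client's cookie, echoed in the reply */
        int           fd;         /* stands in for a server-side file handle */
        unsigned long offset;     /* absolute offset, NOT "wherever we left off" */
        unsigned long count;      /* bytes wanted */
    };

    /* Acting on the same request twice is harmless, because the file
     * position lives in the request rather than in the server. */
    ssize_t
    serve_read(const struct read_request *req, char *buf, unsigned long bufsize)
    {
        unsigned long n = req->count < bufsize ? req->count : bufsize;

        return pread(req->fd, buf, n, (off_t)req->offset);
    }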
beau@sun.uucp (Beau James) (02/07/84)
With respect to the reliability of the X.25 protocol ...

The X.25 protocol set (level 3, packet; level 2, link; level 1,
electrical) was designed for a network whose general structure looks
like

	(DTE) ---- (DCE) .............. (DCE) ---- (DTE)

According to the CCITT spec, X.25 specifies the *interface* to the
packet transmission network - the "----" link in the diagram.  What
happens to packets inside the transmission network on their way to the
remote end DTE is not specified.  The specification does not even tell
you how to implement a DCE, although most of the differences are
obvious.

The unreliabilities arise from the fact that there is no part of the
X.25 specification that is assured to be end-to-end between the DTEs
that have the open virtual circuit.  The CCITT spec specifically leaves
that decision up to the implementors of the packet data network (PDN)
(assumed to be the government communication agencies (PTTs) everywhere
but in the U.S.).

Many control messages can cause duplicate packets end-to-end because
they may be generated locally.  (E.g. DTE RESET: only one end of the
connection may be reset; if so, and that DTE resends unacknowledged
packets, the "remote" DTE may see the same data twice.)  Even the
ordinary data packet acknowledgement scheme is not reliable, since the
DCE may acknowledge successful transmission locally, meaning across the
DTE/DCE interface.  If the data does not get to the remote DTE for any
reason, there is no mechanism for determining which packets got lost.
Not very useful, but all according to the standard.

This is not to say that the X.25 protocols cannot be USED in a reliable,
end-to-end network design, IF the implementor ensures that all the X.25
acknowledgement and control messages have end-to-end significance.  The
Data General Xodiac(TM) network uses X.25 virtual circuits as end-to-end
session connections over several different transmission media, for
example.  But when a PDN is the "transmission medium", the virtual
circuits are not necessarily reliable (it depends on the PDN).

In the final analysis, each top-level service protocol (mail, file
transfer, etc.) has to provide its own end-to-end reliability, if it
cares.

					Beau James
					Sun Microsystems, Inc.
kre@mulga.SUN (Robert Elz) (02/13/84)
There's nothing remarkable about datagrams not being reliable under
unix.  No output is reliable.  You're not guaranteed to get an error if
the disk that you happen to be writing on develops a coughing fit just
at the time you do your write; you're not guaranteed to get an error if
the process at the other end of a pipe dies before reading your data
(though you will if it's already dead when you send it); nor will you
get an error if your output to a terminal is mangled by noise on the
phone, or simply by some super-user type doing a "wall" at the relevant
time.

Why should unix datagrams do something different?  (As has been
mentioned before, they would also cease to be datagrams by the
traditional definition.)  You will often get an error indication if
something is wrong, but you may not.  I do agree though that for most
programs, datagrams are not an intelligent service to use.

As to X.25, that can certainly not be considered to be reliable.  DCEs
are permitted to RESET a virtual circuit as often as they deem fit, and
each RESET may cause data to be lost.  Higher-level protocols are
necessary to guarantee data integrity.

Robert Elz,		decvax!mulga!kre