roy@phri.UUCP (Roy Smith) (09/08/86)
In article <6@cvbnet.uucp> acrotty@cvbnet.uucp (Art Crotty) asked about broadcasting a file to all the nodes on an ethernet. In article <335@mc0.UUCP> garyf@mc0.UUCP (Gary Friedman) explained that TCP/IP guarantees safe delivery for point-to-point links and that UDP allows for broadcasting, but at the price of reduced reliability. Gary then suggested that Art probably wants to use NFS, since it's designed to allow efficient sharing of a file by many hosts.

Clearly, UDP is not suitable for times when you really care if the data gets delivered or not (rwho uses UDP, doesn't it?). As I understand it, NFS uses UDP as the underlying transport protocol but to improve performance, Sun has turned off checksumming in NFS/UDP packets. Presumably NFS does its own error checking at a higher level, so they can get away with ignoring checksums at the lower levels.

Has anybody done any studies to determine if this causes any problems? I've heard random comments by people on the net that they don't like what Sun did, but has anybody taken a serious look at the situation and found cases where corrupted UDP packets have caused user-visible NFS errors? On the other hand, has anybody made any measurements to see just how much NFS would be slowed down if UDP checksumming were turned back on?

--
Roy Smith, {allegra,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
steve@umcp-cs.UUCP (Steve D. Miller) (09/09/86)
In article <2428@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:

(various discussion about broadcast file transfer et al deleted)

> Clearly, UDP is not suitable for times when you really care if the
> data gets delivered or not (rwho uses UDP, doesn't it?). As I understand
> it, NFS uses UDP as the underlying transport protocol but to improve
> performance, Sun has turned off checksumming in NFS/UDP packets.
> Presumably NFS does its own error checking at a higher level, so they can
> get away with ignoring checksums at the lower levels.

Nope. As far as I can determine (by looking at the rpc stuff that's going on, and by being familiar with how NFS hands things off to UDP/IP to have them sent), there is no checksumming going on. In fact, the standard Sun kernel has UDP checksums turned off both in udp_output() and in the NFS/RPC kernel UDP fastsend routine. Without source, there isn't any way (other than doing strange things with patching binaries) to turn checksumming on in the kudp_fastsend() routine; the routine just doesn't do it...

> Has anybody done any studies to determine if this causes any
> problems? I've heard random comments by people on the net that they don't
> like what Sun did...

Random comments from Chris or from me, probably...

> ...but has anybody taken a serious look at the situation
> and found cases where corrupted UDP packets have caused user-visible NFS
> errors?

We've not looked at it seriously, but I'm sure that it's possible. It's not too likely, it seems; some people from Sun have said, "yeah, we know we're not 100% safe, but we've never heard of any problems." The IP header checksum still happens; I would hope (and this perhaps is what Sun is thinking) that the IP header would be trashed along with the rest of the packet. Then again, that's only 20 bytes, and the result of a standard NFS read operation is roughly 4K long... Does anyone know how your average Joe Ethernet board hiccups, if and when it does? Is this a valid assumption?

> On the other hand, has anybody made any measurements to see just
> how much NFS would be slowed down if UDP checksumming were turned back on?

Again, no hard data, but the Sun on my desk runs a kernel that goes through the UDP output code for all my kernel RPC, and I have checksums turned on. It seems a little slower, but not much.

-Steve

--
Spoken: Steve Miller    ARPA: steve@mimsy.umd.edu    Phone: +1-301-454-4251
CSNet: steve@umcp-cs    UUCP: {seismo,allegra}!umcp-cs!steve
USPS: Computer Science Dept., University of Maryland, College Park, MD 20742
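To put a rough number on Steve's "that's only 20 bytes" remark, here is a back-of-the-envelope sketch. It assumes, purely for illustration, a single unfragmented datagram with a 20-byte IP header, an 8-byte UDP header, and exactly 4096 bytes of NFS data, and it assumes a corrupted byte is equally likely to land anywhere in the datagram. Real traffic on an Ethernet would be split into several IP fragments, each with its own header, so the true figure would be somewhat higher. The function name `header_fraction` is made up for this example.

```python
IP_HEADER_BYTES = 20       # the only part the IP header checksum covers
UDP_HEADER_BYTES = 8       # fixed UDP header size
NFS_READ_BYTES = 4 * 1024  # "roughly 4K", taken here as exactly 4096 (assumption)


def header_fraction(header: int, rest: int) -> float:
    """Fraction of a datagram's bytes that lie inside the IP header."""
    return header / (header + rest)


# Chance that one randomly placed bad byte falls inside the checksummed
# IP header, under the uniform-corruption assumption stated above.
fraction = header_fraction(IP_HEADER_BYTES, UDP_HEADER_BYTES + NFS_READ_BYTES)
print(f"{fraction:.2%} of the datagram is covered by the IP header checksum")
```

Under those assumptions the covered share is under half a percent, which is why the hope that damage will "also hit the IP header" only holds if corruption tends to trash whole packets rather than isolated bytes.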
dave@rsch.wisc.edu (Dave Cohrs) (09/10/86)
We are running kernels on our vaxen with NFS code, which means the UDP checksums are off. I haven't seen any problems with NFS because of this; however, the lack of checksums does cause problems with rwho.

One of our networks is a Proteon 10 Mbit Pronet. [ Note -- I refuse to blame Proteon for the extent of our problems. I am positive that the low quality cables we use are causing our problems. ] Every once in a while, the network flakes out and lots of bad packets get generated. The Proteon boards are supposed to have a checksum built in that catches single and double bit errors, but these packets have lots of errors. When UDP gets the packets, it just passes them up to rwho.

Well, rwho gets the packets and checks to make sure the hostname is all printable ASCII. This is reasonable, but often the errors cause the letters to change case or to become different printable ASCII characters. This leads to some pretty strange host lists when one runs 'ruptime'.

I have also seen this kind of mangled rwho packet on one of our Ethernets, but, once again, I haven't seen the lack of checksums affecting NFS.

--
Dave Cohrs
(608) 262-1204
..!{harvard,ihnp4,seismo,topaz}!uwvax!dave
dave@rsch.wisc.edu
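The case changes Dave describes are what a single flipped bit does to an ASCII letter: upper and lower case differ only in bit 5 (0x20), so the damaged name is still entirely printable. A minimal sketch of why a printable-ASCII test cannot catch this follows. `looks_printable` is a hypothetical stand-in for the kind of check described in the post, not the actual rwhod code, and `flip_bit` simply simulates the damage.

```python
import string

# Printable ASCII minus the whitespace control characters.
PRINTABLE = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")


def looks_printable(name: bytes) -> bool:
    """Hypothetical stand-in for the hostname sanity check described above."""
    return len(name) > 0 and all(b in PRINTABLE for b in name)


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating line noise."""
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)


good = b"uwvax"
bad = flip_bit(good, 0, 5)  # bit 5 is the ASCII upper/lower case bit
print(good, bad, looks_printable(bad))
```

The corrupted name differs from the original yet passes the check, so only a checksum over the whole packet would let the receiver reject it.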
jim@cs.strath.ac.uk (Jim Reid) (09/11/86)
In article <2428@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:

> Clearly, UDP is not suitable for times when you really care if the
> data gets delivered or not (rwho uses UDP, doesn't it?). As I understand
> it, NFS uses UDP as the underlying transport protocol but to improve
> performance, Sun has turned off checksumming in NFS/UDP packets.
> Presumably NFS does its own error checking at a higher level, so they can
> get away with ignoring checksums at the lower levels.

UDP checksumming is switched off - but not just for performance reasons. I understand that some implementations can't calculate the checksums properly anyway. There has been some discussion of this in mod.protocols.tcp-ip.

There is no "higher-level" checking that I'm aware of. Presumably, you can get away with relying on the IP header checksums and the Ethernet frame checksums on the IP fragments when reassembling a UDP packet that comes in off the net. If there were corruption problems, these should show up as errors in NFS "headers" - RPC structures, file handles, and so on. The kernel would scream about these if and when they happened.

> Has anybody done any studies to determine if this causes any
> problems? I've heard random comments by people on the net that they don't
> like what Sun did, but has anybody taken a serious look at the situation
> and found cases where corrupted UDP packets have caused user-visible NFS
> errors? On the other hand, has anybody made any measurements to see just
> how much NFS would be slowed down if UDP checksumming were turned back on?

We've had no problems - mind you, we've not been actively looking for them. Our users would be complaining if they found their files corrupted. So far there have been no complaints.

Turning on (or off) UDP checksumming may not be all that informative if it introduces erroneous "bad" packets. I will try to establish some performance figures in the next day or so on a couple of Sun-3s. Naturally, I'll post the results....
Jim ARPA: jim%cs.strath.ac.uk@ucl-cs.arpa, jim@cs.strath.ac.uk UUCP: jim@strath-cs.uucp, ...!seismo!mcvax!ukc!strath-cs!jim JANET: jim@uk.ac.strath.cs "JANET domain ordering is swapped around so's there'd be some use for rev(1)!"
ron@celerity.UUCP (Ron McDaniels) (09/16/86)
Per RFC768, "User Datagram Protocol": "Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data. . .".

I would like to point out that the 32-bit CRC generated with every Ethernet packet and checked by the receiver of the packet is (orders of magnitude?) a far more reliable detector of transmission errors than the artifact of the 1st generation of computers, the checksum. If your Ethernet driver passes corrupted packets into the higher protocol levels, it is because it is ignoring the fact that the Ethernet controller chip has run out of memory or some similar problem and not because an error has crept by the CRC checking logic.

In my experience, we have never encountered an Internet checksum error with our very vanilla flavored implementation of BSD 4.2 networking (at least, not since I fixed the 82586 Ethernet controller chip NO RESOURCES bug in my Ethernet controller driver ;-).

BTW, a one's complement checksum across a threaded list of mbufs takes a lot longer than you might intuit. The cleverest assembly language programming really pays off. I added a switch in my in_cksum routine which causes it to immediately return a zero. Makes a network of Celerity C1200s and C1260s run about 8% faster with a heavy TCP networking workload. Problem is that the "foreign" machines on our net don't understand this and refuse to talk to me.

R. L. (Ron) McDaniels
CELERITY COMPUTING . 9692 Via Excelencia Way . San Diego, California . 92126
(619) 271-9940 . {decvax || ucbvax || ihnp4 || philabs}!sdcsvax!celerity!ron

"Yes, my Precious. . . we hates them socket(2)eses!"
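For readers who have not seen the checksum Ron quotes from RFC 768, here is a minimal Python sketch of the 16-bit one's complement sum. It works on a plain byte string and leaves out the UDP pseudo header, so it illustrates the arithmetic only; it is not the kernel's in_cksum routine. The comparison against `zlib.crc32` (which, as far as I know, uses the same CRC-32 polynomial as the Ethernet frame check) is there to show one concrete way the simple sum is weaker: it cannot see 16-bit words changing places.

```python
import zlib


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum, as RFC 768 words it."""
    return ~ones_complement_sum(data) & 0xFFFF


original = bytes.fromhex("0001f203f4f5f6f7")
swapped = bytes.fromhex("f2030001f4f5f6f7")  # first two 16-bit words exchanged

print(hex(internet_checksum(original)), hex(internet_checksum(swapped)))
print(hex(zlib.crc32(original)), hex(zlib.crc32(swapped)))
```

The eight bytes in `original` are the worked example from RFC 1071, for which the checksum is 0x220d. Because addition is commutative, the swapped message produces the same checksum, while the CRC-32 values differ.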
eggert@sdcrdcf.UUCP (Paul Eggert) (09/17/86)
In article <2428@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:

| NFS uses UDP as the underlying transport protocol but to improve
| performance, Sun has turned off checksumming in NFS/UDP packets.
| ... Has anybody done any studies to determine if this causes any problems?

No studies, but I have an anecdote. Last week we had an Intel Ethernet controller board go slightly bad on a Sun-2/120 running 3.0. TCP/IP worked fine, but NFS had rare bit errors without issuing any diagnostic messages. The bit errors were in executable files, causing core dumps that were oddly reproducible due to caching. I wasted some time tracking this down. Had NFS checksummed, the problem would have been evident.

--
Paul Eggert, SDC Santa Monica
chris@umcp-cs.UUCP (Chris Torek) (09/20/86)
In article <580@celerity.UUCP> ron@celerity.UUCP (Ron McDaniels) writes:

> I would like to point out that the 32-bit CRC generated with every
> Ethernet packet and checked by the receiver of the packet is (orders
> of magnitude?) a far more reliable detector of transmission errors
> than the artifact of the 1st generation of computers, the checksum.
> If your Ethernet driver passes corrupted packets into the higher
> protocol levels, it is because it is ignoring the fact that the
> Ethernet controller chip has run out of memory or some similar
> problem and not because an error has crept by the CRC checking
> logic.

Or perhaps the CRC checking logic has failed, or the CRC was correct but the transfer from Ethernet memory to host memory failed, or any number of other possible glitches. However, it is true that one must trust the hardware to some extent. (The exact extent is often a matter of debate.) For the sake of argument I will assume that Ethernet reliability is high enough that a software check is not worthwhile.

But what proponents of no software checksums seem not to have considered is this: Not all networks are Ethernets. There are other systems out there. Many of these systems have considerably higher error rates than Ethernets. By disabling software checksums you preclude the use of these less-reliable but nonetheless useful alternative networks.

--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs    ARPA: chris@mimsy.umd.edu
mishkin@apollo.uucp (Nathaniel Mishkin) (09/25/86)
Just an anecdote, not a comment on the reliability of NFS:

For people who trust the swell checksumming that your favorite Ethernet board does for you, think again: I once had an Interlan controller whose on-board packet buffer memory went flakey. So the bits coming off the wire were appropriately checksummed and passed, and then trashed on the board and passed up to some unwitting Chaosnet software (which, unlike the IP model, doesn't do checksumming) and on up to an FTP program which happily wrote the wrong bits to a file.

Moral of the story: Do end-to-end checksumming.

--
Nat Mishkin
Apollo Computer Inc.
{mit-eddie, wanginst, yale}!apollo!mishkin
mike@BRL.ARPA (Mike Muuss) (09/29/86)
You seem to miss the point that the UDP checksum is an End-to-end protection measure, while the Ethernet CRC is a link-level protection measure.

In a one-ethernet system, the UDP checksum only protects you from bad interfaces or bad software. (By the way, interface boards DO fail in such a way as to send in garbaged packets... sometime get me to tell you about the board that mangled 50% of the packets it sent. Nice thing was, neither TCP nor UDP was upset, because they were protected by their checksums...)

In a complicated network, such as the real InterNet, which at last count had 150 operating networks connected together by scores of gateways, end-to-end protection measures are *vital*, because you have no way to know (or control) what technology is used to convey your packet. Your error rates will vary.

-Mike Muuss
henry@utzoo.UUCP (Henry Spencer) (10/04/86)
> I would like to point out that the 32-bit CRC generated with every Ethernet
> packet and checked by the receiver of the packet is (orders of magnitude?)
> a far more reliable detector of transmission errors than the artifact of the
> 1st generation of computers, the checksum...

But it does absolutely nothing to detect errors in the Ethernet controller, the low-level software, and the hardware and software of any gateways through which the packets pass. As the Xerox people have been saying for years, if you want to be sure the data is getting there intact, you put a checksum (or CRC, or whatever) on it as it leaves the sending application, and check that checksum when it reaches the receiving application.

Particularly when the "application" is something like NFS, which could make an incredible mess if packets got garbled, there is something to be said for such "end-to-end" error checking.

--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry
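The end-to-end scheme Henry describes (attach a check as the data leaves the sending application, verify it when it reaches the receiving one) can be sketched in a few lines. This is an illustration under stated assumptions, not how NFS or any real protocol frames its data: `seal`, `unseal`, and `flaky_buffer` are names made up here, the check is a CRC-32 from Python's standard `zlib` module, and the "network" is just a function that damages one bit after any link-level check would already have passed, much like the bad controller memory in Nat Mishkin's anecdote.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Sender side: append a CRC-32 of the payload (4 bytes, big-endian)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unseal(message: bytes) -> bytes:
    """Receiver side: verify the trailing CRC-32 and return the payload."""
    payload, trailer = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != trailer:
        raise ValueError("end-to-end check failed")
    return payload


def flaky_buffer(message: bytes) -> bytes:
    """Simulate corruption that happens after the link-level CRC has passed."""
    out = bytearray(message)
    out[3] ^= 0x01  # flip one bit in the payload
    return bytes(out)


clean = unseal(seal(b"file contents"))
try:
    unseal(flaky_buffer(seal(b"file contents")))
    detected = False
except ValueError:
    detected = True
print(clean, detected)
```

The point of the design is where the check lives: because it is computed and verified by the two applications, it covers every controller, driver, and gateway in between, whatever checks those components do or do not perform themselves.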
aglew@ccvaxa.UUCP (10/09/86)
> Particularly when the "application" is something like NFS, which could make
> an incredible mess if packets got garbled, there is something to be said
> for such "end-to-end" error checking.
>
> Henry Spencer @ U of Toronto Zoology

I seem to recall a paper in _Computer Networks_ about a year and a half ago that made a rather convincing case for end-to-end error checking. It's really obvious when you think about it - error checking in lower level protocols can really do nothing for your confidence level in upper level protocols, because the criteria they use to evaluate an acceptable rate of errors may be entirely different from your own. In fact, the authors went on to suggest that it always be possible to turn lower level error checking off, as a performance enhancement, since the upper level protocol *should* do it anyway.

Of course, this is an environment where communications engineers calculate acceptable error rates. Do we do that in computers, hmmm ;-) ?

Andy "Krazy" Glew. Gould CSD-Urbana.
USEnet: ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801
ARPAnet: aglew@gswd-vms