hedrick@athos.rutgers.edu (Charles Hedrick) (11/15/90)
We've been bitten a few times by corrupted data in files and routing updates transmitted by Novell. The ultimate cause is a problem in a serial controller card on one of our routers. The vendor is fixing the problem. However I'm very nervous about using a protocol that doesn't seem to have any protocol-level way of detecting bad packets. Novell is based on XNS, but apparently they always set the checksum field to ones. Does anyone know whether it's possible to configure Novell software to do checksumming? Also, if our routers put a checksum in the checksum field, do you know whether Novell will check it, ignore it, or reject the packet?
xjeldc@tts.lth.se (Jan Engvald) (11/15/90)
In article <Nov.14.15.47.23.1990.1820@athos.rutgers.edu> hedrick@athos.rutgers.edu (Charles Hedrick) writes:
>We've been bitten a few times by corrupted data in files and routing
>updates transmitted by Novell. The ultimate cause is a problem in a
>serial controller card on one of our routers. The vendor is fixing
>the problem. However I'm very nervous about using a protocol that
>doesn't seem to have any protocol-level way of detecting bad packets.
>Novell is based on XNS, but apparently they always set the checksum
>field to ones. Does anyone know whether it's possible to configure
>Novell software to do checksumming? Also, if our routers put a
>checksum in the checksum field, do you know whether Novell will check
>it, ignore it, or reject the packet?

From what I've seen, a packet with an IPX checksum other than hex FFFF will be rejected. So checksumming is not optional; on the contrary, it is NOT ALLOWED for Novell IPX. They rely on the Ethernet CRC, and on a small LAN that's usually enough. But not with LANs connected together by routers.

Jan Engvald, Lund University Computing Center
________________________________________________________________________
Address: Box 783                E-mail: Jan.Engvald@ldc.lu.se
         S-220 07 LUND          Earn/Bitnet: xjeldc@seldc52
         SWEDEN                 (Span/Hepnet: Sweden::Gemini::xjeldc)
Office:  Soelvegatan 18         VAXPSI: psi%2403732202020::xjeldc
Telephone: +46 46 107458        (X.400: C=se; A=TeDe; P=Sunet; O=lu;
Telefax:   +46 46 138225         OU=ldc; S=Engvald; G=Jan)
Telex:     33533 LUNIVER S
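Jan's point is that a link-level CRC only protects a frame between two interfaces. Each sending interface computes a fresh CRC, so data damaged inside a router (for instance by a faulty controller) leaves on the next hop with a perfectly valid CRC. The toy model below is a sketch, not real router or NetWare code: `zlib.crc32` stands in for the Ethernet CRC, `zlib.adler32` stands in for an end-to-end checksum carried in the packet, and the helper names and the single-bit fault are hypothetical.

```python
from __future__ import annotations

import zlib


def make_packet(data: bytes) -> bytes:
    """Sender appends an end-to-end check (Adler-32 as a stand-in) that travels with the data."""
    return data + zlib.adler32(data).to_bytes(4, "big")


def end_to_end_ok(packet: bytes) -> bool:
    """Final receiver recomputes the sender's check over the bytes it actually got."""
    data, trailer = packet[:-4], packet[-4:]
    return zlib.adler32(data) == int.from_bytes(trailer, "big")


def link_frame(packet: bytes) -> tuple[bytes, int]:
    """Sending interface computes a CRC-32 over the frame (stand-in for the Ethernet CRC)."""
    return packet, zlib.crc32(packet)


def link_ok(frame: tuple[bytes, int]) -> bool:
    """Receiving interface checks the CRC for this one hop only."""
    packet, crc = frame
    return zlib.crc32(packet) == crc


def router_forward(frame: tuple[bytes, int], fault: bool) -> tuple[bytes, int]:
    """Hypothetical router: verify hop 1, hold the packet in memory, re-frame it for hop 2."""
    if not link_ok(frame):
        raise ValueError("hop 1 CRC failed; frame dropped")
    buf = bytearray(frame[0])
    if fault:
        buf[10] ^= 0x40  # hypothetical fault: one bit flips while the packet sits in memory
    return link_frame(bytes(buf))  # the new CRC is computed over the already-damaged bytes


def deliver(data: bytes, fault: bool) -> tuple[bool, bool]:
    """Return (hop-2 link check passed, end-to-end check passed)."""
    hop2 = router_forward(link_frame(make_packet(data)), fault)
    return link_ok(hop2), end_to_end_ok(hop2[0])


update = b"routing update: net 0x1234 is 2 hops away"
print(deliver(update, fault=False))  # (True, True)
print(deliver(update, fault=True))   # (True, False)
```

In the faulty run both link checks pass; only the check computed by the original sender notices the damage. Any end-to-end checksum would do here; Adler-32 is used only because it is in Python's standard library.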
ron@Eyring.COM (Ron Holt) (11/16/90)
In article <Nov.14.15.47.23.1990.1820@athos.rutgers.edu> hedrick@athos.rutgers.edu (Charles Hedrick) writes:
>We've been bitten a few times by corrupted data in files and routing
>updates transmitted by Novell. The ultimate cause is a problem in a
>serial controller card on one of our routers. The vendor is fixing
>the problem. However I'm very nervous about using a protocol that
>doesn't seem to have any protocol-level way of detecting bad packets.
>Novell is based on XNS, but apparently they always set the checksum
>field to ones. Does anyone know whether it's possible to configure
>Novell software to do checksumming? Also, if our routers put a
>checksum in the checksum field, do you know whether Novell will check
>it, ignore it, or reject the packet?

As far as I have been able to determine, NetWare always sets the checksum field to 0xffff on transmit. I'm not sure what it does with this field on packet reception.

The Novell "NetWare C Interface-DOS", Volume 1 manual says on page 4-4 that "the checksum field contains a dummy checksum of the packet contents and is always set by IPX to 0xffff". The NetWare System Interface Technical Overview states on page 4-7: "Checksum: this field was included to conform to the original Xerox packet header definition. It is always set to 0xffff by IPX. LAN cards do hardware checksums on the entire IPX packet frame, so this field is unnecessary." I'm not aware of any configuration options to make NetWare generate or check checksums.

The 4.3 BSD Tahoe implementation of XNS ignores the checksum field if the checksum is set to 0xffff. See nsintr() in netns/ns_input.c.

I'm doing my master's thesis on factors that affect transport protocol performance, and one of the questions I have been studying is "to checksum or not to checksum". In the case of Novell, one of their strengths is that they have always concentrated on making NetWare a very high performance product.
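The 0xffff convention described above can be sketched in Python. The checksum routine assumes the XNS "add-and-cycle" scheme: each 16-bit big-endian word is added with ones-complement (end-around carry) arithmetic, then the running sum is rotated left one bit, and an all-ones result is stored as zero so a real checksum never looks like the 0xffff "not checksummed" value. Those details, and the choice to cover every byte after the checksum field, are assumptions to check against the Xerox specification or netns/ns_cksum.c; they are not stated in this thread. The two receive functions contrast the nsintr() behaviour Ron describes with the reject-anything-else behaviour Jan reports. All function names are hypothetical.

```python
NO_CHECKSUM = 0xFFFF  # XNS/IPX value meaning "this packet is not checksummed"


def xns_checksum(data: bytes) -> int:
    """Assumed XNS add-and-cycle checksum over the bytes after the checksum field."""
    total = 0
    for i in range(0, len(data), 2):
        word = data[i] << 8
        if i + 1 < len(data):
            word |= data[i + 1]  # an odd trailing byte is padded with zero
        total += word
        total = (total & 0xFFFF) + (total >> 16)         # end-around carry
        total = ((total << 1) | (total >> 15)) & 0xFFFF  # rotate left one bit
    return 0 if total == NO_CHECKSUM else total          # never emit the sentinel


def stamp(rest: bytes, checksum: bool) -> bytes:
    """Prepend the 2-byte checksum field: a real checksum, or 0xffff for 'none'."""
    value = xns_checksum(rest) if checksum else NO_CHECKSUM
    return value.to_bytes(2, "big") + rest


def bsd_style_accept(packet: bytes) -> bool:
    """nsintr()-style policy: skip verification when the field is 0xffff, else verify."""
    field = int.from_bytes(packet[:2], "big")
    return field == NO_CHECKSUM or xns_checksum(packet[2:]) == field


def netware_observed_accept(packet: bytes) -> bool:
    """Policy Jan reports for NetWare: anything other than 0xffff is rejected."""
    return int.from_bytes(packet[:2], "big") == NO_CHECKSUM


rest = bytes(range(2, 30))  # stand-in for the IPX header and data after the checksum field
sent = stamp(rest, checksum=True)
print(bsd_style_accept(sent), netware_observed_accept(sent))  # True False
```

With `checksum=False`, `stamp()` produces what the NetWare manuals quoted above say IPX always sends, and both receive policies accept it.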
In the early days when 4.77 MHz 8086 PCs were common, the time taken to compute the checksum on these machines probably was a significant bottleneck to network performance. However, I would really like to know if this is still true with today's 25 MHz 386s. NetWare has worked without a checksum because, in general, hardware is usually reliable and checksums done by the hardware at link-level are sufficient to catch packets corrupted on the wire. On the other hand, I've read several articles suggesting that an end-to-end checksum, such as the one used in TCP/IP, is indeed a Good Thing, especially in an internetworked environment. See "End-to-End Arguments in System Design" by J. H. Saltzer et al., ACM Transactions on Computer Systems, November 1984.

Other articles discuss how to compute checksums efficiently and suggest that they might not be the bottleneck many people believe they are. See RFC 1071 for a good discussion of efficient (e.g. 4-bytes-at-a-time) algorithms for computing the Internet checksum (which I think could be applied to calculating the XNS/IPX checksum). In "An Analysis of TCP Processing Overhead" (IEEE Communications Magazine, June 1989), David Clark et al. analyze the performance of one protocol (TCP/IP) and conclude that, if properly implemented, the protocol itself is not the bottleneck in the design of high performance networks.

Saltzer's article describes a real-world situation on a LAN at MIT where the gateways occasionally corrupted packets while they sat in gateway memory. He uses this example to argue for an end-to-end checksum even if the link level also does a checksum or CRC.

I personally have had an experience similar to yours while porting Portable NetWare (PNW) to an 88k SVR4 machine manufactured by my former employer. The port of PNW went very well; it took about seven hours to get the basics working (which is a tribute to the portability of the PNW group's product).
However, we had problems with programs that a PC loaded from the PNW server on our machine and then executed. Data packets were getting mysteriously corrupted, so the executable loaded into the PC was corrupted and crashed the PC. We finally tracked the problem down to an intermittent bug in our Ethernet driver. We had not noticed the problem before because, up to then, we had only used Ethernet with TCP/IP, which detected (and, through retransmission, recovered from) the occasional bad packet.

Situations like this lead to a debate, with one side saying "software and hardware must have a basic level of reliability" and the other side saying, "yes, this is true, but I would lose less sleep at night if there were a checksum to catch occasional software or hardware glitches, and I'm willing to pay the performance hit of having this checksum".

I think Novell could make a good product even better if they made checksumming an option configurable on each NetWare server, just as 4.3BSD has the "udpcksum" and "tcpcksum" variables that can be poked in the kernel to enable/disable checksumming. With checksums, you get what you pay for (more reliability at the cost of performance). The user should be able to decide what they are willing to pay for.

Anyway, sorry if this discussion doesn't directly solve your problem. But I think the issue of "to checksum or not to checksum" is an interesting one and I welcome comments from others.
--
Ron Holt        ron@Eyring.COM        uunet!lanai!ron
Eyring Inc.     +1 801-375-2434 x434
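Ron's pointer to RFC 1071 is largely about deferring carries. Because 2^16 is congruent to 1 modulo 2^16 - 1, a 32-bit word contributes the same to a ones-complement sum as its two 16-bit halves, so you can sum wider words and fold the carries back into 16 bits once at the end. Below is a minimal Python sketch of the Internet checksum (not the XNS/IPX variant) computed both ways; the function names are illustrative. Python integers never overflow, so this shows only that the arithmetic agrees, not the speed-up a C version on a 386 might get.

```python
def fold16(total: int) -> int:
    """Fold carries above bit 15 back into the low 16 bits (end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def inet_checksum_16(data: bytes) -> int:
    """Internet checksum summing 16-bit big-endian words, carries folded at the end."""
    if len(data) % 2:
        data = data + b"\x00"  # pad an odd final byte with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return ~fold16(total) & 0xFFFF


def inet_checksum_32(data: bytes) -> int:
    """Same checksum, summing 32-bit words (4 bytes at a time) and folding once."""
    data = data + b"\x00" * ((-len(data)) % 4)  # zero padding adds nothing to the sum
    total = 0
    for i in range(0, len(data), 4):
        total += int.from_bytes(data[i:i + 4], "big")
    return ~fold16(total) & 0xFFFF


sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(inet_checksum_16(sample)), hex(inet_checksum_32(sample)))  # 0x220d 0x220d
```

The sample bytes are the numerical example from RFC 1071, whose folded sum is 0xddf2; the checksum is its complement, 0x220d.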
haas%basset.utah.edu@cs.utah.edu (Walt Haas) (11/16/90)
In article <1990Nov15.195319.1321@Eyring.COM> ron@Eyring.COM (Ron Holt) writes:
>In article <Nov.14.15.47.23.1990.1820@athos.rutgers.edu> hedrick@athos.rutgers.edu (Charles Hedrick) writes:
>
>>We've been bitten a few times by corrupted data in files and routing
>>updates transmitted by Novell...
[war story deleted]
>[description of Novell deleted]... NetWare has worked without a checksum
>because, in general, hardware is usually reliable and checksums done by
>the hardware at link-level are sufficient to catch packets corrupted on
>the wire....[more stuff deleted]

One of the main reasons Novell can survive without a packet-level checksum is that they usually run on simple networks. Novell is now in the process of trying to build a large internal network for their distributed corporation, and I think they are about to hit a big "oops".
--
Walt