[comp.dcom.lans] checksumming Novell?

hedrick@athos.rutgers.edu (Charles Hedrick) (11/15/90)

We've been bitten a few times by corrupted data in files and routing
updates transmitted by Novell.  The ultimate cause is a problem in a
serial controller card on one of our routers.  The vendor is fixing
the problem.  However I'm very nervous about using a protocol that
doesn't seem to have any protocol-level way of detecting bad packets.
Novell is based on XNS, but apparently they always set the checksum
field to ones.  Does anyone know whether it's possible to configure
Novell software to do checksumming?  Also, if our routers put a
checksum in the checksum field, do you know whether Novell will check
it, ignore it, or reject the packet?

xjeldc@tts.lth.se (Jan Engvald) (11/15/90)

In article <Nov.14.15.47.23.1990.1820@athos.rutgers.edu> hedrick@athos.rutgers.edu (Charles Hedrick) writes:
>We've been bitten a few times by corrupted data in files and routing
>updates transmitted by Novell.  The ultimate cause is a problem in a
>serial controller card on one of our routers.  The vendor is fixing
>the problem.  However I'm very nervous about using a protocol that
>doesn't seem to have any protocol-level way of detecting bad packets.
>Novell is based on XNS, but apparently they always set the checksum
>field to ones.  Does anyone know whether it's possible to configure
>Novell software to do checksumming?  Also, if our routers put a
>checksum in the checksum field, do you know whether Novell will check
>it, ignore it, or reject the packet?

From what I've seen, a packet with an IPX checksum other than hex
FFFF will be rejected. So checksumming is not optional; on the
contrary, it is NOT ALLOWED for Novell IPX. They trust the Ethernet
CRC, and on a small LAN that's usually enough. But not with LANs
connected together by routers.


Jan Engvald, Lund University Computing Center
________________________________________________________________________
   Address: Box 783                E-mail: Jan.Engvald@ldc.lu.se
            S-220 07 LUND     Earn/Bitnet: xjeldc@seldc52
            SWEDEN           (Span/Hepnet: Sweden::Gemini::xjeldc)
    Office: Soelvegatan 18         VAXPSI: psi%2403732202020::xjeldc
 Telephone: +46 46 107458          (X.400: C=se; A=TeDe; P=Sunet; O=lu;
   Telefax: +46 46 138225                  OU=ldc; S=Engvald; G=Jan)
     Telex: 33533 LUNIVER S

ron@Eyring.COM (Ron Holt) (11/16/90)

In article <Nov.14.15.47.23.1990.1820@athos.rutgers.edu> hedrick@athos.rutgers.edu (Charles Hedrick) writes:

>We've been bitten a few times by corrupted data in files and routing
>updates transmitted by Novell.  The ultimate cause is a problem in a
>serial controller card on one of our routers.  The vendor is fixing
>the problem.  However I'm very nervous about using a protocol that
>doesn't seem to have any protocol-level way of detecting bad packets.
>Novell is based on XNS, but apparently they always set the checksum
>field to ones.  Does anyone know whether it's possible to configure
>Novell software to do checksumming?  Also, if our routers put a
>checksum in the checksum field, do you know whether Novell will check
>it, ignore it, or reject the packet?

As far as I have been able to determine, NetWare always sets the
checksum field to 0xffff on transmit.  I'm not sure what it does with
this field on packet reception.  The Novell "NetWare C Interface-DOS",
Volume 1 manual says on page 4-4 that "the checksum field contains a
dummy checksum of the packet contents and is always set by IPX to
0xffff".  The NetWare System Interface Technical Overview states on
page 4-7:  "Checksum: this field was included to conform to the
original Xerox packet header definition.  It is always set to 0xffff by
IPX.  LAN cards do hardware checksums on the entire IPX packet frame,
so this field is unnecessary."  I'm not aware of any configuration
options to make NetWare generate or check checksums.

The 4.3 BSD Tahoe implementation of XNS ignores the checksum field if
the checksum is set to 0xffff.  See nsintr() in netns/ns_input.c.
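
To make the difference between the receive-side policies concrete, here is
a minimal sketch, not code from NetWare or from netns/ns_input.c.  It
assumes only that the checksum is the first 16-bit word of the IPX/XNS
header, in network byte order.  The function names are made up for
illustration, and `compute` is a hypothetical stand-in for whatever
checksum routine an implementation would use over the rest of the packet.

```python
from typing import Callable

NO_CHECKSUM = 0xFFFF  # value NetWare is documented to put in the field


def checksum_field(packet: bytes) -> int:
    """Return the first 16-bit word of the header (network byte order)."""
    return int.from_bytes(packet[0:2], "big")


def accept_netware_style(packet: bytes) -> bool:
    """Policy reported elsewhere in this thread for NetWare (unconfirmed):
    anything other than 0xFFFF in the checksum field is rejected."""
    return checksum_field(packet) == NO_CHECKSUM


def accept_bsd_style(packet: bytes, compute: Callable[[bytes], int]) -> bool:
    """Policy described above for 4.3 BSD: 0xFFFF means "not checksummed"
    and is accepted unchecked; any other value must match a recomputed sum.

    `compute` is a hypothetical callback taking the packet without its
    checksum word and returning a 16-bit value.
    """
    field = checksum_field(packet)
    if field == NO_CHECKSUM:
        return True
    return field == compute(packet[2:])
```

Under the first policy a router that fills in real checksums would have
its packets dropped; under the second, unchecksummed and checksummed
senders can coexist on the same network.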

I'm doing my master's thesis in the area of factors that affect
transport protocol performance and one of the questions I have been
studying is "to checksum or not to checksum".  In the case of Novell,
one of their strengths is that they have always concentrated on making
NetWare a very high performance product.  In the early days when
4.77 MHz 8088 PCs were common, the time taken to compute the checksum on
these machines probably was a significant bottleneck to network
performance.  However, I would really like to know if this is still
true with today's 25 MHz 386s.  NetWare has worked without a checksum
because, in general, hardware is usually reliable and checksums done by
the hardware at link-level are sufficient to catch packets corrupted on
the wire.

On the other hand, I've read several articles suggesting that an end to
end checksum, such as the one used in TCP/IP, is indeed a Good Thing,
especially in an internetworked environment.  See "End-to-End Arguments
in System Design" by J. H. Saltzer, et al. in ACM Transactions on
Computer Systems, November 1984.  Other articles discuss how to compute
checksums efficiently and suggest that they might not be the bottleneck
many people believe they are.  See RFC 1071 for a good discussion on
efficient (e.g. 4 bytes-at-a-time) algorithms for computing the
Internet checksum (which I think could be applied to calculating the
XNS/IPX checksum). David Clark, et al. describe in "An Analysis of TCP
Processing Overhead" (IEEE Communications Magazine, June 1989) their
analysis of the performance of one protocol (TCP/IP) and conclude that,
if properly implemented, the protocol itself is not the bottleneck in
the design of high performance networks.
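
As a rough illustration of the wider-word idea, here is a minimal sketch
of the Internet checksum in the RFC 1071 style.  It is the Internet
checksum only, not the XNS/IPX checksum, and the function names are my
own.  The second version adds the data 32 bits at a time and folds the
carries at the end, which gives the same answer because one's complement
addition does not care how the 16-bit words are grouped.

```python
import struct


def fold16(total: int) -> int:
    """Fold carries above bit 15 back into the low 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length buffer with a zero byte
    words = struct.unpack(f"!{len(data) // 2}H", data)
    return ~fold16(sum(words)) & 0xFFFF


def internet_checksum_wide(data: bytes) -> int:
    """Same result, but adding 4 bytes at a time and folding afterwards."""
    data += b"\x00" * (-len(data) % 4)  # pad to a multiple of 4 bytes
    words = struct.unpack(f"!{len(data) // 4}I", data)
    return ~fold16(sum(words)) & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")  # example bytes from RFC 1071
    print(hex(internet_checksum(sample)), hex(internet_checksum_wide(sample)))
```

In Python the wide version is not meaningfully faster; the point is only
that the wide accumulation is legal, which is what lets a C or assembler
loop on a 32-bit machine do half as many additions.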

Saltzer's article describes a real-world situation on a LAN at MIT
where occasionally the gateways corrupted packets while in the
gateway's memory.  He uses this example to argue for the need for an
end to end checksum even if the link-level also does a checksum or
CRC.
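
The failure mode is easy to reproduce in a toy model.  The sketch below
is not taken from Saltzer's article; every name in it is made up for
illustration.  Each hop protects its frame with a CRC-32 (zlib's, standing
in for the Ethernet CRC), the gateway checks and strips the CRC, damages
one bit while the packet sits in its memory, and then computes a fresh CRC
over the damaged data.  The end-to-end check is a deliberately simple byte
sum, not any real protocol's checksum.

```python
import zlib


def link_send(packet: bytes) -> bytes:
    """Frame a packet for one hop by appending a CRC-32."""
    return packet + zlib.crc32(packet).to_bytes(4, "big")


def link_receive(frame: bytes) -> bytes:
    """Verify and strip the per-hop CRC."""
    packet, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(packet) != crc:
        raise ValueError("link CRC mismatch")
    return packet


def faulty_gateway(frame: bytes) -> bytes:
    """Forward a frame, flipping one bit while it is held in memory."""
    packet = bytearray(link_receive(frame))  # incoming CRC is fine
    packet[-1] ^= 0x01                       # corruption inside the gateway
    return link_send(bytes(packet))          # fresh CRC covers damaged data


def toy_sum(data: bytes) -> int:
    """Toy end-to-end checksum: byte sum modulo 65536."""
    return sum(data) & 0xFFFF


def wrap(data: bytes) -> bytes:
    """Sender: prepend the end-to-end checksum to the data."""
    return toy_sum(data).to_bytes(2, "big") + data


def unwrap(packet: bytes) -> bytes:
    """Receiver: verify the end-to-end checksum and return the data."""
    claimed, data = int.from_bytes(packet[:2], "big"), packet[2:]
    if toy_sum(data) != claimed:
        raise ValueError("end-to-end checksum mismatch")
    return data
```

Passing `link_send(wrap(data))` through `faulty_gateway` yields a frame
that `link_receive` accepts without complaint on the second hop; only
`unwrap` notices the damage.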

I personally have had an experience similar to yours while porting
Portable NetWare (PNW) to an 88k SVR4 machine manufactured by my former
employer.  The port of PNW went very well; it took about seven hours to
get the basics working (which is a tribute to the portability of the
PNW group's product).  However, we had problems with programs executed
on a PC which were loaded from the PNW server on our machine.  Data
packets were getting mysteriously corrupted causing the executable
program loaded into the PC to be corrupted, thus crashing the PC.  We
finally tracked the problem down to an intermittent bug in our Ethernet
driver.  We were not aware of this problem because, up to then, we had
only used Ethernet with TCP/IP, which detected and corrected any
occasional bad packets.  Situations like this lead to a debate with one
side saying "software and hardware must have a basic level of
reliability" and the other side saying, "yes, this is true, but I would
lose less sleep at night if there was a checksum to catch occasional
software or hardware glitches and I'm willing to pay for the
performance hit of having this checksum".

I think Novell could make a good product even better if they made
checksumming an option configurable on each NetWare server just as
4.3BSD has the "udpcksum" and "tcpcksum" variables that can be poked in
the kernel to enable/disable checksumming.  With checksums, you get
what you pay for (more reliability at the cost of performance).  The
user should be able to decide what they are willing to pay for.

Anyway, sorry if this discussion doesn't directly solve your problem.
But I think the issue of "to checksum or not to checksum" is an
interesting one and I welcome comments from others.
-- 
Ron Holt	ron@Eyring.COM  uunet!lanai!ron
Eyring Inc.	+1 801-375-2434 x434

haas%basset.utah.edu@cs.utah.edu (Walt Haas) (11/16/90)

In article <1990Nov15.195319.1321@Eyring.COM> ron@Eyring.COM (Ron Holt) writes:
>In article <Nov.14.15.47.23.1990.1820@athos.rutgers.edu> hedrick@athos.rutgers.edu (Charles Hedrick) writes:
>
>>We've been bitten a few times by corrupted data in files and routing
>>updates transmitted by Novell... [war story deleted]
>[description of Novell deleted]... NetWare has worked without a checksum
>because, in general, hardware is usually reliable and checksums done by
>the hardware at link-level are sufficient to catch packets corrupted on
>the wire....[more stuff deleted]

One of the main reasons Novell can survive without a packet-level
checksum is that they usually run on simple networks.  Novell is now in
the process of trying to build a large internal network for their
distributed corporation, and I think they are about to hit a big "oops".

-- Walt