[comp.protocols.tcp-ip] Response to Long Distance NFS Query

KASTEN@MITVMA.MIT.EDU (Frank Kastenholz) (12/21/87)

A few months ago I made a general request to the list about
running NFS internetwork gateways. The specific configuration
that I have to deal with is two Ethernets separated by some
physical distance (possibly intercontinental). There would be
some kind of gateway/bridge/router/thing at each Ethernet and the
things are connected by a medium to high speed serial link (anything
from about 38K to 1.544M bits/sec).

The responses that I received (minus the "gee, what a great idea's"
or "Well, it had to come someday" etc etc responses) are:

(I have left the originating party's name on each one; the rest of the
SMTP/RFC822 junk has been removed.)
======================================================================

From:  "James B. VanBokkelen" <JBVB@AI.AI.MIT.EDU>

By default, none of Sun's implementations of NFS use UDP checksums.  If
you enable them, the last release I heard anything about still had the
4.2 UDP checksum mis-calculation.  They assume they're running one hop,
on a CRC-protected medium like Ethernet.

Accordingly, you're not too likely to catch any situation where the NFS
packet is corrupted on the way through a gateway, or over an error-prone
link.  Instant filesystem corruption.  It can certainly be fixed if you
have source, but I don't own any Suns (2nd hand info, this), so I can't
say exactly what must be done.
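For readers who have not looked at it, the UDP checksum (RFC 768) is the 16-bit one's-complement sum used elsewhere in IP. It covers a pseudo-header (source address, destination address, protocol, UDP length) followed by the UDP header and data, and a transmitted value of zero means "not computed", which is the NFS default VanBokkelen describes. A minimal Python sketch, assuming IPv4 addresses as 4-byte strings; `build_udp` and `udp_verify` are hypothetical helper names, not Sun code, and the addresses and ports in the usage lines are illustrative only.

```python
import struct

UDP_PROTO = 17

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum, padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_length: int) -> bytes:
    return src_ip + dst_ip + struct.pack("!BBH", 0, UDP_PROTO, udp_length)

def build_udp(src_ip, dst_ip, src_port, dst_port, payload, checksum=True) -> bytes:
    """Build a UDP header plus data, optionally filling in the checksum."""
    length = 8 + len(payload)
    segment = struct.pack("!HHHH", src_port, dst_port, length, 0) + payload
    if not checksum:
        return segment  # field left at 0: "no checksum"
    csum = ~ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + segment) & 0xFFFF
    csum = csum or 0xFFFF  # 0 is reserved, so an all-zero result is sent as all ones
    return segment[:6] + struct.pack("!H", csum) + segment[8:]

def udp_verify(src_ip, dst_ip, segment) -> bool:
    """Receiver check: a zero field cannot be checked; otherwise the sum must be all ones."""
    (sent,) = struct.unpack("!H", segment[6:8])
    if sent == 0:
        return True  # accepted blindly, corrupted or not
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF

src, dst = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])
good = build_udp(src, dst, 1023, 2049, b"nfs write data")
damaged = bytearray(good)
damaged[10] ^= 0x04  # flip one bit in the data
print(udp_verify(src, dst, good), udp_verify(src, dst, bytes(damaged)))
```

With the checksum filled in, the single flipped bit is caught; build the same segment with `checksum=False` and the damaged copy is accepted.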

=======================================================================
From:  jas@MONK.PROTEON.COM (John A. Shriver)

The nature of the Sun NFS fragmented UDP-grams causes many routers and
bridges to have fits.  You get 6 back-to-back IP fragments.  If ANY of
those fragments is lost, the entire UDP-gram must be retransmitted.

You can, however, reduce the size of the UDP-gram.  In /etc/fstab, you
need to add the undocumented rsize & wsize switches.  For example:

across_gw_machine:/usr /usr nfs rw,noquota,soft,rsize=2048,wsize=2048

This will reduce the size of the UDP-gram to 2048 bytes of data, as
opposed to the default 8192.  This will cause only two fragments
instead of six.  (Do keep this parameter a multiple of 1024, as all
the network code likes page-aligned buffers.)
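The fragment counts follow from the Ethernet MTU. A rough Python sketch, assuming a 1500-byte Ethernet MTU, a 20-byte IP header without options, and a guessed 128 bytes of RPC/NFS header per request (the real figure varies by call; that constant is an assumption, not a measured value). Because losing any one fragment loses the whole UDP-gram, the datagram loss rate grows with the fragment count even when every fragment faces the same small, independent loss probability.

```python
import math

ETHERNET_MTU = 1500  # largest IP datagram carried in one Ethernet frame
IP_HEADER = 20       # IPv4 header without options
UDP_HEADER = 8
RPC_OVERHEAD = 128   # ASSUMPTION: rough allowance for RPC/NFS headers

def fragment_count(nfs_data: int, mtu: int = ETHERNET_MTU) -> int:
    """IP fragments needed to carry one NFS read or write of nfs_data bytes."""
    ip_payload = UDP_HEADER + RPC_OVERHEAD + nfs_data
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)

def datagram_loss(nfs_data: int, fragment_loss: float) -> float:
    """Chance the whole UDP-gram is lost if each fragment is lost independently."""
    return 1 - (1 - fragment_loss) ** fragment_count(nfs_data)

for size in (2048, 3072, 8192):
    print(size, fragment_count(size), round(datagram_loss(size, 0.01), 4))
```

Under these assumptions an rsize of 8192 needs six fragments and 2048 needs two, in line with the six back-to-back fragments Shriver mentions; 3072, the multiple of 1024 nearest Hedrick's suggested 3000, needs three.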

For reference, see bug 135 in Sun Software Technical Bulletin,
February 1987, part number 812-8701-01.  (The page numbering is
botched in this one.)
====================================================================
From: slevy@umn-rei-uc.arpa (Stuart Levy)

One problem we've had in NFSing between disparate machines is with
naming them.  The mount request passes the originating machine -name-
rather than having the server use gethostbyaddr().  It's important
to check that "hostname" on the client yields a name known to the
server and vice versa.  That's probably not the whole problem but
can cause things to break.

A guy from Proteon, Mick Scully (mcs@proteon.com), recently visited here
and mentioned that he had mounted NFS filesystems at Berkeley across ARPAnet.
======================================================================
From:  hedrick@ATHOS.RUTGERS.EDU (Charles Hedrick)

We run NFS over cisco routers, either directly connecting two
Ethernets or connecting Ethernets via a T1 line.  The only problem is
that the Ethernet cards used by cisco (and others) can't handle large
numbers of consecutive packets.  So you need to specify
rsize=nnn,wsize=nnn in the mount.  Typically we use 2048, though I
think something a bit closer to 3000 might give better performance.  I
haven't tried it over anything slower, though we understand that
somebody at Univ. of Maryland mounted one of our disks over NSFnet.
=====================================================================
From:  John Romkey <ROMKEY@XX.LCS.MIT.EDU>

One problem you'll run into is that NFS does not checksum its packets.
NFS packets are UDP-based instead of TCP-based, and the UDP checksum is
optional. On a single ethernet, the ethernet's CRC is possibly reliable enough
to detect bad packets, but through an IP router there is too high a probability
of losing data (0.1% would mean one out of every 1000 packets was damaged; you
really, desperately want 0% errors).

The reason is that there are chances for corruption of data in the ethernet
interface of the IP router, in the IP router's memory, and in the
other interface it routes the NFS packet to. The corruption can be due
to hardware errors, electrical noise, memory errors and software problems.
In fact, you've got the same problem with just an NFS server and client on
the same LAN, but since fewer components are involved, the chances of error
are much smaller.

I've spoken with people who've used NFS over a router, and they've actually
seen files corrupted due to the lack of checksums. I'd recommend against it.

BTW, the reason they turn off checksums is to up performance.
                                        - john romkey
=================================================================
=================================================================

Any further discussion should go to the list, to the original author
or directly to me (unfortunately I have recently moved from MIT-Multics
to MITVMA but the list has yet to catch up to me (whoever is running
the distribution list must be on a verryyyyy loooonnnnggg vacation:-))


Seasons greetings to all

Frank Kastenholz

karn@faline.bellcore.com (Phil R. Karn) (12/21/87)

By the way, one advantage of bridges over routers for NFS traffic (at
least for the Vitalink bridges we use) is that they maintain the
original Ethernet CRC; they encapsulate the entire source packet (CRC
included) over the HDLC link. This means that a broken Ethernet
controller in the bridge won't corrupt your checksumless NFS/UDP packets
like a broken Ethernet controller in an IP router would.
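Karn's point can be modeled in a few lines. The sketch is a deliberate simplification, assuming the forwarding box has one bad buffer that flips a single bit: the router checks and discards the incoming frame check sequence and computes a fresh one over whatever is in its memory, while a bridge of the kind Karn describes carries the original FCS across. Python's `zlib.crc32` uses the same CRC-32 polynomial as the Ethernet FCS; the byte order and the omission of header rewriting are illustrative only, and every function name is hypothetical.

```python
import zlib

def add_fcs(body: bytes) -> bytes:
    """Append a CRC-32 frame check sequence (same polynomial as Ethernet)."""
    return body + zlib.crc32(body).to_bytes(4, "little")

def fcs_ok(frame: bytes) -> bool:
    return zlib.crc32(frame[:-4]).to_bytes(4, "little") == frame[-4:]

def bad_buffer(body: bytes) -> bytes:
    """Hypothetical hardware fault: one bit flips while the packet is in memory."""
    damaged = bytearray(body)
    damaged[20] ^= 0x10
    return bytes(damaged)

def router_forward(frame: bytes) -> bytes:
    """Check and strip the incoming FCS, then compute a new one on the way out."""
    if not fcs_ok(frame):
        raise ValueError("router drops a frame that arrived damaged")
    return add_fcs(bad_buffer(frame[:-4]))

def bridge_forward(frame: bytes) -> bytes:
    """Carry the original FCS across the serial link untouched."""
    return bad_buffer(frame[:-4]) + frame[-4:]

frame = add_fcs(bytes(range(64)))
print("via router:", fcs_ok(router_forward(frame)))  # damage goes unnoticed
print("via bridge:", fcs_ok(bridge_forward(frame)))  # damage is caught
```

The same bit flip is invisible after the router, because the new FCS faithfully describes the damaged data, and visible after the bridge.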

Phil

eshop@saturn.ucsc.edu (Jim Warner) (12/22/87)

In article <1649@faline.bellcore.com> karn@faline.bellcore.com (Phil R. Karn) writes:
>By the way, one advantage of bridges over routers for NFS traffic (at
>least for the Vitalink bridges we use) is that they maintain the
>original Ethernet CRC; they encapsulate the entire source packet (CRC
>included) over the HDLC link....

This is *NOT* a general characteristic of all bridges.  It is true
for DEC and Vitalink.  If this characteristic is important, you should
be sure to ask your vendor how they handle it.

jim

melohn@SUN.COM (Bill Melohn) (12/22/87)

You can corrupt an NFS file system invisibly by sending NFS/UDP
packets over an unreliable datalink (like the current version of SLIP)
without first turning on UDP checksums, which has been possible since
SunOS 3.2. With our point-to-point IP router, we do a CRC at the
serial chip level, making it act like an ethernet. Other people have
done NFS over SLIP using error-detecting modems like the Telebit
Trailblazer. In any case, the trend is towards error-free or at least
error correcting hardware/networks, so the NFS/UDP default seems even
more reasonable in a high-fibre future.
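The message does not say which CRC the serial hardware computes. For illustration only, here is the 16-bit frame check sequence commonly used on HDLC-style links (CRC-CCITT in its bit-reversed form, as in X.25 and later PPP), written bit by bit in Python; the helper names are hypothetical. Note what it protects: the bits on the wire between two serial chips, not the packet while it sits in either router's memory.

```python
def fcs16(data: bytes) -> int:
    """16-bit FCS: CRC-CCITT, reflected, initial value 0xFFFF, final complement."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x8408  # the 0x1021 polynomial, bit-reversed
            else:
                crc >>= 1
    return crc ^ 0xFFFF

def add_fcs16(payload: bytes) -> bytes:
    """Append the FCS, low-order byte first."""
    return payload + fcs16(payload).to_bytes(2, "little")

def fcs16_ok(frame: bytes) -> bool:
    return fcs16(frame[:-2]).to_bytes(2, "little") == frame[-2:]

frame = add_fcs16(b"NFS reply over a serial line")
print(fcs16_ok(frame))
```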

karn@faline.bellcore.com (Phil R. Karn) (12/23/87)

> This [maintaining original Ethernet CRCs] is *NOT* a general
> characteristic of all bridges.

Don't I know it. Before the DEC Lanbridge came out, I built my own out
of PDP-11/73s and DEQNAs. Big mistake! The DEQNA has *major* problems
running in promiscuous mode. One common manifestation was undetected
packet corruption. Lots of funny entries showed up in our routing and
ruptime tables because UDP checksums were disabled on the Sun routers.

This experience made me a firm believer in end-to-end checksums for *all*
packets. The performance impact of UDP checksums in NFS is minimal, but
even if it weren't they would still be worth it.  Even ARP suffers from
the lack of an internal checksum.

Phil

SRA@XX.LCS.MIT.EDU (Rob Austein) (12/23/87)

    Date: Tuesday, 22 December 1987  02:01-EST
    From: melohn@Sun.COM (Bill Melohn)

    ...          In any case, the trend is towards error-free or at least
    error correcting hardware/networks, so the NFS/UDP default seems even
    more reasonable in a high-fibre future.

Sorry, but this is a bad idea.  You really do need end-to-end software
checksumming.  MIT discovered this the hard way years ago when a
Chaosnet "bridge" (a level 3 router in spite of the name) developed a
stuck bit.  Chaosnet hardware does hardware checksumming, like
Ethernet (in fact, these days, most of it IS Ethernet, even at MIT
there are only two subnets left still using Chaosnet hardware).  The
Chaosnet hardware faithfully transported all the bits entrusted to it,
but the packets were corrupted nonetheless.
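A stuck bit is data dependent, which is why it shows up as "one in N packets" rather than as a dead router. A small simulation, assuming a forwarding buffer in which one bit of one byte is stuck at 1: packets whose bit was already 1 pass through intact, the rest are silently changed, and a per-hop hardware check computed after the damage never sees it. Here `zlib.crc32` stands in for an end-to-end check computed by the sending host; it is not the checksum UDP or Chaosnet actually used, and the offset, bit, packet size, and seed are arbitrary choices.

```python
import random
import zlib

STUCK_OFFSET = 17  # hypothetical: which byte of the buffer has the bad bit
STUCK_MASK = 0x08  # hypothetical: which bit is stuck at 1

def forward_through_bad_buffer(packet: bytes) -> bytes:
    """Copy a packet through a buffer with one bit stuck at 1."""
    buf = bytearray(packet)
    buf[STUCK_OFFSET] |= STUCK_MASK
    return bytes(buf)

def simulate(n_packets: int = 200, size: int = 64, seed: int = 1987):
    rng = random.Random(seed)
    damaged = detected = 0
    for _ in range(n_packets):
        payload = bytes(rng.getrandbits(8) for _ in range(size))
        end_to_end = zlib.crc32(payload)  # computed by the sending host
        received = forward_through_bad_buffer(payload)
        if received != payload:
            damaged += 1
        if zlib.crc32(received) != end_to_end:  # checked by the receiving host
            detected += 1
    return damaged, detected

damaged, detected = simulate()
print(f"{damaged} of 200 packets damaged, {detected} caught end to end")
```

With random data roughly half the packets are damaged, and only the end-to-end check notices.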

Things only get worse when you start talking about long haul nets.

--Rob

hedrick@athos.rutgers.edu (Charles Hedrick) (12/25/87)

I guess I'm about to jump on the bandwagon for turning on NFS
checksumming.  We just had Sun field service replace an Ethernet board
because we started noticing corrupted files transported via NFS.  No
gateways or bridges involved.  It was apparently a failure in the
Ethernet interface board.  After the vacation I'm going to look into
turning on checksumming everywhere.  This was not our first problem.
The other one was due to a design bug in the ACC 1822 Multibus card.
When put into a gateway with more than one Ethernet card, the load got
too heavy for the chips they used to drive it.  The bus arbitration
didn't work.  It stomped on the bus cycles of other devices.  Result
was random garbaging of data.  TCP worked fine, but NFS files were
garbaged.  The board has just recently been fixed.  Of course with
these low-rate failures, if checksumming were turned on, we would
probably never even know we had a problem.  On the other hand, it
seems a bit drastic to use garbage in user files as a diagnostic.

LYNCH@A.ISI.EDU (Dan Lynch) (12/26/87)

Gee,  when I used to work in a computer center we had this marvelous
procedure called "running diagnostics".  We did it to make sure all
the equipment was in proper working order.  Now that we have networking,
have we forgotten our past???  What I see missing is a definite package
of diagnostic procedures to check out each "piece of the system".

If the "network IS the computer" it needs to be treated like one.

Dan
-------

JBVB@AI.AI.MIT.EDU ("James B. VanBokkelen") (12/28/87)

In the Chaosnet example mentioned, the router was running just fine,
and the memory problem was corrupting one in N packets forwarded.
Yeah, a diagnostic would have found it, but networks are big and
fuzzy, and the failure was intermittent, and I think the people
who first realized there was a problem spent some time just locating
it, and some more time thinking it was a software bug.

It would be nice if everything ran memory diagnostics as the idle
task, and it would be nice if there weren't interfaces which corrupt
packets silently under some conditions.  Maybe someday.  For the
moment, I think end-to-end error detection is a good thing.

jbvb

ron@TOPAZ.RUTGERS.EDU (Ron Natalie) (01/03/88)

Sure, and I remember running DECX for days and having things turn up
100 % OK, but then having the machine blow up with hardware errors
five minutes after the normal OS was booted.  There's no diagnostic
like actually trying to use the system.

-Ron