[comp.protocols.tcp-ip] NFS Performance through Routers

matthews@alux2.ATT.COM (John Matthews) (03/19/89)

Last week we replaced a DEC Lan Bridge with a new Proteon P4200 router
to create a local subnet for our building.  Ever since, things have
been running extremely slowly for the people who get their CAD software
through our gateway.  They do rely heavily on huge CAD executables that
get sent through the gateway.  I have been looking into this for quite
some time and I am finally posting a message here to see what other
people have done in similar situations.  What I have found is that
default mounts in NFS make reads and writes that are 8192 bytes long.
The kernel gets these and then in turn fragments these requests into up
to 9 UDP packets.  If the proteon discards even one of these packets,
all 9 of them have to get retransmitted.  I went around and changed all
of the NFS mounts to do 1024 byte reads and writes.  This seemed to
improve things a little.  Another thing that I have noticed is that we
are getting extremely high collision rates on the SUNS.  They add up to
about a million and a half for the past week.  Someone told me that the
SUNS don't abide by a standard that says they should wait 10
milliseconds between each packet they send to give others a chance to
transmit.  They told me they only wait 1 millisecond, if that.  Could
this be causing a lot of collisions?  There are only around 15 Sun
clients and one server sitting on each of two bridged ethernets in the
building where they are having all of the problems.  In the main
building we have things set up the same way with collisions adding up
to around 40,000.  There does seem to be a lot of broadcasting going on
on that ethernet that could cause this.  There is a problem stemming from
the fact that older versions of UNIX are trying to forward IP broadcast
packets.  When these hosts receive a broadcasted RIP packet addressed
to 128.94.255.255, they think it's a packet destined to a specific
machine and they then try and forward it.  For every such packet, an
ARP request is broadcasted on the ethernet.  There are about 16
machines running the old network software and 5 routers generating up
to 5 rip packets every 30 seconds.  I believe that added up to around
28,000 broadcasts per hour.  Temporarily, I answered these ARP requests
and pointed them to a device that would ignore them, but the network is
still slow.  Is there anything wrong with responding to these ARP
requests with an ethernet address that doesn't really exist on that
network?  Then the machines running the old network software would just
forward it into a black hole.  Am I thinking right or would this cause
problems?  What will the DEC Lan Bridges do with an ethernet packet
when they have no idea which side that ethernet device is really on?  Will
every bridge throughout the network pass this packet every time it's
sent?
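
As a rough sanity check on the broadcast figure above, the numbers in
this post can be multiplied out.  This is only a sketch: it assumes each
old host issues exactly one ARP request per RIP broadcast it receives,
and that every router really sends the full 5 RIP packets every 30
seconds, so it gives a ceiling rather than a measurement.

```python
# Upper-bound estimate of ARP broadcasts triggered by misread RIP broadcasts.
# The counts come from the post; "one ARP per host per RIP packet" is an
# assumption made for this sketch.
OLD_HOSTS = 16          # machines running the old network software
ROUTERS = 5             # routers sending RIP updates
RIP_PER_INTERVAL = 5    # "up to 5" RIP packets per router per interval
INTERVAL_SECONDS = 30   # RIP update interval


def arp_broadcasts_per_hour(hosts, routers, rip_per_interval, interval_s):
    """Return the worst-case number of ARP broadcasts in one hour."""
    intervals_per_hour = 3600 // interval_s
    rip_per_hour = routers * rip_per_interval * intervals_per_hour
    return hosts * rip_per_hour


ceiling = arp_broadcasts_per_hour(
    OLD_HOSTS, ROUTERS, RIP_PER_INTERVAL, INTERVAL_SECONDS
)
print(ceiling)
```

Under those assumptions the ceiling is 48,000 per hour, so an observed
figure of around 28,000 would fit if the routers usually send fewer than
5 packets per update.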

Last night we tried to configure an extra ethernet board on the
fileserver that houses all of the CAD software and connect it to the
other ethernet cable to give them back some speed.  All we did was
uncomment the ie1 interface in the kernel config file, recompile the
kernel and reboot.  We didn't change any of the /etc/*rc* files at
all.  When the sun came back up, all of the old NFS mounts on the
clients just timed out.  The NFS daemons wouldn't service any NFS
requests.  I was able to use telnet and rlogin to connect to hosts on
either side after manually ifconfig'ing the new ie1 interface.  I gave
up and tried it on another fileserver.  It did the exact same thing.
The thing that doesn't make sense is that the only thing we did was add
one ethernet device to the kernel and then nothing worked the way it
used to.  We rebooted on the old kernels and everything was back to
normal. We called SUN but they didn't seem to know what the problem
was.  Has anyone else ever encountered such problems?

We are eventually going to move some of that software to servers in
that building so that they aren't pounding on the gateway.  I wasn't
aware that the proteon had so little bandwidth compared to a LAN
bridge.  How on earth can they go from Pronet-80 to ethernet when they
can't come close to handling ethernet's full 10 megabits/s?  What
percent of 10 mbits/s can a proteon really route from one ethernet to
another?  Has anyone done some real life performance testing?

What other things could I do to optimize NFS traffic?

If there are things that I am wrong about, please let me know.  This
has been a frustrating week to say the least.  If anyone could spare a
few minutes on the phone, please e-mail me your phone number.  I'd
really appreciate it.
				John Matthews
				ulysses!aloft!matthews@princeton.edu
				matthews@aloft.att.com
				matthews@research.att.com

kline@tuna.cso.uiuc.edu (Charley Kline) (03/19/89)

> There is a problem stemming from
> the fact that older versions of UNIX are trying to forward IP broadcast
> packets.  When these hosts receive a broadcasted RIP packet addressed
> to 128.94.255.255, they think it's a packet destined to a specific
> machine and they then try and forward it.  For every such packet, an
> ARP request is broadcasted on the ethernet.  There are about 16
> machines running the old network software and 5 routers generating up
> to 5 rip packets every 30 seconds.  I believe that added up to around
> 28,000 broadcasts per hour.  Temporarily, I answered these ARP requests
> and pointed them to a device that would ignore them, but the network is
> still slow.  Is there anything wrong with responding to these ARP
> requests with an ethernet address that doesn't really exist on that
> network.

Yow! Be careful with your broadcast address! If you have old Suns, they'll
be using a zero's broadcast (128.94.0.0). The p4200 by default will use
ones broadcast. I'm not surprised you're getting 28000 broadcasts an hour
and all those collisions. Mixing broadcast addresses is a good way to get
an ethernet meltdown, because exactly the behavior you describe happens.

Go to the same broadcast address everywhere on your net and I bet a lot
of your problem will go away. Since you're running old Suns, which don't
give you a choice, I suspect you'll have to change everything to zero's
broadcast.

Hope this helps.

-----
Charley Kline, University of Illinois Computing Services
kline@tuna.cso.uiuc.edu
{uunet,seismo,pur-ee,convex}!uiucdcs!uiucuxc!kline

"Flaring high or flaring early makes the little prop tips curly."

dlr@daver.UUCP (Dave Rand) (03/19/89)

In article <633@garcon.cso.uiuc.edu> kline@tuna.cso.uiuc.edu (Charley Kline) writes:
>> the fact that older versions of UNIX are trying to forward IP broadcast
>> packets.  When these hosts receive a broadcasted RIP packet addressed
>> to 128.94.255.255, they think it's a packet destined to a specific
>Yow! Be careful with your broadcast address! If you have old Suns, they'll
>be using a zero's broadcast (128.94.0.0). The p4200 by default will use
Please forgive my ignorance, but I have noticed this as well. Why do
machines ARP for broadcast IP addresses?


-- 
Dave Rand
{pyramid|hoptoad|sun|vsi1}!daver!dlr

hedrick@geneva.rutgers.edu (Charles Hedrick) (03/20/89)

Sun NFS does not violate any standard.  Now and again when problems
like this come up somebody floats this rumor that they somehow
transmit "too fast".  It just isn't true.  It is true that NFS pushes
networking technology very hard, but if it doesn't work something on
your network is broken.  In particular, there is no standard that says
something should wait 10 milliseconds between transmissions.  If two
systems try to transmit at the same time, there is a pattern of random
exponential backoff.  But if only one host is sending, you may see as
little as 9.6 microseconds (I think -- this is from memory) between
packets.  This stuff is done by the Ethernet controller chip, and Sun
uses the same chips as everybody else.
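
The 9.6 microsecond figure can be checked with a little arithmetic.  The
sketch below assumes standard 10 Mbit/s Ethernet with a minimum
interframe gap of 96 bit times; the constant names are chosen here for
illustration only.

```python
# Minimum gap between back-to-back frames on 10 Mbit/s Ethernet.
BIT_RATE = 10_000_000      # bits per second
INTERFRAME_GAP_BITS = 96   # minimum idle period between frames, in bit times


def interframe_gap_us(bit_rate, gap_bits):
    """Convert an interframe gap in bit times to microseconds."""
    return gap_bits / bit_rate * 1e6


gap = interframe_gap_us(BIT_RATE, INTERFRAME_GAP_BITS)
print(f"{gap:.1f} microseconds")
```

That is roughly a thousand times shorter than the rumored 10
millisecond wait.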

I agree with the other response: you've got to find a way for all
machines that do broadcasts to use an address that all machines on the
network can cope with.  In my opinion the safest is 128.94.0.0.
Presumably there will be a way to set all machines to use this as
their broadcast address.  Responding to ARPs with a bogus Ethernet
address is a standard technique, and I don't know of anything wrong
with it, but it's better to set things up so that you don't need to do
that.  You should also try setting ipforwarding to 0 in all of your
unix kernels:
   adb -w /vmunix
   ipforwarding?W 0
   ^D
This will cause the systems not to attempt to forward stray packets.
They may however still send ICMP unreachables back to the source.
However this is far better than an ARP, because it isn't a broadcast.
If you're willing to do a bit more work in adb on your Suns, you
can make them accept the correct broadcast address.  Unfortunately
I don't have the exact offset, but if you do  
   adb /vmunix
   ipintr?i
and keep hitting CR, you will eventually find a section of code
that calls if_ifwithaddr, compares an address with -1 (or it might
show as 0xffffffff), and then calls ip_forward.  You want to change
the comparison with -1 to compare with the actual broadcast 
address used on your network.  This advice is Sun-specific.

Now, as to the rsize and wsize settings.  Certainly if a gateway or
bridge is dropping packets, you may be better off using values
smaller than the original 8192 default.  However 1024 may be going a
bit too far.  Most Ethernet interfaces can handle two packets in a
row, so 2048 would make sense.  It does help performance to use as
large a number as your hardware can handle reliably.  We are able to
use 2048 through all of the bridges and routers that we've tried.
From what I know of Proteon's hardware, I'd be very surprised if they
couldn't handle at least that much.  What is critical is how much
buffering there is on the Ethernet interface.  Newer interfaces often
have enough that you can use the full 8192.  We have no problem with
8192 with cisco routers that use their new MCI Ethernet interfaces.
It's fairly easy to test.  Try changing the value and then doing some
operation that generates a lot of NFS traffic.  Do "nfsstat" before
and after the test.  Look at "retrans" and "timeout" compared to the
total.  A couple of percent is OK.  My normal test is 
"cp /server/usr/lib/* /dev/null".  Be sure you specify "intr" as one of
the mount options, so you can abort the test if something goes wrong.
(In fact, I would always specify "intr".  I don't know why it isn't the
default.)
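
The before-and-after comparison described above comes down to a
percentage over the calls made during the test.  A minimal sketch,
assuming the counters have been copied by hand from two "nfsstat" runs;
the dictionary layout and the sample numbers are hypothetical, not real
measurements.

```python
def percent_of_calls(before, after, counter):
    """Percentage of calls during the test that incremented `counter`."""
    calls = after["calls"] - before["calls"]
    if calls <= 0:
        raise ValueError("no NFS calls were made between the two samples")
    return 100.0 * (after[counter] - before[counter]) / calls


# Hypothetical counter values noted down before and after the cp test.
before = {"calls": 120000, "retrans": 900, "timeout": 950}
after = {"calls": 126000, "retrans": 1020, "timeout": 1100}

retrans_pct = percent_of_calls(before, after, "retrans")
timeout_pct = percent_of_calls(before, after, "timeout")
print(f"retrans {retrans_pct:.1f}%  timeout {timeout_pct:.1f}%")
```

Repeating this for each rsize/wsize value under test makes it easy to
see where the rates climb past a couple of percent.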

bob@tinman.cis.ohio-state.edu (Bob Sutterfield) (03/21/89)

In article <Mar.19.21.47.45.1989.5916@geneva.rutgers.edu> hedrick@geneva.rutgers.edu (Charles Hedrick) writes:
   ...You should also try setting ipforwarding to 0 in all of your
   unix kernels:
      adb -w /vmunix
      ipforwarding?W 0
      ^D
   This will cause the systems not to attempt to forward stray
   packets...

...in fact, none at all!  Note that Rutgers uses lots of dedicated
little boxes, not UNIX beasties, as IP routers.  The original
question, which included a description of an attempt to install a
second Ethernet interface in a Sun backplane, sounded as if they
planned to use the Sun as an IP router between two Ethernets.  In this
case, you'd want to leave ipforwarding turned on, at least in the
kernel of the Sun to be used as a router.

hedrick@geneva.rutgers.edu (Charles Hedrick) (03/21/89)

You're quite right.  I meant you should set ipforwarding to 0 on
machines that aren't gateways.  If you do it on a machine that is a
gateway, it will stop gatewaying.  You get broadcast storms when every
machine on your network tries to ARP the broadcast address.  If you
turn ipforwarding off on the machines that aren't actually gateways,
then at least you cut down the number of machines participating in the
storm to just the gateways.  On most networks this is a very
worthwhile gain.

mac@proteon.com (Michael A. Curtis) (03/22/89)

John,

	Here are a couple of things to try to help with the throughput
problem with your Suns running NFS.  Some of this is a rehash of what
you have already seen, hopefully with a better explanation.  Number 1)
deals with configuring the P4200s to quiet down the ARP storms that
the Suns are causing.  It is also recommended that you go into each
Sun which is not acting as a gateway and turn off IP forwarding.  By
turning off IP forwarding, you will be able to limit the size of the
ARP storm.  However, you must realize that you will still see a
series of ICMP host unreachable messages for each RIP packet.  The
real fix for this situation is to have all machines configured with
the same broadcast address, i.e. 0's or 1's.  Again, the future
implications must be considered: you may be able (in fact, required,
if you are using systems which are running 4.2 BSD) to set the
broadcast to 0's on all machines; however, you will eventually have to
change all machines to a 1's broadcast as BSD Unix cleans this issue
up.  Number 2) deals with setting window size for NFS in an effort to
improve performance.

1) - Broadcasts

     By default, the Sun machines will try and forward any packet they
receive that:

A. Is not for them, but is on the same (sub)network (depending on what
rev SW they are running).

B. Is on a different network.

     This behaviour occurs regardless of the method of packet
reception (addressed, multicast, or broadcast).  (This is a generic
problem with all hosts using networking software derived from the 4.2
BSD distribution.)

     The broadcast storm comes from the fact that they are not
recognizing the IP destination as the broadcast address for that
(sub)network.  The older Suns only recognize 0.0.0.0 or net.0.  When
they see net.255, they just think that's another host on the
(sub)network (no different from net.254), and forward it.  This is why
we let you set the broadcast so many different ways, and why the
default fill is still 0.
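
     The mismatch described above can be illustrated with a small
sketch.  It assumes the unsubnetted class B network 128.94.0.0/16 and
models only the "net.0" versus "net.255" distinction (the 0.0.0.0 case
is left out); the function name and the "fill" parameter are invented
for this example and are not taken from any real stack.

```python
import ipaddress


def recognizes_as_broadcast(dst, network, fill):
    """Would a host using this broadcast fill treat dst as a broadcast?

    fill=0 models a host that expects an all-zeros host part (net.0);
    fill=1 models a host that expects an all-ones host part (net.255).
    """
    dst = ipaddress.IPv4Address(dst)
    if dst not in network:
        return False
    host_bits = int(dst) & int(network.hostmask)
    wanted = 0 if fill == 0 else int(network.hostmask)
    return host_bits == wanted


net = ipaddress.IPv4Network("128.94.0.0/16")   # assumed: no subnetting
rip_dst = "128.94.255.255"                      # ones-fill RIP broadcast

ones_host = recognizes_as_broadcast(rip_dst, net, fill=1)
zeros_host = recognizes_as_broadcast(rip_dst, net, fill=0)
print(ones_host, zeros_host)
```

A host for which the check fails sees an ordinary address on its own
network, which is why it ARPs for it and tries to forward the packet.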

     To change the broadcast fill you need to go into the CONFIG
process (T 6) and to process 0 ("IP Config>" prompt).  Type "set
broadcast".  You might try a LOCAL WIRE broadcast, but first just set
the fill pattern to 0 with a NETWORK broadcast.

2) - NFS window size

     The problem with NFS is that it sends large UDP packets (default
8192 bytes), which are then fragmented at the IP level.  This means
that a large number of packets (6) are sent in a row.  If you send a
burst of 6 packets, we will probably miss one or more, especially on a
heavily loaded ethernet.  Even though the host will retransmit, the
retransmitted burst has a different IP unique ID, so the IP fragments
from the two transmissions cannot be combined.  This results in having
to retransmit until all 6 get through at once.

     The solution is to shrink the size of the UDP writes.  Sun also
has to do this when running through a Sun as a router, or when using a
Sun-2 as a file server for a fast machine like a Sun-3 or Sun-4.
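
     The burst size and the cost of losing one fragment can be
estimated with a short sketch.  It assumes a 1500-byte Ethernet MTU, a
20-byte IP header and an 8-byte UDP header; the 150-byte allowance for
RPC and NFS headers and the 5% loss rate are hypothetical values picked
only to make the arithmetic concrete.

```python
import math

MTU = 1500          # Ethernet IP MTU in bytes
IP_HEADER = 20      # IP header without options
UDP_HEADER = 8
RPC_OVERHEAD = 150  # hypothetical allowance for RPC/NFS headers


def fragment_count(nfs_size, mtu=MTU, overhead=RPC_OVERHEAD):
    """Number of IP fragments needed for one NFS read or write."""
    # Fragment payloads must be a multiple of 8 bytes (except the last).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    datagram = UDP_HEADER + overhead + nfs_size
    return math.ceil(datagram / per_fragment)


def burst_success(fragments, loss):
    """Chance that every fragment of one burst arrives, given a
    per-packet loss probability (losses assumed independent)."""
    return (1.0 - loss) ** fragments


for size in (1024, 2048, 8192):
    n = fragment_count(size)
    print(size, n, round(burst_success(n, 0.05), 3))
```

With these assumptions an 8192-byte transfer needs 6 fragments and a
2048-byte transfer needs 2, so the smaller size exposes far fewer
packets to loss per request.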

     Here is an old message on the subject:

     How to slow down NFS is something we've known you could do, but
not known how for some time.  It appears to be important when running
NFS through our p4200, which is why I'm posting it here.  Someone
might also want to post it to the p4200 mailing list, if it is still
an outstanding issue in the field.

From Sun Software Technical Bulletin, February 1987, in the customer
buglist section:
-------
135. Synopsis: fstab entry for sun-3 mounting from a 3com sys not documented
     Release: 3.2

     Description:
	When a Sun-3 system with a Sun Ethernet board NFS mounts a
	filesystem from a system (sun-2 only? I don't know) that has a
	3com ethernet board, extra entries are required on the
	/etc/fstab line to make it work successfully. The Sun-3/ie
	machine pumps out packets so fast that the 3com system can't
	keep up. The rsize and wsize of packets has to be limited, or
	else you get lots and lots of retransmissions and "server not
	responding" msgs. This was documented in the 2.0 to 3.0 Change
	notes (or release notes), but is not noted in the 3.2 manual
	set. It needs to be in the System Administration manual in the
	section covering networking and NFS.

     Work Around:
	The line in /etc/fstab needs to be of this form:
	3com_machine:/usr /usr nfs rw,noquota,soft,rsize=2048,wsize=2048 0 0
	(the rsize and wsize entries are what is relevant)

     Additionally, we have no record of you ever calling in to Proteon
for support or assistance.  You should be aware that you can request
help directly from Proteon either via telephone (508-898-3100) or
electronic mail (to bug-cgw@proteon.com) as detailed in the support
pamphlet.  Additionally, there is a P4200 users group which you can
address mail to.  To become a member, send a request stating so to
p4200-request@devvax.tn.cornell.edu.  To send mail to this list, send
it to p4200@devvax.tn.cornell.edu.

SRA@XX.LCS.MIT.EDU (Rob Austein) (03/22/89)

Something we've occasionally done when experiencing ARP storms is to
set up a PC or similar expendable machine ANSWERING the ARP requests
for the incorrect broadcast address (and throwing the resultant
forwarded packets on the floor).  Besides providing a sink for the
traffic and thus calming the storm, this provides us with a good way
to monitor exactly which hosts are using the wrong address, so that we
can go in with the debuggers and service calls and icepicks with some
assurance that we know who the culprits are.

Tasteless, but it works....

--Rob

phil@BRL.MIL (Phil Dykstra) (03/23/89)

We did a similar hack here at BRL.  Our gateways answer broadcast ARP
requests with the ethernet address (hex)

	0:0:D:E:A:D

The buggers still keep forwarding of course but at least they quit ARPing.
This address makes the culprits easy to spot with tcpdump.

- Phil

mogul@WSL.DEC.COM (Jeffrey Mogul) (03/23/89)

Rob Austein writes:
    Something we've occasionally done when experiencing ARP storms is to
    set up a PC or similar expendable machine ANSWERING the ARP requests
    for the incorrect broadcast address (and throwing the resultant
    forwarded packets on the floor).  Besides providing a sink for the
    traffic and thus calming the storm, this provides us with a good way
    to monitor exactly which hosts are using the wrong address, so that we
    can go in with the debuggers and service calls and icepicks with some
    assurance that we know who the culprits are.
    
    Tasteless, but it works....

NOVICES, BEWARE: under no circumstances should the answer to an
ARP for a broadcast IP address return the broadcast Ethernet address.
This is clearly not what Rob is doing, and of course "nobody would
ever do this" ... but I have heard of people doing it.  Chernobyl
was nothing compared to this.

If I were supporting an ARP implementation, I would have it check
to make sure it wasn't inserting a broadcast/multicast address in
any of the hardware address fields of an ARP message ... and if it
detects such an attempt, it should start the "host self-destruct"
sequence to make sure that a person who makes this mistake is
properly chastised.
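
A check along the lines suggested above could look like the following
sketch.  It relies on the Ethernet convention that the low-order bit of
the first octet is set for multicast and broadcast addresses; the
function names are made up for this example and do not come from any
real ARP implementation.

```python
def parse_mac(text):
    """Parse a colon-separated Ethernet address such as 0:0:D:E:A:D."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}")
    return bytes(int(part, 16) for part in parts)


def is_group_address(mac):
    """True for multicast and broadcast addresses (group bit set)."""
    return bool(mac[0] & 0x01)


def safe_for_arp_reply(text):
    """Only individual (unicast) addresses belong in an ARP reply."""
    return not is_group_address(parse_mac(text))


print(safe_for_arp_reply("0:0:D:E:A:D"))
print(safe_for_arp_reply("ff:ff:ff:ff:ff:ff"))
```

The bogus sink address 0:0:D:E:A:D mentioned earlier in the thread
passes this check, while the Ethernet broadcast address does not.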

-Jeff