matthews@alux2.ATT.COM (John Matthews) (03/19/89)
Last week we replaced a DEC Lan Bridge with a new Proteon P4200 router to create a local subnet for our building. Ever since, things have been running extremely slowly for the people who get their CAD software through our gateway. They rely heavily on huge CAD executables that are sent through the gateway. I have been looking into this for quite some time, and I am finally posting a message here to see what other people have done in similar situations.

What I have found is that default NFS mounts make reads and writes that are 8192 bytes long. The kernel then fragments these requests into up to 9 UDP packets. If the Proteon discards even one of these packets, all 9 of them have to be retransmitted. I went around and changed all of the NFS mounts to do 1024-byte reads and writes. This seemed to improve things a little.

Another thing I have noticed is that we are getting extremely high collision rates on the Suns. They add up to about a million and a half for the past week. Someone told me that the Suns don't abide by a standard that says they should wait 10 milliseconds between packets to give others a chance to transmit. They told me the Suns only wait 1 millisecond, if that. Could this be causing a lot of collisions? There are only around 15 Sun clients and one server sitting on each of two bridged ethernets in the building where they are having all of the problems. In the main building we have things set up the same way, with collisions adding up to around 40,000.

There does seem to be a lot of broadcasting going on on that ethernet that could cause this. There is a problem stemming from the fact that older versions of UNIX are trying to forward IP broadcast packets. When these hosts receive a broadcast RIP packet addressed to 128.94.255.255, they think it's a packet destined for a specific machine and they try to forward it. For every such packet, an ARP request is broadcast on the ethernet.
There are about 16 machines running the old network software and 5 routers generating up to 5 RIP packets every 30 seconds. I believe that added up to around 28,000 broadcasts per hour. Temporarily, I answered these ARP requests and pointed them to a device that would ignore them, but the network is still slow. Is there anything wrong with responding to these ARP requests with an ethernet address that doesn't really exist on that network? Then the machines running the old network software would just forward the packets into a black hole. Am I thinking right, or would this cause problems? What will the DEC Lan Bridges do with an ethernet packet when they have no idea which side that ethernet device is really on? Will every bridge throughout the network pass this packet every time it's sent?

Last night we tried to configure an extra ethernet board on the fileserver that houses all of the CAD software and connect it to the other ethernet cable to give them back some speed. All we did was uncomment the ie1 interface in the kernel config file, recompile the kernel and reboot. We didn't change any of the /etc/*rc* files at all. When the Sun came back up, all of the old NFS mounts on the clients just timed out. The NFS daemons wouldn't service any NFS requests. I was able to use telnet and rlogin to connect to hosts on either side after manually ifconfig'ing the new ie1 interface. I gave up and tried it on another fileserver. It did the exact same thing. The thing that doesn't make sense is that the only thing we did was add one ethernet device to the kernel, and then nothing worked the way it used to. We rebooted on the old kernels and everything was back to normal. We called Sun but they didn't seem to know what the problem was. Has anyone else ever encountered such problems?

We are eventually going to move some of that software to servers in that building so that they aren't pounding on the gateway. I wasn't aware that the Proteon had so little bandwidth compared to a LAN bridge.
How on earth can they go from Pronet-80 to ethernet when they can't come close to handling ethernet's full 10 megabits/s? What percentage of 10 Mbits/s can a Proteon really route from one ethernet to another? Has anyone done some real-life performance testing? What other things could I do to optimize NFS traffic?

If there are things that I am wrong about, please let me know. This has been a frustrating week, to say the least. If anyone could spare a few minutes on the phone, please e-mail me your phone number. I'd really appreciate it.

John Matthews
ulysses!aloft!matthews@princeton.edu
matthews@aloft.att.com
matthews@research.att.com
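The broadcast estimate in the post above can be sanity-checked with a little arithmetic. The sketch below gives an upper bound only. It assumes (the post does not say this) that every host running the old software sends exactly one ARP broadcast for every RIP packet it sees, and that each router always sends the full 5 packets per 30-second update. The function and parameter names are made up for illustration.

```python
def arp_broadcasts_per_hour(old_hosts, routers, rip_packets_per_update,
                            update_interval_s=30):
    """Upper bound on ARP broadcasts triggered by RIP broadcasts in one hour.

    Assumes one ARP broadcast per old host per RIP packet (an assumption,
    not something measured on the network in question).
    """
    updates_per_hour = 3600 // update_interval_s
    rip_packets_per_hour = routers * rip_packets_per_update * updates_per_hour
    return old_hosts * rip_packets_per_hour


# Figures from the post: 16 old hosts, 5 routers, up to 5 RIP packets each.
print(arp_broadcasts_per_hour(old_hosts=16, routers=5, rip_packets_per_update=5))
```

With the figures from the post this ceiling is 48,000 per hour. The reported 28,000 sits below it, which would fit routers that send about 3 RIP packets per update on average rather than the full 5.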
kline@tuna.cso.uiuc.edu (Charley Kline) (03/19/89)
> There is a problem stemming from
> the fact that older versions of UNIX are trying to forward IP broadcast
> packets. When these hosts receive a broadcasted RIP packet addressed
> to 128.94.255.255, they think it's a packet destined to a specific
> machine and they then try and forward it. For every such packet, an
> ARP request is broadcasted on the ethernet. There are about 16
> machines running the old network software and 5 routers generating up
> to 5 rip packets every 30 seconds. I believe that added up to around
> 28,000 broadcasts per hour. Temporarily, I answered these ARP requests
> and pointed them to a device that would ignore them, but the network is
> still slow. Is there anything wrong with responding to these ARP
> requests with an ethernet address that doesn't really exist on that
> network.

Yow! Be careful with your broadcast address! If you have old Suns, they'll be using a zero's broadcast (128.94.0.0). The p4200 by default will use ones broadcast. I'm not surprised you're getting 28000 broadcasts an hour and all those collisions. Mixing broadcast addresses is a good way to get an ethernet meltdown, because exactly the behavior you describe happens.

Go to the same broadcast address everywhere on your net and I bet a lot of your problem will go away. Since you're running old Suns, which don't give you a choice, I suspect you'll have to change everything to zero's broadcast.

Hope this helps.

-----
Charley Kline, University of Illinois Computing Services
kline@tuna.cso.uiuc.edu
{uunet,seismo,pur-ee,convex}!uiucdcs!uiucuxc!kline
"Flaring high or flaring early makes the little prop tips curly."
dlr@daver.UUCP (Dave Rand) (03/19/89)
In article <633@garcon.cso.uiuc.edu> kline@tuna.cso.uiuc.edu (Charley Kline) writes:
>> the fact that older versions of UNIX are trying to forward IP broadcast
>> packets. When these hosts receive a broadcasted RIP packet addressed
>> to 128.94.255.255, they think it's a packet destined to a specific
>Yow! Be careful with your broadcast address! If you have old Suns, they'll
>be using a zero's broadcast (128.94.0.0). The p4200 by default will use

Please forgive my ignorance, but I have noticed this as well. Why do machines ARP for broadcast IP addresses?

--
Dave Rand
{pyramid|hoptoad|sun|vsi1}!daver!dlr
hedrick@geneva.rutgers.edu (Charles Hedrick) (03/20/89)
Sun NFS does not violate any standard. Now and again, when problems like this come up, somebody floats the rumor that Suns somehow transmit "too fast". It just isn't true. It is true that NFS pushes networking technology very hard, but if it doesn't work, something on your network is broken. In particular, there is no standard that says a host should wait 10 milliseconds between transmissions. If two systems try to transmit at the same time, there is a pattern of random exponential backoff. But if only one host is sending, you may see as little as 9.6 microseconds (I think -- this is from memory) between packets. This is done by the Ethernet controller chip, and Sun uses the same chips as everybody else.

I agree with the other response: you've got to find a way for all machines that do broadcasts to use an address that all machines on the network can cope with. In my opinion the safest is 128.94.0.0. Presumably there will be a way to set all machines to use this as their broadcast address. Responding to ARPs with a bogus Ethernet address is a standard technique, and I don't know of anything wrong with it, but it's better to set things up so that you don't need to do that.

You should also try setting ipforwarding to 0 in all of your unix kernels:

    adb -w /vmunix
    ipforwarding?W 0
    ^D

This will cause the systems not to attempt to forward stray packets. They may, however, still send ICMP unreachables back to the source. This is far better than an ARP, because it isn't a broadcast.

If you're willing to do a bit more work in adb on your Suns, you can make them accept the correct broadcast address. Unfortunately I don't have the exact offset, but if you do

    adb /vmunix
    ipintr?i

and keep hitting CR, you will eventually find a section of code that calls if_ifwithaddr, compares an address with -1 (or it might show as 0xffffffff), and then calls ip_forward. You want to change the comparison with -1 to compare with the actual broadcast address used on your network.
This advice is Sun-specific.

Now, as to the rsize and wsize settings. Certainly if a gateway or bridge is dropping packets, you may be better off using values smaller than the original 8192 default. However, 1024 may be going a bit too far. Most Ethernet interfaces can handle two packets in a row, so 2048 would make sense. It does help performance to use as large a number as your hardware can handle reliably. We are able to use 2048 through all of the bridges and routers that we've tried. From what I know of Proteon's hardware, I'd be very surprised if they couldn't handle at least that much. What is critical is how much buffering there is on the Ethernet interface. Newer interfaces often have enough that you can use the full 8192. We have no problem with 8192 with cisco routers that use their new MCI Ethernet interfaces.

It's fairly easy to test. Try changing the value and then doing some operation that generates a lot of NFS traffic. Do "nfsstat" before and after the test. Look at "retrans" and "timeout" compared to the total. A couple of percent is OK. My normal test is "cp /server/usr/lib/* /dev/null". Be sure you specify "intr" as one of the mount options, so you can abort the test if something goes wrong. (In fact, I would always specify "intr". I don't know why it isn't the default.)
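The before/after comparison described above comes down to a difference of counters. The following is a minimal sketch, assuming the "calls", "retrans" and "timeout" figures are copied by hand from the client RPC section of nfsstat before and after the test. The dictionary layout, the function name, and the sample numbers are invented for illustration and are not nfsstat output.

```python
def percent_of_calls(before, after, counter):
    """Percentage of RPC calls made during the test that hit `counter`.

    `before` and `after` are dicts of counters read from nfsstat by hand,
    e.g. {"calls": ..., "retrans": ..., "timeout": ...}.
    """
    calls = after["calls"] - before["calls"]
    if calls <= 0:
        raise ValueError("no NFS calls were made between the two readings")
    return 100.0 * (after[counter] - before[counter]) / calls


# Invented sample readings, for illustration only.
before = {"calls": 10000, "retrans": 50, "timeout": 40}
after = {"calls": 15000, "retrans": 125, "timeout": 90}

for name in ("retrans", "timeout"):
    print(name, percent_of_calls(before, after, name), "%")
```

Going by the rule of thumb above, results of a couple of percent suggest the chosen rsize and wsize are workable. Much higher figures suggest trying a smaller value.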
bob@tinman.cis.ohio-state.edu (Bob Sutterfield) (03/21/89)
In article <Mar.19.21.47.45.1989.5916@geneva.rutgers.edu> hedrick@geneva.rutgers.edu (Charles Hedrick) writes:
...You should also try setting ipforwarding to 0 in all of your
unix kernels:
adb -w /vmunix
ipforwarding?W 0
^D
This will cause the systems not to attempt to forward stray
packets...
...in fact, none at all! Note that Rutgers uses lots of dedicated
little boxes, not UNIX beasties, as IP routers. The original
question, which included a description of an attempt to install a
second Ethernet interface in a Sun backplane, sounded as if they
planned to use the Sun as an IP router between two Ethernets. In this
case, you'd want to leave ipforwarding turned on, at least in the
kernel of the Sun to be used as a router.
hedrick@geneva.rutgers.edu (Charles Hedrick) (03/21/89)
You're quite right. I meant you should set ipforwarding to 0 on machines that aren't gateways. If you do it on a machine that is a gateway, it will stop gatewaying. You get broadcast storms when every machine on your network tries to ARP the broadcast address. If you turn ipforwarding off on the machines that aren't actually gateways, then at least you cut down the number of machines participating in the storm to just the gateways. On most networks this is a very worthwhile gain.
mac@proteon.com (Michael A. Curtis) (03/22/89)
John,
Here are a couple of things to try to help with the throughput
problem with your SUN's running NFS. Some of this is a rehash of what
you have already seen, hopefully with a better explanation. Number 1)
deals with configuring the P4200's to quiet down the ARP storms that
the SUN's are causing. It is also recommended that you go into
each SUN which is not acting as a gateway and turn off IP forwarding.
By turning off the IP forwarding, you will be able to limit the size
of the ARP storm. However, you must realize that you will still see a
series of ICMP host unreachable messages for each RIP packet. The
real fix for this situation is to have all machines configured with
the same broadcast address, i.e. 0's or 1's. Again, the future
implications must be considered as you may be able (in fact, required
if you are using systems which are running 4.2 BSD) to set the
broadcast to 0's on all machines; however, you will eventually have to
change all machines to a 1's broadcast as BSD Unix cleans this issue
up. Number 2) deals with setting window size for NFS in an effort to
improve performance.
1) - Broadcasts
By default, the Sun machines will try and forward any packet they
receive that:
A. Is not for them, but is on the same (sub)network (depending on what
rev SW they are running).
B. Is on a different network.
This behaviour occurs regardless of the method of packet
reception (addressed, multicast, or broadcast). (This is a generic
problem with all hosts using networking software derived from the 4.2
BSD distribution.)
The broadcast storm comes from the fact that they are not
recognizing the IP destination as the broadcast address for that
(sub)network. The older Suns only recognize 0.0.0.0 or net.0. When
they see net.255, they just think that's another host on the
(sub)network (no different from net.254), and forward it. This is why
we let you set the broadcast so many different ways, and why the
default fill is still 0.
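The decision described above can be sketched in a few lines. This is an illustration only, assuming an unsubnetted class B network 128.94.0.0/16 (taken from the addresses quoted earlier in the thread) and ignoring the check for the host's own address. The function names are hypothetical and do not come from any kernel source.

```python
import ipaddress

# Assumption: the network from the thread, with its natural class B mask.
NET = ipaddress.ip_network("128.94.0.0/16")


def old_host_sees_broadcast(dst, net=NET):
    """True if a 4.2BSD-style host would treat `dst` as a broadcast.

    Only 0.0.0.0 and the all-zeros host part (net.0) are recognized.
    """
    addr = ipaddress.ip_address(dst)
    return addr == ipaddress.ip_address("0.0.0.0") or addr == net.network_address


def old_host_action(dst, net=NET):
    """What such a host does with a packet addressed to `dst`."""
    if old_host_sees_broadcast(dst, net):
        return "accept as broadcast"
    # Anything else looks like some other host, so it gets forwarded,
    # which first requires an ARP broadcast for the destination.
    return "ARP for it and forward"


for dst in ("128.94.0.0", "128.94.255.255"):
    print(dst, "->", old_host_action(dst))
```

The all-ones address ends up on the forwarding path, which is where the ARP broadcast comes from.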
To change the broadcast fill you need to go into the CONFIG
process (T 6) and to process 0 ("IP Config>" prompt). Type "set
broadcast". You might try a LOCAL WIRE broadcast, but first just set
the fill pattern to 0 with a NETWORK broadcast.
The problem with NFS is that it sends large UDP packets (default
8192 bytes), which are then fragmented at the IP level. The problem
with this is that a large number of packets (6) are sent in a row. If
you send a burst of 6 packets, we will probably miss one or more,
especially on a heavily loaded ethernet. Even though the host will
retransmit, the retransmitted burst has a different IP unique ID, so
the IP fragments from the two transmissions cannot be combined. This
results in having to retransmit until all 6 get through at once.
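The fragment count and the cost of losing a single fragment can be worked out roughly as follows. This is a sketch under stated assumptions: a 1500-byte Ethernet MTU, a 20-byte IP header with no options, an 8-byte UDP header, and fragment losses that are independent of each other (real burst losses are likely correlated, so treat the probabilities as indicative only). The rpc_overhead parameter, the function names, and the 5% loss rate are placeholders chosen for illustration.

```python
import math


def ip_fragments(nfs_bytes, mtu=1500, ip_header=20, udp_header=8,
                 rpc_overhead=0):
    """Number of IP fragments needed for one NFS read or write over UDP."""
    # Fragment payloads must be a multiple of 8 bytes (except the last one).
    per_fragment = (mtu - ip_header) // 8 * 8
    datagram = nfs_bytes + udp_header + rpc_overhead
    return math.ceil(datagram / per_fragment)


def delivery_probability(fragments, fragment_loss):
    """Chance that every fragment of one datagram arrives.

    Assumes independent losses; one missing fragment loses the datagram.
    """
    return (1.0 - fragment_loss) ** fragments


FRAGMENT_LOSS = 0.05  # invented figure, for illustration only

for size in (8192, 2048, 1024):
    n = ip_fragments(size)
    p = delivery_probability(n, FRAGMENT_LOSS)
    print(f"rsize/wsize {size}: {n} fragment(s), "
          f"delivery chance {p:.3f}, expected sends {1 / p:.2f}")
```

Under these assumptions an 8192-byte transfer needs 6 fragments, 2048 bytes needs 2, and 1024 bytes fits in a single packet. That matches the count of 6 given above and shows why smaller sizes are less sensitive to a dropped packet.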
2) - NFS window size
The solution is to shrink the size of the UDP writes. Sun also
has to do this when running through a Sun as a router, or when using a
Sun-2 as a file server for a fast machine like a Sun-3 or Sun-4.
Here is an old message on the subject:
How to slow down NFS is something we've known you could do, but
not known how for some time. It appears to be important when running
NFS through our p4200, which is why I'm posting it here. Someone
might also want to post it to the p4200 mailing list, if it is still
an outstanding issue in the field.
From Sun Software Technical Bulletin, February 1987, in the customer
buglist section:
-------
135. Synopsis: fstab entry for sun-3 mounting from a 3com sys not documented
Release: 3.2
Description:
When a Sun-3 system with a Sun Ethernet board NFS mounts a
filesystem from a system (sun-2 only? I don't know) that has a
3com ethernet board, extra entries are required on the
/etc/fstab line to make it work successfully. The Sun-3/ie
machine pumps out packets so fast that the 3com system can't
keep up. The rsize and wsize of packets has to be limited, or
else you get lots and lots of retransmissions and "server not
responding" msgs. This was documented in the 2.0 to 3.0 Change
notes (or release notes), but is not noted in the 3.2 manual
set. It needs to be in the System Administration manual in the
section covering networking and NFS.
Work Around:
The line in /etc/fstab needs to be of this form:
3com_machine:/usr /usr nfs rw,noquota,soft,rsize=2048,wsize=2048 0 0
(the rsize and wsize entries are what is relevant)
Additionally, we have no record of you ever calling in to Proteon
for support or assistance. You should be aware that you can request
help directly from Proteon either via telephone (508-898-3100) or
electronic mail (to bug-cgw@proteon.com) as detailed in the support
pamphlet. Additionally, there is a P4200 users group which you can
address mail to. To become a member, send a request stating so to
p4200-request@devvax.tn.cornell.edu. To send mail to this list, send
it to p4200@devvax.tn.cornell.edu.
SRA@XX.LCS.MIT.EDU (Rob Austein) (03/22/89)
Something we've occasionally done when experiencing ARP storms is to set up a PC or similar expendable machine ANSWERING the ARP requests for the incorrect broadcast address (and throwing the resultant forwarded packets on the floor). Besides providing a sink for the traffic and thus calming the storm, this provides us with a good way to monitor exactly which hosts are using the wrong address, so that we can go in with the debuggers and service calls and icepicks with some assurance that we know who the culprits are.

Tasteless, but it works....

--Rob
phil@BRL.MIL (Phil Dykstra) (03/23/89)
We did a similar hack here at BRL. Our gateways answer broadcast ARP requests with the ethernet address (hex)

    0:0:D:E:A:D

The buggers still keep forwarding of course, but at least they quit ARPing. This address makes the culprits easy to spot with tcpdump.

- Phil
mogul@WSL.DEC.COM (Jeffrey Mogul) (03/23/89)
Rob Austein writes:
Something we've occasionally done when experiencing ARP storms is to
set up a PC or similar expendable machine ANSWERING the ARP requests
for the incorrect broadcast address (and throwing the resultant
forwarded packets on the floor). Besides providing a sink for the
traffic and thus calming the storm, this provides us with a good way
to monitor exactly which hosts are using the wrong address, so that we
can go in with the debuggers and service calls and icepicks with some
assurance that we know who the culprits are.
Tasteless, but it works....
NOVICES, BEWARE: under no circumstances should the answer to an
ARP for a broadcast IP address return the broadcast Ethernet address.
This is clearly not what Rob is doing, and of course "nobody would
ever do this" ... but I have heard of people doing it. Chernobyl
was nothing compared to this.
If I were supporting an ARP implementation, I would have it check
to make sure it wasn't inserting a broadcast/multicast address in
any of the hardware address fields of an ARP message ... and if it
detects such an attempt, it should start the "host self-destruct"
sequence to make sure that a person who makes this mistake is
properly chastised.
-Jeff
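The check suggested above might look something like the sketch below. It relies on the Ethernet convention that the least significant bit of the first octet marks a group (multicast) address, with the all-ones broadcast address as a special case. The function names are hypothetical, and the sketch only validates an address. It does not build or send ARP packets.

```python
def parse_mac(text):
    """Parse a colon-separated Ethernet address such as '0:0:D:E:A:D'."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError("expected six colon-separated octets: %r" % text)
    return bytes(int(part, 16) for part in parts)


def is_group_address(mac_text):
    """True for broadcast and multicast addresses (group bit set)."""
    return bool(parse_mac(mac_text)[0] & 0x01)


def safe_for_arp_reply(mac_text):
    """A hardware address in an ARP message should never be a group address."""
    return not is_group_address(mac_text)


for mac in ("0:0:D:E:A:D", "ff:ff:ff:ff:ff:ff", "01:00:5e:00:00:01"):
    print(mac, "ok" if safe_for_arp_reply(mac) else "REFUSE")
```

The sink address mentioned earlier in the thread passes this check, while the broadcast address and any multicast address are refused.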