hahn@amelia.nas.nasa.gov (Jonathan Hahn) (12/19/87)
In article <15@amelia.nas.nasa.gov> hahn@amelia.nas.nasa.gov (Jonathan Hahn) writes: > >Subject: 4.3 BSD /etc/arp bugs >Index: etc/arp.c 4.3BSD > >Description: > ... > > Somewhat related issue: We publish the ARP info for about 75+ > hosts on one of our VAX 780s. I have found that when the ARP table > gets much above 100 entries, it has a strong tendency to crash the > VAX with a Protection Fault panic. This phenomenon has been > repeated on 3 of our 780s (whichever one happens to have the > published entries). Anyone ever heard of this? > In response to this query I heard from: Ken Lalonde <kwlalonde%math.waterloo.edu@RELAY.CS.NET> Jeffrey Mogul <mogul@decwrl.dec.com> Tim Seaver <tas@mcnc.org> Thank you all very much for responding. Two of you suggested arp fixes which we already had in place. Mr. Tim Seaver identified our problem and presented a workaround (described below). I've noticed a potential problem in his workaround, but anyway, the bug really should be fixed, not worked around. Here it is (tersely): The arptable consists of ARPTAB_NB buckets of ARPTAB_BSIZ entries each. arptnew() is called to make a new entry and is responsible for aging and removing entries, if necessary, to place a new entry in the table. Entries can be permanent, in which case they are not aged by arptnew(), but arp table management is designed to always have at least one non-permanent entry in each bucket. This means that there may be no empty entries in a bucket, but there will be at least one age-able entry. BUG: Our panics occur when arptnew() is called to place an entry (non-permanent) in a bucket with no empty entries in it. Under this condition, instead of chucking the oldest non-permanent entry in the bucket, arptnew() returns NULL. However, arptnew() is "guaranteed" not to return NULL when installing a non-permanent entry. Our panics are caused by code which assumes arptnew() works as advertised and dereferences the pointer it returns without checking for NULL. NOTE: A typical site would never invoke the (buggy) aging mechanism since it is unlikely to fill a bucket. As I mentioned in my original posting, we publish lots of arp info and therefore our table is rather full (typically 100+ entries out of 171 max). arptnew() is called from three places. One is an arpioctl() and can be ignored; the other two are arpresolve() and arpinput(). Tim's workaround checked for a NULL return from arptnew() in the former. I think the workaround is incomplete because it should also check for NULL in arpinput(). (There may be a logical reason for this that I missed.) Hope I've made this all clear. I haven't found the bug in the code yet: it all reads just fine. When I find it I'll post it. This isn't a real hot priority, so it probably won't happen for a while. If anyone out there knows any more about this, please let me know. -jonathan hahn NASA Ames Research Center hahn@ames-nas.arpa ...!ames!amelia!hahn