[comp.bugs.4bsd] 4.3 BSD arp kernel bug

hahn@amelia.nas.nasa.gov (Jonathan Hahn) (12/19/87)

In article <15@amelia.nas.nasa.gov> hahn@amelia.nas.nasa.gov (Jonathan Hahn) writes:
>
>Subject: 4.3 BSD /etc/arp bugs
>Index:	etc/arp.c 4.3BSD
>
>Description:
>    ...
>
>    Somewhat related issue:  We publish the ARP info for about 75+
>    hosts on one of our VAX 780s.  I have found that when the ARP table
>    gets much above 100 entries, it has a strong tendency to crash the
>    VAX with a Protection Fault panic.  This phenomenon has been
>    repeated on 3 of our 780s (whichever one happens to have the
>    published entries).  Anyone ever heard of this?
>

In response to this query I heard from:

	Ken Lalonde <kwlalonde%math.waterloo.edu@RELAY.CS.NET>
	Jeffrey Mogul <mogul@decwrl.dec.com>
	Tim Seaver <tas@mcnc.org>

Thank you all very much for responding.  Two of you suggested arp fixes
which we already had in place.  Mr. Tim Seaver identified our problem
and presented a workaround (described below).  I've noticed a potential
problem in his workaround, but anyway, the bug really should be fixed,
not worked around.  Here it is (tersely):

The arptable consists of ARPTAB_NB buckets of ARPTAB_BSIZ entries
each.  arptnew() is called to make a new entry and is responsible for
aging and removing entries, if necessary, to place a new entry in the
table.  Entries can be permanent, in which case they are not aged by
arptnew(), but arp table management is designed to always have at least
one non-permanent entry in each bucket.  This means that there may be
no empty entries in a bucket, but there will be at least one age-able
entry.

BUG:  Our panics occur when arptnew() is called to place an entry
(non-permanent) in a bucket with no empty entries in it.  Under this
condition, instead of chucking the oldest non-permanent entry in the
bucket, arptnew() returns NULL.  However, arptnew() is "guaranteed" not
to return NULL when installing a non-permanent entry.  Our panics are
caused by code which assumes arptnew() works as advertised and
dereferences the pointer it returns without checking for NULL.

NOTE: A typical site would never invoke the (buggy) aging mechanism
since it is unlikely to fill a bucket.  As I mentioned in my original
posting, we publish lots of arp info and therefore our table is rather
full (typically 100+ entries out of 171 max).

arptnew() is called from three places.  One is an arpioctl() and can be
ignored; the other two are arpresolve() and arpinput().  Tim's
workaround checked for a NULL return from arptnew() in the former.  I
think the workaround is incomplete because it should also check for
NULL in arpinput().  (There may be a logical reason for this that I
missed.)

Hope I've made this all clear.  I haven't found the bug in the code
yet: it all reads just fine.  When I find it I'll post it.  This isn't
a real hot priority, so it probably won't happen for a while.  If
anyone out there knows any more about this, please let me know.

-jonathan hahn
NASA Ames Research Center
hahn@ames-nas.arpa
...!ames!amelia!hahn