[comp.protocols.tcp-ip] ARP cache timeouts

mogul@DECWRL.DEC.COM (Jeffrey Mogul) (11/29/88)
smb@ulysses.homer.nj.att.com (Steven M. Bellovin) writes:
    Perhaps the right thing to do is to reset the ARP age timer on receipt of
    a packet from that (IP? Ethernet? both?) address; that will deal with
    persistent attempts to reach a host which has gone away.

This seems inefficient because it adds code to the path of every
received packet, and as you've observed, the layering is weird.
Also, it doesn't really address the problem of bad mappings (someone
has changed their IP or Ethernet address; when you receive a packet
from them, how do you figure this out?)

A simple fix is to reset the ARP cache timer only when the entry
is created, not when it is used.  This means that a bad entry will
never persist beyond a single timeout period, no matter how often
you try to make contact (4.xBSD resets on every use, so it is
really hard to get rid of a stale entry).
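
To make the difference concrete, here is a minimal sketch of the two
timer policies.  It is Python for brevity, not BSD kernel code; the
ArpCache class, the refresh_on_use flag, and the 1200-second timeout
are all made up for illustration.

```python
# Sketch only: class name, flag, and timeout value are illustrative
# assumptions, not taken from any real ARP implementation.
ARP_TIMEOUT = 20 * 60  # seconds; "20 minutes (or whenever)"


class ArpCache:
    def __init__(self, timeout=ARP_TIMEOUT, refresh_on_use=False):
        self.timeout = timeout
        # True mimics the reset-on-every-use policy; False is the simple fix.
        self.refresh_on_use = refresh_on_use
        self.entries = {}  # ip -> [ether_addr, timestamp]

    def insert(self, ip, ether, now):
        # The timer is always (re)started when an entry is created.
        self.entries[ip] = [ether, now]

    def lookup(self, ip, now):
        entry = self.entries.get(ip)
        if entry is None:
            return None
        ether, stamp = entry
        if now - stamp >= self.timeout:
            # Entry has aged out; the caller must send a new ARP request.
            del self.entries[ip]
            return None
        if self.refresh_on_use:
            entry[1] = now  # restart the timer on use
        return ether
```

With refresh_on_use=False, an entry inserted at time 0 is gone at time
1200 no matter how often it was looked up in between.  With
refresh_on_use=True, steady traffic keeps it alive indefinitely.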

The inefficiency here is that every 20 minutes (or whenever),
each connection pauses while its ARP cache entry is reloaded.  On
the other hand, since this all happens at interrupt level, the
pause should be pretty small.

If one wanted to avoid this pause altogether, one could extend the
state machine in the ARP implementation, which is effectively a
three-state system (EMPTY->INCOMPLETE->COMPLETE->EMPTY), by adding
a fourth state, SUSPICIOUS.  The transition from COMPLETE to SUSPICIOUS
would be made when the timer goes off, at which point a new
ARP request would be generated.  A successful reply would return
the state to COMPLETE.  In SUSPICIOUS state, the entry would still be
usable, but if no reply arrived within a few seconds it would revert
to INCOMPLETE.

Thus, the normal case would be for connections to proceed without
waiting for an ARP reply, but if the mapping changes, there is
a strict bound on the lifetime of a stale entry.  This mechanism
avoids most need for rebroadcasting, too, since we think we know
the target Ethernet address when we get "suspicious".
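
A rough sketch of the four-state version, again in Python rather than
kernel code.  The ArpEntry class, the tick() driver, the 5-second grace
period, and the "requests" log are all hypothetical.  Sending the
SUSPICIOUS-state request as a unicast, and broadcasting again on the
fall back to INCOMPLETE, are my assumptions about how one would fill
in the details.

```python
from enum import Enum


class State(Enum):
    EMPTY = 0
    INCOMPLETE = 1
    COMPLETE = 2
    SUSPICIOUS = 3


class ArpEntry:
    # Sketch only: timeout and grace values are illustrative assumptions.
    def __init__(self, timeout=1200, grace=5):
        self.state = State.EMPTY
        self.ether = None
        self.deadline = None
        self.timeout = timeout  # COMPLETE lifetime, in seconds
        self.grace = grace      # how long SUSPICIOUS may last, in seconds
        self.requests = []      # log of ARP requests that would be sent

    def start_resolve(self, now):
        # EMPTY -> INCOMPLETE: broadcast a request, nothing usable yet.
        self.state = State.INCOMPLETE
        self.ether = None
        self.requests.append(("broadcast", None))

    def on_reply(self, ether, now):
        # Any reply (re)validates the entry and restarts the timer.
        self.ether = ether
        self.state = State.COMPLETE
        self.deadline = now + self.timeout

    def tick(self, now):
        if self.state is State.COMPLETE and now >= self.deadline:
            # Timer went off: keep using the entry, but re-verify it.
            # Assumption: ask the last known Ethernet address directly.
            self.state = State.SUSPICIOUS
            self.deadline = now + self.grace
            self.requests.append(("unicast", self.ether))
        elif self.state is State.SUSPICIOUS and now >= self.deadline:
            # No reply within the grace period: drop the mapping.
            # Assumption: start over with a broadcast request.
            self.state = State.INCOMPLETE
            self.ether = None
            self.requests.append(("broadcast", None))

    def usable(self):
        return self.state in (State.COMPLETE, State.SUSPICIOUS)
```

Packets keep flowing while usable() is true, so the sender only
stalls if the re-verification actually fails.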

Neither the "simple fix" nor the 4-state version requires any change
to the ARP protocol or to the behaviour of other hosts.  I'd go
with the simple fix, since the cost of waiting 10 msec once in a
while seems unimportant.

-Jeff