fletcher@cs.utexas.edu (Fletcher Mattox) (10/21/90)
We seem to have a problem when a Sequent arps for a SPARCstation 1+. The Sequent (cs.utexas.edu) is a Balance 21k running Dynix 3.0.4. The SS1+ (blox.cs.utexas.edu) is running SunOS 4.1.

The etherfind below was run on another Sun on the same wire as the SS1+ and the Sequent. It monitors the arp exchange when someone on the Sequent types "telnet blox".

    Script started on Sat Oct 20 13:06:08 1990
    blitz% etherfind -t -u -arp -between blox cs
    Using interface ie0
    icmp type  lnth  proto  source           destination      src port  dst port
         0.00    60  arp    cs.utexas.edu    blox.cs.utexas.
         0.00    60  arp    blox.cs.utexas.  cs.utexas.edu
         2.82    60  arp    cs.utexas.edu    blox.cs.utexas.
         2.82    60  arp    blox.cs.utexas.  cs.utexas.edu
        12.32    60  arp    cs.utexas.edu    blox.cs.utexas.
        12.32    60  arp    blox.cs.utexas.  cs.utexas.edu
        24.96    60  arp    cs.utexas.edu    blox.cs.utexas.
        24.96    60  arp    blox.cs.utexas.  cs.utexas.edu
        69.26    60  arp    cs.utexas.edu    blox.cs.utexas.
        69.26    60  arp    blox.cs.utexas.  cs.utexas.edu
       211.58    60  arp    cs.utexas.edu    blox.cs.utexas.
       211.58    60  arp    blox.cs.utexas.  cs.utexas.edu
       306.34    60  arp    cs.utexas.edu    blox.cs.utexas.
       306.34    60  arp    blox.cs.utexas.  cs.utexas.edu
       495.78    60  arp    cs.utexas.edu    blox.cs.utexas.
       495.78    60  arp    blox.cs.utexas.  cs.utexas.edu
    ^C
    blitz% exit
    blitz%
    script done on Sat Oct 20 13:18:43 1990

I've looked at the contents of the arp packets. Nothing unusual there. It's as if the Sequent just isn't seeing the arp response from the SS1+.

But it works sometimes. If you try the above experiment 10 times, you'll usually get a successful arp entry into the Sequent's cache, and TCP/IP then proceeds normally. It even happens (to a lesser extent) on our Symmetry running 3.0.12. That makes me wonder if there's a timing problem here. Could the SS1+ be getting the arp response back on the wire before the Sequent is prepared to deal with it? Hm. I dunno.

(Yes, I could add a permanent entry in the Sequent's arp cache. I don't want to.)
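For anyone repeating the "looked at the contents of the arp packets" check, here is a minimal Python sketch (not from the post) that decodes the Ethernet/IPv4 ARP layout from RFC 826 and rejects frames whose fixed fields are off. parse_arp() and build_arp() are hypothetical helpers, and the MAC and IP addresses used with them are made up. The sizes also suggest why every line of the trace shows 60 under "lnth": a 14-byte Ethernet header plus a 28-byte ARP packet is 42 bytes, zero-padded to the 60-byte Ethernet minimum, assuming etherfind reports the length without the 4-byte CRC.

```python
import struct

ETH_HDR_LEN = 14   # dst MAC (6) + src MAC (6) + EtherType (2)
ARP_BODY_LEN = 28  # fixed fields (8) + two MACs (12) + two IPv4 addresses (8)
ETH_MIN_LEN = 60   # minimum Ethernet frame length, excluding the 4-byte CRC

# RFC 826 layout for Ethernet/IPv4, network byte order, no alignment padding:
# htype, ptype, hlen, plen, op, sender MAC, sender IP, target MAC, target IP
ARP_FMT = "!HHBBH6s4s6s4s"


def parse_arp(frame: bytes) -> dict:
    """Decode the ARP packet in an Ethernet frame; raise ValueError if malformed."""
    if len(frame) < ETH_HDR_LEN + ARP_BODY_LEN:
        raise ValueError("frame too short for Ethernet header + ARP packet")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0806:
        raise ValueError(f"not ARP (EtherType 0x{ethertype:04x})")
    htype, ptype, hlen, plen, op, sha, spa, tha, tpa = struct.unpack_from(
        ARP_FMT, frame, ETH_HDR_LEN)
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        raise ValueError("not Ethernet/IPv4 ARP")
    if op not in (1, 2):
        raise ValueError(f"unknown ARP opcode {op}")
    return {
        "op": "request" if op == 1 else "reply",
        "sender_mac": sha.hex(":"),
        "sender_ip": ".".join(map(str, spa)),
        "target_mac": tha.hex(":"),
        "target_ip": ".".join(map(str, tpa)),
        "padding": len(frame) - ETH_HDR_LEN - ARP_BODY_LEN,
    }


def build_arp(op, sha, spa, tha, tpa, dst_mac):
    """Build a minimum-size Ethernet frame carrying one ARP packet.

    The Ethernet source MAC is the ARP sender's MAC; the frame is
    zero-padded out to 60 bytes.
    """
    eth = dst_mac + sha + struct.pack("!H", 0x0806)
    body = struct.pack(ARP_FMT, 1, 0x0800, 6, 4, op, sha, spa, tha, tpa)
    return (eth + body).ljust(ETH_MIN_LEN, b"\x00")
```

For example, a reply built with build_arp(2, ...) comes out as a 60-byte frame, and parse_arp() reports its "op" as "reply" and its "padding" as 18 bytes.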
By the way, as long as I'm talking about Sequent arps: isn't 495.78 seconds a little too persistent? That's when the telnet session finally timed out. If a host hasn't responded to an arp within a few seconds, it is never going to. The above strategy causes TCP/IP to take 10 minutes(!) to time out on a dead host.
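The retry schedule can be read straight off the trace. The Python sketch below (an illustration, not anything from Dynix or SunOS) computes the gaps between the Sequent's requests, which grow irregularly rather than by a clean doubling, and contrasts the roughly 8-minute run with a hypothetical bounded policy of three requests one second apart. bounded_resolve(), retries and interval are made-up names and values; the one-second spacing is only chosen to match RFC 1122's recommendation of at most one ARP request per second per destination.

```python
# Times (seconds) of the Sequent's ARP requests for blox, copied from the
# etherfind trace. Each request was answered by blox at the same timestamp.
REQUEST_TIMES = [0.00, 2.82, 12.32, 24.96, 69.26, 211.58, 306.34, 495.78]


def gaps(times):
    """Seconds between consecutive requests, rounded to the trace's precision."""
    return [round(later - earlier, 2) for earlier, later in zip(times, times[1:])]


def bounded_resolve(reply_seen, retries=3, interval=1.0):
    """Hypothetical bounded ARP policy (not Dynix's or SunOS's).

    Send up to `retries` requests `interval` seconds apart. `reply_seen(i)`
    reports whether the reply to request i actually reaches the requester.
    Returns (resolved, t): on success, t is when the successful request was
    sent (round-trip time ignored); on failure, t is when the last request
    times out and the host is declared unreachable.
    """
    for i in range(retries):
        if reply_seen(i):
            return True, i * interval
    return False, retries * interval


if __name__ == "__main__":
    print(gaps(REQUEST_TIMES))
    print(f"trace: last request at {REQUEST_TIMES[-1]:.0f} s")
    print("bounded policy, dead host:", bounded_resolve(lambda i: False))
```

Run as a script, the first line printed is [2.82, 9.5, 12.64, 44.3, 142.32, 94.76, 189.44], and the bounded policy gives up on a dead host with (False, 3.0). Whether three tries is enough when replies are being missed intermittently, as on the Sequent, is a separate question the sketch doesn't settle.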