sthaug@idt.unit.no (Steinar Haug) (10/20/89)
[I have posted this to sun-managers and sun-nets earlier, without getting any good suggestions. I'm hoping that the sun-spots readers can help me!]

We are having problems with a Sun-2 running 3.5 here, and I wonder if the following description sounds familiar to any of you. The machine with the problems is named sizex.

The machine's ypbind process suddenly discovered that its default domain was unbound, but refused to rebind again. I have two YP servers, and I can see with tcpdump (or etherfind) that sizex is sending queries to the servers, and they are answering, but evidently sizex refuses to recognize the answers.

I brought the machine down to single-user mode and started digging deeper into the problem. I found some interesting information:

1. Using ping from sizex to another machine I can see that sizex is actually sending out ARP packets, and is getting replies. However, it does not recognize the replies, and does not send any ICMP Echo Requests. After a while ping times out, and the entry in the ARP cache is marked (incomplete). But if instead I ping sizex from another machine, sizex *will* enter the Ethernet address for this other machine in its ARP cache! It will *not* answer the ICMP Echo Request from the other machine, however.

2. If sizex *has* an entry for a machine in its ARP cache (obtained as above) it *will* send out ICMP Echo Requests, and the other machine answers them. But sizex still doesn't recognize the answers, and ping times out after a while.

3. If sizex *has* an entry for a machine in its ARP cache and I try telnetting to another machine, I get the following from tcpdump:

Script started on Tue Oct 17 00:05:52 1989
boheme# tcpdump ehost sizex
00:06:15.26 sizex.1025 > dorma.telnet: S 5124865:5124865(0) win 4096 <mss 1024>
00:06:15.26 dorma.telnet > sizex.1025: S 49486209:49486209(0) ack 5124866 win 4096
00:06:20.98 sizex.1025 > dorma.telnet: S 5124865:5124865(0) win 4096 <mss 1024>
00:06:20.98 dorma.telnet > sizex.1025: . ack 1 win 4096
00:06:21.10 dorma.telnet > sizex.1025: S 49486209:49486209(0) ack 5124866 win 4096
00:06:26.99 sizex.1025 > dorma.telnet: S 5124865:5124865(0) win 4096 <mss 1024>
00:06:26.99 dorma.telnet > sizex.1025: . ack 1 win 4096
00:06:27.11 dorma.telnet > sizex.1025: S 49486209:49486209(0) ack 5124866 win 4096
00:06:38.99 sizex.1025 > dorma.telnet: S 5124865:5124865(0) win 4096 <mss 1024>
00:06:38.99 dorma.telnet > sizex.1025: . ack 1 win 4096
00:06:39.11 dorma.telnet > sizex.1025: S 49486209:49486209(0) ack 5124866 win 4096
00:07:03.09 dorma.telnet > sizex.1025: S 49486209:49486209(0) ack 5124866 win 4096
(connection timed out)

Again, it seems to me that sizex refuses to recognize the answer from the other machine; it just keeps resending its initial SYN.

4. I have tried the above both with the correct subnet mask (0xffffff00) given to ifconfig, and without a subnet mask. The end result is exactly the same. I also tried rebuilding the kernel. No difference...

I'm stuck, any ideas out there? Thanks for all help!

Steinar Haug, System administrator
ELAB-RUNIT, University of Trondheim, Norway
Email: sthaug@idt.unit.no, steinar@flute.er.sintef.no