[comp.sys.sun] le: missed packet problem

gretzky@unison.larc.nasa.gov (Mitch Wright) (03/04/89)

H-E-L-P-!

I currently have two Sun 3/60s running 4.0.1.  One is a server with a 327Mb
disk and 4Mb of memory; the client is a 3/60 with a color monitor and 8Mb
of memory.  They have been running smoothly (as smoothly as 4.0.1 can).
Well, trouble hit Friday when the client just up and died (went down to
the monitor prompt '>').  Tried rebooting it ...

>b

EEPROM boot device ... le(0,0,0)
Using IP Address 128.155.2.94 = 809B025E
Booting from tftp server at 128.155.2.83 = 809B0253
Downloaded 126056 bytes from tftp server.
Using IP Address 128.155.2.94 = 809B025E
le: missed packet
le: missed packet
le: missed packet
No bootparam server responding; still trying
le: missed packet
le: missed packet

and will continue like this until you L1-a the client.  I have double and
triple checked all files in /etc/...  I have rebooted the server (several
times).  I have deleted the client completely and then added the client
again.  Network traffic was quite slow, and I'm fresh out of ideas.  I am
beginning to think it is a hardware problem, but I would hate to jump the
gun.  I have had NO problems in the past with rebooting.
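
For anyone reading the boot log above: the hex value printed after each IP
address is just the four octets of the dotted-quad address written as two
hex digits each. A minimal Python sketch of that arithmetic follows; the
function name ip_to_hex is made up for illustration and is not part of any
Sun tool.

```python
def ip_to_hex(dotted):
    """Return the 8-digit uppercase hex form of a dotted-quad IPv4 address."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % dotted)
    # Each octet becomes exactly two hex digits, most significant octet first.
    return "".join("%02X" % o for o in octets)


print(ip_to_hex("128.155.2.94"))  # client address from the log
print(ip_to_hex("128.155.2.83"))  # tftp server address from the log
```

Running it on the two addresses in the log gives 809B025E and 809B0253,
matching what the PROM printed.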

Any help will be greatly appreciated.

-=>gretzky<=-
..mitch

	gretzky@eagle.larc.nasa.gov
	gretzky@uxv.larc.nasa.gov

ehrhart@aai8.istc.sri.com (Tim Ehrhart) (03/14/89)

> Well, trouble hit Friday when the client just up and died (went down to
> the monitor prompt '>').  Tried rebooting it ...
> 
> >b
> 
> EEPROM boot device ... le(0,0,0)
> Using IP Address 128.155.2.94 = 809B025E
> Booting from tftp server at 128.155.2.83 = 809B0253
> Downloaded 126056 bytes from tftp server.
> Using IP Address 128.155.2.94 = 809B025E
> le: missed packet
> le: missed packet
> le: missed packet
> No bootparam server responding; still trying
> le: missed packet
> le: missed packet

I experienced the same problems when I upgraded to 4.0 months ago. After
much head scratching and wire sniffing here is what I discovered:

We had some VMS/VAXen on the wire running both DECnet and TCP/IP. There
are various versions of TCP/IP available for VMS, so your mileage may vary.
But nonetheless, most of them ~seem~ to be based on the PD version of RPC
from Sun. What appears to happen is that when the client is requesting his
bootparam server (which corresponds to an indirect RPC request from the
portmapper to bootparamd), the portmapper process running on the VAX sends
back the wrong response. If it can't satisfy the request, it should simply
NOT ANSWER; instead it sends back an RPC error message.  (I can't remember
exactly what the message was, it has been a while, but I think it was "RPC
service unavailable".) We have/had quite a few of these beasts, so the
poor diskless client was inundated with bogus RPC replies from the VAXen. The
client didn't like this, so it proceeded to send ICMP messages back to the
VAXen ????. Just about at the timeout of the request, the appropriate file
server would FINALLY respond (about 9ms later), but the client timed out
his request and dropped the response packet from the file server, which then
started the process all over again.
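
The race described above can be illustrated with a toy model. This is only
a sketch under stated assumptions, not the actual boot PROM or bootparam
client code: the function boot_lookup, the host names, the 20ms timeout, and
the "give up on the first error reply" behaviour are all hypothetical. Only
the rough timings (error replies at about 1ms, the real server at about 9ms)
come from the description above.

```python
def boot_lookup(replies, timeout_ms, ignore_errors):
    """Return the source whose reply the client accepts, or None if it fails.

    replies: list of (arrival_ms, source, is_error) tuples.
    """
    for arrival_ms, source, is_error in sorted(replies):
        if arrival_ms >= timeout_ms:
            break  # reply arrived after the request timed out; it is dropped
        if is_error:
            if ignore_errors:
                continue  # a tolerant client keeps waiting for a real answer
            return None  # assumed behaviour: the first error aborts the attempt
        return source
    return None


# Hypothetical traffic: two bystander hosts answer with errors at ~1ms,
# the real file server answers at ~9ms.
noisy_net = [(1, "vax-a", True), (1, "vax-b", True), (9, "fileserver", False)]
isolated_net = [(9, "fileserver", False)]

print(boot_lookup(noisy_net, timeout_ms=20, ignore_errors=False))
print(boot_lookup(isolated_net, timeout_ms=20, ignore_errors=False))
print(boot_lookup(noisy_net, timeout_ms=20, ignore_errors=True))
```

In this model the lookup fails on the noisy net and succeeds once the
bystanders are removed, which is the same effect the isolation test is
meant to show on the real wire.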

Try to prove this by isolating your client and its file server from the
rest of the net and attempting the boot again. This is simple for me to do
because we make copious use of multi-port boxes. In lieu of this, get out
either tcpdump or etherfind and watch for all packets coming and going
to/from the affected client. It was AMAZING to watch how fast the VAXen
were pummeling the poor client (reply time was about 1ms), then finally
about 9ms later the file server replied. In my case, the file server was a
Sun-4 on the same multi-port box right beside the client, and the VAXen
were on distant parts of our campus ethernet.

Tim Ehrhart			ehrhart@spam.istc.sri.com
SRI International

rsd@iroquois.dal.utexas.edu (Shane Davis) (04/07/89)

Been slow to deal with mail this month...better late than never, I reckon...

Tim Ehrhart <ehrhart@aai8.istc.sri.com>:
...
>What appears to happen is that when the client is requesting his
>bootparam server (which corresponds to an indirect RPC request from the 
>portmapper to bootparamd), the portmapper process running on the VAX sends
>back the wrong response. If it can't satisfy the request, it should simply
>NOT ANSWER, instead it sends back an RPC error message....

VAXen aren't the only culprits. We had 2 diskless 3/60's attempting to
boot off a 3/280 that was only 15 feet away from them, and a TI Explorer
from the other side of the campus stuck its nose into the boot process in
the same unfriendly manner. We isolated the Cabletron that all of the Suns
were on from the rest of the net, and they booted normally.

--Shane Davis
  VM and UNIX Systems Programmer
  Univ. of Texas at Dallas Academic Computer Center
  SHANE@UTDALVM1{.BITNET|.dal.utexas.edu} or rsd@dal.utexas.edu