[comp.sys.pyramid] Strange Ether-problem on Pyramid

ahi@nada.kth.se (Anders Hillbo) (08/03/88)

The problem has several different symptoms. We believe they are all
related, but maybe we are wrong and they are not connected at all...
Does anyone have a nice, simple solution that we have overlooked?

1) The most obvious thing is that "netstat -i" NEVER shows any collisions.
All the other hosts show collisions. (We have an average load of 4-8% on the Ether.)
DOES ANYONE ELSE HAVE THIS? I can "swear" that previous versions of OSx
showed collisions... (see below for versions tested)

2) Ping indicates lost packets from and to the Pyramid. No other host
on the net (we have ping-tested about 10 out of 80) drops any packets
with ping.  About 2-4% of the packets to/from the Pyr are lost at
worst.

3) Telnet connections sometimes "hang", with echoing stalled for 2-15 secs.

4) NFS says "NFS server not responding/NFS server OK" now and then

Symptoms 2-4 only appear when the net has *some* load, though not necessarily
heavy load. They don't appear all the time when the net is loaded, either.
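
For anyone who wants to compare loss figures like the one in 2) between
hosts, here is a minimal Python sketch of how such a percentage can be
derived from the sequence numbers of the ping replies. The function name
loss_report and the sample numbers are made up for illustration; they are
not measurements from our net.

```python
def loss_report(sent, received_seqs):
    """Return (lost sequence numbers, loss percentage) for one ping run.

    sent          -- number of echo requests sent, numbered 0 .. sent-1
    received_seqs -- iterable of sequence numbers that got a reply
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    expected = set(range(sent))
    # Duplicated or out-of-range replies are ignored.
    received = set(received_seqs) & expected
    lost = sorted(expected - received)
    return lost, 100.0 * len(lost) / sent


if __name__ == "__main__":
    # Illustrative run: 100 requests, three replies missing.
    replies = [seq for seq in range(100) if seq not in (10, 57, 90)]
    lost, percent = loss_report(100, replies)
    print("lost:", lost, "loss: %.1f%%" % percent)
```

Running the same count against a few other hosts on the net gives a
baseline to compare the Pyr against.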

HISTORY: 
We have a 9820 with one IOP.
The problems have existed on at least 4.0-870921, 4.0-880601,
4.1-std and 4.1-880601. We started noticing the telnet "hangs"
during late spring, but it took a while to isolate them to the Pyr; we
first suspected overall net overload etc. Of course the problem
appears even less often during the summer, when most people are away.

OUR ACTIONS SO FAR:
o 	We have tried three different xvrs (and xvr-cables): one 
	1.0-xvr, one fan-out (ISOLAN) and one 802.3 (SQE/heartbeat)-xvr.
o	We have let the Pyr representative swap the IOP/TPE card. (latest rev)
o	Ditto for the internal cable from the card to the chassis 
	Ether connector. (far fetched...)
o	Tried putting the Pyr on a local Ether behind a MAC-level bridge.
	(in case our main Ether had some "bad magic")

----

By the way, the 4.1-860601 has a (minor) NFS bug (SunOS 3.4 clients)
when:
a) a process on a client has a file on the server open.
b) someone removes that file on the server.
c) the client process tries to access the file
Result:
A lot of error messages appear on the server console and the load gets
high on the server. The user gets no error msg at all!

Typical error msg (repeated *many* times):
ufs_vget: iget(0x82a, 0xfecec000, 8803) gen mismatch 15/16
fhtovp failed:  8 8803 15

sas@pyrps5 (Scott Schoenthal) (08/04/88)

In article <502@draken.nada.kth.se> ahi@nada.kth.se (Anders Hillbo) writes:
>By the way, the 4.1-860601 has a (minor) NFS bug (SunOs 3.4 clients)
>when:
>a) a process on a client has a file on the server open.
>b) someone removes that file on the server.
>c) the client process tries to access the file
>Result:
>A lot of error messages appear on the server console and the load gets
>high on the server. The user gets no error msg at all!
>
>Typical error msg (repeated *many* times):
>ufs_vget: iget(0x82a, 0xfecec000, 8803) gen mismatch 15/16
>fhtovp failed:  8 8803 15

The message indicates that an incorrect file handle is being used by an
NFS client.  Remember that NFS is "stateless" relative to the server.
By removing the file on the server, you have invalidated the client's
reference (handle) to the file.  If the client later attempts to use
this handle, the server (Pyramid) will reject it with an error (returned
to the client process by most Sun-based NFS implementations as ESTALE).
The "gen mismatch" refers to the fact that the generational field in
the client's version of the handle does not match the same field in the
referenced inode.
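
The generation check described above can be illustrated with a small toy
model in Python. This is only a sketch under stated assumptions, not the
OSx or Sun code: the names FileHandle and ToyServer are hypothetical, the
handle layout is simplified to three fields, and the toy bumps the
generation at removal time purely to keep the example short (when a real
kernel updates the field is not something the sketch claims).

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    """Hypothetical, simplified handle: all the client keeps between calls."""
    fsid: int
    inumber: int
    generation: int


class ToyServer:
    """Toy "stateless" server: it tracks inodes, not which clients hold handles."""

    def __init__(self, fsid):
        self.fsid = fsid
        self.generations = {}  # inode number -> current generation
        self.contents = {}     # inode number -> file data (absent when removed)

    def create(self, inumber, data):
        # Hand out a handle stamped with the inode's current generation.
        gen = self.generations.setdefault(inumber, 1)
        self.contents[inumber] = data
        return FileHandle(self.fsid, inumber, gen)

    def remove(self, inumber):
        # Toy assumption: removal bumps the generation, so every handle
        # issued before this point no longer matches the inode.
        del self.contents[inumber]
        self.generations[inumber] += 1

    def read(self, handle):
        current = self.generations.get(handle.inumber)
        if (handle.fsid != self.fsid
                or handle.inumber not in self.contents
                or current != handle.generation):
            # The "gen mismatch" case: reject the handle as stale.
            raise OSError(errno.ESTALE, "stale file handle")
        return self.contents[handle.inumber]


if __name__ == "__main__":
    server = ToyServer(fsid=1)
    handle = server.create(42, b"some data")  # client obtains a handle
    print(server.read(handle))                # handle is still valid
    server.remove(42)                         # file removed on the server
    try:
        server.read(handle)                   # client reuses the old handle
    except OSError as err:
        print("rejected:", errno.errorcode[err.errno])
```

Because the server keeps no record of which clients have the file open,
the generation comparison is the only way it can tell an old handle from
a handle for a new file that happens to reuse the same inode number.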

sas
----
Scott Schoenthal   			sas@pyrps5.pyramid.com
Pyramid Technology Corp.		{sun,hplabs,decwrl}!pyramid!sas