ahi@nada.kth.se (Anders Hillbo) (08/03/88)
The problem has several symptoms. We believe they are all related, but
maybe we are wrong and they are not connected at all... Does anyone have
a nice, simple solution that we have overlooked?

1) The most obvious thing is that "netstat -i" NEVER shows any collisions.
   All the other hosts show collisions. (We average 4-8% load on the
   Ether.) DOES ANYONE ELSE HAVE THIS? I can "swear" that previous versions
   of OSx showed collisions... (see below for versions tested)

2) Ping indicates lost packets from and to the Pyramid. No other host on
   the net (we have ping-tested about 10 out of 80) drops any packets
   with ping. At worst, about 2-4% of the packets to/from the Pyr are lost.

3) Telnet connections sometimes "hang" while echoing, for 2-15 secs.

4) NFS says "NFS server not responding/NFS server OK" now and then.

Conditions 2-4 only happen when the net has *some* load, but not
necessarily heavy load. They don't happen all the time when the net is
loaded, either.

HISTORY:
We have a 9820 with one IOP. The problems have existed at least on
4.0-870921, 4.0-880601, 4.1-std and 4.1-880601. We started noticing the
telnet "hangings" during late spring, but it took a while to isolate them
to the Pyr; we first suspected overall net overload etc. Of course the
problem appears even less often during the summer, when most people are
away.

OUR ACTIONS SO FAR:
o We have tried three different xvrs (and xvr-cables): one 1.0-xvr, one
  fan-out (ISOLAN) and one 802.3 (SQE/heartbeat)-xvr.
o We have let the Pyr representative switch the IOP/TPE card. (latest rev)
o Ditto for the internal cable from the card to the chassis Ether
  connector. (far fetched...)
o Tried putting the Pyr on a local Ether behind a MAC-level bridge.
  (in case our main Ether had some "bad magic")

----

By the way, the 4.1-860601 has a (minor) NFS bug (SunOs 3.4 clients)
when:
a) a process on a client has a file on the server open.
b) someone removes that file on the server.
c) the client process tries to access the file
Result:
c) a lot of error messages appear on the server console and the load gets
   high on the server. The user gets no error msg at all!

Typical error msg (repeated *many* times):
ufs_vget: iget(0x82a, 0xfecec000, 8803) gen mismatch 15/16
fhtovp failed: 8 8803 15
sas@pyrps5 (Scott Schoenthal) (08/04/88)
In article <502@draken.nada.kth.se> ahi@nada.kth.se (Anders Hillbo) writes:
>By the way, the 4.1-860601 has a (minor) NFS bug (SunOs 3.4 clients)
>when:
>a) a process on a client has a file on the server open.
>b) someone removes that file on the server.
>c) the client process tries to access the file
>Result:
>c) a lot of of error messages appear on the server console and the load gets
>   high on the server. The user gets no error msg at all!
>
>Typical error msg (repeated *many* times):
>ufs_vget: iget(0x82a, 0xfecec000, 8803) gen mismatch 15/16
>fhtovp failed: 8 8803 15

The message indicates that an incorrect file handle is being used by an
NFS client. Remember that NFS is "stateless" relative to the server. By
removing the file on the server, you have invalidated the client's
reference (handle) to the file. If the client later attempts to use this
handle, the server (Pyramid) will reject it with an error (returned to the
client process by most Sun-based NFS implementations as ESTALE).

The "gen mismatch" refers to the fact that the generation field in the
client's version of the handle does not match the same field in the
referenced inode.

sas
----
Scott Schoenthal                sas@pyrps5.pyramid.com
Pyramid Technology Corp.        {sun,hplabs,decwrl}!pyramid!sas
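
The generation check described above can be illustrated with a small toy
model. This is only a sketch, not the OSx or SunOS code: the names
ToyServer, FileHandle and Inode are hypothetical, and it assumes that the
generation number is bumped each time an inode number is reused. A handle
that still carries the old generation is then rejected with ESTALE.

```python
import errno
from dataclasses import dataclass


@dataclass
class Inode:
    number: int
    generation: int
    in_use: bool = True


@dataclass(frozen=True)
class FileHandle:
    # Hypothetical handle layout: filesystem id, inode number, generation.
    fsid: int
    inode_number: int
    generation: int


class ToyServer:
    """Toy stateless server: it keeps inodes, not per-client open state."""

    def __init__(self):
        self.inodes = {}

    def create(self, fsid, inode_number):
        inode = self.inodes.get(inode_number)
        if inode is None:
            inode = Inode(inode_number, generation=1)
            self.inodes[inode_number] = inode
        else:
            # Assumption: reusing an inode number bumps its generation.
            inode.generation += 1
            inode.in_use = True
        return FileHandle(fsid, inode_number, inode.generation)

    def remove(self, handle):
        self.inodes[handle.inode_number].in_use = False

    def lookup(self, handle):
        # Rough analogue of the handle-to-inode step that logged the
        # "gen mismatch" message in the post above.
        inode = self.inodes.get(handle.inode_number)
        if (inode is None or not inode.in_use
                or inode.generation != handle.generation):
            raise OSError(errno.ESTALE, "stale file handle")
        return inode


server = ToyServer()
old_handle = server.create(fsid=8, inode_number=8803)
server.remove(old_handle)                              # file removed on the server
new_handle = server.create(fsid=8, inode_number=8803)  # inode number reused

try:
    server.lookup(old_handle)
except OSError as err:
    print("old handle rejected:", errno.errorcode[err.errno])
```

In this model the client's old handle can never reach the new file that
happens to occupy the same inode number, which is the purpose of carrying
a generation field in the handle.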