mrm@nss1.simpact.com (Michael R. Miller) (06/07/91)
We are having a problem with our SCO NFS package. When we start doing large amounts of NFS work, both NFS and TCP/IP simply die. Streams resources are still available when this happens.

The SCO OS/NFS machine is exporting its directory, and a SUN OS/NFS machine is importing it. Large numbers of reads and writes go back and forth for some time (sometimes just a few minutes to an hour, other times a couple of days), and then the software decides to roll over and play dead. We have to reboot the machine to breathe life back into its networking support.

The SUN's NFS continues to operate, although the window using the SCO mount is "dead": the program running in it is waiting for a never-to-be-answered NFS request. We have determined that the SUN isn't at fault, because it successfully reads and writes another NFS-mounted directory exported by a different SUN. The SUN is running SunOS 4.1. The SCO machine runs SCO UNIX 3.2 on an AST Premium 33 MHz computer with 8 MB of memory. The hard disk still has plenty of free space when it dies. Our network card is a WD8003.

Please email responses to me. I will summarize what I receive, plus a description of how we resolved the problem once we resolve it. Thanks in advance for the help.

Michael R. Miller
Simpact Associates, Inc.
jim@tiamat.fsc.com ( IT Manager) (06/07/91)
In article <1991Jun06.171047.15327@nss1.com>, mrm@nss1.simpact.com (Michael R. Miller) writes:
> We are having a problem with our SCO NFS package. It seems that when
> we start doing large amounts of NFS work, the NFS and the TCP/IP just
> simply dies. There are streams resources available when this happens.

This reminds me of a question I've been meaning to ask. Under Xenix, there is a program called "sw" which does a really nice job of reporting, in real time, the Streams resources in use. Is there an equivalent function under SCO Unix? We have all the parameters set really high on one system, since it is heavily used, but my guess is that some of them are too high and that a lot of unused resources could be returned to user space. Any ideas?

-------------
James B. O'Connor             jim@tiamat.fsc.com
Ahlstrom Filtration, Inc.     615/821-4022 x. 651
larry@nstar.rn.com (Larry Snyder) (06/08/91)
jim@tiamat.fsc.com ( IT Manager) writes:
>This reminds me of a question I've been meaning to ask. Under Xenix, there
>is a program called "sw" which does a really nice job of reporting, in
>real-time, the Streams resources in use. Is there an equivalent function
>under SCO Unix? We have all the parameters really high on one system, since
>it is heavily used, but my guess is that some of them are too high, and that
>there are a lot of unused resources that could be returned to user space.

Have you tried netstat -m?
-- 
Larry Snyder, NSTAR Public Access Unix 219-289-0287/317-251-7391
HST/PEP/V.32/v.32bis/v.42bis
regional UUCP mapping coordinator {larry@nstar.rn.com, ..!uunet!nstar.rn.com!larry}
jtsillas@sprite.ma30.bull.com (James Tsillas) (06/08/91)
The only way I've managed to get this information is by running 'crash' and entering 'strstat'.

-Jim.
-- 
James Tsillas                      Bull HN Information Systems Inc.
(508) 294-2937                     300 Concord Road 826A
jtsillas@bubba.ma30.bull.com       Billerica, MA 01821

The opinions expressed above are solely my own and do not reflect
those of my employer.
-== no solicitations please ==-
vlr@litwin.com (Vic Rice) (06/10/91)
In <1991Jun07.210220.5073@nstar.rn.com> larry@nstar.rn.com (Larry Snyder) writes:
>jim@tiamat.fsc.com ( IT Manager) writes:
>>This reminds me of a question I've been meaning to ask. Under Xenix, there
>>is a program called "sw" which does a really nice job of reporting, in
>>real-time, the Streams resources in use. Is there an equivalent function
>>under SCO Unix? We have all the parameters really high on one system, since
>>it is heavily used, but my guess is that some of them are too high, and that
>>there are a lot of unused resources that could be returned to user space.
>have you tried netstat -m ?

This yields the following on SCO ODT 1.1:

    # netstat -m
    netstat: Memory information not currently supported

-- 
Dr. Victor L. Rice
Litwin Process Automation
wes@harem.clydeunix.com (Barnacle Wes) (06/15/91)
In article <1991Jun06.171047.15327@nss1.com>, mrm@nss1.simpact.com (Michael R. Miller) writes:
> The SCO OS/NFS is exporting its directory. A SUN OS/NFS is importing
> the directory. Large numbers of reads and writes are going back and
> forth for some time -- sometimes just a few minutes to an hour, other
> times a couple of days -- and then the software decides to lay over
> and play dead. We need to reboot the machine to breathe life into its
> networking support.
>
> The SUN's NFS continues to operate although that window is "dead"
> with the program running in the window waiting for a never-to-be-answered
> NFS request. We have determined that the SUN isn't at fault by successfully
> reading and writing another NFS mounted directory exported by another SUN.
> The SUN is an OS 4.1 product.

This doesn't necessarily mean that Sun's NFS is correct or bug-free, just that Sun NFS has a bug set that is compatible with (surprise!) Sun NFS. If you have another SCO system, try the same test with an SCO client and server. That may help narrow the possibilities.

Also, when you encounter this problem, does all networking on the SCO box die, or just NFS? In other words, do telnet, ping, finger, etc. still work? If so, it may just be a problem with SCO NFS. If the entire network is crashing, including inetd, the problem may be in your TCP/IP software rather than in the NFS server. Does nfsstat show any problems before or after the crash, such as lots of RPC badcalls?

Good luck bug-hunting.
Wes Peters
-- 
#include <std/disclaimer.h>                        The worst day sailing
My opinions, your screen.                          is much better than
Raxco had nothing to do with this!                 the best day at work.
Wes Peters: wes@harem.clydeunix.com           ...!sun!unislc!harem!wes
larryp@sco.COM (Larry Philps) (06/17/91)
In <342@harem.clydeunix.com> wes@harem.clydeunix.com (Barnacle Wes) writes:
> In article <1991Jun06.171047.15327@nss1.com>, mrm@nss1.simpact.com (Michael R. Miller) writes:
> > The SCO OS/NFS is exporting its directory. A SUN OS/NFS is importing
> > the directory. Large numbers of reads and writes are going back and
> > forth for some time -- sometimes just a few minutes to an hour, other
> > times a couple of days -- and then the software decides to lay over
> > and play dead. We need to reboot the machine to breathe life into its
> > networking support.
> >
> > The SUN's NFS continues to operate although that window is "dead"
> > with the program running in the window waiting for a never-to-be-answered
> > NFS request. We have determined that the SUN isn't at fault by successfully
> > reading and writing another NFS mounted directory exported by another SUN.
> > The SUN is an OS 4.1 product.
>
> This doesn't necessarily mean that the Sun NFS is correct, or bug-free,
> but just that Sun NFS has a bug-set that is compatible with (surprise!)
> Sun NFS. If you have another SCO system, try doing the same test with
> an SCO client & server. This may help to narrow the possibilities.
>
> Also, when you encounter this problem, does the entire network on the
> SCO box die, or just NFS? In other words, do telnet, ping, finger, etc
> still work? If so, it may just be a problem with SCO-NFS. If it is
> crashing the entire network, including inetd, the problem may be in
> your TCP/IP software rather than the NFS server. Does nfsstat show any
> problems before or after the crash, such as lots of rpc badcalls?
>
> Good luck bug-hunting.

I sent mail to Michael Miller about this problem, but since the question has resurfaced a week later, I figured I should let everybody in on the scoop. This *bug* has already been found and fixed. Please note that the problem is in the WD8003 driver, not in NFS.
It turns out that under certain circumstances (transmitting while under extremely heavy receive load), the WD8003 card can drop a transmit interrupt. The driver did not check for this situation, and thus did not recover from it. This produces exactly the symptoms Michael is seeing.

We also found that under even heavier loads, the entire system could hang. This turned out to be caused by the NIC chip on the board putting a bogus value into the next packet pointer register. If that bogus value was 0, the driver would loop forever at spl5.

Both bugs have been fixed in the current driver, which is now shipping as part of the LLI Drivers EFS. You can get it from support for a fee of approximately $50 (I think), download it for free via uucp from sosco, or ftp it for free from sco-archive on uunet.

---
Larry Philps, SCO Canada, Inc.
Postman:  130 Bloor St. West, 10th floor, Toronto, Ontario. M5S 1N5
InterNet: larryp@sco.COM or larryp%scocan@uunet.uu.net
UUCP:     {uunet,utcsri,sco}!scocan!larryp
Phone:    (416) 922-1937