sch%ee.uofm.cdn@relay.ubc.ca (Roland Schneider) (11/23/88)
Configuration:

    Sun 4/280, SunOS 4.0 EXPORT (eeserv)   [ mounts mymach files ]
    Sun 3/260, SunOS 3.5 EXPORT (mymach)

Has anyone else run into a problem where processes on a 4/280 (eeserv) which access files on a 3/260 (mymach) via NFS hang in a disk wait ("D" status in ps) and cannot be coaxed back into operation or even killed? This seems to be happening with increasing frequency here, and is very annoying as it requires rebooting the NFS client (the 4/280). Other clients (Sun 3s) of mymach continue working fine.

The problem is actually a little stranger than I described above. Here is a scenario (my home directory is on mymach):

    Script started on Sat Nov 12 11:01:33 1988
    mymach% rlogin eeserv
    Last login: Sat Nov 12 00:19:36 from mymach
    eeserv% ls
    chris fool nnet sun_service xxx ....
    eeserv% cat xxx
    ... the contents of xxx ...
    eeserv% cd fool
    eeserv% ls
    testfile
    eeserv% cat testfile
    ===================================

At this point, cat is stuck in a disk wait, and will not die until the system is rebooted.

So: I can log in, cat (or vi, etc.) a file in my home directory, do an "ls", change directories, and do an "ls" there too, but cannot access any files there. This seemed to be true for EVERY new process I started, whether as root or not.

I also noticed this behavior on a 4/110 this morning. The process which was hung was on ttyp0, and guess what I ran into? The well-known problem of windows dying on the first character typed. A coincidence?

I'd be interested in hearing from anyone else who has run into this, especially if they've found a solution. I have not been able to recreate this problem on demand, but I suspect it begins during heavy NFS access.

Roland Schneider
Dept. Electrical Engineering
University of Manitoba
Winnipeg, Manitoba, Canada
dwf%hope@lanl.gov (David W. Forslund) (12/03/88)
Your problem sounds like the lockd bug, which has been fixed in SunOS 4.0.1. The patch tape is also available from Sun. We had the problem with a 386i NFS-mounting a Sun 4/260 filesystem. The patch tape fixed our problem.

David Forslund
Los Alamos National Laboratory (dwf@lanl.gov)