eirik@theory.tn.cornell.edu (Eirik Fuller) (10/08/90)
In a previous posting I described a problem with tmpfs in which processes get stuck in disk wait while trying to read a directory. Since then a Sun engineer has found a way to reproduce the problem, based on data from our kernel core dump. One of our users had made a hard link to a directory in /tmp/, and this, apparently, caused the problem. It sounds like the fix will cure both problems: that tmpfs allows mere mortals to hard link directories, and the hangs that ensue.
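For anyone who wants to check whether a given filesystem lets an ordinary user hard link a directory, here is a minimal sketch in Python. It is not the Sun engineer's reproduction, which is not described above; the function name and the scratch directory layout are made up for illustration. It works in a fresh temporary directory (wherever the system puts those, which may or may not be tmpfs), tries link() on a subdirectory, and reports what happened. On most systems the call is expected to be refused.

```python
import errno
import os
import shutil
import tempfile


def try_hard_link_directory():
    """Try to hard link a directory inside a scratch directory.

    Returns (created, err): created is True if the link call succeeded,
    and err is the errno from the failed call (None on success).
    The name and layout are illustrative, not taken from the posting.
    """
    parent = tempfile.mkdtemp()
    target = os.path.join(parent, "realdir")
    alias = os.path.join(parent, "alias")
    try:
        os.mkdir(target)
        try:
            os.link(target, alias)  # the operation that should be refused
        except OSError as exc:
            return False, exc.errno
        # The link was allowed; try to undo it so cleanup can proceed.
        try:
            os.unlink(alias)
        except OSError:
            pass
        return True, None
    finally:
        # Best-effort cleanup of the scratch directory.
        shutil.rmtree(parent, ignore_errors=True)


if __name__ == "__main__":
    created, err = try_hard_link_directory()
    if created:
        print("directory hard link was ALLOWED")
    else:
        print("refused:", errno.errorcode.get(err, err))
```

Run it as an unprivileged user. A result of "ALLOWED" would mean the filesystem holding the scratch directory permits the kind of link described above.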
eirik@theory.tn.cornell.edu (Eirik Fuller) (10/08/90)
Our Sun 4/280 running SunOS 4.1 frequently gets into an unusable state in which numerous processes are stuck in disk wait, all of them in tmpfs. This artificially inflates the load average (by about one per wedged process), and eventually it causes telnetd to close connections on new logins.

Kernel stack traces on the wedged processes show that all of them are sleeping at priority 10 in _tmpnode_lock, called by _tmpnode_get, called by _tdirlookup, usually (but not always) called by _tmp_lookup. We don't have SunOS 4.1 source code, so I can't easily explore the problem much further.

The processes always hang in the same directory, which usually has mode 700. The owner of that directory apparently backgrounds jobs which use that directory, so this could be a concurrency problem of some sort.

Has anyone else seen this problem? Any suggestions for workarounds until we can get a fix would be welcome.
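Since each wedged process adds roughly one to the load average, counting processes in disk wait gives a quick estimate of how much of the load is artificial. The sketch below is only an illustration: it parses text in the style of BSD ps output, where a STAT value beginning with "D" marks disk wait, and it assumes the columns to the left of STAT contain no embedded spaces. The sample listing is invented, not taken from our machine.

```python
def count_disk_wait(ps_output):
    """Count processes whose STAT field begins with 'D' (disk wait).

    Expects a header line containing a STAT column, followed by one
    line per process. Assumes no column before STAT contains spaces.
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    stat_col = header.index("STAT")  # position of the state column
    count = 0
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > stat_col and fields[stat_col].startswith("D"):
            count += 1
    return count


# Hypothetical sample listing, for illustration only.
SAMPLE = """\
  PID TT STAT  TIME COMMAND
  101 p0 D     0:00 ls
  102 p1 S     0:01 csh
  103 p2 D     0:00 find
  104 co R     0:00 ps
  105 p3 IW    0:00 vi
"""

if __name__ == "__main__":
    print(count_disk_wait(SAMPLE), "process(es) in disk wait")
```

Subtracting this count from the reported load average gives a rough idea of the real load. Processes doing genuine disk I/O also show up in state D briefly, so a single snapshot can overcount.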