[comp.sys.sun] tmpfs

eirik@theory.tn.cornell.edu (Eirik Fuller) (10/08/90)

In a previous posting I described a problem with tmpfs, in which
processes can get stuck in disk wait trying to read a directory.

Since then a Sun engineer found a way to reproduce this problem, based on
data from our kernel core dump.  One of our users had made a hard link
to a directory in /tmp, and this apparently caused the problem.

It sounds like the fix will cure both problems: that tmpfs allows mere
mortals to hard link directories, and the hangs that ensue.
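
For anyone who wants to check whether their own /tmp has the same hole,
here is a minimal sketch in Python.  It is not Sun's test case; the
function name and the scratch directory names are my own inventions.  It
makes a scratch directory, tries to hard link a subdirectory, and reports
whether the kernel refused.  Run it as an ordinary user, and pass the
mount point you care about as `parent`.

```python
import os
import shutil
import tempfile


def directory_hard_link_refused(parent=None):
    """Return True if a hard link to a directory is refused under `parent`.

    `parent` is the directory to test in (default: the system temp dir).
    The name of this function is hypothetical, chosen for illustration.
    """
    scratch = tempfile.mkdtemp(dir=parent)
    try:
        target = os.path.join(scratch, "d")
        alias = os.path.join(scratch, "d_link")
        os.mkdir(target)
        try:
            os.link(target, alias)  # link(2) on a directory
        except OSError:
            return True  # refused, which is what we want to see
        # The link was allowed.  Try to drop the extra name again;
        # whether unlink works on a directory is itself platform-dependent.
        try:
            os.unlink(alias)
        except OSError:
            pass
        return False
    finally:
        shutil.rmtree(scratch, ignore_errors=True)


if __name__ == "__main__":
    if directory_hard_link_refused("/tmp"):
        print("hard link to a directory was refused")
    else:
        print("hard link to a directory was ALLOWED")
```

If the second message prints for a non-root user, the filesystem is
permitting the kind of link described above.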

eirik@theory.tn.cornell.edu (Eirik Fuller) (10/08/90)

Our Sun 4/280 running SunOS 4.1 frequently gets into an unusable state, in
which numerous processes are stuck in disk wait, all of them in tmpfs.
This artificially increases the load average (by about one per wedged
process), and eventually it causes telnetd to close connections on new
logins.
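
A rough way to count the wedged processes is to pick out everything whose
ps state begins with "D" (disk wait).  The sketch below is in Python and
assumes a ps that accepts `-eo pid,stat,comm` and prints three columns in
that order; that invocation is my assumption and may need changing for
your system.  The function name is likewise made up for illustration.

```python
import subprocess


def disk_wait_processes(ps_text):
    """Return (pid, command) pairs for processes whose state starts with 'D'.

    `ps_text` is expected to hold lines of "PID STATE COMMAND"; the header
    line and anything else that does not start with a numeric PID is skipped.
    """
    wedged = []
    for line in ps_text.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, state, command = fields
        if state.startswith("D"):
            wedged.append((int(pid), command.strip()))
    return wedged


if __name__ == "__main__":
    # Assumed invocation; adjust the flags to whatever your ps supports.
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    for pid, command in disk_wait_processes(result.stdout):
        print(pid, command)
```

A process that is briefly in disk wait for ordinary I/O will also show
up, so it is the ones that stay in the list from one run to the next that
matter.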

Kernel stack traces on the wedged processes show that all of them are
sleeping at priority 10 in _tmpnode_lock, called by _tmpnode_get, called
by _tdirlookup, usually (but not always) called by _tmp_lookup.  We don't
have SunOS 4.1 source code, so I'm not conveniently able to explore the
problem much further.

Has anyone else seen this problem?  Any suggestions for workarounds until
we can get a fix would be welcome.  The processes always hang in the same
directory, which usually has mode 700.  The owner of that directory
apparently runs background jobs that use it, so this could be a
concurrency problem of some sort.