alg@venture.cs.cornell.edu (Anne Louise Gockel) (07/31/90)
A user in our department has a problem that causes emacs to hang regularly (but not on demand). The problem is possibly associated with using "rmail" in emacs, or possibly with starting a shell in emacs. I do not know whether this problem is unique to the single user or widespread. If you think you have seen it in similar circumstances, or can shed any light on it, please let me know.

Configuration:
  DEC3100, Ultrix UWS 2.2, MIT's X11R4 (server, twm, and clients),
  emacs 18.55.2 (happened with 18.54 also).
  /usr/spool/mail NFS mounted from a Sun 4.0 file system
  /usr NFS mounted from a Sun 4.0 file system
  emacs run from /usr/local, NFS mounted from a Sun 4.0 file system
  emacs lock files in /tmp, local to the DECstation
  DECstation is a YP client
  emacs compiled with X11 support; it comes up in its own X window.

Symptoms:
One emacs process starts chewing up CPU (70-80%) and cannot be stopped or interrupted. A parent emacs process is hung in disk wait. I cannot kill these processes except with "kill -9". I've tried to get a core dump of them, but cannot get one that's very meaningful.

The listings below show the emacs-related processes, first from "ps -auxww" and then from "ps -clxa". "emacs-debug" is a version of emacs built with "-g" and no "-O". It appears that the parent process is the one hung in disk wait and the child is spinning away (maybe in a spin lock?). This setup does not make sense to me. Is it typical of "rmail" in emacs?

USER       PID %CPU %MEM   SZ  RSS TT STAT  TIME COMMAND
rz       18014 79.5  0.7 2172   28 co R    27:49 emacs-debug
rz       18011  0.0  1.4  388   64 p1 I     0:00 /usr/local/lib/emacs/etc/loadst -n 60
rz       18010  0.0  0.0    0    0 co DW    0:00 ???? (emacs-debug)
-----------------------
       F UID   PID  PPID  CP PRI NI ADDR   SZ RSS WCHAN STAT TT  TIME COMMAND
1300c000 442 18010     1   8  -1  0    0    0   0 97e8c DW   co  0:00 emacs-debug
12008201 442 18011 18010   1  15  0  7e3  296  28 fc000 I    p1  0:00 loadst
 2009001 442 18014 18010 195  73  0  592 1600  20       R    co 27:40 emacs-debug

Looking through the mail logs, it is doubtful, but possible, that the user received mail at the same time as he issued the "rmail" command.

We tried to figure out what process 18010 was "disk waiting" on. We figured that it was an NFS file, so we tried to track the Ethernet packets. After looking at some of the packets, it appeared that the machine was issuing NFS RFS_READLINK and RFS_GETATTR calls for /usr, /usr/spool, /usr/spool/mail, and /usr/spool/mail/rz. These seemed to be repeated at regular intervals of a few seconds.

We have seen NFS caching problems between Suns that are sometimes solved by trying to unmount the bad filesystem (even though the umount fails, the cache is cleared). This trick did not change anything here.

If anyone has any insights or has experienced similar problems, please let me know.

Thanks,
Anne Louise Gockel
Cornell Computer Science

Internet: alg@cs.cornell.edu
UUCP: cornell!alg