hanche@imf.unit.no (Harald Hanche-Olsen) (03/07/91)
In article <5021316a.1bc5b@pisa.citi.umich.edu> rees@pisa.citi.umich.edu (Jim Rees) writes:

   In article <HANCHE.91Feb28201637@hufsa.imf.unit.no>, hanche@imf.unit.no (Harald Hanche-Olsen) writes:

      Status 03010002: process had a fatal error (process manager/process manager)
      In routine "/sys/node_data.a0b0/systmp/global_readonly" offset 363A
      Called from "pgm_$exec_uid_pn" line 1450
      Called from "pgm_$exec_xoid_pn" line 1287

   This happens when an exec fails after the process is unrecoverably
   committed to running the new program. Since it can't get back to the
   original program at this point, the process just exits. This "shouldn't
   happen" (be glad it's not a kernel panic).

You're right it shouldn't happen, at least not when the exec'd file really does exist and is eminently executable. Why it does happen is one of those questions we would maybe rather not know the answer to...

I am apparently not the only one seeing this problem. I also had mail from ericb@caen.engin.umich.edu (Eric Bratton) who has seen /bin/sh do the same thing, and "can sometimes reproduce the bug in /bin/sh by doing a very large /com/cpt command across the network".

Now I have learned a really good way to provoke this bug, and that is by running xdm and Xapollo from the MIT X11R4 distribution. And it is not only exec() that fails; fork() can also fail, though not in quite the same destructive manner. Here is the story:

THE FAILURE OF fork()

In a separate posting I have told about our troubles with the R4 Xapollo server and xdm, wherein Xapollo fails in the initialization, forcing xdm to start a new one. Well, that works fine much of the time, but once in a while the fork() call fails with `no more processes' which, of course, should not be so when the total number of processes is below twenty. It happens anyway.

Now, I did a clever (I thought) workaround: First, I modified xdm to just fork() again if it failed, but that never succeeded.
Sooo, I figured, somehow that process's tables are all screwed up. Other processes can fork(), I mean, I can run a shell and run all kinds of commands in it, right? So I modified xdm to exit with a special exit code if it could not fork(), and wrote a tiny parent process (called it xdmfix) which, when the child exits with that status code, just starts up a new one. Works fine: xdmfix itself never reported difficulties with fork(), but it often reports having to restart xdm, which it does just fine. But...

THE FAILURE OF exec()

Yes, this is where I first saw exec() failing. My clever xdmfix process sometimes cannot exec(), and hence dies an unnatural death. Here is the top end of a typical traceback:

   Program /bsd4.3/usr/bin/X11-R4/xdmfix
   Status 03010002: process had a fatal error (process manager/process manager)
   In routine "<UID 4FD535A6.4001996A>" offset 363A
   Called from "pgm_$exec_uid_pn" line 1450
   Called from "pgm_$exec_xoid_pn" line 1287
   Called from "execve" line 224
   Called from "execv" line 146

At the same time, I found it impossible to log in on the node via tcp/ip. I got a connection, but nothing happened. After rebooting the hard way, I did a traceback and found that inetd had crashed just like xdmfix in the traceback above.

I didn't report this problem here, however, until the same problem (with inetd) showed up on a node running vanilla HP supplied software, showing that this is likely to affect other users too. Why the X11R4 server and xdm provoke this bug (bugs?) so consistently, while most nodes are unaffected most of the time, I don't know.

As always, I am happy for any suggestions, though I am not too optimistic at the moment. If nothing else, maybe here is more fuel for the current flames directed at Domain/OS...

- Harald Hanche-Olsen <hanche@imf.unit.no>
  Division of Mathematical Sciences
  The Norwegian Institute of Technology
  N-7034 Trondheim, NORWAY
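
For readers who want to try the same workaround, the xdmfix idea described above (a tiny parent that restarts its child whenever the child exits with a special status) might look roughly like the C sketch below. The actual xdmfix source is not part of this post, so everything here is an assumption made for illustration: the function name supervise(), the exit code RESTART_STATUS (75 is an arbitrary choice) and the max_restarts limit are all hypothetical. Note also that the sketch assumes a failing execv() returns -1 as on an ordinary Unix; it does nothing about the case reported above, where the failing exec kills the process outright.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit code meaning "I could not fork(); please restart me". */
#define RESTART_STATUS 75

/*
 * Run argv[0] as a child process and restart it whenever it exits with
 * RESTART_STATUS, at most max_restarts times.  Returns the child's final
 * exit status, or -1 if the supervisor itself could not fork or wait, or
 * if the child was killed by a signal.
 */
int supervise(char *const argv[], int max_restarts)
{
    int restarts = 0;

    assert(argv != NULL && argv[0] != NULL);

    for (;;) {
        pid_t pid = fork();
        int status;

        if (pid < 0) {              /* the supervisor itself cannot fork */
            perror("fork");
            return -1;
        }
        if (pid == 0) {             /* child: become the real program */
            execv(argv[0], argv);
            perror("execv");        /* reached only if execv() returned */
            _exit(127);
        }
        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid");
            return -1;
        }
        if (!WIFEXITED(status))     /* child died from a signal */
            return -1;
        if (WEXITSTATUS(status) != RESTART_STATUS || restarts >= max_restarts)
            return WEXITSTATUS(status);

        restarts++;
        fprintf(stderr, "restarting %s (restart %d)\n", argv[0], restarts);
    }
}
```

A main() would simply pass the program's argument vector on, for example supervise(argv + 1, 10), with the supervised program modified to exit with RESTART_STATUS when its own fork() fails. The restart limit is there only so that a child which can never fork does not make the wrapper spin forever.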