lbo%ztivax@ZTIVAX.SIEMENS.COM (Lothar Borrmann) (08/09/89)
After getting used to mush on an Ultrix VAX I tried to put it onto an
Apollo DN 4000, running Domain IX SR 9.7. It compiled nicely in the BSD4.2
environment after minimal changes to the makefile.bsd (for the Aegis 'bind'
linker) and config.h (defining VPRINTF).

Now, while most functions seem to work, there is one major problem that
arises when subprocesses are generated:

- On invoking an editor (eg. by ~e), mush will not be resumed after
  termination of the editor subprocess. It simply hangs and has to be killed.
- After invoking some external command, mush will have disappeared and I am
  back in my shell again.

I presume that this is a problem of insufficient BSD compatibility of
the Apollo SR 9.7 system (a known problem). I looked into the mush code but
it is not quite obvious to me what the corresponding mechanism should work
like and where the problem comes up.

Any hints from the mush experts?

_____________________________________________________________________________
Lothar Borrmann              | Email:
Siemens AG                   | EUnet    lbo@ztivax.uucp
Corporate R&D, ZFE F2 SYS 3  |   or     ...!unido!ztivax!lbo
Otto-Hahn-Ring 6             | Internet lbo@ztivax.siemens.com
D-8000 Muenchen 83           | (non-MX: lbo%ztivax@siemens.siemens.com)
West Germany                 |   or     lbo%ztivax.uucp@uunet.uu.net
-----------------------------------------------------------------------------

argv%eureka@Sun.COM (Dan Heller) (08/10/89)
In article <8908091029.AA17481@ztivax.uucp> lbo%ztivax@ZTIVAX.SIEMENS.COM (Lothar Borrmann) writes:
> - On invoking an editor (eg. by ~e), mush will not be resumed after
>   termination of the editor subprocess. It simply hangs and has to be killed.
> - After invoking some external command, mush will have disappeared and I am
>   back in my shell again.
> I presume that this is a problem of insufficient BSD compatibility of
> the Apollo SR 9.7 system (a known problem).

The problem sounds suspiciously like your SIGCHLD is broken. I'd be
willing to bet that your signal() library is really sys-v based rather
than BSD based. What's happening is that your wait() routine is hanging
because the apollo's OS has the same bug that xenix's OS has: if SIGCHLD
is set to a signal catcher, the dead child causes the signal catcher to
get called, the catcher calls wait() to pick up the dead child, and then
the wait() in "pclose()" or in execute.c never returns because it never
gets the terminated process. It also doesn't return -1 as it should.

I don't remember the exact problem with this -- I wrestled for months
trying to get xenix to work properly with respect to wait() and SIGCHLD
to no avail. What you might try doing is editing loop.c where SIGCHLD
is set to either SIG_DFL for sys-v or sigchldcatcher for BSD by using
#ifdef's -- force it to be SIG_DFL and see what happens.

Also, before and after you try my suggestion, turn on debugging using
the "debug" command (no args). Then run ~e and return. You should
get a message that pid XXXXX died. If you do, and Mush is still hung,
force a core dump using ^\ and look at the stack trace using adb:

    % adb mush
    $c
    (output)
    ^D
    %

One of the things in the stack should be a call to wait() or something
similar. If there is, the problem you're having is the one I've
described above.

Note: the last I checked the xenix documentation, the interplay between
wait() and setting the signal handler for SIGCLD does *not* act the
way it is documented. In my opinion, the doc is right, but the
implementation is wrong. The symptoms you described sound very similar
to the xenix problem.

dan <island!argv@sun.com>
-----
My postings reflect my opinion only -- not the opinion of any company.