[comp.mail.mush] Mush on Apollos

lbo%ztivax@ZTIVAX.SIEMENS.COM (Lothar Borrmann) (08/09/89)

After getting used to mush on an Ultrix VAX I tried to put it
onto an Apollo DN 4000, running Domain IX SR 9.7.
It compiled nicely in the BSD4.2 environment after minimal changes to
the makefile.bsd (for the Aegis 'bind' linker) and config.h (defining
VPRINTF).
Now, while most functions seem to work, there is one major problem that
arises when subprocesses are generated:
- On invoking an editor (e.g. by ~e), mush will not be resumed after
  the editor subprocess terminates. It simply hangs and has to be killed.
- After invoking some external command, mush will have disappeared and I am
  back in my shell again.
I presume that this is a problem of insufficient BSD compatibility in
the Apollo SR 9.7 system (a known problem). I looked into the mush code,
but it is not obvious to me how the corresponding mechanism is supposed
to work or where the problem comes up.
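For reference, the general mechanism behind ~e and similar escapes in BSD-style programs is: fork(), exec the command in the child, and have the parent wait for that child before it resumes. The following is a minimal sketch, assuming POSIX waitpid() and execlp(); run_editor() is a hypothetical name, not mush's code, and a real mail reader would also have to manage terminal modes and signal handling, which this omits.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from mush): run "editor file" in a child
 * process and return the child's exit status, or -1 on failure. */
int run_editor(const char *editor, const char *file)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0) {
        /* Child: replace this process image with the editor. */
        execlp(editor, editor, file, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }

    /* Parent: block until this particular child has exited.
     * Retry if a signal interrupts the wait; any other error
     * (e.g. ECHILD: someone else already reaped it) is fatal. */
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_editor("vi", "/tmp/draft") blocks until vi exits and then returns its exit status. If the wait loop fails with errno set to ECHILD, some other code, such as a SIGCHLD catcher, has already collected the child.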

Any hints from the mush experts?

_____________________________________________________________________________
Lothar Borrmann                | Email:
Siemens AG                     |    EUnet     lbo@ztivax.uucp
Corporate R&D, ZFE F2 SYS 3    |     or       ...!unido!ztivax!lbo
Otto-Hahn-Ring 6               |    Internet  lbo@ztivax.siemens.com
D-8000 Muenchen 83             |     (non-MX: lbo%ztivax@siemens.siemens.com)
West Germany                   |     or       lbo%ztivax.uucp@uunet.uu.net
-----------------------------------------------------------------------------

-- 

argv%eureka@Sun.COM (Dan Heller) (08/10/89)

In article <8908091029.AA17481@ztivax.uucp> lbo%ztivax@ZTIVAX.SIEMENS.COM (Lothar Borrmann) writes:
> - On invoking an editor (eg. by ~e), mush will not be resumed after 
>   termination of the editor subprocess. It simply hangs and has to be killed.
> - After invoking some external command, mush will have disappeared and I am
>   back in my shell again.
> I presume that this is a problem of insufficient BSD compatibility of
> the Apollo SR 9.7 system (a known problem).

The problem sounds suspiciously like your SIGCHLD is broken.  I'd be
willing to bet that your signal() library is really System V-based rather
than BSD-based.  What's happening is that your wait() routine is hanging
because the Apollo's OS has the same bug as Xenix: if SIGCHLD is set to a
signal catcher, the dead child causes the catcher to be called, the catcher
calls wait() to pick up the dead child, and then the wait() in pclose() or
in execute.c never returns, because the terminated process has already
been picked up by the catcher.  It also doesn't return -1 (with errno set
to ECHILD) as it should.
I don't remember the exact details; I wrestled for months, to no avail,
trying to get Xenix to handle wait() and SIGCHLD properly.  What you might
try is editing loop.c, where #ifdefs set SIGCHLD to SIG_DFL for System V
or to sigchldcatcher for BSD.  Force it to SIG_DFL and see what happens.

Also, before and after you try my suggestion, turn on debugging with
the "debug" command (no arguments).  Then run ~e and return.  You should
get a message that pid XXXXX died.  If you do and Mush is still hung,
force a core dump with ^\ and look at the stack trace with adb:
% adb mush
$c
(output)
^D
%
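The "pid XXXXX died" message above comes from Mush's own debug output. When reproducing the problem in a stand-alone test program, a similar trace is useful. Because printf() is not safe to call from a signal handler, this hypothetical catcher (format_pid_died() and debug_chld_catcher() are illustrative names, not mush code) formats the message by hand and emits it with write(2). It assumes pids are non-negative.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Format "pid N died\n" into buf without stdio, so it can be called
 * from a signal handler.  buf must hold at least 32 bytes and pid must
 * be non-negative.  Returns the byte count (no terminating NUL). */
size_t format_pid_died(char *buf, long pid)
{
    const char *prefix = "pid ", *suffix = " died\n";
    char digits[24];
    size_t n = 0, len = 0;

    do {                                /* collect digits in reverse */
        digits[n++] = (char)('0' + pid % 10);
        pid /= 10;
    } while (pid > 0);

    while (*prefix)
        buf[len++] = *prefix++;
    while (n > 0)
        buf[len++] = digits[--n];
    while (*suffix)
        buf[len++] = *suffix++;
    return len;
}

/* Hypothetical debugging catcher: reap every dead child and report
 * each pid on stderr using write(2), which is async-signal-safe. */
void debug_chld_catcher(int sig)
{
    int saved_errno = errno;
    char buf[32];
    pid_t pid;

    (void)sig;
    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0) {
        size_t len = format_pid_died(buf, (long)pid);
        ssize_t ignored = write(STDERR_FILENO, buf, len);
        (void)ignored;
    }
    errno = saved_errno;
}
```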

One of the frames in the stack trace should be a call to wait() or
something similar.  If it is, the problem you're having is the one I've
described above.

Note: the last time I checked the Xenix documentation, the interplay
between wait() and setting the signal handler for SIGCLD did *not* behave
as documented.  In my opinion, the documentation is right and the
implementation is wrong.  The symptoms you described sound very similar
to the Xenix problem.

dan <island!argv@sun.com>
-----
My postings reflect my opinion only -- not the opinion of any company.