[comp.sys.isis] Apollo UNIX

ken@gvax.cs.cornell.edu (Ken Birman) (08/19/89)
I decided to fix ISIS on the Apollo UNIX system, and
started hacking... now I'm at a bit of a dead end and
need some help from a hacker who knows Apollo's lightweight
process facility moderately well.

The situation is this.  First, ISIS V1.2 has a minor
assumption that Apollo turns out to violate, namely
that a thread id is never null.  It turns out that on
Apollo the id of the system stack actually is null,
and this confused clib.  (A fix is to initialize
isis_main_task.threadid in isis_tinit() to some very
implausible value, such as 1.)
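
For anyone applying the thread-id fix, here is a minimal sketch of the
idea in C. Only threadid, isis_main_task and the role of isis_tinit()
come from ISIS; the struct layout, thread_id_t, task_has_thread(),
isis_tinit_sketch() and its native_id parameter are hypothetical
stand-ins, not the real V1.2 declarations. Substituting the sentinel
only when the native id is null is also an assumption.

```c
/* Hypothetical model of the thread-id fix; not the real ISIS code. */
typedef long thread_id_t;

#define NULL_THREAD_ID     ((thread_id_t)0)
#define MAIN_TASK_SENTINEL ((thread_id_t)1)   /* "very implausible" id */

struct task {
    thread_id_t threadid;
};

struct task isis_main_task;   /* zero-initialized: id starts out null */

/* clib-style check: a null id is read as "no thread here". */
int task_has_thread(const struct task *tp)
{
    return tp->threadid != NULL_THREAD_ID;
}

/* Sketch of the isis_tinit() change.  native_id stands for whatever the
   thread package reports for the current stack; on Apollo the system
   stack reports null, so the sentinel is stored instead. */
void isis_tinit_sketch(thread_id_t native_id)
{
    isis_main_task.threadid =
        (native_id == NULL_THREAD_ID) ? MAIN_TASK_SENTINEL : native_id;
}
```

With this in place, the main task always looks like a live thread to
clib, even on a system whose initial stack has a null id.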

Testtasks works fine at this point, so task switching
seems to be solid.

However, I am seeing three problems:

1) If I call vfork() early in execution, before even
calling my task init routine, I get a message from Apollo
UNIX that it "can't unwind my stack because it is corrupted".

I switched this call to a fork and got past it.

2) If I call fork after initializing my task package,
e.g. acquiring a few locks and forking some tasks,
the child process hangs immediately on startup.  This
happens even if I unlock all mutex variables before forking.

If I call vfork instead, the child starts running but
the parent hangs.  Then the child hangs too, when it
calls execve.

3) Finally, when I kill ISIS off on the Apollo, the
UDP sockets I created usually don't go away, ever.
That is, with no processes connected to them and no
data on them, the sockets linger anyhow, for hours
at a minimum.

Oddly, if I start ISIS by hand to avoid the fork and
exec calls, it runs fine.  I ran several demos and they
are all robustly healthy.  The only problems are that
ISIS is unable to fork off new heavyweight processes
or exec new binaries, and that the UDP sockets hang
around.

Help!  Obviously Apollo UNIX release 10.1 is a bit "sensitive"!
What can be done about this?  If you know, I need your advice.

Notes: ISIS implements condition variables as mutex variables
on the Apollo.  I lock them by calling mutex_$lock() and
then pfm_$enable_faults() immediately after, and later
unlock them by calling mutex_$unlock().

Basically, everything else I do is very simple.  When
running, a task always holds locks on a mutex called isis_mutex
and on another called isis_ctp->task_runme.  All other
tasks are always blocked, either trying to lock isis_mutex
or (more often) trying to lock tp->task_runme in some
other task descriptor structure.  At init time for each
task we "pre-lock" task_runme, so that a task blocks
by relocking it, and is unblocked by someone else unlocking
it.  As I mentioned, this definitely works correctly,
since all of ISIS and the test programs run.
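
The pre-lock hand-off above can be sketched with POSIX threads. This is
a swapped-in technique, not the Apollo code: POSIX mutexes may not be
unlocked by a thread that does not own them (an error or undefined
behavior, depending on the mutex type), so each task_runme is modeled
as a binary semaphore created with count 0, i.e. "pre-locked".
isis_mutex is not modeled, and switch_to(), run_baton_demo() and the
trace array are hypothetical names for illustration.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical task descriptor: only the hand-off field is modeled. */
struct task {
    sem_t runme;          /* stands in for task_runme; starts at 0 */
};

static struct task main_task, worker_task;
static int trace[8];      /* records which task ran, in order */
static int ntrace;

/* Unblock `next` (someone else "unlocking" its runme), then block by
   "relocking" our own runme until control is handed back. */
static void switch_to(struct task *self, struct task *next)
{
    sem_post(&next->runme);
    sem_wait(&self->runme);
}

static void *worker_body(void *arg)
{
    (void)arg;
    sem_wait(&worker_task.runme);          /* wait for first hand-off */
    for (int i = 0; i < 3; i++) {
        trace[ntrace++] = 2;
        if (i < 2)
            switch_to(&worker_task, &main_task);
        else
            sem_post(&main_task.runme);    /* last turn: wake main, exit */
    }
    return NULL;
}

/* Runs the main task and one worker in strict alternation and returns
   the number of trace entries recorded. */
int run_baton_demo(void)
{
    pthread_t tid;

    ntrace = 0;
    sem_init(&main_task.runme, 0, 0);      /* both tasks start pre-locked */
    sem_init(&worker_task.runme, 0, 0);
    pthread_create(&tid, NULL, worker_body, NULL);

    for (int i = 0; i < 3; i++) {
        trace[ntrace++] = 1;
        switch_to(&main_task, &worker_task);
    }
    pthread_join(tid, NULL);
    sem_destroy(&main_task.runme);
    sem_destroy(&worker_task.runme);
    return ntrace;
}
```

Because exactly one task is ever runnable, the trace strictly alternates
between the main task (1) and the worker (2).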

The pfm_$enable_faults() business is a work-around for a known
Apollo UNIX bug: mutex_$lock disables all signals and faults
and leaves them disabled.  (If you don't make this call after
each and every mutex_$lock, your process ignores all signals,
including the unstoppable kill signal, and hangs around until
the next system reboot.  Nothing can be done about it; even if
the process tries to exit by calling exit(), it remains active
and is listed as sleeping.)
[What a great system!]
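
One way to make sure the re-enable never gets forgotten is to route
every lock through a single wrapper.  The sketch below assumes exactly
that; the platform_* functions are hypothetical counting stubs standing
in for mutex_$lock(), mutex_$unlock() and pfm_$enable_faults(), whose
real signatures are not reproduced here, and task_lock()/task_unlock()
are invented names.

```c
/* Hypothetical stubs: they only count calls so the pairing of each
   lock with a fault re-enable can be checked. */
static int lock_calls, unlock_calls, enable_calls;

static void platform_lock(void *m)       { (void)m; lock_calls++; }
static void platform_unlock(void *m)     { (void)m; unlock_calls++; }
static void platform_enable_faults(void) { enable_calls++; }

/* Every lock site goes through this wrapper, so the re-enable always
   follows the lock and cannot be left out at any call site. */
void task_lock(void *m)
{
    platform_lock(m);
    platform_enable_faults();   /* undo the faults-disabled side effect */
}

void task_unlock(void *m)
{
    platform_unlock(m);
}
```

The design choice is simply to keep the bug's work-around in one place
rather than at every mutex_$lock call.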

Thus, all my problems are with the UNIX fork, vfork, and exec
system calls hanging.  Has anyone seen this?  Do you have
a work-around?

Ken

PS: respond directly to me