ken@gvax.cs.cornell.edu (Ken Birman) (08/19/89)
I decided to fix ISIS on the Apollo UNIX system, and started hacking... now I'm at a bit of a dead end and need some help from a hacker who knows their lightweight processes facility moderately well.

The situation is this. First, ISIS V1.2 has a minor assumption that Apollo turns out to violate, namely that the id of a thread is always non-null. Turns out that on Apollo, the id of the system stack actually is null, and this confused clib. (A fix is to initialize isis_main_task.threadid to something very implausible, like 1, in isis_tinit().) Testtasks works fine at this point, so task switching seems to be solid.

However, I am seeing three problems:

1) If I call vfork() early in execution, before even calling my task init routine, I get a message from the Apollo UNIX that it "can't unwind my stack because it is corrupted". I switched this call to a fork and got past it.

2) If I call fork after initializing my task package, e.g. acquiring a few locks and forking some tasks, the child process hangs immediately on startup, even if I unlock all mutex variables before forking. If I call vfork instead, the child starts running but the parent hangs. Then the child hangs too, when it calls execve.

3) Finally, when I kill isis off on the Apollo, the UDP sockets I created usually don't go away, ever. That is, with no processes connected to them and no data on them, the sockets linger anyhow, for hours at a minimum.

Oddly, if I hand-start ISIS to avoid the fork and execs, it runs fine. I ran several demos and they are all robustly healthy. The only problems are that ISIS is unable to fork off new heavyweight processes or exec new binaries, and that the UDP sockets hang around.

Help! Obviously Apollo UNIX release 10.1 is a bit "sensitive"! What can be done about this? If you know, I need your advice.

Notes: ISIS implements condition variables as mutex variables on the Apollo.
I lock them by calling mutex_$lock() and then pfm_$enable_faults() immediately after, and later unlock them by calling mutex_$unlock(). Basically, everything else I do is very simple.

When running, a task always holds locks on a mutex called isis_mutex and on another called isis_ctp->task_runme. All other tasks are always blocked, either trying to lock isis_mutex or (more often) trying to lock tp->task_runme in some other task descriptive structure. At init time for each task we "pre-lock" task_runme, so that a task blocks by relocking it, and is unblocked by someone else unlocking it. As I mentioned, this definitely works correctly because all of ISIS and also the test programs run.

The pfm_$enable() business is a work-around for a known Apollo UNIX bug, namely that all signals and faults are disabled and left disabled by the mutex_$lock routine. (If you don't do this call after each and every mutex_$lock, your process ignores all signals, including unstoppable kill signals, and hangs around until the next system reboot. There is nothing you can do about it; even if the process tries to exit by calling exit(), it remains active and is listed as sleeping.) [what a great system!]

Thus, all my problems are with the UNIX fork, vfork, and exec system calls hanging. Has anyone seen this? Do you have a work-around?

Ken

PS: respond directly to me
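
For readers without an Apollo at hand, the "pre-lock" blocking scheme from the notes can be sketched roughly as below. This is not the ISIS source: it uses POSIX threads as a stand-in for the Apollo mutex_$ calls, and the names handoff_lock, hl_lock, hl_unlock, switch_to, worker_main and run_demo are hypothetical. Only isis_mutex and task_runme come from the post. Because a POSIX mutex should not be unlocked by a thread that does not own it, the sketch builds a small lock that may be released by a different thread than the one that took it, which is the property the scheme relies on.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an Apollo mutex: a binary lock that one
 * thread may lock and a different thread may unlock. */
typedef struct {
    pthread_mutex_t guard;
    pthread_cond_t  freed;
    int             locked;
} handoff_lock;

static void hl_init(handoff_lock *l, int locked) {
    pthread_mutex_init(&l->guard, NULL);
    pthread_cond_init(&l->freed, NULL);
    l->locked = locked;              /* locked != 0 means "pre-locked" */
}

static void hl_lock(handoff_lock *l) {
    pthread_mutex_lock(&l->guard);
    while (l->locked)                /* wait until someone unlocks it */
        pthread_cond_wait(&l->freed, &l->guard);
    l->locked = 1;
    pthread_mutex_unlock(&l->guard);
}

static void hl_unlock(handoff_lock *l) {
    pthread_mutex_lock(&l->guard);
    l->locked = 0;
    pthread_cond_signal(&l->freed);
    pthread_mutex_unlock(&l->guard);
}

typedef struct {
    handoff_lock task_runme;         /* pre-locked when the task is created */
    int          steps;              /* how many times the task has run */
} task;

static handoff_lock isis_mutex;      /* held by whichever task is running */
static task main_task, worker_task;

/* Hand control to `next` and block until someone hands it back. */
static void switch_to(task *self, task *next) {
    hl_unlock(&next->task_runme);    /* unblock the other task */
    hl_unlock(&isis_mutex);          /* give up the "running" lock */
    hl_lock(&self->task_runme);      /* block by relocking our own runme */
    hl_lock(&isis_mutex);            /* we are the running task again */
}

static void *worker_main(void *arg) {
    (void)arg;
    hl_lock(&worker_task.task_runme);     /* pre-locked: blocks until scheduled */
    hl_lock(&isis_mutex);
    worker_task.steps++;                  /* first run */
    switch_to(&worker_task, &main_task);
    worker_task.steps++;                  /* second run */
    hl_unlock(&main_task.task_runme);     /* final handoff before exiting */
    hl_unlock(&isis_mutex);
    return NULL;
}

/* Runs two handoffs to the worker and returns how often it ran. */
int run_demo(void) {
    pthread_t tid;
    hl_init(&isis_mutex, 1);              /* the main task starts out running */
    hl_init(&main_task.task_runme, 1);
    hl_init(&worker_task.task_runme, 1);
    main_task.steps = 0;
    worker_task.steps = 0;
    if (pthread_create(&tid, NULL, worker_main, NULL) != 0)
        return -1;
    switch_to(&main_task, &worker_task);  /* worker runs once, hands back */
    assert(worker_task.steps == 1);
    switch_to(&main_task, &worker_task);  /* worker runs again, then exits */
    pthread_join(tid, NULL);
    return worker_task.steps;
}
```

Calling run_demo() from a main() and compiling with -pthread should return 2, one count per handoff to the worker. The sketch says nothing about how such locks behave across fork or vfork, which is the open question above.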