[comp.sys.amiga] Apollo unkillable processes

anderson@atc.sps.mot.com (howard anderson) (09/18/90)

I really need help.  (Apollo Release 10.2 with patch m0118.)
I am having difficulty killing certain processes.  Sometimes the processes
can be killed easily.  Sometimes they can't.  Some random factor seems to
be at work.  When whatever it is goes wrong, the process cannot be killed.
Neither sigp -q nor sigp -s works.  sigp -b works, but it sometimes
appears to destroy important parts of the operating system, so the node has
to be shut down.  Here is a traceback of one of these unkillable processes:

   $ tb -t b.report
   --------------------
   Task 514 "RPC reply acknowledger" (active) 
   In routine  system service "ec2_$wait"
   Called from "task_$sched" line 2702
   Called from "task_$ec2_wait" line 1411
   Called from "ec2_$wait_svc" line 164
   Called from "sleep" line 53
   Called from "periodic_task" line 1494
   Called from "task_$base_proc" line 874
   --------------------
   Task 0 "distinguished_task" (distinguished task) (ready) 
   In routine  "<UID 00000407.004A006A>" offset 7FFFF55C
   Called from "task_$ec2_wait" line 1411
   Called from "ec2_$wait_svc" line 164
   Called from "ec2_$wait" line 203
   Called from "task_$handle_mark_release" line 1942
   Called from "pm_$proc_release" line 2151
   Called from "pgm_$invoke_uid_pn" line 1160
   Called from "pm_$init" line 834
   --------------------
   Task 257 "reaper_task" (waiting) 
   In routine  "<UID 00000407.007600B2>" offset 7FFFEE48
   Called from "task_$ec2_wait" line 1411
   Called from "ec2_$wait_svc" line 164
   Called from "ec2_$wait" line 203
   Called from "task_$reaper_task" line 2248
   Called from "task_$base_proc" line 874
   
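For readers comparing this with Unix behavior: on a POSIX system the usual escalation is SIGTERM first, then SIGKILL after a grace period. Treating that as the analog of going from sigp -q/-s to sigp -b is an assumption; the sketch below is generic POSIX C, not Domain/OS code. It can only reap its own children (waitpid), and the function names are hypothetical. A process blocked inside a kernel wait may not die even on SIGKILL, which looks a lot like the traceback above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll up to grace_ms for our child `pid` to exit.
   Returns 1 if reaped, 0 if still running, -1 on error
   (for example, pid is not a child of this process). */
static int reap_within(pid_t pid, int grace_ms, int *status) {
    const struct timespec tick = {0, 10L * 1000000L}; /* 10 ms */
    for (int waited = 0; waited <= grace_ms; waited += 10) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid) return 1;
        if (r < 0) return -1;
        nanosleep(&tick, NULL);
    }
    return 0;
}

/* Hypothetical helper: send SIGTERM, wait, then SIGKILL, wait.
   Returns the signal that terminated the child, 0 if it exited
   normally, -1 on error, or -2 if it survived even SIGKILL
   (e.g. stuck in an uninterruptible kernel wait). */
int terminate_with_escalation(pid_t pid, int grace_ms) {
    const int steps[] = {SIGTERM, SIGKILL};
    int status = 0;
    for (size_t i = 0; i < sizeof steps / sizeof steps[0]; i++) {
        if (kill(pid, steps[i]) != 0) return -1;
        int r = reap_within(pid, grace_ms, &status);
        if (r < 0) return -1;
        if (r == 1) return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    }
    return -2;
}
```

A child that ignores SIGTERM is reported as killed by SIGKILL; a child that never gets reaped at all returns -2, which is the unkillable case described above.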

Now it looks to me like these are all Apollo routines and that the
user tasks have all been eliminated.  Apollo response center people
agreed that this was the case.  They said that their system routines
may be waiting for some resource that a third-party vendor didn't release.
Since all user code AND the third-party vendor code have been successfully
blown away at this point, it looks like we will be waiting here a long time.
(The Apollo response center is closing my call.  They told me to contact
the third-party vendor because it is obviously a problem in the third-party
vendor code.) 
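To show why a waiter can hang forever once the party that should signal it is gone, here is a generic eventcount model: a counter that only increases, with waiters blocking until it reaches a trigger value. Assuming ec2_$wait in the traceback behaves like a classic eventcount wait, the stuck tasks are in the equivalent of ec_wait below with nobody left to call ec_advance. This is a pthreads illustration with hypothetical names, NOT the Apollo ec2_$ interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical eventcount: a monotonically increasing counter. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    unsigned long value;
} eventcount;

void ec_init(eventcount *ec) {
    pthread_mutex_init(&ec->lock, NULL);
    pthread_cond_init(&ec->changed, NULL);
    ec->value = 0;
}

/* Advance the count and wake every waiter so each re-checks its trigger. */
void ec_advance(eventcount *ec) {
    pthread_mutex_lock(&ec->lock);
    ec->value++;
    pthread_cond_broadcast(&ec->changed);
    pthread_mutex_unlock(&ec->lock);
}

/* Block until the count reaches `trigger`.  If the only task that would
   call ec_advance has been killed, this never returns. */
void ec_wait(eventcount *ec, unsigned long trigger) {
    pthread_mutex_lock(&ec->lock);
    while (ec->value < trigger)
        pthread_cond_wait(&ec->changed, &ec->lock);
    pthread_mutex_unlock(&ec->lock);
}

/* Same wait with a deadline: returns 0 once the trigger is reached,
   or ETIMEDOUT if it is not reached within timeout_ms. */
int ec_timed_wait(eventcount *ec, unsigned long trigger, long timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = 0;
    pthread_mutex_lock(&ec->lock);
    while (ec->value < trigger && rc == 0)
        rc = pthread_cond_timedwait(&ec->changed, &ec->lock, &deadline);
    if (ec->value >= trigger) rc = 0; /* reached just as the timer expired */
    pthread_mutex_unlock(&ec->lock);
    return rc;
}
```

The timed variant shows the design choice behind question 1: a wait with a deadline lets the waiter notice that the event is never coming, while an untimed wait depends entirely on the other side staying alive. "Faking out" a waiter, as asked in question 3, would correspond to some other task calling ec_advance.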

Questions I have for you are these:

  1.  This situation runs counter to my philosophy regarding the
role of an operating system.  The user task has been eliminated
by the operating system.  So now we wait forever for an event 
that cannot happen?  I would not have expected the operating 
system to lose control in a case such as this.  Are my expectations
too high??

  2.  Has anyone else seen processes that cannot be killed with
a sigp -s??  Perhaps I am the only Apollo user with this problem.

  3.  Does anyone know a way to fake out the ec2_wait tasks and make
them think they got what they are looking for?  How much damage do you
think would result if one could do this?

  4.  Does anyone know what blasting processes such as these actually
does to the operating system??  The server_process_manager sometimes
exits.  Is this a possible effect of blasting a process such as the
above??

  5.  When processes such as the above that are using sio lines are blasted,
the sio lines are left "locked".  They cannot be unlocked since they are
not really "locked objects".  I found that copying /dev/sio1 to /dev/siox,
deleting /dev/sio1, and then renaming /dev/siox to /dev/sio1 restores
/dev/sio1 to service.

  6.  If you are using DANFORD serial lines as well, they become "locked"
in a similar manner.  Copying them and changing their names does not work.
The ssiomonitor must be killed and restarted.  This means that all consoles
served by the ssiomonitor must be shut down and restarted in order to restart
one line!    

  7.  The group id of a forked child process is sometimes set to zero.
This seems to occur randomly about 20 percent of the time.
When the parent is killed, the child is not killed.  Has anyone
else seen this problem?  (This seems to be unrelated to the unkillable
process problem but perhaps in some way it IS related?)  
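On question 7, assuming "group id" means the process group ID: POSIX fork() requires the child to inherit the parent's process group, and a signal sent to a group (kill() with a negative pid) reaches only that group's members. A child that lands in some other group would therefore survive a group-wide kill of the parent. Killing a parent does not, by itself, kill its children on POSIX systems anyway. The sketch below is a hypothetical diagnostic for checking inheritance outside the application; it is generic POSIX C, not an Apollo tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: fork one child and report whether it came up in
   the parent's process group.  Returns 1 if yes, 0 if no, -1 on error. */
int child_inherits_pgid(void) {
    pid_t parent_pgid = getpgrp();
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0)
        _exit(getpgrp() == parent_pgid ? 0 : 1); /* child reports via exit status */
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Repeat the check `trials` times; returns the number of children that
   did NOT inherit the parent's process group, or -1 on error. */
int count_pgid_mismatches(int trials) {
    int mismatches = 0;
    for (int i = 0; i < trials; i++) {
        int r = child_inherits_pgid();
        if (r < 0) return -1;
        if (r == 0) mismatches++;
    }
    return mismatches;
}
```

On a conforming system count_pgid_mismatches(100) should return 0; a result near 20 on the affected node would suggest the problem lies below the application.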

PLEASE HELP!