[comp.sys.isis] More on use of kill

ken@cs.cornell.edu (Ken Birman) (06/25/91)

A few weeks ago I suggested a simple loop for detecting process failures
using the kill() system call, as a hack for a situation where SUN OS might
fail to report a broken pipe.

Some new insight on this:

1) The broken pipe business is much less common than I appreciated.
   If you see ISIS systematically fail to detect process termination
   in some situation, perhaps the process isn't really exiting, or perhaps
   someone forked a child and didn't close isis_socket, leaving a dup
   around that fools bin/protos into not noticing when the child 
   exits.  isis_disconnect() closes both isis_socket and intercl_socket,
   so it should be called in the child process after a fork/vfork.

   I tried to write a test program to demonstrate this bug and actually
   did see it perhaps once in a hundred runs, but it clearly depends on
   something uncommon happening just when the pipe breaks -- paging
   activity, perhaps.  Usually, SUN OS detects the condition perfectly.

2) The kill() solution works quite well, but a program probed this way
   takes a debugger trap every few seconds, so you cannot run dbx/gdb
   on the active program.  That is a strong disadvantage in my opinion.

My plan is to make this a compile-time option to protos, disabled by
default.
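The leftover-dup problem from item 1 can be reproduced with a minimal sketch, assuming a POSIX system with socketpair(), fork() and poll(). A UNIX-domain socket pair stands in for a client's connection to bin/protos: sv[1] plays the role of isis_socket and sv[0] the protos end. The helper monitor_sees_eof() and all descriptor names are hypothetical, not ISIS code. One process (the parent here) closes its copy of sv[1] as if it had exited; the monitor end sees EOF only once every copy of that descriptor is closed, which is why the forked child must drop its inherited copy, the job isis_disconnect() does for the real sockets.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo of the leftover-dup problem.  sv[0] plays the
 * monitor's (bin/protos) end of the connection; sv[1] stands in for
 * isis_socket.  The parent forks, then closes its own copy of sv[1] as
 * if it had exited.  Returns 1 if the monitor end sees EOF within
 * timeout_ms, 0 if it does not, and -1 if setup fails.
 */
static int monitor_sees_eof(int child_closes_dup, int timeout_ms)
{
    int sv[2], release[2];
    char c;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(release) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        close(release[0]);
        close(release[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: it never talks to the monitor itself. */
        close(sv[0]);
        if (child_closes_dup)
            close(sv[1]);          /* the step isis_disconnect() performs */
        close(release[1]);
        /* Block until the parent closes its end of the release pipe. */
        ssize_t ignored = read(release[0], &c, 1);
        (void)ignored;
        _exit(0);
    }

    /* Parent: drop the client socket, as a dying client would. */
    close(release[0]);
    close(sv[1]);

    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    int saw_eof = 0;
    if (poll(&pfd, 1, timeout_ms) > 0 && read(sv[0], &c, 1) == 0)
        saw_eof = 1;               /* every copy of sv[1] is now closed */

    close(release[1]);             /* let the child exit */
    waitpid(pid, NULL, 0);
    close(sv[0]);
    return saw_eof;
}
```

With the dup closed in the child, monitor_sees_eof(1, 1000) should return 1; if the child keeps its copy, monitor_sees_eof(0, 200) should time out and return 0, even though the parent has closed its end.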

Ken
-- 
Kenneth P. Birman                              E-mail:  ken@cs.cornell.edu
4105 Upson Hall, Dept. of Computer Science     TEL:     607 255-9199 (office)
Cornell University Ithaca, NY 14853 (USA)      FAX:     607 255-4428

chip@tct.com (Chip Salzenberg) (06/27/91)

According to ken@cs.cornell.edu (Ken Birman):
>2) the kill() solution works quite well, but you can't use the debugger
>   on the programs being probed this way, since you get a debugger trap
>   every few seconds.  So, if you use this, you will not be able to use
>   dbx/gdb on the active program, a strong disadvantage in my opinion.

First, at least on System V, you can kill with the pseudo-signal zero.
This feature provides a "process exists" test.
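In C, the signal-0 probe might look like the following minimal sketch, assuming a POSIX-style kill(); the helper name process_exists() is hypothetical. kill(pid, 0) performs the usual existence and permission checks without delivering a signal, so ESRCH means the process is gone, while EPERM means it exists but belongs to another user. Values of pid at or below zero are rejected because kill() treats them as process-group or broadcast targets rather than a single process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper: probe a single process with signal 0.
 * Returns 1 if the process exists, 0 if it does not, and -1 on any
 * other error (for example EINVAL if signal 0 were not supported).
 */
static int process_exists(pid_t pid)
{
    if (pid <= 0) {
        errno = EINVAL;     /* 0 and negative pids address process groups */
        return -1;
    }
    if (kill(pid, 0) == 0)
        return 1;           /* exists, and we are allowed to signal it */
    if (errno == EPERM)
        return 1;           /* exists, but is owned by someone else */
    if (errno == ESRCH)
        return 0;           /* no such process */
    return -1;
}
```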

Second, gdb allows you to specify that specific signals should not
cause a debugger trap.  See the gdb "handle" command.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.com>, <uunet!pdn!tct!chip>
 "I want to mention that my opinions whether real or not are MY opinions."
             -- the inevitable William "Billy" Steinmetz

ken@CS.Cornell.EDU (Ken Birman) (06/28/91)

In article <2869F793.4F50@tct.com> chip@tct.com (Chip Salzenberg) writes:
>... on System V, you can kill with the pseudo-signal zero.
>This feature provides a "process exists" test.

This is really interesting!  I would appreciate it if ISIS users could
try the following and let me know if they get the "wrong" result:
	% kill -0 12345   (substitute any non-existent process id)

On SUN OS I get "Non existent process id", which is just what I would
hope for.  If anyone gets "Bad signal number", let me know what type of
UNIX system you are on.

I feel a bit uncomfortable relying on an undocumented UNIX feature,
but there are definite advantages to having a fall-back way to detect
failure.
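A fall-back failure detector built on signal 0 might be sketched as below. This is a minimal sketch under stated assumptions, not the protos implementation: struct watched, sweep_watched(), watch_loop() and the polling interval are all hypothetical, and the probe is meant to run alongside the normal broken-pipe detection rather than replace it.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical table entry for one watched client process. */
struct watched {
    pid_t pid;
    int   alive;   /* 1 until the probe notices the process is gone */
};

typedef void (*failure_cb)(pid_t pid);

/*
 * One sweep over the table.  Signal 0 runs kill()'s existence and
 * permission checks without delivering anything; per the thread, a
 * debugger on SunOS sees no trap from it.  Returns the number of
 * failures newly detected in this sweep.
 */
static size_t sweep_watched(struct watched *tab, size_t n, failure_cb on_failure)
{
    size_t failed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!tab[i].alive || tab[i].pid <= 0)
            continue;   /* already reported, or not a single-process pid */
        if (kill(tab[i].pid, 0) < 0 && errno == ESRCH) {
            tab[i].alive = 0;
            failed++;
            if (on_failure != NULL)
                on_failure(tab[i].pid);
        }
    }
    return failed;
}

/* Run `rounds` sweeps, sleeping interval_s seconds between them.
   Returns the total number of failures detected. */
static size_t watch_loop(struct watched *tab, size_t n, unsigned interval_s,
                         unsigned rounds, failure_cb on_failure)
{
    size_t total = 0;
    for (unsigned r = 0; r < rounds; r++) {
        if (r > 0)
            sleep(interval_s);
        total += sweep_watched(tab, n, on_failure);
    }
    return total;
}
```

One caveat, at least on Linux: kill() still succeeds for a process that has exited but not yet been reaped (a zombie), so a probe like this reports the failure only once the dead process's parent has collected its exit status.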

I checked, and on SUN OS at least, the debugger never sees an
interrupt when you use signal 0.  In contrast, SIGIO definitely
causes debugger activity, and even if there is a way to tell gdb
to ignore it, most ISIS users would find that pretty annoying.

So, thanks for a great suggestion!

(Also, thanks to Max Heffler and Compaq for contributing their 386
version of ISIS.  I know this was a lot of work for them, and I
hope that ISIS users will find it valuable.  The same version of
ISIS should be fairly easy to port to SCO UNIX, for those who
prefer SCO to straight System V.)
