ken@cs.cornell.edu (Ken Birman) (06/25/91)
A few weeks ago I suggested a simple loop for detecting process failures using the kill() system call, as a hack for a situation where SUN OS might fail to report a broken pipe. Some new insight on this:

1) The broken pipe business is much less common than I appreciated. If you see ISIS systematically fail to detect process termination in some situation, perhaps the process isn't really exiting, or perhaps someone forked a child and didn't close isis_socket, leaving a dup around that fools bin/protos into not noticing when the child exits. isis_disconnect() closes both isis_socket and intercl_socket, and hence should be called in the child process after a fork/vfork.

I tried to write a test program to demonstrate this bug and actually did see it perhaps once in a hundred runs, but it clearly depends on something uncommon happening just when the pipe breaks -- paging activity, perhaps. Usually, SUN OS detects the condition perfectly.

2) The kill() solution works quite well, but you can't use the debugger on the programs being probed this way, since you get a debugger trap every few seconds. So, if you use this, you will not be able to use dbx/gdb on the active program, a strong disadvantage in my opinion.

My plan is to make this a compile-time option to protos, disabled by default.

Ken
--
Kenneth P. Birman                          E-mail: ken@cs.cornell.edu
4105 Upson Hall, Dept. of Computer Science TEL: 607 255-9199 (office)
Cornell University Ithaca, NY 14853 (USA)  FAX: 607 255-4428
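The descriptor leak described in point 1 can be reproduced in miniature without ISIS. The sketch below is only an illustration under stated assumptions: a plain pipe stands in for isis_socket, the calling process plays the role of bin/protos, and the function name eof_seen_after_client_exit and the 500 ms timeout are invented for the example. A "client" forks a long-lived helper and then exits; the monitor sees end-of-file only if the helper closed its inherited copy of the descriptor.

```c
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the monitor sees EOF after the client
 * exits, 0 if a leaked descriptor hides the exit, -1 on setup failure. */
int eof_seen_after_client_exit(int helper_closes_fd)
{
    int chan[2];    /* chan[0]: monitor's read end, chan[1]: client's write end */
    int report[2];  /* side channel used to learn the helper's pid */
    if (pipe(chan) < 0 || pipe(report) < 0)
        return -1;

    pid_t client = fork();
    if (client < 0)
        return -1;
    if (client == 0) {
        close(chan[0]);
        close(report[0]);
        pid_t helper = fork();
        if (helper == 0) {
            close(report[1]);
            if (helper_closes_fd)
                close(chan[1]);   /* the step a forked child must not forget */
            pause();              /* helper outlives the client */
            _exit(0);
        }
        if (write(report[1], &helper, sizeof helper) != (ssize_t)sizeof helper)
            _exit(1);
        _exit(0);                 /* client exits; its own copy of chan[1] closes */
    }

    /* Monitor side. */
    close(chan[1]);
    close(report[1]);
    pid_t helper = -1;
    if (read(report[0], &helper, sizeof helper) != (ssize_t)sizeof helper)
        helper = -1;
    close(report[0]);
    waitpid(client, NULL, 0);     /* the client is definitely gone now */

    /* Wait up to 500 ms for the pipe to report that all writers are gone. */
    struct pollfd pfd = { .fd = chan[0], .events = POLLIN };
    int eof = 0;
    if (poll(&pfd, 1, 500) > 0) {
        char c;
        eof = (read(chan[0], &c, 1) == 0);
    }

    if (helper > 0)
        kill(helper, SIGKILL);    /* clean up the helper */
    close(chan[0]);
    return eof;
}
```

Calling eof_seen_after_client_exit(1) should report EOF promptly, while eof_seen_after_client_exit(0) should time out, because the helper's inherited write end keeps the pipe open even though the client has exited.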
chip@tct.com (Chip Salzenberg) (06/27/91)
According to ken@cs.cornell.edu (Ken Birman):
>2) the kill() solution works quite well, but you can't use the debugger
>   on the programs being probed this way, since you get a debugger trap
>   every few seconds. So, if you use this, you will not be able to use
>   dbx/gdb on the active program, a strong disadvantage in my opinion.

First, at least on System V, you can kill with the pseudo-signal zero. This feature provides a "process exists" test.

Second, gdb allows you to specify that specific signals should not cause a debugger trap. See the gdb "handle" command.
--
Chip Salzenberg at Teltronics/TCT <chip@tct.com>, <uunet!pdn!tct!chip>
"I want to mention that my opinions whether real or not are MY opinions."
-- the inevitable William "Billy" Steinmetz
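For readers who want to try the signal-zero test from C, here is a minimal sketch. The function name process_exists is made up for this example and is not part of ISIS; it assumes a system where kill() with signal 0 performs the usual existence and permission checks without delivering anything.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if a process with this pid exists,
 * 0 if it does not, and -1 if the answer could not be determined. */
int process_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;      /* zero and negative pids address process groups */
    if (kill(pid, 0) == 0)
        return 1;       /* exists, and we are allowed to signal it */
    if (errno == EPERM)
        return 1;       /* exists, but belongs to someone else */
    if (errno == ESRCH)
        return 0;       /* no such process */
    return -1;          /* e.g. EINVAL if signal 0 is not supported */
}
```

Because no signal is actually delivered, the probed process is not interrupted. Note that a process that has exited but has not yet been waited for by its parent still counts as existing under this test.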
ken@CS.Cornell.EDU (Ken Birman) (06/28/91)
In article <2869F793.4F50@tct.com> chip@tct.com (Chip Salzenberg) writes:
>... on System V, you can kill with the pseudo-signal zero.
>This feature provides a "process exists" test.

This is really interesting! I would appreciate it if ISIS users could do the following and let me know if they get the "wrong" result:

    % kill -0 12345        (any non-existent process id)

On SUN OS I get "Non existent process id", which is just what I would hope for. If anyone gets "Bad signal number", let me know what type of UNIX system you are on.

I feel a bit uncomfortable relying on a non-documented UNIX feature, but there are definite advantages to having a fall-back way to detect failure. I checked, and on SUN OS, at least, the debugger never sees an interrupt when you use signal 0. In contrast, signal IO definitely causes debugger activity, and even if there is a way to tell gdb to ignore it, most ISIS users would find that pretty annoying.

So, thanks for a great suggestion!

(Also, thanks to Max Heffler and Compaq for contributing their 386 version of ISIS -- I know this was a lot of work for them and I hope that ISIS users will find it valuable. The same version of ISIS should be fairly easy to move to SCO UNIX, for those who prefer SCO to straight System V.)
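The same check can be run from C without guessing at an unused process id. The sketch below is an assumption-laden illustration, not ISIS code: the names probe_signal_zero and enum sig0_result are invented here, and it assumes the shell's "no such process" and "Bad signal number" messages correspond to the errno values ESRCH and EINVAL. It forks a child that exits at once, reaps it so that its pid is known to be free, and then probes that pid with signal 0.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for the probe. */
enum sig0_result { SIG0_SUPPORTED, SIG0_BAD_SIGNAL, SIG0_INCONCLUSIVE };

enum sig0_result probe_signal_zero(void)
{
    pid_t child = fork();
    if (child < 0)
        return SIG0_INCONCLUSIVE;
    if (child == 0)
        _exit(0);                       /* child exits immediately */

    /* Reap the child so its pid no longer names any process. */
    if (waitpid(child, NULL, 0) != child)
        return SIG0_INCONCLUSIVE;

    if (kill(child, 0) == 0)
        return SIG0_INCONCLUSIVE;       /* pid was reused already; unlikely */
    if (errno == ESRCH)
        return SIG0_SUPPORTED;          /* the hoped-for "no such process" */
    if (errno == EINVAL)
        return SIG0_BAD_SIGNAL;         /* signal 0 rejected as a bad signal */
    return SIG0_INCONCLUSIVE;
}
```

A result of SIG0_BAD_SIGNAL would indicate a system on which signal 0 cannot serve as a fall-back failure detector.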