[comp.unix.wizards] Why am I crashing my system?

bin@rhesus.primate.wisc.edu (Brain in Neutral) (11/16/88)

System: either VAXstation w/Ultrix 2.2 or VAX 8200 w/Ultrix 1.2

I have an rsh-like program: it opens socket connections to a remote port,
tells the remote machine to execute a program, and forks.  The child reads
local input and sends it to the remote command.  The parent reads the remote
stdout and stderr.  Signals to the parent get sent to the remote command on
a socket as well.
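
For readers following along, here is a rough sketch of what the parent's
read loop in such a program might look like.  It is not the poster's actual
code: the function name drain_remote and the choice to copy to local
descriptors 1 and 2 are assumptions made for illustration, and it is
written against the current POSIX select interface rather than anything
specific to Ultrix.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>
#include <errno.h>

/* Hypothetical parent loop: copy remote stdout/stderr (out_fd, err_fd)
 * to local descriptors 1 and 2 until both sources report end-of-file.
 * Returns the total number of bytes read, or -1 on error. */
long drain_remote(int out_fd, int err_fd)
{
    int src[2];
    int dst[2] = { 1, 2 };
    int open_count = 2;
    int i;
    long total = 0;
    char buf[4096];

    src[0] = out_fd;
    src[1] = err_fd;

    while (open_count > 0) {
        fd_set rset;
        int maxfd = -1;

        /* Rebuild the read set each pass; select() modifies it. */
        FD_ZERO(&rset);
        for (i = 0; i < 2; i++) {
            if (src[i] >= 0) {
                FD_SET(src[i], &rset);
                if (src[i] > maxfd)
                    maxfd = src[i];
            }
        }

        /* Block until at least one source is readable or at EOF. */
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        for (i = 0; i < 2; i++) {
            ssize_t n;

            if (src[i] < 0 || !FD_ISSET(src[i], &rset))
                continue;
            n = read(src[i], buf, sizeof buf);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (n == 0) {           /* EOF: stop watching this source */
                src[i] = -1;
                open_count--;
                continue;
            }
            /* Short writes are not retried here, for brevity. */
            if (write(dst[i], buf, (size_t)n) < 0)
                return -1;
            total += n;
        }
    }
    return total;
}
```

A loop of this shape only returns once both descriptors reach end-of-file,
so data that never arrives leaves it blocked in select.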

I get the following weird behavior on occasion, apparently only IF
remote machine is same as local, and if the remote command does a lot
of fast writing back to local:  some of the data gets stuck in the
network (Send-Q for "remote" end has non-zero count in netstat output),
and the local parent never sees it.  It's waiting for it, because gcore
gives a dump that shows it's in a select call.  What I don't understand
is that the child (which is done writing to remote command and is ready
to exit) is shown as a zombie by ps, AND the ps flags for the child
include SSEL (=400000), which indicates that *it* is selecting!  How can
this be?  gcore on the child fails (gcore says "Zombie").

1) Why does ps show the child and not the parent as selecting when the child
is exiting and the parent is selecting?
2) Why does the parent select fail when there is actually something to
read (or why is the output stuck in the network?)

Now the ugly part of the above scenario.  The remote command has
finished, it's just that the local parent is hung up waiting to receive
the rest of the output.  Ok, fine, says I, I'll just ^C it.  That's
supposed to send a signal into the socket.  Of course, that socket goes
nowhere.  The system dies with a segmentation violation.  This does not
seem friendly to me.  Why does it occur?  Alternatively, how do I tell
that I'd better not write into that socket?

Paul DuBois
dubois@primate.wisc.edu

bin@primate.wisc.edu (Brain in Neutral) (11/18/88)

> Now the ugly part of the above scenario.  The remote command has
> finished, it's just that the local parent is hung up waiting to receive
> the rest of the output.  Ok, fine, says I, I'll just ^C it.  That's
> supposed to send a signal into the socket.  Of course, that socket goes
> nowhere.  The system dies with a segmentation violation.  This does not
> seem friendly to me.  Why does it occur?  Alternatively, how do I tell
> that I'd better not write into that socket?

My problem seems to be fixed by making sure that the client socket has
keepalive and linger turned on.  Apparently a socket was getting shut
down too early on one end and writing something into it from the other
end did nasty things.  I am still of the opinion that it would be more
reasonable for the system to report an error in this case than to
panic and crash, however.
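
For reference, turning on both options might look roughly like the
following.  This is a sketch against the current POSIX setsockopt
interface, not the code actually used; the helper name and the linger
interval parameter are assumptions, and the argument expected for
SO_LINGER on older systems may differ.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: enable SO_KEEPALIVE and SO_LINGER on sock.
 * linger_seconds is how long close() may block while unsent data
 * is flushed.  Returns 0 on success, -1 on failure (errno is set
 * by setsockopt). */
int enable_keepalive_and_linger(int sock, int linger_seconds)
{
    int on = 1;
    struct linger lg;

    /* Ask the kernel to probe an idle connection periodically. */
    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;

    /* Make close() wait for queued data instead of discarding it. */
    lg.l_onoff = 1;
    lg.l_linger = linger_seconds;
    if (setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0)
        return -1;

    return 0;
}
```

It would typically be called once on the client socket after it has been
created and before the connection is closed.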

Paul DuBois
dubois@primate.wisc.edu