[comp.unix.wizards] Strange SUN behaviour.

paj@hrc63.uucp (Mr P Johnson "Baddow") (10/06/89)

We have a strange bug here when running certain programs, some of
which use a lightweight process package we have written.  This package plays
tricks with the stack and also does some signal management for non-blocking
I/O.  It has been used before without problems, but some changes have
recently been made to the i/o buffering routines.

When one of these programs is killed by a signal (even kill -9), the shell
it is running in dies as well.  Under rsh a stream of "Use 'logout' to log
out" messages is printed before the shell dies.  Debuggers also die.  The
behaviour seems to be consistent with a stream of ^D characters being sent
to the terminal (nested shells all die, so it is not just the parent shell).

On one occasion, an attempt was made to start up a shelltool after a
previous one had died and the new shelltool died as well, as did subsequent
shelltools.  Creating two shelltools in quick succession got round this: it
seemed that ^Ds were being sent to /dev/ttyp4, killing any shell which took
that as standard input.  Two minutes later the computer crashed.

This problem has been observed on a Sun 3/160 and a Sun 3/260 running SunOS
4.0.3_EXPORT and a Sun 3/60 running 4.0_Export.  The programs in question
were compiled on the Sun 3/60 by the Oregon C++ compiler and by the
Glockenspiel/Oasys "Designer" C++ (a cfront variant).  All combinations show
the same symptoms.  The same problem occurs with terminal login over the
serial ports.

Does anyone know what is going wrong here?  Is this a known bug?  Is there a
work-around?

Any info or suggestions gratefully received.

Thank you for your time.

-- 
Paul Johnson,         | `The moving finger writes, And having writ, moves on,'
GEC-Marconi Research  |                  Omar Khayyam when contemplating `vi'.
------------------------------------------------------------------------------
The company has put a radio inside my head: it controls everything I say!

vlcek@mit-caf.MIT.EDU (Jim Vlcek) (10/09/89)

Mr. P Johnson "Baddow" is having problems with a process which, when
killed, takes its parent shell with it.  He gets a stream of "Use
'logout' to log out" messages just before the parent dies.  The
processes which exhibit this behavior use a lightweight process
package which performs some signal management for non-blocking i/o.
This is all going on under SunOS, and he's wondering what's going
wrong.

It's the non-blocking i/o that's doing it - I've run into the same
thing working under 4.3BSD, and I assume SunOS does non-blocking
terminal i/o in the same way.  When you set up the terminal for
non-blocking i/o, probably with a line like

  res = fcntl(fd, F_SETFL, FNDELAY);

as it would be in 4.3BSD, subsequent read()s return -1 with errno set
to EWOULDBLOCK if no input is available to be read.  This is in
contrast to the default action of blocking until input is available.
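
In full, a minimal demonstration might look like this (untested sketch,
4.3BSD-style headers assumed; POSIX spells the flag O_NONBLOCK):

  #include <stdio.h>
  #include <errno.h>
  #include <fcntl.h>
  #include <unistd.h>

  int main()
  {
      char c;

      /* put the terminal (fd 0) into non-blocking mode */
      fcntl(0, F_SETFL, FNDELAY);

      if (read(0, &c, 1) < 0 && errno == EWOULDBLOCK)
          printf("no input pending - read() returned at once\n");

      /* put it back before we exit, or the shell suffers */
      fcntl(0, F_SETFL, 0);
      return 0;
  }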

The problem is that this attribute belongs to the terminal itself, not
to the process which sets it.  Thus, if a process sets up a terminal
for nonblocking i/o using FNDELAY and is subsequently terminated
without resetting it, read()s performed by the parent shell on the
same terminal will return spurious EOF conditions.
Since the read()s executed by the shell do not block anymore, the
effect is the same as a (very fast) stream of ^Ds being sent from the
terminal.  Eventually, the shell tires of receiving these EOF
conditions, and exits.
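
The whole failure can be reproduced in a few lines (run this from a
shell you don't mind losing):

  #include <fcntl.h>

  int main()
  {
      /* set non-blocking mode on the terminal and exit WITHOUT
         resetting it; the shell's next read() on fd 0 comes back
         immediately, looking like an endless stream of EOFs */
      fcntl(0, F_SETFL, FNDELAY);
      return 0;
  }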

This will happen if the process which set the non-blocking i/o is
interrupted (with, say, ^Z) or terminated abruptly by any signal.  One
might trap all signals which stop or terminate the process, in order
to reset the terminal before relinquishing it, but that's a major
hassle and there's always SIGKILL and SIGSTOP.
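
For completeness, the trap-everything version looks something like
this (it still can't cover SIGKILL or SIGSTOP, and a real one would
handle ^Z by catching SIGTSTP and re-arming the flag on SIGCONT):

  #include <signal.h>
  #include <fcntl.h>
  #include <unistd.h>

  static void cleanup(int sig)
  {
      fcntl(0, F_SETFL, 0);            /* back to blocking mode */
      signal(sig, SIG_DFL);            /* then die the normal way */
      kill(getpid(), sig);
  }

  int main()
  {
      int s;

      /* catch whatever is catchable; a real version would pick just
         the signals that stop or terminate the process */
      for (s = 1; s < NSIG; s++)
          if (s != SIGKILL && s != SIGSTOP)
              signal(s, cleanup);

      fcntl(0, F_SETFL, FNDELAY);
      /* ... the application proper ... */
      fcntl(0, F_SETFL, 0);
      return 0;
  }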

I think the best way to do it in BSD-derived systems is to use the
asynchronous i/o facilities of fcntl():

  res = fcntl(fd, F_SETFL, FASYNC);

This will cause a SIGIO to be delivered to the calling process when
input is available on descriptor fd.  This is better than simply
polling a nonblocking descriptor for two reasons: it is truly
asynchronous, and it doesn't leave the terminal in a funky state if
the calling process dies unexpectedly.
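
Note that on 4.3BSD you also have to nominate the receiving process
with F_SETOWN, or the SIGIO never arrives.  In outline:

  #include <signal.h>
  #include <fcntl.h>
  #include <unistd.h>

  static void io_ready(int sig)
  {
      /* input has arrived on the descriptor; read it here, or just
         set a flag for the main loop to notice */
  }

  int main()
  {
      signal(SIGIO, io_ready);
      fcntl(0, F_SETOWN, getpid());    /* who gets the SIGIO */
      fcntl(0, F_SETFL, FASYNC);       /* generate SIGIO on input */

      for (;;)
          pause();                     /* the real work goes here */
  }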

You're still not completely out of the woods, however, as terminal
input which triggers a SIGIO may still disappear if erased, and a
subsequent read() would then block.  You can also have an arbitrarily
large amount of input available, not just one character.  To handle
this, I wrap the actual read()s in my code with a nonblocking section:

  res = fcntl(fd, F_SETFL, FNDELAY);   /* non-blocking while we drain */
  while ((newly_read = read(fd, buf, buf_size)) > 0) {
    /* process newly_read characters */
  }
  /* here newly_read is -1 with errno == EWOULDBLOCK once the input is
     exhausted, or 0 on a genuine EOF from the terminal */
  res = fcntl(fd, F_SETFL, FASYNC);    /* back to SIGIO-driven mode */

You still run a risk of being interrupted or terminated while in
nonblocking mode, but the risk is much reduced in that very little
time is spent in this section of code.  Further, the set of signals
which can interrupt or terminate the process can be reduced to
externally generated signals through proper debugging of the code in
the nonblocking section.  Thus, a SIGFPE, SIGBUS, or SIGSEGV generated
elsewhere in your application won't blow away the parent shell
anymore.

One might also use select(2) to avoid blocking on the terminal
read()s; I haven't yet tried this, and probably never will as I'm
coming to the conclusion that diddling with the terminal from within
an application is a Bad Idea in general.  I think the Right Idea is to
set up the terminal editor as a separate process, and pipe its output
to the application.  The application can do what it wants with the
pipe - catch SIGIOs on it, set it nonblocking, whatever - without
messing up the terminal state.  The terminal editor can operate in a
more sane mode, like CBREAK, and is better able to anticipate what
signals it might need to catch to restore the terminal's proper state
before exiting.  This scheme also isolates the (very system-dependent)
terminal handling code in its own process, making porting easier, and
it would also cut down somewhat on the postings to comp.unix.whatever
and comp.lang.c asking ``How can I emulate kbhit() and getch()?''
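
In skeleton form, that arrangement might look like the following
(untested, and the line editing itself is elided):

  #include <unistd.h>

  int main()
  {
      int p[2];
      char buf[256];
      int n;

      pipe(p);
      if (fork() == 0) {
          /* child: the terminal editor.  It alone touches the tty,
             can run in CBREAK, and is the only process that ever
             needs to restore the terminal state before exiting. */
          close(p[0]);
          while ((n = read(0, buf, sizeof buf)) > 0)
              write(p[1], buf, n);     /* edited input to the app */
          _exit(0);
      }
      /* parent: the application.  It reads edited input from the
         pipe and can set THAT non-blocking, catch SIGIO on it, and
         so on, without ever disturbing the terminal. */
      close(p[1]);
      dup2(p[0], 0);
      /* ... application proper ... */
      return 0;
  }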

Jim Vlcek  (vlcek@caf.mit.edu  vlcek@athena.mit.edu  uunet!mit-caf!vlcek)

richard@aiai.ed.ac.uk (Richard Tobin) (10/10/89)

In article <703@hrc63.uucp> paj@hrc63.uucp (Mr P Johnson "Baddow") writes:
>This package plays
>tricks with the stack and also does some signal management for non-blocking
>I/O.

>When one of these programs is killed by a signal (even kill -9), the shell
>it is running in dies as well. 

Sounds like the terminal is being left in non-blocking i/o mode - this
will cause reads by the shell to return EWOULDBLOCK.

You could put a wrapper round the program that resets the tty to blocking
i/o mode if it's killed; it just needs to do something like

   fcntl(open("/dev/tty", O_RDONLY), F_SETFL, 0);
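
Fleshed out, the wrapper might be (call it resetwrap; the name is
mine):

   #include <fcntl.h>
   #include <unistd.h>
   #include <sys/wait.h>

   int main(int argc, char **argv)
   {
       int status;

       if (fork() == 0) {
           execvp(argv[1], argv + 1);   /* run the real program */
           _exit(1);
       }
       wait(&status);                   /* however the child died... */

       /* ...put the tty back into blocking mode */
       fcntl(open("/dev/tty", O_RDONLY), F_SETFL, 0);
       return 0;
   }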

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

yh87@mrcu (Paul Johnson) (10/18/89)

A couple of weeks ago I posted a notice about the death of the parent
shell when certain programs exited.  Here is a summary of the mailed
responses.  Thanks to Steve Nuchia of Houston Public Access, Seth
Robertson of Columbia University, Larry Allen of the OSF, Casper Dik
of the University of Amsterdam and David DiGiacomo of Sun
Microsystems.  To anyone else who has sent a reply before reading
this, thank you for your time.  The problem is now solved.

My program was using fcntl(F_SETFL,...) to set the FNDELAY flag on
descriptor 0 (non-blocking mode).  In this mode, if there are no
characters waiting to be read, a read() call returns 0 or -1
(depending on your version of Unix) instead of
blocking.  This is what I had intended.  What I had not intended was
that this should affect the parent shell.  When my program terminated
abnormally (say from a kill -9), it left this flag set.  I had assumed
that something like this would be local to the process which set the
flag.  The man page on fcntl says something about affecting all
descriptors which point to that device, but I thought it meant only
those descriptors in the same process.  I still think this is
something of a mis-feature in Unix.
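
The sharing is easy to demonstrate: a child sets the flag and dies,
and the parent finds it set on its own descriptor for the same
terminal.

  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/wait.h>

  int main()
  {
      if (fork() == 0) {
          fcntl(0, F_SETFL, FNDELAY);  /* child sets the flag... */
          _exit(0);                    /* ...and dies without resetting */
      }
      wait((int *) 0);

      /* fd 0 here refers to the same terminal, so the flag is
         visible (and active) in the parent too */
      if (fcntl(0, F_GETFL, 0) & FNDELAY)
          printf("fd 0 is now non-blocking in the parent as well\n");

      fcntl(0, F_SETFL, 0);            /* clean up after ourselves */
      return 0;
  }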

The solution suggested by Steve Nuchia was to include a fork in my
program.  The child would execute the main area of the program and the
parent would wait for the child to terminate and then reset the
descriptor.  I did not like this, but had I been desperate I would
probably have done it.
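
In outline it would have looked like this (real_main standing in for
the original body of the program):

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/wait.h>

  static void real_main()
  {
      /* ... the program proper, FNDELAY tricks and all ... */
  }

  int main()
  {
      int status;

      if (fork() == 0) {
          real_main();                 /* child does the actual work */
          _exit(0);
      }
      wait(&status);                   /* child gone, however it went */
      fcntl(0, F_SETFL, 0);            /* parent resets fd 0 */
      return 0;
  }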

Seth Robertson mentioned that ksh gets around the problem and
described a program called "background" which seems to implement Steve
Nuchia's solution (or something similar).

Larry Allen agreed with me that this is a bug in Unix, and remarked
that it would be a win to fix the csh.

The solution I adopted was to abandon the use of non-blocking mode.
Instead I use select() to determine if the descriptor has any
characters waiting and then read them one by one.  The only problem I
can see with this is the possibility of race conditions if two such
programs are trying to read from the same terminal: I think both would
get a SIGIO, both would be told by select() that characters are
available, and then one would get the character and the other would
block.  However I am not too worried by this.  Another solution I
considered was to set/reset FNDELAY around every call to read().  I
may yet do this.
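
For reference, the select()-and-read part comes out roughly as follows
(the SIGIO arrangement that triggers it is as in Jim Vlcek's posting):

  #include <sys/types.h>
  #include <sys/time.h>
  #include <unistd.h>

  int main()
  {
      fd_set fds;
      struct timeval tv;
      char c;

      for (;;) {
          FD_ZERO(&fds);
          FD_SET(0, &fds);
          tv.tv_sec = tv.tv_usec = 0;  /* don't wait at all */

          /* ask whether fd 0 is readable instead of touching the
             terminal's modes */
          if (select(1, &fds, (fd_set *) 0, (fd_set *) 0, &tv) <= 0)
              break;                   /* nothing waiting (or error) */
          if (read(0, &c, 1) != 1)
              break;
          /* ... process c, one character at a time ... */
      }
      return 0;
  }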

Paul.

-- 
-----------------------------------------------------------------------------
Send replies to paj@uk.co.gec-rl-hrc, not the address above.  Thanks.
-----------------------------------------------------------------------------
GEC is not responsible for my opinions (says my shrink).

rowe@cme.nist.gov (Walter Rowe) (10/23/89)

--------------------------------------------------------
I tried replying directly, but it got sent back to me...
--------------------------------------------------------
Here is another possibility for you:

    ioctl (fd, FIONREAD, &nchar)
    int fd;
    int nchar;

The value returned in `nchar' is the number of characters waiting to
be read from file descriptor `fd'.

---
Walter Rowe, Sun Sys Admin
Robot Systems Division, NIST
rowe@cme.nist.gov

sms@WLV.IMSD.CONTEL.COM (Steven M. Schultz) (10/24/89)

In article <ROWE.89Oct23093452@rosie.cme.nist.gov> rowe@cme.nist.gov
(Walter Rowe) writes:
>Here is another possibility for you:
>
>    ioctl (fd, FIONREAD, &nchar)
>    int fd;
>    int nchar;
>
>The value returned in `nchar' is the number of characters waiting to
>be read from file descriptor `fd'.

	int nchar;

	should be 
	
	long nchar;

	on many machines, sizeof (int) != sizeof (long)
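
Putting the two together, usage comes out something like this (the
argument type is system-dependent: long per the correction above, int
on some other systems):

	#include <stdio.h>
	#include <sys/ioctl.h>

	int main()
	{
		long nchar = 0;          /* long, per the correction */

		if (ioctl(0, FIONREAD, &nchar) < 0)
			perror("FIONREAD");
		else
			printf("%ld character(s) waiting on fd 0\n", nchar);
		return 0;
	}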


	Steven M. Schultz
	sms@wlv.imsd.contel.com