[net.unix-wizards] Hangups about vhangup

joe@fluke.UUCP (Joe Kelsey) (08/02/84)

vhangup() doesn't do ALL of what I want it to do.  What it DOES do
quite well is turn off I/O to the particular tty in question when
someone logs out with background processes still running.
Unfortunately, the background process still has the tty in question
open, so the various reference count fields associated with the
terminal never drop to zero, and if it happens to be set HUPCLS, the
tty close routine never gets called to actually hang up the line.  This
is a problem especially if you have your terminals connected to a port
selector (like a Micom 600), since the line won't be automatically
disconnected at logout.  I've been staring at kernel source all morning
and I can't really see any easy way to fix vhangup().  One possibility
would be to actually call closef() from within the inner loop of
forceclose(), but this might cause problems when the process in
question eventually issues its own close().  I can't really tell if it
WOULD cause a problem, although I suspect that there are a lot of race
conditions possible.  If I could substitute a reference to /dev/null
somehow that would be the best, but I'm not sure exactly how I would do
all this from the kernel without causing lots of trouble.  I can think
of lots of potential panic()s that could arise here.  Does anyone have
any good ideas about the best way to try to approach modifying
vhangup()'s behavior?
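
The reference-count problem described above can be modelled in user space.
What follows is only a sketch under stated assumptions: struct line_model,
line_open() and line_close() are hypothetical names invented for the
illustration, not kernel structures or routines, and the real tty code tracks
far more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a terminal line; NOT the kernel's struct tty. */
struct line_model {
    int  open_count;  /* number of open references still outstanding */
    bool hupcls;      /* "hang up on last close" has been requested */
    bool dtr;         /* true while DTR is asserted on the line */
};

/* Record one more open reference and raise DTR. */
void line_open(struct line_model *l)
{
    assert(l != NULL);
    l->open_count++;
    l->dtr = true;
}

/*
 * Drop one reference.  Only the LAST close reaches the device close
 * step, which is where DTR gets dropped if hupcls is set.
 * Returns true if this call was the last close.
 */
bool line_close(struct line_model *l)
{
    assert(l != NULL && l->open_count > 0);
    if (--l->open_count > 0)
        return false;      /* someone else still holds the line open */
    if (l->hupcls)
        l->dtr = false;    /* device close: hang up the line */
    return true;
}
```

With a login shell and one background job each holding a reference, the
shell's line_close() at logout returns false and dtr stays true; the line is
only hung up when the background job finally closes its reference.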

I guess the practical hack is to modify the shells to toggle DTR just
before they exit at logout.  This is NOT what I really want - I'm not
even sure if I can do that to /bin/sh - I'm sure another hack couldn't
hurt csh any!
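
For illustration, here is roughly what "toggle DTR before exit" could look
like from user level.  This is a sketch that assumes a modern POSIX termios
interface, which is not what the systems in this thread provided; toggle_dtr()
is a hypothetical helper name.  It relies on the POSIX rule that setting the
output speed to B0 asks the driver to stop asserting the modem control lines.

```c
#include <termios.h>
#include <unistd.h>

/*
 * Drop the modem control lines on fd for `seconds`, then restore the
 * original settings.  Assumes POSIX termios; hypothetical helper.
 * Returns 0 on success, -1 on failure (for example, fd is not a tty).
 */
int toggle_dtr(int fd, unsigned seconds)
{
    struct termios saved, hangup;

    if (tcgetattr(fd, &saved) == -1)
        return -1;

    hangup = saved;
    /* B0 as the output speed means "hang up" under POSIX. */
    if (cfsetospeed(&hangup, B0) == -1)
        return -1;
    if (tcsetattr(fd, TCSANOW, &hangup) == -1)
        return -1;

    sleep(seconds);

    /* Put the saved speed back so the line can be used again. */
    return tcsetattr(fd, TCSANOW, &saved);
}
```

A shell would call something like toggle_dtr(0, 1) just before exiting.
Whether the port selector actually notices depends on the hardware and the
driver, so treat this as a starting point only.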

Well, I guess I'll just go back and stare at the kernel code some more.
I'll either get a sudden inspiration or maybe just catch up on my
beauty sleep!

/Joe

joe@fluke.UUCP (Joe Kelsey) (08/03/84)

I have been staring at the code more, and am on the verge of inserting
some new code.  What I am thinking of doing is calling the device close
routine from vhangup() if it detects any processes still running.  Now,
these processes may or may not terminate when they receive the SIGHUP
that vhangup() sends.  If they do, fine.  If they ignore or otherwise
catch SIGHUP, then my original problem occurs.  What I propose to do is
change the vhangup code as follows:

#ifdef FLUKE
	i = forceclose (u.u_ttyd);
	if ((u.u_ttyp -> t_state) & TS_ISOPEN) {
		gsignal (u.u_ttyp -> t_pgrp, SIGHUP);
		/* Insert call to device close routine here */
		printf ("vhangup: dev %d/%d forceclose returns %d\n",
			major (u.u_ttyd), minor (u.u_ttyd), i);
	}
#else
	forceclose (u.u_ttyd);
	if ((u.u_ttyp -> t_state) & TS_ISOPEN)
		gsignal (u.u_ttyp -> t_pgrp, SIGHUP);
#endif /* FLUKE */

As you can see, it also involves a trivial change to forceclose() to
return the number of file descriptors that it found associated with the
device.  I am going to run with the printf statement for a while before
I decide whether or not I need to use the number returned in a
conditional call to the device close or whether the test for the device
being open is sufficient.  It seemed to me that this was the solution
least likely to cause problems with file and inode counts, but still
have the effect of dropping DTR on the line and hanging up the
connection.
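
The "trivial change to forceclose()" can be sketched in user space as
follows.  This is a simplified model, not the kernel source: struct
file_model, the M_READ/M_WRITE bits and forceclose_model() are hypothetical
stand-ins, and the model assumes that forceclose() works by revoking read and
write permission on every open file table entry that refers to the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's read/write flag bits. */
#define M_READ  0x1
#define M_WRITE 0x2

/* Hypothetical, heavily simplified file table entry. */
struct file_model {
    int count;  /* reference count; 0 means the slot is unused */
    int dev;    /* device number the entry refers to */
    int flag;   /* M_READ and/or M_WRITE */
};

/*
 * Revoke read/write access on every in-use entry that refers to dev,
 * and return how many such entries were found.  The return value is
 * the part the proposal adds.
 */
int forceclose_model(struct file_model *tab, size_t n, int dev)
{
    size_t i;
    int found = 0;

    assert(tab != NULL);
    for (i = 0; i < n; i++) {
        if (tab[i].count == 0 || tab[i].dev != dev)
            continue;
        tab[i].flag &= ~(M_READ | M_WRITE);
        found++;
    }
    return found;
}
```

The caller can then log the count, as the printf in the proposed change does,
or use a nonzero count to decide whether to call the device close routine.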

Any comments about this proposal?  After I run with the printf in the
kernel for a week or two I'll be able to tell more about how often the
problem really occurs.

/Joe

pc@ukc.UUCP (R.P.A.Collinson) (08/11/84)

The vhangup problem discussed by Joe Kelsey and also the getty/dialup
problem are related.

One fundamental problem with the vhangup call is that it takes a file
descriptor and not a device name. This means that the terminal has to be
opened, vhangup'ed and then closed, and the act of opening the device
may foul up reference counts on inodes, etc. I also suspect that there is
a fundamental problem in the fact that inodes for terminals are never
locked (so that UNIX can scribble while you are inputting), which leads
to occasional race problems with the close code.

The terminals on our systems are connected via a network and we want all
terminals to detach cleanly from a machine on logout. The problem
is to do this in a benign way, not killing everything which a user has
carefully left running in the background. I have had various attempts
at this, and the current scheme is as follows:

1)	Invent a new system call (yes YET ANOTHER one) called
	hangtty(dev)
	char *dev;
	The dev argument is used to obtain a pointer to the in-core inode
	for the terminal.

2)	Set a flag in this inode
	IDEAD	(UCB thoughtfully left a few spare bits)
	and call the terminal close routine.

	We must now make sure that any further action on the inode
	does not get through to the device.

3)	A write/read to an IDEAD inode results in a SIGKILL being sent
	to the offending process.
	The user has no business leaving a background process running
	which writes or reads when the user has gone away.

4)	The call to get a new inode for the terminal (iget) ignores any
	inodes with IDEAD bits. So for short periods, there is more than
	one inode referring to the terminal. One is live, all the others
	have the IDEAD bit set.
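
The four steps above can be modelled in user space roughly as follows.  This
is a sketch only: struct inode_model, IDEAD_MODEL and the three functions are
hypothetical names, the flag value is arbitrary, and the model reports "I/O
not allowed" where the scheme described above would send SIGKILL.

```c
#include <assert.h>
#include <stddef.h>

#define IDEAD_MODEL 0x1  /* stand-in for the IDEAD bit; value is arbitrary */

/* Hypothetical, heavily simplified in-core inode. */
struct inode_model {
    int in_use;  /* nonzero if the slot holds an inode */
    int dev;     /* device the inode refers to */
    int flags;   /* IDEAD_MODEL or 0 */
};

/* Steps 1 and 2: mark every live inode for dev as dead.  Returns the count. */
int hangtty_model(struct inode_model *tab, size_t n, int dev)
{
    size_t i;
    int marked = 0;

    assert(tab != NULL);
    for (i = 0; i < n; i++) {
        if (tab[i].in_use && tab[i].dev == dev &&
            !(tab[i].flags & IDEAD_MODEL)) {
            tab[i].flags |= IDEAD_MODEL;
            marked++;
        }
    }
    return marked;
}

/* Step 3: returns 1 if I/O may proceed, 0 if the inode is dead. */
int io_allowed_model(const struct inode_model *ip)
{
    assert(ip != NULL);
    return (ip->flags & IDEAD_MODEL) ? 0 : 1;
}

/*
 * Step 4: look up a live inode for dev, ignoring dead ones.  If none
 * exists, claim a free slot.  Returns the slot index, or -1 if full.
 */
int iget_model(struct inode_model *tab, size_t n, int dev)
{
    size_t i;

    assert(tab != NULL);
    for (i = 0; i < n; i++)
        if (tab[i].in_use && tab[i].dev == dev &&
            !(tab[i].flags & IDEAD_MODEL))
            return (int)i;
    for (i = 0; i < n; i++)
        if (!tab[i].in_use) {
            tab[i].in_use = 1;
            tab[i].dev = dev;
            tab[i].flags = 0;
            return (int)i;
        }
    return -1;
}
```

After hangtty_model() runs, the next iget_model() for the same device returns
a fresh slot while the old one stays in the table with the dead bit set, which
is the "more than one inode referring to the terminal" situation in step 4.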

This scheme appears to work, and is in use at mcvax. I believe that certain
ex-UCB people don't like it because it cuts across the layering in the
kernel. But still.....
(On the UCB/System V discussion, Bill Joy did us all a disservice in csh when
he allowed people to leave background jobs running without having said
nohup).

Peter Collinson
University of Kent, UK
vax135!ukc!pc
mcvax!ukc!pc

joe@fluke.UUCP (Joe Kelsey) (08/14/84)

>From: pc@ukc.UUCP (R.P.A.Collinson)
>
>3)	A write/read to an IDEAD inode results in a SIGKILL being sent
>	to the offending process.
>	The user has no business leaving a background process running
>	which writes or reads when the user has gone away.

I object strenuously to any solution which generates a SIGKILL!  This
is completely bogus!  I can't write software which ignores this signal!
What you want is to quietly replace the inode referring to the terminal
with one referring to /dev/null and also send a SIGHUP.  The current
implementation correctly sends the SIGHUP, but for processes which are
either nohup or choose to ignore SIGHUP, this doesn't work.  Going
around killing processes for no reason is really an extremely bad
practice and should not be placed in any kernel code!
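
To make the /dev/null substitution concrete, here is what it looks like when a
process does it to its own descriptor.  This is only a user-level analogue
under stated assumptions: redirect_to_devnull() is a hypothetical helper, and
it does not show how the kernel would perform the swap on behalf of another
process, which is the hard part being discussed.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Make fd refer to /dev/null instead of whatever it referred to
 * before.  Hypothetical helper; returns 0 on success, -1 on failure.
 */
int redirect_to_devnull(int fd)
{
    int nullfd = open("/dev/null", O_RDWR);

    if (nullfd == -1)
        return -1;
    if (nullfd == fd)
        return 0;              /* fd was free and open() reused it */
    if (dup2(nullfd, fd) == -1) {
        close(nullfd);
        return -1;
    }
    close(nullfd);             /* fd now holds the only reference we need */
    return 0;
}
```

After the call, writes to fd succeed and are discarded, and reads return end
of file, so the process keeps running without touching the terminal.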

>(On the UCB/System V discussion, Bill Joy did us all a disservice in csh when
>he allowed people to leave background jobs running without having said
>nohup).

I see absolutely no disservice here.  You can do exactly the same thing
in System V.  The problem is NOT csh - the problem is a process which
CHOOSES to ignore SIGHUP!  Let's stop this ucb bad mouthing unless you
really know what you are talking about.
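
For reference, "choosing to ignore SIGHUP" takes very little code in the
process itself.  A minimal sketch, assuming the standard signal() interface;
ignore_hangup() is a hypothetical wrapper name.

```c
#include <signal.h>

/*
 * Arrange for SIGHUP to be ignored by the calling process.
 * Returns 0 on success, -1 on failure.
 */
int ignore_hangup(void)
{
    return signal(SIGHUP, SIG_IGN) == SIG_ERR ? -1 : 0;
}
```

A process that has done this survives the SIGHUP sent at logout regardless of
which shell started it.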

BTW - I have heard from jwp@sdchema that there is an implementation of
a new system call called chfile() which will substitute one file for
another and does EXACTLY what I (and chris@umcp-cs) want.  I am
anxiously awaiting more details about the code so I can install it
here.  I could write it myself, but why re-invent the wheel?

/Joe