[net.unix-wizards] Supporting call-waiting for dial-in users

draper (01/18/83)

(Note: those of you puzzled by the commented out HUPCLS call in the
distributed version of dzopen() will find a solution below.)

Here are the details of what to do to your system to support call waiting
for dial-in users (see companion article in net.general).  We are using
Vadic triple modems (models VA3437 & VA3481) and running 4.1bsd and dz
interfaces, but similar changes should work for dh-s.  Mark Wallen was
mostly responsible for working them out -- I provided the motivation, i.e.
I was a nagging user.

The aim is for the modem's connection to the phone line, and the life of
the processes associated with the tty line, to depend on the presence of
carrier until the line is fully opened, and thereafter to be insensitive
to carrier until the user explicitly kills those processes, e.g. by
logging out.

1) Modem
1a)  set switch/jumper to not hang up on carrier loss.
1b)  set switch/jumper to hang up the phone line on DTR loss (from computer).
	(I.e. you need modem control.)

2) User's settings
	The idea is to use the presence of carrier (only) while establishing a
	connection, but to ignore its disappearance later on.
2a)  "stty nohang" in the user's .login file.
    	This prevents the SIGHUP signal being generated and killing the shell
	when carrier drops. 
The advantages of doing this with stty are:
a)  .login is the right time for the changeover from carrier dependence to
take place -- before then loss of carrier should kill the job (i.e. if the
user abandons login before completion), and anyway the user hasn't invested
time and effort in building up a set of jobs.
b)  It allows the effect to apply only to users who specifically ask for it
(by having this stty call in their .login), thus reducing the need for the
daemon to kill abandoned shells -- see below.

3) Clear the soft bits in the dz driver which control the test for carrier
	-- a sysgen option (unfortunately, it really should be a runtime
	one). This ensures that the handler waits for carrier before
	completing the open(), so that getty doesn't print its message
	into the ether.  The variable is dzsoftCAR and is used in the
	routine dzscan(). The flags should be cleared.  This can be done at
	sysgen time by setting to 0 the bits for the dial-in lines in the
	parameter following "flags" on the dz lines in /sys/conf/MACHINE. 
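
	As a rough illustration of the bit layout, the sketch below builds a
flags value for one dz (8 lines) from a list of its dial-in lines.  It is a
userland model, not driver code.  The helper name softcar_flags() and the
choice to leave soft carrier set on all the other lines are assumptions made
for this example only.

```c
#include <assert.h>

#define NDZ_LINES 8	/* lines per dz, as implied by unit>>3 and unit&07 */

/*
 * Hypothetical helper: flags value for one dz.  Start with soft carrier
 * on every line (all bits set), then clear the bit for each dial-in
 * line so that real carrier is tested on that line.
 */
unsigned char
softcar_flags(const int *dialin_lines, int n)
{
	unsigned char flags = 0xff;
	int i;

	for (i = 0; i < n; i++) {
		assert(dialin_lines[i] >= 0 && dialin_lines[i] < NDZ_LINES);
		flags &= (unsigned char)~(1u << dialin_lines[i]);
	}
	return flags;
}
```

	With lines 0-3 as the dial-ins this gives 0xf0.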

4)  Mod. to the dz driver.
	There is a timing hole left by the above for non-call-waiting
users, in that it is possible for the phone not to be hung up on final
close (because carrier is still present i.e. the user has not yet hung up)
but then for carrier to go away so that the following attempt by init to
open() the line hangs indefinitely waiting for carrier with the phone still
not hung up.
	The fix is in dzopen(), in the code that is executed for initial
open() i.e. if no other process already has an open channel to the tty line.
There is already commented out code there but it would not completely fix
the problem if used.  The fix drops DTR momentarily, thus hanging up the
phone line, on initial open if there is no carrier present.

The code is now:
	if ((tp->t_state & ISOPEN) == 0) {
		ttychars(tp);
		tp->t_ospeed = tp->t_ispeed = B300;
		tp->t_flags = ODDP|EVENP|ECHO;
		/* tp->t_state |= HUPCLS; */
		dzparam(unit);

and should be changed to:
	if ((tp->t_state & ISOPEN) == 0) {
		register int bit;
		register struct device *dzaddr;
		bit = 1<<(unit&07);
		dzaddr = dzpdma[unit].p_addr;
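		/*
		 * Line honours real carrier (soft carrier bit clear)
		 * and the modem reports none: drop DTR to hang up.
		 */
		if ((dzsoftCAR[unit>>3]&bit)==0 &&
			(dzaddr->dzmsr&bit)==0)
		{
			/* may need a delay after this */
			dzmodem(unit, DZ_OFF);
		}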
		ttychars(tp);
		tp->t_ospeed = tp->t_ispeed = B300;
		tp->t_flags = ODDP|EVENP|ECHO;
		/* tp->t_state |= HUPCLS; */
		dzparam(unit);
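
The test added above can be tried outside the kernel with a small userland
model.  This is only a sketch: should_drop_dtr() is a hypothetical name, and
the modem status register is modelled as one byte per dz in an array, whereas
the driver reads it through dzaddr.

```c
/*
 * Userland model of the test added to dzopen(); not driver code.
 * softcar[] and msr[] hold one byte per dz, one bit per line.
 * Returns nonzero when DTR should be dropped on initial open.
 */
int
should_drop_dtr(const unsigned char *softcar, const unsigned char *msr,
    int unit)
{
	int bit = 1 << (unit & 07);

	return (softcar[unit >> 3] & bit) == 0 &&	/* real carrier is tested */
	    (msr[unit >> 3] & bit) == 0;		/* and none is present */
}
```

Only a line that has its soft bit cleared and no carrier at the moment of the
initial open is hung up; hardwired lines and callers who are still connected
are left alone.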

5) Software daemon
	A daemon is needed to check for hung lines, where there is no
caller on the other end, and to free these lines by hanging them up.  This
is done by resetting the nohang bit which kills the jobs on such lines
including the shell and hangs up the phone line.  We just check for: (tty
lines coming from dialins && idle time > limit  &&  LNOHANG bit set on that
line).  Since only nohang is reset this is safe -- it is impossible for it
to hang up connected users while there is carrier present.
	In fact it looks for 3 cases: call-waiting users who have
apparently lost their connection, leaving the line tied up but inaccessible
(this is the important case); cases where carrier is present but getty owns
the line and has been idle for a while -- this occurs if a user logs out
but forgets to hang up -- this is not uncommon as a mistake with the modem
switches is easy to make; and cases where getty is hung without carrier
because nohang is still set (this will not happen if the fix to getty below
is installed).
	The daemon should be called by the following line in /etc/rc:
/etc/DialInDaemon;		echo -n ' DialInDaemon'		>/dev/console
	The source is posted to net.sources.
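
	The check described above reduces to a three-way test per line.  The
following is a minimal sketch of that test only, not the posted daemon: the
struct, its field names and line_needs_reset() are hypothetical, and how the
idle time and the LNOHANG state are obtained is left out.

```c
/* Hypothetical per-line summary gathered by a daemon. */
struct line_status {
	int	is_dialin;	/* tty line comes from a dial-in modem */
	long	idle_secs;	/* seconds since last activity on the line */
	int	nohang_set;	/* LNOHANG bit currently set on the line */
};

/*
 * Nonzero if the line is a candidate for having its nohang bit reset:
 * dial-in && idle time > limit && LNOHANG set.
 */
int
line_needs_reset(const struct line_status *ls, long limit_secs)
{
	return ls->is_dialin && ls->idle_secs > limit_secs && ls->nohang_set;
}
```

	The limit is passed in rather than fixed, on the assumption that
different thresholds may be wanted for different cases.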

6) Mod. to getty (optional).
	The following line should be inserted in getty, say at the top of
main().  It is not necessary to make it tty-dependent.
	{ int i = LNOHANG;  ioctl(0, TIOCLBIC, &i); }
	This is necessary to reset the LNOHANG bit, which is otherwise left
set indefinitely until a final close() is executed, and to hang up the
phone in cases where the fix to dzopen() is ineffective because the
existence of background jobs prevented a final close() and therefore an
initial open() being executed.  It is optional in that the daemon catches
these after 3 minutes anyway and resets the bit, hanging up the phone.
Putting this code in login would be neater from the point of view of
resetting the bit for each new user, but would leave lines hung in getty.

NOTES
The crucial questions are how phone lines get hung up at the computer end
(thus freeing the dialin line for other users), and how jobs associated
with the line die.
	The main mechanism is the generation of the SIGHUP signal by
dzscan() when (carrier absent && LNOHANG bit is clear) becomes true.  This
condition occurs when carrier drops if the nohang bit is not set, or when
the bit is cleared.  When the condition is satisfied, the signal (which
will kill shells, init and getty as well as other programs) is sent and DTR
is dropped, thus hanging up the phone line.
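
	The mechanism can be modelled in userland as below.  This is a
simplified sketch written from the description in these notes, not the
dzscan() source; the struct and its field names are hypothetical.  carr_on
stands for the software carrier bit (CARR_ON) that case 1 below refers to.

```c
/* Hypothetical model of the per-line state that matters here. */
struct line {
	int	carr_on;	/* software carrier bit (CARR_ON) */
	int	nohang;		/* LNOHANG bit */
	int	dtr;		/* DTR asserted towards the modem */
	int	hups;		/* SIGHUPs sent so far, for illustration */
};

/* One scan of a line; real_carrier is what the modem reports now. */
void
scan_line(struct line *lp, int real_carrier)
{
	if (real_carrier) {
		lp->carr_on = 1;
	} else if (lp->carr_on && !lp->nohang) {
		lp->hups++;		/* SIGHUP to the jobs on the line */
		lp->dtr = 0;		/* hang up the phone */
		lp->carr_on = 0;
	}
	/* nohang set: carr_on stays set, so clearing nohang later fires */
}
```

	In this model a line with nohang set survives loss of carrier, is hung
up on the first scan after the bit is cleared, and is never hung up if the
software carrier bit was already clear.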

	The following cases are of interest:

1)  A user not using call-waiting, who does not set "stty nohang".  The line
behaves normally, and the computer hangs up only when carrier drops i.e. when
the user hangs up.  At that point his shell dies if he did not logout, or
getty dies if he did.  Init wakes up and hangs waiting for carrier to allow
the open() to complete.
	However there is a timing hole here.  If the user (with nohang not
set) hangs up prematurely, dzscan() detects loss of carrier, generates a
signal that kills the shell, and drops DTR.  If the user logs out, pauses
while init wakes up, does an open(), and execs to getty, then when s/he
hangs up the signal will again be generated.  However if s/he hangs up
after logging out but before init awakes -- we have seen this happen --
then the signal-generating code in dzscan() will not be executed because it
is dependent on the software bit representing carrier (CARR_ON) which is
cleared by the final close: it will not become sensitive to carrier again
until open() is completed.  The fix to dzopen() closes this timing hole by
causing DTR to be dropped, ensuring the phone is hung up, on all attempts
to open the line in the absence of real carrier.

2)  The user who sets "stty nohang" and logs out leaving background jobs.
Final close will not occur so "stty hup" would be ineffective.  Init will
complete open() without trouble, since other processes have the line open,
getty clears the nohang bit and the line is hung up then or when the user
hangs up, whichever is later.

3)  A user who sets "stty nohang", and logs out normally without leaving
background jobs.  If s/he also sets "stty hup" the final close will hang up
the phone;  otherwise this case becomes the same as (1) above -- the
changed code in dzopen() is indifferent to the nohang bit, while the latter
will be reset whenever the open() completes and getty is entered.  Thus if
the user hangs up late getty will be entered normally and the phone will be
hung up when the user hangs up; if s/he hangs up early, dzopen() will hang
up the phone.

4)  A user who sets "stty nohang" and does not log out.  This is the case where
the user uses call-waiting to put the dial-in line on hold.  The job persists
and the phone line is not hung up.  If this lasts longer than the daemon's
threshold, the daemon clears the line's nohang bit thus hanging up the phone
and killing the job.

5)  As in (4) but the user has accidentally hung up.  The daemon would kill the
shell eventually, but the user logs in on another line and kills it by hand
with kill.  If there are background jobs on that line, getty will wake and
hang the phone up.  If the user had set "stty hup", the final close will hang
the phone up.  However otherwise the line is not hung up until dzopen() drops
DTR.  This is a second motive besides the timing hole for making the fix to
dzopen().

	In all of the above, a substitute for getty resetting the nohang bit is
the daemon's detection of a getty on a hung phone line, after a short threshold
(3 minutes).  dzopen() will drop DTR and therefore hang up the phone whenever
an initial open() (by init) is attempted in the absence of carrier: this
eliminates the timing hole in which the signal-generating code is not executed
when carrier is lost.

PROBLEMS
1)  The dzclose() routine drops DTR (if stty hup enabled) *before* completing
the flushing of the dz hardware buffer, so you can lose the last few chars
put out to the home terminal.  This only happens with the old tty line
discipline, so we don't see it when logging out but we do if we use
logins that just run an information program, such as "w".
	This is a bug in the dz handler (and other tty handlers).

2)  An open() that was hanging waiting for carrier, and is aborted by the
caller (using alarm()), has nevertheless used up one of the process's I/O
channels.  Such programs must close() the fildes even though open() never
returned its number, i.e. you must calculate the number yourself.
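
One way to calculate the number, sketched below, is to probe for it just
before the open().  This assumes descriptors are handed out lowest-free-first,
that descriptor 0 is open, and that nothing else opens or closes a file
between the probe and the open().  The name next_fd() is hypothetical.

```c
#include <unistd.h>

/*
 * Returns the descriptor number the next open() should be given,
 * or -1 on error.
 */
int
next_fd(void)
{
	int fd = dup(0);	/* takes the lowest free descriptor */

	if (fd >= 0)
		close(fd);	/* free it again for the open() to use */
	return fd;
}
```

A program would call next_fd() before setting the alarm, and close() the
returned number if the open() is aborted.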

3)  open() will not complete without carrier (if there are no existing
opens on that device).  Yet you need an open() to issue the ioctl to drop
DTR and hang up the line.  This means it is possible for a line to be hung
uselessly and not to be able to reset this by software outside the handler
-- hence in part the need to fix dzopen() -- even though the operation makes
sense.  (This occurs if DTR was not dropped on final close(), as a "stty
hup" call would have caused, and the shell died in some way other than by
a SIGHUP signal, which is always accompanied by dropping DTR.)

4)  The HUPCLS ioctl call is not what is wanted.  It can only hang the
phone up on final close, and if there are background jobs this will not
occur.  Because of the timing hole, it is not guaranteed that the phone
will be hung up on final close unless ALL users set it, not just those who
set "stty nohang".  And that is slightly undesirable because it results in
the phone being hung up from the computer end even if there is still
carrier i.e. with the user still dialled in -- possibly wanting to login
under another name.  This is especially unfriendly if it is set early
before login -- then any typing error during login causes the phone to be
hung up so the user has to re-dial (login dies, final close is done prior
to init rolling round to getty and then login again).  This is presumably
why that HUPCLS setting is commented out in dzopen().

4b)  The stty option for this, "stty hup", is undocumented, undisplayed
even if you do "stty everything", and irreversible.

Mark Wallen		  	 Steve Draper
UCSD, San Diego			 UCSD, San Diego
ucbvax!sdcsvax!sdcsla!wallen	 ucbvax!sdcsvax!sdcsla!draper
wallen@nprdc			 draper@nprdc