[comp.sys.sun] Pty problem on sun4's under SunOs 4

idall@augean.ua.OZ.AU (Ian Dall) (03/08/90)

We are seeing an interesting problem on our Sun 4's running SunOS 4.0.3.
Sometimes pty's seem to become stuck in some funny state so that anything
that tries to write to them dies. I see this most often with emacs
sub-processes but sometimes it also prevents rlogins.

The open of the pty must be returning success, as that is how emacs (and I
assume most processes) knows it has found a "free" pty. The subprocess
really does start (as seen from top or ps), but as soon as it does any
output it dies. The pty seems to remain in this state indefinitely, and the
only way to get your rlogin or whatever going is to hope (or arrange!)
that another process grabs that pty and hangs on to it while the
desired process opens the next pty.
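
For readers unfamiliar with the search being described, here is a minimal
sketch of the traditional BSD-style pty probe, written in Python purely for
illustration. It assumes the conventional /dev/pty[p-s][0-9a-f] master names
(the set actually configured on a given SunOS machine may differ).
find_free_pty and its opener parameter are hypothetical names, not anything
taken from emacs or rlogind.

```python
import os
import string

def pty_candidates(banks="pqrs", units=string.digits + "abcdef"):
    """Yield (master, slave) device paths in the order they are usually probed."""
    for bank in banks:
        for unit in units:
            yield f"/dev/pty{bank}{unit}", f"/dev/tty{bank}{unit}"

def find_free_pty(opener=os.open):
    """Return (fd, master, slave) for the first master that opens, else None.

    A successful open is the only evidence that the pty is "free"; nothing
    here checks whether the slave side is still in a bad state.
    """
    for master, slave in pty_candidates():
        try:
            fd = opener(master, os.O_RDWR)
        except OSError:
            continue  # busy or missing: try the next candidate
        return fd, master, slave
    return None
```

Because the probe stops at the first master that opens, a pty stuck in the
bad state gets handed out again and again. That would fit the behaviour
described above: something else has to hold that pty before the search
moves on to the next one.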

As far as we can tell, this is not happening on our Sun 3's, which are also
running SunOS 4. Has anyone else seen this? Is Sun doing anything about it?

 Ian Dall           life (n). A sexually transmitted disease which afflicts
                              some people more severely than others.
idall@augean.oz

tonyr@tekadg.adg.tek.com (Tony Rick) (03/10/90)

In article <5586@brazos.Rice.edu> idall@augean.ua.OZ.AU (Ian Dall) writes:
>X-Sun-Spots-Digest: Volume 9, Issue 73, message 8
>
>We are seeing an interesting problem on our Sun 4's running SunOS 4.0.3.
>Sometimes pty's seem to become stuck in some funny state so that anything
>that tries to write to them dies. I see this most often with emacs
>sub-processes but sometimes it also prevents rlogins....

I've seen this a couple of times in here and on sun-managers.  The problem
shows up here, too.  We have a SUN 4/370 in a server configuration running
4.0.3.  All of the terminals except the console (tvi970 on ttya) are
connected through a Xylogics Annex II ethernet terminal server on the
local net.  It shows up as rlogin refusal with the response "Connection
closed".

The SUN hotline guy I talked to said there was a known bug (ID# 1014706)
causing this.  I would appreciate it if someone could confirm that.  That
bug is supposed to be fixed in 4.1 (according to mr. hotline).  The
workaround I use is this:

o   look at 'who' to see which tty is hung up.  It's usually the first one
    missing in an otherwise continuous list of assigned ttys.  If not, it
    would be the next one in the series.

o   remove the entry for that tty in /etc/ttytab

o   remove the corresponding tty and pty entry in /dev

o   kill -HUP 1 to make init reread /etc/ttytab
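
The first step (spotting the hung tty from the 'who' listing) can be sketched
roughly as follows. This is only an illustration in Python: it assumes the
pty slaves are named ttyp0 through ttysf and are handed out in that order,
and suspect_tty is a hypothetical helper name. You would feed it the tty
column from 'who' by hand.

```python
import string

UNITS = string.digits + "abcdef"  # assumed unit suffixes: 0-9, then a-f

def tty_order(banks="pqrs"):
    """All pty slave names in assumed allocation order: ttyp0 ... ttysf."""
    return [f"tty{b}{u}" for b in banks for u in UNITS]

def suspect_tty(in_use, banks="pqrs"):
    """Guess the hung tty from the tty names listed by who.

    Returns the first name missing below the highest pty in use. If there is
    no gap, returns the next name in the series (None if the series is used up).
    Names that are not ptys, such as console or ttya, are ignored.
    """
    order = tty_order(banks)
    used = {t for t in in_use if t in order}
    if not used:
        return order[0]
    highest = max(order.index(t) for t in used)
    for name in order[:highest]:
        if name not in used:
            return name  # first gap in the otherwise continuous list
    return order[highest + 1] if highest + 1 < len(order) else None
```

For example, suspect_tty(["console", "ttyp0", "ttyp1", "ttyp3"]) picks ttyp2,
and with only ttyp0 and ttyp1 in use it also picks ttyp2 as the next one in
the series. It is a guess, not a diagnosis, so check before removing anything.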

This is a small hassle.  When you reboot, you have to remember to put the
tty/ptys back that you removed.  You will have to reboot eventually, since
this workaround guarantees that you will eventually run out of ttys.

The other fixes are to reboot or, as Ian said, to have some other process
grab the offending pty and hang on to it.

Some sun-managers responded when I sent this out to them.

Brian Parent <bparent@ucsd.edu> said:
>
>Your fix sounds a bit more drastic than what I've found
>to work.  I find out what other processes are running on
>the offending tty, and kill them off.  That's it.

Andy Sherman (andys@ulysses.att.COM) said:
>
>I've got a tip on how to handle restoration without having to strain
>your memory.  I have a file on all my systems called /etc/nextboot.
>When I need for something to happen on the next boot, I put it in there.
>A section in /etc/rc.local runs the file and then clears it.
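
Andy's real version is presumably a few lines of sh in /etc/rc.local, which
I don't have. Just to show the run-once-then-clear idea, here is a rough
Python sketch. run_nextboot is a hypothetical name, and handing the file to
sh as a script is my assumption about how such a file would be used.

```python
import os
import subprocess

def run_nextboot(path="/etc/nextboot"):
    """Run the one-shot commands in path with sh, then empty the file.

    Returns sh's exit status, or None if the file is missing or empty.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return None  # nothing queued for this boot
    status = subprocess.call(["sh", path])
    open(path, "w").close()  # truncate so the commands do not run again
    return status
```

For the workaround above, the file would hold whatever commands put the
removed ttytab line and /dev entries back, so nothing has to be remembered
at reboot time.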

Any @SUN.COM watchers out there?  Is this the right bug ID?  Is there a
fix?  What gives?

Tony Rick
Tektronix, Inc.  Beaverton, OR
Internet: tonyr@tekadg.adg.tek.com
Voice: 503-627-2942