[comp.unix.wizards] pty bugs & features

libes@cme.nist.gov (Don Libes) (08/23/90)

I've recently done some work with ptys and thought I'd share my
experiences with you - especially since the manuals weren't very
sharing for me.  I'm interested in comments.

The Sun 3/60 used is running SunOS4.1
The DecStation 3100 used is running V2.1.14

1) After slave side closes fd, master side read() returns -1.

master reads X seconds after slave close    Sun errno    Dec errno
              0                             5 (EIO)      35 (EWOULDBLOCK)
             10                             5 (EIO)       5 (EIO)
             20                             5 (EIO)       5 (EIO)

I had expected read() to return 0 to indicate EOF.  The Sun engineer
said the manuals are in error not to document this behavior, but could
not explain why the driver was written this way.  Can anyone?

I didn't bother to call Dec, but I couldn't find that behavior
documented either, nor can I rationalize it.  (The fd was NOT marked
no-delay.)
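
A minimal sketch, in C, of how a master-side reader might cope with the
behavior in the table above: it treats both a zero-length read and
-1/EIO as "slave closed", and treats EWOULDBLOCK as "try again".  The
names pty_classify_read and pty_drain_master are hypothetical, not from
any system header, and the 100 ms retry pause is an arbitrary choice.

```c
#include <errno.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names for illustration; not part of any system API. */
enum pty_read_status { PTY_DATA, PTY_CLOSED, PTY_RETRY, PTY_ERROR };

/* Interpret the result of read() on a pty master.
 * n   - value returned by read()
 * err - errno captured right after the call (only used when n is -1)
 */
enum pty_read_status pty_classify_read(ssize_t n, int err)
{
    if (n > 0)
        return PTY_DATA;
    if (n == 0)
        return PTY_CLOSED;      /* conventional EOF */
    if (err == EIO)
        return PTY_CLOSED;      /* slave side closed, as in the table */
    if (err == EINTR || err == EWOULDBLOCK || err == EAGAIN)
        return PTY_RETRY;       /* transient; e.g. the Dec result at 0 seconds */
    return PTY_ERROR;
}

/* Read from master_fd until the slave side appears closed.
 * Returns the total number of bytes read, or -1 on a hard error.
 * The data itself is discarded; a real program would process buf.
 */
long pty_drain_master(int master_fd)
{
    char buf[1024];
    long total = 0;
    struct timespec pause = { 0, 100000000L };  /* 100 ms, arbitrary */

    for (;;) {
        ssize_t n = read(master_fd, buf, sizeof buf);
        int err = errno;

        switch (pty_classify_read(n, err)) {
        case PTY_DATA:
            total += n;
            break;
        case PTY_CLOSED:
            return total;
        case PTY_RETRY:
            nanosleep(&pause, NULL);    /* avoid spinning on a non-blocking fd */
            break;
        case PTY_ERROR:
            return -1;
        }
    }
}
```

The same loop works on any descriptor, which makes it easy to try out
on a pipe before pointing it at a real pty master.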

2) After slave writes data and closes fd, the master side reads:

master reads X seconds after slave close    Sun        Dec
              0                             data       data
             10                             data       data
             20                             -1, EIO    data

The Sun manual actually does document this, but doesn't phrase it
quite the way I'd say it.  Specifically, it is a byproduct of the
underlying streams implementation - "close() waits up to 15 seconds,
for each module and driver, for any output to drain before dismantling
the stream."

In other words, if you don't read your data quickly enough, you lose it!

The Dec behavior is what I would've expected.  The Sun engineer could
not explain where the number 15 came from, although he was kind enough
to point out that Sun Consulting could change it on my machine for a
small fee.  Otherwise it is not user-settable.

He said no one had ever reported these as bugs before.  He added that
they might change their implementation [both (1) and (2)] in the future but
made no guarantees.  He did guarantee to change the manual, however.

Don Libes          libes@cme.nist.gov      ...!uunet!cme-durer!libes

guy@auspex.auspex.com (Guy Harris) (08/24/90)

>The Dec behavior is what I would've expected.  The Sun engineer could
>not explain where the number 15 came from,

It comes from the AT&T S5R3.x source code.  Where it got the number, I
don't know.

libes@cme.nist.gov (Don Libes) (08/24/90)

In article <3948@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>The Dec behavior is what I would've expected.  The Sun engineer could
>>not explain where the number 15 came from,
>
>It comes from the AT&T S5R3.x source code.  Where it got the number, I
>don't know.

Uhhh, this wasn't quite the answer I was looking for.  Let me rephrase
the question(s):

Why do pty's return EIO instead of 0 upon EOF?
Is SunOS, Ultrix, or neither doing the "right" thing?
If Ultrix is BSD-based and SunOS is SV-based, obviously this EIO
behavior is common practice, yet I don't find any documentation on it,
nor could a company engineer explain it.  What's the rationale?

Why do stream's dump their buffers when the writer closes?  I would
think this could be a problem with other drivers besides ptys.  Or am
I confused and is this just an error in the way the pty driver uses
streams?  Is it possible to change this behavior using a flow control
option without taking a severe hit in efficiency?  (The manual alluded
to this but didn't give enough to go on.)

What is it you are expected to do in 15 seconds?  It sure seems
unusually large for internal system cleanup purposes.  Yet it is
worthless for user purposes if you can't make it infinite.  Why can't
you change this number?

And finally, why does everyone answer easy questions with "[long essay
deleted] and this should probably be in the FAQ if it isn't already"?
That indicates to me that neither the asker nor the answerer has read
the FAQ.  (This is a rhetorical question at the moment since the FAQ
has slipped off the face of the earth.  Hey, repost that sucker!)

Don Libes          libes@cme.nist.gov      ...!uunet!cme-durer!libes

guy@auspex.auspex.com (Guy Harris) (08/26/90)

>Why do pty's return EIO instead of 0 upon EOF?

Ask Berkeley.  The standard 4.3BSD pseudo-tty driver returns EIO if
nobody's holding the slave side open. 

>Is SunOS, Ultrix, or neither doing the "right" thing?

Beats me, what's "the 'right' thing?"

A program can probably deal with any sort of indication that the slave
side is closed, either a zero-length read or -1 and EIO.

If 4.3BSD compatibility is considered important, returning -1 and
setting "errno" to EIO is "the 'right' thing."

If one considers a zero-length read to be philosophically correct, or if
it simplifies programs that run on the master side of a pseudo-tty, or
something like that, returning -1 and setting "errno" to EIO isn't "the
'right' thing."

>If Ultrix is BSD-based and SunOS is SV-based,

SunOS is based on both BSD and S5, and also includes Sun stuff based on
neither of them.

SunOS's pseudo-tty driver is more based on the BSD one than on the S5
one, the fact that the slave side (but not the master side) in SunOS 4.x
is STREAMS-based notwithstanding.

The S5R4 pseudo-tty subsystem (it's more than just a driver, it includes
a couple of STREAMS modules) returns an EOF (zero-length read)
when the slave side closes, rather than returning EIO.

>obviously this EIO behavior is common practice, yet I don't find any
>documentation on it, nor could a company engineer explain it.  What's
>the rationale?

Ask Berkeley, it was their idea.  We preserved it at Sun, and I assume
DEC did the same thing.  Other vendors probably did so as well.

>Why do stream's dump their buffers when the writer closes?

It has to do *something* with them when the queue is deleted, since
they're attached to that queue....

>I would think this could be a problem with other drivers besides ptys.

It could be.  Unfortunately, just about *any* close behavior is going to
screw *somebody*.

Waiting forever for output to drain can lock up a tty port forever if it
gets ^S'ed and there's output waiting.

Un-^S-ing when the port is closed screws terminals that depend on strict ^S/^Q
behavior (yes, this actually happened).  (System V Release "1"-to-3
behavior.)

Giving the port a finite amount of time to drain and then flushing
output means you can lose output if the port stays ^S'ed for too long. 
(SunOS 4.0[.x] and maybe S5R4 behavior; also S5R3 behavior if the vendor
has made any streams-based ttys.)

The ideal, at least for tty ports, is *probably* to wait until ^Q is
received or "carrier" goes away (real carrier in the case of serial
ports; on pseudo-ttys, wait until ^Q is received or the process on the
master side goes away), but I don't guarantee that this won't screw
anybody, either.  (This is, I think, what 4.3BSD does.)

>Or am I confused and is this just an error in the way the pty driver uses
>streams?  Is it possible to change this behavior using a flow control
>option without taking a severe hit in efficiency? (The manual alluded
>to this but didn't give enough to go on.)

I'm not sure where it alludes to this, nor why it does, nor what it
means by "a flow control option".

You can tweak processes on the slave side to wait for output to drain
using the TCSBRK "ioctl", but this means you have to change those
processes.
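
A minimal sketch of what such a change to a slave-side process might
look like.  It uses the POSIX termios call tcdrain() rather than the
raw TCSBRK "ioctl"; treating the two as equivalent is an assumption
here, and the helper name close_after_drain is hypothetical.  How long
the drain blocks depends on the driver.

```c
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: wait for output already written to a tty or
 * pty slave to be transmitted, then close the descriptor.
 * Returns 0 on success.  On failure returns -1 with errno describing
 * the drain error; the descriptor is closed in either case.
 */
int close_after_drain(int fd)
{
    /* tcdrain() blocks until all output written to fd has been sent.
     * Retry if a signal interrupts the wait. */
    while (tcdrain(fd) == -1) {
        if (errno != EINTR) {
            int saved = errno;      /* close() may overwrite errno */
            close(fd);
            errno = saved;
            return -1;
        }
    }
    return close(fd);
}
```

Calling close_after_drain(slave_fd) in place of close(slave_fd) is the
whole change, but it has to be made in every program that writes to
the slave.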

As I remember, we decided to change SunOS 4.1 to, in effect, wait
forever for output to drain, by having the "ldterm" streams module do
said "ioctl" internally for you as part of its "close" procedure, before
its queue gets destroyed and before any of the queues below it get
closed.

>What is it you are expected to do in 15 seconds?  It sure seems
>unusually large for internal system cleanup purposes.

It's not for internal system cleanup purposes; it's waiting for output
to drain.

>Yet it is worthless for user purposes if you can't make it infinite.  Why
>can't you change this number?

Ask AT&T, it was their idea.  I think I tried to sell them on having an
"ioctl" to change it at one point.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (08/27/90)

In article <3954@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
> >I would think this could be a problem with other drivers besides ptys.
> It could be.  Unfortunately, just about *any* close behavior is going to
> screw *somebody*.

Not necessarily.

> Waiting forever for output to drain can lock up a tty port forever if it
> gets ^S'ed and there's output waiting.

This is the correct behavior. The difficulties with locking up tty ports
are reflections of two different problems: first, that ptys aren't
dynamically allocated in 4BSD; and second, that standard ttys exist at
all. Hardwired /dev/tty* should be replaced with raw /dev/modem* and so
on; *all* tty use should go through a common interface provided by a
pseudo-terminal session manager. This would solve many problems at once.

---Dan

boyd@necisa.ho.necisa.oz (Boyd Roberts) (08/27/90)

In article <3954@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>
>It could be.  Unfortunately, just about *any* close behavior is going to
>screw *somebody*.
>

This is the _classic_ `virtual circuit problem'.  The problem of
deciding what is circuit shutdown [error] and what is end of data,
and which is appropriate.  You've got to make _all_ the right choices,
and some of them are _hard_.

The way I like to think about it is the way pipes work.  A close
on a pipe indicates EOF to the reader.  But, a write on a pipe
with no one to read it is an error (SIGPIPE/EPIPE).  But, to
generalise this correctly you need to be able to say `kill this
circuit for me because an error's occurred', so that one
end can say to the other that something's up.
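
The pipe behavior described above can be checked with a short C
sketch.  The two function names are hypothetical; SIGPIPE is ignored
for the duration of the write so that the error shows up as EPIPE
instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns what read() reports once the only writer has closed its
 * end: 0 means EOF.  Returns -1 if the pipe could not be created. */
ssize_t read_after_writer_close(void)
{
    int p[2];
    char c;
    ssize_t n;

    if (pipe(p) == -1)
        return -1;
    close(p[1]);                    /* writer goes away */
    n = read(p[0], &c, 1);          /* reader sees end of data */
    close(p[0]);
    return n;
}

/* Returns the errno produced by write() once the only reader has
 * closed its end, or 0 if the write unexpectedly succeeded. */
int write_errno_after_reader_close(void)
{
    int p[2];
    int err;
    ssize_t n;
    void (*old)(int);

    if (pipe(p) == -1)
        return errno;
    old = signal(SIGPIPE, SIG_IGN); /* default action would kill us */
    close(p[0]);                    /* reader goes away */
    n = write(p[1], "x", 1);
    err = (n == -1) ? errno : 0;
    close(p[1]);
    signal(SIGPIPE, old);           /* restore previous disposition */
    return err;
}
```

The first function should return 0 (EOF) and the second EPIPE, which
is the asymmetry between "end of data" and "circuit error" that pipes
already provide.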

I say that each protocol layer should be self contained and _clean_.

Now, the ISO people are not going to like this, but with virtual
circuits you require two ways to shutdown a circuit at the protocol
level itself, and not make it the responsibility of the layer above.

I remember all too well the existential horror when I realised (while
writing this X.25 `spool across the wire' print server/client) that
when I said close() it shut the circuit down -- _right now_!!  No
waiting for the data to arrive at the other end -- nothing.  I had
to write this _revolting_ gore, using the Q bit to say:

    X.25 software ABC                   X.25 software DEF

    Client: write(You got that?|Q_BIT)
                                        Server: read() You got that?|Q_BIT
                                        Server: write(I got it|Q_BIT)
                                        Server: hangup circuit after `I got it'
                                                is delivered
    Client: read() I got it|Q_BIT
    Client: close()                     Server: close()
    Client: exit()                      Server: Loop

I didn't want to write any file transfer protocol -- why should I?  After
all, I was using a reliable, sequenced, unduplicated, connection based
virtual circuit.  I just wanted close() to block correctly and for a
subsequent server read() to return 0.  But, X.25 software ABC had an
`interesting' idea about virtual circuits.

So I got to thinking that this was just _wrong_ and that Dennis* did it right.

Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets weird, the weird turn pro...''

* pure, vanilla, no foul gore, straight streaming V8 stream code

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/27/90)

In article <6038@muffin.cme.nist.gov> libes@cme.nist.gov (Don Libes) writes:
>Why do pty's return EIO instead of 0 upon EOF?

If they do this, it is clearly wrong and would most likely be due to
UNIX development now being done, or at least directed, by people who
don't understand UNIX.  read() should not return -1 upon encountering
normal EOF on ANY object.  If the end of the stream is due to a
communication link failure, for example, then an error indication
would in fact be preferable to an undifferentiated EOF indication.

>Why do stream's dump their buffers when the writer closes?

They're not supposed to; do they really do that?  Is USO really
screwing up UNIX to such an extent??

>What is it you are expected to do in 15 seconds?

There is no justification for penalizing an application for not
consuming all the data buffered in the kernel within 15 seconds.

I lost track of the origin of this thread; if this timeout is supposed
to be related to the TCP protocol, my guess would be that somebody has
yet again tripped up on the "FIN_WAIT_2" issue.

>And finally, why does everyone answer easy questions with "[long essay
>deleted] and this should probably be in the FAQ if it isn't already"?
>That indicates to me that neither the asker nor the answerer has read
>the FAQ.

Not really, usually it indicates that the responder has not memorized
the FAQ list but feels that the question should be there and that either
the asker has failed to read the FAQ list or else the question wasn't
there.  (Or, as in your case,

>... at the moment ... the FAQ has slipped off the face of the earth.

)

guy@auspex.auspex.com (Guy Harris) (08/28/90)

>>Why do stream's dump their buffers when the writer closes?
>
>They're not supposed to; do they really do that?

If by "dump their buffers" the original poster meant "throws data away
if it isn't sent downstream in 15 seconds", the answer is "yes, they
really do that".

guy@auspex.auspex.com (Guy Harris) (08/28/90)

>This is the correct behavior. The difficulties with locking up tty ports
>are reflections of two different problems: first, that ptys aren't
>dynamically allocated in 4BSD; and second, that standard ttys exist at
>all. Hardwired /dev/tty* should be replaced with raw /dev/modem* and so
>on; *all* tty use should go through a common interface provided by a
>pseudo-terminal session manager.

Even "ttys", i.e.  serial ports, to which, say, a printer or plotter is
attached?

What happens if, for whatever reason, a ^Q sent by said printer or
plotter is lost?  Is the idea that you detach the printer from the
session, attach the session to a regular terminal, and type ^Q at it?

stevea@i88.isc.com (Steve Alexander) (08/28/90)

In article <3954@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>Yet it is worthless for user purposes if you can't make it infinite.  Why
>>can't you change this number?
>
>Ask AT&T, it was their idea.  I think I tried to sell them on having an
>"ioctl" to change it at one point.

System V Release 4.0 has the I_SETCLTIME ioctl, which allows one to change
the close wait time on the stream.  I believe that the time is specified
in milliseconds.  There is also I_GETCLTIME which does what you'd expect.
I guess Guy should move into sales...

--
Steve Alexander, Software Technologies Group    | stevea@i88.isc.com
INTERACTIVE Systems Corporation, Naperville, IL | ...!{sun,ico}!laidbak!stevea

les@chinet.chi.il.us (Leslie Mikesell) (08/29/90)

In article <13650@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>In article <6038@muffin.cme.nist.gov> libes@cme.nist.gov (Don Libes) writes:
>>Why do pty's return EIO instead of 0 upon EOF?

>If they do this, it is clearly wrong and would most likely be due to
>UNIX development now being done, or at least directed, by people who
>don't understand UNIX.  read() should not return -1 upon encountering
>normal EOF on ANY object.

Is this meant to imply that the developers of STREAMS don't understand
unix?  A read on a STREAMS file is documented to return -1 when O_NDELAY
is set and there is no data available (which has unfortunately been
propagated into the tty emulation of at least some network implementations).
Apparently there is some reason to want to know about zero length
messages.

Les Mikesell
  les@chinet.chi.il.us

les@chinet.chi.il.us (Leslie Mikesell) (08/29/90)

In article <3964@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>Even "ttys", i.e.  serial ports, to which, say, a printer or plotter is
>attached?

>What happens if, for whatever reason, a ^Q sent by said printer or
>plotter is lost?  Is the idea that you detach the printer from the
>session, attach the session to a regular terminal, and type ^Q at it?

Most printers will supply a ^Q when powered up, when the lid is closed,
when the on-line button is pressed, etc.   I'd prefer for the computer
to wait for such an occurrence rather than trying to guess when the
paper supply has been replenished.  The real problem is when you have
placed a long distance call to or from a modem on a unix machine and
pick up a ^S from line noise.  I've even seen cases where the device
driver would lock up so that even a kill -9 wouldn't release the
process and there was no way to drop the call without physical access to
the modem.

Les Mikesell
  les@chinet.chi.il.us

thorinn@skinfaxe.diku.dk (Lars Henrik Mathiesen) (09/09/90)

In article <6038@muffin.cme.nist.gov> libes@cme.nist.gov (Don Libes) writes:
>Why do pty's return EIO instead of 0 upon EOF?

In my opinion, there is no such thing as an EOF when reading from a
(master) pty. After all, the pty is designed to let a daemon, script
program or similar pretend that it is on the outside of the machine
looking in through a serial interface, receiving exactly the same
bytes as would be passed over a serial line. And, barring wire-cutters
and over-voltage, a serial line is very open-ended and EOF-free.

However, a serial interface may have some control lines, such as DTR
or RTS. These will typically be asserted when the corresponding UNIX
device file is opened; they may be negated when it is closed (but only
if the HUPCL flag is set, I think). So, apart from the HUPCL
business, the EIO error on a pty master corresponds to ``DTR not
asserted'', not to EOF.

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk