[comp.unix.internals] problems with TCSETAF and rlogin

andy@xwkg.Icom.Com (Andrew H. Marrinson) (11/03/90)

Hello,

I originally found this problem using nn when logged in via rlogin
from one Interactive Unix 2.0.2 box to another.  The symptom was
missing output in the nn menu.

Further investigation revealed that this was due to a bug which was
exercised whenever nn (or any other program) used TCSETAF under
rlogin.  The manual page for TCSETAF states that it ``waits for output
to drain, then flushes the input, then sets the parameters''.
However, when using rlogin it seems that the output gets flushed as
well.
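
For illustration, the three steps the manual page lists can be spelled
out one at a time with the POSIX termios calls.  This is only a sketch
of the documented semantics, not what any kernel does internally, and
the function name set_attrs_like_tcsetaf is made up for the example.
Note that nothing in it touches the output queue except to wait for it.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): performs the three steps
 * the manual page lists for TCSETAF, one call at a time.
 * Returns 0 on success, -1 on the first failure (errno is set). */
int set_attrs_like_tcsetaf(int fd, const struct termios *t)
{
    assert(t != NULL);
    if (tcdrain(fd) < 0)              /* 1. wait for pending output to drain */
        return -1;
    if (tcflush(fd, TCIFLUSH) < 0)    /* 2. discard unread input only */
        return -1;
    return tcsetattr(fd, TCSANOW, t); /* 3. set the parameters */
}
```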

In further communication with other nn users, this bug has been observed
on several systems that combine System V or POSIX style termio and BSD
style networking.  Anybody who is (or knows someone who is)
maintaining such a system should definitely look into this to see
whether it affects their kernel.

Essentially, what happens is this: a program outputs some data to a
pseudo-tty.  Rlogind reads that data and sends it to TCP, which begins
assembling it into a packet.  Then the program does a TCSETAF.
Evidently, this results in a pseudo-tty control packet containing
TIOCPKT_FLUSHREAD|TIOCPKT_FLUSHWRITE (03H on most systems).  (This
would seem to be the bug right here -- it appears to be in the pty
driver.)  Rlogind sends that using the MSG_OOB flag to send(2).  The
03H gets appended to the packet being constructed from the previous
(normal) output and the urgent pointer is pointed at it.  The packet
then gets sent to the client rlogin process looking like this:

	<TCP HEADER><NORMAL OUTPUT FROM NN><03H>
                                            ^
	URGENT POINTER POINTS HERE----------+

The receiving rlogin client then either ignores the normal output
because of the urgent pointer or flushes it because that's what the
03H says to do (I can't remember exactly how OOB data works).  Either
way, output disappears that shouldn't have.
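
To make the sequence concrete, here is a small model of the two
decisions involved.  It is a sketch only: the function names are
invented, the bit values are the usual BSD ones written out locally
rather than taken from a system header, the server half is simplified
to "any nonzero first byte is control", and the client half assumes
that a flush-write bit makes the client discard everything queued ahead
of the urgent mark.  None of this has been checked against a particular
rlogin or rlogind.

```c
#include <assert.h>
#include <stddef.h>

/* Usual BSD packet-mode bit values, defined locally for the sketch. */
enum {
    PKT_DATA       = 0x00,  /* first byte 0: rest of the read is data */
    PKT_FLUSHREAD  = 0x01,
    PKT_FLUSHWRITE = 0x02
};

/* Server side (hypothetical name): in packet mode the first byte of
 * each read from the pty master says what the read contains.
 * Returns 1 if it is control status (a candidate for MSG_OOB),
 * 0 if it introduces ordinary data. */
int is_control_packet(const unsigned char *pkt, size_t len)
{
    assert(pkt != NULL && len > 0);
    return pkt[0] != PKT_DATA;
}

/* Client side (hypothetical name, assumed behaviour): given the control
 * byte and the number of output bytes still queued ahead of the urgent
 * mark, return how many of those bytes get thrown away. */
size_t output_lost(unsigned char control, size_t queued_before_mark)
{
    return (control & PKT_FLUSHWRITE) ? queued_before_mark : 0;
}
```

In this model, output_lost(0x03, n) is n: with the 03H byte described
above, all n bytes of normal output sharing the segment are dropped.
Had the byte carried only the flush-read bit (01H), the model loses
nothing.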

As I mentioned above, I believe that the bug lies in the pty driver,
which should not flush the output in this situation.  What it should
do is open to conjecture; I'm not sure there is any way to match
exactly the semantics of TCSETAF using the pty packet protocol, but
what it does now clearly loses big.

I urge everyone maintaining a system with the combination of BSD
pseudo-ttys and System V/POSIX termio(s) to check their implementation
for this bug.  Below is a short test program that can be used to do
this.  It prints some data, does a TCSETAF, then waits for a keypress.
Because TCP may sometimes have already sent the packet containing the
data when the flush is received, it doesn't happen every time, so run
the program several times.  If even once it waits for a keypress
without outputting anything, you have the bug!

If you want more information on this, please don't hesitate to email
(andy@icom.icom.com) and I'll try to help you out.

BEGIN TEST PROGRAM
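#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <termio.h>

int main (void)
{
  static struct termio buf;
  static char string[] = "Andy was here.  Let's make this really long\n\
and put a lot of separate lines in it.  This will more or less\n\
simulate what nn is doing when it screws up...\n";

  /* Fetch the current settings so the TCSETAF below changes nothing. */
  if (ioctl (0, TCGETA, &buf) < 0)
    {
      perror ("TCGETA");
      return 1;
    }
  /* sizeof - 1: don't write the terminating NUL. */
  write (1, string, sizeof (string) - 1);
  /* Should drain the output and flush only the input. */
  ioctl (0, TCSETAF, &buf);
  /* If this waits with nothing on the screen, you have the bug. */
  getchar ();
  return 0;
}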
END TEST PROGRAM
--
		Andrew H. Marrinson
		Icom Systems, Inc.
		Wheeling, IL, USA
		(andy@icom.icom.com)

dougm@ico.isc.com (Doug McCallum) (11/04/90)

In article <andy.657571549@xwkg> andy@xwkg.Icom.Com (Andrew H. Marrinson) writes:
...
>Further investigation revealed that this was due to a bug which was
>exercised whenever nn (or any other program) used TCSETAF under
>rlogin.  The manual page for TCSETAF states that it ``waits for output
>to drain, then flushes the input, then sets the parameters''.
>However, when using rlogin it seems that the output gets flushed as
>well.
...
>I urge everyone maintaining a system with the combination of BSD
>pseudo-ttys and System V/POSIX termio(s) to check their implementation
>for this bug.  

I did a little investigation and the problem is not in the pty driver
but in the generic tty code.  When the ioctl handling in tty.c sees a
TCSETAF, it calls ttyflush with both the read and write flush bits set.
Our pty driver doesn't implement TCSETAF itself; it depends on the
System V one.  A ttywait is done first, but on a pty that only means
that the data has been read by the server on the pty master side.  The
next read will see the flush for both read and write sides.  The
additional steps involved in getting the data across the network give
the flush time to catch up with the data, which is then lost.

I would expect this problem to occur on all System V's regardless of the
origin of the pty driver unless the tty code is fixed.  Because of the
ttywait before the flush, this doesn't show up on a real tty.
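
The behaviour described above, and one possible correction, can be
modelled in a few lines.  This is a stand-alone sketch, not kernel
source: the command and flag names are invented stand-ins for the real
ioctl commands and the read/write flush bits, and the "proposed"
variant is only an assumption about what a repair might look like
(flush the read side alone after the wait).

```c
#include <assert.h>

/* Invented stand-ins for the real ioctl commands and flush bits. */
enum model_cmd   { MODEL_TCSETA, MODEL_TCSETAW, MODEL_TCSETAF };
enum model_flush { MODEL_FREAD = 1, MODEL_FWRITE = 2 };

/* Flush bits handed to the flush routine as described in this reply:
 * TCSETAF flushes both directions, the other commands flush nothing. */
int flush_bits_observed(enum model_cmd cmd)
{
    assert(cmd >= MODEL_TCSETA && cmd <= MODEL_TCSETAF);
    return cmd == MODEL_TCSETAF ? (MODEL_FREAD | MODEL_FWRITE) : 0;
}

/* Assumed repair: after the wait, TCSETAF flushes the read side only,
 * so output already handed to the pty master is left alone. */
int flush_bits_proposed(enum model_cmd cmd)
{
    assert(cmd >= MODEL_TCSETA && cmd <= MODEL_TCSETAF);
    return cmd == MODEL_TCSETAF ? MODEL_FREAD : 0;
}
```

In the model, flush_bits_observed(MODEL_TCSETAF) is 3, which matches
the 03H control byte seen on the pty master side, while
flush_bits_proposed(MODEL_TCSETAF) is 1.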

Doug McCallum
Interactive Systems Corp
dougm@ico.isc.com