[net.bugs.4bsd] Serious telnetd/rlogin/telnet performance bug

stuart@quest.UUCP (Stuart Levy ) (04/15/86)

(This message was sent a few weeks back, but probably got lost.)


I think the main problem with telnetd is not inefficiencies
in the kernel interface, pty code, and so on.
Any telnetd which uses select() and non-blocking I/O, as 4.2 and presumably
4.1c telnetd do, faces a serious performance problem.
So also do rlogin and user telnet on at least 4.2.
So, for that matter, will any program which uses select() to determine
when to attempt non-blocking write()'s.

To demonstrate, run a program that generates a fair amount of output
(the prototype was od /vmunix), XOFF your user telnet terminal
and watch telnetd's CPU time increase.

Select() indicates that a file descriptor can be written to if ANY
buffer space is available; but a write() with the
non-blocking mode set will return EWOULDBLOCK unless there is enough room
for the entire write buffer.   So, if the user telnet gets behind
the daemon, telnetd goes into a full-speed select()/write()/select()...
loop, being told by select that it can write and by write that it can't.

Several solutions are possible.  Here's one.

Change the code surrounding the select() call from something like

	/* If net output buffer nonempty, or pseudo-tty input buffer nonempty,
	 * we should write to net; otherwise read from pty.
	 */
	if(nfrontp - nbackp || pcc > 0)
		obits |= (1 << net);
	else
		ibits |= (1 << pty);
	/* Likewise in the other direction.
	 */
	if(pfrontp - pbackp || ncc > 0)
		obits |= (1 << pty);
	else
		ibits |= (1 << net);

	... select(16, &ibits, &obits, (int *)0, (struct timeval *)0);

	...
	if(obits & (1 << net) && nfrontp - nbackp)
		netflush();
	if(obits & (1 << pty) && pfrontp - pbackp)
		ptyflush();

The existing code is something like that, with variable names changed a bit
The variable names vary a bit from one program (telnetd, telnet, rlogin)
to another.  Anyway, change it to:

	static struct timeval awhile = { 1, 0 };	/* 1-second limit */
	register struct timeval *timelimit;

	timelimit = (struct timeval *) 0;

	/* Timeout is infinite unless we're unable to select() on input
	 * from some direction.  In that case, limit the maximum delay,
	 * retrying blocked write()'s when the time is up.
	 * Don't try to select on output, as it's unreliable.
	 *
	 * Always look for input unless the -input- buffer is clogged
	 * (presumably because the matching output buffer is clogged too).
	 */
	timelimit = (struct timeval *) 0;

	if(pcc >= 0)
		timelimit = &awhile;	/* Can't take input, don't wait forever */
	else
		ibits |= (1 << pty);
	if(ncc > 0)
		timelimit = &awhile;
	else
		ibits |= (1 << net);

	... select(16, &ibits, (int *)0, (int *)0, timelimit);

	/* Don't worry about obits, just try writing.  If it fails
	 * due to blocking, we'll retry later.
	 */
	if(nfrontp - nbackp)
		netflush();
	if(pfrontp - pbackp)
		ptyflush();

This arrangement also halves the number of select()s that need to be done
to transfer data -- the original required one select per read and one per
write, while this arrangement normally gets away with one select per
transferred packet.


The REAL way to fix this problem would be to redefine non-blocking
I/O, I think, to make it possible to indicate a partial read or write
accompanied by an error, such as EWOULDBLOCK or EINTR.
Then the problematic case of a terminal read interrupted by a signal,
where some data had already been transferred, and of this non-blocking
write where some but not all data could be transferred,
could be handled neatly.

				Stuart Levy
				UUCP: ...!ihnp4!stolaf!umn-cs!quest!stuart
				ARPA: <slevy@umn-rei-uc>
				AT&T: (612) 638-0502

thomson@uthub.UUCP (Brian Thomson) (04/17/86)

From Stuart Levy:
> Select() indicates that a file descriptor can be written to if ANY
> buffer space is available; but a write() with the
> non-blocking mode set will return EWOULDBLOCK unless there is enough room
> for the entire write buffer.  
[...]
> The REAL way to fix this problem would be to redefine non-blocking
> I/O, I think, to make it possible to indicate a partial read or write
> accompanied by an error, such as EWOULDBLOCK or EINTR.

I submitted such a fix in December 1983.  It has, I believe, been adopted
in 4.3; in any case, we have been running happily with partial reads/writes
for over three years.  Check your Mt. Xinu buglist, or send me a note if
you want a copy of the fix.
I also have a performance fix for pseudo-ttys that originally went out
in June 1984.  You may want that, too.
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    {linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!uthub!thomson

steveh@hammer.UUCP (Stephen Hemminger) (04/18/86)

In article <380@quest.UUCP> stuart@quest.UUCP (Stuart Levy ) writes:
>
>I think the main problem with telnetd is not inefficiencies
>in the kernel interface, pty code, and so on.
>Any telnetd which uses select() and non-blocking I/O, as 4.2 and presumably
>4.1c telnetd do, faces a serious performance problem.
>So also do rlogin and user telnet on at least 4.2.
>So, for that matter, will any program which uses select() to determine
>when to attempt non-blocking write()'s.
>
>The REAL way to fix this problem would be to redefine non-blocking
>I/O, I think, to make it possible to indicate a partial read or write
>accompanied by an error, such as EWOULDBLOCK or EINTR.
>Then the problematic case of a terminal read interrupted by a signal,
>where some data had already been transferred, and of this non-blocking
>write where some but not all data could be transferred,
>could be handled neatly.

If you make the simple change (done from 4.2 to 4.3) all these problems
go away.

In <sys/socketvar.h>
Wrong (4.2)
-------
/* do we have to send all at once on a socket? */
#define	sosendallatonce(so) \
    (((so)->so_state & SS_NBIO) || ((so)->so_proto->pr_flags & PR_ATOMIC))
-------
Correct (4.3)
-------
/* do we have to send all at once on a socket? */
#define	sosendallatonce(so) \
    ((so)->so_proto->pr_flags & PR_ATOMIC)
------


Fix this and rebuild your kernel and this problem goes away.

write on a non-blocking socket will do a partial write (unless it is atomic)
-- 
  Stephen Hemminger		{ihnp4,decvax,ucbvax}!tektronix!hammer!steveh
  Tektronix GWD Networking	(503)685-2103