[net.unix-wizards] questions about select and non-blocking I/O

steve@alberta.UUCP (Steve Sutphen) (07/14/85)

I hope I don't loose my wizards wings over this one. I have been
looking at the rather inelegant code for /etc/telnetd (the same
arguments apply to rlogind and likely the user-interfaces) and have a
couple of questions about `select' (there are several bugs in the
server which I am fixing and they will be posted soon). Here is the
skeleton of the relevant code - I have interspersed my questions in
the code preceeded by a > in column 1.

telnet(net, pty)
{

	ioctl(net, FIONBIO, &on);	/* set non-blocking I/O */
	ioctl(pty, FIONBIO, &on);

	for (;;) {
		int ibits = 0, obits = 0;
		register int c;

		/*
		 * Never look for input if there's still
		 * stuff in the corresponding output buffer
		 */
		if (net_queue_not_empty())
			obits |= (1 << net);
		else
			ibits |= (1 << pty);
		if (pty_queue_not_empty())
			obits |= (1 << pty);
		else
			ibits |= (1 << net);
		if (ncc < 0 && pcc < 0)
			break;
		select(16, &ibits, &obits, 0, 0);
		if (ibits == 0 && obits == 0) {
			sleep(5);
			continue;
		}
> This `sleep' seems really silly - select is perfectly capable of
> waiting for one of the events to come ready. Since none of the
> conditions will change before we get here again I would think that a
> select call with a "wait_until_something_is_ready" (i.e. &NULL) would
> have much less overhead (one would have to check the return code for
> interrrupted system call). 

		/* Something to read from the network...  */
		if (ibits & (1 << net)) {
			ncc = read(net, netibuf, BUFSIZ);
			if (ncc < 0 && errno == EWOULDBLOCK)
				ncc = 0;
> doesn't anyone trust `select' to tell the truth? I don't see why we
> need to check for EWOULDBLOCK here since select told us we could do
> the read. Or is there a race condition lurking about.
			else {
				if (ncc <= 0)
					break;
				netip = netibuf;
			}
		}

		/* Something to read from the pty...  */
		if (ibits & (1 << pty)) {
			. . .  (like above) . . .
		}

		if (pcc > 0)
			copy_pty_chars_to_pty_queue();
		if ((obits & (1 << f)) && net_queue_not_empty())
			netflush();	/* write to network */
> Since netflush can handle the EWOULDBLOCK condition on the write I
> am not sure that we really need to mess with the select checking of
> `obits'. It would seem like less overhead to just try the output here,
> otherwise the essence of the program flow is:
> 	ask select if we can read
> 	read
> 	ask select if we can write
> 	write
> if we trust non-blocking I/O (which was set on!!) then we should be
> able to just go ahead and write. Either that or else we didn't need to
> set the non-blocking attribute (and check it!!).

		if (ncc > 0)
			telrcv();	/* copy net chars to net_queue */
		if ((obits & (1 << p)) && pty_queue_not_empty())
			ptyflush();	/* write to pty */
	}
	cleanup();
}

So what is the verdict? is this code as bad as it looks or have I
overlooked the obvious and get my wings clipped.

	ihnp4!alberta!steve

chris@umcp-cs.UUCP (Chris Torek) (07/16/85)

Well, *because* it doesn't check the return value of select it
needs to be careful in the presence of signals.  However I always
thought the telnet code was a bit silly myself.  Besides, the only
signals telnetd should ever get should just kill it off anyway....

> This `sleep' seems really silly - select is perfectly capable of
> waiting for one of the events to come ready. Since none of the
> conditions will change before we get here again I would think that a
> select call with a "wait_until_something_is_ready" (i.e. &NULL) would
> have much less overhead (one would have to check the return code for
> interrrupted system call). 

Not &NULL, rather (struct timeval *)NULL, but yes....

> doesn't anyone trust `select' to tell the truth?

I guess not.  Also remember that select does what is arguably the
wrong thing in the presence of mismatched process groups (claims
the read won't block when it will actually generate a SIGTTIN).
Again this should never happen to telnetd anyway.

Chris
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

mrose@UDel-Dewey.ARPA (Marshall Rose) (07/16/85)

    Actually, things are worse off than you think.  If you set
    non-blocking I/O, then select() ALWAYS returns immediately saying
    that a descriptor can be written.  I found this out the hard way:

	 I had a telnet connection which I suspended, even though the
	 remote host was generating output to the network at the time.
	 As soon as all the buffers (network, pty, etc.) had filled up
	 so that the write() from telnetd actually returned
	 EWOULDBLOCK, then telnetd ate the machine alive in a very
	 tight loop: do select() on writing to the network, select()
	 IMMEDIATELY returns, do write() on network, write() fails with
	 EWOULDBLOCK, loop back to select...

    I fixed the problem by removing the two FIONBIO ioctl()s in
    telnetd.  Now it works "right".

    Moral of the story: use select() or ioctl(,FIONBIO,), not both for
    descriptors you want to write().

/mtr