gnu@hoptoad.uucp (John Gilmore) (07/06/89)
I've been thinking of reasons why interactive response is so random when running a network window system such as NeWS under a Unix kernel. A slow window system can be tolerated, but one where response time varies at random from subsecond to 10 seconds or longer is VERY hard to get used to. I have been working to eliminate many factors (running in physical memory rather than virtual, eliminating other running processes, etc.), but one occurred to me that I can't fix.

NeWS has a central dispatch loop that does a select() on a mess of file descriptors and handles whichever ones are ready to read or write. This works fine if those file descriptors are network sockets or pipes. The problem is when they are "normal" disk files: select() always claims they're ready to read or write. Even on local disks this is a lie, but with remote files it's a "damned lie". A read() on such a file will hang for an indefinite period, after select() has said "sure, you can read it".

NeWS reads from normal files when loading in images, fonts, and PostScript code. While doing so, response time to network sockets suffers, since NeWS will hang waiting for the disk rather than handle a network request that arrived first.

My first cut at a proposed fix: when a "normal" file is select()ed for read, only return "ready" when a block has already been read ahead into the buffer cache, and if necessary start a readahead on that file. This should be done regardless of whether the file is local or remote (rewinding the tape drive causes MY scsi bus to hang for more than a minute; how about yours!). Select() for write should only succeed if a write would accept a block for writebehind without sleeping. Of course, by the time the process is dispatched, the readahead block or the block available for writing could be gone; that's the breaks. Also, avoiding hangs requires that the application do its I/O in chunks of the buffer size or less, but that's no problem.
--
John Gilmore      {sun,pacbell,uunet,pyramid}!hoptoad!gnu      gnu@toad.com
  "And if there's danger don't you try to overlook it,
   Because you knew the job was dangerous when you took it"
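The claim is easy to check from user code. A minimal sketch follows (the file name is arbitrary; any regular file will do): select() reports the file ready immediately, yet the read() that follows can still put the whole process to sleep on disk or NFS I/O.

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/time.h>

    int main(void)
    {
        int fd = open("/etc/termcap", O_RDONLY);    /* any regular file */
        fd_set rfds;
        struct timeval tv;
        char buf[8192];

        if (fd < 0) {
            perror("open");
            exit(1);
        }

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = tv.tv_usec = 0;     /* poll: don't sleep in select() */

        /* select() reports a regular file ready unconditionally... */
        if (select(fd + 1, &rfds, (fd_set *)0, (fd_set *)0, &tv) > 0
            && FD_ISSET(fd, &rfds))
            printf("select() says the file is ready to read\n");

        /* ...but this read() can still put the process to sleep while
           the kernel fetches the block from a local disk or an NFS
           server. */
        if (read(fd, buf, sizeof buf) < 0)
            perror("read");
        close(fd);
        return 0;
    }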
news@bbn.com (News system owner ID) (06/07/90)
From: cbrooks@bbn.com (Charles L. Brooks)

I'm observing some strange behavior when using a select() call on a Unix stream socket. Simply put, a select() with the fd set in the read file descriptor mask returns 0 (indicating that the file descriptor is not readable), but a subsequent recv() on that fd returns data. The following is a map of the behavior:

    Client                      Server

    connect->                   ->accept    /* initial connection phase */
    send->                      ->recv
                                <-send
    recv<-                                  /* message reading loop */

    /* here's where the select() returns 0 */
    *** select                  <-send ??   // first message
    /* but this recv() returns data! */
    recv<-                                  // first message

Other points of interest:

1) the socket in the client is set for non-blocking i/o.
2) once the recv of the first message is completed, select() (seems to)
   operate correctly.
3) the program was run on a SPARCstation 1 under SunOS 4.0.3.

Any hints, suggestions, references?

Charlie Brooks     20/636 x3589
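Two things worth ruling out when chasing symptoms like these: select() overwrites both the descriptor mask and the timeout it is handed, so both must be rebuilt before every call, and the first argument must be the highest-numbered descriptor plus one, not a count of descriptors. A minimal sketch of a client loop that gets both right (the names are illustrative, not taken from the post above):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Client-side reading loop; `sock` is a connected, non-blocking
       stream socket. */
    void read_loop(int sock)
    {
        char buf[1024];
        int n;

        for (;;) {
            fd_set rfds;
            struct timeval tv;

            /* select() modifies both the mask and the timeout, so
               rebuild them on every iteration; reusing a stale mask
               is a classic source of phantom "not ready" results. */
            FD_ZERO(&rfds);
            FD_SET(sock, &rfds);
            tv.tv_sec = 5;
            tv.tv_usec = 0;

            if (select(sock + 1, &rfds, (fd_set *)0, (fd_set *)0, &tv) <= 0)
                continue;               /* timed out or interrupted */

            n = recv(sock, buf, sizeof buf, 0);
            if (n == 0)
                break;                  /* peer closed the connection */
            if (n < 0)
                continue;               /* e.g. EWOULDBLOCK on a
                                           non-blocking socket */
            /* otherwise, n bytes of message data are in buf */
        }
    }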
leight@mozart.amd.com (Timothy Leight) (06/12/90)
Can anyone out there give me some hints as to what is causing this problem?

In attempting to read from a socket, I first use select(2) to wait for data to arrive on the socket. The select(2) call returns with errno = 0 and the correct descriptor value. After checking everything, I then read(2) from the socket descriptor. On checking the number of characters read from the socket, I find the number of characters in the buffer is zero, and errno after the read(2) is also zero.

This happens about 1% of the time this section of code is executed. The rest of the time the code works as I expect it to, returning a non-zero number of characters in the buffer.

Any hints at all about what could be happening would be appreciated.

Thanks in advance,
Tim Leight
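One explanation consistent with these symptoms: select() also reports a descriptor readable when the other end has closed the connection, and in that case read(2) returns 0 without setting errno; a zero-byte read on a socket is the end-of-file indication, not an error. A minimal sketch of the distinction (the helper name is hypothetical):

    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical helper, called after select() marks `fd` readable. */
    int read_message(int fd, char *buf, int len)
    {
        int n = read(fd, buf, len);

        if (n > 0)
            return n;           /* normal data */
        if (n == 0) {
            /* select() also marks a descriptor readable at
               end-of-file; a 0-byte read() on a socket means the
               peer closed the connection, and errno is untouched. */
            fprintf(stderr, "peer closed connection\n");
            return 0;
        }
        perror("read");         /* n < 0: a genuine error */
        return -1;
    }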