gnu@hoptoad.uucp (John Gilmore) (07/06/89)
I've been thinking of reasons why interactive response is so random when running a network window system such as NeWS under a Unix kernel. A slow window system can be tolerated, but one where response time varies at random from subsecond to 10 seconds or longer is VERY hard to get used to. I have been working to eliminate many factors (running in physical memory rather than virtual, eliminating other running processes, etc.), but one occurred to me that I can't fix.

NeWS has a central dispatch loop that does a select() on a mess of file descriptors and handles whichever ones are ready to read or write. This works fine if those file descriptors are network sockets or pipes. The problem is when they are "normal" disk files: select() always claims they're ready to read or write. Even on local disks this is a lie, but with remote files it's a "damned lie". A read() on such a file will hang for an indefinite period, after select() has said "sure, you can read it".

NeWS reads from normal files when loading in images, fonts, and PostScript code. While doing so, response time to network sockets suffers, since NeWS will hang waiting for the disk rather than handle a network request that arrived first.

My first cut at a proposed fix: when a "normal" file is select()ed for read, only return "ready" when a block has already been read ahead into the buffer cache, and if necessary start a readahead on that file. This should be done regardless of whether the file is local or remote (rewinding the tape drive causes MY scsi bus to hang for more than a minute; how about yours!). Select() for write should only succeed if a write would accept a block for writebehind without sleeping. Of course, by the time the process is dispatched, the readahead block or the block available for writing could be gone; that's the breaks. Also, avoiding hangs requires that the application do its I/O in chunks of the buffer size or less, but that's no problem.
--
John Gilmore      {sun,pacbell,uunet,pyramid}!hoptoad!gnu      gnu@toad.com
  "And if there's danger don't you try to overlook it,
   Because you knew the job was dangerous when you took it"
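The claim is easy to check from user code. A minimal sketch follows (the file name is arbitrary; any regular file will do): select() reports the file ready immediately, yet the read() that follows can still put the whole process to sleep on disk or NFS I/O.

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/time.h>

    int main(void)
    {
        int fd = open("/etc/termcap", O_RDONLY);    /* any regular file */
        fd_set rfds;
        struct timeval tv;
        char buf[8192];

        if (fd < 0) {
            perror("open");
            exit(1);
        }

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = tv.tv_usec = 0;     /* poll: don't sleep in select() */

        /* select() reports a regular file ready unconditionally... */
        if (select(fd + 1, &rfds, (fd_set *)0, (fd_set *)0, &tv) > 0
            && FD_ISSET(fd, &rfds))
            printf("select() says the file is ready to read\n");

        /* ...but this read() can still put the process to sleep while
           the kernel fetches the block from a local disk or an NFS
           server. */
        if (read(fd, buf, sizeof buf) < 0)
            perror("read");
        close(fd);
        return 0;
    }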
news@bbn.com (News system owner ID) (06/07/90)
From: cbrooks@bbn.com (Charles L. Brooks)

I'm observing some strange behavior when using a select() call on a Unix stream socket. Simply put, a select() with the fd set in the read file descriptor mask returns 0 (indicating that the file descriptor is not readable), but a subsequent recv() on that fd returns data. The following is a map of the behavior:

    Client                      Server

    connect->                   ->accept    /* initial connection phase */
    send->                      ->recv
                                <-send
    recv<-                                  /* message reading loop */

    /* here's where the select() returns 0 */
    *** select                  <-send ??   // first message
    /* but this recv() returns data! */
    recv<-                                  // first message

Other points of interest:

1) the socket in the client is set for non-blocking i/o.
2) once the recv of the first message is completed, select() (seems to)
   operate correctly.
3) the program was run on a SPARCstation 1 under SunOS 4.0.3.

Any hints, suggestions, references?

Charlie Brooks     20/636 x3589
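Two things worth ruling out when chasing symptoms like these: select() overwrites both the descriptor mask and the timeout it is handed, so both must be rebuilt before every call, and the first argument must be the highest-numbered descriptor plus one, not a count of descriptors. A minimal sketch of a client loop that gets both right (the names are illustrative, not taken from the post above):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Client-side reading loop; `sock` is a connected, non-blocking
       stream socket. */
    void read_loop(int sock)
    {
        char buf[1024];
        int n;

        for (;;) {
            fd_set rfds;
            struct timeval tv;

            /* select() modifies both the mask and the timeout, so
               rebuild them on every iteration; reusing a stale mask
               is a classic source of phantom "not ready" results. */
            FD_ZERO(&rfds);
            FD_SET(sock, &rfds);
            tv.tv_sec = 5;
            tv.tv_usec = 0;

            if (select(sock + 1, &rfds, (fd_set *)0, (fd_set *)0, &tv) <= 0)
                continue;               /* timed out or interrupted */

            n = recv(sock, buf, sizeof buf, 0);
            if (n == 0)
                break;                  /* peer closed the connection */
            if (n < 0)
                continue;               /* e.g. EWOULDBLOCK on a
                                           non-blocking socket */
            /* otherwise, n bytes of message data are in buf */
        }
    }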
leight@mozart.amd.com (Timothy Leight) (06/12/90)
Can anyone out there give me some hints as to what is causing this problem?

In attempting to read from a socket, I first use select(2) to wait for data to arrive on the socket. The select(2) call returns with errno = 0 and the correct descriptor value. After checking everything, I then read(2) from the socket descriptor. On checking the number of characters read from the socket, I find the number of characters in the buffer is zero, and errno after the read(2) is also zero.

This happens about 1% of the time this section of code is executed. The rest of the time the code works as I expect it to, returning a non-zero number of characters in the buffer.

Any hints at all about what could be happening would be appreciated.

Thanks in advance,
Tim Leight
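One explanation consistent with these symptoms: select() also reports a descriptor readable when the other end has closed the connection, and in that case read(2) returns 0 without setting errno; a zero-byte read on a socket is the end-of-file indication, not an error. A minimal sketch of the distinction (the helper name is hypothetical):

    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical helper, called after select() marks `fd` readable. */
    int read_message(int fd, char *buf, int len)
    {
        int n = read(fd, buf, len);

        if (n > 0)
            return n;           /* normal data */
        if (n == 0) {
            /* select() also marks a descriptor readable at
               end-of-file; a 0-byte read() on a socket means the
               peer closed the connection, and errno is untouched. */
            fprintf(stderr, "peer closed connection\n");
            return 0;
        }
        perror("read");         /* n < 0: a genuine error */
        return -1;
    }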