MAH@awiwuw11.wu-wien.ac.at (Michael Haberler) (08/30/90)
I have encountered strange behaviour in several programs which use select(2) on hp-ux 7.0 on the Series 800. All of these programs are 'ported' BSD code, so I suspect they have something in common.

It seems that programs which have select(2) in their inner loop sometimes start using enormous amounts of system cpu time, just as if the select() call returned immediately and the program were polling. Among those programs are Xemacs 18.55, Greg Minshall's tn3270, and named4.8.3.

Xemacs tends to do this especially if the X server terminates before emacs. I didn't find an explanation for named's behaviour. With tn3270, it looks like a modem disconnect, and thus eof on the tty, causes tn3270 to loop. tn3270 seems to spend its time in select() itself, while named returns to user mode and immediately calls select() again. One can see this by attaching the debugger to the process in question (xdp -P <pid> <program>).

Since several programs show this behaviour, I suspect it has to do with the way select() is implemented in hpux 7.0. Has anybody else encountered this behaviour? Is this a bug? If so, is there a workaround?

- michael
staffan@isy.liu.se (Staffan Bergstrom) (09/03/90)
MAH@awiwuw11.wu-wien.ac.at (Michael Haberler) writes:

>I have encountered a strange behaviour of several programs which use
>select(2) on hp-ux 7.0 on the Series 800. All of these programs are
>'ported' BSD code, so I have the suspicion there's something in common:
>It seems that programs which have select(2) in their inner loop sometimes
>start using enormous amounts of system cpu time, just as if the select()
>call would return immediately as if it were polling. Among those programs
>are Xemacs 18.55, Greg Minshall's tn3270, and named4.8.3.
- - -
>Has anybody else encountered this behaviour? Is this a bug? If so, is there
>a workaround?
>- michael

I have had similar problems, but the cause turned out to be the macros FD_SET, FD_CLR, etc. FD_SET is defined as follows:

#define FD_SET(n, p) ((p)->fds_bits[(n)/NFDBITS] |= (1 << ((n) % NFDBITS)))

One has to be careful when using these macros on closed files; otherwise they can attempt a negative shift. I had a program that worked fine on sun3, sun4 and hp300 (hp-ux 7.0), but did not work at all when I tried to port it to the hp 800, because of this.

/Staffan
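The hazard described above can be avoided with a range check before the macro is applied. The following is only a sketch: it assumes a closed descriptor is recorded as -1 (a common convention, but not something the post states), and `safe_fd_set` is a hypothetical helper name, not part of any system header.

```c
#include <sys/select.h>

/* Hypothetical helper: add fd to the set only if it is a valid index.
 * A negative fd would make (n) % NFDBITS negative in the macro above,
 * and shifting by a negative count is undefined in C.
 * Returns 1 if fd was added, 0 if it was rejected. */
int safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return 0;
    FD_SET(fd, set);
    return 1;
}
```

A caller would use `safe_fd_set(fd, &rfds)` in place of a bare `FD_SET(fd, &rfds)` and simply skip descriptors for which it returns 0.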
cph@zurich.ai.mit.edu (Chris Hanson) (09/06/90)
   From: MAH@awiwuw11.wu-wien.ac.at (Michael Haberler)
   Date: 30 Aug 90 15:16:05 GMT

   I have encountered a strange behaviour of several programs which use
   select(2) on hp-ux 7.0 on the Series 800. All of these programs are
   'ported' BSD code, so I have the suspicion there's something in common:
   It seems that programs which have select(2) in their inner loop sometimes
   start using enormous amounts of system cpu time, just as if the select()
   call would return immediately as if it were polling. Among those programs
   are Xemacs 18.55, Greg Minshall's tn3270, and named4.8.3.

   Xemacs tends to do this especially if the X server terminates before
   emacs. I didn't find an explanation for named behaviour. With tn3270, it
   looks like a modem disconnect and thus eof on the tty would cause tn3270
   looping.

I managed to get emacs into that state last night, and debugged it. What happened was as follows.

I normally run several subprocesses under emacs. At the time that the problem occurred, there were two active subprocesses, and two exited subprocesses. Emacs still had all four subprocesses in its tables. Emacs's command reader checks all of the subprocesses periodically for input, using the `select' call on the input file descriptors of the processes, and due to some peculiarities of its design, it was checking all four of the subprocesses, even though two of them no longer existed.

This `select' call was returning with a single bit set, which indicated that the input file descriptor from one of the dead subprocesses had some input that could be read. Emacs then dutifully went into a `read' call on that descriptor, which fortunately was set to non-blocking mode, and the `read' call returned saying that of course there was no data.

In summary: we have two processes and a pipe from one to the other. The read side of the pipe has been set to non-blocking mode by the use of O_NONBLOCK. The process on the write side of the pipe finishes by calling `exit'. The process on the read side receives SIGCHLD and uses `waitpid' to extract the exit status of the now-dead subprocess. It then does a `select' on the read side of the pipe, which returns indicating that the pipe has some data to be read. The process calls `read' on the pipe, which returns zero indicating no data is available. Etc.

Now I'm no expert, but it's my belief that `select' shouldn't indicate that the pipe has input in this situation.

For information: this behavior has been observed (by others) when the subprocess is using a PTY to communicate with emacs, although it has not been debugged and thoroughly examined in such a case.

PS: Emacs is being changed so that it does not attempt to use `select' on connections to dead processes. Version 18.56 will not have this problem. If anyone is interested in a patch for 18.55, they should contact me directly by e-mail.
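The sequence summarized above (pipe, O_NONBLOCK on the read side, writer exits, `waitpid', `select', `read') can be reconstructed roughly as follows. The post does not include a test program, so this is a sketch under assumptions: the function name `read_after_writer_exit`, the one-second timeout, and the -2 error code are all made up for illustration. On a POSIX-style system one would expect `select' to report the descriptor readable and `read' to return 0, but the sketch makes no claim about what hp-ux 7.0 itself does.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical reproduction of the scenario in the post.
 * Returns what read() returned after select() reported the pipe
 * readable, or -2 if setup failed or select() did not report it. */
long read_after_writer_exit(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -2;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -2;
    }
    if (pid == 0) {
        close(p[0]);
        _exit(0);               /* exiting closes the write side */
    }

    close(p[1]);                /* parent keeps only the read side */
    int flags = fcntl(p[0], F_GETFL);
    if (flags < 0 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(p[0]);
        return -2;
    }
    waitpid(pid, NULL, 0);      /* collect the dead child's status */

    fd_set rfds;
    struct timeval tv = { 1, 0 };
    FD_ZERO(&rfds);
    FD_SET(p[0], &rfds);

    long got = -2;
    if (select(p[0] + 1, &rfds, NULL, NULL, &tv) == 1 &&
        FD_ISSET(p[0], &rfds)) {
        char buf[64];
        got = (long)read(p[0], buf, sizeof buf);
    }
    close(p[0]);
    return got;
}
```

A return value of 0 here means end of file rather than "no data yet"; with O_NONBLOCK set, an empty pipe whose writer is still alive makes `read' fail with EAGAIN instead.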
eliot@dg-rtp.dg.com (Topher Eliot) (09/07/90)
|> It seems that programs which have select(2) in their inner loop sometimes
|> start using enormous amounts of system cpu time, just as if the select()
|> call would return immediately as if it were polling. Among those programs
|> are Xemacs 18.55, Greg Minshall's tn3270, and named4.8.3.
|>
|> This `select' call was returning with a single bit set, which
|> indicated that the input file descriptor from one of the dead
|> subprocesses had some input that could be read. Emacs then dutifully
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|> went into a `read' call on that descriptor, which fortunately was set
|> to non-blocking mode, and the `read' call returned saying that of
|> course there was no data.
|>
|> In summary: we have two processes and a pipe from one to the other.
|> The read side of the pipe has been set to non-blocking mode by the use
|> of O_NONBLOCK. The process on the write side of the pipe finishes by
      ^^^^^^^^^^
|> calling `exit'. The process on the read side receives SIGCHLD and
|> uses `waitpid' to extract the exit status of the now-dead subprocess.
|> It then does a `select' on the read side of the pipe, which returns
|> indicating that the pipe has some data to be read. The process calls
|> `read' on the pipe, which returns zero indicating no data is
|> available. Etc.
|>
|> Now I'm no expert, but it's my belief that `select' shouldn't indicate
|> that the pipe has input in this situation.

Well, in fact, it isn't.

I've bumped into this problem before, in a different context. I can't remember what any of the applicable documentation said, but the bottom line was that the semantics of select are that it will return with a particular bit set if a read on the corresponding file descriptor WILL NOT BLOCK. It is NOT saying that there is data to be read there.

In my opinion, the correct way to handle such cases is for every read to be prepared to detect that the descriptor from which it is reading has been closed, or reached end of file, or whatever, and to handle that appropriately. Apparently emacs does not do so in this case.

--
Topher Eliot
Data General Corporation                eliot@dg-rtp.dg.com
62 T. W. Alexander Drive                {backbone}!mcnc!rti!dg-rtp!eliot
Research Triangle Park, NC 27709        (919) 248-6371
Obviously, I speak for myself, not for DG.
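The advice above, that every read should be prepared for end of file, might look like the loop below. This is a minimal sketch, not the Emacs fix: `drain_until_eof` is a hypothetical helper that watches a single descriptor and stops selecting on it as soon as `read' returns 0, rather than going back to `select' and spinning.

```c
#include <sys/select.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: wait on fd with select() and read until EOF.
 * Returns the total number of bytes read, or -1 on error. */
long drain_until_eof(int fd)
{
    long total = 0;
    char buf[256];

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }

        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;         /* real data arrived */
            continue;
        }
        if (n == 0)
            return total;       /* EOF: stop watching this descriptor */
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            continue;           /* nothing to read after all: wait again */
        return -1;
    }
}
```

In a program that watches many descriptors, the equivalent step would be to remove the descriptor from the set passed to `select' (and close it) once `read' has returned 0.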
cph@zurich.ai.mit.edu (Chris Hanson) (09/09/90)
   From: eliot@dg-rtp.dg.com (Topher Eliot)
   Date: 6 Sep 90 17:21:44 GMT

   |> This `select' call was returning with a single bit set, which
   |> indicated that the input file descriptor from one of the dead
   |> subprocesses had some input that could be read. Emacs then dutifully
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |> went into a `read' call on that descriptor, which fortunately was set
   |> to non-blocking mode, and the `read' call returned saying that of
   |> course there was no data.
   |>
   |> In summary: we have two processes and a pipe from one to the other.
   |> The read side of the pipe has been set to non-blocking mode by the use
   |> of O_NONBLOCK. The process on the write side of the pipe finishes by
         ^^^^^^^^^^
   |> calling `exit'. The process on the read side receives SIGCHLD and
   |> uses `waitpid' to extract the exit status of the now-dead subprocess.
   |> It then does a `select' on the read side of the pipe, which returns
   |> indicating that the pipe has some data to be read. The process calls
   |> `read' on the pipe, which returns zero indicating no data is
   |> available. Etc.
   |>
   |> Now I'm no expert, but it's my belief that `select' shouldn't indicate
   |> that the pipe has input in this situation.

   Well, in fact, it isn't.

   I've bumped into this problem before, in a different context. I can't
   remember what any of the applicable documentation said, but the bottom
   line was that the semantics of select are that it will return with a
   particular bit set if a read on the corresponding file descriptor WILL
   NOT BLOCK. It is NOT saying that there is data to be read there.

   In my opinion, in such cases the correct way to handle this is that all
   reads should be prepared to detect that the descriptor from which they
   are reading has been closed, or reached end of file, or whatever, and
   handle that appropriately. Apparently emacs does not do so in this case.

Since I posted the original message, I've changed my mind. The only thing I now believe about `select' is that it should return a "readable" bit when there is data to be read from that channel. I have no opinion about what it should do in any other case.

The real problem here is that the documentation for `select' doesn't define what it does. The documentation defines the results as "ready for reading, writing, or has an exceptional condition", but fails to say what that means. For example, I spoke to an HP engineer recently who had no idea what an "exceptional condition" is in this context. And I have no idea what it means either -- despite the fact that I'm quite knowledgeable about unix.

Please, HP documenters, rewrite this man page so that it is possible for us to know what it means! It's no excuse that every other unix says the same thing.

An aside: if it were the case that "ready for reading" meant "a read on this channel will not block", then `select' would always say "readable" for every non-blocking channel. But emacs did a `select' on four channels, all non-blocking, and it indicated only one of them was "readable". So I don't believe this definition is correct.

To paraphrase what I said above, I dunno what to believe. In any case, emacs is now fixed so that it doesn't care what `select' says in this case.
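The observation in the aside, that a non-blocking channel is not automatically reported "readable", can be checked with a zero-timeout `select'. The sketch below is illustrative only: `set_nonblocking` and `readable_now` are hypothetical helper names, and the behaviour described in the note after the code is what one would expect on a POSIX-style system, not a statement about hp-ux 7.0.

```c
#include <sys/select.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: poll fd once with a zero timeout.
 * Returns 1 if select() reports it readable right now,
 * 0 if it does not, -1 on error. */
int readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* zero timeout: do not wait */

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

Applied to the read side of a pipe, `readable_now` would be expected to return 0 while the pipe is empty and the writer is still open (even after `set_nonblocking`), 1 once a byte has been written, and 1 again after the write side is closed, since end of file also counts as "a read would not block".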
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (09/09/90)
In article <CPH.90Sep9022935@kleph.ai.mit.edu> cph@zurich.ai.mit.edu (Chris Hanson) writes:
> The real problem here is that the documentation for `select' doesn't
> define what it does. The documentation defines the results as "ready
> for reading, writing, or has an exceptional condition", but fails to
> say what that means.

That a read or write wouldn't block if the descriptor were blocking. Exceptional conditions are defined by the device.

Since passing nonblocking descriptors to application programs is a serious violation of convention, you shouldn't run into problems unless you're creating them.

---Dan
chris@mimsy.umd.edu (Chris Torek) (09/09/90)
In article <CPH.90Sep9022935@kleph.ai.mit.edu> cph@zurich.ai.mit.edu (Chris Hanson) writes:
>The real problem here is that the documentation for `select' doesn't
>define what it does. ...

Well, the most likely reason is that select does not *do* ANYthing, except time out. The `selecting' is all done in lower layers, much like ioctl. There is no way that ioctl(2) can list what ioctl() does, because it really does not do *any*thing.

>"... reading, writing, or has an exceptional condition"

Of course, the lower layers try to do something sensible.

For `read', select is supposed to return true whenever a read() system call would not block, regardless of any `non-blocking' mode on the file descriptor. That is, it should return true when there are data, and it should return true when there is a `boundary condition' like an `EOF' on a tty or a socket.

For `write', it is supposed to return true when the lower layer can accept (some) more data without blocking.

`Exceptions' are left entirely up to the lower layers, and you have to look at those (or read their manuals, provided that someone bothered to document them properly) to find out which descriptor entities (sockets, ptys, ttys, disks, tapes, ...) actually do something, and what, with `exceptions'.

>An aside: if it were the case that "ready for reading" meant "a read
>on this channel will not block", then `select' would always say
>"readable" for every non-blocking channel. But emacs did a `select'
>on four channels, all non-blocking, and it indicated only one of them
>was "readable". So I don't believe this definition is correct.

No, but it is close---closer than `there are data', at any rate.

--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain: chris@cs.umd.edu        Path: uunet!mimsy!chris
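The `write' half of this description can be sketched the same way as the `read' half: ask `select' about the write set with a zero timeout. This is an illustration under assumptions, not a documented interface: `writable_now` and `fill_nonblocking` are hypothetical helper names, and writing one byte at a time is chosen only so that "full" means no room at all.

```c
#include <sys/select.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: poll fd once with a zero timeout.
 * Returns 1 if select() reports it writable right now,
 * 0 if it does not, -1 on error. */
int writable_now(int fd)
{
    fd_set wfds;
    struct timeval tv = { 0, 0 };   /* zero timeout: do not wait */

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &wfds) ? 1 : 0;
}

/* Hypothetical helper: switch fd to non-blocking mode and write single
 * bytes until the lower layer refuses more.
 * Returns the number of bytes accepted, or -1 on an unexpected error. */
long fill_nonblocking(int fd)
{
    long total = 0;
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    for (;;) {
        ssize_t n = write(fd, "x", 1);
        if (n == 1) {
            total++;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;           /* no more room: the pipe is full */
        return -1;
    }
}
```

On the write side of a fresh pipe, `writable_now` would be expected to return 1; after `fill_nonblocking` has stuffed the pipe and nothing has been read from the other side, it would be expected to return 0 until a reader drains some data.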
cph@zurich.ai.mit.edu (Chris Hanson) (09/10/90)
In article <26445@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:

   From: chris@mimsy.umd.edu (Chris Torek)
   Date: 9 Sep 90 12:53:02 GMT

   In article <CPH.90Sep9022935@kleph.ai.mit.edu> cph@zurich.ai.mit.edu
   (Chris Hanson) writes:
   >The real problem here is that the documentation for `select' doesn't
   >define what it does. ...

   Well, the most likely reason is that select does not *do* ANYthing,
   except time out. The `selecting' is all done in lower layers, much like
   ioctl. There is no way that ioctl(2) can list what ioctl() does, because
   it really does not do *any*thing.

   >"... reading, writing, or has an exceptional condition"

   Of course, the lower layers try to do something sensible.

   For `read', select is supposed to return true whenever a read() system
   call would not block, regardless of any `non-blocking' mode on the file
   descriptor. That is, it should return true when there are data, and it
   should return true when there is a `boundary condition' like an `EOF' on
   a tty or a socket.

   For `write', it is supposed to return true when the lower layer can
   accept (some) more data without blocking.

   `Exceptions' are left entirely up to the lower layers, and you have to
   look at those (or read their manuals, provided that someone bothered to
   document them properly) to find out which descriptor entities (sockets,
   ptys, ttys, disks, tapes, ...) actually do something, and what, with
   `exceptions'.

OK, now I think I understand -- thanks. I guess my complaint is that the man page for `select' could have contained something of what you said in these two paragraphs.

Here is a suggestion: why not define the meaning of "readable" and "writable" in the `select' man page, since it seems that most devices will satisfy this definition in the same way. Also have a sentence that says "exceptional conditions" are device-specific, because the current man page doesn't say anything of the kind. Then have specific devices document (in section 7) how their "readable" and "writable" differ from the standard (if at all), and what their "exceptional conditions" are.

This would be a great improvement over the current situation, because then it would be possible to understand how this works.
eliot@chutney.rtp.dg.com (Topher Eliot) (09/11/90)
In article <CPH.90Sep9022935@kleph.ai.mit.edu>, cph@zurich.ai.mit.edu (Chris Hanson) writes:
|> From: me
|>
|> I've bumped into this problem before, in a different context. I can't
|> remember what any of the applicable documentation said, but the bottom line
|> was that the semantics of select are that it will return with a particular
|> bit set if a read on the corresponding file descriptor WILL NOT BLOCK. It
|> is NOT saying that there is data to be read there. In my opinion, in such
|> cases the correct way to handle this is that all reads should be prepared
|> to detect that the descriptor from which they are reading has been closed,
|> or reached end of file, or whatever, and handle that appropriately.
|> Apparently emacs does not do so in this case.
|>
|> An aside: if it were the case that "ready for reading" meant "a read
|> on this channel will not block", then `select' would always say
|> "readable" for every non-blocking channel. But emacs did a `select'
|> on four channels, all non-blocking, and it indicated only one of them
|> was "readable". So I don't believe this definition is correct.

Well, this shows that your kernel is different from the one I dealt with, because with ours, select did indeed say that the descriptor was "readable" all the time.

--
Topher Eliot
Data General Corporation                eliot@dg-rtp.dg.com
62 T. W. Alexander Drive                {backbone}!mcnc!rti!dg-rtp!eliot
Research Triangle Park, NC 27709        (919) 248-6371
Obviously, I speak for myself, not for DG.