chris@mimsy.UUCP (Chris Torek) (09/25/88)
[moved from comp.os.vms] In article <3951@psuvax1.cs.psu.edu> schwartz@shire (Scott Schwartz) writes: >I've seen unix programs (things like a grep replacement) that got >similar speedups by replacing stdio calls with read/write and a large >buffer. I wonder how much of that 2-6x is from overhead in stdio, >rather than in the filesystem. Quite a bit. The overhead is not all in stdio itself, but in inefficient expansions of stdio macros: > while ((c = getchar()) != EOF) > putchar(c); If you are not going to inspect each character, it seems a bit silly to bring them in and put them out one at a time, but let us take a look at what the traditional dumb Unix C compiler generates for this: jmp while_test loop: <<code for putchar>> while_test: | while ((c = getchar()) != EOF) sub $1,stdin_cnt | --cnt >= 0 ? jls refill mov stdin_ptr,r1 | *(unsigned char *)p++ clr r0 movb @r1+,r0 mov r1,stdin_ptr jmp merge refill: push stdin | : _filbuf(&stdin) call _filbuf tst @sp+ | (some sort of pop) merge: mov r0,C(fp) | c = ... cmp r0,$-1 | c == EOF? jne loop | no, loop <<code to exit>> Now, stdio reads in blocks, typically 512 to 65536 characters at a time, so the most common path through this code is sub $1,stdin_cnt; no_jmp; mov stdin_ptr,r1 | 3 clr r0; movb @r1+,r0; mov r1,stdin_ptr | 3 jmp merge; merge: | 1 mov r0,C(fp); cmp r0,$-1; jmp loop | 3 = 10 instrs Most of this is unnecessary! If we put stdin_ptr in a register in advance, and note that r0 (and hence C) can never be -1 since it is zero-extended from a byte, we should see something like sub $1,stdin_cnt; no_jmp | 2 clr r0; movb @p+,r0 | 2 mov r0,C(fp); jmp loop | 2 = 6 instrs The code to call `filbuf' has to save this extra pointer, but that is no big deal. If BUFSIZ is 1024, we saved 4*1023=4092 instructions at the expense of one instruction when calling filbuf, for a total savings of 4091 instructions per read! We might save `only' ~2000 instructions if stdin_ptr must be left global (which is true if the loop is nontrivial). What about putchar? It is even worse: | putchar(C(fp)) sub $1,stdout_cnt | --cnt >= 0 ? jls flush mov stdout_ptr,r0 | *p = c, mov C(fp),@r0 tstbit $7,stdout_flg | flag&IS_LINE_BUFFERED jz no_nl_flush cmp @r0,$10 | && *p == '\n' ? jne no_nl_flush push $10 | _flsbuf(stdout, '\n') jmp fls0 | (merge) no_nl_flush: clr r1 | : *(unsigned char *)p++ movb @r0+,r1 mov r0,stout_ptr mov r1,r0 jmp merge flush: push C(fp) | : _flsbuf(stdout, c) fls0: push stdout call _flsbuf cmp @sp+,@sp+ merge: The most common case is, again, that the buffer does *not* fill. Unfortunately, this case still has to test for line buffered output, and the flag can (and in fact does) change after calls to _flsbuf, so the test cannot easily be hoisted out of the loop (it can be done, by splitting and cross-branching in the flush code, but this is not often a good optimisation: this is a peculiar case). Eliminating line-buffered output descriptors shrinks putchar() considerably: | p is `unsigned char *' | --cnt >= 0 ? *p++ = c : _flsbuf(stdout, c) sub $1,stdout_cnt jls flush mov stdout_ptr mov C(fp),@r1 clr r0 mov @r1+,r0 mov r1,stdout_ptr jmp merge flush: push C(fp) push stdout call _flsbuf cmp @sp+,@sp+ merge: The copy-a-buffer-at-a-time loop looks like this: jmp while_tst loop: push n | write(1, buf, n) push buf push 1 call write add $6,sp | (pop) while_tst: push size | n = read(0, buf, size) push buf push 0 call read add $6,sp mov r0,n tst r0 | while (... > 0) jgt loop which has *no* per-character overhead, compared with a minimum 6+11 (getchar+putchar) instruction per-character overhead for stdio. When the buffer size is large (e.g., 1024), that is 17000 instructions of overhead per read and write call. read and write are each thousands of instructions, but they no longer swamp the overhead in stdio. In programs that do real work with each character, at least some of the stdio `overhead' is in fact necessary, so you lose less. Using fread and fwrite can help, if they are intelligently written. (They are NOT so written in old BSD releases.) -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
edward@ucbarpa.Berkeley.EDU (Edward Wang) (10/04/88)
Chris Torek writes: >What about putchar? It is even worse: > > | putchar(C(fp)) > sub $1,stdout_cnt | --cnt >= 0 ? > jls flush > mov stdout_ptr,r0 | *p = c, > mov C(fp),@r0 > tstbit $7,stdout_flg | flag&IS_LINE_BUFFERED > jz no_nl_flush > cmp @r0,$10 | && *p == '\n' ? > jne no_nl_flush > push $10 | _flsbuf(stdout, '\n') > jmp fls0 | (merge) > no_nl_flush: > . . . I should point out that this is not 4.3 putc(). The real 4.3 putc looks like this #define putc(x, p) (--(p)->_cnt >= 0 ?\ (int)(*(unsigned char *)(p)->_ptr++ = (x)) :\ (((p)->_flag & _IOLBF) && -(p)->_cnt < (p)->_bufsiz ?\ ((*(p)->_ptr = (x)) != '\n' ?\ (int)(*(unsigned char *)(p)->_ptr++) :\ _flsbuf(*(unsigned char *)(p)->_ptr, p)) :\ _flsbuf((unsigned char)(x), p))) The block buffered path is the second line and does not check the _IOLBF bit. It is, in fact, exactly as fast as the old putc. Of course this doesn't mean the C compiler generates good code for it. Peter da Silva writes: >It always depresses me when I think that people are still doing line >buffered output. The problem of handling stdout and stdin is a solved >problem: do a flushbuf on all interactive streams whenever you do a >fillbuf on any interactive stream. . . The reason for implicit line buffering is to make certain programs more responsive when writing to a tty while still run at full speed when output is redirected. These are not necessarily interactive programs. For example, "last -10 foo" isn't interactive (the flsbuf when filbuf fix won't work on it), but you do want it to give you output line-by-line in case foo logs in only infrequently. I will go as far as to say that standard output to pipes should be line buffered, since we no longer put pipes in files.
chris@mimsy.UUCP (Chris Torek) (10/04/88)
In article <26315@ucbvax.BERKELEY.EDU> edward@ucbarpa.Berkeley.EDU (Edward Wang) writes: >I should point out that [Chris Torek's expansion] is not 4.3 putc(). >The real 4.3 putc looks like this > >#define putc(x, p) (--(p)->_cnt >= 0 ?\ > (int)(*(unsigned char *)(p)->_ptr++ = (x)) :\ > (((p)->_flag & _IOLBF) && -(p)->_cnt < (p)->_bufsiz ?\ [rest deleted] >The block buffered path is the second line and does not >check the _IOLBF bit. It is, in fact, exactly as fast >as the old putc. Of course this doesn't mean the C compiler >generates good code for it. And, also of course, it does not mean that the code is right! 4.3BSD putc and printf (actually _doprnt) do not speak to each other. _doprnt assumes that a negative _cnt field is an error, and clobbers the count. Every once in a while, this results in putc scribbling all over memory. Fortunately, instances of this bug are rare (it requires mixing calls to printf and calls to putc in a particular way). This is fixed in 4.3BSD-tahoe, which no longer has _doprnt in Vax assembly. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
peter@ficc.uu.net (value) (10/05/88)
In article <26315@ucbvax.BERKELEY.EDU>, edward@ucbarpa.Berkeley.EDU (Edward Wang) writes: > Peter da Silva writes: > >It always depresses me when I think that people are still doing line > >buffered output. The problem of handling stdout and stdin is a solved > >problem: do a flushbuf on all interactive streams whenever you do a > >fillbuf on any interactive stream. . . > The reason for implicit line buffering is to make certain programs > more responsive when writing to a tty while still run at full speed > when output is redirected. A laudable goal. I'm not so sure that making the output line-buffered is the way to do it. After all you may want to make them run at full speed to a terminal device. And certainly interactive programs want to be able to do this... Really the best thing to do would be to have seperate flags for line buffering and interactive buffering. I'm more worried about block buffering on pipes on input, a long-standing practice that greatly reduces the effectiveness of shared file descriptors. Oh well... stdio certainly has its warts. -- Peter da Silva `-_-' Ferranti International Controls Corporation. "Have you hugged U your wolf today?" peter@ficc.uu.net
edward@ucbarpa.Berkeley.EDU (Edward Wang) (10/05/88)
In article <13853@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >. . . _doprnt >assumes that a negative _cnt field is an error, and clobbers the >count. Every once in a while, this results in putc scribbling all over >memory. Fortunately, instances of this bug are rare (it requires >mixing calls to printf and calls to putc in a particular way). This is >fixed in 4.3BSD-tahoe, which no longer has _doprnt in Vax assembly. Well I don't know about this, but the negative _cnt field is as old as line buffering, before the fast putc. The old putc depended on the negative _cnt to invoked _flsbuf for line buffered streams (the --fp->_cnt >= 0 test always fails so _flsbuf is always called). But enough on this already.
andrew@alice.UUCP (Andrew Hume) (10/08/88)
to my way of thinking the bufering problem is not solved; the best youcan say is that we recognise several different situations that seem to need different kind of buffering solutions. the solution fio (fast i/o) supports best is that all I/O is buffered and if you want unbuffered i/o, use the unbuffered routines (say, fprint(2, instead of Fprint(2 ) which do one system call. There is one extra hook; Ftie(ifd, ofd) has the menaing that if fio wants to do a read() system call on ifd, it will flush any output associated with ofd.
jc@minya.UUCP (John Chambers) (10/11/88)
> I will go as far as to say that standard output to pipes should be > line buffered, since we no longer put pipes in files. And I will go as far as to say that, in my experience, this is one of the biggest pains in the [expletive deleted] when you want to get your software to work reliably. I've long since lost count of how many times I've wasted time trying to solve the problem that program A has written to a pipe (or socket), program B has done a read on the pipe, and both are sitting there getting no cpu time. True, there are always ways to get it to work right. But the ways are different on every release of Unix, and so far, when I ask how to do it right and portably, all I get are insults from supposed experts (who I suspect are just trying to cover up the fact that they don't know the answer, either). Perhaps with POSIX, there will be a way for B to say "wake me up when there is input on a file (pipe or disk or stream or socket or ...), and give me whatever is there". On the other hand, I can't quite tell from the manual just how to do it *exactly* right, so maybe they won't help either, and I'll have to add a new #ifdef for every vendor's POSIX-compliant system. OK, maybe I'm just a stupid idiot, not to understand something so trivial. But I suspect I'm not the only one. BTW, on every Unix system I've ever used, a pipe is a file. What could the stuff after the comma in the above quote possibly mean? Is there a new implementation of pipes that doesn't use file descriptors? I sure hope not; that sure would be a giant step backwards. -- John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393) [Any errors in the above are due to failures in the logic of the keyboard, not in the fingers that did the typing.]
domo@riddle.UUCP (Dominic Dunlop) (10/11/88)
Those interested in the speed of stdio may care to follow up this reference: Grep Wars Andrew Hume AT&T Bell Laboratories, Murray Hill, New Jersey 07974 Published in proceedings of European UNIX Systems User Group spring 1988 conference, ISBN 0 9513181 0 1 (available at pounds 20 + pounds 6.08 surface mail or pounds 22.63 airmail from EUUG, Owles Hall, Buntingford, Herts SG9 9PL, England; phone +44 763 73039 -- Visa cards accepted) Abstract: Subsequent to the sixth edition of the UNIX operating system there have been different versions of the searching tool grep using different algorithms tuned for different types of search patterns. Friendly competition between the tools has yielded a succession of performance enhancements. We describe the latest round of improvements, based on the fio fast I/O library and incorporating the Boyer-Moore algorithm. Although grep is now 3-4 times faster than it was, egrep is typically 8-10 (for some common patterns 30-40) times faster than the new grep. I believe that a similar paper has been presented at Usenix, possibly at the summer 1987 conference, but I'm afraid I don't have the proceedings to hand. At the EUUG conference, Andrew said (as I recall) that he'd like to see fio hit the public domain, but that whether it did or not was a matter which was out of his hands. It was also not clear whether fio would ever be incorporated into a licensed release of the UNIX operating system, either as a set of library functions, or even merely as a part of [ef]*grep. -- Dominic Dunlop domo@sphinx.co.uk domo@riddle.uucp
chris@mimsy.UUCP (Chris Torek) (10/17/88)
In article <104@minya.UUCP> jc@minya.UUCP (John Chambers) writes: >... I've long since lost count of how many times >I've wasted time trying to solve the problem that program A has written >to a pipe (or socket), program B has done a read on the pipe, and both >are sitting there getting no cpu time. True, there are always ways to >get it to work right. But the ways are different on every release of >Unix, and so far, when I ask how to do it right and portably, all I get >are insults from supposed experts (who I suspect are just trying to cover >up the fact that they don't know the answer, either). Use fflush() in the writing program (since you are using stdio [see Subject]). If you do not have control over the writing program, you are out of luck. Many standard programs never bother to call fflush since they know that the output is obviously line buffered since it is *of course* going to a terminal. Be aware that if you use buffered input in the reading program, it may make a second call to read() on the pipe or socket descriptor after first consuming all the available data, and then go back to sleep waiting for more data. You can disable this portably and incredibly inefficiently by setting the input FILE to unbuffered. [Happy? :-) ] >BTW, on every Unix system I've ever used, a pipe is a file. ... Is there a >new implementation of pipes that doesn't use file descriptors? I sure >hope not; that sure would be a giant step backwards. In 4.2BSD and 4.3BSD, and systems based upon them, pipes are sockets rather than files, but sockets are named by file descriptors just like files. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
allbery@ncoast.UUCP (Brandon S. Allbery) (10/18/88)
As quoted from <104@minya.UUCP> by jc@minya.UUCP (John Chambers): +--------------- | > I will go as far as to say that standard output to pipes should be | > line buffered, since we no longer put pipes in files. | | BTW, on every Unix system I've ever used, a pipe is a file. What could | the stuff after the comma in the above quote possibly mean? Is there a | new implementation of pipes that doesn't use file descriptors? I sure | hope not; that sure would be a giant step backwards. +--------------- He means that on V7-based systems (including SVR3), pipes are essentially implemented in the kernel in a way that's almost identical to: p[1] = creat(some_temp_file, 0); p[1] = open(some_temp_file, 0); unlink(some_temp_file); whereas under BSD pipes are linked socketpairs and under SVR4 they're supposed to be a variant of the "clone" device (essentially, the STREAMS version of sockets) and are implemented quite differently. ++Brandon -- Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X uunet!hal.cwru.edu!ncoast!allbery <PREFERRED!> ncoast!allbery@hal.cwru.edu allbery@skybridge.sdi.cwru.edu <ALSO> allbery@uunet.uu.net comp.sources.misc is moving off ncoast -- please do NOT send submissions direct (But the aliases are NOT on UUNET yet, use the aliases at backbone sites!)
allbery@ncoast.UUCP (Brandon S. Allbery) (10/18/88)
As quoted from <12837@ncoast.UUCP> by allbery@ncoast.UUCP (Brandon S. Allbery): +--------------- | p[1] = creat(some_temp_file, 0); | p[1] = open(some_temp_file, 0); +--------------- Urk. Anyone want a slightly-used brain with flaky neurons? :-( The second "p[1]" should of course be "p[0]". AAAAARRRRRRRRRGGGHHHHHHHHHHH!!!!!!!!! (I shoulda stood in bed!) ++Brandon -- Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X uunet!hal.cwru.edu!ncoast!allbery <PREFERRED!> ncoast!allbery@hal.cwru.edu allbery@skybridge.sdi.cwru.edu <ALSO> allbery@uunet.uu.net comp.sources.misc is moving off ncoast -- please do NOT send submissions direct (But the aliases are NOT on UUNET yet, use the aliases at backbone sites!)