oz@yunexus.UUCP (Ozan Yigit) (04/20/88)
In article <2589@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>In article <1988Apr15.170753.867@utzoo.uucp>, Henry Spencer writes:
># [Good stuff deleted... sigh...]
># Also left as an exercise for the reader is finding the bug in Knuth's
># hash-table-search code.  (He might possibly have corrected this in the
># reprint; my copy is the original Computing Surveys paper.)  The hazards
># of gotos doth make fools of the best of us...
>"Left as an exercise" (being implied in re the ways that Knuth's limited
>endorsement of goto can supposedly be refuted) sounds more like "I just don't
>want to bother showing why this is true, and anyone who doesn't agree is
>a lazy dolt."

Actually, I think Henry really means coding errors in Knuth's 74 article.
[What is the bug, Henry??  It can't be an infinite-loop problem, as the
hash technique he is using requires that the number of entries be less
than the size of the array.]

>Mr. Spencer, put your code where your mouth is.  For each goto example in
>Knuth, show us how you would code it to run equally efficiently without
>gotos.  Fair enough?

Why are the smileys missing from your last paragraph?? :-)

Have you actually read that article??  (ACM Computing Surveys V.6 #4,
pp. 261-301, Dec. 74)  Henry makes a good point indeed.. [Skip the
Reader's Exercises part... :-)]  I think the Knuth article may [or
should] soon be of interest to Computer Language historians only.

[In my opinion, a much more interesting perspective on GOTOs and their
relation to procedures is found in Steele's "Debunking the EXPENSIVE
PROCEDURE CALL Myth or, Procedure Call Implementations Considered
Harmful or, LAMBDA: The Ultimate GOTO," ACM Conference Proceedings,
pp. 153-162, 1977.]

oz
-- 
... and they will all		Usenet: [decvax|ihnp4]!utzoo!yunexus!oz
bite the dust ...		        .......!uunet!mnetor!yunexus!oz
comprehensively. ...		Bitnet: oz@[yusol|yulibra|yuyetti]
Archbishop Tutu			Phonet: +1 416 736-5257 x 3976
levy@ttrdc.UUCP (Daniel R. Levy) (04/22/88)
In article <412@yunexus.UUCP>, oz@yunexus.UUCP (Ozan Yigit) writes:
# >Mr. Spencer, put your code where your mouth is.  For each goto example in
# >Knuth, show us how you would code it to run equally efficiently without
# >gotos.  Fair enough?
#
# Why are the smileys missing from your last paragraph ?? :-)
# Have you actually read that article??  (ACM Computing Surveys
# V.6 #4, pp. 261-301, Dec. 74)  Henry makes a good point
# indeed..  [Skip the Reader's Exercises part... :-)]  I think
# the Knuth article may soon be of interest to Computer
# Language historians only.

The smileys are missing because I meant it.  Henry might well discover
that it isn't as easy, nor the result as superior, as he thinks.  He
said (to paraphrase) that the whole damn suite of examples does not have
one use of GOTO that he would stoop to calling justifiable.  That is a
serious charge and one which he should either prove or back away from.

And why are you so quick to relegate this work to the bit bucket?
-- 
|------------Dan Levy------------|  Path: ihnp4,<most AT&T machines>!ttrdc!levy
|              AT&T              |  I'm not a real hacker, but I play one on
|       Data Systems Group       |  USENET.  If you think that AT&T endorses
|--------Skokie, Illinois--------|  my opinions, I've a nice bridge to sell ya.
kenny@uiucdcsb.cs.uiuc.edu (04/26/88)
[Part 0: Introduction]

Ozan Yigit, Dan Levy, and Henry Spencer have all been flinging about a
reference to Donald Knuth's paper, ACM Computing Surveys 6:4, 261-301
(1974).  In particular, Mr. Levy challenges Messrs. Spencer and Yigit
to remove the GO TOs from each of Knuth's examples, and to comment on
the elegance and utility of the result.

In this article, and its several sequelae, I intend to put in my two
cents on the subject, and, I hope, to clarify some of the issues.
Wherever I refer to EXAMPLE n, it is to be understood that I am citing
the corresponding displayed example in Knuth's paper.

Incidentally, I learned quite a bit in reviewing the paper; it had been
several years since I last encountered it.  Knuth makes a number of
points which have still not been addressed; most of them, however,
concern matters other than the goodness or evilness of the GO TO as it
applies to today's programming languages.  I strongly recommend that
anyone who is entering this discussion, on either side of the fence,
get a copy of the paper and study it.  Both sides can learn from it.

Knuth's examples can be classified into the following set of general
topics:

* Double-exit loops and double decisions.  In general, these constructs
  arise when some conditional or looping construct is necessary to
  calculate the condition for another conditional or looping construct.
  EXAMPLES 1-3, dealing with table searching, fall into this category.
  EXAMPLE 4 can be treated in this category as well; I choose, however,
  to discuss it by itself.

* Finite-state recognizers and the `pushback' concept.  EXAMPLE 4 uses
  the GO TO to represent, in the program's control structure, the
  concept of `pushing back' data on an input stream.

* Parallel data structures.  EXAMPLE 5 shows a program that uses a set
  of GO TO statements to represent carrying out the same operation on
  one of several variables.  It is also reasonable to consider EXAMPLE
  5 as if it were a recursive algorithm.

* Recursion elimination.
  Knuth demonstrates the use of GO TO to eliminate the use of recursive
  procedure calls.  EXAMPLE 6 is the most obvious example of this use
  of GO TO; EXAMPLES 7 and 8 can be considered in this context as well.

* The `set of actions' construct.  Knuth points out that many
  applications involving interpreters or simulators have a multi-way
  branch (a _switch_ in C) that selects one of a set of cases.
  Frequently, these cases have common code, and a procedure call may be
  too expensive when executing the common code.

Knuth does not mention the `panic exit' form of GO TO; I shall discuss
this one as well, as I find it of some interest.

Many uses of GO TO can be considered under more than one of these
categories.  To some extent, this is inevitable: in particular, the
ones that arise from the elimination of recursion can be used to model
any of the others, since all control structures can be modeled by
recursive functions.

[End of part 0]
oz@yunexus.UUCP (Ozan Yigit) (04/26/88)
In article <2597@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>In article <412@yunexus.UUCP>, oz@yunexus.UUCP (Ozan Yigit) writes:
>[removed parts] ...whole damn suite of examples does not have
>one use of GOTO that he would stoop to calling justifiable.  That is a
>serious charge and one which he should either prove or back away from.

Oh whoppie.  He did carefully qualify his statements with a reference
to current language + compiler technology.

>And why are you so quick to relegate this work to the bit bucket?

I am not so quick..  Actually, it took eight years :-).  The article
was interesting when I read it around 1980.  Today, only some parts are
somewhat interesting, and I think the rest is way out-of-date and
uninteresting, at least for the languages I have been using.  Check out
the article and see for yourself.  All of his very structured ways of
using GOTOs to get out of loops etc. are covered by consistent language
facilities, which is obviously not what is argued in the current GOTO
discussion(?).

As for other uses of goto for "optimization" purposes: things like
"tail recursion optimization", "code motion out of loops" etc. are the
type of things a good compiler can (or should) deal with.  [In fact,
there are languages out there, like SCHEME, where tail-recursion
removal and tail-call-to-jump optimizations are *generic* to an
implementation.  No one using the language ever thinks about it.]

Even some of Knuth's writings could become out-of-date, don't you
think??  When was the last time you programmed in MIX?? :-)

oz
kenny@uiucdcsb.cs.uiuc.edu (04/26/88)
[Part 1: Double-exit loops and double decisions.]

Before the advent of GO TO-less programming, many programmers
recognized the advantages of straightforward control flow, and in fact
coded programs that created structured control flow using GO TO
statements.  With the advent of structured programming, these programs
could be converted to use the Boehm-Jacopini [Boeh66] control
structures; the conversion was usually straightforward.

There was a common case, though, that defied conversion without either
duplicating code or introducing extraneous variables.  This is the case
where the result of executing conditional code is in turn used in the
control condition of another conditional or looping context.

In a study of Fortran programs, which addressed data flow analysis
rather than control structures, Farrow, Kennedy and Zucconi [Farr76]
identified two new control structures that address most of these cases.
I propose looking at their control structures in the context of
structuring `C' programs.  They define their control structures in
terms of program flow graphs; I will use GO TO programs to show the
sort of analysis that can be done when a program is viewed in this
light.  Furthermore, to preserve the graph-theoretic nature of the
arguments, I will sometimes insert needless GO TO statements to avoid
considering the effects of `fallthrough.'

If readers are unfamiliar with the fundamental techniques whereby flow
graphs that already conform to Boehm-Jacopini control structures can be
reduced to the corresponding control structures, I refer them to
Appendix A at the end of this article.

1.1 The double decision.

EXAMPLES 1-3 all use a control flow that cannot be reduced to the
Boehm-Jacopini control-flow primitives without either introducing
auxiliary variables or copying parts of the code.  This limitation of
the Boehm-Jacopini structures was first pointed out by Ashcroft and
Manna [Ashc71], several years before Knuth's paper.
In their analysis of programs, Farrow, Kennedy and Zucconi were able to
introduce an additional control structure into their graphs; the new
structure handles the Ashcroft-Manna counterexample, and in addition
can handle EXAMPLES 1-3.  The new structure is the _double decision_,
corresponding to the program fragment:

	if (COND1) {
		action1;
		goto A;
	}
	action2;
	if (COND2) {
		action3;
		goto A;
	}
	action4;
	goto B;

There is a trivial way to address this problem, introducing an
auxiliary variable.  We convert the program to the intermediate
representation:

	if (COND1) {
		action1;
		flag = TRUE;
	}
	else {
		action2;
		if (COND2) {
			action3;
			flag = TRUE;
		}
		else {
			action4;
			flag = FALSE;
		}
	}
	if (flag)
		goto A;
	else
		goto B;

and then reduce the concluding _if_ statement according to the
techniques in Appendix A.

I contend, without proof in this paper, that disciplined use of the
_break_ statement can eliminate the auxiliary variable in all cases
where (a) action1 and action3 are both null, or (b) action4 is null.  I
have a set of translation rules that will convert programs containing
double decisions to conventional control structures; if there is
sufficient interest, I may consider writing a paper on this.

1.2 The double-exit loop.
A few programs (none of Knuth's fall into this category) cannot be
reduced successfully with the double decision, but can be by
introducing another concept from Farrow, Kennedy, and Zucconi, the
_double-exit loop._  The double-exit loop looks like the following:

	A:	action1;
		if (COND1) {
			action2;
			goto B;
		}
		action3;
		if (COND2) {
			action4;
			goto C;
		}
		action5;
		goto A;

Once again, there is a trivial translation to conventional `C' if we
introduce an auxiliary Boolean variable:

	A:	for (;;) {
			action1;
			if (COND1) {
				action2;
				flag = TRUE;
				break;
			}
			action3;
			if (COND2) {
				action4;
				flag = FALSE;
				break;
			}
			action5;
		}
		if (flag)
			goto B;
		else
			goto C;

Once again, there are important special cases with null actions that
allow the auxiliary variable to be eliminated by disciplined use of
_break_ and _continue_.

1.3 Application to the examples.

All of Knuth's examples use the double decision in a way that can avoid
the use of auxiliary Booleans.  Translated to idiomatic C (i.e., using
0-based indexing and the _for_ statement), Knuth's EXAMPLE 1, with
goto's, looks like:

	/* 1 */
	for (i = 0; i < m; ++i)
		if (A[i] == x)
			goto found;
	/* not found */
	i = m++;
	A[i] = x;
	B[i] = 0;
found:
	++B[i];

Knuth proposes, in EXAMPLE 1a, eliminating the _goto_ with something
like:

	/* 1a */
	for (i = 0; i < m && A[i] != x; ++i)
		/* empty loop body */ ;
	if (i == m) {
		i = m++;
		A[i] = x;
		B[i] = 1;
	}
	else
		++B[i];

In this example, he points out that the test for (i < m) in the inner
loop is superfluous with proper data structures, and comes up with:

	/* 2 */
	A[m] = x;
	for (i = 0; A[i] != x; ++i)
		/* empty loop body */ ;
	if (i == m) {
		i = m++;
		A[i] = x;
		B[i] = 1;
	}
	else
		++B[i];

This last form is close to the way I'd code a linear search.  It also
has no _goto_s.

EXAMPLE 3 does not admit gracefully of such restructuring.
The example, with GO TO's, is

	/* 3 */
	i = hash (x);
	while (A[i] != 0) {
		if (A[i] == x)
			goto found;
		if (--i < 0)
			i = m - 1;
	}
	A[i] = x;
	B[i] = 0;
found:
	++B[i];

Following Knuth's lead in interchanging the checks on A[i] (EXAMPLE
3b), we arrive at:

	/* 3b */
	i = hash (x);
	while (A[i] != x) {
		if (A[i] == 0)
			goto not_found;
		if (--i < 0)
			i = m - 1;
	}
	goto found;
not_found:
	A[i] = x;
	B[i] = 0;
found:
	++B[i];

But here we note that the `not_found' case can be hoisted into the
lexical scope of the `while':

	i = hash (x);
	while (A[i] != x) {
		if (A[i] == 0) {
			A[i] = x;
			B[i] = 0;
			break;
		}
		if (--i < 0)
			i = m - 1;
	}
	++B[i];

This is close to the way I'd express it in `C'.  I would, however, make
a _for_ statement out of the first two lines; the _for_ statement would
have an empty third part.

Testing of this version (on a VAX, using 4.3BSD, and compiling all
versions of the program with -O) shows that it's actually faster than
any of Knuth's examples with the _goto_ statements, so in this case, at
least, I contend that the arguments that structured programming
degrades performance are specious.

------------------------------------------------------------------------
Appendix A.  Reduction of flow graphs of already-structured programs.

A.1 Sequencing.

A program that contains a GOTO A, where the GOTO is the only way to
reach label A, can have the GOTO eliminated in a trivial way, by
reordering the code so that the statements at A immediately follow the
GOTO.

A.2 Conditionals.

A program whose general structure is

	if (COND) goto A; else goto B;
	....
A:	action1;
	goto C;
	....
B:	action2;
	goto C;

where there is no other way to reach the code contained in action1 and
action2, can be replaced with

	if (COND) {
		action1;
	}
	else {
		action2;
	}

A.3 Loops.

The sequence of code:

A:	action1;
	if (COND) goto C; else goto B;
	....
B:	action2;
	goto A;
	....
C:

is a generalized one-entry one-exit loop.
It can be converted mechanically to:

A:	for (;;) {
		action1;
		if (COND)
			break;
		action2;
	}

There are two important special cases of this construct.  If action1 is
null, we can write

A:	while (!(COND)) {
		action2;
	}

and if action2 is null, we can write

A:	do {
		action1;
	} while (!(COND));

Moreover, if action2 can be expressed as a single statement in C, we
have the construct

A:	for (; !(COND); action2) {
		action1;
	}

and, of course, straight-line code preceding A can be merged with the
_for_ statement if desired.  Which of these equivalent constructions is
to be preferred depends on the context.

These three reductions, applied successively, can be used to convert
any flow graph of a program that is already structured to the
corresponding C program statements.  Note that these reductions handle
the common case of a `loop n and a half times,' where a loop exits from
the middle; they avoid both copying code and the use of the GO TO by a
disciplined use of the _break_ statement.
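To make the reduction concrete, the final goto-free search derived
above can be fleshed out into a small self-contained C sketch.  The
table size, the stand-in hash function, and the function name below are
assumptions added for illustration; they are not Knuth's.  A zero entry
in A marks an empty slot, so keys must be nonzero.

```c
#define M 8                     /* table size: an assumption for the demo */

int A[M];                       /* keys; 0 marks an empty slot */
int B[M];                       /* per-key counters */

int hash(int x) { return x % M; }   /* stand-in hash function */

/* Goto-free form of EXAMPLE 3: linear-probe hash search that inserts
 * x if it is absent.  Returns the slot index where x now lives. */
int lookup(int x)
{
    int i = hash(x);
    while (A[i] != x) {
        if (A[i] == 0) {        /* empty slot: x is not in the table */
            A[i] = x;
            B[i] = 0;
            break;              /* the hoisted `not found' case */
        }
        if (--i < 0)
            i = M - 1;          /* wrap the probe around */
    }
    ++B[i];                     /* both paths fall through to here */
    return i;
}
```

The _break_ serves as the single mid-loop exit; found and not-found
paths share the concluding ++B[i], exactly as in the text.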
kenny@uiucdcsb.cs.uiuc.edu (04/26/88)
[Part 2: Finite-state recognizers and pushback]

EXAMPLE 4 shows how _goto_s may be used to good effect in finite-state
recognizers for processing text.  Here we see, however, how programming
tools have improved since Knuth's day.  Most experienced C programmers,
at least from among those working on UN*X, would code EXAMPLE 4 with
Lex [Lesk75]:

%{
int column = 0;		/* Column in the output */
%}
%%
"//"	{ putchar ('\n'); column = 0; }
"/"	{ column = tabulate (column); }
"."	{ ECHO; putchar (' '); column += 2; }
\n	{ ; }
.	{ ECHO; ++column; }
%%
int tabulate (column)
int column;
{
	....
}

Without Lex, there is another possibility, which is to use the `ungetc'
feature of the standard I/O library.  In this case, the procedure winds
up being:

	c = getchar ();
	switch (c) {
	case EOF:
		return;
	case '\n':
		break;
	case '/':
		if ((c = getchar ()) == '/') {
			putchar ('\n');
			column = 0;
		}
		else
			column = tabulate (column);
		break;
	case '.':
		putchar (c);
		putchar (' ');
		column += 2;
		break;
	default:
		putchar (c);
		column += 1;
		break;
	}

Only if ungetc is unavailable do we need to resort to Knuth's examples.
In this case, I actually prefer the use of the temporary Boolean; it
may be interpreted as `there is a character in the ungetc buffer.'
kenny@uiucdcsb.cs.uiuc.edu (04/26/88)
[Part 3: Operations on parallel data structures]

EXAMPLE 5 looks like:

search:
	if (A[i] < x) {
		if (L[i] != 0) {
			i = L[i];
			goto search;
		}
		else {
			L[i] = j;
			goto insert;
		}
	}
	else if (R[i] != 0) {
		i = R[i];
		goto search;
	}
	else {
		R[i] = j;
		goto insert;
	}
insert:
	A[j] = x;
	L[j] = 0;
	R[j] = 0;

Knuth claims that the _goto_ structure brings out the nature of the
parallel operations on the L and R links.  If this be true, why not
collapse the code for these parallel operations?  I would much rather
see:

	for (;;) {
		register int *ptr;
		if (A[i] < x)
			ptr = &L[i];
		else
			ptr = &R[i];
		if (!*ptr) {
			*ptr = j;
			break;
		}
		i = *ptr;
	}
	A[j] = x;
	L[j] = 0;
	R[j] = 0;

We have handled this by generating one reference to the member of the L
or R array in question, and then using this reference, rather than
duplicating the code.

The argument about performance is difficult to address in this case,
since it is heavily dependent on the machine architecture.  Anyone who
is worried about performance for code like this, though, will use
pointers rather than array indices and make L and R structure
components rather than arrays, so the question is moot.

Moral: If you're doing the same thing to parallel data structures, run
the same code on the two structures rather than complicating your
control flow by duplicating the code.

[End of part 3.]
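As a compilable illustration of the pointer idea, here is the same
search-and-insert with struct members in place of the parallel L and R
arrays.  The struct layout, the function name, and the left-on-less
convention are assumptions made for this example; the point is only
that one pointer to the chosen link replaces the duplicated goto arms.

```c
#include <stddef.h>

struct node {
    int key;
    struct node *l, *r;         /* replace the parallel L and R arrays */
};

/* Descend from root and hang node j on the first empty link. */
void insert_node(struct node *root, struct node *j)
{
    struct node *i = root;
    for (;;) {
        /* one reference to the L-or-R link in question */
        struct node **link = (j->key < i->key) ? &i->l : &i->r;
        if (*link == NULL) {    /* empty link: attach j here */
            *link = j;
            break;
        }
        i = *link;              /* descend */
    }
    j->l = j->r = NULL;
}
```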
kenny@uiucdcsb.cs.uiuc.edu (04/26/88)
[Part 4: Eliminating recursive calls.]

In EXAMPLE 6, Knuth gives an example of code that has been processed to
eliminate recursive function calls.  This code contains the perceived
abomination of not only a _goto_, but a _goto_ into an inner lexical
scope.  I suspect, in this case, that even those who, like Mr. Spencer,
find the _goto_ abhorrent will find this case inoffensive.  Nobody is
proposing that code looking like this should ever be coded or
maintained directly.  Knuth himself says that the code results from
automatic translation of the recursive version; I hope that Mr. Spencer
will agree that the back end of a compiler or preprocessor is free to
generate code that is as ugly as it wishes.

I personally would prefer that the back end contain some sort of flow
analyzer based on the translation rules to which I alluded in Part 1.
I drew out a flow graph of EXAMPLE 6C, and applied my translation rules
to it.  The resulting graph contained neither double decisions nor
double-exit loops.  Somewhat to my surprise, the resulting code was
nearly identical to EXAMPLE 6E.  Compiling my untangled version and the
spaghetti of EXAMPLE 6C with the -O option caused both versions to
generate very similar object code.

I was unable to follow Knuth's argument with EXAMPLES 7 and 8.  It
appears that, while he found the usage of _goto_ gave better
performance in the quicksort described, Sedgewick found a _goto_-less
version that performed better (EXAMPLE 8A).

Moral 1: If it's convenient, let your preprocessor generate _goto_s.
The user won't have to maintain its output, after all.

Moral 2: Compilers should look into optimizing recursive function calls
more aggressively.  Moreover, other procedure calls should also be
optimized more aggressively than they are; most non-recursive ones, for
instance, could be optimized to share the static parent's activation
record, at a nontrivial saving of run time.
I have a paper on this last point by Barry Wolman, entitled `Reducing
the cost of procedure activations in block-structured languages.'  I
have never been able to locate a published version (I have a photocopy
of a typewritten preprint that's nearly illegible); can anyone out
there find me a pointer to it?

[End of part 4]
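For readers who want a concrete picture of recursion elimination, here
is a small sketch of the transformation on an invented tree-sum example
(not Knuth's EXAMPLE 6): the recursive version becomes a loop driven by
an explicit stack, which is essentially what a translating back end
would emit with gotos and its own stack handling.  The depth bound is
an assumption for the sketch.

```c
struct tnode {
    int val;
    struct tnode *l, *r;
};

/* The straightforward recursive form. */
int sum_recursive(const struct tnode *t)
{
    if (t == 0)
        return 0;
    return t->val + sum_recursive(t->l) + sum_recursive(t->r);
}

/* The same computation after mechanical recursion elimination:
 * pending subtrees live on an explicit stack instead of the call
 * stack. */
int sum_iterative(const struct tnode *t)
{
    const struct tnode *stack[64];      /* assumed depth bound */
    int sp = 0, total = 0;
    if (t)
        stack[sp++] = t;
    while (sp > 0) {                    /* pop, visit, push children */
        const struct tnode *n = stack[--sp];
        total += n->val;
        if (n->r) stack[sp++] = n->r;
        if (n->l) stack[sp++] = n->l;
    }
    return total;
}
```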
kenny@uiucdcsb.cs.uiuc.edu (04/26/88)
[Part 5.  Sets of actions.]

Knuth mentions that in many examples of simulators and interpreters, it
is advantageous to do things like:

	subtract:
		operand = -operand;
		goto add;
	add:
		accumulator = accumulator + operand;
		goto no_op;
	jump_if_overflow:
		if (overflow_indicator) {
			overflow_indicator = false;
			goto jump;
		}
		else
			goto no_op;
	no_op:
		++simulated_time;

I submit that, with today's larger memories, it is preferable to
duplicate the code (it's certainly faster not to have all the
unconditional branches in the instruction stream).  The above could be
written as something more like:

	#define ADD		accumulator = accumulator + operand
	#define ADVANCE_TIME	++simulated_time

	case subtract:
		operand = -operand;
		ADD;
		ADVANCE_TIME;
		break;
	case add:
		ADD;
		ADVANCE_TIME;
		break;
	case jump_if_overflow:
		if (overflow_indicator) {
			overflow_indicator = false;
			JUMP;
		}
		ADVANCE_TIME;
		break;
	case no_op:
		ADVANCE_TIME;
		break;

Moral: If the programmer is tempted to use GOTOs for a non-looping
control structure, it is preferable to duplicate code, using macros if
necessary to keep the duplicated code grouped together in the source
file.

[End of part 5.]
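A compilable rendering of the duplicated-code dispatch might look like
the following.  The opcode set, the machine-state struct, and the
function name are invented for the illustration, and the jump case is
omitted since its target lies outside the fragment.

```c
enum op { OP_ADD, OP_SUBTRACT, OP_NOP };

struct vm {
    int accumulator;
    long simulated_time;
};

#define ADD(m, operand)  ((m)->accumulator += (operand))
#define ADVANCE_TIME(m)  (++(m)->simulated_time)

/* Execute one instruction; each case duplicates the common tail via
 * the macros rather than chaining through gotos. */
void step(struct vm *m, enum op op, int operand)
{
    switch (op) {
    case OP_SUBTRACT:
        operand = -operand;
        ADD(m, operand);
        ADVANCE_TIME(m);
        break;
    case OP_ADD:
        ADD(m, operand);
        ADVANCE_TIME(m);
        break;
    case OP_NOP:
        ADVANCE_TIME(m);
        break;
    }
}
```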
kenny@uiucdcsb.cs.uiuc.edu (04/26/88)
[Part 6.  Panic exits]

The notion of using a _goto_ for a panic exit condition is touched on
by Knuth only in passing.

There are many cases, generally ignored in the academic training that
computer scientists receive, where a complicated operation can be
interrupted by the occurrence of an unexpected condition.  Generally,
the result of such an interruption, in classroom exercises, is to
terminate the program.  Simply aborting is, however, unacceptable in
the real world; for instance, a text editor that aborts (and quite
possibly destroys the user's file) when the user makes an error (or the
system detects one) is nearly unusable.

Unquestionably, a program that can encounter an unexpected condition in
an inner structure can recover by testing for that condition in all the
containing structures.  Such tests, usually written as statements like

	if (error_detected) break;

	while (... && !(error_detected)) ....

	error_detected |= do_function (....);

both clutter the code and are a decided performance expense at run
time, as they repeat the tests for errors in all the loops of the
program.

In many of these cases, it is advisable, instead of repeating tests, to
simply insert code of the form

	if (failure) goto error_handler;

at the point where the failure can occur.  The `C' idiom does not use
the evil word _goto_ for this construct, since the transfer is normally
out of the lexical scope, and quite possibly out of the source file.
Rather, the usual way to do it in `C' is to use the constructs _setjmp_
and _longjmp_.

Moral: Using _goto_ may be acceptable to break out of a deep nest of
control structures in the event of an unusual occurrence.  This usage
is quite possibly the only acceptable context for _goto_.

[End of part 6.]
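A minimal sketch of the setjmp/longjmp idiom described above, with an
invented failure flag standing in for the real unexpected condition.
The routine names are assumptions; the point is that the deeply nested
code escapes to the handler in one longjmp, with no error flags
threaded through every loop on the way out.

```c
#include <setjmp.h>

jmp_buf panic;                  /* the panic-exit target */

/* Stands in for code buried several control structures deep. */
void innermost(int fail)
{
    if (fail)
        longjmp(panic, 1);      /* unwind straight to the setjmp */
}

/* Returns 0 on success, -1 if the panic exit was taken. */
int run(int fail)
{
    if (setjmp(panic) != 0)
        return -1;              /* the error handler would clean up here */
    innermost(fail);
    return 0;
}
```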
kenny@uiucdcsb.cs.uiuc.edu (04/26/88)
[Part 7.  Conclusions]

As we have seen, programming tools that have come into general use
since 1974, together with the larger memories now available on
computers, render most of the arguments in Knuth's paper obsolete.

There is still a use for the _goto_ statement in code generated by
preprocessors and compilers (for such things as finite-state machines
and the elimination of recursive calls).  The _goto_ statement is also
still useful for `panic exit' conditions.  These uses alone justify
leaving the _goto_ statement in the language.

The advent of _break_ and _continue_ makes the use of _goto_ in double
decisions and double-exit loops superfluous, for the most part.  The
small minority of cases where the loop cannot be expressed with these
newer constructs are probably better handled with repeated tests or
auxiliary variables.  Cases where performance considerations preclude
using the auxiliary variables or repeating the tests are rare.  In this
vanishing minority of cases, a _goto_ may again be an acceptable evil
-- but the code should be profiled, and the performance gain
calculated, before any such optimization is attempted.  (It is of note
that modern compilers are beginning to be able to do this sort of
optimization for the user, by using techniques of data-flow analysis to
delete redundant tests.)

The advent of _ungetc_, and of finite-state scanner generators, makes
the use of _goto_ much less attractive in text-scanning applications.
EXAMPLE 4 is unacceptable given today's programming practice.

The availability of pointers and generalized call-by-reference in
high-level languages means that _goto_ is no longer necessary to
describe the control structures for performing parallel operations on
different data.  EXAMPLE 5 is much better handled by use of a pointer.
I leave the discussion of alternative control structures (not involving
_goto_) for multiple-exit constructs to others; the suggestions that
Knuth makes in this regard may be valid, but are still too immature to
consider incorporating in a standard language such as `C'.

In short, use of _goto_ directly by a programmer is most likely
excusable only where the _goto_ represents a panic exit condition.
There are a few other cases where compilers and preprocessors can
generate _goto_s freely.  All other uses are to be disparaged.

References

[Ashc71]  Ashcroft, Edward, and Zohar Manna.  ``The translation of `go
to' programs to `while' programs.''  Information Processing 71:
Proc. IFIP Congress 71, Ljubljana, Yugoslavia, pp. 250-255.

[Boeh66]  Boehm, C., and G. Jacopini.  ``Flow diagrams, Turing
machines, and languages with only two formation rules.''  CACM 9:5
(1966), 366-371.

[Farr76]  Farrow, R., K. Kennedy, and L. Zucconi.  ``Graph grammars and
global program flow analysis.''  Proc. 17th Annual IEEE Symposium on
Foundations of Computer Science, Houston, Texas, 1976.

[Kenn81]  Kennedy, Ken.  ``A survey of data flow analysis techniques,''
in _Program Flow Analysis: Theory and Applications_, Steven S. Muchnick
and Neil D. Jones, eds., Englewood Cliffs, New Jersey: Prentice-Hall,
1981, pp. 5-54.

[Lesk75]  Lesk, Michael E.  ``Lex -- a lexical analyzer generator.''
CSTR 39, Murray Hill, New Jersey: Bell Laboratories, 1975.

[Wolm??]  Wolman, Barry E.  ``Reducing the cost of procedure
activations in block-structured languages.''  Circa 1976, publication
information unavailable at this time.
kenny@uiucdcsb.cs.uiuc.edu (04/26/88)
One line summary: Henry, you're right. Dan, you're right. Now shut up, both of you.
g-rh@cca.CCA.COM (Richard Harter) (04/26/88)
In article <165600045@uiucdcsb> kenny@uiucdcsb.cs.uiuc.edu writes:
>
>In short, use of _goto_ directly by a programmer is most likely
>excusable only where the _goto_ represents a panic exit condition.
>There are a few other cases where compilers and preprocessors can
>generate _goto_s freely.  All other uses are to be disparaged.

This is from a very nice series of articles.  Well done, sir!

In civilized languages I average about one goto per 15,000 lines of
code, usually of the `panic exit' form.  However the most common
instance is of the form

	procedure () {
		prolog code
		main body
	epilog:
		epilog code
	}

The gotos live in the main body at various levels of nesting and all
transfer to the epilog code, which typically does things like returning
space to the allocator.  This, I suppose, comes under the category of
multiple exits from a main body.  Sometimes you can arrange things to
use break statements -- sometimes this is not convenient.

In D or some other such language of the future one might find it
profitable to have an escape statement that takes you up to the top
level within the function, e.g.

	foobar () {
		while (foo) {
			stuff;
			while (bar) {
				more stuff;
				if (stop_this_nonsense) superbreak;
				other stuff;
			}
		}
		cleanup code;
	}

Superbreak escapes all the way to the code following the outermost
while, i.e. the cleanup code.
-- 
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
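A compilable sketch of this prolog/body/epilog shape might read as
follows; the allocations and the failure flag are invented to make the
paths visible.  Both the allocation-failure path and the mid-body
failure funnel to the single epilog label, and the resources are
released on every path.

```c
#include <stdlib.h>

/* Returns 0 on success, -1 on failure; buf1 and buf2 are freed on
 * every path through the function. */
int procedure(int fail_early)
{
    int rc = -1;
    char *buf1 = malloc(64);              /* prolog code */
    char *buf2 = malloc(64);
    if (buf1 == NULL || buf2 == NULL)
        goto epilog;

    if (fail_early)                       /* main body */
        goto epilog;
    rc = 0;

epilog:                                   /* epilog code */
    free(buf2);                           /* free(NULL) is harmless */
    free(buf1);
    return rc;
}
```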
henry@utzoo.uucp (Henry Spencer) (04/27/88)
> [common tail actions]
> I submit that, with today's larger memories, it is preferable to
> duplicate the code...

One can avoid this to some degree in C by using case fallthrough.
Otherwise, I agree but would phrase it differently: for "with today's
larger memories" substitute "with sensible compilers".  It doesn't take
much of an optimizer to notice that the instruction sequences preceding
an unconditional control-flow merge are identical, and condense them
into one by branching to the beginning of the sequence rather than the
end.  Any peephole optimizer worthy of the name ought to be willing to
do this.
-- 
NASA is to spaceflight as	| Henry Spencer @ U of Toronto Zoology
the Post Office is to mail.	| {ihnp4,decvax,uunet!mnetor}!utzoo!henry
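The case-fallthrough trick mentioned above can be sketched as follows
(the opcode names and machine state are invented for the example):
SUBTRACT negates its operand and falls through into ADD's code, which
falls through into the common time bookkeeping, so each shared tail is
written exactly once.

```c
enum opcode { SUBTRACT, ADD, NO_OP };

void execute(int *accumulator, long *simulated_time,
             enum opcode op, int operand)
{
    switch (op) {
    case SUBTRACT:
        operand = -operand;
        /* FALLTHROUGH into ADD's code */
    case ADD:
        *accumulator += operand;
        /* FALLTHROUGH into the common tail */
    case NO_OP:
        ++*simulated_time;
        break;
    }
}
```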
kenny@uiucdcsb.cs.uiuc.edu (04/28/88)
I made an error in Part 2 of my recent posting on this subject.  I left
out the ungetc() (which is the key to the whole procedure!) in the
sample C version of Knuth's EXAMPLE 4.  The code to handle '/' should
look like:

	case '/':
		if ((c = getchar ()) == '/') {
			putchar ('\n');
			column = 0;
		}
		else {
			ungetc (c, stdin);
			column = tabulate (column);
		}
		break;

It beats me how I let that particular gaffe slip through, but I humbly
and contritely apologize to anyone that I confused with it.

Kevin Kenny			UUCP: {ihnp4,pur-ee,convex}!uiucdcs!kenny
Department of Computer Science	ARPA: kenny@B.CS.UIUC.EDU (kenny@UIUC.ARPA)
University of Illinois		CSNET: kenny@UIUC.CSNET
1304 W. Springfield Ave.	Voice: (217) 333-6680
Urbana, Illinois, 61801
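For anyone who wants to exercise the corrected control flow without
wiring up stdin, here is a self-contained adaptation that scans a
string and keeps its own one-character pushback slot in place of
ungetc().  The names are assumptions, and tabulate() is a stub that
merely advances the column to the next multiple of eight without
emitting anything.

```c
#include <stdio.h>
#include <string.h>

/* Stub: advance to the next tab stop (an assumption for the demo). */
int tabulate(int column) { return (column / 8 + 1) * 8; }

/* Apply EXAMPLE 4's rules to the string 'in', writing to 'out':
 * "//" becomes a newline, a lone '/' tabs (column only, here),
 * '.' is echoed with a trailing blank, '\n' is dropped, and
 * everything else is echoed. */
void scan(const char *in, char *out)
{
    int column = 0;
    int push = -1;                  /* one-character pushback slot */
    for (;;) {
        int c;
        if (push >= 0) { c = push; push = -1; }
        else c = *in ? (int)(unsigned char)*in++ : EOF;
        if (c == EOF)
            break;
        switch (c) {
        case '\n':
            break;
        case '/':
            c = *in ? (int)(unsigned char)*in++ : EOF;
            if (c == '/') { *out++ = '\n'; column = 0; }
            else { push = c; column = tabulate(column); }
            break;
        case '.':
            *out++ = '.'; *out++ = ' '; column += 2;
            break;
        default:
            *out++ = (char)c; ++column;
            break;
        }
    }
    *out = '\0';
}
```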
cramer@optilink.UUCP (Clayton Cramer) (04/28/88)
> [Part 6.  Panic exits]
>
> The notion of using a _goto_ for a panic exit condition is touched on
> by Knuth only in passing.
>
> There are many cases, generally ignored in the academic training that
> computer scientists receive, where a complicated operation can be
> interrupted by the occurrence of an unexpected condition.  Generally,
> the result of such an interruption, in classroom exercises, is to
> terminate the program.  Simply aborting is, however, unacceptable in
> the real world; for instance, a text editor that aborts (and quite
> possibly destroys the user's file) when the user makes an error (or
> the system detects one) is nearly unusable.
>
> Unquestionably, a program that can encounter an unexpected condition
> in an inner structure can recover by testing for that condition in all
> the containing structures.  Such tests, usually written as statements
> like
>
>	if (error_detected) break;
>
>	while (... && !(error_detected)) ....
>
>	error_detected |= do_function (....);
>
> both clutter the code and are a decided performance expense at run
> time, as they repeat the tests for errors in all the loops of the
> program.
>
> In many of these cases, it is advisable, instead of repeating
> tests, to simply insert code of the form
>
>	if (failure) goto error_handler;
>
> at the point where the failure can occur.

Since the objective is to do some cleanup (close the editor's files,
spray error messages on the user's screen, post an article about the
failure on USENET, etc.) before giving up, the correct solution is:

	if (failure)
		ErrorHandler(__FILE__, __LINE__);

	ErrorHandler (FileName, LineNbr)
	char *FileName;
	int LineNbr;
	{
		/* Clean up everything. */
		. . .
		/* Print out complaints. */
		fprintf (stderr, "This software self-destructed at %d in %s\n",
			LineNbr, FileName);
		exit (9000);
	}

Note that this gives the catastrophic failure cleanup the poster was
saying was needed before killing the process, without the evil goto.
Note also that calling a function, rather than using a goto, makes it
possible to pass information about the failure (in this case, the line
and filename where the error occurred), where the goto approach
requires the use of global variables.

> Moral: Using _goto_ may be acceptable to break a deep nest of control
> structures in the event of an unusual occurrence.  This usage is quite
> possibly the only acceptable context for _goto_.
>
> [End of part 6.]

Very true.  But for the case described, calling a function that does an
exit is STILL preferable to a goto.

Clayton E. Cramer
wsmith@uiucdcsm.cs.uiuc.edu (04/28/88)
> However the most common instance is of the form
>
>	procedure () {
>		prolog code
>		main body
>	epilog:	epilog code
>	}
>
> The gotos live in the main body at various levels of nesting and all
> transfer to the epilog code which typically does things like returning
> space to the allocator.

This is an alternative technique that may avoid the goto:

	procedure () {
		prolog code
		do {
			main body
		} while (0);
		epilog code
	}

A break or continue inside the one-time-only do-while will jump to the
epilog code.  I think this is only an academic curiosity, and I haven't
ever seen code actually using this construct.  Has anyone actually
written code with this in it?  The optimizer should generate the same
code as if gotos were used directly.

Bill Smith
wsmith@a.cs.uiuc.edu
ihnp4!uiucdcs!wsmith
wesommer@athena.mit.edu (William E. Sommerfeld) (04/28/88)
In article <2030@optilink.UUCP> cramer@optilink.UUCP (Clayton Cramer) writes:
>Since the objective is to do some cleanup (close the editor's files, spray
>error messages on the user's screen, post an article about the failure
>on USENET, etc.) before giving up, the correct solution is:
>
>if (failure)
>	ErrorHandler(__FILE__, __LINE__);
>	[error handler prints a message on stderr and exits]
>
>Note that this gives the catastrophic failure cleanup the poster was saying
>was needed before killing the process, without the evil goto.

This assumes that the only way to recover is to kill off the process
completely.  In UNIX, where each user-typed command starts a new
process, this almost makes sense, since it returns you to the shell.
However, there are environments where killing off the process in
response to an error is not acceptable--editors, shells, or network
server processes come to mind here, as does the kernel.  You want to be
able to gracefully unwind the stack, freeing up resources and unlocking
locks as you go.  People familiar with PL/I's 'on cleanup' and LISP's
'unwind-protect' should understand what I mean here...

At least there are some approximations to this under UNIX... both
Amdahl and Apollo have implemented simple exception handling mechanisms
on top of setjmp/longjmp; Amdahl talked about theirs (used for
exception handling in the kernel) at the last USENIX, and Apollo uses
their implementation in NCS (their remote procedure call package).

					- Bill
g-rh@cca.CCA.COM (Richard Harter) (04/28/88)
In article <4700011@uiucdcsm> wsmith@uiucdcsm.cs.uiuc.edu writes:
>This is an alternative technique that may avoid the goto:
>	procedure () {
>		prolog code
>		do {
>			main body
>		} while (0);
>		epilog code
>	}
>A break or continue inside the one-time-only do-while will jump
>to the epilog code.  I think this is only an academic curiosity, and
>I haven't ever seen code actually using this construct.  Has anyone
>actually written code with this in it?  The optimizer should generate
>the same code as if gotos were used directly.

Yep.  Once in a great while I use that particular technique.  Although
the language purists may frown, I have stashed away in the standard
include file the following defines:

	#define BLOCK    do {
	#define ENDBLOCK } while(0);
	#define LOOP     for(;;) {
	#define ENDLOOP  }

The BLOCK/ENDBLOCK combo is mostly useful in situations like this:

	LOOP
		get_next_thingy
		do_some_stuff
		BLOCK
			if (special_case_1) {
				handle special case;
				break;
			}
			intervening code;
			if (special_case_2) {
				take care of it;
				break;
			}
			more intervening code;
			....
		ENDBLOCK
		more stuff
	ENDLOOP

You can, of course, handle this with nested ifs, but I for one think
that the resulting code is kludgy, e.g.

	if (special_case_1) {
		take care of it;
	} else {
		intervening code;
		if (special_case_2) {
			take care of it;
		} else {
			....

The problem with this code is that it obscures the essential structure
of what is being done -- the task is to filter the thingy through a
series of tests and quit testing when a test is passed.  And now for my
little grasshopper for programmers:

"The essence of structured programming is to identify the essential
structure of the problem and arrange the code to reflect that
structure."
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
limes@sun.uucp (Greg Limes) (04/29/88)
In article <4700011@uiucdcsm> wsmith@uiucdcsm.cs.uiuc.edu writes:
>	procedure () {
>		prolog code
>		do {
>			main body
>		} while (0);
>		epilog code
>	}
>
>A break or continue inside the 1 time only do-while will jump
>to the epilog code.  I think this is only an academic curiosity, and
>I haven't ever seen code actually using this construct.  Has anyone
>actually written code with this in it?  The optimizer should generate
>the same code as if goto's were used directly.

I used this particular dodge to get around some heavy nesting in a
previous job doing high speed network stuff, then started using it for
some of the utilities.  Nice when you have an inner routine that
processes something in nineteen different steps, and if any of them
fails you just want to abort the process, clean up, and return a
failure.  I still occasionally use it in quick utilities, although
there is nearly always a better way to do what I want.
--
	Greg Limes [limes@sun.com]		frames to /dev/fb
limes@sun.uucp (Greg Limes) (04/29/88)
In article <1988Apr27.164212.12535@utzoo.uucp> henry@utzoo.uucp (Henry
Spencer) writes:
>It doesn't take much of an optimizer
>to notice that the instruction sequences preceding an unconditional control-
>flow merge are identical, and condense them into one by branching to the
>beginning of the sequence rather than the end.

So the compiler can recognize the common code -- big deal.  If I have
several cases, all of which want the same ten line trailer, I would
rather put it in one place than spread it out in my source.  This calls
for a boolean, or (gasp) maybe even a goto.
--
	Greg Limes [limes@sun.com]		frames to /dev/fb
dhesi@bsu-cs.UUCP (Rahul Dhesi) (04/29/88)
In article <1988Apr27.164212.12535@utzoo.uucp> henry@utzoo.uucp (Henry
Spencer) writes:
>It doesn't take much of an optimizer
>to notice that the instruction sequences preceding an unconditional control-
>flow merge are identical, and condense them into one by branching to the
>beginning of the sequence rather than the end.  Any peephole optimizer
>worthy of the name ought to be willing to do this.

My reading of the original article by kenny@uiucdcsb.cs.uiuc.edu is
that kenny's example (a CPU simulator) would duplicate code in such a
way that a peephole optimizer would not find it.  A peephole optimizer
scans the code through a window of limited size, and rearranges code in
that window so that the net effect of the window remains unchanged.  I
don't think you can remove duplicate code in different places this way.

Perhaps one could write a special-case optimizer that works only on
switch statements, looking for and eliminating duplicate code preceding
break statements.  But if such duplicate code occurs only rarely,
having everybody's compiler include such an optimizer may be overkill,
and it might prove far simpler to do the optimization manually (i.e.,
using gotos) and keep the optimizer simple.
--
Rahul Dhesi	UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
peter@athena.mit.edu (Peter J Desnoyers) (04/29/88)
In article <2800@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
+In article <1988Apr27.164212.12535@utzoo.uucp> henry@utzoo.uucp (Henry
+Spencer) writes:
+
++It doesn't take much of an optimizer
++to notice that the instruction sequences preceding an unconditional control-
++flow merge are identical, and condense them into one by branching to the
++beginning of the sequence rather than the end.
+
+A peephole optimizer
+scans the code through a window of limited size, and rearranges code in
+that window so that the net effect of the window remains unchanged.  I
+don't think you can remove duplicate code in different places this
+way.
+--
+Rahul Dhesi	UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

The Apollo C compiler performs this optimization.  It makes debugging
at the source level tons of fun - you jump by some magic from one case
to another.  I don't know how it does it - if I were given the task, I
would write a switch compress routine that compared each case backwards
(subject to constraints) and found the longest matches.

				Peter Desnoyers
				peter@athena.mit.edu
henry@utzoo.uucp (Henry Spencer) (05/01/88)
>... A peephole optimizer
>scans the code through a window of limited size, and rearranges code in
>that window so that the net effect of the window remains unchanged.  I
>don't think you can remove duplicate code in different places this
>way.

"Peephole optimizer" is not a precise term nowadays; if you are willing
to stipulate one that has two windows rather than one, the optimization
I mentioned is possible.  It's not an uncommon optimization.  Even the
ancient pdp11 C compiler (the original C compiler) does it.

> ... if such duplicate code occurs only rarely,
> having everybody's compiler include such an optimizer may be overkill...

My impression is that the general case of common code before a merge
point is relatively frequent, although the common code is usually small
and the gains to be had from optimizing it are modest.  But then,
nobody expects a peephole optimizer to do anything massive.  Little
improvements add up.
--
NASA is to spaceflight as	|  Henry Spencer @ U of Toronto Zoology
the Post Office is to mail.	| {ihnp4,decvax,uunet!mnetor}!utzoo!henry
wes@obie.UUCP (Barnacle Wes) (05/02/88)
In article <1988Apr27.164212.12535@utzoo.uucp> henry@utzoo.uucp (Henry
Spencer) writes:
| It doesn't take much of an optimizer
| to notice that the instruction sequences preceding an unconditional control-
| flow merge are identical, and condense them into one by branching to the
| beginning of the sequence rather than the end.  Any peephole optimizer
| worthy of the name ought to be willing to do this.

In article <2800@bsu-cs.UUCP>, dhesi@bsu-cs.UUCP (Rahul Dhesi) replies:
> My reading of the original article by kenny@uiucdcsb.cs.uiuc.edu is
> that kenny's example (a CPU simulator) would duplicate code in such a
> way that a peephole optimizer would not find it.  A peephole optimizer
> scans the code through a window of limited size, and rearranges code in
> that window so that the net effect of the window remains unchanged.  I
> don't think you can remove duplicate code in different places this
> way.

This form of optimization is called "common subexpression removal."  It
is quite often used to optimize for space (less code space is required,
obviously).  It would NOT commonly be used if you are looking for the
utmost in speed: the time spent jumping into/out of the common code
does not make for faster programs.
--
    /\               -  "Against Stupidity,  -    {backbones}!
   /\/\  .  /\       -  The Gods Themselves  -  utah-cs!uplherc!
  /  \/ \/\/  \      -   Contend in Vain."   -    sp7040!obie!
 / U i n T e c h \   -       Schiller        -        wes
dg@lakart.UUCP (David Goodenough) (05/02/88)
From article <1988Apr27.164212.12535@utzoo.uucp>, by henry@utzoo.uucp
(Henry Spencer):
> One can avoid this to some degree in C by using case fallthrough.  Otherwise,
> I agree but would phrase it differently: for "with today's larger memories"
> substitute "with sensible compilers".  It doesn't take much of an optimizer
> to notice that the instruction sequences preceding an unconditional control-
> flow merge are identical, and condense them into one by branching to the
> beginning of the sequence rather than the end.  Any peephole optimizer
> worthy of the name ought to be willing to do this.

Well said - in 1980 the Bell Labs V6 Unix C compiler pass 'c2' could be
made to print out what it had done, and one message was

	42 Common code before branches

Exactly what Mr. Spencer is talking about.

I'm tempted to add that a good peephole optimizer should be added to
some current compilers: it gets my back up when our compiler generates
code like

	movl	d0,d0
or
	movl	d1,d0
	movl	d1,d0

I've seen both of these!!  No kidding!  Or when it puts 'nop's into the
assembler source.
--
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!adelie!cfisun!lakart!dg	+-+-+ |
							+---+
djones@megatest.UUCP (Dave Jones) (05/05/88)
in article <211@obie.UUCP>, wes@obie.UUCP (Barnacle Wes) says:
> ...
> ... "common subexpression removal."
> It is quite often used to optimize for space (less code space is
> required, obviously).  It would NOT commonly be used if you are
> looking for the utmost in speed, the time spent jumping into/out of
> the common code does not make for faster programs.

Nope, that's not what "common subexpression removal" is.  Consider this
Pascal fragment:

	A[i].k := A[i].k + j;

The expression A[i].k occurs twice.  Furthermore, its "L-value", or
location, must evaluate to the same address when each subexpression is
evaluated.  So a clever compiler will only calculate the address of
A[i].k once, and will leave it in a register to be reused.  There is no
"jumping into/out of the common code".

In Pascal programs, this optimization will often increase speed by
thirty percent or more.  Depends on the program.
--
		Dave J.