vlcek@mit-caf.MIT.EDU (Jim Vlcek) (09/03/88)
Somehow I managed to lose Tom Duff's article on Duff's device, which quoted his original letter to dmr. Would someone who saved it please email me a copy? -- Jim Vlcek vlcek@caf.mit.edu !{ihnp4,harvard,seismo,rutgers}!mit-eddie!mit-caf!vlcek
g-rh@cca.CCA.COM (Richard Harter) (09/03/88)
Now I am a man that can write a goto without flinching and I can take structured programming or leave it, but Duff's device gives me the shudders. Granted that it may be guaranteed by the language specification. I have seen too many "broken" compilers that don't handle obscure technicalities correctly to be comfortable with "under such and such circumstances you may branch into a loop". Color me old-fashioned, but the idea gives me the creeps. Duff's device is another variant on the the loop and a half problem where you need to execute a loop body a whole number of times and part of the loop body once. His solution is (using goto's rather than a switch) is: goto A; do { code_section_1 A: /* Entry point for partial loop construction */ code_section_2 } while (sometest); Now all us good children know that when we write a loop with conditionals the compiler does some magic for us -- it creates some labels and transfers and tests and maybe does some block allocation. And it is incumbent upon us not to confuse the compiler. If I declare some variables at the start of the loop are they going to be there when I jump into the middle? This is the sort of code that you have to be a language lawyer to be sure that it works right -- and you have to assume that the compiler writer was a good language lawyer too. I don't like to write code that requires being a language lawyer to read and verify; life is too short. The besides of which, the device is not needed. If you are unrolling loops for speed the presumption is that you are willing to write duplicate code. You can always write code_section_2 do { code_section_1 code_section_2 } while (sometest); Or in the case in hand switch (remnant) { case 7: stuff; case 6: stuff; .... case 1: stuff; default: break; } for (;count;count--) { stuff; ... stuff; } It takes more lines of code, but it is a lot clearer, and it doesn't depend on obscurities of the language. Moreover it stands a better chance of executing faster on random compiler X. -- In the fields of Hell where the grass grows high Are the graves of dreams allowed to die. Richard Harter, SMDS Inc.
bill@proxftl.UUCP (T. William Wells) (09/07/88)
In article <32941@cca.CCA.COM> g-rh@CCA.CCA.COM (Richard Harter) writes:
: Now all us good children know that when we write a loop with conditionals
: the compiler does some magic for us -- it creates some labels and transfers
: and tests and maybe does some block allocation. And it is incumbent upon
: us not to confuse the compiler. If I declare some variables at the start
: of the loop are they going to be there when I jump into the middle? This
: is the sort of code that you have to be a language lawyer to be sure that
: it works right -- and you have to assume that the compiler writer was a
: good language lawyer too. I don't like to write code that requires being
: a language lawyer to read and verify; life is too short.
Pardon, but the question of what happens to local variables when
entering a loop from other than the top has never been an obscure
issue in C. Both K&R and ANSI assert that the locals will *not*
be initialized, and K&R implies and ANSI explicitly states that
the memory for the locals will be allocated.
I also don't know of any compiler that fails to allocate space in
this situation. Does anyone know of any?
---
Bill
novavax!proxftl!bill
bright@Data-IO.COM (Walter Bright) (09/08/88)
In article <32941@cca.CCA.COM> g-rh@CCA.CCA.COM (Richard Harter) writes:
<Duff's device is another variant on the the loop and a half problem
<where you need to execute a loop body a whole number of times and part
<of the loop body once.
< goto A;
< do {
< code_section_1
<A: /* Entry point for partial loop construction */
< code_section_2
< }
< while (sometest);
This type of code is called an 'irreducible flow graph' in optimizer texts.
Basically, it's a loop with more than one entry point. It's the bane of
optimizers, as most loop optimization algorithms depend on there being
only one entry point. Typical approaches to deal with this are:
1. Abandon optimization on that loop.
2. Abandon optimization on the function that contains that loop.
3. Rewrite the flow graph to eliminate the problem (the algorithm
is called 'node splitting', it's basically putting a copy
of code_section_2 before the loop).
Irreducible flow graphs occur so rarely in production code (even in code
that is a snarl of goto's) that implementing (3) is not worthwhile, and
approach (1) or (2) is used.
Some optimizers do not do flow analysis to detect loops. They rely on
recognizing loop constructs such as for(), while() and do..while(). Since
they cannot recognize an irreducible flow graph, they silently abandon
optimization on *any* function with a goto in it.
gwyn@smoke.ARPA (Doug Gwyn ) (09/09/88)
In article <718@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes: >I also don't know of any compiler that fails to allocate space in >this situation. Does anyone know of any? All the C compilers I know about allocate the space upon function entry, not upon block entry.
pearce@TYCHO.YERKES.UCHICAGO.EDU ("Eric C. Pearce") (09/10/88)
After trying Doug Schmidts duff timer on our Sun 3/260, with the sun supplied cc complier, I found duff's device considerably faster (almost a factor of two). However, the test is unfair in the sense that no attempt is made to optimize the non-duff copy without resorting to syntax-nightmares like the duff code. Additionally, Schmidts test is somewhat unfair since it leaves some loop initialization (such as the initialization of A and B) out of the loop. This will be quite small if the copy is large though. Personally, I prefer this "conventional" copy fragment: A = array1; B = array2; n = Count / 8; for (i = Count%8; i > 0; i--) *A++ = *B++; while (--n >= 0) { *A++ = *B++; *A++ = *B++; *A++ = *B++; *A++ = *B++; *A++ = *B++; *A++ = *B++; *A++ = *B++; *A++ = *B++; } >From a performance standpoint, it falls only 2% behind duff device and is 102% more readable and not "kludgy".