[comp.lang.c] Duff's device.

vlcek@mit-caf.MIT.EDU (Jim Vlcek) (09/03/88)

Somehow I managed to lose Tom Duff's article on Duff's device, which
quoted his original letter to dmr.  Would someone who saved it please
email me a copy?
-- 
Jim Vlcek
vlcek@caf.mit.edu
!{ihnp4,harvard,seismo,rutgers}!mit-eddie!mit-caf!vlcek

g-rh@cca.CCA.COM (Richard Harter) (09/03/88)

Now I am a man that can write a goto without flinching and I can take
structured programming or leave it, but Duff's device gives me the shudders.
Granted that it may be guaranteed by the language specification.  I have
seen too many "broken" compilers that don't handle obscure technicalities
correctly to be comfortable with "under such and such circumstances you
may branch into a loop".  Color me old-fashioned, but the idea gives me
the creeps.

Duff's device is another variant on the the loop and a half problem
where you need to execute a loop body a whole number of times and part
of the loop body once.  His solution is (using goto's rather than a
switch) is:

  goto A;
  do {
    code_section_1
A:       /* Entry point for partial loop construction */
    code_section_2
    }
    while (sometest);

Now all us good children know that when we write a loop with conditionals
the compiler does some magic for us -- it creates some labels and transfers
and tests and maybe does some block allocation.  And it is incumbent upon
us not to confuse the compiler.  If I declare some variables at the start
of the loop are they going to be there when I jump into the middle?  This
is the sort of code that you have to be a language lawyer to be sure that
it works right -- and you have to assume that the compiler writer was a
good language lawyer too.  I don't like to write code that requires being
a language lawyer to read and verify; life is too short.

The besides of which, the device is not needed.  If you are unrolling loops
for speed the presumption is that you are willing to write duplicate code.
You can always write

  code_section_2
  do {
    code_section_1
    code_section_2
    } while (sometest);

Or in the case in hand

  switch (remnant) {
    case 7:  stuff;
    case 6:  stuff;
    ....
    case 1:  stuff;
    default: break;
    }
  for (;count;count--) {
    stuff;
    ...
    stuff;
    }

It takes more lines of code, but it is a lot clearer, and it doesn't
depend on obscurities of the language.  Moreover it stands a better chance
of executing faster on random compiler X.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

bill@proxftl.UUCP (T. William Wells) (09/07/88)

In article <32941@cca.CCA.COM> g-rh@CCA.CCA.COM (Richard Harter) writes:
: Now all us good children know that when we write a loop with conditionals
: the compiler does some magic for us -- it creates some labels and transfers
: and tests and maybe does some block allocation.  And it is incumbent upon
: us not to confuse the compiler.  If I declare some variables at the start
: of the loop are they going to be there when I jump into the middle?  This
: is the sort of code that you have to be a language lawyer to be sure that
: it works right -- and you have to assume that the compiler writer was a
: good language lawyer too.  I don't like to write code that requires being
: a language lawyer to read and verify; life is too short.

Pardon, but the question of what happens to local variables when
entering a loop from other than the top has never been an obscure
issue in C.  Both K&R and ANSI assert that the locals will *not*
be initialized, and K&R implies and ANSI explicitly states that
the memory for the locals will be allocated.

I also don't know of any compiler that fails to allocate space in
this situation.  Does anyone know of any?

---
Bill
novavax!proxftl!bill

bright@Data-IO.COM (Walter Bright) (09/08/88)

In article <32941@cca.CCA.COM> g-rh@CCA.CCA.COM (Richard Harter) writes:
<Duff's device is another variant on the the loop and a half problem
<where you need to execute a loop body a whole number of times and part
<of the loop body once.
<  goto A;
<  do {
<    code_section_1
<A:       /* Entry point for partial loop construction */
<    code_section_2
<    }
<    while (sometest);

This type of code is called an 'irreducible flow graph' in optimizer texts.
Basically, it's a loop with more than one entry point. It's the bane of
optimizers, as most loop optimization algorithms depend on there being
only one entry point. Typical approaches to deal with this are:
	1. Abandon optimization on that loop.
	2. Abandon optimization on the function that contains that loop.
	3. Rewrite the flow graph to eliminate the problem (the algorithm
	   is called 'node splitting', it's basically putting a copy
	   of code_section_2 before the loop).
Irreducible flow graphs occur so rarely in production code (even in code
that is a snarl of goto's) that implementing (3) is not worthwhile, and
approach (1) or (2) is used.

Some optimizers do not do flow analysis to detect loops. They rely on
recognizing loop constructs such as for(), while() and do..while(). Since
they cannot recognize an irreducible flow graph, they silently abandon
optimization on *any* function with a goto in it.

gwyn@smoke.ARPA (Doug Gwyn ) (09/09/88)

In article <718@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>I also don't know of any compiler that fails to allocate space in
>this situation.  Does anyone know of any?

All the C compilers I know about allocate the space upon function
entry, not upon block entry.

pearce@TYCHO.YERKES.UCHICAGO.EDU ("Eric C. Pearce") (09/10/88)

After trying Doug Schmidts duff timer on our Sun 3/260, with the sun
supplied cc complier, I found duff's device considerably faster
(almost a factor of two).  However, the test is unfair in the sense
that no attempt is made to optimize the non-duff copy without
resorting to syntax-nightmares like the duff code.  Additionally,
Schmidts test is somewhat unfair since it leaves some loop
initialization (such as the initialization of A and B) out of the
loop.  This will be quite small if the copy is large though.

Personally, I prefer this "conventional" copy fragment:

      A = array1;
      B = array2;

      n = Count / 8;
      for (i =  Count%8; i > 0; i--)
	 *A++ = *B++;
      while (--n >= 0) {
	 *A++ = *B++; 
	 *A++ = *B++; 
	 *A++ = *B++; 
	 *A++ = *B++; 
	 *A++ = *B++; 
	 *A++ = *B++; 
	 *A++ = *B++; 
	 *A++ = *B++; 
      } 

>From a performance standpoint, it falls only 2% behind duff device and
is 102% more readable and not "kludgy".