[comp.sys.amiga.tech] Dividing by a power of two

bader+@andrew.cmu.edu (Miles Bader) (11/27/88)

rodd@dasys1.UUCP (Rod Dorman) writes:
> As many compiler writers have found in the past and I'm sure many will
> rediscover in the future, this only works for positive values.  Try
> shifting -1 and you'll get -1 as a result, *not* zero as one would expect.

I think the problem is that shifting always rounds down (toward minus
infinity), as opposed to toward zero, like we expect.  So even in cases
where you have a signed, unknown value to divide, it would probably still
be more efficient to inline the code.

So, when dividing a signed value, x, by a power of two, y:

if(x<0 && (x&(y-1))!=0)
    x+=y-1; /* pre-round negative numbers */
x>>=log2(y);

Which isn't quite as bad as it looks, since y is a known constant
power of two.

-Miles

toebes@sas.UUCP (John Toebes) (11/28/88)

In article <YXXhTUy00UkaQ85bl7@andrew.cmu.edu> bader+@andrew.cmu.edu (Miles Bader) writes:
> rodd@dasys1.UUCP (Rod Dorman) writes:
> > As many compiler writers have found in the past and I'm sure many will
> > rediscover in the future, this only works for positive values.  Try
> > shifting -1 and you'll get -1 as a result, *not* zero as one would expect.
> 
> I think the problem is that shifting always rounds down, as opposed
> to toward zero, like we expect.  So even in cases where you have a
> signed, unknown value to divide, it would probably still be more
> efficient to inline the code.
> So, when dividing a signed value, x, by a power of two, y:
> if(x<0 && (x&(y-1))!=0)
>     x+=y-1; /* pre-round negative numbers */
> x>>=log2(y);
> Which isn't quite as bad as it looks, since y is a known constant power of two.
> -Miles
Actually it is much simpler than that.  The Lattice 5.0 compiler generates
3 instructions (sometimes 4) inline to perform the division by 2, 4, and 8.
Analysis of many programs shows these are the most frequent divisors.
Specifically:
      x/= 2
      move x,d0   ;unless it was already there
      bpl  lab    ;skip if positive
      addq #1,d0  ;adjust for rounding
lab:  asr.l #1,d0 ; and do the divide

There are lots of other sequences of multiply/divide by constants (take 5 for
example) that can be optimized.
/*---------------------All standard Disclaimers apply---------------------*/
/*----Working for but not officially representing SAS or Lattice Inc.-----*/
/*----John A. Toebes, VIII             usenet:...!mcnc!rti!sas!toebes-----*/
/*------------------------------------------------------------------------*/

dillon@POSTGRES.BERKELEY.EDU (Matt Dillon) (11/29/88)

:> > As many compiler writers have found in the past and I'm sure many will
:> > rediscover in the future, this only works for positive values.  Try
:> > shifting -1 and you'll get -1 as a result, *not* zero as one would expect.
:> I think the problem is that shifting always rounds down, as opposed
:> to towards zero, like we expect.  So even in cases where you have a
:Actually it is much simpler than that.  The Lattice 5.0 compiler generates the
:3 instructions (sometimes 4) inline to perform the division by 2,4, and 8.
:These are the most frequent numbers to occur from analysis of many programs.
:Specifically:
:      x/= 2
:      move x,d0   ;unless it was already there
:      bpl  lab    ;skip if positive
:      addq #1,d0  ;adjust for rounding
:lab:  asr.l #1,d0 ; and do the divide

	This is also a very good example of what is known as "programmer's
experience".  No *good* programmer would do constant division by powers of 2
using '/', but would use a shift instead.  The programmer, after all, knows
exactly what range a particular variable will fall in whether it is 
unsigned or not, and thus can make optimizations and assumptions the
compiler could never match.

	Take a good look at the example above ...  how often do you
divide negative numbers by powers of 2?  Now take that probability
and ask how often the variable would be unbounded (can take arbitrary values)
versus in a known range that can be optimized by some other method.
Finally, take into account whether, in the case where it IS negative,
rounding is even desirable.

	Now you say "But I don't want to worry about it" and "I don't want
to have to worry about optimizing it".  Bullshit.  Either the problem is
sooo important that nobody in their right mind would risk a division anyway,	
or so unimportant that it doesn't matter if it is optimized or not.  I wish
the compiler writers would take more time to optimize the more frequent
cases that occur in tight code rather than these totally useless cases.
For example, Lattice does no register tracking and Aztec doesn't do enough.

	You might say "It is the compiler's job to optimize!", to which I would
respond: "Which do you think the compiler can optimize better, a shift, or
a divide?".  Got it?  No matter how good compilers get, certain constructs
are more apt to optimization than others because fewer assumptions have to
be made about them, so the 'good' programmer *always* optimizes his code.
These little goodies might be a boon to the 'ok' programmer, but even with
them his program is nowhere near the theoretical efficiency he could obtain,
and what he gains from those (for us) useless cases doesn't help as much as
he would like.

						-Matt

scott@applix.UUCP (Scott Evernden) (11/30/88)

In article <8811281919.AA17977@postgres.Berkeley.EDU> dillon@POSTGRES.BERKELEY.EDU (Matt Dillon) writes:

	( re:  "/ 2" vs ">> 1" )

>	This is also a very good example of what is known as "programmer's
>experience".  No *good* programmer would do constant division by powers of 2
>using '/', but would use a shift instead.  The programmer, after all, knows
>exactly what range a particular variable will fall in whether it is 
>unsigned or not, and thus can make optimizations and assumptions the
>compiler could never match.

I used to believe this.  At one time, my code would be riddled with <<'s
and >>'s for powers-of-2 arithmetic.  I still fight the temptation to do
this and other confusing tricks.  I now believe that it is the compiler's
duty to recognize these cases (at least) and do the optimization for me,
mostly because "foo = bar / 16" is vastly more readable, and the idea is
more immediate than in "foo = bar >> 4".  I maintain that someone else
reading the shift version will be more confused than by the simple division.

Do you multiply by 12 like this? 

	foo = (bar << 3) + (bar << 2);

I hope not.  Yet, J. Toebes and Lattice will figure out a "* 12" for you.
Another problem with excessive use of << and >> is that they bind so loosely
that you've got to be real careful with parens, which tends to further
confuse the semantics.

>	You might say "It is the compiler's job to optimize!", to which I would
>respond: "Which do you think the compiler can optimize better, a shift, or
>a divide?".  Got it?  No matter how good compilers get, certain constructs
>are more apt to optimization than others due to fewer assumptions having to
>be made about them, so the 'good' programmer *always* optimizes his code.

My feeling is that the compiler should optimize whatever it can, allowing
me to select degrees of optimization, etc- like most modern compilers do.
After trying to understand what some of my oldest C code was doing (full of
tricks, and even with comments) I'm now convinced that
readability/understandability is vastly more important than "good"
programmer tricks which attempt to out-think the compiler.

-scott

dillon@POSTGRES.BERKELEY.EDU (Matt Dillon) (11/30/88)

:>	You might say "It is the compiler's job to optimize!", to which I would
:>respond: "Which do you think the compiler can optimize better, a shift, or
:>a divide?".  Got it?  No matter how good compilers get, certain constructs
:>are more apt to optimization than others due to fewer assumptions having to
:>be made about them, so the 'good' programmer *always* optimizes his code.
:
:My feeling is that the compiler should optimize whatever it can, allowing
:me to select degrees of optimization, etc- like most modern compilers do.
:After trying to understand what some of my oldest C code was doing (full of
:tricks, and even with comments) I'm now convinced that
:readability/understandability is vastly more important than "good"
:programmer tricks which attempt to out-think the compiler.
:
:-scott

	Yes, readability/understandability is vastly more important than
"good" programmer tricks, though they hardly attempt to out-think the
compiler (they just make it easier for the compiler to generate better code).

	However, you are implying that there is a major relationship
between how good the code looks and how many tricks one sticks in it.
There is, in fact, a relationship ... about 1%, probably less.  The
absolute greatest effect on readability/understandability is, of course,
one's coding form, which is easily 99% of the whole deal (form = indentation,
structural form, modularity, modular independence, naming, and comments).
The little tricks one sticks in his code probably account for less than
one tenth of one percent of the source and can be ignored.

	So I'd say that you are exaggerating a bit in your message.  When
I want to multiply by 12 I use * UNLESS I am in a very critical portion of
my code, in which case I wouldn't multiply OR shift, but would design the
section so a multiplication is not required.  In the very rare cases where
I can't redesign the section, I will indeed use two shifts and an add.  Now
how often do you think that situation is going to come up?  Even in large
programs the critical areas are usually quite tiny and few in number.

	I suggest that the trouble you have understanding your very very
old C code (I have the same problem to a degree w/ my very very old code) is
due to the fact that you were not as experienced with form as you are now
and has nothing to do with tricks.  Well, maybe something ... a while back
I was helping a kid with some assembly language he was doing.  He wanted to
convert an ASCII digit to a binary value, and this is what he did (in
pseudo-C):
	switch(c) {
	case '0':
	    return(0);
	case '1':
	    return(1);
	case '2':
	    return(2);
	case '3':
	    return(3);
	case '4':
	    return(4);
	case '5':
	    return(5);
	case '6':
	    return(6);
	case '7':
	    return(7);
	case '8':
	    return(8);
	}
	return(9);

	It took me 20 minutes to explain to him that he could replace
30 lines of assembly with 2.

					-Matt

cg@myrias.UUCP (Chris Gray) (11/30/88)

In article <8811281919.AA17977@postgres.Berkeley.EDU> dillon@POSTGRES.BERKELEY.EDU (Matt Dillon) writes:

An inclusion by John Toebes about Lattice 5.0:

>:Actually it is much simpler than that.  The Lattice 5.0 compiler generates the
>:3 instructions (sometimes 4) inline to perform the division by 2,4, and 8.
>:These are the most frequent numbers to occur from analysis of many programs.
>:Specifically:
>:      x/= 2
>:      move x,d0   ;unless it was already there
>:      bpl  lab    ;skip if positive
>:      addq #1,d0  ;adjust for rounding
>:lab:  asr.l #1,d0 ; and do the divide
>

Lots of discussion by Matt, basically expressing the opinion that
a compiler shouldn't bother doing this kind of optimization because a good
programmer will do it himself, and that compilers would get more benefit
from other optimizations, such as register tracking.

I'll give another compiler-writer's point of view. The optimization discussed
above (which is done in Draco 1.2, for all powers of 2) is an easy, and
fairly error-free, optimization to do. It is also one that will help out
those programmers who don't bother to hand optimize. It is worth noting
that the shift may in fact be slower on some machines (imagine one with a
32-bit array multiplier (or whatever they're called), and a shift instruction
that always requires the shift amount to be loaded into a register). This is
precisely the kind of optimization that a compiler SHOULD make.

The other kinds of optimizations (register tracking, register colouring,
common subexpression elimination, strength reduction, etc. etc.) are often
much harder to do. They require lots more code, lots more memory, lots more
run-time and end up with lots more bugs. They are HARD to do. Draco 1.2 gets
its main code improvements over Draco 1.1 through the following:

    - peephole optimization
    - register tracking
    - user register variables

I am STILL finding bugs in the peepholer (the one last night was one which
deleted a MOVE to a register variable because the next instruction could use
the value directly).

So, in short, compiler writers do the simple optimizations because they are
simple, and because most people (I guess Matt isn't 'most people' :-) )
expect them to be done.

P.S. Draco 1.2 should be going into Beta test soon. If there are any users
out there who can commit to a serious Beta test, please let me know. It got
delayed when the company I work for sent me to California for a month.

-- 
Chris Gray		Myrias Research, Edmonton	+1 403 428 1616
	{uunet!mnetor,ubc-vision,watmath,vax135}!alberta!myrias!cg

karl@sugar.uu.net (Karl Lehenbauer) (11/30/88)

In article <8811281919.AA17977@postgres.Berkeley.EDU>, dillon@POSTGRES.BERKELEY.EDU (Matt Dillon) writes:
> 	You might say "It is the compiler's job to optimize!", to which I would
> respond: "Which do you think the compiler can optimize better, a shift, or
> a divide?".  Got it?  No matter how good compilers get, certain constructs
> are more apt to optimization than others due to fewer assumptions having to
> be made about them, so the 'good' programmer *always* optimizes his code.

The thing is, though, that divide makes the algorithm work for all the cases
where the divisor is not a power of two, so if I write a general solution for
"n" bits I have a general algorithm, and if I set "n" to some power of two I
have helped the compiler to optimize, because I know it knows how to optimize
divides by powers of two -- or it would, if it did, which it doesn't.  I'm
using unsigned ints, by the way.
-- 
-- "We've been following your progress with considerable interest, not to say
-- contempt."  -- Zaphod Beeblebrox IV
-- uunet!sugar!karl, Unix BBS (713) 438-5018

dan-hankins@cup.portal.com (Daniel B Hankins) (12/03/88)

     In article <8811292001.AA02370@postgres.Berkeley.EDU>,
dillon@POSTGRES.BERKELEY.EDU (Matt Dillon) makes some more comments about
the value of tricks such as shifting to divide by powers of two.

     I'm afraid I'm going to have to disagree with you on this one, Matt. 
Very recently, I spent a miserable week trying to find a bug caused by
exactly this kind of programming.

     I was assigned the task of updating a very large (18,000+ lines) piece
of PL/I-like code (not actually PL/I, but similar enough that you get the
idea).  One of the data structures in the program was a large array of
buffers.  The previous programmer had not bothered to declare a constant,
but rather had assumed that the buffer size would never change.  Of course,
one of my tasks was to enhance the program to handle data larger than the
current buffer size.

     So I went merrily through the program files, searching for every
instance of the number '150', and all the derivatives thereof I could
imagine.  One was tricky to find - to format the buffer for output he
divided it up into five segments.  Each segment had its own line of code in
the program, with the beginning and ending addresses hard-coded (such as
BUFFER(31:50)).  Well, I thought I got them all.

     Then one day I got a call from an irate user complaining that the
output data was completely out of whack.  Then began a week-long hunt, and
at the end of the week I found the problem, almost by accident.

     The previous programmer had padded each buffer with flags and other
data so that the addresses of the buffers were separated by exactly 256
bytes.  Sometimes he needed the index of a buffer he had the address of.  So what he would do is
to subtract the address of BUFFER(0) from the address of the buffer in
question.  He then stored this difference in a two-byte field that had
two one-byte subfields.  He would then read back the high-order byte, thus
implicitly dividing by 256 (much like the bit-shift technique).

     When I rewrote the code, the buffer size itself increased to 256
bytes.  Each buffer plus the flags was somewhat more than this.  Therefore
the addresses of the buffers were separated by more than 256 bytes, and
each time his code attempted to compute the index of a buffer, it would get
values that were too high for any buffer above BUFFER(5).

     The moral of the story is:  Unless no one else will ever have to read
or maintain your program, DON'T DO THIS!  *Parameterize*.  Let the compiler
optimize it if the constant is a power of two.

     The only case I can think of where this might be okay is if one wishes
to divide an integer by a power of two (an equation like y = x / (2 ** z)).
Even then I would hesitate.


Dan Hankins