[comp.sys.m68k] Incidentally ...

chris@mimsy.UUCP (04/04/87)

>	while (wy--)  {
>		j = wx;
>		while (j--)  {

(The above is an inner loop.)  Such loops should usually (always?)
be written as

		while (--j >= 0)

---assuming j is signed.  Why?  They both do the same thing, but
a dumb compiler will turn the former into `move j to tmp; decrement
j; test tmp; branch if zero', while the same dumb compiler will
turn the latter into `decrement j; branch if negative'.  Details
will vary depending on condition codes, but the former is often
four instructions, and the latter two.  On a Vax, the second version
is sometimes a single instruction.

Of course, a smart compiler will generate the same code for both.
That is wonderful---if you have a smart compiler.  Better check!
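
A minimal sketch of the claim that the two forms "do the same thing",
assuming a signed counter that starts out non-negative.  The function
names count_post and count_pre are made up for illustration; neither
appears in the original code.

```c
/* Count the iterations of the post-decrement form, while (j--). */
int count_post(int j)
{
	int n = 0;

	while (j--)		/* test old value, then decrement */
		n++;
	return n;
}

/* Count the iterations of the pre-decrement form, while (--j >= 0). */
int count_pre(int j)
{
	int n = 0;

	while (--j >= 0)	/* decrement, then test the sign */
		n++;
	return n;
}
```

For any starting value from 0 upward both functions return that same
value, so the loop bodies run equally often.  They part ways for a
negative start: the second form then runs zero times, while the first
keeps decrementing.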
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

mcvoy@uwvax.UUCP (04/05/87)

(Chris Torek) writes:
>>	while (wy--)  {
>>		j = wx;
>>		while (j--)  {
>
>(The above is an inner loop.)  Such loops should usually (always?)
>be written as
>
>		while (--j >= 0)
>
>---assuming j is signed.  Why?  They both do the same thing, but
> [good justification that (--j >= 0) is faster]


I'm not sure that it is worth everyone's time to be thinking about this.
And I speak from a certain amount of experience, to wit: last semester
I had to write a fake TCP/IP from scratch and I wanted to make a fast
implementation.  Over the course of the semester I gathered a directory
full of t.{c,s} files where I was looking at exactly this sort of
thing.  I really wanted fast code.  As it turned out, my implementation
did not gain much at all from all this extra work - mainly because it
was just not used.  The old "90% of the time 10% of the code" saw
applies.  I would have been much better off to profile my code and
rewrite the bottlenecks.

Please don't take this the wrong way - Chris is a smart guy, and he's
right in a technical sense.  And I get a warm fuzzy feeling from
writing my code in an efficient manner too.  It's just that I think it's
misleading to appear worried about this sort of thing in a general
sense - it really belongs in the "profiling code" section, not "general
programming tips".  Knuth says "Premature Optimization
is the root of all evil".  I think this is a bit extreme, but I agree
in principle.

Food for thought (?),

--larry
-- 
Larry McVoy 	        mcvoy@rsch.wisc.edu  or  uwvax!mcvoy

"It's a joke, son! I say, I say, a Joke!!"  --Foghorn Leghorn

chris@mimsy.UUCP (04/05/87)

In article <6134@mimsy.UUCP> I wrote:
>[use] while (--j >= 0) [rather than while (j--)]

I poked around today and discovered that Sun's compiler, at least,
will turn

	register short j;	/* but not `register int j' */

	while (--j != -1)
		...

into a `dbra' loop.  If you are willing to put machine dependent
source optimisations into your C code, this might be something to
consider (at least for inner loops).

Anyway, it is a good idea to profile, tune, compile to assembly
code, and sometimes even hand-tweak the results, in speed-critical
routines.
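
A small sketch of why the `!= -1' test can stand in for `>= 0' here,
assuming the count is non-negative and fits in a short.  The names
count_ne and count_ge are hypothetical, and nothing below shows what
any particular compiler emits for either loop.

```c
/* Iterations of the `--j != -1' form with a short counter. */
int count_ne(short count)
{
	register short j = count;
	int n = 0;

	while (--j != -1)	/* stops when j steps from 0 to -1 */
		n++;
	return n;
}

/* Iterations of the `--j >= 0' form with the same counter type. */
int count_ge(short count)
{
	register short j = count;
	int n = 0;

	while (--j >= 0)	/* stops at the first negative value */
		n++;
	return n;
}
```

Both return `count' for any count from 0 up to the largest short.
With a negative count the `>= 0' form does nothing, whereas the
`!= -1' form would run on until the counter wraps around, so the
substitution is only safe when the count is known to be non-negative.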
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

jon@eps2.UUCP (04/07/87)

In article <6139@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <6134@mimsy.UUCP> I wrote:
> >[use] while (--j >= 0) [rather than while (j--)]
> 
> I poked around today and discovered that Sun's compiler, at least,
> will turn
> 
> 	register short j;	/* but not `register int j' */
> 	while (--j != -1)
> 		...
> 
> into a `dbra' loop.  If you are willing to put machine dependent
> source optimisations into your C code, this might be something to
> consider (at least for inner loops).
 
On this Sun-3/160 running 3.2, the compiler won't generate the dbra.  However,
the object code optimizer will.  But the only way I know of to find this out
is to adb the executable or a .o (I know I am not as clever as you, am I
missing something?  I didn't think you could get an optimized .s).  One nice
thing about the Green Hills compiler that Intergraph uses was that you could
look at optimized .s files.  Interestingly enough, this incredibly old Alcyon
C compiler I sometimes use generates the dbras.

Another way to get the dbra instruction from the Sun optimizer is to use:

	register short i;

	for (i = 10; --i != -1;) 	/* the Alcyon compiler will do it too */

And as we all know, the dbra is especially important on the 68010 and 68012
because with the right instruction in the loop, you can get it into loop mode.


Jonathan Hue	DuPont Design Technologies/Via Visuals		leadsv!eps2!jon

david@sun.UUCP (04/08/87)

In article <75@eps2.UUCP> jon@eps2.UUCP (Jonathan Hue) writes:
>On this Sun-3/160 running 3.2, the compiler won't generate the dbra.  However,
>the object code optimizer will.  But the only way I know of to find this out
>is to adb the executable or a .o (I know I am not as clever as you, am I
>missing something?  I didn't think you could get an optimized .s).

cc -O -S foo.c

In article <6139@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <6134@mimsy.UUCP> I wrote:
> >[use] while (--j >= 0) [rather than while (j--)]
> 
> I poked around today and discovered that Sun's compiler, at least,
> will turn
> 
> 	register short j;	/* but not `register int j' */
> 	while (--j != -1)
> 		...
> 
> into a `dbra' loop.  If you are willing to put machine dependent
> source optimisations into your C code, this might be something to
> consider (at least for inner loops).

Be a good citizen and hide it with macros...

#ifdef mc68000
typedef short LOOP_T;
#define LOOP_DECR(var)	(--(var) != -1)
#else
typedef int LOOP_T;
#define LOOP_DECR(var)	(--(var) >= 0)
#endif

	register LOOP_T j;

	while (LOOP_DECR(j))
		something;

which leads us to the mystic loop macro ...

#define	LOOP(count, op)	do { 
		register LOOP_T _loop = (count); 

		if (--_loop >= 0) 
			do { op; } while (LOOP_DECR(_loop));
	} while (0)

(end of line backslashes omitted for clarity)
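
For anyone who wants to compile it, here is a sketch of the same
macros with the continuation backslashes written out, plus one use.
The sum() helper is hypothetical and only there to exercise LOOP; the
mc68000 test is taken as-is from the code above and is assumed to be
predefined by the 68000 compilers in question.

```c
#ifdef mc68000
typedef short LOOP_T;
#define LOOP_DECR(var)	(--(var) != -1)
#else
typedef int LOOP_T;
#define LOOP_DECR(var)	(--(var) >= 0)
#endif

/* Run `op' exactly `count' times; do nothing if count <= 0. */
#define	LOOP(count, op)	do { \
		register LOOP_T _loop = (count); \
		if (--_loop >= 0) \
			do { op; } while (LOOP_DECR(_loop)); \
	} while (0)

/* Hypothetical helper: add up the first n elements of v. */
long sum(const int *v, int n)
{
	long total = 0;

	LOOP(n, total += *v++);
	return total;
}
```

The outer `if' handles a zero count, so the inner do-while can test at
the bottom of the loop.  Note that `op' is pasted in as a single macro
argument, so a statement containing a top-level comma will not work,
and on the short variant the count has to fit in 16 bits.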
-- 
David DiGiacomo, Sun Microsystems, Mt. View, CA  sun!david david@sun.com
Disclaimer: blah blah blah

mark@markshome (mark weiser) (04/09/87)

In article <75@eps2.UUCP> jon@eps2.UUCP (Jonathan Hue) writes:
>...On this Sun-3/160 running 3.2, the compiler won't generate the dbra.  However,
>the object code optimizer will.  But the only way I know of to find this out
>is to adb the executable or a .o (I know I am not as clever as you, am I
>missing something?  I didn't think you could get an optimized .s).  

On this Sun-3/75 running SunOS 3.2, using -O and -S together shows me
the optimized assembly in the .s file.
-mark
Spoken: Mark Weiser 	ARPA:	mark@mimsy.umd.edu	Phone: +1-301-454-7817
After May 1, 1987: weiser@xerox.com