[comp.lang.c] Array indexing vs. pointers...

U23405@UICVM (Michael J. Steiner) (09/11/88)

First, I have a question. How and why is array indexing slower than
pointer arithmetic? They are very similar. Also, I think that compilers
should automatically translate array indexing into pointer arithmetic when
dealing with arrays in C. Any comments, opinions?

                                                 Michael Steiner
                                                 Email: U23405@UICVM.BITNET

rwberry@hubcap.UUCP (Robert W Berry) (09/20/88)

In article <8809191521.AA17824@ucbvax.Berkeley.EDU>, U23405@UICVM (Michael J. Steiner) writes:
> First, I have a question. How and why is array indexing slower than
> pointer arithmetic? They are very similar. Also, I think that compilers
> should automatically translate array indexing into pointer arithmetic when
> dealing with arrays in C. Any comments, opinions?
> 
>                                                  Michael Steiner
>                                                  Email: U23405@UICVM.BITNET
	Indexing requires that an offset (the index scaled by the element
size) be added to the base address of the array for each and every
reference.  This overhead is the number one reason that array indexing is
slower than pointer manipulation.  (Today's C compilers often treat the
base of the array as a constant and add the offset from a register, which
is ideal for sequential memory accesses.)
	Pointers, on the other hand, make use of today's larger word sizes
to address the object in question directly.  More efficient, and much
faster.
	With today's optimizing compilers this is often a moot point.
Many compilers end up implementing both array indexing and pointer
manipulation using indirect addressing (adding the value stored in one
register to the value stored in another register to calculate an effective
address), especially on a segmented memory architecture.  The end result
is that both techniques end up running in about the same time.
	Be wary however, because there are also a lot of compilers on the
market that use different techniques for array indexing and pointer
manipulation.  So when in doubt, use pointers.  They're _almost_ always
faster (and closer to the way God intended people to program ;-)!)
	One good way to get a feel for this type of situation is to have
your C compiler compile some source stubs into assembler (OH NO, ANYTHING
BUT ASSEMBLER) and watch the way things REALLY happen.
Even if you're not an assembler expert, some things become pretty obvious
when you see them at the machine level.
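	For instance, a minimal stub along these lines (the file name and
the -S flag are just the usual Unix conventions; your compiler may spell
them differently) lets you compare the two forms side by side:

	/* stub.c -- compare indexing against pointer bumping */
	int a[100];

	void by_index()
	{
		int i;

		for (i = 0; i < 100; i++)
			a[i] = 0;
	}

	void by_pointer()
	{
		register int *p;

		for (p = a; p < &a[100]; p++)
			*p = 0;
	}

Something like "cc -S stub.c" then leaves the generated assembler in
stub.s, and you can see which addressing modes your compiler actually
picks for each loop.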
	Happy Programming,
	Bob

rwberry@hubcap.UUCP (Robert W Berry) (09/20/88)

The doggone news-poster didn't tack on my address, and it also messed
up my node name.  I'm not an IDIOT, the news poster is!!

	Bob

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-=- Bob Berry -=- PC-Guru's Inc.     ! Compuserve:72646,3331 or 73170,1242  -=-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BITNET:rwberry@clemson               -=-
-=- This space for rent or lease     ! INTERNET:rwberry@hubcap.clemson.edu  -=-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

g-rh@xait.CCA.COM (Richard Harter) (09/20/88)

In article <8809191521.AA17824@ucbvax.Berkeley.EDU> U23405@UICVM (Michael J. Steiner) writes:
>First, I have a question. How and why is array indexing slower than
>pointer arithmetic? They are very similar. Also, I think that compilers
>should automatically translate array indexing into pointer arithmetic when
>dealing with arrays in C. Any comments, opinions?

	Depends on the compiler, the machine, and the circumstances.  The
gain comes when you increment a pointer into the array rather than reference
the indexed array.  For example

	for (i=0;i<n;i++) {
	  a batch of code referring to a[i]
	  }
versus
	tmp = a;
	for (i=0;i<n;i++,tmp++) {
	  same code referring to *tmp
	  }

In the first instance a[i] is equivalent to *(a+i) and most compilers will
treat them much the same.  The add has to be done each time a[i] is referred
to.  In the second case there is no add, just one increment each time through
the loop.
A smart compiler will take the first instance and generate something like

	tmp = a;
	for (i=0;i<n;i++) {
	  tmp1 = *tmp++;
	  same code referring to tmp1
	  }

If execution efficiency is your concern and your environment includes simple
compilers there is profit to be got by doing your own optimization as above.
The down side is that your hand optimization may interfere with the clever
tricks of an optimizing compiler and you lose ground by doing hand
optimization.  There ain't no justice.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

swilson%thetone@Sun.COM (Scott Wilson) (09/20/88)

In article <8809191521.AA17824@ucbvax.Berkeley.EDU> U23405@UICVM (Michael J. Steiner) writes:
>First, I have a question. How and why is array indexing slower than
>pointer arithmetic? They are very similar. Also, I think that compilers
>should automatically translate array indexing into pointer arithmetic when
>dealing with arrays in C. Any comments, opinions?

In K&R pg. 186 it says: "E1[E2] is identical (by definition) to
*((E1)+(E2))."  So array indexing is exactly the same as pointer
arithmetic.  What you might be hearing is about the efficiency of using
array indexing in loops versus using pointers.  For example:

	int i, sum = 0, array[100];

	for (i = 0; i < 100; i++)
		sum += array[i];

is less efficient than:

	int i, sum = 0, array[100], *ip = array;

	for (i = 0; i < 100; i++, ip++)
		sum += *ip;

In the first case, using array[i] results in a shift and an add
to compute the address of the array element.  (If the array element
size was not a power of two then a multiplication would be needed.)
In the second we are just bumping a pointer up each time through
the loop.  There is more that can be done to the second example
as well (why increment two vars side-by-side).
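A minimal sketch of that further step (dropping the index entirely and
letting the pointer itself control the loop) might look like:

	int sum = 0, array[100], *ip;

	for (ip = array; ip < &array[100]; ip++)
		sum += *ip;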

--
Scott Wilson		arpa: swilson@sun.com
Sun Microsystems	uucp: ...!sun!swilson
Mt. View, CA

jlg@lanl.gov (Jim Giles) (09/20/88)

From article <33488@xait.CCA.COM>, by g-rh@xait.CCA.COM (Richard Harter):
> A smart compiler will take the first instance and generate something like
> 
> 	tmp = a;
> 	for (i=0;i<n;i++) {
> 	  tmp1 = *tmp++;
> 	  same code referring to tmp1
> 	  }

A smarter compiler will generate code more like this:

        ttt = a + n;
        for (tmp=a; tmp<ttt; tmp++) {
          same code referring to *tmp
          }

Most Fortran Compilers already make optimizations like this.

J. Giles
Los Alamos

scs@athena.mit.edu (Steve Summit) (09/20/88)

WARNING: Heretical C advice follows.  Efficiency hackers please
read no further.

In article <8809191521.AA17824@ucbvax.Berkeley.EDU> Michael J. Steiner writes:
>How and why is array indexing slower than pointer arithmetic?
>I think that compilers should automatically translate array indexing
>into pointer arithmetic when dealing with arrays in C.

Array indexing may be slower, but it may also be easier (for
human readers of the program) to understand.  If there is the
slightest chance that using array indexing instead of pointers
will make your code clearer, I'd like to encourage you to stick
with the indices, unless your program ends up running too slowly,
and profiling shows the array indices to be at fault.  Chances
are it won't run noticeably slower, in part because there are
good compilers which, in the spirit of your suggestion,
effectively use a pointer reference in the generated code.

This is not to say that array indexing is always clearer than
pointers--often, a well-conceived pointer algorithm is the right
one to use for both clarity and efficiency.  It's a judgement
call.  But some code is just ridiculously clearer to the
uninitiated if it uses indices rather than pointers, and at no
cost in overall efficiency.  A perfect example is argv parsing;
compare

	while(--argc)
		if(**++argv == '-')
			processoption(*argv);
		else	processfile(*argv);

with

	for(i = 1; i < argc; i++)
		if(argv[i][0] == '-')
			processoption(argv[i]);
		else	processfile(argv[i]);

The second example may use an extra variable, but it's a lot more
obvious that argv[0] is skipped (as it should be) and it has the
additional benefit of leaving the initial values of argc and argv
undisturbed.

                                            Steve Summit
                                            scs@adam.pika.mit.edu

cik@l.cc.purdue.edu (Herman Rubin) (09/20/88)

In article <68995@sun.uucp>, swilson%thetone@Sun.COM (Scott Wilson) writes:
> In article <8809191521.AA17824@ucbvax.Berkeley.EDU> U23405@UICVM (Michael J. Steiner) writes:
> >First, I have a question. How and why is array indexing slower than
> >pointer arithmetic? They are very similar. Also, I think that compilers
> >should automatically translate array indexing into pointer arithmetic when
> >dealing with arrays in C. Any comments, opinions?

This has appeared before, but maybe it bears repeating.  

It depends on the architecture and on where the pointers are located
(registers vs. memory).  Consider copying an array.  If the pointers are
in registers and reading and writing *(j++) is a single hardware addressing
mode, pointers will beat indexing.  This is the case on the VAX, but not
on the PYRAMID.

If, on the other hand, the array bases and the index are in registers, and
register indexing is _hardware_, array indexing will beat pointers if the 
previous situation does not hold.  Reverse the machines above.

The answer depends on the hardware, the storage classes, the timing, and
for machines which have instruction overlap, on the overlaps available.
An attempt to force the issue is WRONG.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

harish@ece-csc.UUCP (Harish Hiriyannaiah) (09/21/88)

In article <68995@sun.uucp> swilson@sun.UUCP (Scott Wilson) writes:
->dealing with arrays in C. Any comments, opinions?
-
-In K&R pg. 186 it says: "E1[E2] is identical (by definition) to
-*((E1)+(E2))."  So array indexing is exactly the same as pointer
-arithmetic.  What you might be hearing is about the efficiency of using
-array indexing in loops versus using pointers.  For example:
-
-	int i, sum = 0, array[100];
-
-	for (i = 0; i < 100; i++)
-		sum += array[i];
-
-is less efficient than:
-
-	int i, sum = 0, array[100], *ip = array;
-
-	for (i = 0; i < 100; i++, ip++)
-		sum += *ip;
-
-In the first case, using array[i] results in a shift and an add
-to compute the address of the array element.  (If the array element
-size was not a power of two then a multiplication would be needed.)
-In the second we are just bumping a pointer up each time through
-the loop.  There is more that can be done to the second example
-as well (why increment two vars side-by-side).

All this is really fine if

1. The processor has autoincrement addressing (any decent processor should
   have it. )

2. The compiler generates the code for the same. 

   #2 does not always happen. So, it is best to see the assembler output
before deciding on the indexing strategy. Also, in some processors, indexing
will mean concatenation of the index to the higher order bits of the
address. In this case, indexing might turn out to be faster than indirect
addressing.

-- 
harish pu. hi.     harish@ece-csc.ncsu.edu
                   {decvax,possibly other backbone sites}!mcnc!ece-csc!harish

I am not, therefore I think not ?

smryan@garth.UUCP (Steven Ryan) (09/21/88)

>	for (i=0;i<n;i++) {
>	  a batch of code referring to a[i]
>	  }
>versus
>	tmp = a;
>	for (i=0;i<n;i++,tmp++) {
>	  same code referring to *tmp
>	  }

Just to muddy the waters. Most cpus allow address expressions like
constant_offset+base_register, where base registers are a scarce resource.

In the first case, where the loop refers to a[i], b[i], ..., the compiler
can sometimes generate better code by putting only i in a register and
leaving a_base_address, b_base_address, ... as constant offsets.  Making a
separate pointer for each array can scrunch the register file.
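A minimal sketch of the trade-off (the arrays are just placeholders):

	int a[100], b[100], c[100];
	int i;
	int *pa, *pb, *pc;

	/* one register (i); a, b and c become constant_offset + i */
	for (i = 0; i < 100; i++)
		c[i] = a[i] + b[i];

	/* three registers tied up holding pointers */
	for (pa = a, pb = b, pc = c; pa < &a[100]; pa++, pb++, pc++)
		*pc = *pa + *pb;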

Just a reminder that optimisation is not a trivial process.

braner@batcomputer.tn.cornell.edu (Moshe Braner) (09/24/88)

[]

I was surprised by the results of some experiments I ran on a Gould box
and on the Inmos Transputer.  In both cases the speed of bumping pointers
turned out to be similar to, or even slower than, using array indexing.
This is not too surprising for a[i], but even holds for a[i][j], where
the indexing involves a _multiplication_!!!  (I was comparing indexing
with double-indirection, i.e. p[i][j] where p is a pointer to an array
of pointers to rows of the matrix, and also with *p++ where p is a
pointer to an element inside a row.)  It turns out that while on
some machines (e.g., 68000) multiplication is an order of magnitude
slower than addition or shifts, on some other machines (e.g., the
transputer) multiplication is just as fast as an addition!
(The transputer is _really_ weird: 32*32=64 bits is (slightly)
_faster_ than 32*32=32 bits...  -- they must have the hardware
calculate 64 bits in either case, and very efficiently too!)

- Moshe Braner

Cornell Theory Center, 265 Olin Hall,
Cornell University, Ithaca, NY 14853
(607) 255-9401	(Work)
<braner@tcgould.tn.cornell.edu>		(INTERNET)
<braner@crnlthry> or <braner@crnlcam>	(BITNET)

news@amdcad.AMD.COM (Network News) (09/24/88)

In article <6396@batcomputer.tn.cornell.edu> braner@tcgould.tn.cornell.edu (Moshe Braner) writes:
| []
| 
| I was surprised by the results of some experiments I ran on a Gould box
| and on the Inmos Transputer.  In both cases the speed of bumping pointers
| turned out to be similar to, or even slower than, using array indexing.
| This is not too surprising for a[i], but even holds for a[i][j], where
| the indexing involves a _multiplication_!!!  (I was comparing indexing
| with double-indirection, i.e. p[i][j] where p is a pointer to an array
| of pointers to rows of the matrix, and also with *p++ where p is a
| pointer to an element inside a row.)  It turns out that while on
| some machines (e.g., 68000) multiplication is an order of magnitude
| slower than addition or shifts, on some other machines (e.g., the
| transputer) multiplication is just as fast as an addition!

I don't think so.  Rather, it is probable that the "multiplication" is
optimized via strength reduction into a series of shifts and adds.  If
this series is small, it may be faster than a load (indirection), which
would be the reason why array indexing may be faster than two-level
indirection in this case. 
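For a fixed row length the reduction is easy to see.  A minimal sketch
(the 12-int row is just an assumed example) of the kind of substitution
an optimizer can make:

	int a[10][12];		/* 12 ints per row; 12 is only an example */

	int fetch(i, j)
	int i, j;
	{
		/* a[i][j] needs an offset of i*12 + j elements; the	*/
		/* multiply can be strength-reduced to shifts and adds:	*/
		return *(&a[0][0] + (i << 3) + (i << 2) + j);	/* i*8 + i*4 + j */
	}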
	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)

mec@ardent.UUCP (Michael Chastain) (09/24/88)

Good grief.  Say NO to low-level performance tweaking.

I prefer to use the most natural data representation I can think of.
If something's an array, I don't dink around inside it with pointers.

More precisely, I restrict my pointer operations to:

	-- assignment/parameter passing/return value
	-- dereferencing
	-- address-of: p2 = &p1[5];

As another poster mentioned:

	/* Array */
	for ( iArg = 1; iArg < argc; iArg++ )
		ProcessArg( argv[iArg] );

	/* Pointer */
	while ( --argc )
		ProcessArg( *++argv );

I find the first version more writeable.  More debuggable.  More
readable.  And more likely to be correct *the first time*.

If you had to debug this code with a typical sdb-like debugger, which
would you rather look at?   "iArg" or "argv"?  I find wild array
references easier to deal with than wild pointer references:  less
likely to occur, easier to debug/find when they do.  I find all of
these properties more useful than the small and doubtful performance
gain that I can get by *increasing the entropy* of my code and data.

Suppose you told your boss: "my next project will be:
	-- 20% sooner
	-- 20% cleaner
	-- 30% less bugs
	--  5% larger
	--  5% slower (in non-real-time-critical areas)"
What do you think he or she would say?

#if	FLAME
The last word for you efficiency geeks:

#if	BIG_COMPUTER
My array code vectorizes more easily than your pointers-that-might-be-
aliased code.  So while you're saving a cycle here, a cycle there, I'm
getting 400% performance improvements just by recompiling.
#endif

#if	INTEL_8086
While your object code is thrashing via ES:BX, my object code is loading
ES:BX with a fixed value and loading array indexes into SI and DI.  And
if you point a normal 32-bit pointer into global data or a stack, while
I'm using an array index into a (sufficiently well-known) structure, my
code is likely to get 16-bit address computations off of DS or SS:BP.
Your code will thrash through ES:BX.
#endif

#endif	FLAME

Michael Chastain
mec@ardent.com
{hplabs,uunet}!ardent!mec
"He who dies with the most FRIENDS wins."

psrc@poseidon.UUCP (Paul S. R. Chisholm) (09/27/88)

In article <8809191521.AA17824@ucbvax.Berkeley.EDU> U23405@UICVM
(Michael J. Steiner) writes:
>First, I have a question. How and why is array indexing slower than
>pointer arithmetic? They are very similar. Also, I think that compilers
>should automatically translate array indexing into pointer arithmetic when
>dealing with arrays in C. Any comments, opinions?

I waited a long time to respond to this one, because I expected
everyone to post the obvious answer.  Yes, a[i] is equivalent to *(a+i),
*in* *C*; but what about in machine language?  Y'know, the stuff the
CPU really eats?  "The definition of 'adding 1 to a pointer,' and by
extension, all pointer arithmetic, is that the increment is scaled by
the size in storage of the object pointed to."  [K&R 1st ed, p. 94]
This implies multiplying the pointer offset or array index by the size.
Granted, the size is known, is one for char's, and is often a power of
two; but what about structures that are seventeen bytes long?

So, let's consider the (*un*optimized) translation of the following
loops into a pseudo-C assembler language, where pointer arithmetic is
scaled by the smallest addressable cell (probably a byte):

C:
	for ( i = 0; i < N; ++i )
		a[ i ] = b[ i ];	/* a and b are of type t */
		/* or equivalently, *(a+i) = *(b+i) */
machine language:
	i = 0
	begin_for:
	if ( ! ( i < N ) ) goto end_for
	tmp = i * sizeof t
	*( a + tmp ) = *( b + tmp )	; probably a single instruction
	i += 1
	goto begin_for
	end_for:
TOTAL:  N+1 integer assignments, N increments, N copies; N multiplies!

C:
	for ( p = a, q = b, i = 0; i < N; ++i )
		*p++ = *q++;
machine language:
	p = a
	q = b
	i = 0
	begin_for:
	if ( ! ( i < N ) ) goto end_for
	*p = *q		; on a PDP-11, these next three lines
	p += sizeof t	; can be replaced by a single instruction
	q += sizeof t	; ditto VAX, 680x0, etc.
	i += 1
	goto begin_for
	end_for:
TOTAL:  3 integer assignments, 3*N increments, N copies; 0 multiplies!

We've traded N multiplies and N assignments for 2*N increments, which
can easily (on a PDP-11 or similar CPU) be snuck into auto-increment
modes of the copy instruction.  Yes, an optimizer would reduce the
first to the second, or better.  But one of Ritchie's goals was to
make it easy for the compiler to generate good low-level code from clever
high-level code.

Moral:  A really good C programmer should know assembler language and
compiler technology to write ordinary applications efficiently.

Homework:  Discover what kind of code your compiler generates for
*p++ = *q++ when pointing to a char, int, or odd length structure.  If
p and q are auto, not register, is *p++ = *q++ the same as:
	*(++p - 1) = *(++q - 1)		/* C, not assembler */
(that is to say, very ugly code), or possibly equivalent to:
	tmp = *p = *q, ++p, ++q, tmp
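A minimal stub for trying the homework (the 17-byte struct and the -S
flag are just assumptions; adjust for your own compiler):

	struct odd { char pad[17]; };	/* deliberately not a power of two */

	void copy_odd(p, q, n)
	struct odd *p, *q;
	int n;
	{
		while (n--)
			*p++ = *q++;
	}

Compiling with something like "cc -S" and reading the output shows
whether the pointer bumps become auto-increment modes or explicit adds
of seventeen.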

Paul S. R. Chisholm, psrc@poseidon.att.com (formerly psc@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.

marc@metavax.UUCP (Marc Paige) (09/28/88)

I just finished reading someone's rendition of C to MACHINE language! UGHN!!

MACHINE language looks something like this (for a 32 bit word):

10001110111110011011011000001111
10010010111011011000110110011101
10100011001111011011100001100111
.
.
.
   
   Assembler may look something like what this person wrote but only with
   heavy macro use!!  Please get terminology straight.

----------------------------------------------------------------------------
"tired and shagged out from a prolonged squawk" - mpfc the parrot sketch

swilson%thetone@Sun.COM (Scott Wilson) (09/29/88)

In article <3628@metavax.UUCP> marc@metavax.UUCP (Marc Paige) writes:
>I just finished reading someones rendition of C to MACHINE language! UGHN!!
>
>MACHINE language looks something like this (for a 32 bit word):
>
>10001110111110011011011000001111
>10010010111011011000110110011101
>10100011001111011011100001100111

All right, I'll be a weenie.  Unless you can see electrons (or holes or
whatever the hell is going on down there), machine language doesn't
look like anything.  At best it's a bit pattern or a collection of
electrical characteristics, hardly a language.  Your description of
it as 1010... is just an abstraction.  But I know what you were trying
to say so I'll go back to talking to myself now...

--
Scott Wilson		arpa: swilson@sun.com
Sun Microsystems	uucp: ...!sun!swilson
Mt. View, CA

bill@proxftl.UUCP (T. William Wells) (09/29/88)

In article <70779@sun.uucp> swilson@sun.UUCP (Scott Wilson) writes:
: All right, I'll be a weenie.  Unless you can see electrons (or holes or
: whatever the hell is going on down there), machine language doesn't
: look like anything.  At best it's a bit pattern or a collection of
: electrical characteristics, hardly a language.  Your description of
: it as 1010... is just an abstraction.  But I know what you were trying
: to say so I'll go back to talking to myself now...

All written language is "just an abstraction".

In other words, "machine language" is an abstraction of the
patterns of electrons or what have you.

But I know that you know that I know that he knows what
you were trying to say... :-)

---
Bill

You can still reach me at proxftl!bill
But I'd rather you send to proxftl!twwells!bill

bill@proxftl.UUCP (T. William Wells) (09/30/88)

In article <607@ardent.UUCP> mec@ardent.UUCP (Michael Chastain) writes:
: Good grief.  Say NO to low-level performance tweaking.

Good grief.  Say NO to understanding the language well enough
that "low-level performance tweaking" is natural to the
programmer.  Grrr.  Note that the quotes here are because I
disagree that the programming practices being condemned
constitute "low-level performance tweaking".

: I prefer to use the most natural data representation I can think of.
: If something's an array, I don't dink around inside it with pointers.

This is completely irrelevant: one man's "natural" is another's
perversion.  The claim that such is "natural" is only a statement
of a personal preference.  My own preference is to think in terms
of pointers exclusively; and I tend not to use array indexing.
Note that I do not advocate using pointers for this reason, I do
so for a completely different one, which I will state shortly.

: If you had to debug this code with a typical sdb-like debugger, which
: would you rather look at?   "iArg" or "argv"?  I find wild array
: references easier to deal with than wild pointer references:  less
: likely to occur, easier to debug/find when they do.  I find all of
: these properties more useful than the small and doubtful performance

But *I* don't.  Personal preferences aren't natural law.

: gain that I can get by *increasing the entropy* of my code and data.

This asserts that the use of pointers increases the disorder
(that is what entropy is about, after all) in one's code and
data; this is entirely a subjective judgement.

---

At one time, I spent a lot of my spare time improving other
people's code.  One of the things I discovered is that almost all
the code I was dealing with had better than 25% execution time
fat.  What I mean is that when I merely changed all of the code
to follow my coding practices (which are not anything special,
just obvious things that every coder ought to know), it got that
much better; I am not talking about mere tweaks, nor am I talking
about optimizing particular sections of code.  I am talking about
reading each line of code and making obvious changes (things like
using temporaries when needed, adding appropriate register
declarations, eliminating obviously unnecessary function calls,
etc.)
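A before-and-after sketch of the kind of line-by-line change I mean (the
function and its arguments are invented purely for illustration):

	#include <string.h>

	/* before: strlen(sep) is recomputed on every pass */
	int total_before(lines, n, sep)
	char **lines; int n; char *sep;
	{
		int i, total = 0;

		for (i = 0; i < n; i++)
			total += strlen(lines[i]) + strlen(sep);
		return total;
	}

	/* after: the invariant call is hoisted out and a register */
	/* pointer walks the array -- same result, less work       */
	int total_after(lines, n, sep)
	char **lines; int n; char *sep;
	{
		register char **p = lines;
		register int total = 0;
		int seplen = strlen(sep);

		while (n--)
			total += strlen(*p++) + seplen;
		return total;
	}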

Suppose that the 90-10 rule actually applies (which experience
shows to be something of an exaggeration, and utterly irrelevant
to a lot of well written code), and that I were to take the same code
and optimize that 10% as much as possible.  I might expect that
to double the speed of the 10%.  Now, how much faster would this
unrealistic example get?  45%: the 90 time units spent in the hot
spot drop to 45, so 100 units of run time become 55.

Now, let us compare these: if the programmer had trained himself
to code my way, he could have written the program to be 25%
faster *with no additional effort*.  Or, he could have it 45%
better doing it the original way, after all the additional effort
to optimize that 10% (if we are being optimistic, that is).

If he were to spend that extra effort on the code that follows my
practices, the resultant code would be 59% faster (0.75 of the
original run time, times 0.55 for the hot-spot work, is about 0.41).

The point of all this is that learning coding practices that
result in writing reasonably fast code, and applying them, can
result in code that is as good or better (when measured by
execution time) as code written to standards that ignore
execution speed which is then optimized as necessary.  And having
learned to do this, one does not have to spend extra effort to
make faster code, it comes "naturally".

What I am calling for is an understanding of what kind of coding
practices make for good code, and the inclusion of execution
speed in the definition of "good code".

---

Now, as I said, I will justify why *I* use pointers almost to the
exclusion of arrays.

Most of the code I write in which efficiency is a concern falls
into one of two categories: code that takes a long time to run,
but will be run on a particular kind of machine, and code that
takes a fairly short time to run, but will get run on damn near
every kind of machine that is capable of running it at all.

For the code that takes a long time to run, it happens that it
will be run on a more-or-less conventional architecture: this
means that pointer references are generally going to be faster
than indexing.

For the others, what happens is that the unconventional
architectures also run the programs very quickly: the execution
time does not matter for these.  However, for a certain class of
conventional processors, the execution time is of great
importance: if the code is not fast enough on an 8086 (or
equivalent), it just isn't useful.  For most of these slower
processors, indexing is much slower than using pointers.

Now, for the last case, the reason is compelling: I *must* use
pointers instead of indexing as the code will not be fast
enough.  And, at least for my code, the 90-10 rule is pure BS:
rarely does *any* subsystem (never mind particular functions) get
more than 20% of the execution time.

---

The long and short of the above is that neither indices nor
pointers are intrinsically better than the other; programming
being a utilitarian art, "better" depends on what you want to use
it for.  And in C, given the equivalence between indexing and
pointer arithmetic, there can be no abstract reason for
preferring one above the other.

---
Bill

You can still reach me at proxftl!bill
But I'd rather you send to proxftl!twwells!bill

cik@l.cc.purdue.edu (Herman Rubin) (09/30/88)

In article <836@proxftl.UUCP>, bill@proxftl.UUCP (T. William Wells) writes:
> In article <607@ardent.UUCP> mec@ardent.UUCP (Michael Chastain) writes:
> : Good grief.  Say NO to low-level performance tweaking.
> 
> Good grief.  Say NO to understanding the language well enough
> that "low-level performance tweaking" is natural to the
> programmer.  Grrr.  Note that the quotes here are because I
> disagree that the programming practices being condemned
> constitute "low-level performance tweaking".

<	Good grief.  Say NO to understanding the MACHINE well enough
	that "low-level performance tweaking" is natural to the
	programmer.  Grrr.  Note that the quotes here are because I
	disagree that the programming practices being condemned
	constitute "low-level performance tweaking".                       >

I frequently have less difficulty thinking in terms of the machine structure
than the arbitrary obfuscated language constructs.  The gains are frequently
greater at this level.  If one thinks this way, it is not at all difficult to
find situations in which factors of 10 arise.

If I have so little difficulty in understanding machine operations for many
different machines, why should it be considered so overwhelmingly difficult
for others to be able to read and maintain code written with those
considerations in mind?
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

rwberry@hubcap.UUCP (Robert W Berry) (09/30/88)

From article <836@proxftl.UUCP>, by bill@proxftl.UUCP (T. William Wells):
> In article <607@ardent.UUCP> mec@ardent.UUCP (Michael Chastain) writes:
> : Good grief.  Say NO to low-level performance tweaking.
> ...
> At one time, I spent a lot of my spare time improving other
> people's code.  One of the things I discovered is that almost all
> the code I was dealing with had better than 25% execution time
> fat.  What I mean is that when I merely changed all of the code
> to follow my coding practices (which are not anything special,
> just obvious things that every coder ought to know), it got that
> ...
> Now, let us compare these: if the programmer had trained himself
> to code my way, he could have written the program to be 25%
> faster *with no additional effort*.  Or, he could have it 45%
> better doing it the original way, after all the additional effort
> to optimize that 10% (if we are being optimistic, that is).
> 
> If he were to spend that extra effort on the code that follows my
> practices, the resultant code would be 59% faster.
> 
> The point of all this is that learning coding practices that

    I've been programming for YEARS, and if it's not a professional trade
secret, I'd love it if you would summarize your "coding practices" and
email them or post them.  If you can get half the added efficiency
you're talking about, the programming practices would be of interest to the
whole computing world.  Like you, I spend a great deal of my time
optimizing code for other people, and I have yet to see more than a
20-25% improvement (even over BAD code.)
                *** THIS IS NOT A FLAME OF ANY KIND ***
I would sincerely like to see these practices.  They might give me a new
outlook on coding.

BTW, I believe that the 90-10 proposition includes OS overhead,
and thus it becomes a much more plausible figure.  (i.e., it is really the
OS that forms most of that 10%.  Depending on the complexity of your OS
this may often be close to the case.)
I do agree with your views on personal preference.  Some people just THINK
in terms of pointers.  For these people, writing code with array indexing
would be nearly as confusing as writing pointer code is for a person who
thinks in array indexing.

Looking forward to hearing more,
Bob

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-=- Bob Berry -=- PC-Guru's Inc. ! INTERNET:rwberry@hubcap.clemson.edu  -=-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=   ! BITNET:rwberry@clemson               -=-
-=- This space for rent or lease ! Compuserver:72646,3331 or 73170,1242 -=-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

mouse@mcgill-vision.UUCP (der Mouse) (10/01/88)

In article <607@ardent.UUCP>, mec@ardent.UUCP (Michael Chastain) writes:
> 	/* Array */
> 	for ( iArg = 1; iArg < argc; iArg++ )
> 		ProcessArg( argv[iArg] );

> 	/* Pointer */
> 	while ( --argc )
> 		ProcessArg( *++argv );

> I find the first version more writeable.  More debuggable.  More
> readable.  And more likely to be correct *the first time*.

Your first two words are the most telling: "I find".

> If you had to debug this code with a typical sdb-like debugger, which
> would you rather look at?   "iArg" or "argv"?

I would generally shut off the symbolic nonsense and look at offsets
from the argument pointer.  I don't trust compilers to produce code
corresponding to what I wrote any longer.  But I would rather look at
the pointer version, if you really want to know.

> Suppose you told your boss: "my next project will be:
> 	-- 20% sooner
> 	-- 20% cleaner
> 	-- 30% less bugs
> 	--  5% larger
> 	--  5% slower (in non-real-time-critical areas)"
> What do you think he or she would say?

He probably wouldn't care.  Next question? :-)

But also, I fail to see how forcing myself to code in a style I find
unnatural[*] is going to bring any of the improvements from the above
list.  It most certainly won't bring the first two.

[*] Not to imply a value judgement; just because it's not natural for
  me doesn't mean a thing about anyone else.

This also is not to say that I don't use [] notation.  I do, sometimes.
De gustibus non est disputandum, or whatever it's supposed to be (my
knowledge of Latin is right down there with my knowledge of COBOL).

> #if	FLAME
> The last word for you efficiency geeks:
> #if	BIG_COMPUTER
> My array code vectorizes more easily than your
> pointers-that-might-be-aliased code.  So while you're saving a cycle
> here, a cycle there, I'm getting 400% performance improvements just
> by recompiling.
> #endif

Only if your compiler supports noalias or a moral equivalent, to allow
you to tell it that your arrays aren't aliased either.  And in that case,
I can tell it my pointers aren't aliased....

> #if	INTEL_8086
> ....a normal 32-bit pointer....

I was under the impression that a "normal" pointer on the 8086 was 16
bits.  Of course, I don't stoop that low very often[%], so I can't
claim great familiarity with it.

[%] I *do* mean to imply a value judgement here.  I think the 8086 is a
  horrible machine to have to program.

> #endif	FLAME

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

mcdonald@uxe.cso.uiuc.edu (10/01/88)

>    I've been programming for YEARS, and if it's not a professional trade
>secret, I'd love it if you would summarize your "coding practices" and
>email them or post them.  If you can get half the added efficiency
>you're talking about, the programming practices would be of interest to the
>whole computing world.  Like you, I spend a great deal of my time
>optimizing code for other people, and I have yet to see more than a
>20-25% improvement (even over BAD code.)  
>                *** THIS IS NOT A FLAME OF ANY KIND ***
>I would sincerely like to see these practices.  They might give me a new
>outlook on coding.

I think that 25% or even 60% improvement is not a bad guess. This is
only an average, though, and includes, in my experience, some cases
of truly great gains, that is, 30000% improvement (300 times faster).

I'll be happy to share my experiences (I have my flameproof suit on,
wear yours).

1.
      Don't use a bad algorithm. I have seen programs with slow Fourier
transforms in them, a factor of 10 to 50 too slow. The NCAR graphics
package included a 3d surface contour plotter that drew with a pen
plotter in such a way that it picked up the pen after EVERY line
segment, even if the next one started where the last one left off
(also present in Mathcad)(factor of 8). The latest version (3.9p)
of MicroEmacs searches 30 times faster than the previous one by
an improved algorithm (though the old one suffered from another
problem also.)

2.     
       Don't use a general purpose routine to do a very specific,
simple thing. The classic case in C is writing 
     char ss[80];
     ss[0] = 'a';
     ss[1] = '\0';
      printf("%s",ss);
       
instead of putchar('a');  ... note that I'm not talking about the
first one happening occasionally in a loop.

-2.    Don't reinvent a slow wheel if a fast one is lying around for
the taking. (Illustrated by the recent thread on Duff's device, where
several folks discovered memcpy to be faster. Save hacks like that
for where they are NEEDED - like writing to IO space, where memcpy
might not work.) 
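For example (a minimal sketch; the buffer size is arbitrary), let the
library copy do the work instead of a hand-rolled byte loop:

	#include <string.h>

	char src[1024], dst[1024];

	/* the slow wheel */
	void copy_by_hand()
	{
		int i;

		for (i = 0; i < 1024; i++)
			dst[i] = src[i];
	}

	/* the fast one lying around for the taking */
	void copy_with_memcpy()
	{
		memcpy(dst, src, sizeof(dst));
	}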

3.     Don't use a complicated canned system to push square pegs
in round holes. This is where I have found my factors of 300. Graphics
systems like GKS and Phigs are the classic case, as are the graphics
routines in Microsoft Windows (and, I'll bet a cup of coffee, OS/2),
but not, for some odd reason, the Mac. Those things go through so
many layers of code from your call to the screen that they are
guaranteed to be slow. If you have a simple task (as most of mine are)
like drawing a bunch of circles and ellipses and moving them around
the screen fast, use a simple tool. Write your own tool to draw ellipses
direct to the screen - direct to the hardware. My routines beat
Microsoft Windows by a factor of 10 and IBM's GKS by 100 to 1000 times
(that's right, 1000 times faster). Save the big clumsy packages for
big clumsy tasks - if your calculations take 10,000,000 lines,
they are probably just fine.

4.     Don't use nested subroutines to do lots of simple things
inside lots of simple things. Write the damn stuff out, even if it makes
more code. The classic case here is IBM's PC video bios.

5.     Okay, we admit the rest is tweaking. 25% is a good goal.
To get the fastest code, the first thing to do is convince yourself
(and maybe your boss - luckily I don't have one) that you WANT to get
it faster. Then understand the hardware and instruction set of the
target computer (portable? what's that? We have already decided to go
for the gold medal). Look at the compiler's .asm output. Try changing
things like x*65 to (x<<6)+x. It might be faster, might not. IF
your code does indexing like      for(i = 0; i < 100; i++)a[i] = i*6;
change it to                 b = a;
                             j = 0;
                             for(i = 0; i < 100; i++, j += 6)*b++ = j;
It might (or might not) make a difference.

Honestly now - aren't all these things obvious? I would love to hear
about non-obvious ones!  Nevertheless, I have seen them missed more
often than not.

6.     When all else fails, cheat. Use assembly language. Look at the
compiler output. Maybe the optimizer isn't perfect. Remove stores of
never used values. Play with instruction order. Maybe your processor
can overlap floating point computations with fixed point ones -
take advantage of this. Maybe you can assign all your variables
in a loop to registers and get a big gain there (I got a 15% gain
in one graphics routine by storing data in argument pointer and
segment registers on the 8086). There are some cases where certain
compilers can do a better job than you or me, mainly RISC machines
with microscheduling compilers. I'd still like to see what the
guy who designed the scheduler could eke out if he tried hard!


Doug McDonald


charette@edsews.EDS.COM (Mark A. Charette) (10/02/88)

In article <836@proxftl.UUCP>, bill@proxftl.UUCP (T. William Wells) writes:
> 
> At one time, I spent a lot of my spare time improving other
> people's code.  One of the things I discovered is that almost all
> the code I was dealing with had better than 25% execution time
> fat.  What I mean is that when I merely changed all of the code
> to follow my coding practices (which are not anything special,
> just obvious things that every coder ought to know), it got that
> much better; I am not talking about mere tweaks, nor am I talking
> about optimizing particular sections of code.  I am talking about
> reading each line of code and making obvious changes (things like
> using temporaries when needed, adding appropriate register
> declarations, eliminating obviously unnecessary function calls,
> etc.)
> 
We have 6 million lines of PL/1 code we are porting to workstations, mostly well
written and modular in form. Your methods were used in writing this code
(it was written to take advantage of the IBM 370 architecture). The writers
(many, many writers) were trained in making their algorithms fly while
keeping the code within ANSI bounds. Lots of effort went into analyzing the
object code generated for equivalent algorithms, with the winners (in
execution speed) used for the final programs.


The port looks like its going to be sheer hell on a team of people for the
next two or three years. All the effort taken during the last 12 years to
make the code portable while conscientiously taking advantage of the
underlying architecture has made optimization on a workstation a nightmare.
However, if we limited the programs by keeping them on the mainframes, we
would never reach the productivity gains that we hope to attain.

All things in moderation. (Of course, 6,000,000 lines of code ain't
moderate.)
-- 
Mark Charette                   "On a clean disk you can seek forever" - 
Electronic Data Systems                  Thomas B. Steel Jr.
750 Tower Drive           Voice: (313)265-7006        FAX: (313)265-5770
Troy, MI 48007-7019       charette@edsews.eds.com     uunet!edsews!charette 

gwyn@smoke.ARPA (Doug Gwyn ) (10/03/88)

In article <517@poseidon.UUCP> psrc@poseidon.UUCP (Paul S. R. Chisholm) writes:
>Moral:  A really good C programmer should know assembler language and
>compiler technology to write ordinary applications efficiently.

I have to disagree with this.  What is "efficient" for one implementation
may not be efficient for another, and most real C applications end up being
used with several different C implementations.  Low-level tweaking should
only be done when a specific implementation has unacceptable overhead in
some "bottleneck" code, and only after the algorithmic aspects of the
application have been made as good as possible.  Jon Bentley's book on
"Writing Efficient Programs" is recommended reading.

bright@Data-IO.COM (Walter Bright) (10/04/88)

In article <3105@hubcap.UUCP> rwberry@hubcap.UUCP (Robert W Berry) writes:
>From article <836@proxftl.UUCP>, by bill@proxftl.UUCP (T. William Wells):
>> In article <607@ardent.UUCP> mec@ardent.UUCP (Michael Chastain) writes:
>    I've been programming for YEARS, and if it's not a professional trade
>secret, I'd love it if you would summarize your "coding practices" and
>email them or post them.
>Like you I spend a great deal of my time
>optimizing code for other people, and I have yet to see more than a
>20-25% improvment (even over BAD code.)  

Here's some of the things I do:
    o	Organize complicated if statements so that the result is determined
	after evaluating as few of the operands as possible. For instance, in
		if (a && b && c)
			...
	put the operand that is cheapest to evaluate and most often FALSE
	first (here, a).
    o	Replace stuff like,
		if (a)
			x = b;
		else
			x = c;
	with,
		x = a ? b : c;
	Many compilers generate better code for the latter.
    o	Replace stuff like,
		x = (a == 0) ? 1 : 0;
	with,
		x = (a == 0);
	I know it's obvious, but I see it all the time!
    o	Use the ^ operator because many times,
		a = !a;
	should really be,
		a ^= 1;
    o	Avoid things like,
		a %= 4;
	Replace with,
		a &= 3;
    o	In fact, avoid !, / and % like the plague.
    o	Organize control flow such that the most common cases go straight
	through, this is most important for pipelined machines.
    o	I gag when I see things like,
		a = strlen("asdf") + 1;
	instead of,
		a = sizeof("asdf");
	The strlen case above was even printed in a magazine recently to
	'prove' that assembler was better than C!
    o	Combine printfs,
		printf("aksdf aksjdhf kahdf jhdsfhj\n");
		printf(" asdkljfhkajshdf djfh kjahsdfkja h\n");
	Convert to,
		printf("aksdf aksjdhf kahdf jhdsfhj\n\
		 asdkljfhkajshdf djfh kjahsdfkja h\n");
    o	Try to improve the 'locality' of operations, i.e. move calculations
	as close as possible to the point where they are used. This helps
	most compilers to utilize registers better.
    o	Replace int variables with unsigned where possible. This tells the
	optimizer that the variable can never be negative, making certain
	optimizations possible.
    o	Put the most frequently accessed member of a struct first, so the
	offset is 0.
    o	Use char arrays instead of int arrays where possible. Adjust other
	arrays so that the size of the array elements is a power of 2.
	(Making the address calculation a simple shift.)
    o	Avoid struct function parameters and return values.
    o	Avoid bit fields, most especially signed ones.
    o	Replace sequences of if..else if... with a switch.
    o	Use realloc instead of malloc/free pairs. Use calloc instead of
	malloc followed by zeroing each member.
    o	Think about replacing trivial functions with a macro.

Floating point code offers more opportunities:
    o	Replace floats with doubles (to avoid the conversion).
    o	Try very hard to replace divides with other operations, as in:
		x / 10
	with:
		x * .1
    o	Use functions like ldexp and frexp as much as possible.
    o	See if some calculations can be done using integer math.
    o	Use temporaries to eliminate all common subexpressions.
    o	Reorganize expressions to make the best use of compile-time
	constant folding, since the compiler is not allowed to reassociate
	floating-point expressions for you (see the sketch after this list).
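A minimal sketch of that last point (the constants are arbitrary):

	double scale(x)
	double x;
	{
		/* written this way, 2.0*3.0 may not be folded, because */
		/* (x * 2.0) * 3.0 must be evaluated as it is written:  */
		return x * 2.0 * 3.0;
	}

	double scale2(x)
	double x;
	{
		/* parenthesized so the constant product folds at compile time */
		return x * (2.0 * 3.0);
	}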

And finally:
	Try looking at the assembler output for your favorite function.
	You might be surprised!

Most of these are probably obvious to you. But I have seen all of the
above repeatedly, and have even noticed it myself in going over my own
code. Also, of course, all of the above are judgement calls, and they
are not universally applicable.

aash@ms.uky.edu ( Aashi Deacon ) (10/04/88)

>    o	Try very hard to replace divides with other operations, as in:
>		x / 10
>	with:
>		x * .1
>    o	See if some calculations can be done using integer math.

Are these two points contradictory?
Thanks for those good hints but I got stopped when I saw this first point.

According to theory, '.1' cannot be represented exactly as a floating
point number because its expansion in base 2 repeats forever.  Wouldn't
the first then be better in this case?

aash
aash@ms.uky.edu

peter@ficc.uu.net (value) (10/05/88)

In article <225800077@uxe.cso.uiuc.edu>, mcdonald@uxe.cso.uiuc.edu writes:
> If you have a simple task (as most of mine are)
> like drawing a bunch of circles and ellipses and moving them around
> the screen fast, use a simple tool. Write your own tool to draw ellipses
> direct to the screen - direct to the hardware.

Noooooooooooo! Don't go around a windowing system, particularly a multitasking
windowing system like Microsoft Windows (or Amiga Intuition), unless you
absolutely have to and are willing to emulate the layer protection for
partially obscured bitmaps. I don't know about Windows, but I once tried
going around Intuition -- in a place where it was safe to -- and found only
a minimal improvement at the cost of much convenience. I would expect safe
places in Windows to be even less common, since it doesn't have a 'screens'
concept.

Before you do anything like this, TRY IT THE OBVIOUS WAY. If it's too slow,
try to cut down on your redraws. I was able to get adequate speed from the
PC's BIOS just by doing some curses-type display optimisation. Then if it's
still too slow you can go straight to the bitmaps.
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation.
"Have you hugged  U  your wolf today?"            peter@ficc.uu.net

gwyn@smoke.ARPA (Doug Gwyn ) (10/05/88)

In article <1700@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
>Here's some of the things I do:

I have to take exception to most of these micro-efficiency tweaks.
Most of them have no effect on the code generated by even a moderately
good compiler, and nearly all of them make the code more obscure.
It is much more important for the source code to be patently correct
than for every surplus nanosecond to be squeezed out.  Fast, buggy
code does nobody a service, and bugs are hard to spot when the source
code is obscure.  Besides, most efficiency losses are from poor
choice of data structures or algorithms, not from fatty source code.

>    o	Replace stuff like,
>		if (a)
>			x = b;
>		else
>			x = c;
>	with,
>		x = a ? b : c;

I don't know of any compiler that wouldn't produce the same code
for these two cases.  Write whichever is more readable in context.

>    o	Replace stuff like,
>		x = (a == 0) ? 1 : 0;
>	with,
>		x = (a == 0);

Only if x is conceptually Boolean in the first place.

>    o	Use the ^ operator because many times,
>		a = !a;
>	should really be,
>		a ^= 1;

If a is conceptually Boolean, the first is clearer.

>    o	Avoid things like,
>		a %= 4;
>	Replace with,
>		a &= 3;

You're assuming a is nonnegative.  Any decent compiler will
perform such instruction replacements for you.  Write whatever
is clearest in context.  To get a remainder, % is clearer.

In the case that the "4" might change, parameterizing the
first case will give correct code after it changes, while
the second will break unless another power of two is used
for the replacement for "4".  Thus the first is SAFER.
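A minimal sketch of the point, with a hypothetical HASHSIZE parameter:

	#define HASHSIZE 4	/* may later become 5, 7, ... */

	int hash1(n)
	int n;
	{
		return n % HASHSIZE;		/* stays correct if HASHSIZE changes */
	}

	int hash2(n)
	int n;
	{
		return n & (HASHSIZE - 1);	/* breaks silently unless HASHSIZE */
						/* remains a power of two          */
	}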

>    o	In fact, avoid !, / and % like the plague.

It's pretty hard to divide without /.  You should be sure
that the divisor cannot be zero before flow reaches this
point in the algorithm.  (This does NOT mean to "stick in
a test for division by 0".)

Because % is not a true modulo operator, its usefulness is
limited; however, it does have its uses.

It's hard to imagine avoiding ! in readable code.

>    o	I gag when I see things like,
>		a = strlen("asdf") + 1;
>	instead of,
>		a = sizeof("asdf");

The general principle is to compute at compile time whatever
CAN be computed at compile time rather than at run time.
But sometimes it is clearer to initialize things at run time
(not in this example, though).

>    o	Combine printfs,
>		printf("aksdf aksjdhf kahdf jhdsfhj\n");
>		printf(" asdkljfhkajshdf djfh kjahsdfkja h\n");
>	Convert to,
>		printf("aksdf aksjdhf kahdf jhdsfhj\n\
>		 asdkljfhkajshdf djfh kjahsdfkja h\n");

Thereby introducing a bug (exercise for the student).
The difference in time between these is negligible,
but if you're really tweaking for efficiency puts()
might have been better, depending on the implementation.

>    o	Try to improve the 'locality' of operations, i.e. move calculations
>	as close as possible to the point where they are used. This helps
>	most compilers to utilize registers better.

Just so long as clarity of the algorithm is maintained.

A related point is to declare local variables in blocks
rather than all at the beginning of the function body.

>    o	Replace int variables with unsigned where possible. This tells the
>	optimizer that the variable can never be negative, making certain
>	optimizations possible.

It can also produce slower code; this depends on the
implementation.

>    o	Put the most frequently accessed member of a struct first, so the
>	offset is 0.

Not all architectures can access the 0 offset faster than
the others.  I knew of one that was actually slower.

>    o	Use char arrays instead of int arrays where possible.

Char access on a word-oriented machine is likely to be
noticeably slower.

>    o	Avoid struct function parameters and return values.

This is a matter of interface design.  Small structs such
as one might use to represent a complex number are not a
problem; large structs are more quickly (though less conveniently)
passed by reference, so long as not too many references to
the members are made inside the function.  Forcing the
caller to allocate the structure may not be convenient.

>    o	Avoid bit fields, most especially signed ones.

I would say rather, use bit fields where they seem to be
the natural mechanism, as in device register definitions.

>    o	Replace sequences of if..else if... with a switch.

These are not equivalent.  Switch wins when there are
numerous cases for different integer values of a variable,
but not in the more general case of alternative conditions.

>    o	Use realloc instead of malloc/free pairs. Use calloc instead of
>	malloc followed by zeroing each member.

realloc() is not usually faster, and it makes the application
bookkeeping harder.

calloc() is almost worthless since it stores zero BYTES
into the storage area.  That may not be appropriate for
data types other than the integral types, in particular
pointers and floating-point data.
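A minimal sketch of the distinction (the struct is invented for
illustration):

	#include <stdlib.h>

	struct node {
		struct node *next;
		double weight;
	};

	struct node *new_node()
	{
		/* calloc() would give all-zero BYTES; on a machine where */
		/* a null pointer or 0.0 is not an all-zero bit pattern,   */
		/* the members must be initialized explicitly:             */
		struct node *p = (struct node *) malloc(sizeof(struct node));

		if (p != NULL) {
			p->next = NULL;
			p->weight = 0.0;
		}
		return p;
	}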

>    o	Think about replacing trivial functions with a macro.

Functions are usually better for debugging.  If a function
is a bottleneck, consider making it a macro, but be aware
that macros are not as flexible or as safe as functions.

>    o	Replace floats with doubles (to avoid the conversion).

Use the size that best fits the problem, keeping in mind
that the standard math library deals with doubles.  Float
can be faster; ANSI C does not require all floats to be
promoted to doubles in expressions.  Large arrays almost
certainly should be float[] not double[].

>    o	Try very hard to replace divides with other operations, as in:
>		x / 10
>	with:
>		x * .1

But this is less accurate, and not necessarily noticeably
faster (it depends on the f.p. hardware).

Some optimizers will do this for you.  If it is more readable,
use division when it is the natural way to express the
computation; many robust algorithms will naturally not use
division since they keep numbers in the range from say -2 to 2
and division by something near 0 could be problematic.

>    o	Use functions like ldexp and frexp as much as possible.

No, use them sparingly.  Their main value is to pick apart
a floating point number into integers which are easier to
interchange among heterogeneous processors, for example in
network connections.  In normal application algorithms there
is no value in using these functions.

>    o	Use temporaries to eliminate all common subexpressions.

That can actually interfere with a good optimizer.  Use
temporaries when you suspect that many optimizers will not
take care of the common subexpression for you; otherwise
if it is not in a bottleneck section of code, leave it alone.

henry@utzoo.uucp (Henry Spencer) (10/05/88)

In article <225800077@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>>... I have yet to see more than a
>>20-25% improvment (even over BAD code.)  

As Knuth once said, approximately, "in no other branch of engineering is an
easily-obtained 10% improvement in performance viewed with scorn".
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bright@Data-IO.COM (Walter Bright) (10/06/88)

In article <10332@s.ms.uky.edu> aash@ms.uky.edu ( Aashi Deacon ) writes:
<<    o	Try very hard to replace divides with other operations, as in:
<<		x / 10
<<	with:
<<		x * .1
<<    o	See if some calculations can be done using integer math.
<Are these two points contradictory?
	Not if x is a double, which I implied but didn't state.
<According to theory, '.1' cannot be represented exactly as a floating
<point number because in base2 it is irrational.  Wouldn't then the
<first be better in this case?
	The result may differ in the last bit of the significand because
	of this. However, in most cases, that is irrelevant and the
	multiply will be significantly faster.

henry@utzoo.uucp (Henry Spencer) (10/06/88)

In article <1700@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
>    o	Replace int variables with unsigned where possible. This tells the
>	optimizer that the variable can never be negative...

This can be a mistake on some machines, and I'm not sure how it would help
very often.

>    o	... Use calloc instead of
>	malloc followed by zeroing each member.

Mistake -- the semantics are not the same.  Calloc fills with zero bits;
that is not necessarily the same thing as zero values, especially for
floating-point and pointers.
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bright@Data-IO.COM (Walter Bright) (10/06/88)

In article <8630@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>I have to take exception to most of these micro-efficiency tweaks.

For brevity, I've taken the liberty of paraphrasing your objections:
>1. Optimizing the algorithm is more important than the low level code.
>2. Low level tweaks are not portable.
>3. Low level tweaks are less clear.
>4. A good compiler should do this optimization for you.
>5. On machine X with compiler Y this optimization will make things worse.

All true. However,
1. After you've optimized the algorithm, then you go after the low level
   code. My article was about opportunities for tweaking low level code.
2. Frankly, I don't care about portability to the following machines:
	o 1's complement
	o NULL pointers aren't 0 bit patterns
	o 0 double values aren't 0 bit patterns
   I've never met one, and I'll worry about them if and when I have
   the misfortune of dealing with them. I worry about Vaxes, 68ks,
   8086, 286, 386, and 32032.
3. A matter of opinion. I've met programmers who would go to extreme
   lengths to avoid using the ^ & | operators because they thought that
   bit operations were 'obscure'.
4. Most of us have to use compilers whose optimizations are less than
   perfect. In fact, there are hardly any global flow analysis compilers
   out there.
5. In writing code for machines A, B and C, I write to the lowest common
   denominator, and hand-tweak for the slowest machine.

Micro-efficiency gets regularly denounced. Here are some counterarguments:
o	If your commercial product is slower than the competition, you
	won't get a chance to take advantage of the maintainability/
	portability advantages because you'll be out of business.
o	The sum of all the tweaks can make the difference (on the PC)
	between using a small memory model and the large. This gives
	a lot of leverage to seemingly minor changes.
o	I derive a lot of personal satisfaction from creating 'lean
	and mean' code.
o	Run-time speed makes the difference between a user interface
	that's crisp and snappy and one that's sloppy and sluggish.
o	Customers want speed. I've been selling consumer software for years,
	and I know. Speed *sells*. Telling the customer to buy faster
	hardware doesn't work.
o	Streamlining code even in relatively unused sections of code is
	important, because it reduces the size of the executable. Smaller
	programs run faster because:
	.	Less virtual memory paging
	.	Might be able to use smaller memory model
	.	Fewer overlays are necessary
	.	More memory is available, so you can trade execution speed
		for more memory consumption in your data structures
	.	They compile faster
	.	They link faster
	.	They load faster

ggs@ulysses.homer.nj.att.com (Griff Smith) (10/06/88)

In article <8630@smoke.ARPA>, gwyn@smoke.ARPA (Doug Gwyn ) writes:
| In article <1700@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
| |Here's some of the things I do:
| 
| I have to take exception to most of these micro-efficiency tweaks.
| Most of them have no effect on the code generated by even a moderately
| good compiler, and nearly all of them make the code more obscure.

Agreed, but a few might deserve some further comment...

| >    o	Replace stuff like,
| >		if (a)
| >			x = b;
| >		else
| >			x = c;
| >	with,
| >		x = a ? b : c;
| 
| I don't know of any compiler that wouldn't produce the same code
| for these two cases.  Write whichever is more readable in context.

When given the second definition, C++ 2.0 (beta 4) produces the
following error message if `x', `a' and `b' are of class `foo' and
`c' is a function call that returns a `foo'.

"xx.c", line 133: sorry, not implemented: temporary of class foo with destructor needed in ? expression

| >    o	Combine printfs,
| >		printf("aksdf aksjdhf kahdf jhdsfhj\n");
| >		printf(" asdkljfhkajshdf djfh kjahsdfkja h\n");
| >	Convert to,
| >		printf("aksdf aksjdhf kahdf jhdsfhj\n\
| >		 asdkljfhkajshdf djfh kjahsdfkja h\n");
| 
| Thereby introducing a bug (exercise for the student).
| The difference in time between these is negligible,
| but if you're really tweaking for efficiency puts()
| might have been better, depending on the implementation.

Sequences of printf can be harder to read, however.
The following example may show this better:

	(void) fprintf(stderr, "\
Usage:   sltread [options] input-drive-name\n\
Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n\
         [-d dataset-name] [-n dataset-number]\n\
         [-i ignore-count] [-r record-count]\n");

I grant that this is vulnerable to code hackers and naive
pretty-printers messing up the alignment.  On the other hand, it is
easier to see how everything lines up.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{most AT&T sites}!ulysses!ggs
Internet:	ggs@ulysses.att.com

ark@alice.UUCP (Andrew Koenig) (10/06/88)

In article <1703@dataio.Data-IO.COM>, bright@Data-IO.COM (Walter Bright) writes:

> 	The result may differ in the last bit of the significand because
> 	of this. However, in most cases, that is irrelevant and the
> 	multiply will be significantly faster.

If, however, you want your compiler to implement IEEE floating point,
you had better not substitute a multiply by 0.1 for a divide by 10.

IEEE defines the results of addition, subtraction, multiplication,
division, square root, and a few other operations precisely -- down
to the last bit.  Either you get the right answer or you don't.
It is true that in many cases users don't need the last bit or so,
but a language implementer has an obligation to give it to the
people who demand it.
-- 
				--Andrew Koenig
				  ark@europa.att.com

diamond@csl.sony.JUNET (Norman Diamond) (10/06/88)

In article <10332@s.ms.uky.edu>, aash@ms.uky.edu ( Aashi Deacon ) writes:
->    o	Try very hard to replace divides with other operations, as in:
->		x / 10
->	with:
->		x * .1
>
> ... '.1' cannot be represented exactly as a floating
> point number because in base2 it is irrational.  Wouldn't then the
> first be better in this case?

In C, efficiency is more important than correctness.  Haven't you also
been following the other disputes in this news group?
-- 
-------------------------------------------------------------------------------
  The above opinions are my own.   |   Norman Diamond
  If they're also your opinions,   |   Sony Computer Science Laboratory, Inc.
  you're infringing my copyright.  |   diamond%csl.sony.jp@relay.cs.net

steve@umigw.MIAMI.EDU (steve emmerson) (10/06/88)

In article <10699@ulysses.homer.nj.att.com> ggs@ulysses.homer.nj.att.com 
(Griff Smith) writes:
>   (void) fprintf(stderr, "\
>Usage:   sltread [options] input-drive-name\n\
>Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n\
>         [-d dataset-name] [-n dataset-number]\n\
>         [-i ignore-count] [-r record-count]\n");
>
>I grant that this is vulnerable to code hackers and naive
>pretty-printers messing up the alignment.  On the other hand, it is
>easier to see how everything lines up.

Nonsense.  Use instead:
#define	EPUTS(msg)	(void)fputs(msg, stderr)
    EPUTS("Usage:   sltread [options] input-drive-name\n");
    EPUTS("Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n");
    EPUTS("         [-d dataset-name] [-n dataset-number]\n");
    EPUTS("         [-i ignore-count] [-r record-count]\n");

This has all the advantages -- and none of the disadvantages -- of the other.
In ANSI C you could even have the macro append the newline.
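
Something along these lines, leaning on ANSI string-literal
concatenation (sketch, untested):

#define	EPUTS(msg)	((void) fputs(msg "\n", stderr))

    EPUTS("Usage:   sltread [options] input-drive-name");
    EPUTS("Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]");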

Don't write vulnerable code (at least not without *very* good reason).
-- 
Steve Emmerson                     Inet: steve@umigw.miami.edu [128.116.10.1]
SPAN: miami::emmerson (host 3074::)      emmerson%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!steve               emmerson%miami.span@vlsi.jpl.nasa.gov
"Computers are like God in the Old Testament: lots of rules and no mercy"

ggs@ulysses.homer.nj.att.com (Griff Smith) (10/07/88)

In article <171@umigw.MIAMI.EDU>, steve@umigw.MIAMI.EDU (steve emmerson) writes:
[my original deleted]
> Nonsense.  Use instead:
> #define	EPUTS(msg)	(void)fputs(msg, stderr)
>     EPUTS("Usage:   sltread [options] input-drive-name\n");
>     EPUTS("Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n");
>     EPUTS("         [-d dataset-name] [-n dataset-number]\n");
>     EPUTS("         [-i ignore-count] [-r record-count]\n");
> 
> This has all the advantages -- and none of the disadvantages -- of the other.

I'm still not crazy about using fputs as an efficiency hack, but this seems to be
on the right track.

> In ANSI C you could even have the macro append the newline.

If I remember correctly, I can avoid the formatting problem completely
in ANSI C and avoid silly macros:

	(void) fprintf(stderr,
			"Usage:   sltread [options] input-drive-name\n"
			"Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n"
			"         [-d dataset-name] [-n dataset-number]\n"
			"         [-i ignore-count] [-r record-count]\n"
			);

> Don't write vulnerable code (at least not without *very* good reason).

I try not to.  A similar note to the other side: don't make gratuitous
changes to other people's code without a good reason.  I have been
burned several times by having something break because someone else
"improved" it.  In one case I failed to add a comment to explain why a
fairly complicated addition was necessary; someone else decided that it
was ugly and stripped it.  In another case the comment was there, and
ignored.

> -- 
> Steve Emmerson                     Inet: steve@umigw.miami.edu [128.116.10.1]
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{most AT&T sites}!ulysses!ggs
Internet:	ggs@ulysses.att.com

wes@obie.UUCP (Barnacle Wes) (10/07/88)

In article <225800077@uxe.cso.uiuc.edu>, mcdonald@uxe.cso.uiuc.edu gives
six good, fairly detailed tips for optimizing C programs.  One thing to
keep in mind here is that in general, you are wasting your time trying
to optimize the entire program.  Most programs spend a large amount of
their processing time in a few critical routines.

The way to conquer this problem, and not waste a lot of valuable time
optimizing the current program, when you could be adding features to the
next version :-), is to profile your code.  Find out which routines the
program spends most of its time in, and then optimize THOSE routines.
If you spend 3 weeks getting this one routine to run in 350
milliseconds, as opposed to 1 second, you have just wasted 3 weeks if
that routine is only executed once every third or fourth time the user
uses the program.
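
If no profiler is handy at all, even a crude timing harness around a
suspected hot spot gives a first answer (sketch; assumes an ANSI-style
clock() and an invented routine name):

	#include <stdio.h>
	#include <time.h>

	clock_t t0 = clock();
	suspected_hot_routine();
	printf("%.2f sec\n", (double)(clock() - t0) / CLOCKS_PER_SEC);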

-- 
                     {hpda, uwmcsd1}!sp7040!obie!wes

         "How do you make the boat go when there's no wind?"
                                 -- Me --

scs@athena.mit.edu (Steve Summit) (10/07/88)

In article <1703@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
>	The result may differ in the last bit of the significand because
>	of this. However, in most cases, that is irrelevant and the
>	multiply will be significantly faster.

...demonstrating, once again, that it's always more important to
get the answer fast than get it right :-(.

                                            Steve Summit
                                            scs@adam.pika.mit.edu

(Sorry, Walter, this isn't quite fair, and I understand that
being off in the last bit isn't necessarily "wrong," but as long
as the church of efficiency is standing around on soapboxes I'm
going to stand on mine.)

gwyn@smoke.ARPA (Doug Gwyn ) (10/07/88)

In article <1988Oct5.161523.6695@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>As Knuth once said, approximately, "in no other branch of engineering is an
>easily-obtained 10% improvement in performance viewed with scorn".

That isn't even a correct observation, let alone relevant.
Consider adding nitro to your gasoline.
Unless you're a drag racer it would be pretty stupid,
and I'm pretty sure that most automotive engineers would agree.

The idea is to strive for a truly optimal solution to a problem,
taking ALL relevant factors into proper account.  Of course this
cannot be done exactly in practice, but it should steer one's
approach.  Note that some relevant factors might include the
availability of resources for software maintenance, adaptability
to changing future needs, and other even harder to quantify
factors.  One thing for sure:  Concentrating on a single factor
to the exclusion of other obviously relevant ones is quite
likely to produce a suboptimal result.

bright@Data-IO.COM (Walter Bright) (10/08/88)

In article <8267@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
>In article <1703@dataio.Data-IO.COM>, bright@Data-IO.COM (Walter Bright) writes:
>> 	The result may differ in the last bit of the significand because
>> 	of this. However, in most cases, that is irrelevant and the
>> 	multiply will be significantly faster.
>If, however, you want your compiler to implement IEEE floating point,
>you had better not substitute a multiply by 0.1 for a divide by 10.
>
>IEEE defines the results of addition, subtraction, multiplication,
>division, square root, and a few other operations precisely -- down
>to the last bit.  Either you get the right answer or you don't.
>It is true that in many cases users don't need the last bit or so,
>but a language implementer has an obligation to give it to the
>people who demand it.

	You (and some others) misunderstood me. In no case did I recommend
	the *compiler* perform the substitution. I recommended that the
	*programmer* do it, of course after making sure that the semantics
	for *his application* were the same.

	For example, suppose you wished to convert Celsius to Farenheit
	in a tight loop. Multiplying by the scale factor is much faster
	than dividing by the reciprocal, and the answer will be correct
	to the desired accuracy.
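
	A sketch of the kind of loop I mean (array names invented):

		/* 9.0/5.0 folds to a constant at compile time,
		   so there is no divide inside the loop */
		for (i = 0; i < n; i++)
			f[i] = c[i] * (9.0 / 5.0) + 32.0;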

	Another example is scaling things for a graphics display. It's faster
	to design the system using a multiplicative scale factor rather than
	a dividing scale factor.

	In fact, for all you numerologists, the * may be what the algorithm
	required instead of the / to get the correct result.

	In floating point, there is a speed/accuracy tradeoff, not a
	right/wrong answer. Different applications require different
	approaches. I suggested the replacement of a divide with a
	multiply because:
	1. Many programmers aren't aware that * is much faster than /.
	2. Many applications don't care about the difference in result.
	3. It's just another thing to look for.

	I stand by my original comments.

ark@alice.UUCP (Andrew Koenig) (10/08/88)

In article <1706@dataio.Data-IO.COM>, bright@Data-IO.COM (Walter Bright) writes:
> 	For example, suppose you wished to convert Celsius to Fahrenheit
> 	in a tight loop. Multiplying by the scale factor is much faster
> 	than dividing by the reciprocal, and the answer will be correct
> 	to the desired accuracy.

This obviously depends on what you desire.  For instance,
whether I evaluate

	f = 32 + c * (9.0/5.0);

or

	f = 32 + c / (5.0/9.0);

I will not get as accurate an answer as if I evaluate

	f = 32 + (c * 9.0) / 5.0;

Now, I know that a sensible compiler will do the division inside
the parentheses for me at compile time in each of the first two
examples above, whereas the third one does a multiplication *and*
a division.  The third is clearly the slowest.  But on a machine
with sensible floating-point, it guarantees me the most accurate
result possible whenever c is an integer.  In particular, when
I convert 100 degrees Celsius to Fahrenheit, I get 212, not
211.9999999999999 or 212.0000000000001.  The other two ways of
doing it do not guarantee that.
-- 
				--Andrew Koenig
				  ark@europa.att.com

rob@kaa.eng.ohio-state.edu (Rob Carriere) (10/09/88)

In article <8273@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
>In particular, when
>I convert 100 degrees Celsius to Fahrenheit [using * and /], I get 212, not
>211.9999999999999 or 212.0000000000001.  The other two ways of
>doing it do not guarantee that.

Probably.  However, as thermometers more accurate than about +-.1K are
fairly rare, what's the diff?  You can't rely on the number being more
precise than about +-.05 degrees F anyway, due to the inaccuracy of
your input.  This is exactly the point the original poster was trying
to make; if *you* know the last couple of bits don't matter, you might
as well take advantage of that fact.  If on the other hand, this
measurement was made using a Star Trek tricorder (which is accurate to
12 decimal places) and had to be reported to Spock, *then* you'd
better be careful that you don't destroy any bits of accuracy, and use
your trick (and double precision).

Rob Carriere

dharvey@wsccs.UUCP (David Harvey) (10/09/88)

In article <10332@s.ms.uky.edu>, aash@ms.uky.edu ( Aashi Deacon ) writes:
> >    o	Try very hard to replace divides with other operations, as in:
> >		x / 10
> >	with:
> >		x * .1
> >    o	See if some calculations can be done using integer math.
> 
> Are these two points contradictory?
> Thanks for those good hints but I got stopped when I saw this first point.
> 
> According to theory, '.1' cannot be represented exactly as a floating
> point number because in base2 it is irrational.  Wouldn't then the
> first be better in this case?
> 
> aash
> aash@ms.uky.edu

For that matter, it is also very difficult to represent 10.0 (I am
assuming you are working with floating point) in any floating point
representation.  Also, if the operation is done in emulation mode (no
floating point in MPU or if math coprocessor it is not in machine) the
advantage will be nonexistent.  In other words, a sequence of left
shifts and adds is no better than a sequence of right shifts, et al.
Even with the coprocessor (for the math ops), a MUL takes approximately
the same number of clock cycles as a DIV does.  You would be much better
served by making variables that are used constantly registers (if you
have float registers) than by some of this stuff.  Also, ordering your
loops so that Fortran varies the leftmost subscript fastest and C the
rightmost (better yet, use register pointers) does wonders for
multiply-dimensioned arrays, reducing the page faulting that normally
occurs on multitasking/multiuser machines.
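
In C, for instance, that means letting the rightmost subscript vary in
the innermost loop (sizes invented):

	double a[500][500];
	int i, j;

	for (i = 0; i < 500; i++)
		for (j = 0; j < 500; j++)	/* j innermost walks memory
						   sequentially, since C
						   stores rows contiguously */
			a[i][j] = 0.0;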

dharvey@wsccs

PS: Using the registers usually cuts time in half for affected
operations, IF you get the registers you asked for.

henry@utzoo.uucp (Henry Spencer) (10/09/88)

In article <8646@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>As Knuth once said, approximately, "in no other branch of engineering is an
>>easily-obtained 10% improvement in performance viewed with scorn".
>
>That isn't even a correct observation, let alone relevant.

While I'm not really in Knuth's camp on this general issue, I think I must
defend his comment.  There really are people -- too many of them -- who
consider a 10% performance improvement to be so insignificant that it's
not a valid argument in a debate.  These are generally people who have
never done performance-critical programming.
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

mcdonald@uxe.cso.uiuc.edu (10/09/88)

>In article <225800077@uxe.cso.uiuc.edu>, mcdonald@uxe.cso.uiuc.edu gives
>six good, fairly detailed tips for optimizing C programs.  One thing to
>keep in mind here is that in general, you are wasting your time trying
>to optimize the entire program.  Most programs spend a large amount of
>their processing time in a few critical routines.

>The way to conquer this problem, and not waste a lot of valuable time
>optimizing the current program, when you could be adding features to the
>next version :-), is to profile your code.  Find out which routines the
>program spends most of its time in, and then optimize THOSE routines.

Agreed, sometimes. Sometimes it is blatantly obvious from the problem
description. Of three projects I just finished, all maybe 3000 to 5000
lines, one  does most of its work in 40 lines. One spreads out over
about 25% of the code. These were obvious from the start (pre-known
loop counts). One, my hack of Nelson Beebe's dvi driver, I couldn't
tell de novo since I didn't write it. I found a very nice profiler
called Inside! ( for the IBM-pc) on Bix. It is a crippled demo version,
but got the job done. I found that the work being done is distributed
very evenly over about 85% of the code. That means nothing can be done 
easily. Oh well!

Doug McDonald

henry@utzoo.uucp (Henry Spencer) (10/10/88)

For further reading on optimizing C code, you might want to read that
absolute gem of technical writing, "News Need Not Be Slow", by Collyer
and Spencer, in the proceedings of the Winter 1987 Usenix Conference.
(No, we don't get royalties on proceedings sales!)  We speeded up a
major program by about a factor of 20, without sacrificing portability
(no assembler involved).
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

gwyn@smoke.ARPA (Doug Gwyn ) (10/10/88)

In article <1988Oct9.033134.15553@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <8646@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>>As Knuth once said, approximately, "in no other branch of engineering is an
>>>easily-obtained 10% improvement in performance viewed with scorn".
>>That isn't even a correct observation, let alone relevant.
>While I'm not really in Knuth's camp on this general issue, I think I must
>defend his comment.  There really are people -- too many of them -- who
>consider a 10% performance improvement to be so insignificant that it's
>not a valid argument in a debate.  These are generally people who have
>never done performance-critical programming.

There are times when it matters and times when it doesn't.
The latter dominate.
For example, most time in a highly interactive system is spent
waiting for the human to think.  So long as the system response
is fast enough, additional improvement is wasted.  (If the
response is NOT fast enough, then improvement is justifiable.)
Another example is spending a lot of developer time on wringing
out an additional 10% of code speed by machine-specific
micro-level tweaks that make originally simple code hard to
maintain, when a much greater improvement would be to simply
switch to more capable hardware.  This is particularly bad
when there are far more problems needing attention than there
are qualified problem solvers, as is usually the case.

I will agree that UNNECESSARY inefficiency should be avoided.
A few percent lost in several places in the code can produce
a significant loss for the overall system.  But I don't think
that "efficiency" should be the software engineer's main
concern, just one of many factors that have to be balanced.

dhesi@bsu-cs.UUCP (Rahul Dhesi) (10/10/88)

In article <208@obie.UUCP> wes@obie.UUCP (Barnacle Wes) writes:
>One thing to
>keep in mind here is that in general, you are wasting your time trying
>to optimize the entire program.  Most programs spend a large amount of
>their processing time in a few critical routines.

Let's remember that this behavior is true only for *unoptimized*
programs.

After you have located the critical routines and speeded them up, the
program will spend a large amount of its processing time spread over
many routines.  *Now*, if you want further significant speed-ups, you
can't just find a few hot spots.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi

jimw@microsoft.UUCP (Jim Walsh) (10/12/88)

In article <171@umigw.MIAMI.EDU> steve@umigw.miami.edu (steve emmerson) writes:
>Nonsense.  Use instead:
>#define	EPUTS(msg)	(void)fputs(msg, stderr)
>    EPUTS("Usage:   sltread [options] input-drive-name\n");
>    EPUTS("Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n");
>    EPUTS("         [-d dataset-name] [-n dataset-number]\n");
>    EPUTS("         [-i ignore-count] [-r record-count]\n");
>
>This has all the advantages -- and none of the disadvantages -- of the other.
>In ANSI C you could even have the macro append the newline.

I believe that ANSI C allows the following (at least the implementation
I'm using accepts it without flagging it in the manual as a deviation):

fputs("Usage:   sltread [options] input-drive-name\n"
      "Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n"
      "         [-d dataset-name] [-n dataset-number]\n"
      "         [-i ignore-count] [-r record-count]\n", stderr);

Thus you truly get ALL the advantages - a single function call, and even a few
null bytes saved.
-- 
Jim Walsh                       jimw@microsof.beaver.cs.washington.EDU
Microsoft Corporation           jimw@microsof@uw-beaver.ARPA 
                                jimw@microsof.UUCP
The views expressed herein are not necessarily mine, let alone Microsoft's...

tneff@dasys1.UUCP (Tom Neff) (10/14/88)

In article <10332@s.ms.uky.edu>, aash@ms.uky.edu ( Aashi Deacon ) writes:
> According to theory, '.1' cannot be represented exactly as a floating
> point number because in base2 it is irrational.  ...

No, .1 is not *irrational* (non-repeating and non-terminating) in any
base.  '.1' cannot be represented in any finite number of binary bits
because its binary expansion *repeats* forever.
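
Specifically, .1 (decimal) is .000110011001100... in binary, with the
block 0011 repeating forever; truncate that at any finite number of
bits and you get something slightly different from one tenth:

	printf("%.20f\n", 0.1);		/* prints 0.10000000000000000555...
					   (or thereabouts) with IEEE doubles */
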
-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536	       MCI: TNEFF
	 will function..."	GEnie: TOMNEFF	       BIX: t.neff (no kidding)

lfm@fpssun.fps.com (Larry Meadows) (10/15/88)

In article <711@wsccs.UUCP> dharvey@wsccs.UUCP (David Harvey) writes:
:
:For that matter, it is also very difficult to represent 10.0 (I am
:assuming you are working with floating point) in any floating point
:representation.

This is not the case.  10.0 is 8 * 1.25, i.e. 1.01 * 2^3 in base 2.

In fact, all integers between zero and 111...1 (P ones, in base 2,
where P is the precision of the significand) are exactly representable
in base 2 floating point.  It is, however, the case that N/10.0 is not
necessarily the same as 0.1*N when a binary floating point
representation is used.
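
A quick way to see that last point (on an IEEE machine the count
generally comes out nonzero):

	int i, diffs = 0;
	for (i = 1; i <= 100; i++)
		if (i / 10.0 != i * 0.1)
			diffs++;
	printf("%d of 100 differ\n", diffs);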
                                                
-- 
Larry Meadows @ FPS			...!tektronix!fpssun!lfm
					...!nosun!fpssun!lfm

firth@sei.cmu.edu (Robert Firth) (10/18/88)

In article <1700@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:

>Floating point code offers more opportunities:

>    o	Try very hard to replace divides with other operations, as in:
>		x / 10
>	with:
>		x * .1

No: since 10 is exactly representable and 0.1 is not, this change will
very often introduce additional rounding error.

>    o	Reorganize expressions to make the best use of compile-time
>	constant folding, since the compiler is not allowed to do that.

Again no: the compiler is not allowed to make these changes for the
simple reason that they change the results of the calculation.  Many
scientific programmers go to great pains to minimise rounding error;
don't undo their work.

General advice to a maintainer of "scientific" programs: do not TOUCH
any floating-point code unless you know at least as much numerical
analysis as its author.

sarima@gryphon.CTS.COM (Stan Friesen) (10/20/88)

In article <225800077@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>
> Then understand the hardware and instruction set of the
>target computer (portable? what's that? We have already decided to go 
>for the gold medal). Look at the compiler's .asm output. Try changing
>things like x*65 to x<<6+x. It might be faster, might not.

	Well, as far as I can tell, the Portable C Compiler does this
for you, at least with -O set.  I have seen the equivalent of the above
in so many places in the system that I cannot imagine it being in the
source code.  In fact I have seen it in once-executed start-up code,
which very few programmers would bother tweaking to this level.
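
For reference, the transformation in question, with the parentheses it
needs (<< binds more loosely than +):

	y = x * 65;		/* original */
	y = (x << 6) + x;	/* shift-and-add equivalent (best with
				   unsigned or nonnegative x); whether it
				   is faster depends entirely on the
				   machine */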

-- 
Sarima Cardolandion			sarima@gryphon.CTS.COM
aka Stanley Friesen			rutgers!marque!gryphon!sarima
					Sherman Oaks, CA

dan@ccnysci.UUCP (Dan Schlitt) (10/21/88)

In article <8630@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <1700@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
>>Here's some of the things I do:
>
>I have to take exception to most of these micro-efficiency tweaks.
>Most of them have no effect on the code generated by even a moderately
>good compiler, and nearly all of them make the code more obscure.
>It is much more important for the source code to be patently correct
>than for every surplus nanosecond to be squeezed out.  Fast, buggy
>code does nobody a service, and bugs are hard to spot when the source
>code is obscure.  Besides, most efficiency losses are from poor
>choice of data structures or algorithms, not from fatty source code.

	[Much deleted]
Doug clears up much misguided stuff.

>
>>    o	Use temporaries to eliminate all common subexpressions.
>
>That can actually interfere with a good optimizer.  Use
>temporaries when you suspect that many optimizers will not
>take care of the common subexpression for you; otherwise
>if it is not in a bottleneck section of code, leave it alone.

The first language I coded in was SOAP II, and I soon moved on to early
versions of Fortran.  Compilers in those days were dumb, and hand
optimization in the source code and the assembler output was the
thing of the day.  This was undoubtedly encouraged by youthful visions of
infallibility.  Also, nobody thought seriously about hauling their
programs off to some other kind of computer.

Compilers are much better now.  Age has encouraged a modesty which
allows that the compiler may well know how to do things better than I
do.  That is certainly the case when I take my code off to some
unfamiliar computer.

I have concluded, and I think this is what Doug is saying, that one
should not code in a way that will confuse either the reader of the
code or the compiler.

If your compiler doesn't do a good job of optimization then your money
probably is spent better on a better compiler than it is on the time
of coders diddling with the code.

But that is just the opinion of some old codger and probably should be
ignored.

-- 
Dan Schlitt                        Manager, Science Division Computer Facility
dan@ccnysci                        City College of New York
dan@ccnysci.bitnet                 New York, NY 10031
                                   (212)690-6868

mcdonald@uxe.cso.uiuc.edu (10/23/88)

>Besides, most efficiency losses are from poor
>choice of data structures or algorithms, not from fatty source code.
Agreed.... But.....


>Compilers are much better now.  Age has encouraged a modesty which
>allows that the compiler may well know how to do things better than I
>do.  That is certainly the case when I take my code off to some
>unfamiliar computer.

>I have concluded, and I think this is what Doug<Gwyn - not me> is saying, 
>that one
>should not code in a way that will confuse either the reader of the
>code or the compiler.

>If your compiler doesn't do a good job of optimization then your money
>probably is spent better on a better compiler than it is on the time
>of coders diddling with the code.
Yes.... But.....

Sometimes the better compiler doesn't yet exist. An example is for
the 80386. So far no one is selling a 386 compiler that produces
code for MS-DOS (NOT, I should say, for 386 native mode, but one that
allows using 32 bit instructions in REAL mode - yes, you really can
do that!). By having my C compiler emit .asm output and replacing
5 lines of C with real 32 bit instructions I was able to speed up
a program by an astounding 40%! That is worth doing.

Microsoft - where is that /G3 compiler switch - it is long overdue!

Doug McDonald

yaping@eleazar.dartmouth.edu (11/03/88)

	[This article was written by Scott Horne, not Yaping Xu.
	Direct all e-mail responses to me at jalphin@prism.clemson.edu.]

In article <10332@s.ms.uky.edu>, aash@ms.uky.edu ( Aashi Deacon ) writes:
>>
>> ... '.1' cannot be represented exactly as a floating
>> point number because in base2 it is irrational.  

Irrational?  Clearly not.

				--Scott Horne