U23405@UICVM (Michael J. Steiner) (09/11/88)
First, I have a question. How and why is array indexing slower than pointer arithmetic? They are very similar. Also, I think that compilers should automatically translate array indexing into pointer arithmetic when dealing with arrays in C. Any comments, opinions?

Michael Steiner
Email: U23405@UICVM.BITNET
rwberry@hubcap.UUCP (Robert W Berry) (09/20/88)
In article <8809191521.AA17824@ucbvax.Berkeley.EDU>, U23405@UICVM (Michael J. Steiner) writes:
> First, I have a question. How and why is array indexing slower than
> pointer arithmetic? They are very similar. Also, I think that compilers
> should automatically translate array indexing into pointer arithmetic when
> dealing with arrays in C. Any comments, opinions?
>
> Michael Steiner
> Email: U23405@UICVM.BITNET

Indexing requires that an offset be added to the base address of the array for each and every reference. This overhead is the number one reason that array indexing is slower than pointer manipulation. (Today's C compilers often treat the beginning of the array as a constant and add the offset from a register, which is ideal for sequential memory accesses.) Pointers, on the other hand, directly address the object in question. More efficient, and much faster.

With today's optimizing compilers this is often a moot point. Many compilers end up implementing both array indexing and pointer manipulation using indirect addressing (adding the value stored in one register to the value stored in another register to calculate an effective address), especially in a segmented memory architecture. The end result is that both techniques end up running in about the same time. Be wary, however, because there are also a lot of compilers on the market that use different techniques for array indexing and pointer manipulation. So when in doubt, use pointers. They're _almost_ always faster (and closer to the way God intended people to program ;-)!)

One good way to get a feel for this type of situation is to have your C compiler compile some source stubs into assembler (OH NO, ANYTHING BUT ASSEMBLER) and watch the way things REALLY happen with your C compiler. Even if you're not an assembler expert, some things become pretty obvious when you see them at the machine level.

Happy Programming,
Bob
rwberry@hubcap.UUCP (Robert W Berry) (09/20/88)
The doggone news-poster didn't tack on my address, and it also messed up my node name. I'm not an IDIOT, the news poster is!!

Bob
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-=- Bob Berry -=- PC-Guru's Inc.     ! Compuserve:72646,3331 or 73170,1242 -=-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BITNET:rwberry@clemson               -=-
-=- This space for rent or lease     ! INTERNET:rwberry@hubcap.clemson.edu -=-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
g-rh@xait.CCA.COM (Richard Harter) (09/20/88)
In article <8809191521.AA17824@ucbvax.Berkeley.EDU> U23405@UICVM (Michael J. Steiner) writes:
>First, I have a question. How and why is array indexing slower than
>pointer arithmetic? They are very similar. Also, I think that compilers
>should automatically translate array indexing into pointer arithmetic when
>dealing with arrays in C. Any comments, opinions?

Depends on the compiler, the machine, and the circumstances. The gain comes when you increment a pointer into the array rather than reference the indexed array. For example

	for (i=0;i<n;i++) {
	  a batch of code referring to a[i]
	  }

versus

	tmp = a;
	for (i=0;i<n;i++,tmp++) {
	  same code referring to *tmp
	  }

In the first instance a[i] is equivalent to *(a+i) and most compilers will treat them much the same. The add has to be done each time a[i] is referred to. In the second case there is no add, but there is an increment once through. A smart compiler will take the first instance and generate something like

	tmp = a;
	for (i=0;i<n;i++) {
	  tmp1 = *tmp++;
	  same code referring to tmp1
	  }

If execution efficiency is your concern and your environment includes simple compilers, there is profit to be got by doing your own optimization as above. The down side is that your hand optimization may interfere with the clever tricks of an optimizing compiler, and you lose ground by doing hand optimization. There ain't no justice.
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
swilson%thetone@Sun.COM (Scott Wilson) (09/20/88)
In article <8809191521.AA17824@ucbvax.Berkeley.EDU> U23405@UICVM (Michael J. Steiner) writes:
>First, I have a question. How and why is array indexing slower than
>pointer arithmetic? They are very similar. Also, I think that compilers
>should automatically translate array indexing into pointer arithmetic when
>dealing with arrays in C. Any comments, opinions?

In K&R pg. 186 it says: "E1[E2] is identical (by definition) to *((E1)+(E2))." So array indexing is exactly the same as pointer arithmetic. What you might be hearing about is the efficiency of using array indexing in loops versus using pointers. For example:

	int i, sum = 0, array[100];

	for (i = 0; i < 100; i++)
		sum += array[i];

is less efficient than:

	int i, sum = 0, array[100], *ip = array;

	for (i = 0; i < 100; i++, ip++)
		sum += *ip;

In the first case, using array[i] results in a shift and an add to compute the address of the array element. (If the array element size were not a power of two, a multiplication would be needed.) In the second we are just bumping a pointer up each time through the loop. There is more that can be done to the second example as well (why increment two variables side by side?).
--
Scott Wilson		arpa: swilson@sun.com
Sun Microsystems	uucp: ...!sun!swilson
Mt. View, CA
jlg@lanl.gov (Jim Giles) (09/20/88)
From article <33488@xait.CCA.COM>, by g-rh@xait.CCA.COM (Richard Harter):
> A smart compiler will take the first instance and generate something like
>
>	tmp = a;
>	for (i=0;i<n;i++) {
>	  tmp1 = *tmp++;
>	  same code referring to tmp1
>	  }

A smarter compiler will generate code more like this:

	ttt = a + n;
	for (tmp = a; tmp < ttt; tmp++) {
	  same code referring to *tmp
	  }

Most Fortran compilers already make optimizations like this.

J. Giles
Los Alamos
scs@athena.mit.edu (Steve Summit) (09/20/88)
WARNING: Heretical C advice follows. Efficiency hackers please read no further.

In article <8809191521.AA17824@ucbvax.Berkeley.EDU> Michael J. Steiner writes:
>How and why is array indexing slower than pointer arithmetic?
>I think that compilers should automatically translate array indexing
>into pointer arithmetic when dealing with arrays in C.

Array indexing may be slower, but it may also be easier (for human readers of the program) to understand. If there is the slightest chance that using array indexing instead of pointers will make your code clearer, I'd like to encourage you to stick with the indices, unless your program ends up running too slowly, and profiling shows the array indices to be at fault. Chances are it won't run noticeably slower, in part because there are good compilers which, in the spirit of your suggestion, effectively use a pointer reference in the generated code.

This is not to say that array indexing is always clearer than pointers--often, a well-conceived pointer algorithm is the right one to use for both clarity and efficiency. It's a judgement call. But some code is just ridiculously clearer to the uninitiated if it uses indices rather than pointers, and at no cost in overall efficiency. A perfect example is argv parsing; compare

	while(--argc)
		if(**++argv == '-')
			processoption(*argv);
		else
			processfile(*argv);

with

	for(i = 1; i < argc; i++)
		if(argv[i][0] == '-')
			processoption(argv[i]);
		else
			processfile(argv[i]);

The second example may use an extra variable, but it's a lot more obvious that argv[0] is skipped (as it should be) and it has the additional benefit of leaving the initial values of argc and argv undisturbed.

Steve Summit
scs@adam.pika.mit.edu
cik@l.cc.purdue.edu (Herman Rubin) (09/20/88)
In article <68995@sun.uucp>, swilson%thetone@Sun.COM (Scott Wilson) writes:
> In article <8809191521.AA17824@ucbvax.Berkeley.EDU> U23405@UICVM (Michael J. Steiner) writes:
> >First, I have a question. How and why is array indexing slower than
> >pointer arithmetic? They are very similar. Also, I think that compilers
> >should automatically translate array indexing into pointer arithmetic when
> >dealing with arrays in C. Any comments, opinions?

This has appeared before, but maybe it bears repeating. It depends on the architecture and on where the pointers are located (registers vs. memory). Consider copying an array. If the pointers are in registers and reading and writing via *(j++) is supported in hardware (autoincrement), pointers will beat indexing. This is the case on the VAX, but not on the PYRAMID. If, on the other hand, the array bases and the index are in registers, and register indexing is _hardware_, array indexing will beat pointers if the previous situation does not hold. Reverse the machines above.

The answer depends on the hardware, the storage classes, the timing, and, for machines which have instruction overlap, on the overlaps available. An attempt to force the issue is WRONG.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
harish@ece-csc.UUCP (Harish Hiriyannaiah) (09/21/88)
In article <68995@sun.uucp> swilson@sun.UUCP (Scott Wilson) writes:
->dealing with arrays in C. Any comments, opinions?
-
-In K&R pg. 186 it says: "E1[E2] is identical (by definition) to
-*((E1)+(E2))." So array indexing is exactly the same as pointer
-arithmetic. What you might be hearing is about the efficiency of using
-array indexing in loops versus using pointers. For example:
-
- int i, sum = 0, array[100];
-
- for (i = 0; i < 100; i++)
- sum += array[i];
-
-is less efficient than:
-
- int i, sum = 0, array[100], *ip = array;
-
- for (i = 0; i < 100; i++, ip++)
- sum += *ip;
-
-In the first case, using array[i] results in a shift and an add
-to compute the address of the array element. (If the array element
-size was not a power of two then a multiplication would be needed.)
-In the second we are just bumping a pointer up each time through
-the loop. There is more that can be done to the second example
-as well (why increment two vars side-by-side).
All this is really fine if:

1. The processor has autoincrement addressing (any decent processor
   should have it).
2. The compiler generates the code for the same.

#2 does not always happen. So it is best to see the assembler output
before deciding on the indexing strategy. Also, in some processors,
indexing will mean concatenation of the index to the higher-order bits
of the address. In this case, indexing might turn out to be faster
than indirect addressing.
--
harish pu. hi. harish@ece-csc.ncsu.edu
{decvax,possibly other backbone sites}!mcnc!ece-csc!harish
I am not, therefore I think not ?
smryan@garth.UUCP (Steven Ryan) (09/21/88)
>	for (i=0;i<n;i++) {
>	  a batch of code referring to a[i]
>	  }
>versus
>	tmp = a;
>	for (i=0;i<n;i++,tmp++) {
>	  same code referring to *tmp
>	  }

Just to muddy the waters. Most cpus allow address expressions like constant_offset+base_register, where base registers are a scarce resource. If, in the first case, the loop refers to a[i], b[i], ..., the compiler can sometimes generate better code by putting only i in a register and leaving a_base_address, b_base_address, ... as constant offsets. Making a separate pointer for each array can scrunch the register file.

Just a reminder that optimisation is not a trivial process.
braner@batcomputer.tn.cornell.edu (Moshe Braner) (09/24/88)
[]

I was surprised by the results of some experiments I ran on a Gould box and on the Inmos Transputer. In both cases the speed of bumping pointers turned out to be similar to, or even slower than, using array indexing. This is not too surprising for a[i], but it even holds for a[i][j], where the indexing involves a _multiplication_!!! (I was comparing a[i][j] both with double indirection, i.e. p[i][j] where p is a pointer to an array of pointers to the rows of the matrix, and with *p++ where p is a pointer to an element inside a row.) It turns out that while on some machines (e.g., the 68000) multiplication is an order of magnitude slower than addition or shifts, on some other machines (e.g., the Transputer) multiplication is just as fast as an addition! (The Transputer is _really_ weird: 32*32=64 bits is (slightly) _faster_ than 32*32=32 bits... -- they must have the hardware calculate 64 bits in either case, and very efficiently too!)

- Moshe Braner

Cornell Theory Center, 265 Olin Hall, Cornell University, Ithaca, NY 14853
(607) 255-9401 (Work)
<braner@tcgould.tn.cornell.edu> (INTERNET)
<braner@crnlthry> or <braner@crnlcam> (BITNET)
news@amdcad.AMD.COM (Network News) (09/24/88)
In article <6396@batcomputer.tn.cornell.edu> braner@tcgould.tn.cornell.edu (Moshe Braner) writes:
| []
|
| I was surprised by the results of some experiments I ran on a Gould box
| and on the Inmos Transputer. In both cases the speed of bumping pointers
| turned out to be similar to, or even slower than, using array indexing.
| This is not too surprising for a[i], but even holds for a[i][j], where
| the indexing involves a _multiplication_!!! (I was comparing indexing
| with double-indirection, i.e. p[i][j] where p is a pointer to an array
| of pointers to rows of the matrix, and also with *p++ where p is a
| pointer to an element inside a row.) It turns out that while on
| some machines (e.g., 68000) multiplication is an order of magnitude
| slower than addition or shifts, on some other machines (e.g., the
| transputer) multiplication is just as fast as an addition!

I don't think so. Rather, it is probable that the "multiplication" is optimized via strength reduction into a series of shifts and adds. If this series is small, it may be faster than a load (indirection), which would be the reason why array indexing may be faster than two-level indirection in this case.

-- Tim Olson
Advanced Micro Devices
(tim@crackle.amd.com)
mec@ardent.UUCP (Michael Chastain) (09/24/88)
Good grief. Say NO to low-level performance tweaking.

I prefer to use the most natural data representation I can think of. If something's an array, I don't dink around inside it with pointers. More precisely, I restrict my pointer operations to:

	-- assignment/parameter passing/return value
	-- dereferencing
	-- address-of: p2 = &p1[5];

As another poster mentioned:

	/* Array */
	for ( iArg = 1; iArg < argc; iArg++ )
		ProcessArg( argv[iArg] );

	/* Pointer */
	while ( --argc )
		ProcessArg( *++argv );

I find the first version more writeable. More debuggable. More readable. And more likely to be correct *the first time*. If you had to debug this code with a typical sdb-like debugger, which would you rather look at? "iArg" or "argv"? I find wild array references easier to deal with than wild pointer references: less likely to occur, easier to debug/find when they do. I find all of these properties more useful than the small and doubtful performance gain that I can get by *increasing the entropy* of my code and data.

Suppose you told your boss: "my next project will be:
	-- 20% sooner
	-- 20% cleaner
	-- 30% less bugs
	-- 5% larger
	-- 5% slower (in non-real-time-critical areas)"
What do you think he or she would say?

#if FLAME
The last word for you efficiency geeks:

#if BIG_COMPUTER
My array code vectorizes more easily than your pointers-that-might-be-aliased code. So while you're saving a cycle here, a cycle there, I'm getting 400% performance improvements just by recompiling.
#endif

#if INTEL_8086
While your object code is thrashing via ES:BX, my object code is loading ES:BX with a fixed value and loading array indexes into SI and DI. And if you point a normal 32-bit pointer into global data or a stack, while I'm using an array index into a (sufficiently well-known) structure, my code is likely to get 16-bit address computations off of DS or SS:BP. Your code will thrash through ES:BX.
#endif
#endif FLAME

Michael Chastain
mec@ardent.com
{hplabs,uunet}!ardent!mec
"He who dies with the most FRIENDS wins."
psrc@poseidon.UUCP (Paul S. R. Chisholm) (09/27/88)
In article <8809191521.AA17824@ucbvax.Berkeley.EDU> U23405@UICVM (Michael J. Steiner) writes:
>First, I have a question. How and why is array indexing slower than
>pointer arithmetic? They are very similar. Also, I think that compilers
>should automatically translate array indexing into pointer arithmetic when
>dealing with arrays in C. Any comments, opinions?

I waited a long time to respond to this one, because I expected everyone to post the obvious answer. Yes, a[i] is equivalent to *(a+i), *in* *C*; but what about in machine language? Y'know, the stuff the CPU really eats?

"The definition of 'adding 1 to a pointer', and by extension, all pointer arithmetic, is that the increment is scaled by the size in storage of the object pointed to." [K&R 1st ed., p. 94]

This implies multiplying the pointer offset or array index by the size. Granted, the size is known, is one for chars, and is often a power of two; but what about structures that are seventeen bytes long?

So, let's consider the (*un*optimized) translation of the following loops into a pseudo-C assembler language, where pointer arithmetic is scaled by the smallest addressable cell (probably a byte):

C:
	for ( i = 0; i < N; ++i )
		a[ i ] = b[ i ];	/* a and b are of type t */
					/* or equivalently, *(a+i) = *(b+i) */

machine language:
	i = 0
    begin_for:
	if ( ! ( i < N ) ) goto end_for
	tmp = i * sizeof t
	*( a + tmp ) = *( b + tmp )	; probably a single instruction
	i += 1
	goto begin_for
    end_for:

TOTAL: N+1 integer assignments, N increments, N copies; N multiplies!

C:
	for ( p = a, q = b, i = 0; i < N; ++i )
		*p++ = *q++;

machine language:
	p = a
	q = b
	i = 0
    begin_for:
	if ( ! ( i < N ) ) goto end_for
	*p = *q		; on a PDP-11, these next three lines
	p += sizeof t	; can be replaced by a single instruction
	q += sizeof t	; ditto VAX, 680x0, etc.
	i += 1
	goto begin_for
    end_for:

TOTAL: 3 integer assignments, 3*N increments, N copies; 0 multiplies!
We've traded N multiplies and N assignments for 2*N increments, which can easily (on a PDP-11 or similar CPU) be snuck into auto-increment modes of the copy instruction. Yes, an optimizer would reduce the first to the second, or better. But one of Ritchie's goals was to make it easy for the compiler to generate good low-level code from clever high-level code.

Moral: A really good C programmer should know assembler language and compiler technology to write ordinary applications efficiently.

Homework: Discover what kind of code your compiler generates for *p++ = *q++ when pointing to a char, int, or odd-length structure. If p and q are auto, not register, is *p++ = *q++ the same as:

	*(++p - 1) = *(++q - 1)		/* C, not assembler */

(that is to say, very ugly code), or possibly equivalent to:

	tmp = *p = *q, ++p, ++q, tmp

Paul S. R. Chisholm, psrc@poseidon.att.com (formerly psc@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.
marc@metavax.UUCP (Marc Paige) (09/28/88)
I just finished reading someone's rendition of C to MACHINE language! UGHN!!

MACHINE language looks something like this (for a 32-bit word):

10001110111110011011011000001111
10010010111011011000110110011101
10100011001111011011100001100111
.
.
.

Assembler may look something like what this person wrote, but only with heavy macro use!! Please get the terminology straight.
----------------------------------------------------------------------------
"tired and shagged out from a prolonged squawk" - mpfc the parrot sketch
swilson%thetone@Sun.COM (Scott Wilson) (09/29/88)
In article <3628@metavax.UUCP> marc@metavax.UUCP (Marc Paige) writes:
>I just finished reading someones rendition of C to MACHINE language! UGHN!!
>
>MACHINE language looks something like this (for a 32 bit word):
>
>10001110111110011011011000001111
>10010010111011011000110110011101
>10100011001111011011100001100111

All right, I'll be a weenie. Unless you can see electrons (or holes or whatever the hell is going on down there), machine language doesn't look like anything. At best it's a bit pattern or a collection of electrical characteristics, hardly a language. Your description of it as 1010... is just an abstraction. But I know what you were trying to say, so I'll go back to talking to myself now...
--
Scott Wilson		arpa: swilson@sun.com
Sun Microsystems	uucp: ...!sun!swilson
Mt. View, CA
bill@proxftl.UUCP (T. William Wells) (09/29/88)
In article <70779@sun.uucp> swilson@sun.UUCP (Scott Wilson) writes:
: All right, I'll be a weenie. Unless you can see electrons (or holes or
: whatever the hell is going on down there), machine language doesn't
: look like anything. At best it's a bit pattern or a collection of
: electrical characteristics, hardly a language. Your description of
: it as 1010... is just an abstraction. But I know what you were trying
: to say so I'll go back to talking to myself now...
All written language is "just an abstraction".
In other words, "machine language" is an abstraction of the
patterns of electrons or what have you.
But I know that you know that I know that he knows what
you were trying to say... :-)
---
Bill
You can still reach me at proxftl!bill
But I'd rather you send to proxftl!twwells!bill
bill@proxftl.UUCP (T. William Wells) (09/30/88)
In article <607@ardent.UUCP> mec@ardent.UUCP (Michael Chastain) writes:
: Good grief. Say NO to low-level performance tweaking.
Good grief. Say NO to understanding the language well enough
that "low-level performance tweaking" is natural to the
programmer. Grrr. Note that the quotes here are because I
disagree that the programming practices being condemned
constitute "low-level performance tweaking".
: I prefer to use the most natural data representation I can think of.
: If something's an array, I don't dink around inside it with pointers.
This is completely irrelevant: one man's "natural" is another's
perversion. The claim that such is "natural" is only a statement
of a personal preference. My own preference is to think in terms
of pointers exclusively; and I tend not to use array indexing.
Note that I do not advocate using pointers for this reason, I do
so for a completely different one, which I will state shortly.
: If you had to debug this code with a typical sdb-like debugger, which
: would you rather look at? "iArg" or "argv"? I find wild array
: references easier to deal with than wild pointer references: less
: likely to occur, easier to debug/find when they do. I find all of
: these properties more useful than the small and doubtful performance
But *I* don't. Personal preferences aren't natural law.
: gain that I can get by *increasing the entropy* of my code and data.
This asserts that the use of pointers increases the disorder
(that is what entropy is about, after all) in one's code and
data; this is entirely a subjective judgement.
---
At one time, I spent a lot of my spare time improving other
people's code. One of the things I discovered is that almost all
the code I was dealing with had better than 25% execution time
fat. What I mean is that when I merely changed all of the code
to follow my coding practices (which are not anything special,
just obvious things that every coder ought to know), it got that
much better; I am not talking about mere tweaks, nor am I talking
about optimizing particular sections of code. I am talking about
reading each line of code and making obvious changes (things like
using temporaries when needed, adding appropriate register
declarations, eliminating obviously unnecessary function calls,
etc.)
Supposing that the 90-10 rule actually applies (which
experience shows to be something of an exaggeration, and
utterly irrelevant to a lot of well written code), I might
take the same code and optimize that hot 10% as much as
possible. Suppose that doubles its speed, halving the 90%
of the execution time spent in it. Now, how much faster
would this unrealistic example get? 45% (the total time
drops to 10% + 45% = 55% of the original).
Now, let us compare these: if the programmer had trained himself
to code my way, he could have written the program to be 25%
faster *with no additional effort*. Or, he could have it 45%
better doing it the original way, after all the additional effort
to optimize that 10% (if we are being optimistic, that is).
If he were to spend that extra effort on the code that follows my
practices, the resultant code would be 59% faster.
The point of all this is that learning coding practices that
result in writing reasonably fast code, and applying them, can
result in code that is as good or better (when measured by
execution time) as code written to standards that ignore
execution speed which is then optimized as necessary. And having
learned to do this, one does not have to spend extra effort to
make faster code, it comes "naturally".
What I am calling for is an understanding of what kind of coding
practices make for good code, and the inclusion of execution
speed in the definition of "good code".
---
Now, as I said, I will justify why *I* use pointers almost to the
exclusion of arrays.
Most of the code I write in which efficiency is a concern falls
into one of two categories: code that takes a long time to run,
but will be run on a particular kind of machine, and code that
takes a fairly short time to run, but will get run on damn near
every kind of machine that is capable of running it at all.
For the code that takes a long time to run, it happens that it
will be run on a more-or-less conventional architecture: this
means that pointer references are generally going to be faster
than indexing.
For the others, what happens is that the unconventional
architectures also run the programs very quickly: the execution
time does not matter for these. However, for a certain class of
conventional processors, the execution time is of great
importance: if the code is not fast enough on an 8086 (or
equivalent), it just isn't useful. For most of these slower
processors, indexing is much slower than using pointers.
Now, for the last case, the reason is compelling: I *must* use
pointers instead of indexing as the code will not be fast
enough. And, at least for my code, the 90-10 rule is pure BS:
rarely does *any* subsystem (never mind particular functions) get
more than 20% of the execution time.
---
The long and short of the above is that neither indexes nor
pointers are intrinsically better than the other; programming
being a utilitarian art, "better" depends on what you want to use
it for. And in C, given the equivalence between indexing and
pointer arithmetic, there can be no abstract reason for
preferring one above the other.
---
Bill
You can still reach me at proxftl!bill
But I'd rather you send to proxftl!twwells!bill
cik@l.cc.purdue.edu (Herman Rubin) (09/30/88)
In article <836@proxftl.UUCP>, bill@proxftl.UUCP (T. William Wells) writes:
> In article <607@ardent.UUCP> mec@ardent.UUCP (Michael Chastain) writes:
> : Good grief. Say NO to low-level performance tweaking.
>
> Good grief. Say NO to understanding the language well enough
> that "low-level performance tweaking" is natural to the
> programmer. Grrr. Note that the quotes here are because I
> disagree that the programming practices being condemned
> constitute "low-level performance tweaking".

Good grief. Say NO to understanding the MACHINE well enough that "low-level performance tweaking" is natural to the programmer. Grrr. Note that the quotes here are because I disagree that the programming practices being condemned constitute "low-level performance tweaking".

I frequently have less difficulty thinking in terms of the machine structure than the arbitrary obfuscated language constructs. The gains are frequently greater at this level. If one thinks this way, it is not at all difficult to find situations in which factors of 10 arise.

If I have so little difficulty in understanding machine operations for many different machines, why should it be considered so overwhelmingly difficult for others to be able to read and maintain code written with those considerations in mind?
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
rwberry@hubcap.UUCP (Robert W Berry) (09/30/88)
From article <836@proxftl.UUCP>, by bill@proxftl.UUCP (T. William Wells):
> In article <607@ardent.UUCP> mec@ardent.UUCP (Michael Chastain) writes:
> : Good grief. Say NO to low-level performance tweaking.
> ...
> At one time, I spent a lot of my spare time improving other
> people's code. One of the things I discovered is that almost all
> the code I was dealing with had better than 25% execution time
> fat. What I mean is that when I merely changed all of the code
> to follow my coding practices (which are not anything special,
> just obvious things that every coder ought to know), it got that
> ...
> Now, let us compare these: if the programmer had trained himself
> to code my way, he could have written the program to be 25%
> faster *with no additional effort*. Or, he could have it 45%
> better doing it the original way, after all the additional effort
> to optimize that 10% (if we are being optimistic, that is).
>
> If he were to spend that extra effort on the code that follows my
> practices, the resultant code would be 59% faster.
>
> The point of all this is that learning coding practices that

I've been programming for YEARS, and if it's not a professional trade secret, I'd love it if you would summarize your "coding practices" and email them or post them. If you can get half the added efficiency you're talking about, the practices would be of interest to the whole computing world. Like you, I spend a great deal of my time optimizing code for other people, and I have yet to see more than a 20-25% improvement (even over BAD code).

*** THIS IS NOT A FLAME OF ANY KIND ***

I would sincerely like to see these practices. They might give me a new outlook on coding.

BTW, I believe that the 90-10 proposition includes OS overhead, and thus it becomes a much more possible figure. (I.e., it is really the OS that forms most of that 10%; depending on the complexity of your OS this may often be close to the case.)
I do agree with your views on personal preference. Some people just THINK in terms of pointers. For these people, writing code with array indexing would be nearly as confusing as it would be for a person who thinks in array indexing trying to write code with pointers.

Looking forward to hearing more,
Bob
mouse@mcgill-vision.UUCP (der Mouse) (10/01/88)
In article <607@ardent.UUCP>, mec@ardent.UUCP (Michael Chastain) writes:
> /* Array */
> for ( iArg = 1; iArg < argc; iArg++ )
>     ProcessArg( argv[iArg] );
>
> /* Pointer */
> while ( --argc )
>     ProcessArg( *++argv );
>
> I find the first version more writeable. More debuggable. More
> readable. And more likely to be correct *the first time*.

Your first two words are the most telling: "I find".

> If you had to debug this code with a typical sdb-like debugger, which
> would you rather look at? "iArg" or "argv"?

I would generally shut off the symbolic nonsense and look at offsets from the argument pointer. I don't trust compilers to produce code corresponding to what I wrote any longer. But I would rather look at the pointer version, if you really want to know.

> Suppose you told your boss: "my next project will be:
> -- 20% sooner
> -- 20% cleaner
> -- 30% less bugs
> -- 5% larger
> -- 5% slower (in non-real-time-critical areas)"
> What do you think he or she would say?

He probably wouldn't care. Next question? :-)

But also, I fail to see how forcing myself to code in a style I find unnatural[*] is going to bring any of the improvements from the above list. It most certainly won't bring the first two.

[*] Not to imply a value judgement; just because it's not natural for me doesn't mean a thing about anyone else. This also is not to say that I don't use [] notation. I do, sometimes. De gustibus non est disputandum, or whatever it's supposed to be (my knowledge of Latin is right down there with my knowledge of COBOL).

> #if FLAME
> The last word for you efficiency geeks:
> #if BIG_COMPUTER
> My array code vectorizes more easily than your
> pointers-that-might-be-aliased code. So while you're saving a cycle
> here, a cycle there, I'm getting 400% performance improvements just
> by recompiling.
> #endif

Only if your compiler supports noalias or a moral equivalent, to allow you to tell it that your arrays aren't aliased either. And in that case, I can tell it my pointers aren't aliased....

> #if INTEL_8086
> ....a normal 32-bit pointer....

I was under the impression that a "normal" pointer on the 8086 was 16 bits. Of course, I don't stoop that low very often[%], so I can't claim great familiarity with it.

[%] I *do* mean to imply a value judgement here. I think the 8086 is a horrible machine to have to program.

> #endif FLAME

der Mouse

old: mcgill-vision!mouse
new: mouse@larry.mcrcim.mcgill.edu
mcdonald@uxe.cso.uiuc.edu (10/01/88)
> I've been programming for YEARS, and if it's not a professional trade >secret, I'd love it if you would summarize your "coding practices" and >email them or post them. If you can get half the added efficiency >you're talking about, the programming practices would be of interest to the >whole computing world. Like you I spend a great deal of my time >optimizing code for other people, and I have yet to see more than a >20-25% improvment (even over BAD code.) > *** THIS IS NOT A FLAME OF ANY KIND *** >I would sincerely like to see these practices. They might give me a new >outlook on coding. I think that 25% or even 60% improvement is not a bad guess. This is only an average, though, and includes, in my experience, some cases of truly great gains, that is, 30000% improvement (300 times faster). I'll be happy to share my experiences (I have my flameproof suit on, wear yours). 1. Don't use a bad algorithm. I have seen programs with slow Fourier transforms in them, a factor of 10 to 50 too slow. The NCAR graphics package included a 3d surface contour plotter that drew with a pen plotter in such a way that it picked up the pen after EVERY line segment, even if the next one started where the last one left off (also present in Mathcad) (factor of 8). The latest version (3.9p) of MicroEmacs searches 30 times faster than the previous one by an improved algorithm (though the old one suffered from another problem also.) 2. Don't use a general purpose routine to do a very specific, simple thing. The classic case in C is writing char ss[80]; ss[0] = 'a'; ss[1] = '\0'; printf("%s",ss); instead of putchar('a'); ... note that I'm not talking about the first one happening occasionally in a loop. -2. Don't reinvent a slow wheel if a fast one is lying around for the taking. (Illustrated by the recent thread on Duff's device, where several folks discovered memcpy to be faster. Save hacks like that for where they are NEEDED - like writing to IO space, where memcpy might not work.) 3. 
Don't use a complicated canned system to push square pegs in round holes. This is where I have found my factors of 300. Graphics systems like GKS and Phigs are the classic case, as are the graphics routines in Microsoft Windows (and, I'll bet a cup of coffee, OS/2), but not, for some odd reason, the Mac. Those things go through so many layers of code from your call to the screen that they are guaranteed to be slow. If you have a simple task (as most of mine are) like drawing a bunch of circles and ellipses and moving them around the screen fast, use a simple tool. Write your own tool to draw ellipses direct to the screen - direct to the hardware. My routines beat Microsoft Windows by a factor of 10 and IBM's GKS by 100 to 1000 times (that's right, 1000 times faster). Save the big clumsy packages for big clumsy tasks - if your calculations take 10,000,000 lines, they are probably just fine. 4. Don't use nested subroutines to do lots of simple things inside lots of simple things. Write the damn stuff out, even if it makes more code. The classic case here is IBM's PC video bios. 5. Okay, we admit the rest is tweaking. 25% is a good goal. To get the fastest code, the first thing to do is convince yourself (and maybe your boss - luckily I don't have one) that you WANT to get it faster. Then understand the hardware and instruction set of the target computer (portable? what's that? We have already decided to go for the gold medal). Look at the compiler's .asm output. Try changing things like x*65 to (x<<6)+x. It might be faster, might not. If your code does indexing like for(i = 0; i < 100; i++)a[i] = i*6; change it to b = a; j = 0; for(i = 0; i < 100; i++, j += 6)*b++ = j; It might (or might not) make a difference. Honestly now - aren't all these things obvious? I would love to hear about non-obvious ones! Nevertheless, I have seen them missed more often than not. 6. When all else fails, cheat. Use assembly language. Look at the compiler output. 
Maybe the optimizer isn't perfect. Remove stores of never-used values. Play with instruction order. Maybe your processor can overlap floating point computations with fixed point ones - take advantage of this. Maybe you can assign all your variables in a loop to registers and get a big gain there (I got a 15% gain in one graphics routine by storing data in argument pointer and segment registers on the 8086). There are some cases where certain compilers can do a better job than you or me, mainly RISC machines with microscheduling compilers. I'd still like to see what the guy who designed the scheduler could eke out if he tried hard! Doug McDonald
charette@edsews.EDS.COM (Mark A. Charette) (10/02/88)
In article <836@proxftl.UUCP>, bill@proxftl.UUCP (T. William Wells) writes: > > At one time, I spent a lot of my spare time improving other > people's code. One of the things I discovered is that almost all > the code I was dealing with had better than 25% execution time > fat. What I mean is that when I merely changed all of the code > to follow my coding practices (which are not anything special, > just obvious things that every coder ought to know), it got that > much better; I am not talking about mere tweaks, nor am I talking > about optimizing particular sections of code. I am talking about > reading each line of code and making obvious changes (things like > using temporaries when needed, adding appropriate register > declarations, eliminating obviously unnecessary function calls, > etc.) > We have 6 million lines of PL/1 code we are porting to workstations, mostly well written and modular in form. Your methods were used in writing this code (it was written to take advantage of the IBM 370 architecture). The writers (many, many writers) were trained in making their algorithms fly while keeping the code within ANSI bounds. Lots of effort went into analyzing the object code generated for equivalent algorithms, with the winners (in execution speed) used for the final programs. The port looks like it's going to be sheer hell on a team of people for the next two or three years. All the effort taken during the last 12 years to make the code portable while conscientiously taking advantage of the underlying architecture has made optimization on a workstation a nightmare. However, if we limited the programs by keeping them on the mainframes, we would never reach the productivity gains that we hope to attain. All things in moderation. (Of course, 6,000,000 lines of code ain't moderate.) -- Mark Charette "On a clean disk you can seek forever" - Electronic Data Systems Thomas B. Steel Jr. 
750 Tower Drive Voice: (313)265-7006 FAX: (313)265-5770 Troy, MI 48007-7019 charette@edsews.eds.com uunet!edsews!charette
gwyn@smoke.ARPA (Doug Gwyn ) (10/03/88)
In article <517@poseidon.UUCP> psrc@poseidon.UUCP (Paul S. R. Chisholm) writes: >Moral: A really good C programmer should know assembler language and >compiler technology to write ordinary applications efficiently. I have to disagree with this. What is "efficient" for one implementation may not be efficient for another, and most real C applications end up being used with several different C implementations. Low-level tweaking should only be done when a specific implementation has unacceptable overhead in some "bottleneck" code, and only after the algorithmic aspects of the application have been made as good as possible. Jon Bentley's book on "Writing Efficient Programs" is recommended reading.
bright@Data-IO.COM (Walter Bright) (10/04/88)
In article <3105@hubcap.UUCP> rwberry@hubcap.UUCP (Robert W Berry) writes: >From article <836@proxftl.UUCP>, by bill@proxftl.UUCP (T. William Wells): >> In article <607@ardent.UUCP> mec@ardent.UUCP (Michael Chastain) writes: > I've been programming for YEARS, and if it's not a professional trade >secret, I'd love it if you would summarize your "coding practices" and >email them or post them. >Like you I spend a great deal of my time >optimizing code for other people, and I have yet to see more than a >20-25% improvment (even over BAD code.) Here's some of the things I do: o Organize complicated if statements so the result of the calculation is known by evaluating it as little as possible. For instance, if (a && b && c) ... organize such that if a is easy to evaluate and usually FALSE, put it first. o Replace stuff like, if (a) x = b; else x = c; with, x = a ? b : c; Many compilers generate better code for the latter. o Replace stuff like, x = (a == 0) ? 1 : 0; with, x = (a == 0); I know it's obvious, but I see it all the time! o Use the ^ operator because many times, a = !a; should really be, a ^= 1; o Avoid things like, a %= 4; Replace with, a &= 3; o In fact, avoid !, / and % like the plague. o Organize control flow such that the most common cases go straight through, this is most important for pipelined machines. o I gag when I see things like, a = strlen("asdf") + 1; instead of, a = sizeof("asdf"); The strlen case above was even printed in a magazine recently to 'prove' that assembler was better than C! o Combine printfs, printf("aksdf aksjdhf kahdf jhdsfhj\n"); printf(" asdkljfhkajshdf djfh kjahsdfkja h\n"); Convert to, printf("aksdf aksjdhf kahdf jhdsfhj\n\ asdkljfhkajshdf djfh kjahsdfkja h\n"); o Try to improve the 'locality' of operations, i.e. move calculations as close as possible to the point where they are used. This helps most compilers to utilize registers better. o Replace int variables with unsigned where possible. 
This tells the optimizer that the variable can never be negative, making certain optimizations possible. o Put the most frequently accessed member of a struct first, so the offset is 0. o Use char arrays instead of int arrays where possible. Adjust other arrays so that the size of the array elements are a power of 2. (Making the address calculation a simple shift.) o Avoid struct function parameters and return values. o Avoid bit fields, most especially signed ones. o Replace sequences of if..else if... with a switch. o Use realloc instead of malloc/free pairs. Use calloc instead of malloc followed by zeroing each member. o Think about replacing trivial functions with a macro. Floating point code offers more opportunities: o Replace floats with doubles (to avoid the conversion). o Try very hard to replace divides with other operations, as in: x / 10 with: x * .1 o Use functions like ldexp and frexp as much as possible. o See if some calculations can be done using integer math. o Use temporaries to eliminate all common subexpressions. o Reorganize expressions to make the best use of compile-time constant folding, since the compiler is not allowed to do that. And finally: Try looking at the assembler output for your favorite function. You might be surprised! Most of these are probably obvious to you. But I have seen all of the above repeatedly, and have even noticed it myself in going over my own code. Also, of course, all of the above are judgement calls, and they are not universally applicable.
aash@ms.uky.edu ( Aashi Deacon ) (10/04/88)
> o Try very hard to replace divides with other operations, as in: > x / 10 > with: > x * .1 > o See if some calculations can be done using integer math. Are these two points contradictory? Thanks for those good hints but I got stopped when I saw this first point. According to theory, '.1' cannot be represented exactly as a floating point number because its expansion in base 2 is nonterminating. Wouldn't then the first be better in this case? aash aash@ms.uky.edu
peter@ficc.uu.net (value) (10/05/88)
In article <225800077@uxe.cso.uiuc.edu>, mcdonald@uxe.cso.uiuc.edu writes: > If you have a simple task (as most of mine are) > like drawing a bunch of circles and ellipses and moving them around > the screen fast, use a simple tool. Write your own tool to draw ellipses > direct to the screen - direct to the hardware. Noooooooooooo! Don't go around a windowing system, particularly a multitasking windowing system like Microsoft Windows (or Amiga Intuition), unless you absolutely have to and are willing to emulate the layer protection for partially obscured bitmaps. I don't know about Windows, but I once tried going around Intuition -- in a place where it was safe to -- and found only a minimal improvement at the cost of much convenience. I would expect safe places in Windows to be even less common, since it doesn't have a 'screens' concept. Before you do anything like this, TRY IT THE OBVIOUS WAY. If it's too slow, try to cut down on your redraws. I was able to get adequate speed from the PC's BIOS just by doing some curses-type display optimisation. Then if it's still too slow you can go straight to the bitmaps. -- Peter da Silva `-_-' Ferranti International Controls Corporation. "Have you hugged U your wolf today?" peter@ficc.uu.net
gwyn@smoke.ARPA (Doug Gwyn ) (10/05/88)
In article <1700@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes: >Here's some of the things I do: I have to take exception to most of these micro-efficiency tweaks. Most of them have no effect on the code generated by even a moderately good compiler, and nearly all of them make the code more obscure. It is much more important for the source code to be patently correct than for every surplus nanosecond to be squeezed out. Fast, buggy code does nobody a service, and bugs are hard to spot when the source code is obscure. Besides, most efficiency losses are from poor choice of data structures or algorithms, not from fatty source code. > o Replace stuff like, > if (a) > x = b; > else > x = c; > with, > x = a ? b : c; I don't know of any compiler that wouldn't produce the same code for these two cases. Write whichever is more readable in context. > o Replace stuff like, > x = (a == 0) ? 1 : 0; > with, > x = (a == 0); Only if x is conceptually Boolean in the first place. > o Use the ^ operator because many times, > a = !a; > should really be, > a ^= 1; If a is conceptually Boolean, the first is clearer. > o Avoid things like, > a %= 4; > Replace with, > a &= 3; You're assuming a is nonnegative. Any decent compiler will perform such instruction replacements for you. Write whatever is clearest in context. To get a remainder, % is clearer. In the case that the "4" might change, parameterizing the first case will give correct code after it changes, while the second will break unless another power of two is used for the replacement for "4". Thus the first is SAFER. > o In fact, avoid !, / and % like the plague. It's pretty hard to divide without /. You should be sure that the divisor cannot be zero before flow reaches this point in the algorithm. (This does NOT mean to "stick in a test for division by 0".) Because % is not a true modulo operator, its usefulness is limited; however, it does have its uses. It's hard to imagine avoiding ! in readable code. 
> o I gag when I see things like, > a = strlen("asdf") + 1; > instead of, > a = sizeof("asdf"); The general principle is to compute at compile time whatever CAN be computed at compile time rather than at run time. But sometimes it is clearer to initialize things at run time (not in this example, though). > o Combine printfs, > printf("aksdf aksjdhf kahdf jhdsfhj\n"); > printf(" asdkljfhkajshdf djfh kjahsdfkja h\n"); > Convert to, > printf("aksdf aksjdhf kahdf jhdsfhj\n\ > asdkljfhkajshdf djfh kjahsdfkja h\n"); Thereby introducing a bug (exercise for the student). The difference in time between these is negligible, but if you're really tweaking for efficiency puts() might have been better, depending on the implementation. > o Try to improve the 'locality' of operations, i.e. move calculations > as close as possible to the point where they are used. This helps > most compilers to utilize registers better. Just so long as clarity of the algorithm is maintained. A related point is to declare local variables in blocks rather than all at the beginning of the function body. > o Replace int variables with unsigned where possible. This tells the > optimizer that the variable can never be negative, making certain > optimizations possible. It can also produce slower code; this depends on the implementation. > o Put the most frequently accessed member of a struct first, so the > offset is 0. Not all architectures can access the 0 offset faster than the others. I knew of one that was actually slower. > o Use char arrays instead of int arrays where possible. Char access on a word-oriented machine is likely to be noticeably slower. > o Avoid struct function parameters and return values. This is a matter of interface design. Small structs such as one might use to represent a complex number are not a problem; large structs are more quickly (though less conveniently) passed by reference, so long as not too many references to the members are made inside the function. 
Forcing the caller to allocate the structure may not be convenient. > o Avoid bit fields, most especially signed ones. I would say rather, use bit fields where they seem to be the natural mechanism, as in device register definitions. > o Replace sequences of if..else if... with a switch. These are not equivalent. Switch wins when there are numerous cases for different integer values of a variable, but not in the more general case of alternative conditions. > o Use realloc instead of malloc/free pairs. Use calloc instead of > malloc followed by zeroing each member. realloc() is not usually faster, and it makes the application bookkeeping harder. calloc() is almost worthless since it stores zero BYTES into the storage area. That may not be appropriate for data types other than the integral types, in particular pointers and floating-point data. > o Think about replacing trivial functions with a macro. Functions are usually better for debugging. If a function is a bottleneck, consider making it a macro, but be aware that macros are not as flexible or as safe as functions. > o Replace floats with doubles (to avoid the conversion). Use the size that best fits the problem, keeping in mind that the standard math library deals with doubles. Float can be faster; ANSI C does not require all floats to be promoted to doubles in expressions. Large arrays almost certainly should be float[] not double[]. > o Try very hard to replace divides with other operations, as in: > x / 10 > with: > x * .1 But this is less accurate, and not necessarily noticeably faster (it depends on the f.p. hardware). Some optimizers will do this for you. If it is more readable, use division when it is the natural way to express the computation; many robust algorithms will naturally not use division since they keep numbers in the range from say -2 to 2 and division by something near 0 could be problematic. > o Use functions like ldexp and frexp as much as possible. No, use them sparingly. 
Their main value is to pick apart a floating point number into integers which are easier to interchange among heterogeneous processors, for example in network connections. In normal application algorithms there is no value in using these functions. > o Use temporaries to eliminate all common subexpressions. That can actually interfere with a good optimizer. Use temporaries when you suspect that many optimizers will not take care of the common subexpression for you; otherwise if it is not in a bottleneck section of code, leave it alone.
henry@utzoo.uucp (Henry Spencer) (10/05/88)
In article <225800077@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes: >>... I have yet to see more than a >>20-25% improvment (even over BAD code.) As Knuth once said, approximately, "in no other branch of engineering is an easily-obtained 10% improvement in performance viewed with scorn". -- The meek can have the Earth; | Henry Spencer at U of Toronto Zoology the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu
bright@Data-IO.COM (Walter Bright) (10/06/88)
In article <10332@s.ms.uky.edu> aash@ms.uky.edu ( Aashi Deacon ) writes:
<< o Try very hard to replace divides with other operations, as in:
<< x / 10
<< with:
<< x * .1
<< o See if some calculations can be done using integer math.
<Are these two points contradictory?
Not if x is a double, which I implied but didn't state.
<According to theory, '.1' cannot be represented exactly as a floating
<point number because its expansion in base 2 is nonterminating. Wouldn't then the
<first be better in this case?
The result may differ in the last bit of the significand because
of this. However, in most cases, that is irrelevant and the
multiply will be significantly faster.
henry@utzoo.uucp (Henry Spencer) (10/06/88)
In article <1700@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes: > o Replace int variables with unsigned where possible. This tells the > optimizer that the variable can never be negative... This can be a mistake on some machines, and I'm not sure how it would help very often. > o ... Use calloc instead of > malloc followed by zeroing each member. Mistake -- the semantics are not the same. Calloc fills with zero bits; that is not necessarily the same thing as zero values, especially for floating-point and pointers. -- The meek can have the Earth; | Henry Spencer at U of Toronto Zoology the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu
bright@Data-IO.COM (Walter Bright) (10/06/88)
In article <8630@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: >I have to take exception to most of these micro-efficiency tweaks. For brevity, I've taken the liberty of paraphrasing your objections: >1. Optimizing the algorithm is more important than the low level code. >2. Low level tweaks are not portable. >3. Low level tweaks are less clear. >4. A good compiler should do this optimization for you. >5. On machine X with compiler Y this optimization will make things worse. All true. However, 1. After you've optimized the algorithm, then you go after the low level code. My article was about opportunities for tweaking low level code. 2. Frankly, I don't care about portability to the following machines: o 1's complement o NULL pointers aren't 0 bit patterns o 0 double values aren't 0 bit patterns I've never met one, and I'll worry about them if and when I have the misfortune of dealing with them. I worry about Vaxes, 68ks, 8086, 286, 386, and 32032. 3. A matter of opinion. I've met programmers who would go to extreme lengths to avoid using the ^ & | operators because they thought that bit operations were 'obscure'. 4. Most of us have to use compilers whose optimizations are less than perfect. In fact, there are hardly any global flow analysis compilers out there. 5. In writing code for machines A, B and C, I write to the lowest common denominator, and hand-tweak for the slowest machine. Micro-efficiency gets regularly denounced. Here are some counterarguments: o If your commercial product is slower than the competition, you won't get a chance to take advantage of the maintainability/ portability advantages because you'll be out of business. o The sum of all the tweaks can make the difference (on the PC) between using a small memory model and the large. This gives a lot of leverage to seemingly minor changes. o I derive a lot of personal satisfaction from creating 'lean and mean' code. 
o Run-time speed makes the difference between a user interface that's crisp and snappy and one that's sloppy and sluggish. o Customers want speed. I've been selling consumer software for years, and I know. Speed *sells*. Telling the customer to buy faster hardware doesn't work. o Streamlining code even in relatively unused sections of code is important, because it reduces the size of the executable. Smaller programs run faster because: . Less virtual memory paging . Might be able to use smaller memory model . Fewer overlays are necessary . More memory is available, so you can trade execution speed for more memory consumption in your data structures . They compile faster . They link faster . They load faster
ggs@ulysses.homer.nj.att.com (Griff Smith) (10/06/88)
In article <8630@smoke.ARPA|, gwyn@smoke.ARPA (Doug Gwyn ) writes: | In article <1700@dataio.Data-IO.COM| bright@dataio.Data-IO.COM (Walter Bright) writes: | |Here's some of the things I do: | | I have to take exception to most of these micro-efficiency tweaks. | Most of them have no effect on the code generated by even a moderately | good compiler, and nearly all of them make the code more obscure. Agreed, but a few might deserve some further comment... | > o Replace stuff like, | > if (a) | > x = b; | > else | > x = c; | > with, | > x = a ? b : c; | | I don't know of any compiler that wouldn't produce the same code | for these two cases. Write whichever is more readable in context. When given the second definition, C++ 2.0 (beta 4) produces the following error message if `x', `a' and `b' are of class `foo' and `c' is a function call that returns a `foo'. "xx.c", line 133: sorry, not implemented: temporary of class foo with destructor needed in ? expression | > o Combine printfs, | > printf("aksdf aksjdhf kahdf jhdsfhj\n"); | > printf(" asdkljfhkajshdf djfh kjahsdfkja h\n"); | > Convert to, | > printf("aksdf aksjdhf kahdf jhdsfhj\n\ | > asdkljfhkajshdf djfh kjahsdfkja h\n"); | | Thereby introducing a bug (exercise for the student). | The difference in time between these is negligible, | but if you're really tweaking for efficiency puts() | might have been better, depending on the implementation. Sequences of printf can be harder to read, however. The following example may show this better: (void) fprintf(stderr, "\ Usage: sltread [options] input-drive-name\n\ Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n\ [-d dataset-name] [-n dataset-number]\n\ [-i ignore-count] [-r record-count]\n"); I grant that this is vulnerable to code hackers and naive pretty-printers messing up the alignment. On the other hand, it is easier to see how everything lines up. 
-- Griff Smith AT&T (Bell Laboratories), Murray Hill Phone: 1-201-582-7736 UUCP: {most AT&T sites}!ulysses!ggs Internet: ggs@ulysses.att.com
ark@alice.UUCP (Andrew Koenig) (10/06/88)
In article <1703@dataio.Data-IO.COM>, bright@Data-IO.COM (Walter Bright) writes: > The result may differ in the last bit of the significand because > of this. However, in most cases, that is irrelevant and the > multiply will be significantly faster. If, however, you want your compiler to implement IEEE floating point, you had better not substitute a multiply by 0.1 for a divide by 10. IEEE defines the results of addition, subtraction, multiplication, division, square root, and a few other operations precisely -- down to the last bit. Either you get the right answer or you don't. It is true that in many cases users don't need the last bit or so, but a language implementer has an obligation to give it to the people who demand it. -- --Andrew Koenig ark@europa.att.com
diamond@csl.sony.JUNET (Norman Diamond) (10/06/88)
In article <10332@s.ms.uky.edu>, aash@ms.uky.edu ( Aashi Deacon ) writes: -> o Try very hard to replace divides with other operations, as in: -> x / 10 -> with: -> x * .1 > > ... '.1' cannot be represented exactly as a floating > point number because in base2 it is irrational. Wouldn't then the > first be better in this case? In C, efficiency is more important than correctness. Haven't you also been following the other disputes in this news group? -- ------------------------------------------------------------------------------- The above opinions are my own. | Norman Diamond If they're also your opinions, | Sony Computer Science Laboratory, Inc. you're infringing my copyright. | diamond%csl.sony.jp@relay.cs.net
steve@umigw.MIAMI.EDU (steve emmerson) (10/06/88)
In article <10699@ulysses.homer.nj.att.com> ggs@ulysses.homer.nj.att.com (Griff Smith) writes: > (void) fprintf(stderr, "\ >Usage: sltread [options] input-drive-name\n\ >Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n\ > [-d dataset-name] [-n dataset-number]\n\ > [-i ignore-count] [-r record-count]\n"); > >I grant that this is vulnerable to code hackers and naive >pretty-printers messing up the alignment. On the other hand, it is >easier to see how everything lines up. Nonsense. Use instead: #define EPUTS(msg) (void)fputs(msg, stderr) EPUTS("Usage: sltread [options] input-drive-name\n"); EPUTS("Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n"); EPUTS(" [-d dataset-name] [-n dataset-number]\n"); EPUTS(" [-i ignore-count] [-r record-count]\n"); This has all the advantages -- and none of the disadvantages -- of the other. In ANSI C you could even have the macro append the newline. Don't write vulnerable code (at least not without *very* good reason). -- Steve Emmerson Inet: steve@umigw.miami.edu [128.116.10.1] SPAN: miami::emmerson (host 3074::) emmerson%miami.span@star.stanford.edu UUCP: ...!ncar!umigw!steve emmerson%miami.span@vlsi.jpl.nasa.gov "Computers are like God in the Old Testament: lots of rules and no mercy"
ggs@ulysses.homer.nj.att.com (Griff Smith) (10/07/88)
In article <171@umigw.MIAMI.EDU>, steve@umigw.MIAMI.EDU (steve emmerson) writes: [my original deleted] > Nonsense. Use instead: > #define EPUTS(msg) (void)fputs(msg, stderr) > EPUTS("Usage: sltread [options] input-drive-name\n"); > EPUTS("Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n"); > EPUTS(" [-d dataset-name] [-n dataset-number]\n"); > EPUTS(" [-i ignore-count] [-r record-count]\n"); > > This has all the advantages -- and none of the disadvantages -- of the other. I'm still not crazy about using fputs as an efficiency hack, but this seems to be on the right track. > In ANSI C you could even have the macro append the newline. If I remember correctly, I can avoid the formatting problem completely in ANSI C and avoid silly macros: (void) fprintf(stderr, "Usage: sltread [options] input-drive-name\n" "Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n" " [-d dataset-name] [-n dataset-number]\n" " [-i ignore-count] [-r record-count]\n" ); > Don't write vulnerable code (at least not without *very* good reason). I try not to. A similar note to the other side: don't make gratuitous changes to other people's code without a good reason. I have been burned several times by having something break because someone else "improved" it. In one case I failed to add a comment to explain why a fairly complicated addition was necessary; someone else decided that it was ugly and stripped it. In another case the comment was there, and ignored. > -- > Steve Emmerson Inet: steve@umigw.miami.edu [128.116.10.1] -- Griff Smith AT&T (Bell Laboratories), Murray Hill Phone: 1-201-582-7736 UUCP: {most AT&T sites}!ulysses!ggs Internet: ggs@ulysses.att.com
wes@obie.UUCP (Barnacle Wes) (10/07/88)
In article <225800077@uxe.cso.uiuc.edu>, mcdonald@uxe.cso.uiuc.edu gives
six good, fairly detailed tips for optimizing C programs. One thing to
keep in mind here is that in general, you are wasting your time trying
to optimize the entire program. Most programs spend a large amount of
their processing time in a few critical routines.
The way to conquer this problem, and not waste a lot of valuable time
optimizing the current program, when you could be adding features to the
next version :-), is to profile your code. Find out which routines the
program spends most of its time in, and then optimize THOSE routines.
If you spend 3 weeks getting this one routine to run in 350
milliseconds, as opposed to 1 second, you have just wasted 3 weeks if
that routine is only executed once every third or fourth time the user
uses the program.
--
{hpda, uwmcsd1}!sp7040!obie!wes
"How do you make the boat go when there's no wind?"
-- Me --
scs@athena.mit.edu (Steve Summit) (10/07/88)
In article <1703@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
> The result may differ in the last bit of the significand because
> of this.  However, in most cases, that is irrelevant and the
> multiply will be significantly faster.

...demonstrating, once again, that it's always more important to get the answer fast than get it right :-(.

                                            Steve Summit
                                            scs@adam.pika.mit.edu

(Sorry, Walter, this isn't quite fair, and I understand that being off in the last bit isn't necessarily "wrong," but as long as the church of efficiency is standing around on soapboxes I'm going to stand on mine.)
gwyn@smoke.ARPA (Doug Gwyn ) (10/07/88)
In article <1988Oct5.161523.6695@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>As Knuth once said, approximately, "in no other branch of engineering is an
>easily-obtained 10% improvement in performance viewed with scorn".

That isn't even a correct observation, let alone relevant.  Consider adding nitro to your gasoline.  Unless you're a drag racer it would be pretty stupid, and I'm pretty sure that most automotive engineers would agree.

The idea is to strive for a truly optimal solution to a problem, taking ALL relevant factors into proper account.  Of course this cannot be done exactly in practice, but it should steer one's approach.  Note that some relevant factors might include the availability of resources for software maintenance, adaptability to changing future needs, and other even harder to quantify factors.

One thing for sure: Concentrating on a single factor to the exclusion of other obviously relevant ones is quite likely to produce a suboptimal result.
bright@Data-IO.COM (Walter Bright) (10/08/88)
In article <8267@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
>In article <1703@dataio.Data-IO.COM>, bright@Data-IO.COM (Walter Bright) writes:
>> The result may differ in the last bit of the significand because
>> of this.  However, in most cases, that is irrelevant and the
>> multiply will be significantly faster.
>If, however, you want your compiler to implement IEEE floating point,
>you had better not substitute a multiply by 0.1 for a divide by 10.
>
>IEEE defines the results of addition, subtraction, multiplication,
>division, square root, and a few other operations precisely -- down
>to the last bit.  Either you get the right answer or you don't.
>It is true that in many cases users don't need the last bit or so,
>but a language implementer has an obligation to give it to the
>people who demand it.

You (and some others) misunderstood me.  In no case did I recommend the *compiler* perform the substitution.  I recommended that the *programmer* do it, of course after making sure that the semantics for *his application* were the same.

For example, suppose you wished to convert Celsius to Fahrenheit in a tight loop.  Multiplying by the scale factor is much faster than dividing by the reciprocal, and the answer will be correct to the desired accuracy.  Another example is scaling things for a graphics display.  It's faster to design the system using a multiplicative scale factor rather than a dividing scale factor.  In fact, for all you numerologists, the * may be what the algorithm required instead of the / to get the correct result.

In floating point, there is a speed/accuracy tradeoff, not a right/wrong answer.  Different applications require different approaches.  I suggested the replacement of a divide with a multiply because:
    1. Many programmers aren't aware that * is much faster than /.
    2. Many applications don't care about the difference in result.
    3. It's just another thing to look for.

I stand by my original comments.
ark@alice.UUCP (Andrew Koenig) (10/08/88)
In article <1706@dataio.Data-IO.COM>, bright@Data-IO.COM (Walter Bright) writes:
> For example, suppose you wished to convert Celsius to Fahrenheit
> in a tight loop.  Multiplying by the scale factor is much faster
> than dividing by the reciprocal, and the answer will be correct
> to the desired accuracy.

This obviously depends on what you desire.  For instance, whether I evaluate

    f = 32 + c * (9.0/5.0);
or
    f = 32 + c / (5.0/9.0);

I will not get as accurate an answer as if I evaluate

    f = 32 + (c * 9.0) / 5.0;

Now, I know that a sensible compiler will do the division inside the parentheses for me at compile time in each of the first two examples above, whereas the third one does a multiplication *and* a division.  The third is clearly the slowest.  But on a machine with sensible floating-point, it guarantees me the most accurate result possible whenever c is an integer.  In particular, when I convert 100 degrees Celsius to Fahrenheit, I get 212, not 211.9999999999999 or 212.0000000000001.  The other two ways of doing it do not guarantee that.
--
                --Andrew Koenig
                  ark@europa.att.com
rob@kaa.eng.ohio-state.edu (Rob Carriere) (10/09/88)
In article <8273@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
>In particular, when
>I convert 100 degrees Celsius to Fahrenheit [using * and /], I get 212, not
>211.9999999999999 or 212.0000000000001.  The other two ways of
>doing it do not guarantee that.

Probably.  However, as thermometers more accurate than about +-.1K are fairly rare, what's the diff?  You can't rely on the number being more precise than about +-.05 degrees F anyway, due to the inaccuracy of your input.  This is exactly the point the original poster was trying to make; if *you* know the last couple of bits don't matter, you might as well take advantage of that fact.

If, on the other hand, this measurement was made using a Star Trek tricorder (which is accurate to 12 decimal places) and had to be reported to Spock, *then* you'd better be careful that you don't destroy any bits of accuracy, and use your trick (and double precision).

Rob Carriere
dharvey@wsccs.UUCP (David Harvey) (10/09/88)
In article <10332@s.ms.uky.edu>, aash@ms.uky.edu ( Aashi Deacon ) writes:
> > o Try very hard to replace divides with other operations, as in:
> >       x / 10
> >   with:
> >       x * .1
> > o See if some calculations can be done using integer math.
>
> Are these two points contradictory?
> Thanks for those good hints but I got stopped when I saw this first point.
>
> According to theory, '.1' cannot be represented exactly as a floating
> point number because in base2 it is irrational.  Wouldn't then the
> first be better in this case?
>
> aash
> aash@ms.uky.edu

For that matter, it is also very difficult to represent 10.0 (I am assuming you are working with floating point) in any floating point representation.  Also, if the operation is done in emulation mode (no floating point in MPU or if math coprocessor it is not in machine) the advantage will be nonexistent.  In other words, a sequence of left shifts and adds is no better than a sequence of right shifts, et al.  Even with the coprocessor (math ops) a MUL takes approximately the same amount of clock cycles a DIV does.

You would be much better served by making variables that are used constantly registers (if you have float registers) than some of this stuff.  Also, making Fortran indexing go backwards and C's go forwards (better yet, use register pointers) for multiply dimensioned arrays does wonders to reduce the page faulting that normally occurs with multitasking/multiuser machines.

dharvey@wsccs

PS: Using the registers usually cuts time in half for affected operations, IF you get the registers you asked for.
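The array-traversal point above comes down to storage order: C arrays are row-major, so the inner loop should vary the last subscript.  A sketch with illustrative sizes (the paging effect itself only shows up on arrays larger than memory):

```c
/* C stores arrays in row-major order, so the row-by-row loop touches
 * memory sequentially; the column-by-column loop strides through it,
 * which causes the extra page faulting on large arrays.  Both compute
 * the same sum; only the access pattern differs. */
#include <assert.h>

#define ROWS 64
#define COLS 64

static double a[ROWS][COLS];

double sum_rowwise(void)            /* sequential access: C's order */
{
    double s = 0.0;
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            s += a[i][j];
    return s;
}

double sum_colwise(void)            /* strided access: Fortran's order */
{
    double s = 0.0;
    int i, j;
    for (j = 0; j < COLS; j++)
        for (i = 0; i < ROWS; i++)
            s += a[i][j];
    return s;
}
```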
henry@utzoo.uucp (Henry Spencer) (10/09/88)
In article <8646@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>As Knuth once said, approximately, "in no other branch of engineering is an
>>easily-obtained 10% improvement in performance viewed with scorn".
>
>That isn't even a correct observation, let alone relevant.

While I'm not really in Knuth's camp on this general issue, I think I must defend his comment.  There really are people -- too many of them -- who consider a 10% performance improvement to be so insignificant that it's not a valid argument in a debate.  These are generally people who have never done performance-critical programming.
--
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|  uunet!attcan!utzoo!henry henry@zoo.toronto.edu
mcdonald@uxe.cso.uiuc.edu (10/09/88)
>In article <225800077@uxe.cso.uiuc.edu>, mcdonald@uxe.cso.uiuc.edu gives
>six good, fairly detailed tips for optimizing C programs.  One thing to
>keep in mind here is that in general, you are wasting your time trying
>to optimize the entire program.  Most programs spend a large amount of
>their processing time in a few critical routines.
>The way to conquer this problem, and not waste a lot of valuable time
>optimizing the current program, when you could be adding features to the
>next version :-), is to profile your code.  Find out which routines the
>program spends most of its time in, and then optimize THOSE routines.

Agreed, sometimes.  Sometimes it is blatantly obvious from the problem description.  Of three projects I just finished, all maybe 3000 to 5000 lines, one does most of its work in 40 lines.  One spreads out over about 25% of the code.  These were obvious from the start (pre-known loop counts).  One, my hack of Nelson Beebe's dvi driver, I couldn't tell de novo since I didn't write it.  I found a very nice profiler called Inside! (for the IBM-pc) on Bix.  It is a crippled demo version, but got the job done.  I found that the work being done is distributed very evenly over about 85% of the code.  That means nothing can be done easily.  Oh well!

Doug McDonald
henry@utzoo.uucp (Henry Spencer) (10/10/88)
For further reading on optimizing C code, you might want to read that absolute gem of technical writing, "News Need Not Be Slow", by Collyer and Spencer, in the proceedings of the Winter 1987 Usenix Conference.  (No, we don't get royalties on proceedings sales!)  We speeded up a major program by about a factor of 20, without sacrificing portability (no assembler involved).
--
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|  uunet!attcan!utzoo!henry henry@zoo.toronto.edu
gwyn@smoke.ARPA (Doug Gwyn ) (10/10/88)
In article <1988Oct9.033134.15553@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <8646@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>>As Knuth once said, approximately, "in no other branch of engineering is an
>>>easily-obtained 10% improvement in performance viewed with scorn".
>>That isn't even a correct observation, let alone relevant.
>While I'm not really in Knuth's camp on this general issue, I think I must
>defend his comment.  There really are people -- too many of them -- who
>consider a 10% performance improvement to be so insignificant that it's
>not a valid argument in a debate.  These are generally people who have
>never done performance-critical programming.

There are times when it matters and times when it doesn't.  The latter dominate.  For example, most time in a highly interactive system is spent waiting for the human to think.  So long as the system response is fast enough, additional improvement is wasted.  (If the response is NOT fast enough, then improvement is justifiable.)

Another example is spending a lot of developer time on wringing out an additional 10% of code speed by machine-specific micro-level tweaks that make originally simple code hard to maintain, when a much greater improvement would be to simply switch to more capable hardware.  This is particularly bad when there are far more problems needing attention than there are qualified problem solvers, as is usually the case.

I will agree that UNNECESSARY inefficiency should be avoided.  A few percent lost in several places in the code can produce a significant loss for the overall system.  But I don't think that "efficiency" should be the software engineer's main concern, just one of many factors that have to be balanced.
dhesi@bsu-cs.UUCP (Rahul Dhesi) (10/10/88)
In article <208@obie.UUCP> wes@obie.UUCP (Barnacle Wes) writes:
>One thing to
>keep in mind here is that in general, you are wasting your time trying
>to optimize the entire program.  Most programs spend a large amount of
>their processing time in a few critical routines.

Let's remember that this behavior is true only for *unoptimized* programs.  After you have located the critical routines and speeded them up, the program will spend a large amount of its processing time spread over many routines.  *Now*, if you want further significant speed-ups, you can't just find a few hot spots.
--
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi
jimw@microsoft.UUCP (Jim Walsh) (10/12/88)
In article <171@umigw.MIAMI.EDU> steve@umigw.miami.edu (steve emmerson) writes:
>Nonsense.  Use instead:
>#define EPUTS(msg) (void)fputs(msg, stderr)
>    EPUTS("Usage: sltread [options] input-drive-name\n");
>    EPUTS("Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n");
>    EPUTS("         [-d dataset-name] [-n dataset-number]\n");
>    EPUTS("         [-i ignore-count] [-r record-count]\n");
>
>This has all the advantages -- and none of the disadvantages -- of the other.
>In ANSI C you could even have the macro append the newline.

I believe that ANSI C allows (at least the implementation I'm using does without flagging it in the manual as a deviation)

    fputs("Usage: sltread [options] input-drive-name\n"
          "Options: [-l] [-s] [-v] [-x] [-z] [-t tape-volume-number]\n"
          "         [-d dataset-name] [-n dataset-number]\n"
          "         [-i ignore-count] [-r record-count]\n", stderr);

Thus you truly get ALL the advantages - a single function call, and even a few null bytes saved.
--
Jim Walsh                    jimw@microsof.beaver.cs.washington.EDU
Microsoft Corporation        jimw@microsof@uw-beaver.ARPA  jimw@microsof.UUCP
The views expressed herein are not necessarily mine, let alone Microsoft's...
tneff@dasys1.UUCP (Tom Neff) (10/14/88)
In article <10332@s.ms.uky.edu>, aash@ms.uky.edu ( Aashi Deacon ) writes:
> According to theory, '.1' cannot be represented exactly as a floating
> point number because in base2 it is irrational. ...

No, .1 is not *irrational* (nonrepeating) in any base.  '.1' cannot be represented as a conventional finite binary bit set because its binary representation has a *repeating* mantissa.
--
Tom Neff                UUCP: ...!cmcl2!phri!dasys1!tneff
"None of your toys      CIS: 76556,2536     MCI: TNEFF
 will function..."      GEnie: TOMNEFF      BIX: t.neff (no kidding)
lfm@fpssun.fps.com (Larry Meadows) (10/15/88)
In article <711@wsccs.UUCP> dharvey@wsccs.UUCP (David Harvey) writes:
:
:For that matter, it is also very difficult to represent 10.0 (I am
:assuming you are working with floating point) in any floating point
:representation.
This is not the case.  10.0 is 8 * 1.25, or 2^3 * 1.01 [base 2].
In fact, all integers between zero and 111...1 [base 2] (where the
number of ones is P, the precision) are exactly representable in
base 2 floating point.  It is, however, the case that N/10.0 is not
necessarily the same as 0.1*N when a binary floating point
representation is used.
--
Larry Meadows @ FPS ...!tektronix!fpssun!lfm
...!nosun!fpssun!lfm
firth@sei.cmu.edu (Robert Firth) (10/18/88)
In article <1700@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
>Floating point code offers more opportunities:
> o Try very hard to replace divides with other operations, as in:
>       x / 10
>   with:
>       x * .1

No: since 10 is exactly representable and 0.1 is not, this change will very often introduce additional rounding error.

> o Reorganize expressions to make the best use of compile-time
>   constant folding, since the compiler is not allowed to do that.

Again no: the compiler is not allowed to make these changes for the simple reason that they change the results of the calculation.  Many scientific programmers go to great pains to minimise rounding error; don't undo their work.

General advice to a maintainer of "scientific" programs: do not TOUCH any floating-point code unless you know at least as much numerical analysis as its author.
sarima@gryphon.CTS.COM (Stan Friesen) (10/20/88)
In article <225800077@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>    Then understand the hardware and instruction set of the
>target computer (portable? what's that? We have already decided to go
>for the gold medal).  Look at the compiler's .asm output.  Try changing
>things like x*65 to (x<<6)+x.  It might be faster, might not.

Well, as far as I can tell, the Portable C Compiler does this for you, at least with -O set.  I have seen the equivalent of the above in so many places in the system that I cannot imagine it being in the source code.  In fact I have seen it in once-executed start-up code, which very few programmers would bother tweaking to this level.
--
Sarima Cardolandion                     sarima@gryphon.CTS.COM
aka Stanley Friesen                     rutgers!marque!gryphon!sarima
Sherman Oaks, CA
dan@ccnysci.UUCP (Dan Schlitt) (10/21/88)
In article <8630@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <1700@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
>>Here's some of the things I do:
>
>I have to take exception to most of these micro-efficiency tweaks.
>Most of them have no effect on the code generated by even a moderately
>good compiler, and nearly all of them make the code more obscure.
>It is much more important for the source code to be patently correct
>than for every surplus nanosecond to be squeezed out.  Fast, buggy
>code does nobody a service, and bugs are hard to spot when the source
>code is obscure.  Besides, most efficiency losses are from poor
>choice of data structures or algorithms, not from fatty source code.

[Much deleted]  Doug clears up much misguided stuff.

>> o Use temporaries to eliminate all common subexpressions.
>
>That can actually interfere with a good optimizer.  Use
>temporaries when you suspect that many optimizers will not
>take care of the common subexpression for you; otherwise
>if it is not in a bottleneck section of code, leave it alone.

The first language I coded in was SOAP II, and I soon moved on to early versions of Fortran.  Compilers in those days were dumb, and hand optimization of the source code and the assembler output was the thing of the day.  This was undoubtedly encouraged by youthful visions of infallibility.  Also, nobody thought seriously about hauling their programs off to some other kind of computer.

Compilers are much better now.  Age has encouraged a modesty which allows that the compiler may well know how to do things better than I do.  That is certainly the case when I take my code off to some unfamiliar computer.  I have concluded, and I think this is what Doug is saying, that one should not code in a way that will confuse either the reader of the code or the compiler.

If your compiler doesn't do a good job of optimization then your money probably is spent better on a better compiler than it is on the time of coders diddling with the code.  But that is just the opinion of some old codger and probably should be ignored.
--
Dan Schlitt                   Manager, Science Division Computer Facility
dan@ccnysci                   City College of New York
dan@ccnysci.bitnet            New York, NY 10031
                              (212)690-6868
mcdonald@uxe.cso.uiuc.edu (10/23/88)
>Besides, most efficiency losses are from poor
>choice of data structures or algorithms, not from fatty source code.

Agreed....  But.....

>Compilers are much better now.  Age has encouraged a modesty which
>allows that the compiler may well know how to do things better than I
>do.  That is certainly the case when I take my code off to some
>unfamiliar computer.
>I have concluded, and I think this is what Doug <Gwyn - not me> is saying,
>that one
>should not code in a way that will confuse either the reader of the
>code or the compiler.
>If your compiler doesn't do a good job of optimization then your money
>probably is spent better on a better compiler than it is on the time
>of coders diddling with the code.

Yes....  But.....  Sometimes the better compiler doesn't yet exist.  An example is for the 80386.  So far no one is selling a 386 compiler that produces code for MS-DOS (NOT, I should say, for 386 native mode, but one that allows using 32 bit instructions in REAL mode - yes, you really can do that!).  By having my C compiler emit .asm output and replacing 5 lines of C with real 32 bit instructions I was able to speed up a program by an astounding 40%!  That is worth doing.  Microsoft - where is that /G3 compiler switch - it is long overdue!

Doug McDonald
yaping@eleazar.dartmouth.edu (11/03/88)
[This article was written by Scott Horne, not Yaping Xu.  Direct all e-mail responses to me at jalphin@prism.clemson.edu.]

In article <10332@s.ms.uky.edu>, aash@ms.uky.edu ( Aashi Deacon ) writes:
>>
>> ... '.1' cannot be represented exactly as a floating
>> point number because in base2 it is irrational.

Irrational?  Clearly not.

                    --Scott Horne