[comp.lang.c] Just rambling about optimization...

dsimon@eniac.seas.upenn.edu (Derron Simon) (05/09/91)

One utility I was looking for about a year ago is an optimizer for C code
that can parse and optimize the original source.  I don't know why this hasn't
been attempted by any PD writers.  It is definitely non-trivial, but I'm
surprised no one has ever attempted it.  Think about it: a global or local
optimizer working on fully portable C source...  doesn't it make you smile?
Well, has anyone heard of anything along this line, or does anyone have any
opinions on the matter?



--
/* Derron Simon                * Internet: dsimon@eniac.seas.upenn.edu       */
/*                             * Phone: (215)573-5650  GEnie: D.SIMON        */
/* University of Pennsylvania  * FidoNet: 273/702  CS: 72571,1524            */
/* Moore School of Engineering * Box 0938/3700 Spruce St/Phila./PA 19104     */

steve@taumet.com (Stephen Clamage) (05/09/91)

dsimon@eniac.seas.upenn.edu (Derron Simon) writes:

>One utility I was looking for about a year ago is an optimizer for C code
>that can parse and optimize the original source.  I don't know why this hasn't
>been attempted by any PD writers.  It is definitely non-trivial, but I'm
>surprised no one has ever attempted it.

I'm not surprised.  It is of very little use.  Gains in performance at
the source code level are best achieved by choosing better algorithms.
Reliable optimizations achievable by source code manipulation as you
describe are done anyway by quality compilers.  Many of the best
optimizations depend critically on how the compiler generates code,
and on the characteristics of the machine.  For example, an "obvious"
optimization is to jump around unnecessary code.  Yet on some pipelined
machines, the penalty for a jump is so high it is actually faster to
execute the useless instructions.  Another "obvious" optimization is
to convert certain kinds of multiplication into shifts and adds.  The
right way to do this depends critically on the machine -- some shifts on
some machines are slower than some multiplies.  The compiler can do this
best, and quality compilers do it.
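
To sketch the kind of strength reduction meant here (the exact choice is
the compiler's, and the function name below is made up for illustration):

	/* Sketch only: a constant multiply rewritten as shifts and adds.
	   On some machines the multiply instruction is just as fast or
	   faster, which is why this belongs in the code generator. */
	unsigned times10(unsigned x)
	{
		return (x << 3) + (x << 1);	/* 8*x + 2*x == 10*x */
	}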

Code which is fiddled to run faster with a given compiler/machine
combination may run much slower with another combination.  The result
of the fiddling is generally not very readable, and hard to maintain.
If you can't get a quality compiler and you have positively identified
a bottleneck in your code, that is the time to look for a way to improve
that stretch of code.  Keep the original straightforward code around
and document your fiddling, so that later maintainers can understand
what has happened and why.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (05/10/91)

In article <721@taumet.com> steve@taumet.com (Stephen Clamage) writes:
> dsimon@eniac.seas.upenn.edu (Derron Simon) writes:
    [ wants a source-code optimizer ]
> I'm not surprised.  It is of very little use.

That's what I mean by counterproductive. But rather than trying to
convince you that any competent human can easily outdo the best
available automated optimizers, let me show you an unarguable advantage
of source-code optimization.

A typical math library takes hours, often even days, to compile on each
machine. Installing several libraries can start taking a huge chunk from
the CPU time available for users. Why is compiling so slow? Because the
optimizer spends so much time searching for optimizations that could
have been expressed at the source level.

Guess what? If someone includes a source-optimized version along with
the original code, you won't have to waste nearly as much time doing the
same optimizations again for your machine.

This points to a productive area of research: how to make languages
sufficiently expressive that all interesting optimizations *can* be
expressed portably at the source-code level. Now doesn't this seem more
useful than answering every optimization question with ``It is of very
little use''?

---Dan

rh@smds.UUCP (Richard Harter) (05/10/91)

In article <721@taumet.com>, steve@taumet.com (Stephen Clamage) writes:

> I'm not surprised.  It is of very little use.  Gains in performance at
> the source code level are best achieved by choosing better algorithms.
> Reliable optimizations achievable by source code manipulation as you
> describe are done anyway by quality compilers....

Stephen's advice is good and generally sound.  I just want to add some
qualifiers.

One is that there are quite a few local optimizations that many
compilers don't seem to catch.  A key point is that you often know more
about the constraints on the data than the compiler is allowed to assume.
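
A small made-up example of that sort of constraint: the programmer knows
the string's length cannot change inside the loop, so strlen() can be
hoisted out by hand, while many compilers must assume it might change and
re-evaluate it on every iteration.

	#include <string.h>
	#include <ctype.h>

	void upcase(char *s)	/* hypothetical illustration */
	{
		size_t i, len = strlen(s);	/* hoisted: we know it is constant */

		for (i = 0; i < len; i++)
			s[i] = toupper((unsigned char) s[i]);
	}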

A more general point is that the advice to "choose better algorithms"
is a bit misleading, although true if we take "algorithm" in its most
general sense.  When one thinks of algorithms one tends to think of
algorithms in the abstract, e.g. sort algorithms, search algorithms,
etc.  Stephen's advice can lead one to try to pick optimal algorithms
which are then glued together.  However, a major cost in many programs
is the glue, i.e. the data management and interfaces.  For example:

fooby(...) { ... p = malloc(n); ... free(p); }

calls malloc and free in each invocation.  If fooby is not recursive
(recursion would require a stack of allocations) we gain with

static char *p = 0;
static int sizeof_p = 0;

fooby(...) {
	if (n > sizeof_p) {
		if (p) free(p);
		p = malloc(n);		/* a real version should check for failure */
		sizeof_p = n;
	}
	....
}

This is a space/time tradeoff.  We keep a permanent array (using space
when not in fooby) and eliminate malloc/free calls in all but a few
calls to fooby.  There are lots of instances where well chosen data
structures let you avoid the cost of copying by setting pointers.
The point is that optimization of programs frequently involves better
design of the data management process.
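
One hypothetical instance of the pointer-setting idea is double-buffering
by exchanging pointers rather than copying the buffers themselves (the
names and size here are invented):

	#define BUFSIZE 8192

	static char buf_a[BUFSIZE], buf_b[BUFSIZE];
	static char *current = buf_a, *previous = buf_b;

	void swap_buffers(void)
	{
		char *tmp = previous;

		previous = current;	/* O(1) pointer exchange... */
		current = tmp;		/* ...instead of copying BUFSIZE bytes */
	}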
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

gwyn@smoke.brl.mil (Doug Gwyn) (05/11/91)

In article <10550:May918:36:1291@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>This points to a productive area of research: how to make languages
>sufficiently expressive that all interesting optimizations *can* be
>expressed portably at the source-code level.

I think that would be heading in exactly the wrong direction.
Producing good code is the task of the language translator,
not the programmer.  The programmer should be able to concentrate
on solving the problem in abstract terms, letting the translator
deal with low-level details.

torek@elf.ee.lbl.gov (Chris Torek) (05/11/91)

In article <453@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
>The point is that optimization of programs frequently involves better
>design of the data management process.

Indeed.  It has been observed (sorry, no references, but this is not
only popularly known but also true :-) ) that many computers spend most
of their time copying data, rather than `computing'.  This is largely
why a fast bcopy/memcpy/memmove is important.  A fast block copy is
a wonderful thing, but better yet is to eliminate the copies altogether.
If the copies are unavoidable, it may pay to compute while copying,
rather than having the sequence:

	memmove(saved_data, data, len);
	... work over the data ...

---particularly if `working over the data' involves a linear pass over
all the items.  (TCP is a good example of this: it must copy outbound
data somewhere for retransmission, and it must also compute a checksum.
These operations can be done at the same time.)
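
As a rough sketch of folding the two passes together (a plain byte sum
stands in here for a real checksum such as TCP's):

	#include <stddef.h>

	unsigned long
	copy_and_sum(char *dst, const char *src, size_t len)
	{
		unsigned long sum = 0;
		size_t i;

		for (i = 0; i < len; i++) {
			dst[i] = src[i];		/* the copy... */
			sum += (unsigned char) src[i];	/* ...and the checksum pass */
		}
		return sum;
	}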

Of course, when optimizing, the place to start is with a profiler.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

rh@smds.UUCP (Richard Harter) (05/12/91)

In article <13089@dog.ee.lbl.gov>, torek@elf.ee.lbl.gov (Chris Torek) writes:

> Indeed.  It has been observed (sorry, no references, but this is not
> only popularly known but also true :-) ) that many computers spend most
> of their time copying data, rather than `computing'.  This is largely
> why a fast bcopy/memcpy/memmove is important.  A fast block copy is
> a wonderful thing, but better yet is to eliminate the copies altogether.

True enough.  One of the traps in the construction of large software is
that when one breaks it up into components there is a real tendency to
copy data from component to component as a matter of convenience.  For example,
suppose we have components A, B, and C with data flowing from A to B to
C.  The easy way to deal with this is to make each component its own source
and sink.  I.e. A creates a data set, passes it to B, and then disposes
of it after B is completed.  B copies its input into its own area, passes
its processed data to C, and then disposes of it, and C does the same
thing.  Stated this baldly, it is clear that A should be the source, C
the sink, and nobody should copy.  However, if we look at the general
case, A, B, C, etc. will be defined and implemented independently, i.e.
modularly.  If each module has responsibility for creating and disposing
of its own data, then inter-module coupling is minimized and there are
no nasty questions about who disposes of what.  

Which, of course, is why it pays to use object-oriented techniques even
in non-OO languages.  In this case it would pay to package the data set
with a reference count and a module of data-set methods.  Each user ups
the count upon access and decrements it upon dispose.  When the count
goes to 0 the actual disposal is done.  [Operations hidden behind
macro or function calls, of course.]
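
A bare-bones sketch of such a package (the names and layout are invented
for illustration; real code would hide them behind the module's macros
or functions):

	#include <stdlib.h>

	struct data_set {
		int	refcount;
		char	*bytes;		/* the shared data itself */
		size_t	len;
	};

	struct data_set *ds_acquire(struct data_set *d)
	{
		d->refcount++;			/* up the count upon access */
		return d;
	}

	void ds_release(struct data_set *d)
	{
		if (--d->refcount == 0) {	/* last user: really dispose */
			free(d->bytes);
			free(d);
		}
	}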
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.