[comp.arch] Approaches to computer system performance

tbray@watsol.waterloo.edu (Tim Bray) (01/29/88)

In discussing system performance, in article <180@granite.dec.com> 
jmd@granite.UUCP (John Danskin) suggests:
>	
>	2) Redo the parts that aren't fast enough in assembler.
>
>	3) Change your algorithms so there is more time.
>
Item #3 is right.  Item #2 is WRONG.  The more years I build up in this
business, the more I come to believe that technical sidesteps like
assembler coding, special purpose hardware, and the like do not give
anything like the payback you get from careful attention to algorithms.

Given a routine which you have determined, with careful profiling, to be
sucking up a lot of your resources, how much more can you really get
out of it than the compiler-generated code?  Unless you are lucky, have
a deep understanding of your machine's architecture *and* of the
implementation of that architecture, and the routine does something for
which the compiler generates pathologically poor code, probably no more
than a few percentage points (given a reasonably modern compiler).

In support of my position, let me chant a few phrases: Database machine,
LISP machine, object-oriented architecture, user-written microcode,
Complex Instruction Set, writing operating systems in assembler.

Among other things, I have my doubts about Sun's NeWS for this very same
set of reasons.  System performance is a high-level issue and in general
requires high-level solutions.

Algorithms si!  Low-level hacking no!
Tim Bray, New OED Project, University of Waterloo

res@ihlpe.ATT.COM (Rich Strebendt, AT&T-DSG @ Indian Hill West) (01/30/88)

In article <3081@watcgl.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes:
> The more years I build up in this
> business, the more I come to believe that technical sidesteps like
> assembler coding, special purpose hardware, and the like do not give
> anything like the payback you get from careful attention to algorithms.

I have been developing software for on the order of 20 years now.  I
agree completely with Tim.  By polishing and tuning and assemblerizing
and twiddling, you can get a fair bit of improvement.  In order to get
an order-of-magnitude improvement you have to find a better algorithm,
not more tweaks and tunes.

As a case in point, I was asked to develop the on-board (burned in ROM)
software for a tape controller.  The goal was an order of magnitude
increase in performance.  The fundamental algorithm of the beast was
rethought, and a completely different approach was taken from the one
used in the old controller.  We got better than an order-of-magnitude
improvement without one word of assembler.  The software was written in
C and executes on a stock WE32100 microprocessor.

So, if you need to get a big performance improvement in a program, do
not dink with the code -- pitch it and rethink the algorithms.

				Rich Strebendt
				...!ihnp4![iwsl6|ihlpe|ihaxa]!res

dan@srs.UUCP (Dan Kegel) (01/30/88)

In article <3081@watcgl.waterloo.edu> tbray@watsol.waterloo.edu writes (sorta):
> Given a routine which is sucking up a lot of your resources, how much 
> can you gain by hand-coding in assembler?
> Unless you are lucky and have a deep understanding of your machine's 
> architecture *and* the implementation of that architecture, 
> probably no more than a few percentage points.
>
>Algorithms si!  Low-level hacking no!
>Tim Bray, New OED Project, University of Waterloo

Yes... but lots of people are that lucky, and sometimes the payoff is great.  
I recently rewrote a self-recursive procedure to pass all parameters in 
registers, yielding a 2x speedup.  The resulting code contains
about ten asm() statements mixed with about fifty C statements.
I almost wish it hadn't worked, though; the code is less readable,
and will have to be redone if we move to another machine.
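
(The routine itself isn't reproduced here, but the portable half of the
trick looks roughly like the sketch below: hoist the parameters of a
self-recursive routine out of the argument list, so that the recursive
call pushes nothing.  The asm() statements that pin those variables in
machine registers are the compiler- and CPU-specific part, and are
omitted.)

    /* Sketch only -- not the actual code.  State that would have been
     * passed as arguments lives across the recursive call; the real
     * version used asm() statements to bind it to registers. */
    long n_left;                /* was: parameter 1 */
    long accum;                 /* was: parameter 2 */

    void step(void)             /* recursive call passes no arguments */
    {
        if (n_left == 0)
            return;
        accum += n_left;
        n_left--;
        step();
    }

    long sum_to(long n)         /* wrapper preserves the old interface */
    {
        n_left = n;
        accum = 0;
        step();
        return accum;
    }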
-- 
   Dan Kegel
   srs!dan@cs.rochester.edu  dan%srs.uucp@cs.rochester.edu  rochester!srs!dan

radford@calgary.UUCP (Radford Neal) (01/31/88)

In article <3081@watcgl.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes:

> Given a routine which you have determined, with careful profiling, to be 
> sucking up a lot of your resources, how much more can you really get
> out of it than the compiler-generated code?  Unless you are lucky and have
> a deep understanding of your machine's architecture *and* the implementation 
> of that architecture, and the routine does something for which the compiler 
> generates pathologically poor code, probably no more than a few percentage 
> points.  (Given a reasonably modern compiler).

Well, the last phrase may make your statement true, but if so, I don't
use many "reasonably modern compilers". I find that with a typical UNIX
C compiler, you can often get a factor of two improvement by going to
assembler.  I assume here that your C program is written carefully, but
that you haven't been inspecting the generated code to see which
constructs your compiler handles better and then making changes that
have no machine-independent motivation.

This isn't to say that algorithmic improvements won't get you even more, but
if you're desperate, you've already done that...

   Radford Neal
   The University of Calgary

matt@srs.UUCP (Matt Goheen) (02/01/88)

In article <578@srs.UUCP> dan@srs.UUCP (Dan Kegel) writes (pretty much):
>In article <3081@watcgl.waterloo.edu> tbray@watsol.waterloo.edu writes (sorta):
>> Given a routine which is sucking up a lot of your resources, how much 
>> can you gain by hand-coding in assembler?
>>
>>Algorithms si!  Low-level hacking no!
>>Tim Bray, New OED Project, University of Waterloo
>
>Yes... but lots of people are that lucky, and sometimes the payoff is great.  
>I recently rewrote a self-recursive procedure to pass all parameters in 
>registers, yielding a 2x speedup.
>-- 
>   Dan Kegel

Must agree with Dan here.  A few months ago I was writing an approximate
integer square root routine.  I found that the assembly version was more
than 3 times faster than the carefully coded C routine...
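
(Matt's routine wasn't posted.  For reference, one classic bit-at-a-time
integer square root looks like the sketch below; its inner loop of
shifts, adds and compares is just the sort of thing hand assembly could
beat a 1988 compiler at.)

    /* A sketch, not the routine in question: bit-at-a-time integer
     * square root, no multiplies or divides.  Assumes x fits in 32
     * bits. */
    unsigned long isqrt(unsigned long x)
    {
        unsigned long root = 0;
        unsigned long bit = 1UL << 30;  /* largest power of 4 < 2^32 */

        while (bit > x)                 /* find the leading bit pair */
            bit >>= 2;
        while (bit != 0) {
            if (x >= root + bit) {
                x -= root + bit;
                root = (root >> 1) + bit;
            } else {
                root >>= 1;
            }
            bit >>= 2;
        }
        return root;                    /* floor(sqrt(x)) */
    }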


-- 
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
- UUCP:	{allegra,rutgers,ames}!rochester!srs!matt	Matt Goheen 	-
- 	"First the pants, THEN the shoes."		S.R. Systems	-
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

henry@utzoo.uucp (Henry Spencer) (02/08/88)

> Well, the last phrase may make your statement true, but if so, I don't
> use many "reasonably modern compilers". I find that with a typical UNIX
> C compiler, you can often get a factor of two improvement by going to
> assembler. I assume here that your C program is written carefully, but
> that you haven't been looking at the generated code to see which constructs
> are handled better by your compiler, then making changes that have no 
> machine-independent motivation.

Well, first, if by "a typical UNIX C compiler" you mean something like the
typical VAX compilers, you are talking about a ten-year-old compiler that
emphasized portability rather than optimization from the start.  (Yes,
there have been improvements to it since, but not drastic ones.)  Modern
compilers are a different story.

And second, making machine-specific changes to your source is not the same
thing as writing in assembler.  Someone unfamiliar with the code might
not understand why you wrote it that way, but in C he will still be able
to see what the code is doing.  Not so in assembler.
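
A hypothetical example of the distinction: suppose inspection of the
generated code shows that your compiler handles pointer stepping much
better than array indexing.  The machine-specific change is still
plain C:

    /* Invented example of a machine-specific rewrite that stays
     * readable C; the original indexed form is kept in the comment. */
    long total(long *a, int n)
    {
        long *p = a, *end = a + n;
        long sum = 0;

        /* was:  for (i = 0; i < n; i++) sum += a[i]; */
        while (p < end)
            sum += *p++;
        return sum;
    }

A reader who has never seen the target machine can still tell what this
function does; the equivalent hand assembler would hide even that.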
-- 
Those who do not understand Unix are |  Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.    | {allegra,ihnp4,decvax,utai}!utzoo!henry

nather@ut-sally.UUCP (Ed Nather) (02/22/88)

In article <1988Feb7.185328.11509@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> > I find that with a typical UNIX
> > C compiler, you can often get a factor of two improvement by going to
> > assembler. 
> 
> Well, first, if by "a typical UNIX C compiler" you mean something like the
> typical VAX compilers, you are talking about a ten-year-old compiler that
> emphasized portability rather than optimization from the start.  (Yes,
> there have been improvements to it since, but not drastic ones.)  Modern
> compilers are a different story.
> 

Well, I'm using Borland's Turbo C "optimizing" compiler v1.0 to write a
time-critical program, and while it does pretty well most of the time,
it sometimes generates code much too slow for my application.  I then
must go to assembler, in desperation.  I believe it would qualify as a
"modern compiler."

As an example, I use large data arrays (long ints) and must clear them
when I start a data acquisition run.  I wrote a straightforward "for"
loop to do it, and found it was taking much longer than I could tolerate.
After rewriting the "clear" routine in assembler, using the repeated
store instruction available on the 80x8x chips, I found by direct
measurement that it ran 13 times faster.
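
(A sketch of the two versions -- not the actual code, and the array
size is invented.  Note that a good memset() is often implemented with
that same repeated-store instruction, so the library call is the
portable route to much of this speedup:)

    #include <string.h>

    #define NPTS 16384                  /* hypothetical array size */
    long counts[NPTS];

    /* The straightforward loop: */
    void clear_slow(void)
    {
        int i;

        for (i = 0; i < NPTS; i++)
            counts[i] = 0L;
    }

    /* Stand-in for the REP STOS assembler: valid because a long whose
     * bytes are all zero has the value zero. */
    void clear_fast(void)
    {
        memset(counts, 0, sizeof counts);
    }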

While I agree writing in C is both easier and more portable, there are
times when you just can't pay such a high price.  In my view, compilers
have a long way yet to go.

-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather
nather@astro.AS.UTEXAS.EDU