[net.works] Not again!? Assembler vs High-Level languages

doug@terak.UUCP (Doug Pardee) (04/04/85)

I continue to receive mail on the "assembly versus high-level language"
issue.  Many of my correspondents claim that a good optimizing compiler
can produce code as good as an assembler programmer can.

I obviously disagree.  If we are talking about "mediocre" programmers,
okay, I'll accept that a mediocre C programmer with an optimizing
compiler can produce code as good as a mediocre assembler programmer's.
But if we're talking about sharp programmers (like me :-), there is no
way that I can get a C compiler to produce code as good as what I can
write in assembler.

I decided to run a (not too badly rigged) experiment.  I took a Pascal
program and rewrote it into both C and assembler for the NS32016 chip.
The program I chose as the guinea pig is the ol' classroom favorite,
the Ackermann function.  This is a deeply recursive function, with
little actual computation.  I chose it because it was short and I knew
that it would give me the best results (from my point of view) in the
test.  (I did admit to a bit of rigging...)

It took me about five times as long to write the assembler version as
the C version (no surprise).  My goal was to include as many refinements
as possible in the assembler version that were flat-out impossible to
code in C.  There were three major areas of attack.

First, the 32000 series *hates* to branch.  So I wrote the main
recursive loop to be totally straight-line code.  I also put all branch
labels on word boundaries; that cuts down the branching penalty.
Secondly, the instructions were carefully chosen based on the published
32016 timing charts.  And most importantly, the "call" overhead was
slashed by passing parameters in registers instead of on the stack, by
stacking only the one variable that would be needed later, and by using
the 32000's simple call instruction instead of the "do-all" instruction
that C has to use.

And the results??

The optimized C program did not run faster than the assembler program.
It did not run as fast as the assembler.  It didn't take only 10%, or
only 25% longer.  It couldn't finish given even *twice* as much time.

In fact, it took more than five times as long as the assembler version.
The unoptimized C version took six times as long as the assembler
version.  And both C versions required six times as much memory for
stack space as the assembler program did.

I don't claim that the factor of 5 or 6 is at all representative.  I
did choose Ackermann because I knew that calling and branching are the
biggest chinks in the armor, and that's about *all* that Ackermann does.
But the point is still valid:  there is no way that an optimizing
compiler can equal a sharp assembler programmer when code efficiency is
crucial.
-- 
Doug Pardee -- Terak Corp. -- !{hao,ihnp4,decvax}!noao!terak!doug

ksbszabo@wateng.UUCP (Kevin Szabo) (04/07/85)

All this chatter about Assembler Vs H.L.L. is silly.  Doesn't
anybody read `Programming Pearls' by Jon Bentley in CACM?

You drop into assembler only when you absolutely HAVE to.
But first, you get the program right.  Once you have it
correct you leave the correct code in as comments, and
you spend some time on the most time-critical portion.
Before you drop into assembler, though, I hope you have
a) examined your algorithm, and b) profiled your code.
You wouldn't want to waste all that time you are going
to spend trying to make it fast!

Doug spent five times as much time coding the algorithm so
that it would be fast.  He could very easily have made a small error.
Coding in assembler forces you to think about the implementation,
not about what you are really trying to do, which is solve a problem.

A fast program that gives the wrong answers is useless.

				Kevin
-- 
Kevin Szabo  watmath!wateng!ksbszabo (U of Waterloo VLSI Group, Waterloo Ont.)

doug@terak.UUCP (Doug Pardee) (04/08/85)

My original posting attacked the popular notion that there exists in
some Nirvana an optimizing compiler which produces code as good as
expertly coded assembler.  The response was on a different subject:

> You drop into assembler only when you absolutely HAVE to.
> ...
> Before you drop into assembler though I hope you have
> a) examined your algorithm, and b) profiled your code.

The Unix community apparently disapproves of such practices, since there
is no profiling program in the "toolbox" that is part of 4.2BSD.  I
dunno about System V.

> Doug spent 5 times as much time coding the algorithm so
> it was fast.  He could very easily made a small error.

Heresy!  I'll have you know that I live in the desert so that no one
notices that my shoes never touch the sidewalk when I take a walk in
the rain   :-)
-- 
Doug Pardee -- Terak Corp. -- !{hao,ihnp4,decvax}!noao!terak!doug

brownc@utah-cs.UUCP (Eric C. Brown) (04/09/85)

In article <492@terak.UUCP> doug@terak.UUCP (Doug Pardee) writes:
>My original posting attacked the popular notion that there exists in
>some Nirvana an optimizing compiler which produces code as good as
>expertly coded assembler.  

Again, I would suggest that you look at BLISS;  BLISS produces code that
is almost always *smaller* and usually much *faster* than most
hand-written assembler code.

>The Unix community apparently disapproves of such practices, since there
>is no profiling program in the "toolbox" that is part of 4.2BSD.  I
>dunno about System V.

Gee, why don't you look at 'cc -p' and prof?  They're in *this* 4.2 BSD
installation.

Eric C. Brown
brownc@utah-cs
...!seismo!utah-cs!brownc

guy@sun.uucp (Guy Harris) (04/10/85)

> The Unix community apparently disapproves of such practices, since there
> is no profiling program in the "toolbox" that is part of 4.2BSD.  I
> dunno about System V.

Try looking at PROF(1) in the:

	V7
	4.1BSD
	4.2BSD
	System III
	System V, Release 1
	System V, Release 2

manuals.  Then look at GPROF(1) in the 4.2BSD manual, and at PROF(5) in
the System V, Release 2 Programmer's Manual.

	Guy Harris
	sun!guy

hes@ecsvax.UUCP (Henry Schaffer) (04/10/85)

My BASIC interpreter also produces smaller "code" than does an
assembly language program.  (Of course there is a 16k run time file!)
How can any language (I don't know BLISS) produce a smaller program
than does assembler (assuming a competent assembly programmer)?  I'm
more interested in HOW than in a discussion of how many
competent programmers there are.
--henry schaffer

laura@utzoo.UUCP (Laura Creighton) (04/11/85)

The great problem with assembler is that once you get the sucker working,
you never want to touch it again. The linear sort that was just fine
when you were sorting 5 elements is going to be intolerably slow when
you sort 500,000 elements -- but do you *really* want to write
quicksort in (let's pick a horrid example) pdp-8 assembler?

I have mostly used assembler not so much when I needed something *fast*
as when I needed something *really small*. But there it is easy -- write
the code in C, produce the assembler and then tighten that. (Not on the
pdp-8, though; I don't think there *is* a C compiler for the pdp-8.)

Laura Creighton
utzoo!laura

herbie@watdcsu.UUCP (Herb Chong [DCS]) (04/11/85)

In article <5461@utzoo.UUCP> laura@utzoo.UUCP (Laura Creighton) writes:
>I have mostly used assembler not so much when I needed something *fast*
>as when I needed something *really small*. But there it is easy -- write
>the code in C, produce the assembler and then tighten that. (Not on the
>pdp-8, though; I don't think there *is* a C compiler for the pdp-8.)
>
>Laura Creighton
>utzoo!laura

on some systems other than unix, assembler is used because of the
special restrictions of the design of the system.  in VM/CMS, many
system services are awkward to get at directly from a high-level
language (poor design) but are very useful.  other system services are
designed to not be accessible from a high level language.  CMS itself
occupies a part of the user's address space but is protected against
write by ordinary programs.  a special area (called the transient area)
is reserved for commands that are not part of the kernel but are often
used.  it's only 8k in size, but there are many compelling reasons to
write commands that use that area.

i wrote a version of the Boyer-Moore string matching algorithm to fit
in 8K, and it contains a shell sort, error handler, and interrupt
handler.  the most important reasons for making it transient were to
make it available to be called by any high-level language program and
to use system services not available otherwise.  because CMS is in the
same address space as a user program, it is possible to use kernel code
by branching directly to it instead of doing a system call (SVC)
provided certain conditions are met.  i do i/o by branching directly to
the i/o routines in the kernel.  execution time is reduced by 1/3
because i bypass the system call interface.  this is a well documented,
but seldom used, feature of CMS because of the restrictions on write
access to the kernel.  it is possible (using our Waterloo C compiler)
to write a C program that fits in the transient area, but the minimal
library occupies about 4K, leaving 4K for code and variables.

whether to use assembler also depends partly upon the coding style used.
everyone writes it their own way unless compelled to do otherwise.
writing code that is meant to be modified by others requires discipline
and consistency.  at waterloo, we modify and maintain our own version of
VM/SP3 that is sufficiently different from others that it takes getting
used to.  it's all written in /370 assembler.  maintaining it is no more
trouble, in my opinion, than 4.2bsd code, not because /370 assembler is
so good, but because badly written C code (as most of 4.2bsd is) is
worse than well written assembler.

miscellaneous ramblings from the keyboard of
Herb Chong...

I'm user-friendly -- I don't byte, I nybble....

UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!water!watdcsu!herbie
CSNET: herbie%watdcsu@waterloo.csnet
ARPA:  herbie%watdcsu%waterloo.csnet@csnet-relay.arpa
NETNORTH, BITNET, EARN: herbie@watdcs, herbie@watdcsu

brownc@utah-cs.UUCP (Eric C. Brown) (04/12/85)

In article <1002@ecsvax.UUCP> hes@ecsvax.UUCP (Henry Schaffer) writes:
>How can any language (I don't know BLISS) produce a smaller program
>than does assembler (assuming a competent assembly programmer)?  I'm
>more interested in HOW than in a discussion of how many
>competent programmers there are.
>--henry schaffer

Well, it does things that any competent programmer could do, such as
global data-flow optimization, constant folding, variable lifetime
analysis, and so forth, but it does them over and over again without
a) getting confused or b) breaking the program.  The best description
of the BLISS-11 compiler is "The Design of an Optimizing Compiler", by
Wulf et al., published by (I believe) American Elsevier.

Eric C. Brown
brownc@utah-cs
..!seismo!utah-cs!brownc

Execute People, not Programs!!

henry@utzoo.UUCP (Henry Spencer) (04/14/85)

> How can any language (I don't know BLISS) produce a smaller program
> than does assembler (assuming a competent assembly programmer)?  I'm
> more interested in HOW than in a discussion of how many
> competent programmers there are.

Because producing really tight code is an essentially mechanical job,
requiring complex bookkeeping and application of many special tricks.
Programs are much better at this than human beings.  My recollection
is that a human assembler programmer could, given a lot of work, get
code comparable to that of the Bliss compilers for any given small
piece of code.  But the amount of work involved is prohibitive for any
substantial program.

Remember, we are not talking run-of-the-mill assembler programming here,
we are talking serious crunch-the-last-bit maximum-use-of-every-register
efforts.  Mundane compilers generally do not do as well as good human
assembler programmers, since the human programmer applies the simpler
sorts of optimizations routinely.  But real optimizing compilers put
more work into every piece of code than a human programmer can afford
for anything but the most crucial stuff.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

keithd@cadovax.UUCP (Keith Doyle) (04/18/85)

[...................]
> > How can any language (I don't know BLISS) produce a smaller program
> > than does assembler (assuming a competent assembly programmer)?  I'm
> > more interested in HOW than in a discussion of how many
> > competent programmers there are.
> 
> Because producing really tight code is an essentially mechanical job,
> requiring complex bookkeeping and application of many special tricks.
> Programs are much better at this than human beings.  My recollection
> is that a human assembler programmer could, given a lot of work, get
> code comparable to that of the Bliss compilers for any given small
> piece of code.  But the amount of work involved is prohibitive for any
> substantial program.
> 
> Remember, we are not talking run-of-the-mill assembler programming here,
> we are talking serious crunch-the-last-bit maximum-use-of-every-register
> efforts.  Mundane compilers generally do not do as well as good human
> assembler programmers, since the human programmer applies the simpler
> sorts of optimizations routinely.  But real optimizing compilers put
> more work into every piece of code than a human programmer can afford
> for anything but the most crucial stuff.
> -- 
> 				Henry Spencer @ U of Toronto Zoology
> 				{allegra,ihnp4,linus,decvax}!utzoo!henry

However, size is only one criterion that may exist for a given program.
Performance can also be an issue, and size and performance have no
direct correlation.  In some parts of a program, performance may be the
primary issue; size may be secondary, but still important.  Thus
a machine language programmer can decide where to use which techniques.
This may be an argument for smart compilers that produce really small
code while still allowing for machine-coded subroutines that use
table-driven techniques, spending space where it buys appropriate
performance.  Tradeoffs such as these have to be weighed, and different
parts of the same program may require different techniques in order to
optimize it for several conditions at once.

Keith Doyle
#  {ucbvax,ihnp4,decvax}!trwrb!cadovax!keithd

henry@utzoo.UUCP (Henry Spencer) (04/22/85)

With regard to speed vs. space in optimizing compilers, it is not
unheard-of for optimizing compilers to vary their tradeoffs depending
on the code, e.g. optimizing for speed rather than space inside nested
loops.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

@RUTGERS.ARPA:Bruce_A._Hamilton.OsbuSouth@Xerox.ARPA (04/24/85)

From: Hamilton.OsbuSouth@Xerox.ARPA


------------
    From: Doug Bryan <BRYAN@SU-SIERRA.ARPA>

    domains where size and speed are very important...
    are becoming fewer and fewer...readability and
    understandability become the major qualities of
    "good code".  
------------    
    From: Keith Doyle
    {ucbvax,ihnp4,decvax}!trwrb!cadovax!keithd

    Size and performance have no direct correlation.

---------------
This discussion should move to SOFT-ENG.

There's a lot of horsefeathers floating around here.  First, size and
speed will always be very important, except in the most trivial
applications.  After 30 years or so, I hope we've all learned that EVERY
computing resource (like every other human resource) gets used to the
max.  Especially in today's cut-throat marketplace, all of us except
MAYBE Grid and Cray will continue to try to stuff ten people into a VW,
so to speak.

Size and speed "versus" readability and understandability is a strawman.
If code is elegant (i.e. readable and understandable), it will tend to
be efficient in both size and speed.

There's no magic in HOL's.  The Xerox Star and Network Services are
coded entirely in Mesa (similar to Modula-2), in the
million-lines-of-code ballpark, supported by a smattering of
special-purpose microcode, and have encountered the whole ball of wax in
performance issues: time vs. space tradeoffs, project management
questions, how to indoctrinate new hires,...  I don't care HOW
high-level the language, questions of individual style arise and local
conventions must be arbitrated, in any large system.

I grant that HOL's give some readability advantage and (with type
checking) prevent a lot of dumb mistakes, but they also make it easy for
the incompetent to generate hundreds of lines of HOL garbage, instead of
dozens of lines of assembler garbage.  Nothing substitutes for talent.

Size and performance sure ARE correlated.  My disk doesn't swap in zero
time.  Even if you will soon be blessed with a personal workstation with
multimegs of RAM that can always hold your entire working set, that only
moves the problem to the database servers, bitmap processors, etc., that
you interact with.

--Bruce

jans@mako.UUCP (Jan Steinman) (04/25/85)

In article <5523@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>With regard to speed vs. space in optimizing compilers, it is not
>unheard-of for optimizing compilers to vary their tradeoffs depending
>on the code, e.g. optimizing for speed rather than space inside nested
>loops.

This is why Ada has:

	pragma OPTIMIZE (TIME);		-- Tell compiler to make it quick.
	pragma OPTIMIZE (SPACE);	-- Tell compiler to make it small.

Too bad nobody is wholeheartedly implementing OPTIMIZE yet.  (Apologies
in advance if your favorite Ada compiler does an adequate job of this!)

Most of the people in the "optimizing compiler camp" forget that the
basic mechanisms of a particular language are often slow and/or
unwieldy.  I will readily admit that a good optimizer can do about as
good a job on the internals of a procedure as an assembly coder, but it
doesn't have much say in the code needed to invoke the procedure.  (I
don't know anything about the much-discussed BLISS, so I can't comment
on it.)  Most languages get in the way of optimizers by having a "deep"
structure.

If a program is *designed* for a particular architecture, and is *skillfully*
coded in assembly, I have to throw my vote to Doug and the "assembly camp".

STANDARD "ASSEMBLY CAMP" COWARDLY DISCLAIMER:  Of course, the time and
expense of assembly code should only be used when it has been determined that
the best compilers available simply do not generate the desired code, based on
size and time specifications!  (Such as a Smalltalk Virtual Machine!)
-- 
:::::: Jan Steinman		Box 1000, MS 61-161	(w)503/685-2843 ::::::
:::::: tektronix!tekecs!jans	Wilsonville, OR 97070	(h)503/657-7703 ::::::
