[comp.sys.acorn] C versus ARM

kortink@utrcu1.UUCP (Kortink John) (02/23/91)

In article <1991Feb18.080958.16120@watdragon.waterloo.edu>
ccplumb@rose.uwaterloo.ca (Colin Plumb) writes :

>kortink@utrcu1.UUCP (Kortink John) wrote:
>>> [...] compiled BASIC is only a tenth the speed of efficiently written
>>> compiled C, and 1/12 that of raw ARM code.
>>
>> Utter nonsense. Good, optimized ARM code is at least twice as fast and
>> short as any C code.
>
>No; the ratio quoted (C is 17% slower) is typical of current compiler
>technology on non-disgusting machines (the Intel i860 is disgusting;
>the ARM is not).  If you can regularly achieve twice the speed of your
>compiler's output, then I suggest that your compiler is not very good.
>
>Some cases can still benefit from human sneakiness, but these are
>usually small.  The ARM is so wonderfully simple, it's hard to hide a
>possibility from a compiler.
>[...]

True, still there is a borderline somewhere along the path to the 'perfect'
code which a smart compiler can't pass, while a human can. It is the problem
content and the programmer's ARM skills that determine how much further the
human will go in the end.

If you need proof of the 1:2 ratio, look at Impression versus Acorn DTP.
If you need further proof, try writing the machine code for Translator in C.
I gradually get the impression (no pun intended) that 'C-only' programmers
don't really care about optimization, because they *think* the ratio is
10:12 (so why bother?).

>And in some cases (see Fred Brooks' The Mythical Man-Month), C (or other
>compiled language) can be faster, not because it does X faster than an
>assembler version, but because finishing it sooner let the C programmer
>notice that X could be replaced with Y, which is less work.
>[...]

That's just an example of bad thinking. Look before you leap.

John Kortink

-----------------------------------------------------------------------------
Student of Informatics at the University of Twente, The Netherlands
MAIL : kortink@utrcu1.uucp
DISCLAIMER : you know ....             "If language were liquid
                                        it would be rushing in
                                        Instead here we are
    Suzanne Vega (Solitude standing)    in a silence more eloquent
                                        than any word could ever be"
------------------------------------------------------------------------------

rst@cs.hull.ac.uk (Rob Turner) (03/01/91)

John Kortink writes:

> ... try writing the machine code for Translator in C.

This could result in some interesting insights.

If we took a machine code program and translated every instruction
into one C statement (which should be quite a simple task), and
then compiled the C, what would we end up with?

Let's look at some instructions (my memory is a bit rusty, because I
haven't programmed in ARM code for a while):

arm:    mov r0, r1
C:      r0 = r1;

arm:    add r0, r4, r6
C:      r0 = r4 + r6;

arm:    add r1, r2, r3, lsl #4
C:      r1 = r2 + (r3 << 4);

A decent ARM compiler should be able to compile each of those C
statements above back into one instruction (if you declared the
variables r0, r1, etc to be "register").
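
A minimal sketch of that translation, assuming the "registers" are
modelled as unsigned 32-bit C locals. The function name translate_demo
and its parameters are made up for illustration. Whether each statement
really comes back out as one instruction depends on the compiler; the
sketch only shows the C side.

```c
#include <stdint.h>

/* One C statement per ARM instruction, with the "registers" held in
   locals. translate_demo is a hypothetical name for this sketch.
   uint32_t arithmetic wraps modulo 2^32, like the ARM's. */
uint32_t translate_demo(uint32_t r2, uint32_t r3, uint32_t r4, uint32_t r6)
{
    uint32_t r0, r1;

    r0 = r4 + r6;          /* add r0, r4, r6         */
    r1 = r2 + (r3 << 4);   /* add r1, r2, r3, lsl #4 */
    r4 = r0 + r1;          /* add r4, r0, r1         */
    r0 = r4;               /* mov r0, r4             */
    return r0;
}
```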

I could envisage problems with compiling instructions which set/test
condition codes efficiently. But as long as all the tests were simple
(they *would* be if we were simulating machine code instructions),
again, the code sequences generated would be very similar to the
original.
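
As a sketch of the simple-test case, assume a count-down loop built from
a flag-setting subtract and a conditional branch. The function name
sum_down and the loop itself are illustrative, not taken from any real
program. The subs/bne pair maps onto a decrement followed by a test
against zero.

```c
#include <stdint.h>

/* Hypothetical example: sum n, n-1, ..., 1 with a count-down loop,
   written one C statement per ARM instruction. Assumes n >= 1, as the
   assembler loop would. */
uint32_t sum_down(uint32_t n)
{
    uint32_t r0 = n;       /* loop counter                        */
    uint32_t r1 = 0;       /* mov  r1, #0                         */

loop:
    r1 = r1 + r0;          /* add  r1, r1, r0                     */
    r0 = r0 - 1;           /* subs r0, r0, #1  (sets the Z flag)  */
    if (r0 != 0)           /* bne  loop        (tests the Z flag) */
        goto loop;

    return r1;
}
```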

As long as all our variables were global (as in [most] machine code),
and all our functions had no parameters, I find it hard to believe
that the resulting compiled program would be only half the speed of
the native ARM program. Surely it would be faster than that.

Rob

jroach@acorn.co.uk (Jonathan Roach) (03/04/91)

In article <6397.9103011248@olympus.cs.hull.ac.uk> rst@cs.hull.ac.uk (Rob Turner) writes:

> .... Large lump describing the translation of assembler to C deleted ...

>As long as all our variables were global (as in [most] machine code),
>and all our functions had no parameters, I find it hard to believe
>that the resulting compiled program would be only half the speed of
>the native ARM program. Surely it would be faster than that.

Here we can now focus on what really makes the difference between assembler
and C - the passing of arguments and results from functions. In Rob's
discussion of translating assembler to C he rightly concludes that the
function bodies will be at least as efficient when written in C as when
written in assembler; indeed the compiler may make a better job of
the optimisations than the original assembler programmer :-) (Aside: I note
that some optimisations can be done in assembler which the compiler cannot
safely do itself, and that some assembler instructions are simply not
accessible from C directly - these have an impact on the code size, but for
most academic programs (which don't have to mess with the processor mode,
and interface heavily with the OS) the effect is small. End aside). What the
compiler in this particular case gets really hammered on is the necessity of
storing away any of the changed 'registers' in their global locations before
calling any functions, and obtaining the values it needs from these global
locations after the function call. This is an enormous overhead! Also,
assuming use was made of function parameters and the return value, C is
still at a disadvantage as it can only return *one* value from a function.
This value, admittedly, could be a whole structure, but, given the particular
calling standard used on the ARM, this doesn't help as you're stuck with a
single 32-bit value.
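
A rough sketch of both points, with made-up names throughout (g_r0,
g_r1, helper, with_globals, divmod). In the first half the "registers"
are globals, so unless the compiler can see into helper it has to store
them before the call and reload them afterwards. The second half shows
the usual C workaround for handing back two results; how the structure
is actually returned depends on the calling standard, which the sketch
does not show.

```c
#include <stdint.h>

/* "Registers" kept in globals, as in the translated assembler program. */
uint32_t g_r0, g_r1;

/* Hypothetical callee. It may touch any global, so a caller that cannot
   see its body cannot keep g_r0/g_r1 in real registers across the call. */
void helper(void)
{
    g_r1 = g_r1 + 1;
}

uint32_t with_globals(uint32_t n)
{
    g_r0 = 0;
    g_r1 = 0;
    while (g_r0 < n) {
        g_r0 = g_r0 + 1;   /* has to be written to memory before the call */
        helper();
        /* g_r0 and g_r1 have to be read back from memory here */
    }
    return g_r1;
}

/* C returns one value; two results need a structure (or an out-parameter). */
struct divmod_result {
    uint32_t quot;
    uint32_t rem;
};

/* Assumes b != 0. */
struct divmod_result divmod(uint32_t a, uint32_t b)
{
    struct divmod_result r;
    r.quot = a / b;
    r.rem  = a % b;
    return r;
}
```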

So, in summary, when writing in C, you may (and probably will) gain in the
function bodies, just to have it thrown away again at function entry and
exit. So, what does this mean? Well, if the size of your final program is
paramount, and you're prepared to put up with a higher code maintenance
cost, then code in assembler (I have not yet been convinced that the
C:assembler ratio is better (for C) than 1.2:1, which goes up to about 2:1
if there's a lot of SWI calling to be done (which is very cumbersome when
done from C, but real natural from assembler :-)). If you're not too worried
about the final code size, then code carefully in C (with comments,
meaningful variable names (with syllables in them :-), indentation etc etc).
In any case, whatever you're coding in, write code that wouldn't boggle you
with its obscurity if someone else had written it and you had to figure out
what it did. Why? - you're going to be reading this code in about 2 weeks'
time when you've probably forgotten what the **** it did !!!

--Jonathan

klamer@mi.eltn.utwente.nl (Klamer Schutte -- Universiteit Twente) (03/06/91)

In <5538@acorn.co.uk> jroach@acorn.co.uk (Jonathan Roach) writes:

>In article <6397.9103011248@olympus.cs.hull.ac.uk> rst@cs.hull.ac.uk (Rob Turner) writes:

>>As long as all our variables were global (as in [most] machine code),
>>and all our functions had no parameters, I find it hard to believe
>>that the resulting compiled program would be only half the speed of
>>the native ARM program. Surely it would be faster than that.

>Here we can now focus on what really makes the difference between assembler
>and C - the passing of arguments and results from functions. In Rob's
 < stuff deleted >
>and interface heavily with the OS) the effect is small. End aside). What the
>compiler in this particular case gets really hammered on is the necessity of
>storing away any of the changed 'registers' in their global locations before
>calling any functions, and obtaining the values it needs from these global
>locations after the function call. This is an enormous overhead! Also,

This is a nice point you make there!
Of course, the solution is to use another (read: more efficient)
calling sequence. This will involve passing parameters in registers and
minimizing the number of registers declared scratch in the calling sequence.

>cost, then code in assembler (I have not yet been convinced that the
>C:assembler ratio is better (for C) than 1.2:1, which goes up to about 2:1
>if there's a lot of SWI calling to be done (which is very cumbersome when
>done from C, but real natural from assembler :-)). If you're not too worried

When you want an efficient SWI interface, make those a macro to inline
assembly.
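
A sketch of what such a macro might look like, assuming a compiler that
accepts GCC-style extended asm (an assumption; it is not a claim about
any particular Acorn compiler). USE_ARM_SWI, OS_WRITEC and write_string
are hypothetical names. The SWI shown is OS_WriteC (number 0), which
takes the character in R0. Without the flag the macro falls back to
putchar so the example still builds on other machines.

```c
#include <stdio.h>

/* USE_ARM_SWI is a hypothetical build flag for this sketch: define it
   only when building for RISC OS on ARM with GCC-style extended asm.
   "swi 0x00" is OS_WriteC; the character is passed in R0. */
#ifdef USE_ARM_SWI
#define OS_WRITEC(ch)                                  \
    do {                                               \
        register int r0_ __asm__("r0") = (ch);         \
        __asm__ volatile ("swi 0x00"                   \
                          : "+r" (r0_)                 \
                          :                            \
                          : "lr", "cc", "memory");     \
    } while (0)
#else
#define OS_WRITEC(ch) ((void)putchar(ch))
#endif

/* Write a string one character at a time; returns the number written. */
int write_string(const char *s)
{
    int n = 0;

    while (*s != '\0') {
        OS_WRITEC(*s);
        s++;
        n++;
    }
    return n;
}
```

With the flag defined, each OS_WRITEC expands in place to a register move
and one SWI instruction, with no function call in between.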

The changes I suggest mean that another C compiler is needed.
But C should run faster compared to assembler than it does now; one of
the reasons to go to RISC architectures (never mentioned in Acorn docs) is
that compilers only use a limited set of instructions, so why implement
the rest? Clearly something went wrong here in the ARM chip / high-level
language design.

Klamer

PS And a question for the better-informed: When running on an R160 + ARM3
   under RISCiX, a sample program without very much floating point
   performed at only 10% of a Sun SPARCstation 1+ (rated 15 MIPS).
   The ARM3 should be better than 1.5 (Sun) MIPS, shouldn't it?
   Unix overhead is not the answer, as the SS1+ ran SunOS 4.1 against
   Berkeley 4.3 for the ARM. Where does the difference come from?
-- 
Klamer Schutte
Faculty of electrical engineering -- University of Twente, The Netherlands
klamer@mi.eltn.utwente.nl	{backbone}!mcsun!mi.eltn.utwente.nl!klamer

adam@ste.dyn.bae.co.uk (Adam Curtin) (03/07/91)

In article <klamer.668261007@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte -- Universiteit Twente) writes:
>PS And a question for the better-informed: When running on an R160 + ARM3
>   under RISCiX, a sample program without very much floating point
>   performed at only 10% of a Sun SPARCstation 1+ (rated 15 MIPS).
>   The ARM3 should be better than 1.5 (Sun) MIPS, shouldn't it?
>   Unix overhead is not the answer, as the SS1+ ran SunOS 4.1 against
>   Berkeley 4.3 for the ARM. Where does the difference come from?

I think the main differences are memory and floating point. My old A310 has
excellent integer performance and subroutine call/return - on a simple Ackermann
test it's faster than IBM 6150 (PC/RT) and dumps all over Sun 3/60 and Compaq
DeskPro 386/25, while being a good bit slower than a SPARCstation 1 (12.5 MIPS).
I'd expect the ARM3 to be within 20% of a 1+ although I don't have access to
either of those (just ARM2 (4 MIPS) and SS2 (28 MIPS :-)).
What really killed the Arc against the big boys' toys was floating point
performance - the floating point emulator is desperately slow. In fact for
fp-intensive programs it's quicker in BASIC (which doesn't use the FPE). The
SS1+ has 1.4 MFLOPS where the FPE is best measured in KFLOPS :(
Depending on your benchmark, the other areas where I'd put money on the R160
being inferior to the Sun are memory access speed (80ns SIMMs on an SS1+),
memory capacity and virtual memory performance - faster disks etc. SunOS 4.1
also has a "tmpfs" file system which is effectively a RAM disk, which peps up
compiles a good bit.
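
For reference, the Ackermann test mentioned above is usually some variant
of the following. This is a sketch of the textbook recursive definition,
not the benchmark actually run on the A310, whose arguments and timing
loop are not given here. It does almost nothing but call and return,
which is why it favours a machine with cheap subroutine linkage.

```c
/* Two-argument Ackermann function: heavy on calls, light on arithmetic.
   Keep m small (m <= 3); the call count and recursion depth grow very
   quickly beyond that. */
unsigned long ackermann(unsigned long m, unsigned long n)
{
    if (m == 0)
        return n + 1;
    if (n == 0)
        return ackermann(m - 1, 1);
    return ackermann(m - 1, ackermann(m, n - 1));
}
```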

If any of these points dominate your benchmark, then I can well believe the
R160 takes ten times as long as a SPARCstation - fast integer performance isn't
everything you know!

I don't know why anyone would buy the RiscIx machines, to tell the truth. IN MY
OPINION the Sun IPC is a better computer for similar money.

Adam
-- 
/home/research/adam/.signature: No such file or directory