kortink@utrcu1.UUCP (Kortink John) (02/23/91)
In article <1991Feb18.080958.16120@watdragon.waterloo.edu> ccplumb@rose.uwaterloo.ca (Colin Plumb) writes:
>kortink@utrcu1.UUCP (Kortink John) wrote:
>>> [...] compiled BASIC is only a tenth the speed of efficiently written
>>> compiled C, and 1/12 that of raw ARM code.
>>
>> Utter nonsense. Good, optimized ARM code is at least twice as fast and
>> short as any C code.
>
>No; the ratio quoted (C is 17% slower) is typical of current compiler
>technology on non-disgusting machines (the Intel i860 is disgusting;
>the ARM is not). If you can regularly achieve twice the speed of your
>compiler's output, then I suggest that your compiler is not very good.
>
>Some cases can still benefit from human sneakiness, but these are
>usually small. The ARM is so wonderfully simple, it's hard to hide a
>possibility from a compiler.
>[...]

True, but there is still a borderline somewhere along the path to the
'perfect' code which a smart compiler can't pass, while a human can. It
is the problem content and the programmer's ARM skills that determine
how much further the human will go in the end.

If you need proof of the 1:2 ratio, look at Impression versus Acorn DTP.
If you need further proof, try writing the machinecode for Translator
in C.

I gradually get the impression (no pun intended) that 'C-only'
programmers don't really care about optimization, because they *think*
the ratio is 10:12 (so why bother?).

>And in some cases (see Fred Brooks' The Mythical Man-Month), C (or other
>compiled language) can be faster, not because it does X faster than an
>assembler version, but because finishing it sooner let the C programmer
>notice that X could be replaced with Y, which is less work.
>[...]

That's just an example of bad thinking. Look before you leap.

John Kortink

-----------------------------------------------------------------------------
Student of Informatics at the University of Twente, The Netherlands
MAIL       : kortink@utrcu1.uucp
DISCLAIMER : you know ....

"If language were liquid it would be rushing in
 Instead here we are                      Suzanne Vega (Solitude Standing)
 in a silence more eloquent than any word could ever be"
------------------------------------------------------------------------------
rst@cs.hull.ac.uk (Rob Turner) (03/01/91)
John Kortink writes:

> ... try writing the machinecode for Translator in C.

This could result in some interesting insights. If we took a machine
code program and translated every instruction into one C statement
(which should be quite a simple task), and then compiled the C, what
would we end up with?

Let's look at some instructions (my memory is a bit rusty, because I
haven't programmed in ARM code for a while):

    arm: mov r0, r1                 C: r0 = r1
    arm: add r0, r4, r6             C: r0 = r4 + r6
    arm: add r1, r2, r3, lsl #4     C: r1 = r2 + (r3 << 4)

A decent ARM compiler should be able to compile each of those C
statements above back into one instruction (if you declared the
variables r0, r1, etc to be "register").

I could envisage problems with compiling instructions which set/test
condition codes efficiently. But as long as all the tests were simple
(they *would* be if we were simulating machine code instructions),
again, the code sequences generated would be very similar to the
original.

As long as all our variables were global (as in [most] machine code),
and all our functions had no parameters, I find it hard to believe
that the resulting compiled program would be only half the speed of
the native ARM program. Surely it would be faster than that.

Rob
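As a rough illustration of the translation described in the post above, here is a minimal C sketch. It assumes the "registers" are modelled as global 32-bit variables, following the post's "all our variables were global" condition. The function names `translated_fragment` and `check_fragment` and the sample values are hypothetical and do not come from the original post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: one global per ARM register used below. */
uint32_t r0, r1, r2, r3, r4, r6;

/* One C statement per ARM instruction quoted in the post. */
void translated_fragment(void)
{
    r0 = r1;              /* mov r0, r1             */
    r0 = r4 + r6;         /* add r0, r4, r6         */
    r1 = r2 + (r3 << 4);  /* add r1, r2, r3, lsl #4 */
}

/* Small self-check using arbitrary sample values (an assumption). */
void check_fragment(void)
{
    r1 = 7; r2 = 1; r3 = 2; r4 = 10; r6 = 5;
    translated_fragment();
    assert(r0 == 15);  /* 10 + 5       */
    assert(r1 == 33);  /* 1 + (2 << 4) */
}
```

Whether a given compiler really emits one ARM instruction per statement depends on the compiler and its options; the sketch only shows the shape of the translated source.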
jroach@acorn.co.uk (Jonathan Roach) (03/04/91)
In article <6397.9103011248@olympus.cs.hull.ac.uk> rst@cs.hull.ac.uk (Rob Turner) writes:
> .... Large lump describing the translation of assembler to C deleted ...
>As long as all our variables were global (as in [most] machine code),
>and all our functions had no parameters, I find it hard to believe
>that the resulting compiled program would be only half the speed of
>the native ARM program. Surely it would be faster than that.

Here we can now focus on what really makes the difference between
assembler and C - the passing of arguments and results from functions.
In Rob's discussion of translating assembler to C he rightly concludes
that the function bodies will be at least as efficient when written in C
as when written in assembler; indeed the compiler may make a better job
of the optimisations than the original assembler programmer :-)

(Aside: I note that some optimisations can be done in assembler which
the compiler cannot safely do itself, and that some assembler
instructions are simply not accessible from C directly - these have an
impact on the code size, but for most academic programs (which don't
have to mess with the processor mode, and interface heavily with the OS)
the effect is small. End aside).

What the compiler in this particular case gets really hammered on is the
necessity of storing away any of the changed 'registers' in their global
locations before calling any functions, and obtaining the values it
needs from these global locations after the function call. This is an
enormous overhead!

Also, assuming use was made of function parameters and the return value,
C is still at a disadvantage as it can only return *one* value from a
function. This value, admittedly, could be a whole structure, but, given
the particular calling standard used on the ARM, this doesn't help as
you're stuck with a single 32-bit value.

So, in summary, when writing in C, you may (and probably will) gain in
the function bodies, just to have it thrown away again at function entry
and exit.

So, what does this mean? Well, if the size of your final program is
paramount, and you're prepared to put up with a higher code maintenance
cost, then code in assembler (I have not yet been convinced that the
C:assembler ratio is better (for C) than 1.2:1, which goes up to about
2:1 if there's a lot of SWI calling to be done (which is very cumbersome
when done from C, but real natural from assembler :-)). If you're not
too worried about the final code size, then code carefully in C (with
comments, meaningful variable names (with syllables in them :-),
indentation etc etc).

In any case, whatever you're coding in, write code that wouldn't boggle
you with its obscurity if you hadn't written it yourself and had to
figure out what it did. Why? - you're going to be reading this code in
about 2 weeks' time when you've probably forgotten what the **** it
did !!!

--Jonathan
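The store-and-reload overhead and the single-return-value limit described in this post can be sketched in C as follows. This is only an illustration under stated assumptions: the names `g_count`, `g_total`, `step`, `add_step`, `sum_with_globals`, `sum_with_locals` and `div_mod` are hypothetical, and how much a particular compiler actually stores and reloads depends on the compiler and the calling standard in use.

```c
#include <assert.h>

/* Hypothetical globals standing in for the simulated 'registers'. */
int g_count;
int g_total;

/* Works directly on the globals, like a parameterless assembler routine. */
void step(void)
{
    g_total += g_count;
}

/* Globals version: unless the compiler can see into the callee, the
   current values must be in memory before each call to step() and must
   be fetched again afterwards, because the callee may read or modify
   them. */
void sum_with_globals(int n)
{
    g_total = 0;
    for (g_count = 1; g_count <= n; g_count++)
        step();
}

/* Parameter version: values travel as arguments and a return value, so
   the caller's locals are free to live in registers. */
static int add_step(int total, int count)
{
    return total + count;
}

int sum_with_locals(int n)
{
    int total = 0;
    int count;

    for (count = 1; count <= n; count++)
        total = add_step(total, count);
    return total;
}

/* Two results from one call: the quotient is the return value, while
   the remainder has to go back through memory via a pointer. */
unsigned div_mod(unsigned num, unsigned den, unsigned *rem)
{
    assert(den != 0);
    *rem = num % den;
    return num / den;
}
```

Both summing functions compute the same result; the difference lies only in where the intermediate values are allowed to live across the call. An assembler routine could hand back a quotient and a remainder in two registers, whereas `div_mod` shows the C workaround of passing a pointer for the second result.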
klamer@mi.eltn.utwente.nl (Klamer Schutte -- Universiteit Twente) (03/06/91)
In <5538@acorn.co.uk> jroach@acorn.co.uk (Jonathan Roach) writes:
>In article <6397.9103011248@olympus.cs.hull.ac.uk> rst@cs.hull.ac.uk (Rob Turner) writes:
>>As long as all our variables were global (as in [most] machine code),
>>and all our functions had no parameters, I find it hard to believe
>>that the resulting compiled program would be only half the speed of
>>the native ARM program. Surely it would be faster than that.
>Here we can now focus on what really makes the difference between assembler
>and C - the passing of arguments and results from functions. In Rob's
< stuff deleted >
>and interface heavily with the OS) the effect is small. End aside). What the
>compiler in this particular case gets really hammered on is the necessity of
>storing away any of the changed 'registers' in their global locations before
>calling any functions, and obtaining the values it needs from these global
>locations after the function call. This is an enormous overhead! Also,

This is a nice point you make there! Of course, the solution is to use
another (read: more efficient) calling sequence. This will involve
passing parameters in registers and minimizing the number of registers
declared scratch in the calling sequence.

>cost, then code in assembler (I have not yet been convinced that the
>C:assembler ratio is better (for C) than 1.2:1, which goes up to about 2:1
>if there's a lot of SWI calling to be done (which is very cumbersome when
>done from C, but real natural from assembler :-)). If you're not too worried

When you want an efficient SWI interface, make each SWI call a macro
that expands to inline assembly.

The changes I suggest mean that another C compiler is needed. But C
should run faster compared to assembler than it does now; one of the
reasons to go to RISC architectures (never mentioned in Acorn docs) is
that compilers only use a limited set of instructions, so why implement
the rest? Clearly something went wrong here in the ARM chip / high level
language design.

Klamer

PS And a question for the better-informed: When running on an R160 + ARM3
   under RISCiX a sample program without very much floating point
   performed at only 10% of a Sun SPARCstation 1+ (rated 15 MIPS).
   The ARM3 should be better than 1.5 (Sun) MIPS, shouldn't it?
   Unix overhead is not the answer as the SS1+ ran SunOS 4.1 against
   Berkeley 4.3 for the ARM. Where does the difference come from?

--
Klamer Schutte
Faculty of electrical engineering -- University of Twente, The Netherlands
klamer@mi.eltn.utwente.nl       {backbone}!mcsun!mi.eltn.utwente.nl!klamer
adam@ste.dyn.bae.co.uk (Adam Curtin) (03/07/91)
In article <klamer.668261007@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte -- Universiteit Twente) writes:
>PS And a question for the better-informed: When running on an R160 + ARM3
>   under RISCiX a sample program without very much floating point
>   performed at only 10% of a Sun SPARCstation 1+ (rated 15 MIPS).
>   The ARM3 should be better than 1.5 (Sun) MIPS, shouldn't it?
>   Unix overhead is not the answer as the SS1+ ran SunOS 4.1 against
>   Berkeley 4.3 for the ARM. Where does the difference come from?

I think the main differences are memory and floating point. My old A310
has excellent integer performance and subroutine call/return - on a
simple Ackermann test it's faster than an IBM 6150 (PC/RT) and dumps all
over a Sun 3/60 and a Compaq DeskPro 386/25, while being a good bit
slower than a SPARCstation 1 (12.5 MIPS). I'd expect the ARM3 to be
within 20% of a 1+, although I don't have access to either of those
(just ARM2 (4 MIPS) and SS2 (28 MIPS :-)).

What really killed the Arc against the big boys' toys was floating point
performance - the floating point emulator is desperately slow. In fact
for fp-intensive programs it's quicker in BASIC (which doesn't use the
FPE). The SS1+ has 1.4 MFLOPS where the FPE is best measured in
KFLOPS :(

Depending on your benchmark, the other areas where I'd put money on the
R160 being inferior to the Sun are memory access speed (80ns SIMMs on an
SS1+), memory capacity and virtual memory performance - faster disks
etc. SunOS 4.1 also has a "tmpfs" file system which is effectively a RAM
disk, which peps up compiles a good bit.

If any of these points dominate your benchmark, then I can well believe
the R160 takes ten times as long as a SPARCstation - fast integer
performance isn't everything you know!

I don't know why anyone would buy the RISCiX machines, to tell the
truth. IN MY OPINION the Sun IPC is a better computer for similar money.

Adam
--
/home/research/adam/.signature: No such file or directory
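For readers unfamiliar with the Ackermann test mentioned in this post: it is a small, deeply recursive function often used to stress subroutine call and return rather than arithmetic. The post does not say which variant or which arguments were used, so the two-argument form below, and the names `ackermann` and `check_ackermann`, are assumptions made purely for illustration.

```c
#include <assert.h>

/* Two-argument Ackermann function (an assumed variant; the post does not
   specify one). Nearly all of the work is calls and returns. */
unsigned long ackermann(unsigned long m, unsigned long n)
{
    if (m == 0)
        return n + 1;
    if (n == 0)
        return ackermann(m - 1, 1);
    return ackermann(m - 1, ackermann(m, n - 1));
}

/* Sanity checks against well-known small values. */
void check_ackermann(void)
{
    assert(ackermann(0, 0) == 1);
    assert(ackermann(1, 2) == 4);
    assert(ackermann(2, 3) == 9);
    assert(ackermann(3, 3) == 61);
}
```

A benchmark built on this would typically time a call with a larger second argument; the values here are kept small so that recursion depth stays modest.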