stuart@bms-at.UUCP (Stuart Gathman) (04/27/89)
There is one interesting feature of RISC architecture that I haven't seen mentioned much: the fact that an emulated/interpreted program will often run faster than a compiled one. I first noticed this in benchmark results for the ARM, where BASIC ran the benchmarks faster than C!

The conditions necessary for this to take place are:
1) the system uses slow main memory with a high-speed cache;
2) the emulated code is significantly smaller than the equivalent direct code;
3) the emulator mostly fits in the cache;
4) the emulator overhead is low enough not to swamp the benefits of 2.

The most effective emulators are "semi-compiled", i.e. variable references and branch targets are resolved prior to execution. Table searches will kill condition 4. RM-cobol would be a typical example of an emulator meeting these conditions. (The interpreter is large, but the core routines are small.)

This approach turns a RISC machine into a CISC machine with user-defined microcode. Unlike loadable microcodes of yesteryear, the cache is swapped by line on a demand basis.
--
Stuart D. Gathman  <stuart@bms-at.uucp>
                   <..!{vrdxhq|daitc}!bms-at!stuart>
peter@ficc.uu.net (Peter da Silva) (04/28/89)
In article <158@bms-at.UUCP>, stuart@bms-at.UUCP (Stuart Gathman) writes:
> There is one interesting feature of RISC architecture that I haven't seen
> mentioned much: the fact that an emulated/interpreted program will often
> run faster than a compiled one.

[explanation of why omitted]

Wow, the 1802 looks RISCier all the time. The Forth inner-interpreter call (docol) was faster than the in-line call (standard call/return technique). And a good direct-token-threaded system, with byte tokens, was quite fast.
--
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.
les@unicads.UUCP (Les Milash) (04/28/89)
In article <4015@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <158@bms-at.UUCP>, stuart@bms-at.UUCP (Stuart Gathman) writes:
>> [...] emulation/interpretation on RISCs [...]

the scheme-("screme")-on-88K article in ASPLOS-III is neat

>Wow, the 1802 looks RISCier all the time.
----------^^^^ chough choke ghasp!

yes it fully exposes starvation-for-data to the compiler. this is the chip for which the term "memory bottlewidth" was coined.

i'm kind of fond of it, tho; single-step in hardware, would run off a lantern battery, would run at .1Hz (for hands-off single-stepping) if you had one of them RC clocks.

moLester
peter@ficc.uu.net (Peter da Silva) (04/29/89)
In article <408@unicads.UUCP>, les@unicads.UUCP (Les Milash) writes:
> In article <4015@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
> >Wow, the 1802 looks RISCier all the time.
> ----------^^^^ chough choke ghasp!
> i'm kind of fond of it, tho;

Pretty instruction set, too. Rather like Japanese minimalist art. A Haiku of an architecture. The computer equivalent of one of those temple gardens...
seanf@sco.COM (Sean Fagan) (05/01/89)
In article <158@bms-at.UUCP> stuart@bms-at.UUCP (Stuart Gathman) writes:
>There is one interesting feature of RISC architecture that I haven't seen
>mentioned much: the fact that an emulated/interpreted program will often
>run faster than a compiled one.
>I first noticed this in benchmark results for the ARM where BASIC ran the
>benchmarks faster than C!

*sigh* This is the result of a nice BASIC interpreter, written in highly-optimized, hand-coded assembly language, versus a poor C compiler. You can get the same results on an Apple ][, but that's more because most C compilers for it are not all that great.

Also, there's a good chance that the "benchmark" you saw had many string operations; BASIC is good at that, while C isn't (BASIC can also do floating-point operations in single precision, while C generally doesn't). And, since (if I remember correctly) the ARM doesn't have a floating-point unit, not having to convert from single precision to double precision (in software only!) can be a *big* win.

In other words, it's not a feature of RISC architecture.
--
Sean Eric Fagan  | "An acid is like a woman: a good one will eat
seanf@sco.UUCP   |  through your pants." -- Mel Gibson, Saturday Night Live
(408) 458-1422   | Any opinions expressed are my own, not my employer's.
rcbaps@eutrc3.UUCP (Pieter Schoenmakers) (05/02/89)
In article <2624@scolex.sco.COM> seanf@scolex.UUCP (Sean Fagan) writes:
>[...]
>Also, there's a good chance that the "benchmark" you saw had many string
>operations; BASIC is good at that, while C isn't (BASIC can also do floating
>point operations in single-precision, while C generally doesn't). And,
>since (if I remember correctly) the ARM doesn't have a floating-point unit,
>being able to not have to convert from single-precision to double-precision
>(using only software!) can be a *big* win.
>
>In other words, it's not a feature of RISC architecture.

The ARM BASIC V interpreter uses 5-byte (non-IEEE) floating-point arithmetic, without using the FP coprocessor. The C compiler uses the FP coprocessor; if it is not fitted, the FP Emulator is used. All arithmetic is performed according to the IEEE standard in full precision, and that takes time. Thus, to compare the two properly, the FP coprocessor should be added to the system.

In other words: it's not a feature (of any architecture).

Tiggr
--
| Pieter 'Tiggr' Schoenmakers | What Informix presented to the world as being |
| rcbaps@eutrc3.uucp          | revolutionary is in fact a really bad program |
| rcgbbaps@heitue51.bitnet    | and not even worth one dollar --- about WingZ |
++ All opinions expressed herein which are not quoted are mine! Mine! MINE! +++
ath@helios.prosys.se (Anders Thulin) (05/02/89)
In article <2624@scolex.sco.COM> seanf@scolex.UUCP (Sean Fagan) writes:
>In article <158@bms-at.UUCP> stuart@bms-at.UUCP (Stuart Gathman) writes:
>>I first noticed this in benchmark results for the ARM where BASIC ran the
>>benchmarks faster than C!
>
>*sigh* This is the result of a nice BASIC interpreter, written in
>highly-optimized, hand-coded assembly language, versus a poor C compiler.

The ARM C compiler (from Norcroft) is actually quite good. I wouldn't expect a compute-bound integer benchmark to run faster in Basic than in C. A floating-point benchmark probably would, though, as the ARM Basic uses its own FP format, while C uses the IEEE emulator.

>In other words, it's not a feature of RISC architecture.

This may still be true, though.
--
Anders Thulin                  INET : ath@prosys.se
Programsystem AB               UUCP : ...!{uunet,mcvax}!sunic!prosys!ath
Teknikringen 2A                PHONE: +46 (0)13 21 40 40
S-583 30 Linkoping, Sweden     FAX  : +46 (0)13 21 36 35
sam@lfcs.ed.ac.uk (S. Manoharan) (05/02/89)
Just wondering.

I hear that the RISC (Berk) was designed with a view of supporting C.
How would the register windowing help in processing C functions?
( In C args are passed by value rather than reference;
Register windowing, on the other hand, supports call by reference )

Voice: 031-667 5076             S. Manoharan
Janet: sam@uk.ac.ed.lfcs        Dept of Computer Science
Uucp : ..!mcvax!ukc!lfcs!sam    University of Edinburgh
Arpa : sam%lfcs.ed.ac.uk@nsfnet-relay.ac.uk    Edinburgh EH9 3JZ UK.
frazier@oahu.cs.ucla.edu (Greg Frazier) (05/03/89)
In article <1896@etive.ed.ac.uk> sam@lfcs.ed.ac.uk (S. Manoharan) writes:
>
>Just wondering.
>
>I hear that the RISC (Berk) was designed with a view of supporting C.
>How would the register windowing help in processing C functions!
>( In C args are passed by value rather than reference;
>Register windowing, on the other hand, supports call by reference )

The use of overlapped windows supports passing by either reference or value (a register can hold an address as easily as an int). The RISC I and RISC II were tuned for C in that extensive studies of C code were used to determine 1) the size of the windows and 2) the number of registers that should be overlapped. An interesting question is how much difference the language makes: to what degree does the language really influence the number of parameters passed to a procedure/function?

In addition to the register windows, the instruction set was chosen with an eye to which instructions were frequently used by C compilers. Beyond the instruction set and the register file, there isn't much to the RISC machines, so if they were optimized for C, then the whole chip was optimized for C :-).

Greg
***********************########################!!!!!!!!!!!!!!!!!!!!
Greg Frazier          o        Internet: frazier@CS.UCLA.EDU
CS dept., UCLA       /\        UUCP: ...!{ucbvax,rutgers}!ucla-cs!frazier
                 ----^/----
                     /
johnl@ima.ima.isc.com (John R. Levine) (05/04/89)
In article <1896@etive.ed.ac.uk> sam@lfcs.ed.ac.uk (S. Manoharan) writes:
>I hear that the RISC (Berk) was designed with a view of supporting C.
>How would the register windowing help in processing C functions!

The RISC group did not presuppose a particularly good C compiler. I expect they did their work with PCC. As a result, the fixed per-function overhead of call and return was clearly a problem, and register windows let you make function call and return faster without making the compiler any smarter, by moving the normally time-consuming register save and restore into hardware.

The IBM 801 project, which was looking at similar problems at about the same time, worked with some of IBM's best compiler people. Many of their decisions were similar to the RISC group's, e.g. fixed-length instructions that execute in one cycle, but their register model is quite different: they have 32 conventional registers. They found that their compiler could do an excellent job of register allocation at compile time, including minimizing saves across procedure calls, so they could use the chip real estate that might have been allocated to an enormous register file for other things.
--
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
{ bbn | spdcc | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Massachusetts has 64 licensed drivers who are over 100 years old.  -The Globe
jkrueger@daitc.daitc.mil (Jonathan Krueger) (05/10/89)
In article <158@bms-at.UUCP>, stuart@bms-at (Stuart Gathman) writes:
>There is one interesting feature of RISC architecture that I haven't seen
>mentioned much: the fact that an emulated/interpreted program will often
>run faster than a compiled one.
>...
>This approach turns a RISC machine into a CISC machine with user defined
>microcode. Unlike loadable microcodes of yesteryear, the cache is swapped
>by line on a demand basis.

One might also cite relational engines for DBMS, which interpret query languages. Even when executing "compiled" procedures, they do not generate standalone images; they just pre-fetch and "decode" (parse and optimize) the query. Clearly this approach yields a more flexible system than linking (ISAM, VSAM, etc.) routines into each image.

The surprising part is that it can be faster, too. One reason is that the engine can do a better job of staying in cache than the images. The engine implements common functions that service everyone, and the most often used ones are found in cache. (Sets of images lose even when the operating system supports execution from shared memory, because it's hard to decide at runtime whether each image's routines can service other images.) And unlike loadable microcodes of yesteryear, they allocate fast memory in response to usage patterns that vary at runtime.

So next time you get a new release of your RDBMS, just tell management that you're installing new microcode :-)
--
Jon
Jonathan Krueger  ...uunet!daitc!jkrueger  jkrueger@daitc.mil  (703) 998-4600
My opinions are not necessarily those of my wallpaper.