[comp.arch] RISC and emulated languages

stuart@bms-at.UUCP (Stuart Gathman) (04/27/89)

There is one interesting feature of RISC architecture that I haven't seen
mentioned much: the fact that an emulated/interpreted program will often
run faster than a compiled one.

I first noticed this in benchmark results for the ARM where BASIC ran the
benchmarks faster than C!

The conditions necessary for this to take place are:

1)	the system uses slow main memory with a high speed cache.
2)	the emulated code is significantly smaller than equivalent
	direct code.
3)	the emulator mostly fits in the cache.
4)	the emulator overhead is low enough not to swamp the benefits
	of 2.
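
A rough way to see how the four conditions trade off is a
back-of-the-envelope cost model.  The sketch below is an illustration only:
the function names, the parameters, and the assumption that the native code
never hits in the cache are all hypothetical, and none of it was measured on
an ARM or on any other machine.

```python
# Toy cost model for conditions 1-4. Every parameter is hypothetical.

def compiled_cycles(native_words, miss_cycles):
    """Native code too large for the cache: each word comes from slow memory."""
    return native_words * miss_cycles


def interpreted_cycles(native_words, density, overhead, hit_cycles, miss_cycles):
    """The same work expressed as compact emulated code.

    density  -- native words replaced by one emulated word (condition 2)
    overhead -- interpreter instructions executed per emulated word, all
                assumed to hit in the cache (conditions 3 and 4)
    """
    emulated_words = native_words / density
    fetch = emulated_words * miss_cycles                # emulated code, slow memory
    execute = emulated_words * overhead * hit_cycles    # interpreter, from cache
    return fetch + execute


def interpreter_wins(density, overhead, hit_cycles, miss_cycles):
    """Break-even test obtained by comparing the two functions above."""
    return overhead * hit_cycles < (density - 1) * miss_cycles
```

Under this model the interpreter only wins while the miss penalty is large
compared with a cache hit (condition 1); with a small miss penalty the same
density and overhead lose, which is condition 4 failing.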

The most effective emulators are "semi-compiled", i.e. variable references
and branch targets are resolved prior to execution.  Table searches will
kill condition 4.  RM-cobol would be a typical example of an emulator meeting
these conditions.  (The interpreter is large, but the core routines are
small.)
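
As a minimal sketch of what "semi-compiled" means here, the Python below
resolves variable names to slot numbers and labels to code indices once,
before execution, so the run loop never searches a table.  The instruction
set (SET, ADD, DEC, JNZ, LABEL) and all function names are invented for this
example; it is not how RM-cobol or any particular emulator works.

```python
# "Semi-compiled" emulator sketch: resolve names and labels before running.
# The opcodes and the sample program are hypothetical.

def resolve(source):
    """Replace variable names with slot numbers and labels with code indices."""
    slots = {}    # variable name -> slot number
    labels = {}   # label name -> index of the instruction that follows it
    body = []
    for op, *args in source:          # first pass: collect labels
        if op == "LABEL":
            labels[args[0]] = len(body)
        else:
            body.append((op, args))

    def slot(name):
        return slots.setdefault(name, len(slots))

    code = []
    for op, args in body:             # second pass: resolve operands
        if op == "SET":               # SET var, constant
            code.append((op, slot(args[0]), args[1]))
        elif op == "ADD":             # ADD dst, src
            code.append((op, slot(args[0]), slot(args[1])))
        elif op == "DEC":             # DEC var
            code.append((op, slot(args[0]), 0))
        elif op == "JNZ":             # JNZ var, label
            code.append((op, slot(args[0]), labels[args[1]]))
        else:
            raise ValueError("unknown op: " + op)
    return code, slots


def run(code, nslots):
    """Execute resolved code; operands are already plain integers."""
    mem = [0] * nslots
    pc = 0
    while pc < len(code):
        op, a, b = code[pc]
        pc += 1
        if op == "SET":
            mem[a] = b
        elif op == "ADD":
            mem[a] += mem[b]
        elif op == "DEC":
            mem[a] -= 1
        elif op == "JNZ" and mem[a] != 0:
            pc = b
    return mem


PROGRAM = [
    ("SET", "n", 10), ("SET", "total", 0),
    ("LABEL", "loop"),
    ("ADD", "total", "n"), ("DEC", "n"), ("JNZ", "n", "loop"),
]
code, slots = resolve(PROGRAM)
mem = run(code, len(slots))
```

A real emulator would presumably go one step further and use numeric opcodes
with a jump table instead of string comparisons; the point here is only that
the symbol and label lookups happen in resolve(), not in run().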

This approach turns a RISC machine into a CISC machine with user defined
microcode.  Unlike loadable microcodes of yesteryear, the cache is swapped
by line on a demand basis.
-- 
Stuart D. Gathman	<stuart@bms-at.uucp>
			<..!{vrdxhq|daitc}!bms-at!stuart>

peter@ficc.uu.net (Peter da Silva) (04/28/89)

In article <158@bms-at.UUCP>, stuart@bms-at.UUCP (Stuart Gathman) writes:
> There is one interesting feature of RISC architecture that I haven't seen
> mentioned much: the fact that an emulated/interpreted program will often
> run faster than a compiled one.

	[explanation of why omitted]

Wow, the 1802 looks RISCier all the time. The Forth inner-interpreter call
(docol) was faster than the in-line call (standard call/return technique).
And a good direct-token-threaded system, with byte tokens, was quite fast.
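
For readers who have not seen token threading, here is a small sketch of an
inner interpreter driven by byte tokens.  It is a model in Python, not 1802
code: the token numbers, the word names, and the explicit CALL token with a
one-byte address are simplifications assumed for illustration (a real
token-threaded Forth would normally give each colon definition its own
token).

```python
# Byte-token inner interpreter sketch. Token numbers are hypothetical.

class VM:
    def __init__(self, code, entry):
        self.code = code        # bytes: tokens with inline one-byte operands
        self.ip = entry         # interpreter pointer
        self.data = []          # data stack
        self.ret = []           # return stack
        self.running = True


def op_lit(vm):                 # push the byte that follows the token
    vm.data.append(vm.code[vm.ip])
    vm.ip += 1

def op_add(vm):
    b = vm.data.pop()
    vm.data.append(vm.data.pop() + b)

def op_dup(vm):
    vm.data.append(vm.data[-1])

def op_mul(vm):
    b = vm.data.pop()
    vm.data.append(vm.data.pop() * b)

def op_call(vm):                # plays the role of docol: nest into a word
    target = vm.code[vm.ip]
    vm.ret.append(vm.ip + 1)    # resume after the operand byte
    vm.ip = target

def op_exit(vm):                # unnest: return to the calling word
    vm.ip = vm.ret.pop()

def op_halt(vm):
    vm.running = False


LIT, ADD, DUP, MUL, CALL, EXIT, HALT = range(7)
DISPATCH = [op_lit, op_add, op_dup, op_mul, op_call, op_exit, op_halt]


def run(code, entry=0):
    vm = VM(code, entry)
    while vm.running:           # the inner interpreter: fetch token, dispatch
        token = vm.code[vm.ip]
        vm.ip += 1
        DISPATCH[token](vm)
    return vm.data


# SQUARE lives at address 0; the main word starts at address 3.
CODE = bytes([DUP, MUL, EXIT,
              LIT, 7, CALL, 0, LIT, 1, ADD, HALT])
stack = run(CODE, entry=3)      # computes 7 * 7 + 1
```

The whole main word is eight bytes, which is the code-density side of the
argument; whether the dispatch loop is cheap enough is a property of the
target machine and is not something this model can show.
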
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.

les@unicads.UUCP (Les Milash) (04/28/89)

In article <4015@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <158@bms-at.UUCP>, stuart@bms-at.UUCP (Stuart Gathman) writes:
>> [...] emulation/interpretation on RISCs [...]
the scheme-("screme")-on-88K article in ASPLOS-III is neat
>
>
>Wow, the 1802 looks RISCier all the time.
----------^^^^ chough choke ghasp!  yes it fully exposes 
starvation-for-data to the compiler.  this is the chip
for which the term "memory bottlewidth" was coined.  
i'm kind of fond of it, tho; single-step in hardware,
would run off a lantern battery, would run at .1Hz
(for hands-off single-stepping) if you had one of them
RC clocks.

moLester

peter@ficc.uu.net (Peter da Silva) (04/29/89)

In article <408@unicads.UUCP>, les@unicads.UUCP (Les Milash) writes:
> In article <4015@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
> >Wow, the 1802 looks RISCier all the time.
> ----------^^^^ chough choke ghasp!

> i'm kind of fond of it, tho;

Pretty instruction set, too. Rather like Japanese minimalist art. A Haiku
of an architecture. The computer equivalent of one of those temple
gardens...

seanf@sco.COM (Sean Fagan) (05/01/89)

In article <158@bms-at.UUCP> stuart@bms-at.UUCP (Stuart Gathman) writes:
>There is one interesting feature of RISC architecture that I haven't seen
>mentioned much: the fact that an emulated/interpreted program will often
>run faster than a compiled one.
>I first noticed this in benchmark results for the ARM where BASIC ran the
>benchmarks faster than C!

*sigh*  This is the result of a nice BASIC interpreter, written in
highly-optimized, hand-coded assembly language, versus a poor C compiler.

You can get the same results on an Apple ][, but that's more because most C
compilers for it are not all that great.

Also, there's a good chance that the "benchmark" you saw had many string
operations; BASIC is good at that, while C isn't (BASIC can also do floating
point operations in single-precision, while C generally doesn't).  And,
since (if I remember correctly) the ARM doesn't have a floating-point unit,
being able to not have to convert from single-precision to double-precision
(using only software!) can be a *big* win.

In other words, it's not a feature of RISC architecture.

-- 
Sean Eric Fagan  | "An acid is like a woman:  a good one will eat
seanf@sco.UUCP   |  through your pants." -- Mel Gibson, Saturday Night Live
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

rcbaps@eutrc3.UUCP (Pieter Schoenmakers) (05/02/89)

In article <2624@scolex.sco.COM> seanf@scolex.UUCP (Sean Fagan) writes:
>[...]
>Also, there's a good chance that the "benchmark" you saw had many string
>operations; BASIC is good at that, while C isn't (BASIC can also do floating
>point operations in single-precision, while C generally doesn't).  And,
>since (if I remember correctly) the ARM doesn't have a floating-point unit,
>being able to not have to convert from single-precision to double-precision
>(using only software!) can be a *big* win.
>
>In other words, it's not a feature of RISC architecture.

The ARM BASIC V interpreter uses 5-byte (non-IEEE) floating-point arithmetic,
without using the FP coprocessor. The C compiler uses the FP coprocessor.
If it is not fitted, the FP Emulator is used. All arithmetic is performed
according to IEEE standards in full precision. That takes time.
   Thus, to compare the two fully, the FP Coprocessor should be added to the
system.

In other words: It's not a feature (of any architecture).

Tiggr

-- 
| Pieter 'Tiggr' Schoenmakers | What Informix presented to the world as being |
| rcbaps@eutrc3.uucp          | revolutionary is in fact a really bad program |
| rcgbbaps@heitue51.bitnet    | and not even worth one dollar --- about WingZ |
++ All opinions expressed herein which are not quoted are mine! Mine! MINE! +++

ath@helios.prosys.se (Anders Thulin) (05/02/89)

In article <2624@scolex.sco.COM> seanf@scolex.UUCP (Sean Fagan) writes:
>In article <158@bms-at.UUCP> stuart@bms-at.UUCP (Stuart Gathman) writes:
>>I first noticed this in benchmark results for the ARM where BASIC ran the
>>benchmarks faster than C!
>
>*sigh*  This is the result of a nice BASIC interpreter, written in
>highly-optimized, hand-coded assembly language, versus a poor C compiler.

The ARM C compiler (from Norcroft) is actually quite good. I wouldn't
expect a compute-bound integer benchmark to run faster in Basic than
in C.  A floating-point benchmark probably would, though, as the ARM
Basic uses its own FP format, while C uses the IEEE emulator.

>In other words, it's not a feature of RISC architecture.

This may still be true, though.

-- 
Anders Thulin			INET : ath@prosys.se
Programsystem AB		UUCP : ...!{uunet,mcvax}!sunic!prosys!ath
Teknikringen 2A			PHONE: +46 (0)13 21 40 40
S-583 30 Linkoping, Sweden	FAX  : +46 (0)13 21 36 35

sam@lfcs.ed.ac.uk (S. Manoharan) (05/02/89)

Just wondering.

I hear that the RISC (Berkeley) was designed with a view to supporting C.
How would register windowing help in processing C functions?
(In C, args are passed by value rather than by reference;
register windowing, on the other hand, supports call by reference.)


Voice: 031-667 5076                          S. Manoharan
Janet: sam@uk.ac.ed.lfcs                     Dept of Computer Science
Uucp : ..!mcvax!ukc!lfcs!sam                 University of Edinburgh
Arpa : sam%lfcs.ed.ac.uk@nsfnet-relay.ac.uk  Edinburgh EH9 3JZ    UK.

frazier@oahu.cs.ucla.edu (Greg Frazier) (05/03/89)

In article <1896@etive.ed.ac.uk> sam@lfcs.ed.ac.uk (S. Manoharan) writes:
>
>Just wondering.
>
>I hear that the RISC (Berk) was designed with a view of supporting C.
>How would the register windowing help in processing C functions!
>( In C args are passed by value rather than reference;
>Register windowing, on the other hand, supports call by reference )

The use of overlapped windows supports passing by either reference
or value (a reg can hold an address as easily as an int).  The RISC I
and RISC II were tuned for C in that extensive studies of C code were
used to determine 1) the size of the windows and 2) the number of
regs which should be overlapped.  An interesting question is how
much difference the language makes - to what degree does the language
really influence the number of parameters passed to a procedure/function?

In addition to the reg windows, the instruction set was chosen with
an eye to which instructions were frequently used by C compilers.
Beyond the instruction set and the register file, there isn't much to the
RISC machines, so if they were optimized for C, then the whole chip
was optimized for C :-).
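
To make the overlap concrete, here is a toy model of overlapped register
windows.  The class name, the method names, and the default sizes (8 windows,
10 local and 6 overlapping registers) are parameters assumed for
illustration, not a specification of RISC I or RISC II; global registers and
window overflow/underflow traps are left out.

```python
# Toy model of overlapped register windows. Sizes are illustrative only.

class WindowedRegisters:
    def __init__(self, windows=8, n_local=10, n_overlap=6):
        self.windows = windows
        self.n_local = n_local
        self.n_overlap = n_overlap
        self.stride = n_local + n_overlap           # physical regs per window
        self.phys = [0] * (windows * self.stride)   # circular register file
        self.cwp = 0                                # current window pointer

    def _index(self, kind, n):
        # "out" of this window is the same storage as "in" of the next one.
        offsets = {"in": 0, "local": self.n_overlap, "out": self.stride}
        limit = self.n_local if kind == "local" else self.n_overlap
        if kind not in offsets or not 0 <= n < limit:
            raise ValueError("bad register: %s %d" % (kind, n))
        return (self.cwp * self.stride + offsets[kind] + n) % len(self.phys)

    def read(self, kind, n):
        return self.phys[self._index(kind, n)]

    def write(self, kind, n, value):
        self.phys[self._index(kind, n)] = value

    def call(self):             # no registers are copied or saved
        self.cwp = (self.cwp + 1) % self.windows

    def ret(self):
        self.cwp = (self.cwp - 1) % self.windows


def add_by_value(regs):
    """Callee: both arguments arrive as values in its "in" registers."""
    regs.write("in", 0, regs.read("in", 0) + regs.read("in", 1))


def increment_by_reference(regs, memory):
    """Callee: the "in" register holds an address rather than a value."""
    memory[regs.read("in", 0)] += 1


regs = WindowedRegisters()

regs.write("out", 0, 20)        # caller places arguments in its "out" regs
regs.write("out", 1, 22)
regs.call(); add_by_value(regs); regs.ret()
result = regs.read("out", 0)    # return value comes back the same way

memory = [0, 0, 99]
regs.write("out", 0, 2)         # pass the address of memory[2]
regs.call(); increment_by_reference(regs, memory); regs.ret()
```

Both calling styles use the same mechanism; the window hardware neither
knows nor cares whether the register contents are a value or an address.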

Greg
***********************########################!!!!!!!!!!!!!!!!!!!!
Greg Frazier	    o	Internet: frazier@CS.UCLA.EDU
CS dept., UCLA	   /\	UUCP: ...!{ucbvax,rutgers}!ucla-cs!frazier
	       ----^/----
		   /

johnl@ima.ima.isc.com (John R. Levine) (05/04/89)

In article <1896@etive.ed.ac.uk> sam@lfcs.ed.ac.uk (S. Manoharan) writes:
>I hear that the RISC (Berk) was designed with a view of supporting C.
>How would the register windowing help in processing C functions!

The RISC group did not presuppose a particularly good C compiler. I expect
they did their work with PCC. As a result, the fixed per-function overhead of
call and return was clearly a problem, and register windows let you make
function call and return faster without making the compiler any smarter, by
moving the normally time-consuming register save and restore into hardware.

The IBM 801 project which was looking at similar problems at about the same
time worked with some of IBM's best compiler people.  Many of their decisions
were similar to the RISC group's, e.g. fixed-length instructions that execute
in one cycle, but their register model is quite different -- they have 32
conventional registers.  They found that their compiler could do an excellent
job of register allocation at compile time, including minimizing saves across
procedure calls, so they could use the chip real estate that might have been
allocated to an enormous register file for other things.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
{ bbn | spdcc | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Massachusetts has 64 licensed drivers who are over 100 years old.  -The Globe

jkrueger@daitc.daitc.mil (Jonathan Krueger) (05/10/89)

In article <158@bms-at.UUCP>, stuart@bms-at (Stuart Gathman) writes:
>There is one interesting feature of RISC architecture that I haven't seen
>mentioned much: the fact that an emulated/interpreted program will often
>run faster than a compiled one.
>...
>This approach turns a RISC machine into a CISC machine with user defined
>microcode.  Unlike loadable microcodes of yesteryear, the cache is swapped
>by line on a demand basis.

One might also cite relational engines for DBMS, which interpret query
languages.  Even when executing "compiled" procedures, they do not
generate standalone images; they just pre-fetch and "decode" (parse
and optimize) the query.  Clearly this approach yields a more flexible
system than linking (ISAM, VSAM, etc.) routines into each image.  The
surprising part is it can be faster, too.  One reason is that the
engine can do a better job of staying in cache than the images.  The
engine implements common functions that service everyone.  The most
often used ones are found in cache.  (Sets of images lose even when
the operating system supports execution from shared memory, because
it's hard to decide at runtime whether each image's routines can
service other images.)  And unlike loadable microcodes of yesteryear,
they allocate fast memory in response to usage patterns that vary at
runtime.
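
The cache argument can be sketched with a small simulation.  The code below
is a hypothetical model, not a measurement of any DBMS: it assumes a fully
associative LRU cache that holds whole routines, and a round-robin workload
in which every application calls the same set of access routines.  The
function names and the workload shape are assumptions made for the example.

```python
# LRU cache simulation: private copies per image versus one shared engine.
# The workload and the cache organization are hypothetical.
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses for a fully associative LRU cache over a trace of ids."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used
    return misses


def make_traces(apps, routines, rounds):
    """Build the two reference traces for the same sequence of calls."""
    linked, shared = [], []
    for _ in range(rounds):
        for app in range(apps):
            for routine in range(routines):
                linked.append((app, routine))        # private copy per image
                shared.append(("engine", routine))   # one copy serves everyone
    return linked, shared


linked, shared = make_traces(apps=4, routines=8, rounds=10)
linked_misses = count_misses(linked, capacity=16)
shared_misses = count_misses(shared, capacity=16)
```

This workload is deliberately the worst case for the linked images, since
their 32 private copies cycle through a 16-entry LRU cache while the
engine's 8 routines fit; real reference patterns would fall somewhere in
between.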

So next time you get a new release of your RDBMS, just tell management
that you're installing new microcode :-)

-- Jon

Jonathan Krueger 
...uunet!daitc!jkrueger     jkrueger@daitc.mil     (703) 998-4600
		My opinions are not necessarily those of my wallpaper.