[comp.lang.scheme] dynamic compilation for scheme, with inlining, etc?

wilson@uicbert.eecs.uic.EDU (Paul Wilson) (09/15/88)

I've been thinking about using dynamic compilation for Scheme,
and wondering if anybody else is.  Have people abandoned the idea
of dynamic compilation, and if so, was it premature?

Some of the advantages might be:

   1) portability: images could still run fast
   2) an optimizer could be called in to optimize those chunks of
      code that are ACTUALLY executed a bunch of times.
      (rather than just optimizing the time/space tradeoff between
       interpreted and compiled code, it could be used to adjust
       compile time/run time tradeoffs.)
   3) Transparent, automatic compilation and optimization.

   In particular, I'm thinking about a scheme in which inlining would
be done automatically, but dependencies would be recorded.  If somebody
changed the value of an inlined procedure or constant, the code
depending on those values would be invalidated and dynamically
recompiled at runtime.
   (The mechanism for this depends on near-lexical-scoping.  Procedures
that are never redefined in a lexically apparent way could be inlined.
Primitives that can modify things (like first-class environments) in
non-lexically-apparent ways would have to check for dependencies.)
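
   To make the bookkeeping concrete, here is a rough sketch in Scheme.
All the names are invented, and a real system would want something
faster than assoc lists:

    ;; *INLINE-DEPS* maps a variable name to the code chunks that
    ;; inlined its value; a chunk is a mutable pair (valid? . code).
    (define *inline-deps* '())

    (define (record-dependency! name chunk)
      (let ((entry (assq name *inline-deps*)))
        (if entry
            (set-cdr! entry (cons chunk (cdr entry)))
            (set! *inline-deps* (cons (list name chunk) *inline-deps*)))))

    ;; Called on every redefinition of NAME: dependent chunks are
    ;; marked invalid, so the next call through one falls back to the
    ;; interpreter, which can then trigger recompilation.
    (define (invalidate-dependents! name)
      (let ((entry (assq name *inline-deps*)))
        (if entry
            (for-each (lambda (chunk) (set-car! chunk #f))
                      (cdr entry)))))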

   Anybody care to comment on this idea?  In particular, is there some
irreducible cost of dynamic compilation that I don't know about?  Any
advantages?

   (It seems to me that you might dynamically determine what types of
arguments frequently-executed procedures are usually called with.
You could then compile special versions for the common combinations.
A large-grained check would dispatch to the proper version, within
which most type checking could be eliminated using type inference.
Would this be workable/worthwhile?)
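
   For instance, a toy sketch of the dispatch, using factorial as the
frequently-executed procedure (all names invented):

    ;; GENERAL-FACT stands for the fully type-checked version;
    ;; FAST-EXACT-FACT for one compiled with internal checks removed.
    (define (general-fact n)
      (if (<= n 1) 1 (* n (general-fact (- n 1)))))

    (define (fast-exact-fact n)
      (let loop ((n n) (acc 1))
        (if (<= n 1) acc (loop (- n 1) (* n acc)))))

    ;; One large-grained check at entry dispatches to the version
    ;; specialized for the common argument type.
    (define (fact n)
      (if (and (integer? n) (exact? n))
          (fast-exact-fact n)
          (general-fact n)))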


Paul R. Wilson                         
Human-Computer Interaction Laboratory
U. of Illin. at C. EECS Dept. (M/C 154)   wilson%uicbert@uxc.cso.uiuc.edu
Box 4348   Chicago,IL 60680 

adg@cl.cam.ac.uk (Andy Gordon) (09/20/88)

In article <8809142159.AA28551@uicbert.eecs.uic.edu> wilson@uicbert.eecs.uic.EDU (Paul Wilson) writes:
>I've been thinking about using dynamic compilation for Scheme,
>and wondering if anybody else is.  Have people abandoned the idea
>of dynamic compilation, and if so, was it premature?

Firstly, by "dynamic compilation" I understand a system that
represents a programme to start with as a syntax tree or some other form
suitable for interpretation, but compiles parts of it into native code
as execution proceeds.  Interpreted code is generally smaller than compiled
code, so less space is needed than if the whole thing were compiled.

I worked on a related problem for my honours thesis---how to interface
compiled and interpreted code in the Edinburgh Standard ML compiler.  In the
system I built the programmer indicated by annotations for each function
whether it was to be compiled or interpreted, but I did have in mind a scheme
that would do the annotations automatically, based on data from previous runs,
or even dynamically converting interpreted functions into native code if they
were frequently executed.  I found on small benchmarks that native code
was between four and six times bigger than interpreted byte codes, and
ran between one and seven times faster.
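
For the dynamic conversion I had in mind something like a call counter
per function.  In Scheme terms, since this is comp.lang.scheme (a
sketch only; COMPILE stands for whatever produces native code):

    ;; Wrap an interpreted closure in a stub that counts calls and
    ;; switches to a compiled version once it proves hot.
    (define *hot-threshold* 100)

    (define (make-counting-stub interpreted compile)
      (let ((calls 0) (current interpreted))
        (lambda args
          (set! calls (+ calls 1))
          (if (= calls *hot-threshold*)
              (set! current (compile current)))  ; switch over once
          (apply current args))))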

Let's call a system that mixes compiled and interpreted code a "hybrid"
system, and then distinguish between "static" and "dynamic" hybrids, i.e.,
between those in which the mode of each function declaration is determined at
compile time and those in which it can change at run time.  So I built a
static hybrid system, and I think you plan to build a dynamic hybrid system.

There appear to be two reasons for hybrid systems: (1) to give a variable
time/space tradeoff, i.e., between fast/bulky native code and slow/lean
interpreted code; (2) to allow fancy interpretive debuggers and tracers
in the presence of native code.

I don't think reason (1) is very compelling these days, because the size of
compiled code is not an issue with today's computers---certainly, big ML
programmes that are compiled to native code by one of the production compilers
don't take up inordinate space.  However, reason (2) is still attractive,
because interpreted code will always be easier to debug than compiled code.  A
dynamic hybrid system is all the work of a static one, and more, so for it to
be worthwhile it must make debugging much easier.  One can imagine a debugger
threading through a programme, seeking to use interpreted versions of compiled
functions as it finds them---maybe you could keep a pointer in each compiled
function to an interpreted version or the source.  I'd be interested to hear
how you get on.
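
In Scheme terms the pointer might look something like this (a sketch,
not taken from any real system):

    ;; A compiled function carries its interpreted twin; a debugger
    ;; flips *DEBUGGING?* to get the slow but inspectable path.
    (define *debugging?* #f)

    (define (make-hybrid-function compiled interpreted)
      (lambda args
        (if *debugging?*
            (apply interpreted args)  ; easy to trace and single-step
            (apply compiled args))))  ; fast native-code path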

Here are some references you might find useful when implementing your scheme.
I published a technical report [1] that has a comprehensive literature survey
on hybrid compilation/interpretation, and presents a taxonomy of ways to
switch between compiled and interpreted modes.  I have a few copies of it that
I could post to anyone who is interested (email me).  Other papers that might
be of interest: Brown on "throw away compiling" [2], Low on automatically
selecting an appropriate representation of an abstract data type depending on
usage statistics [3], and Dakin and Poole on a text editor that mixes
compilation and interpretation, together with performance graphs [4].

[1] A. D. Gordon, "How to Breed Hybrid Compiler/Interpreters",
    Technical report ECS--LFCS--88--50, Laboratory for the Foundations of
    Computer Science, Computer Science Dept., University of Edinburgh,
    Edinburgh, EH9 3JZ, Scotland. April, 1988.

[2] P. J. Brown, "Throw-away compiling", Software--Practice and Experience,
    6, pages 423--434, 1976.

[3] J. R. Low, "Automatic data structure selection: an example and overview",
    CACM, 21(5), pages 376--385, May 1978.

[4] R. J. Dakin and P. C. Poole, "A mixed code approach", The Computer Journal,
    16(3), pages 219--222, 1973.

ACW@IVORY.S4CC.SYMBOLICS.COM (Allan C. Wechsler) (09/22/88)

    Date: 19 Sep 88 20:36:51 GMT
    From: mcvax!ukc!cam-cl!adg@uunet.uu.net  (Andy Gordon)

[...]

			    Interpreted code is generally smaller than compiled
    code, so less space is needed than if the whole thing were compiled.

This guy's whole approach is based on this premise.  Does anyone want to
disabuse him?  For a simple recursive factorial, for example, I get
about 25 words interpreted, and about ten compiled.  I don't know what
happens with large programs, but I suspect they show about the same
ratio.

Krulwich-Bruce@cs.yale.edu (Bruce Krulwich) (09/23/88)

In article <327@scaup.cl.cam.ac.uk>, adg@cl (Andy Gordon) writes:
>In article <8809142159.AA28551@uicbert.eecs.uic.edu>
>wilson@uicbert.eecs.uic.EDU (Paul Wilson) writes:
>>I've been thinking about using dynamic compilation for Scheme,
>>and wondering if anybody else is.  Have people abandoned the idea
>>of dynamic compilation, and if so, was it premature?
>
>Firstly, by "dynamic compilation" I understand a system that
>represents a programme to start with as a syntax tree or some other form
>suitable for interpretation, but compiles parts of it into native code
>as execution proceeds.  Interpreted code is generally smaller than compiled
>code, so less space is needed than if the whole thing were compiled.

I think you're overlooking several possible program transformations that could
be done in addition to straightforward compilation (a sketch of one follows
the list), such as:

	(1) Inlining functions and de-inlining them when necessary
	(2) Different levels of compiled-code optimization
	(3) Caching of results
	(4) Special case handling of parameters by type
	(5) Analysis of memory use to reduce garbage generation
	(6) Dynamic analysis of ordering of comparisons
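
For instance, here is a crude sketch of (3), caching of results, for a
one-argument function.  The cache is just an assoc list keyed with
eqv?, and CACHE-RESULTS is a name I made up:

    (define (cache-results f)
      (let ((table '()))
        (lambda (x)
          (let ((hit (assv x table)))
            (if hit
                (cdr hit)                       ; reuse cached result
                (let ((result (f x)))
                  (set! table (cons (cons x result) table))
                  result))))))

A dynamic system might install such a cache only where profiling shows
repeated calls with the same arguments.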

>There appear to be two reasons for hybrid systems: (1) to give a variable
>time/space tradeoff, i.e., between fast/bulky native code and slow/lean
>interpreted code; (2) to allow fancy interpretive debuggers and tracers
>in the presence of native code.
>
>I don't think reason (1) is very compelling these days, because the size of
>compiled code is not an issue with today's computers---certainly, big ML
>programmes that are compiled to native code by one of the production compilers
>don't take up inordinate space.  

As I said above, I would bet that this does become an issue when more possible
transformations are considered.

>However, reason (2) is still attractive, because interpreted code will always
>be easier to debug than compiled code.  A dynamic hybrid system is all the
>work of a static one, and more, so for it to be worthwhile it must make
>debugging much easier.  One can imagine a debugger threading through a
>programme, seeking to use interpreted versions of compiled functions as it
>finds them---maybe you could keep a pointer in each compiled function to an
>interpreted version or the source.  I'd be interested to hear how you get on.

This is a good point.  Similar uses might be the ability to redefine a
component of a function without recompiling the entire function, or the
ability to optimize only the inner loops of a function.


Bruce Krulwich

hbs@LUCID.COM (Harlan Sexton) (09/23/88)

I wouldn't be surprised to find interpreted representations bigger than
machine-code. However, you can usually get byte-code to be smaller than
machine-code by factors of about 4 to 10 (higher for RISC machines, in my
limited experience), so compiling to byte-codes can win. Also, in spite of
what someone said, memory is NOT so cheap that it doesn't matter how big
the code for an application is, and I can imagine that it will never be so
for certain classes of applications.


--Harlan