[comp.compilers] Compilers For Typeless Languages

Will@cup.portal.com (09/21/89)

Do any of you who have written interpreters or compilers for
typeless languages have any thoughts on whether it is worthwhile
to even bother writing a compiler for the language?

Let's say I have a language, X, where every variable is treated by
the user as a character string, and the interpreter knows that in
certain contexts a string implies a non-character type of data and
operation (e.g., "2/3", "3.343434 * 23423.23423").  The user
doesn't ever declare a variable to be of any type; datatypes are
implied by the use of a variable.  Since in many situations there
is no a-priori way to know whether a variable is going to be used
exclusively as a character type, integer, or floating point type,
a compiler for such a language would need to include run-time
support to determine the way in which variables were being used as
statements are executed.  Since you need this run-time support,
some might argue, your compiler is essentially an interpreter and
why bother to write the compiler at all?

With that background, the question is are there other reasons to
write a compiler for such a language?  A few possible reasons that
come to mind:  1) in certain contexts your compiler could
determine that a variable was used exclusively as an int or
floating type and declare it that way in order to be able to do
the math using machine instructions instead of subroutine calls
for math operations; 2) you could optimize those parts of the code
that were machine code; 3) you could include pragmas that let a
programmer instruct the compiler that if a variable is used
exclusively for math in a subroutine that the compiler should
assume that correctly "typed" arguments were passed to the routine
(hence, you gain the benefits of 1 and 2); 4) you could let the
user compile his code for different target processors (e.g. 8086,
80286, 80386, 80486) instead of forcing the least common
denominator (8086) because that's what your interpreter was
compiled to use in order to be able to run on any 80x86 platform.

Are these valid reasons to build a compiler for a typeless
language?  Are there other reasons?  Would one expect to get
substantial performance benefits?

Will
[These issues have been hashed over in the Lisp community for
30 years, and most Lisp systems include compilers.  Anybody have some
good references on compilation issues?  -John]
[From Will@cup.portal.com]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.

sra@ecs.southampton.ac.uk (Stephen Adams) (09/27/89)

Compilation is figuring out and doing some things at compile time
(once) so that those things don't have to be done (possibly many times)
at run time.

In his list of reasons for compiling a `typeless language' I think
that Will missed the main source of performance benefit from
compilation which is that the interpreter's case analysis is done once
at compile time rather than many times during run-time.  Interpreting
a loop involves interpreting the loop body for each iteration.  Not so
with compiled code.

I would suggest that compilation is worthwhile unless the primitive
operations are all *so* expensive that the interpreter is spending
`all' of the time doing them and `no' time interpreting the program.
I wouldn't write a compiler for a language of arithmetic expressions
like A+B*C of huge matrixes but I would for smaller values like
strings or scalars or (string or scalar)'s .

An interesting technique for writing compilers based on the
interpreter is given on comp.lang.scheme some time ago:

> ... This method may be called (in
> a very loose way) the "threaded code" technique, or enclosing
> everything into a lambda closure. The technique is discussed in
> detail in the following paper:
> 
> 	%A Marc Feeley
> 	%A Guy LaPalme
> 	%T Using Cloures for Code Generation
> 	%J Journal of Computer Languages
> 	%V 12
> 	%N 1
> 	%P 47-66
> 	%I Pergamon Press
> 	%D 1987

Considering the simplicity of their approach the authors report
impressive speedups (which I have verified).
[From Stephen Adams <sra@ecs.southampton.ac.uk>]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.

plogan@pdx.MENTOR.COM (Patrick Logan) (09/29/89)

In article <1989Sep21.051232.1868@esegue.segue.boston.ma.us> Will@cup.portal.com writes:
  >Do any of you who have written interpreters or compilers for
  >typeless languages have any thoughts on whether it is worthwhile
  >to even bother writing a compiler for the language?
  >
  >Let's say I have a language, X, where every variable is treated by
  >the user as a character string, and the interpreter knows that in
  >certain contexts a string implies a non-character type of data and
  >operation (e.g., "2/3", "3.343434 * 23423.23423").  The user
  >doesn't ever declare a variable to be of any type; datatypes are
  >implied by the use of a variable.  Since in many situations there
  >is no a-priori way to know whether a variable is going to be used
  >exclusively as a character type, integer, or floating point type,
  >a compiler for such a language would need to include run-time
  >support to determine the way in which variables were being used as
  >statements are executed.  
  >...
  > Will
  >[These issues have been hashed over in the Lisp community for
  >30 years, and most Lisp systems include compilers.  Anybody have some
  >good references on compilation issues?  -John]
  >[From Will@cup.portal.com]

The difference between Will's scenario and Lisp (modern Lisps, anyway) 
is that Lisp's data is typed. For example, you can't add two numbers
together and then concatenate a string to the result. Unless you do
something like

        (string-append (number->string (+ 1 2)) " more characters")

which explicitly creates a new string before the concatenation.

The "command language" here at Mentor treats variables and data in a
way similar to what Will described. To me it is very confusing with
little or no benefit, especially if procedures, like "number->string"
above, exist to perform conversions where they make sense. Then there
is more benefit from a compiler and one can tap into the lore of Lisp
and similar languages. In fact, one might choose Lisp or to build the
language on top of Lisp.

See Byte, 2/88 for more on creating languages within Lisp, and
    Kranz's PhD dissertation at Yale for more on compiling Lisp-like 
    languages.

See also articles on SELF for interesting new work on compiling
    languages with type-less variables and typed data (like Lisp,
    Smalltalk, and SELF).  They even have application to C++, if
    you're interested in that sort of thing.

    These articles have appeared in OOPSLA '87 and SIGPLAN Notices,
    July 1989. Another will be presented next week at OOPSLA '89.

--
Patrick Logan                | ...!{decwrl,sequent,tessi}!mntgfx!plogan 
Mentor Graphics Corporation  | plogan@pdx.MENTOR.COM                    
Beaverton, Oregon            | "My other computer is a Lisp machine."
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.

limonce@pilot.njin.net (Tom Limoncelli) (09/30/89)

Yes, you will get a speedup.

What are you willing to invest?  If this is a simple project, maybe 
just tokenizing the input and executing it from the compressed format
will result in the kind of speed increase that you require.

Others have suggested "real" solutions (i.e. code-generation)
-Tom
-- 
Drew University   -- Tom Limoncelli
C M Box 1060      -- limonce@pilot.njin.net
P O Box 802       -- tlimonce@drunivac.Bitnet
Madison, NJ 07940 -- 201-408-5389
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.