[comp.compilers] Compilation to Assembler in Poplog

aarons@cvaxa.sussex.ac.uk (Aaron Sloman) (01/31/88)
I hope it is of some interest, and I apologise for its length.
Although Sussex University has a commercial interest in Poplog I have
tried to avoid raising any commercial issues.
                         ---------------------

               COMPILING TO ASSEMBLY LANGUAGE IN POPLOG

There have been discussions on the network about the merits of compiling to
assembly language. Readers may be interested in the methods used for
implementing and porting Poplog, a multi-language software development system
containing incremental compilers for Common Lisp, Prolog, ML and POP-11, a
Lisp-like language with a more readable Pascal-like syntax. Before I explain
how assembly language is used as output from the compiler during porting and
system building, I need to explain how the running system works. The
mechanisms described below were designed and implemented by John Gibson, at
Sussex University.

All the languages in Poplog compile to a common virtual machine, the Poplog VM,
which is then compiled to native machine code. First, an over-simplified
description:

The Poplog system allows different languages to share a common store manager,
and common data-types, so that a program in one language can call another and
share data-structures. Like most AI environments it also allows incremental
compilation: individual procedures can be compiled and re-compiled and are
immediately automatically linked in to the rest of the system, old versions
being garbage collected if no longer pointed to. Moreover, commands to run
procedures or interrogate data-structures can be typed in interactively, using
exactly the same high level language as the programs are written in. The
difference between this and most AI systems is that ALL the languages are
compiled in the same way. E.g. Prolog is not interpreted by a POP-11 or Lisp
program: they all compile (incrementally) to machine code.
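The incremental linking described above can be modelled in a few lines of Python. This is a minimal sketch, assuming that compiled code refers to other procedures indirectly through a per-name cell, so a re-compiled procedure is visible to every caller at once and the old code simply becomes garbage. The names `Identifier`, `define` and `call` are hypothetical, not Poplog's API, and Python closures stand in for machine code.

```python
# Hypothetical model: callers refer to procedures through named identifier
# cells, so re-defining a procedure is seen by every caller at once.

class Identifier:
    """A named cell holding the current compiled code for a procedure."""

    def __init__(self, name):
        self.name = name
        self.value = None


dictionary = {}


def identifier(name):
    # Create the cell on first reference, so a caller can be compiled
    # before the procedure it calls has been defined.
    if name not in dictionary:
        dictionary[name] = Identifier(name)
    return dictionary[name]


def define(name, code):
    # "Compiling" stores new code in the cell. The old code object is no
    # longer referenced from here and can be garbage collected.
    identifier(name).value = code


def call(name, *args):
    return identifier(name).value(*args)


define("sum_squares", lambda a, b: call("square", a) + call("square", b))
define("square", lambda x: x * x)
print(call("sum_squares", 2, 3))  # 13

# Re-compile only 'square'; 'sum_squares' uses it with no relinking step.
define("square", lambda x: x * x * x)
print(call("sum_squares", 2, 3))  # 35
```

The design choice being illustrated is the indirection: because `sum_squares` never holds a direct pointer to the old `square`, nothing has to be patched when `square` changes.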

The languages are all implemented using a set of tools for adding new
incremental compilers. These tools include procedures for breaking up a text
stream into items, and tools for planting VM instructions when procedures are
compiled. They are used by the Poplog developers to implement the four Poplog
languages but are also available for users to implement new languages suited
to particular applications. (E.g. one user claims he implemented a complete
Scheme in Poplog in about three weeks, in his spare time, getting a portable
compiler and development environment for free once he had built the Scheme
front-end compiler in Poplog.)
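As a rough illustration of the two kinds of tool mentioned (an itemiser that breaks a text stream into items, and procedures that plant VM instructions), here is a sketch of a front end for a toy statement language. Everything in it is hypothetical: the item rules, the `CodePlanter` class and the opcode names `PUSHQ`, `PUSH`, `CALL` and `POP` are illustrative stand-ins, not the actual Poplog procedures or VM instruction set.

```python
import re

# Hypothetical itemiser: numbers, words, ':=' and single punctuation marks.
ITEM = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(:=|\S))")


def itemise(text):
    """Yield items: ints for numbers, strings for words and symbols."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = ITEM.match(text, pos)
        number, word, symbol = match.groups()
        yield int(number) if number is not None else (word or symbol)
        pos = match.end()


class CodePlanter:
    """Collects (opcode, argument) pairs; opcode names are illustrative."""

    def __init__(self):
        self.code = []

    def plant(self, opcode, arg=None):
        self.code.append((opcode, arg))


def compile_term(item, planter):
    if isinstance(item, int):
        planter.plant("PUSHQ", item)   # push a constant
    else:
        planter.plant("PUSH", item)    # push the value of a variable


def compile_statement(items, planter):
    """Front end for: WORD ':=' term { '+' term } ';'"""
    items = iter(items)
    target = next(items)
    if next(items) != ":=":
        raise SyntaxError("expected :=")
    compile_term(next(items), planter)
    for item in items:
        if item == ";":
            break
        if item != "+":
            raise SyntaxError(f"unexpected item {item!r}")
        compile_term(next(items), planter)
        planter.plant("CALL", "+")
    planter.plant("POP", target)


planter = CodePlanter()
compile_statement(itemise("x := 1 + y + 20;"), planter)
print(planter.code)
```

The front end knows nothing about the target machine; it only consumes items and plants instructions, which is what lets it be reused on any machine that has a back end.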

All this makes it possible to build a range of portable incremental compilers
for different sorts of programming languages. This is how POP-11, PROLOG,
COMMON LISP and ML are implemented. They all compile to a common internal
representation, and share machine-specific run-time code generators. Thus
several different machine-independent "front ends" for different languages can
share a machine-specific "back end" that compiles to native machine code,
which runs far more quickly than interpreted code would.
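The sharing of one back end by several front ends can be sketched as follows. Two hypothetical front ends, one for Lisp-like prefix expressions and one for infix expressions, produce identical stack-machine code, which a single shared back end then runs. In Poplog the back end generates machine code; here, as a simplifying assumption, a small stack interpreter stands in for it, and the instruction format is illustrative only.

```python
import operator


# Front end 1: Lisp-like prefix expressions as nested tuples.
def compile_prefix(expr, code=None):
    code = [] if code is None else code
    if isinstance(expr, int):
        code.append(("PUSHQ", expr))
    else:
        op, left, right = expr
        compile_prefix(left, code)
        compile_prefix(right, code)
        code.append(("CALL", op))
    return code


# Front end 2: infix token lists, where '*' binds tighter than '+'.
def compile_infix(tokens):
    code = []
    pos = 0

    def operand():
        nonlocal pos
        code.append(("PUSHQ", tokens[pos]))
        pos += 1

    def term():
        nonlocal pos
        operand()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            operand()
            code.append(("CALL", "*"))

    term()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        term()
        code.append(("CALL", "+"))
    return code


# Shared back end. Poplog compiles VM code to machine code; a stack
# interpreter stands in for that here.
OPS = {"+": operator.add, "*": operator.mul}


def run(code):
    stack = []
    for opcode, arg in code:
        if opcode == "PUSHQ":
            stack.append(arg)
        elif opcode == "CALL":
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()


lisp_code = compile_prefix(("+", 1, ("*", 2, 3)))
infix_code = compile_infix([1, "+", 2, "*", 3])
print(lisp_code == infix_code, run(lisp_code))
```

Because both front ends target the same instruction set, adding a third language means writing only another front end; `run` (or a real code generator in its place) is untouched.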

The actual story is more complicated: there are two Poplog virtual machines, a
high level and a low level one, both of which are language independent and
machine independent. The high level VM has powerful instructions, which makes
it convenient as a target language for compilers for high level languages.
This includes special facilities to support Prolog operations, dynamic and
lexical scoping of variables, procedure definitions, procedure calls,
suspending and resuming processes, and so on. Because these are quite
sophisticated operations, the mapping from the Poplog VM to native machine
code is still fairly complex.
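One of the high level VM facilities listed, dynamic scoping, shows why such instructions are more than a one-to-one map onto machine code. A common implementation technique is shallow binding: each dynamically scoped variable has a single global cell, and a procedure that localises it saves the old value on entry and restores it on every exit, normal or abnormal. The post does not say exactly how Poplog does this, so treat the following as a sketch of the general technique under that assumption; `dynamic_values` and `dynamic_local` are hypothetical names.

```python
from contextlib import contextmanager

# Current values of dynamically scoped variables (one global cell each).
dynamic_values = {"depth": 0}


@contextmanager
def dynamic_local(name, value):
    """Shallow binding: save the old value on entry, restore it on exit."""
    saved = dynamic_values[name]
    dynamic_values[name] = value
    try:
        yield
    finally:
        # Runs on normal return and when an exception passes through.
        dynamic_values[name] = saved


def report():
    # Reads whichever binding is current at run time.
    return dynamic_values["depth"]


def inner():
    with dynamic_local("depth", report() + 1):
        return report()


def outer():
    with dynamic_local("depth", 1):
        return inner()


print(outer(), report())  # 2 0
```

A single high level "call a procedure with dynamic locals" operation therefore expands into saves, restores and exit handling in the generated code.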

So there is a machine independent and language independent intermediate
compiler which compiles from the high level VM to a low level VM, doing a
considerable amount of optimisation on the way. A machine-specific back-end
then translates the low-level VM to native machine code, except when porting
or re-building the system. In the latter case the final stage is translation
to assembly language. (See diagram below.)
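The post does not describe the low level VM's instruction set, so the following is only a hypothetical sketch of the kind of translation involved. It lowers stack-style high level instructions to two-address low level instructions by keeping a compile-time model of the stack (so pushes cost nothing until a value is needed) and folds additions of constants on the way. The opcodes, the `#n` immediate notation and the register names `r0`, `r1`, ... are all assumptions, and register allocation is deliberately naive.

```python
def lower(hl_code):
    """Translate stack-style high level instructions into two-address
    low level instructions, folding constant additions on the way."""
    ll_code = []
    pending = []          # compile-time model of the run-time stack
    reg_count = 0

    def new_reg():
        nonlocal reg_count
        reg_count += 1
        return f"r{reg_count - 1}"

    def fmt(operand):
        kind, value = operand
        return f"#{value}" if kind == "const" else value

    for opcode, arg in hl_code:
        if opcode == "PUSH":
            pending.append(("var", arg))
        elif opcode == "PUSHQ":
            pending.append(("const", arg))
        elif opcode == "CALL" and arg == "+":
            right, left = pending.pop(), pending.pop()
            if left[0] == "const" and right[0] == "const":
                pending.append(("const", left[1] + right[1]))  # fold
            else:
                reg = new_reg()
                ll_code.append(("MOVE", fmt(left), reg))
                ll_code.append(("ADD", fmt(right), reg))
                pending.append(("reg", reg))
        elif opcode == "POP":
            value = pending.pop()
            # A delayed read of the target must happen before the store.
            for i, operand in enumerate(pending):
                if operand == ("var", arg):
                    reg = new_reg()
                    ll_code.append(("MOVE", arg, reg))
                    pending[i] = ("reg", reg)
            ll_code.append(("MOVE", fmt(value), arg))
        else:
            raise ValueError(f"unsupported instruction {opcode} {arg}")
    return ll_code


# y := x + (2 + 3)
hl = [("PUSH", "x"), ("PUSHQ", 2), ("PUSHQ", 3),
      ("CALL", "+"), ("CALL", "+"), ("POP", "y")]
print(lower(hl))
```

Six high level instructions become three low level ones here, with `2 + 3` computed at compile time; this step is machine independent, so only what follows it needs rewriting for a new processor.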

The bulk of the core Poplog system is written in an extended dialect of
POP-11, with provision for C-like addressing modes, for efficiency. We call it
SYSPOP. The system sources, written in SYSPOP, are also compiled to the
high-level VM, and then to the low level VM. But instead of then being
translated to machine code, the low level instructions are automatically
translated to assembly language files for the target machine. This is much
easier than producing object files, because there is a fairly straightforward
mapping from the low level VM to assembly language, and the programs that do
the translation don't have to worry about formats for object files: we leave
that to the assembler and linker supplied by the manufacturer.
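To show why emitting assembler text is the easy route, here is a sketch that turns low level instructions into assembler source. Variables and procedures are referred to only by symbolic name; addresses, relocation and the object file format are left entirely to the assembler and linker. The syntax is VAX-flavoured (`movl`, `addl2`, `rsb`, with standard Unix `as` directives such as `.data` and `.globl`), but it is an assumption, not checked against any particular assembler, and it is not the output format of the Poplog tools.

```python
# VAX-style two-operand mnemonics: movl src,dst and addl2 src,dst.
MNEMONIC = {"MOVE": "movl", "ADD": "addl2"}


def operand(text):
    # '#5' (immediate) becomes '$5'; registers and symbols pass through.
    return "$" + text[1:] if text.startswith("#") else text


def to_assembler(proc_name, ll_code, variables):
    """Return assembler source text for one procedure and its variables."""
    lines = ["\t.data"]
    for var in sorted(variables):
        lines += [f"\t.globl\t{var}", f"{var}:\t.long\t0"]
    lines += ["\t.text", f"\t.globl\t{proc_name}", f"{proc_name}:"]
    for opcode, src, dst in ll_code:
        lines.append(f"\t{MNEMONIC[opcode]}\t{operand(src)},{operand(dst)}")
    lines.append("\trsb")   # return from a jsb-style subroutine call
    return "\n".join(lines) + "\n"


ll = [("MOVE", "x", "r0"), ("ADD", "#5", "r0"), ("MOVE", "r0", "y")]
print(to_assembler("set_y", ll, {"x", "y"}), end="")
```

The translator is little more than a table lookup per instruction, and its output can be read directly, which is also the debugging advantage the post mentions at the end.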

In fact, the system sources need facilities not available to users, so the two
intermediate virtual machines are slightly enhanced for SYSPOP. The following
diagram summarises the situation.

                {POP-11, COMMON LISP, PROLOG, ML, SYSPOP}
                                    |
                               Compile to
                                    |
                                    V
                             [High level VM]
                          (extended for SYSPOP)
                                    |
                          Optimise & compile to
                                    |
                                    V
                             [Low level VM]
                          (modified for SYSPOP)
                                    |
                         Compile (translate) to
                                    |
                                    V
                      [Native machine instructions]
                       [or assembler - for SYSPOP]

So for ordinary users compiling or re-compiling their procedures in the
system, the machine code generator is used and compilation is very fast, with
no linking required. For rebuilding the whole system we go via assembly
language for maximum flexibility, and it is indeed a slow process. But it does
not need to be done very often, and not (yet) by ordinary users. Later (1989)
they will have the option to use the system building route in order to
configure the version of Poplog they want. So we sit on both sides of the
argument about speed raised in comp.compilers.

All the compilers and translators are implemented in Poplog (mostly in
POP-11). Only the last stage is machine specific. The low level VM is at a
level that makes it possible on the VAX, for example, to generate
approximately one machine instruction per low level VM instruction. So writing
the code generator for something like a VAX or M68020 was relatively easy. For
a RISC machine such as the Clipper, the task is a little more complicated.
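The difference between a VAX-like target and a RISC target can be sketched by translating the same low level instructions both ways. On a memory-to-memory machine each instruction maps to one machine instruction; on a load/store machine, any memory operand of an arithmetic instruction, or any memory-to-memory move, needs an extra load through a scratch register. The RISC target below is generic and hypothetical (mnemonics `load`, `store`, `li`, `mov`, `add` and scratch register `at` are assumptions), not the Clipper's actual instruction set, and it only handles the cases shown.

```python
import re


def is_reg(operand):
    return re.fullmatch(r"r\d+", operand) is not None


def translate_cisc(ll_code):
    # Memory-to-memory target: every operand mode is allowed everywhere,
    # so each low level instruction becomes one machine instruction.
    names = {"MOVE": "movl", "ADD": "addl2"}
    return [f"{names[op]} {src},{dst}" for op, src, dst in ll_code]


def translate_risc(ll_code):
    # Load/store target: only load/store touch memory and arithmetic
    # needs registers. 'at' is a hypothetical scratch register.
    out = []

    def into_reg(x):
        """Return a register holding x, emitting a load if needed."""
        if is_reg(x):
            return x
        if x.startswith("#"):
            out.append(f"li at,{x[1:]}")
        else:
            out.append(f"load at,{x}")
        return "at"

    for op, src, dst in ll_code:
        if op == "MOVE" and is_reg(dst):
            if is_reg(src):
                out.append(f"mov {dst},{src}")
            elif src.startswith("#"):
                out.append(f"li {dst},{src[1:]}")
            else:
                out.append(f"load {dst},{src}")
        elif op == "MOVE":
            reg = into_reg(src)
            out.append(f"store {reg},{dst}")
        elif op == "ADD" and is_reg(dst):
            reg = into_reg(src)
            out.append(f"add {dst},{dst},{reg}")
        else:
            raise NotImplementedError(f"{op} {src},{dst}")
    return out


ll = [("MOVE", "x", "r0"), ("ADD", "z", "r0"),
      ("MOVE", "r0", "y"), ("MOVE", "x", "w")]
print(len(translate_cisc(ll)), len(translate_risc(ll)))
```

Four low level instructions give four CISC instructions but six RISC ones; a real RISC back end would also have to choose scratch registers carefully and exploit immediate-operand forms.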

Porting to a new computer requires the run-time "back end", i.e. the low level
VM compiler, to be changed and also the system-building tools which output
assembly language programs for the target machine. There are also a few
hand-coded assembly files which have to be re-written for each machine.
Thereafter all the high level languages have incremental compilers for the new
machine. (The machine-independent system building tools perform rather complex
tasks, such as creating a dictionary of procedure names and system variables
that have to be accessible to users at run time. So besides translating system
source files, the tools create additional assembler files and also check for
consistency between the different system source files.)
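The consistency checking and dictionary building described in the parenthesis can be sketched as a pass over per-file summaries of defined and used identifiers. This is a hypothetical model: the summary format, the error messages, and the layout of the emitted dictionary table (pairs of name-string and value addresses ending in a zero word) are all assumptions, and identifier names are assumed to be valid assembler symbols.

```python
def check_and_index(files):
    """files maps a source file name to the sets of identifiers it
    defines and uses. Returns (name -> defining file, list of errors)."""
    defined_in = {}
    errors = []
    for fname in sorted(files):
        for name in sorted(files[fname]["defines"]):
            if name in defined_in:
                errors.append(f"{name}: defined in {defined_in[name]} and {fname}")
            else:
                defined_in[name] = fname
    for fname in sorted(files):
        for name in sorted(files[fname]["uses"] - set(defined_in)):
            errors.append(f"{name}: used in {fname} but not defined")
    return defined_in, errors


def dictionary_asm(defined_in):
    """Emit an assembler table pairing each name string with its symbol."""
    lines = ["\t.data", "dictionary:"]
    for name in sorted(defined_in):
        lines.append(f"\t.long\tname_{name},{name}")
    lines.append("\t.long\t0")                     # end-of-table marker
    for name in sorted(defined_in):
        lines.append(f'name_{name}:\t.asciz\t"{name}"')
    return "\n".join(lines) + "\n"


files = {
    "lists.p": {"defines": {"hd", "tl"}, "uses": {"sys_alloc"}},
    "store.p": {"defines": {"sys_alloc"}, "uses": set()},
    "io.p": {"defines": {"hd"}, "uses": {"pr"}},
}
table, problems = check_and_index(files)
print(problems)
```

Emitting the dictionary as one more assembler file means the run-time system can look names up without the build tools knowing anything about the object format.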

I believe most other interactive systems provide at most an incremental
compiler for one language, and any other language has to be interpreted. If
everything is interpreted, then porting is much easier, but execution is much
slower. The advantage of the Poplog approach is that it is not necessary to
port different incremental compilers to each new machine.

This makes it relatively easy for the language designer to implement complex
languages, since the Poplog VM provides a varied, extendable set of data-types
and operations thereon, including facilities for logic programming, list,
record and array processing, 'number crunching', sophisticated control
structures (e.g. co-routines), 'active variables' and 'exit actions', that is
instructions executed whenever a procedure exits, whether normally or
abnormally. Indefinite precision arithmetic, ratios and complex numbers are
accessible to all the languages that need them. Both dynamic and lexical
scoping of variables are provided. A tree-structured "section" mechanism
(partly like packages) gives further support for modular design. External
modules (e.g. programs in C or Fortran) can be dynamically linked in and
unlinked. A set of facilities for accessing the operating system is also
provided. Poplog allows functions to be treated as "first class" objects, and
this is used to great advantage in POP-11 and ML.
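Of the facilities listed, "exit actions" are perhaps the least familiar. A minimal sketch of the idea, assuming a procedure registers actions that must run however it exits, can use Python's `try`/`finally`, which gives the same guarantee. The decorator `with_exit_actions` is a hypothetical name, and running the actions most-recent-first is a design choice of this sketch, not something the post specifies.

```python
log = []


def with_exit_actions(proc):
    """Wrap proc so it receives a list to which it can add exit actions."""
    def wrapper(*args):
        actions = []
        try:
            return proc(actions, *args)
        finally:
            # Runs on normal return and on abnormal exit (an exception).
            for action in reversed(actions):   # most recent first
                action()
    return wrapper


@with_exit_actions
def process(actions, item):
    log.append(f"open {item}")
    actions.append(lambda: log.append(f"close {item}"))
    if item == "bad":
        raise ValueError(item)
    return item.upper()


print(process("ok"), log)
```

Because the VM provides this once, every language front end can offer cleanup semantics without implementing its own unwinding machinery.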

The VM facilities are relatively easy to port to a range of computers and
operating systems because the core system is mostly implemented in SYSPOP, and
is largely machine independent. Only the machine-dependent portions mentioned
above (e.g. run-time code generator, and translator from low level VM to
assembler), plus a small number of assembler files need be changed for a new
machine (unless the operating system is also new). Since the translators are
all written in a high level AI language, altering them is relatively easy.

Porting requires compiling all the SYSPOP system sources, to generate the
corresponding new assembler files, then moving them and the hand-made assembler
files to the new machine, where they are assembled then linked. The same
process is used to rebuild the system on an existing machine when new features
are added deep in the system. Much of the system is in source libraries
compiled as needed by users, and modifying those components does not require
re-building.
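The porting procedure can be summarised as a dry-run build plan: translate every SYSPOP source to assembler on the host, then (after copying the files across) assemble the generated and hand-written files and link them on the target. The translator command `popc -S` and all file names are hypothetical placeholders; `as -o` and `ld -o` are the usual Unix assembler and linker options, but real systems will need further flags and libraries. The plan is only built, never executed.

```python
from pathlib import PurePosixPath


def port_plan(syspop_sources, hand_written_asm, image="newpop"):
    """Return (host commands, target commands) as argument lists."""
    host, target, generated, objects = [], [], [], []
    for src in syspop_sources:
        asm = str(PurePosixPath(src).with_suffix(".s"))
        # 'popc -S' is a hypothetical SYSPOP-to-assembler translator.
        host.append(["popc", "-S", src, "-o", asm])
        generated.append(asm)
    # After copying the .s files to the new machine:
    for asm in generated + list(hand_written_asm):
        obj = str(PurePosixPath(asm).with_suffix(".o"))
        target.append(["as", "-o", obj, asm])
        objects.append(obj)
    target.append(["ld", "-o", image, *objects])
    return host, target


host, target = port_plan(["src/lists.p", "src/store.p"], ["src/amove.s"])
for command in host + target:
    print(" ".join(command))
```

Only the host step depends on Poplog itself; the target step uses nothing but the manufacturer's assembler and linker.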

Using this mechanism an experienced programmer with no prior knowledge of
Poplog or the target processor was able to port Poplog to a RISC machine in
about 7 months. But for the usual crop of bugs in the operating system,
assembler, and other software of the new machine, the actual porting time would
have been shorter. In general, extra time is required for user testing,
producing system specific documentation, tidying up loose ends etc.

Thus 7 to 12 months' work ports incremental compilers for four sophisticated
languages, a screen editor, and a host of utilities. Any other languages
implemented by users using the compiler-building tools should also run
immediately. So in principle this mechanism allows a fixed amount of work to
port an indefinitely large number of incremental compilers. Additional work
will be required if the operating system is different from Unix or VMS, or if
a machine specific window manager has to be provided. This should not be
necessary for workstations supporting X-windows.
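The economics behind "a fixed amount of work" is the familiar N + M versus N × M argument: with a shared VM, N machine-independent front ends and M machine-specific back ends cover every pairing, whereas separate compilers need one per (language, machine) pair. The count below is deliberately crude, treating every component as equal effort and ignoring the hand-coded assembler files and operating-system work the post mentions.

```python
def compiler_components(num_languages, num_machines, shared_vm):
    """Rough count of compiler components to write and maintain."""
    if shared_vm:
        # One machine-independent front end per language, plus one
        # back end per machine.
        return num_languages + num_machines
    # Otherwise a separate compiler for every (language, machine) pair.
    return num_languages * num_machines


for languages in (4, 10, 100):
    print(languages,
          compiler_components(languages, 3, shared_vm=True),
          compiler_components(languages, 3, shared_vm=False))
```

The marginal cost of a new machine is one back end in the shared case, whatever the number of languages, and one compiler per language otherwise.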

The use of assembler output considerably simplifies the porting task, and also
aids testing and debugging, since the output is far more intelligible to the
programmer than if object files were generated.

Comments welcome.

Aaron Sloman,
School of Cognitive Sciences, Univ of Sussex, Brighton, BN1 9QN, England
    ARPANET : aarons%uk.ac.sussex.cvaxa@nss.cs.ucl.ac.uk
    JANET     aarons@cvaxa.sussex.ac.uk
    BITNET:   aarons%uk.ac.sussex.cvaxa@uk.ac

As a last resort
    UUCP:     ...mcvax!ukc!cvaxa!aarons
            or aarons@cvaxa.uucp

Phone: University +(44)-(0)273-678294 (Direct line. Diverts to secretary)