aarons@cvaxa.sussex.ac.uk (Aaron Sloman) (02/07/88)
(Message also posted to comp.compilers and to prolog-request)
---------------------
COMPILING TO ASSEMBLY LANGUAGE IN POPLOG
There have been discussions on the network about the merits of
compiling to assembly language. Readers may be interested in the
methods used for implementing and porting Poplog, a multi-language
software development system containing incremental compilers for
Common Lisp, Prolog, ML and POP-11, a Lisp-like language with a more
readable Pascal-like syntax. Before I explain how assembly language is
used as output from the compiler during porting and system building, I
need to explain how the running system works. The mechanisms described
below were designed and implemented by John Gibson, at Sussex
University.
All the languages in Poplog compile to instructions for a common
virtual machine, the Poplog VM, which are then compiled to native
machine code. First an
over-simplified description:
The Poplog system allows different languages to share a common store
manager, and common data-types, so that a program in one language can
call another and share data-structures. Like most AI environments it
also allows incremental compilation: individual procedures can be
compiled and re-compiled, and are immediately and automatically linked
into the rest of the system, old versions being garbage collected if no
longer pointed to. Moreover, commands to run procedures or interrogate
data-structures can be typed in interactively, using exactly the same
high level language as the programs are written in. The difference
between this and most AI systems is that ALL the languages are
compiled in the same way. E.g. Prolog is not interpreted by a POP-11
or Lisp program: they all compile (incrementally) to machine code.
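The incremental linking described above can be illustrated with a rough Python sketch. It is only an analogy: the names `system`, `compile_procedure` and `call` are invented for this example, and Python's `compile()` produces bytecode rather than native machine code as Poplog does.

```python
# Hypothetical stand-in for the running system: one table of named
# procedures into which every compilation is linked.
system = {}

def compile_procedure(source):
    """Compile source text and link its definitions into `system`.

    Python's compile() produces bytecode, not native machine code as
    Poplog does; only the incremental linking is illustrated here.
    """
    code = compile(source, "<interactive>", "exec")
    exec(code, system)  # binds each defined procedure by name

def call(name, *args):
    """Run a procedure interactively by name."""
    return system[name](*args)

compile_procedure("def area(r):\n    return 3 * r * r\n")
compile_procedure("def report(r):\n    return 'area=' + str(area(r))\n")
before = call("report", 2)

# Recompile one procedure. `report` looks `area` up by name when it
# runs, so it picks up the new version without being recompiled.
compile_procedure("def area(r):\n    return 4 * r * r\n")
after = call("report", 2)
```

After the second definition of `area` nothing refers to the first one any more, so it becomes eligible for garbage collection, which mirrors the behaviour described for old procedure versions.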
The languages are all implemented using a set of tools for adding new
incremental compilers. These tools include procedures for breaking up
a text stream into items, and tools for planting VM instructions when
procedures are compiled. They are used by the Poplog developers to
implement the four Poplog languages but are also available for users
to implement new languages suited to particular applications. (E.g.
one user claims he implemented a complete Scheme in Poplog in about
three weeks, in his spare time, getting a portable compiler and
development environment for free once he had built the Scheme
front-end compiler in Poplog.)
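As a minimal sketch of what such a front end does, the following Python code breaks a text stream into items and plants instructions for a stack-based VM. Everything here is an assumption made for illustration: the names `itemise`, `CodePlanter` and `compile_expression`, and the opcodes `PUSHQ` and `CALL`, are not taken from the Poplog tools themselves.

```python
import re

def itemise(text):
    """Break a text stream into items: integers and operator symbols."""
    return re.findall(r"\d+|[-+*/()]", text)

class CodePlanter:
    """Collects the VM instructions planted by a front end."""
    def __init__(self):
        self.code = []

    def plant(self, op, arg):
        self.code.append((op, arg))

def compile_expression(text):
    """Recursive-descent front end for integer arithmetic expressions."""
    items = itemise(text)
    planter = CodePlanter()
    pos = 0

    def peek():
        return items[pos] if pos < len(items) else None

    def advance():
        nonlocal pos
        item = items[pos]
        pos += 1
        return item

    def factor():
        item = advance()
        if item == "(":
            expr()
            advance()  # skip the closing ")"
        else:
            planter.plant("PUSHQ", int(item))  # push a constant

    def term():
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            planter.plant("CALL", op)  # call the operator procedure

    def expr():
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            planter.plant("CALL", op)

    expr()
    return planter.code

planted = compile_expression("1 + 2 * 3")
```

The front end knows nothing about the target machine: it only decides which instructions to plant and in what order, which is what lets one back end serve several languages.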
All this makes it possible to build a range of portable incremental
compilers for different sorts of programming languages. This is how
POP-11, PROLOG, COMMON LISP and ML are implemented. They all compile
to a common internal representation, and share machine-specific
run-time code generators. Thus several different machine-independent
"front ends" for different languages can share a machine-specific
"back end" which compiles to native machine code, which runs far more
quickly than if the new language had been interpreted.
The actual story is more complicated: there are two Poplog virtual
machines, a high level and a low level one, both of which are language
independent and machine independent. The high level VM has powerful
instructions, which makes it convenient as a target language for
compilers for high level languages. This includes special facilities
to support Prolog operations, dynamic and lexical scoping of
variables, procedure definitions, procedure calls, suspending and
resuming processes, and so on. Because these are quite sophisticated
operations, the mapping from the Poplog VM to native machine code is
still fairly complex.
So there is a machine independent and language independent
intermediate compiler which compiles from the high level VM to a
low level VM, doing a considerable amount of optimisation on the way.
A machine-specific back-end then translates the low-level VM to native
machine code, except when porting or re-building the system. In the
latter case the final stage is translation to assembly language. (See
diagram below.)
The bulk of the core Poplog system is written in an extended dialect
of POP-11, with provision for C-like addressing modes, for efficiency.
We call it SYSPOP. The system sources, written in SYSPOP, are also
compiled to the high-level VM, and then to the low level VM. But
instead of then being translated to machine code, the low level
instructions are automatically translated to assembly language files
for the target machine. This is much easier than producing object
files, because there is a fairly straightforward mapping from the low
level VM to assembly language, and the programs that do the
translation don't have to worry about formats for object files: we
leave that to the assembler and linker supplied by the manufacturer.
In fact, the system sources need facilities not available to users, so
the two intermediate virtual machines are slightly enhanced for
SYSPOP. The following diagram summarises the situation.
    {POP-11, COMMON LISP, PROLOG, ML, SYSPOP}
                        |
                   Compile to
                        |
                        V
                 [High level VM]
              (extended for SYSPOP)
                        |
              Optimise & compile to
                        |
                        V
                 [Low level VM]
              (modified for SYSPOP)
                        |
             Compile (translate) to
                        |
                        V
          [Native machine instructions]
           [or assembler - for SYSPOP]
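The staging in the diagram can be sketched in Python as follows. This is a toy model under stated assumptions, not the Poplog implementation: the opcodes, the constant-folding optimisation, and the functions `lower`, `to_callable` and `to_assembly` are all invented for the example. A Python closure stands in for native machine code held in memory, and a text listing stands in for the assembly language files.

```python
import operator

# Hypothetical operator table shared by the optimiser and the executor.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def lower(high_level_code):
    """High level VM -> low level VM, folding constant arithmetic."""
    low = []
    for op, arg in high_level_code:
        if op == "PUSHQ":
            low.append(("push_const", arg))
        elif op == "PUSH":
            low.append(("push_var", arg))
        elif op == "CALL":
            both_constant = (len(low) >= 2
                             and low[-1][0] == "push_const"
                             and low[-2][0] == "push_const")
            if both_constant:
                b = low.pop()[1]
                a = low.pop()[1]
                low.append(("push_const", OPS[arg](a, b)))
            else:
                low.append(("apply", arg))
        else:
            raise ValueError("unknown instruction: " + op)
    return low

def to_callable(low):
    """Final stage for ordinary use: something runnable at once."""
    def run(**variables):
        stack = []
        for op, arg in low:
            if op == "push_const":
                stack.append(arg)
            elif op == "push_var":
                stack.append(variables[arg])
            else:  # "apply"
                b = stack.pop()
                a = stack.pop()
                stack.append(OPS[arg](a, b))
        return stack.pop()
    return run

def to_assembly(low):
    """Final stage for system building: text for a separate assembler."""
    return "\n".join("    {} {}".format(op, arg) for op, arg in low)

# x + 2 * 3, as a front end might plant it.
high = [("PUSH", "x"), ("PUSHQ", 2), ("PUSHQ", 3), ("CALL", "*"), ("CALL", "+")]
low = lower(high)
result = to_callable(low)(x=4)
listing = to_assembly(low)
```

Both final stages consume the same low level code, so only the last step differs between the fast interactive route and the slower system-building route.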
So for ordinary users compiling or re-compiling their procedures in
the system, the machine code generator is used and compilation is very
fast, with no linking required. For rebuilding the whole system we go
via assembly language for maximum flexibility and it is indeed a slow
process. But it does not need to be done very often, and not (yet) by
ordinary users. Later (1989) they will have the option to use the
system building route in order to configure the version of Poplog they
want. So we sit on both sides of the argument about speed raised in
comp.compilers.
All the compilers and translators are implemented in Poplog (mostly in
POP-11). Only the last stage is machine specific. The low level VM is
at a level that makes it possible on the VAX, for example, to generate
approximately one machine instruction per low level VM instruction. So
writing the code generator for something like a VAX or M68020 was
relatively easy. For a RISC machine the task is a little more
complicated.
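To show why a roughly one-to-one mapping keeps the code generator simple, here is a table-driven sketch in Python. The low level opcodes are invented, and the templates are merely VAX-style text written for this example (hardware stack, `$` for immediates); they have not been checked against a real assembler and are not Poplog's actual output.

```python
# Hypothetical low level VM opcodes mapped to VAX-style templates.
# Each entry yields exactly one line of assembly text.
TEMPLATES = {
    "push_const": "pushl\t${0}",
    "push_var":   "pushl\t{0}",
    "pop_var":    "movl\t(sp)+,{0}",
    "add":        "addl2\t(sp)+,(sp)",
    "sub":        "subl2\t(sp)+,(sp)",
    "mul":        "mull2\t(sp)+,(sp)",
}

def generate(low_level_code):
    """Translate each low level instruction to one line of assembly."""
    return [TEMPLATES[op].format(*args) for op, *args in low_level_code]

# y = x + 6, expressed in the invented low level instruction set.
program = [("push_var", "x"), ("push_const", 6), ("add",), ("pop_var", "y")]
assembly = generate(program)
```

On a machine with fewer addressing modes a single table entry would have to expand into several instructions, which is one reason a RISC back end takes more work.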
Porting to a new computer requires changing the run-time "back end",
i.e. the low level VM compiler, and also the system-building tools
which output assembly language programs for the target machine.
There are also a few hand-coded assembly files which have to be
re-written for each machine. Thereafter all the high level languages
have incremental compilers for the new machine. (The
machine-independent system building tools perform rather complex
tasks, such as creating a dictionary of procedure names and system
variables that have to be accessible to users at run time. So besides
translating system source files, the tools create additional assembler
files and also check for consistency between the different system
source files.)
I believe most other interactive systems provide at most an
incremental compiler for one language, and any other language has to
be interpreted. If everything is interpreted, then porting is much
easier, but execution is much slower. The advantage of the Poplog
approach is that it is not necessary to port different incremental
compilers to each new machine.
The Poplog VM also makes it relatively easy for a language designer to
implement complex languages, since it provides a varied, extendable
set of data-types and operations thereon, including facilities for
logic programming, list, record and array processing, 'number
crunching', sophisticated control structures (e.g. co-routines),
'active variables' and 'exit actions', that is instructions executed
whenever a procedure exits, whether normally or abnormally. Indefinite
precision arithmetic, ratios and complex numbers are accessible to all
the languages that need them. Both dynamic and lexical scoping of
variables are provided. A tree-structured "section" mechanism (partly
like packages) gives further support for modular design. External
modules (e.g. programs in C or Fortran) can be dynamically linked in
and unlinked. A set of facilities for accessing the operating system
is also provided. Poplog allows functions to be treated as "first
class" objects, and this is used to great advantage in POP-11 and ML.
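Two of the facilities listed above, dynamic scoping and exit actions, can be illustrated together with a Python analogue. This is not how POP-11 expresses them: the `settings` table and the `dynamic_local` helper are hypothetical, and a `try`/`finally` block plays the part of the exit action.

```python
from contextlib import contextmanager

# Hypothetical store of dynamically scoped variables.
settings = {"indent": 0}

@contextmanager
def dynamic_local(name, value):
    """Bind `name` for the duration of a call, restoring it on exit."""
    saved = settings[name]
    settings[name] = value
    try:
        yield
    finally:
        # The "exit action": runs on normal and abnormal exit alike.
        settings[name] = saved

def show(text):
    """Sees whatever binding of `indent` is current when it is called."""
    return " " * settings["indent"] + text

def nested():
    with dynamic_local("indent", 4):
        return show("inner")

def failing():
    with dynamic_local("indent", 8):
        raise RuntimeError("abnormal exit")

normal = nested()
try:
    failing()
except RuntimeError:
    pass
restored = settings["indent"]
```

Here `show` is affected by a binding made in its caller, which is the essence of dynamic scoping, and the old value comes back even when the procedure is left through an error.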
The VM facilities are relatively easy to port to a range of computers
and operating systems because the core system is mostly implemented in
SYSPOP, and is largely machine independent. Only the machine-dependent
portions mentioned above (e.g. run-time code generator, and translator
from low level VM to assembler), plus a small number of assembler
files need be changed for a new machine (unless the operating system
is also new). Since the translators are all written in a high level AI
language, altering them is relatively easy.
Porting requires compiling all the SYSPOP system sources, to generate
the corresponding new assembler files, then moving them and the
hand-made assembler files to the new machine, where they are assembled
then linked. The same process is used to rebuild the system on an
existing machine when new features are added deep in the system. Much
of the system is in source libraries compiled as needed by users, and
modifying those components does not require re-building.
Using this mechanism an experienced programmer with no prior knowledge
of Poplog or the target processor was able to port Poplog to a RISC
machine in about 7 months. But for the usual crop of bugs in the
operating system, assembler, and other software of the new machine, the
actual porting time would have been shorter. In general, extra time is
required for user testing, producing system specific documentation,
tidying up loose ends etc.
Thus 7 to 12 months' work ports incremental compilers for four
sophisticated languages, a screen editor, and a host of utilities. Any
other languages implemented by users using the compiler-building tools
should also run immediately. So in principle this mechanism allows a
fixed amount of work to port an indefinitely large number of
incremental compilers. Additional work will be required if the
operating system is different from Unix or VMS, or if a machine
specific window manager has to be provided. This should not be
necessary for workstations supporting X-windows.
The use of assembler output considerably simplifies the porting task,
and also aids testing and debugging, since the output is far more
intelligible to the programmer than if object files were generated.
Comments welcome.
Aaron Sloman,
School of Cognitive Sciences, Univ of Sussex, Brighton, BN1 9QN, England
ARPANET : aarons%uk.ac.sussex.cvaxa@nss.cs.ucl.ac.uk
JANET: aarons@cvaxa.sussex.ac.uk
BITNET: aarons%uk.ac.sussex.cvaxa@uk.ac
As a last resort
UUCP: ...mcvax!ukc!cvaxa!aarons
or aarons@cvaxa.uucp
Phone: University +(44)-(0)273-678294 (Direct line. Diverts to secretary)