[comp.lang.lisp] Use of assembler as intermediate language in POPLOG

aarons@cvaxa.sussex.ac.uk (Aaron Sloman) (02/07/88)
      (Message also posted to comp.compilers and to prolog-request)
                         ---------------------

               COMPILING TO ASSEMBLY LANGUAGE IN POPLOG

There have  been  discussions  on  the network  about  the  merits  of
compiling to  assembly  language. Readers  may  be interested  in  the
methods used  for implementing  and porting  Poplog, a  multi-language
software  development  system  containing  incremental  compilers  for
Common Lisp, Prolog, ML and POP-11,  a Lisp-like language with a  more
readable Pascal-like syntax. Before I explain how assembly language is
used as output from the compiler during porting and system building, I
need to explain how the running system works. The mechanisms described
below  were  designed  and  implemented  by  John  Gibson,  at  Sussex
University.

All the languages in Poplog compile  to a common virtual machine,  the
Poplog VM, which  is then  compiled to  native machine  code. First an
over-simplified description:

The Poplog system allows different  languages to share a common  store
manager, and common data-types, so that a program in one language  can
call another and share data-structures.  Like most AI environments  it
also allows  incremental  compilation: individual  procedures  can  be
compiled and re-compiled and  are immediately automatically linked  in
to the rest of the system, old versions being garbage collected if  no
longer pointed to. Moreover, commands to run procedures or interrogate
data-structures can be typed in interactively, using exactly the  same
high level language  as the  programs are written  in. The  difference
between this  and  most AI  systems  is  that ALL  the  languages  are
compiled in the same way. E.g.  Prolog is not interpreted by a  POP-11
or Lisp program: they all compile (incrementally) to machine code.
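A minimal Python sketch of "re-compiled and immediately linked in", assuming that calls between procedures go through a name table that is consulted at run time. The names `identifiers` and `compile_procedure` are hypothetical, this is not Poplog's own mechanism, and Python's `compile()` stands in for a real native-code generator.

```python
# Sketch only: procedures live in a global identifier table, and calls
# between them go through that table, so a recompiled procedure is seen by
# every caller at once.

identifiers = {}  # hypothetical: procedure name -> current compiled procedure


def compile_procedure(name, source):
    """Compile a one-expression body and (re)link it under `name`."""
    code = compile(source, f"<{name}>", "eval")

    def procedure(*args):
        # The body sees its arguments as `args` and reaches other procedures
        # through `call`, which looks the name up afresh on every call.
        env = {"args": args, "call": lambda n, *a: identifiers[n](*a)}
        return eval(code, env)

    # Overwriting the entry is the whole "link" step; the old version is
    # garbage collected once nothing else refers to it.
    identifiers[name] = procedure


compile_procedure("square", "args[0] * args[0]")
compile_procedure("sum_squares", "call('square', args[0]) + call('square', args[1])")
print(identifiers["sum_squares"](3, 4))  # 25

# Recompile only `square`; `sum_squares` uses the new version immediately.
compile_procedure("square", "10 * args[0] * args[0]")
print(identifiers["sum_squares"](3, 4))  # 250
```

Late binding through a table is only one way to get this effect; its cost is an extra lookup per call.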

The languages are all implemented using a set of tools for adding  new
incremental compilers. These tools include procedures for breaking  up
a text stream into items, and tools for planting VM instructions  when
procedures are compiled.  They are  used by the  Poplog developers  to
implement the four Poplog languages  but are also available for  users
to implement new  languages suited to  particular applications.  (E.g.
one user claims he  implemented a complete Scheme  in Poplog in  about
three weeks,  in  his spare  time,  getting a  portable  compiler  and
development  environment  for  free  once  he  had  built  the  Scheme
front-end compiler in Poplog.)
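To show the division of labour between the two kinds of tool, here is a small Python sketch of a front end built from an itemiser and an instruction "planter". Everything in it is hypothetical: the item syntax, the class `Planter`, and the instruction names `PUSHQ` (push a quoted constant) and `CALL` are invented for illustration and are not the real Poplog tools.

```python
import re

# Hypothetical item syntax: integers, words/operator symbols, and brackets.
ITEM = re.compile(r"\s*(\d+|[A-Za-z_+*/-]+|[()])")


def itemise(text):
    """Break a text stream into items (tool 1)."""
    items, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = ITEM.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        item = m.group(1)
        items.append(int(item) if item.isdigit() else item)
        pos = m.end()
    return items


class Planter:
    """Collects VM instructions as a front end plants them (tool 2)."""

    def __init__(self):
        self.code = []

    def plant(self, op, arg):
        self.code.append((op, arg))


def compile_prefix(items, planter):
    """Tiny Scheme-like front end: (f a b ...) pushes each argument, then calls f."""

    def expr(i):
        if items[i] == "(":
            fn, i, nargs = items[i + 1], i + 2, 0
            while items[i] != ")":
                i = expr(i)
                nargs += 1
            planter.plant("CALL", (fn, nargs))
            return i + 1
        planter.plant("PUSHQ", items[i])  # push a quoted constant
        return i + 1

    expr(0)
    return planter.code


print(compile_prefix(itemise("(+ 1 (* 2 3))"), Planter()))
# [('PUSHQ', 1), ('PUSHQ', 2), ('PUSHQ', 3), ('CALL', ('*', 2)), ('CALL', ('+', 2))]
```

The language implementer writes only `compile_prefix`; itemising and planting are reusable across front ends.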

All this makes it  possible to build a  range of portable  incremental
compilers for different  sorts of programming  languages. This is  how
POP-11, PROLOG, COMMON LISP and  ML are implemented. They all  compile
to  a  common  internal  representation,  and  share  machine-specific
run-time code generators.  Thus several different  machine-independent
"front ends"  for different  languages  can share  a  machine-specific
"back end" which compiles to native machine code. That code runs  far
more quickly than if the new language had been interpreted.
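A Python sketch of "several front ends, one back end", under simplifying assumptions: two toy syntaxes (Lisp-like nested tuples and Pascal-like infix) plant identical, hypothetical stack-VM code, and a single back end translates that code once into a chain of Python closures. The closures stand in for native machine code; none of the function names come from Poplog.

```python
import operator


def prefix_front_end(expr):
    """Lisp-like syntax as nested tuples, e.g. ('+', 1, ('*', 2, 3))."""
    if isinstance(expr, tuple):
        op, *args = expr
        return [ins for a in args for ins in prefix_front_end(a)] + [("CALL", op)]
    return [("PUSHQ", expr)]


def infix_front_end(tokens):
    """Pascal-like infix with '+' and '*', where '*' binds tighter."""
    pos = 0

    def term():
        nonlocal pos
        code = [("PUSHQ", tokens[pos])]
        pos += 1
        while pos < len(tokens) and tokens[pos] == "*":
            code += [("PUSHQ", tokens[pos + 1]), ("CALL", "*")]
            pos += 2
        return code

    code = term()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        code += term() + [("CALL", "+")]
    return code


BINARY_OPS = {"+": operator.add, "*": operator.mul}


def back_end(code):
    """Shared back end: translate VM code once into a runnable routine."""
    steps = []
    for op, arg in code:
        if op == "PUSHQ":
            steps.append(lambda stack, v=arg: stack.append(v))
        elif op == "CALL":
            def call(stack, f=BINARY_OPS[arg]):
                right, left = stack.pop(), stack.pop()
                stack.append(f(left, right))
            steps.append(call)
        else:
            raise ValueError(f"unknown instruction {op}")

    def run():
        stack = []
        for step in steps:
            step(stack)
        return stack.pop()

    return run


lisp_code = prefix_front_end(("+", 1, ("*", 2, 3)))
infix_code = infix_front_end([1, "+", 2, "*", 3])
print(lisp_code == infix_code)   # True
print(back_end(lisp_code)())     # 7
```

Adding a third syntax means writing only another front end; `back_end` is untouched.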

The actual story  is more  complicated: there are  two Poplog  virtual
machines, a high level and a low level one, both of which are language
independent and machine  independent. The high  level VM has  powerful
instructions, which  makes  it convenient  as  a target  language  for
compilers for high level  languages. This includes special  facilities
to  support  Prolog  operations,   dynamic  and  lexical  scoping   of
variables, procedure  definitions,  procedure  calls,  suspending  and
resuming processes, and so on.  Because these are quite  sophisticated
operations, the mapping from the high level VM directly to native
machine code would still be fairly complex.

So  there   is  a   machine  independent   and  language   independent
intermediate compiler which compiles  from the high  level VM to a
low level VM, doing a considerable amount of optimisation on the  way.
A machine-specific back-end then translates the low-level VM to native
machine code, except when  porting or re-building  the system. In  the
latter case the final stage is translation to assembly language. (See
diagram below.)
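A Python sketch of the middle stage, assuming a stack-oriented high level VM and a low level VM of simple MOVE/ADD operations. The instruction names, the operand markers `PUSH`/`POP`/`TOP`, and the single peephole rule are all invented for illustration; the real optimiser does far more than this.

```python
# Hypothetical high level VM code for:  y := x + 1;  z := y
HIGH = [("PUSH", "x"), ("PUSHQ", 1), ("CALL", "+"),
        ("POP", "y"), ("PUSH", "y"), ("POP", "z")]


def lower(high):
    """Map each high level instruction to one low level MOVE/ADD.
    As operands, 'PUSH', 'POP' and 'TOP' mean push, pop and top of stack."""
    low = []
    for op, arg in high:
        if op in ("PUSH", "PUSHQ"):          # variable or literal constant
            low.append(("MOVE", arg, "PUSH"))
        elif op == "POP":
            low.append(("MOVE", "POP", arg))
        elif op == "CALL" and arg == "+":
            low.append(("ADD", "POP", "TOP"))  # pop one operand, add into new top
        else:
            raise ValueError(f"no lowering for {op} {arg}")
    return low


def peephole(low):
    """Replace 'push src' immediately followed by 'pop into dst' with MOVE src, dst."""
    out, i = [], 0
    while i < len(low):
        if (i + 1 < len(low)
                and low[i][0] == low[i + 1][0] == "MOVE"
                and low[i][2] == "PUSH" and low[i + 1][1] == "POP"):
            out.append(("MOVE", low[i][1], low[i + 1][2]))
            i += 2
        else:
            out.append(low[i])
            i += 1
    return out


for ins in peephole(lower(HIGH)):
    print(ins)
# ('MOVE', 'x', 'PUSH')
# ('MOVE', 1, 'PUSH')
# ('ADD', 'POP', 'TOP')
# ('MOVE', 'POP', 'y')
# ('MOVE', 'y', 'z')
```

Both passes are machine independent; only a later stage needs to know the target.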

The bulk of the core Poplog  system is written in an extended  dialect
of POP-11, with provision for C-like addressing modes, for efficiency.
We call it  SYSPOP. The system  sources, written in  SYSPOP, are  also
compiled to  the high-level  VM, and  then to  the low  level VM.  But
instead of  then  being translated  to  machine code,  the  low  level
instructions are automatically translated  to assembly language  files
for the  target machine.  This is  much easier  than producing  object
files, because there is a fairly straightforward mapping from the  low
level  VM  to  assembly  language,  and  the  programs  that  do   the
translation don't have  to worry  about formats for  object files:  we
leave that to the assembler and linker supplied by the manufacturer.
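Why text is easier to produce than object files: the translator only prints lines. Below is a Python sketch that turns hypothetical low level instructions into assembler text loosely modelled on Unix VAX assembler syntax (`$` for immediates, `-(sp)`/`(sp)+` for push/pop, a `.word` entry mask after the procedure label). The mnemonic table and function names are assumptions; check details against the real target assembler's manual.

```python
# Hypothetical mapping from low level VM operations to VAX-style mnemonics.
MNEMONIC = {"MOVE": "movl", "ADD": "addl2"}
# Stack operands in VAX addressing-mode syntax: push, pop, top of stack.
STACK_OPERAND = {"PUSH": "-(sp)", "POP": "(sp)+", "TOP": "(sp)"}


def operand(x):
    if isinstance(x, int):
        return f"${x}"              # immediate (literal) operand
    return STACK_OPERAND.get(x, x)  # otherwise a stack slot or a symbol name


def to_assembler(proc_name, low):
    """Return assembler source text for one procedure; no object-file format involved."""
    lines = ["\t.text", f"\t.globl\t{proc_name}", f"{proc_name}:",
             "\t.word\t0"]  # VAX procedure entry mask: save no registers
    for op, src, dst in low:
        lines.append(f"\t{MNEMONIC[op]}\t{operand(src)},{operand(dst)}")
    lines.append("\tret")
    return "\n".join(lines) + "\n"


print(to_assembler("incr_x", [("MOVE", "x", "PUSH"), ("MOVE", 1, "PUSH"),
                              ("ADD", "POP", "TOP"), ("MOVE", "POP", "x")]))
```

Symbol resolution, relocation and the object-file layout are left entirely to the manufacturer's assembler and linker.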

In fact, the system sources need facilities not available to users, so
the two  intermediate  virtual  machines  are  slightly  enhanced  for
SYSPOP. The following diagram summarises the situation.

                {POP-11, COMMON LISP, PROLOG, ML, SYSPOP}
                                    |
                               Compile to
                                    |
                                    V
                             [High level VM]
                          (extended for SYSPOP)
                                    |
                          Optimise & compile to
                                    |
                                    V
                             [Low level VM]
                          (modified for SYSPOP)
                                    |
                         Compile (translate) to
                                    |
                                    V
                      [Native machine instructions]
                       [or assembler - for SYSPOP]

So for ordinary  users compiling or  re-compiling their procedures  in
the system, the machine code generator is used and compilation is very
fast, with no linking required. For rebuilding the whole system we  go
via assembly language for maximum flexibility and it is indeed a  slow
process. But it does not need to be done very often, and not (yet)  by
ordinary users. Later  (1989) they  will have  the option  to use  the
system building route in order to configure the version of Poplog they
want. So we sit on both sides of the argument raised in comp.compilers
about the speed of compiling via assembly language.
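The point that only the final stage differs between the two routes can be sketched in Python as a single dispatch. The function `final_stage`, the mode names and the dictionaries standing in for the running image and the output directory are all hypothetical.

```python
def final_stage(name, low_code, mode, memory, asm_files):
    """Hypothetical last stage; everything before it is shared by both routes."""
    if mode == "incremental":
        # Stand-in for planting machine code in the running image: the new
        # procedure can be called at once, with no assembler or linker.
        memory[name] = list(low_code)
    elif mode == "system_build":
        # Emit assembler text for the manufacturer's assembler and linker,
        # which are run later as separate (slower) steps.
        body = "\n".join(f"\t{op}\t" + ",".join(map(str, args))
                         for op, *args in low_code)
        asm_files[name + ".s"] = f"{name}:\n{body}\n"
    else:
        raise ValueError(f"unknown mode {mode!r}")


memory, asm_files = {}, {}
code = [("MOVE", "x", "y")]
final_stage("user_proc", code, "incremental", memory, asm_files)
final_stage("sys_proc", code, "system_build", memory, asm_files)
print(sorted(memory), sorted(asm_files))  # ['user_proc'] ['sys_proc.s']
```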

All the compilers and translators are implemented in Poplog (mostly in
POP-11). Only the last stage is machine specific. The low level VM  is
at a level that makes it possible on the VAX, for example, to generate
approximately one machine instruction per low level VM instruction. So
writing the code  generator for  something like  a VAX  or M68020  was
relatively easy.  For  a  RISC  machine the  task  is  a  little  more
complicated.
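A Python sketch of why the VAX case is close to one-to-one while a RISC needs more work: on a memory-to-memory instruction set each low level operation maps to one instruction, whereas a load/store machine must route memory operands through registers. The load/store mnemonics (`load`, `store`, `add`) and register names are invented for illustration and belong to no real processor.

```python
# Hypothetical low level VM code: memory-to-memory operations on named cells.
LOW = [("MOVE", "x", "y"), ("ADD", "z", "y")]

CISC_MNEMONIC = {"MOVE": "movl", "ADD": "addl2"}


def select_vax_like(op, src, dst):
    # Memory-to-memory instruction set: one VM instruction, one machine instruction.
    return [f"{CISC_MNEMONIC[op]} {src},{dst}"]


def select_load_store(op, src, dst):
    # Load/store (RISC-style) machine: memory operands must go through registers.
    if op == "MOVE":
        return [f"load r1,{src}", f"store r1,{dst}"]
    if op == "ADD":
        return [f"load r1,{src}", f"load r2,{dst}", "add r2,r2,r1", f"store r2,{dst}"]
    raise ValueError(f"unknown instruction {op}")


def translate(low, select):
    return [insn for ins in low for insn in select(*ins)]


print(translate(LOW, select_vax_like))         # ['movl x,y', 'addl2 z,y']
print(len(translate(LOW, select_load_store)))  # 6
```

A real RISC code generator would also have to allocate registers sensibly rather than reloading operands for every instruction, which adds to the extra complication.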

Porting to a new computer requires  the run-time "back end", i.e.  the
low level  VM compiler,  to be  changed and  also the  system-building
tools which output assembly language programs for the target  machine.
There are  also a  few  hand-coded assembly  files  which have  to  be
re-written for each machine. Thereafter  all the high level  languages
have   incremental    compilers   for    the   new    machine.    (The
machine-independent  system  building  tools  perform  rather  complex
tasks, such as  creating a  dictionary of procedure  names and  system
variables that have to be accessible to users at run time. So  besides
translating system source files, the tools create additional assembler
files and  also check  for consistency  between the  different  system
source files.)
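A Python sketch of two of the system building tasks mentioned above: merging per-file summaries into a run-time dictionary of names while checking consistency (duplicate or missing definitions), then writing the dictionary out as an extra assembler file. The summary format, file names and the table layout (pairs of name-string address and value address, zero-terminated, using `.long`/`.asciz` directives) are assumptions for illustration.

```python
def build_dictionary(modules):
    """modules maps a source file name to the identifiers it defines and uses
    (a hypothetical summary produced while translating each file).
    Returns (dictionary, errors)."""
    defined_in, errors = {}, []
    for fname in sorted(modules):
        for name in sorted(modules[fname]["defines"]):
            if name in defined_in:
                errors.append(f"{name}: defined in {defined_in[name]} and {fname}")
            else:
                defined_in[name] = fname
    for fname in sorted(modules):
        for name in sorted(modules[fname]["uses"] - set(defined_in)):
            errors.append(f"{fname}: {name} used but never defined")
    return defined_in, errors


def dictionary_assembler(defined_in):
    """Extra assembler file: zero-terminated table of (name string, address)
    pairs, so identifiers can be looked up by name at run time."""
    names = sorted(defined_in)
    lines = ["\t.data", "dictionary:"]
    lines += [f"\t.long\tname{i},{name}" for i, name in enumerate(names)]
    lines.append("\t.long\t0")
    lines += [f'name{i}:\t.asciz\t"{name}"' for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"


modules = {
    "lists.p": {"defines": {"hd", "tl"}, "uses": {"sys_alloc"}},
    "store.p": {"defines": {"sys_alloc"}, "uses": set()},
    "io.p": {"defines": {"pr"}, "uses": {"hd", "sys_flush"}},
}
dictionary, errors = build_dictionary(modules)
print(errors)  # ['io.p: sys_flush used but never defined']
print(dictionary_assembler(dictionary))
```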

I  believe  most  other  interactive   systems  provide  at  most   an
incremental compiler for one language,  and any other language has  to
be interpreted. If  everything is  interpreted, then  porting is  much
easier, but  execution is  much slower.  The advantage  of the  Poplog
approach is that  it is  not necessary to  port different  incremental
compilers to each new machine.

This makes it relatively easy  for the language designer to  implement
complex languages, since the Poplog  VM provides a varied,  extendable
set of  data-types and  operations thereon,  including facilities  for
logic  programming,  list,  record   and  array  processing,   'number
crunching',  sophisticated  control  structures  (e.g.   co-routines),
'active variables' and 'exit  actions', that is, instructions  executed
whenever a procedure exits, whether normally or abnormally. Indefinite
precision arithmetic, ratios and complex numbers are accessible to all
the languages  that need  them. Both  dynamic and  lexical scoping  of
variables are provided. A tree-structured "section" mechanism  (partly
like packages)  gives further  support  for modular  design.  External
modules (e.g. programs in C or  Fortran) can be dynamically linked  in
and unlinked. A set of  facilities for accessing the operating  system
is also  provided. Poplog  allows functions  to be  treated as  "first
class" objects, and this is used to great advantage in POP-11 and ML.
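To make "exit actions" concrete, here is a Python sketch in which each procedure activation keeps a list of actions run on both normal and abnormal exit, and a dynamically scoped variable is given a temporary value by registering an exit action that restores the old one. The names `dynamic`, `activation` and `dynamic_local` are hypothetical, and Python's `try`/`finally` plays the part of the VM's exit mechanism.

```python
from contextlib import contextmanager

dynamic = {"depth": 0}  # hypothetical table of dynamically scoped variables


@contextmanager
def activation():
    """One procedure activation with its own list of exit actions."""
    exit_actions = []
    try:
        yield exit_actions
    finally:
        # Runs on normal return and when an exception unwinds the procedure,
        # most recently registered action first.
        for action in reversed(exit_actions):
            action()


def dynamic_local(exit_actions, name, value):
    """Give a dynamic variable a new value for the rest of this activation."""
    saved = dynamic[name]
    dynamic[name] = value
    exit_actions.append(lambda: dynamic.__setitem__(name, saved))


def descend(n):
    with activation() as on_exit:
        dynamic_local(on_exit, "depth", dynamic["depth"] + 1)
        if n == 0:
            raise RuntimeError(f"gave up at depth {dynamic['depth']}")
        return descend(n - 1)


try:
    descend(3)
except RuntimeError as err:
    print(err)            # gave up at depth 4
print(dynamic["depth"])   # 0: every exit action ran during unwinding
```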

The VM facilities are relatively easy to port to a range of  computers
and operating systems because the core system is mostly implemented in
SYSPOP, and is largely machine independent. Only the machine-dependent
portions mentioned above (e.g. run-time code generator, and translator
from low level  VM to  assembler), plus  a small  number of  assembler
files need be changed for a  new machine (unless the operating  system
is also new). Since the translators are all written in a high level AI
language, altering them is relatively easy.

Porting requires compiling all the SYSPOP system sources, to  generate
the corresponding  new  assembler  files,  then  moving  them  and  the
hand-made assembler files to the new machine, where they are assembled
then linked. The  same process  is used to  rebuild the  system on  an
existing machine when new features are added deep in the system. Much
of the system is in source libraries compiled as needed by users, and
modifying those components does not require re-building.
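The porting sequence just described can be written down as an ordered plan. This Python sketch only lists the steps; the file names, the `.p`/`.s`/`.o` suffixes, the output name `poplog` and the function `porting_plan` are made up for illustration, and no real tools are invoked.

```python
def porting_plan(syspop_sources, hand_written_asm, target):
    """Return the ordered steps of a port as (action, inputs, output) tuples.
    Library sources compiled on demand by users are deliberately absent."""
    plan, asm_files = [], []
    for src in syspop_sources:
        asm = src.rsplit(".", 1)[0] + ".s"
        plan.append(("translate to assembler (host)", src, asm))
        asm_files.append(asm)
    asm_files += hand_written_asm          # assumed to end in ".s"
    plan.append((f"copy to {target}", tuple(asm_files), target))
    objects = []
    for asm in asm_files:
        obj = asm[:-2] + ".o"
        plan.append((f"assemble ({target})", asm, obj))
        objects.append(obj)
    plan.append((f"link ({target})", tuple(objects), "poplog"))
    return plan


for step in porting_plan(["lists.p", "store.p"], ["entry.s"], "new_machine"):
    print(step)
```

Rebuilding on an existing machine follows the same plan with the copy step becoming trivial.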

Using this mechanism an experienced programmer with no prior knowledge
of Poplog or the target  processor was able to  port Poplog to a  RISC
machine in about 7 months, and without the usual crop of bugs in  the
new machine's operating system, assembler and other software the
actual porting time would have been shorter. In general, extra time is
required for user  testing, producing  system specific  documentation,
tidying up loose ends etc.

Thus 7  to  12  months  work  ports  incremental  compilers  for  four
sophisticated languages, a screen editor, and a host of utilities. Any
other languages implemented by users using the compiler-building tools
should also run immediately. So  in principle this mechanism  allows a
fixed  amount  of  work  to  port  an  indefinitely  large  number  of
incremental  compilers.  Additional  work  will  be  required  if  the
operating system  is different  from  Unix or  VMS,  or if  a  machine
specific window  manager  has  to  be provided.  This  should  not  be
necessary for workstations supporting X-windows.
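A back-of-envelope count behind "a fixed amount of work", assuming each compiler port counts as one unit of work (a simplification: the unit in the shared scheme is one back end plus system build per machine, not one compiler). The function name is hypothetical.

```python
def ports_needed(languages, machines, shared_back_end):
    """Count separate ports needed to run every language on every machine."""
    if shared_back_end:
        return machines              # one back end (plus system build) per machine
    return languages * machines      # one native compiler per (language, machine) pair


for n_lang in (4, 5, 10):
    print(n_lang, ports_needed(n_lang, 3, False), ports_needed(n_lang, 3, True))
# 4 12 3
# 5 15 3
# 10 30 3
```

With a shared back end, adding a language adds no porting work at all, which is the sense in which the work is fixed.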

The use of assembler output considerably simplifies the porting  task,
and also aids  testing and  debugging, since  the output  is far  more
intelligible to the programmer than if object files were generated.

Comments welcome.

Aaron Sloman,
School of Cognitive Sciences, Univ of Sussex, Brighton, BN1 9QN, England
    ARPANET : aarons%uk.ac.sussex.cvaxa@nss.cs.ucl.ac.uk
    JANET     aarons@cvaxa.sussex.ac.uk
    BITNET:   aarons%uk.ac.sussex.cvaxa@uk.ac

As a last resort
    UUCP:     ...mcvax!ukc!cvaxa!aarons
            or aarons@cvaxa.uucp

Phone:  University +(44)-(0)273-678294  (Direct line. Diverts to secretary)