[comp.arch] Independent Architecture Compilers

tassos@rti.rti.org (Tassos Markas) (04/11/89)

I'm looking for independent architecture compilers.
This compiler should accept a high level language (I would prefer C) and
produce microcode that can be easily retargeted for any system architecture.

thanks in advance

Tassos Markas  (919) 541-7020
Research Triangle Institute
Research Triangle Park, NC 27709
tassos@rti.rti.org  [128.109.139.2]
{decvax,ihnp4}!mcnc!rti!tassos
[This topic comes up from time to time.  I don't think there is any such
thing, though I note that the OSF has an RFT out for exactly this sort of
facility to make it possible to ship one set of object code to be run on many
different architectures.  -John]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

jac@paul.rutgers.edu (J. A. Chandross) (04/17/89)

tassos@rti.UUCP (Tassos Markas) writes:
> I'm looking for independent architecture compilers.
> This compiler should accept a high level language (I would prefer C) and
> produce microcode that can be easily retargeted for any system architecture.

I would suggest that anyone interested in microcode compilation read the
proceedings of the past few Micro conferences (Proceedings of the 
International Conference on Microprogramming). 

Microcode compilation is a very hard problem for non-traditional architectures.
(I know, I've tried.)  If you have something fairly straightforward, i.e. a
non-VLIW machine, you may be able to use one of the existing retargetable microcode
compilers.  Otherwise, write it from scratch.  It's a lot easier than coercing 
something designed for a relatively simple machine into working for your 
machine.

As far as "easily retargeted for any system architecture" goes, beware of 
anyone who tries to sell you such a system.  This isn't something that is
trivially done.


Jonathan A. Chandross
Internet: jac@paul.rutgers.edu
UUCP: rutgers!paul.rutgers.edu!jac

schow@bnr-public.uucp (Stanley Chow) (04/18/89)

In article <Apr.16.22.18.25.1989.11912@paul.rutgers.edu> jac@paul.rutgers.edu (J. A. Chandross) writes:
>tassos@rti.UUCP (Tassos Markas) writes:
>> I'm looking for independent architecture compilers.
>> This compiler should accept a high level language (I would prefer C) and
>> produce microcode that can be easily retargeted for any system architecture.
>
>Microcode compilation is a very hard problem for non-traditional architectures.
>(I know, I've tried.)  If you have something fairly straightforward, i.e. a
>non-VLIW machine, you may be able to use one of the existing retargetable microcode
>compilers.  Otherwise, write it from scratch.  It's a lot easier than coercing 
>something designed for a relatively simple machine into working for your 
>machine.
>

I agree with Chandross that in many cases it is better to write the micro-code
from scratch. Considering the amount of micro-code in a machine (especially the
really time-critical bits), it is often quicker to rewrite the micro-code than
to write (or port) a truly optimizing compiler for a new architecture.

If you have a lot of *critical* micro-code, (by a lot, I mean tens of K words),
you should question the need. [Remember, this is a proponent of micro-code 
talking.]

Stanley Chow  ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
	      (613) 763-2831

Clever disclaimers are hard to come by, I will save them for the articles
that need them. For now: I speak only for myself.

schow@bnr-public.uucp (Stanley Chow) (04/20/89)

In article <10441@polyslo.CalPoly.EDU> cquenel@polyslo.CalPoly.EDU (34 more school days) writes:
>
>What if your machine only runs micro-code ?  (This is not an idle
>question).  The term I've heard coined recently is "superscalar".
>If one were to write a compiler for a superscalar machine, it
>seems that one might want to design it a lot like a micro-code
>compiler.
>
>This is NOT an argument for a "retargetable" (ha ha ha ha ha) micro-code
>compiler, just a micro-code compiler.  
>


In my mind, micro-code means the stuff that implements the instructions used
by compilers. Usually, the micro-code also has lots of strange encodings with
parallelism, and it is limited to a small addressing range.
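
To make "strange encodings with parallelism" concrete, here is a minimal
sketch of a horizontal microword as a C struct with bit-fields.  The word
size, field names, and field widths are all invented for illustration;
real microword formats vary wildly from machine to machine, and how a C
compiler packs bit-fields is implementation-defined anyway.

    /* One hypothetical 64-bit horizontal microword: a single word
     * carries control fields for several functional units at once.
     * The 12-bit next-microaddress field is the sort of "small
     * addressing range" mentioned above: 4K microwords maximum. */
    struct microword {
        unsigned alu_op  : 6;   /* ALU operation select            */
        unsigned alu_src : 4;   /* ALU source register             */
        unsigned alu_dst : 4;   /* ALU destination register        */
        unsigned mem_op  : 3;   /* memory unit: read/write/none    */
        unsigned mem_reg : 4;   /* register for the memory access  */
        unsigned br_cond : 3;   /* branch condition select         */
        unsigned next_pc : 12;  /* next microaddress               */
        unsigned unused  : 28;  /* pad to 64 bits                  */
    };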

Since I don't know what a superscalar machine looks like, I really can't say
much about it. If your target architecture has "small" instructions that the
compiler must string together to do "big" operations, then you probably want
to get a RISC compiler.


Stanley Chow   ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public


Opinion? Did I say something in that posting? Wow! Please, can I let that
opinion represent just me? I promise to tell everyone I am the sole representee.

weaver@prls.UUCP (Michael Weaver) (04/22/89)

In article <10441@polyslo.CalPoly.EDU> cquenel@polyslo.CalPoly.EDU (34 more school days) writes:
>
>What if your machine only runs micro-code ?  (This is not an idle
>question).  The term I've heard coined recently is "superscalar".
>If one were to write a compiler for a superscalar machine, it
>seems that one might want to design it a lot like a micro-code
>compiler.
>
>This is NOT an argument for a "retargetable" (ha ha ha ha ha) micro-code
>compiler, just a micro-code compiler.  
>

If your machine runs only microcode, it will generally be much simpler
to generate code for it than for a machine that uses microcode to implement
an instruction set. The reason for this is quite simple: in the latter
case the hardware designers have a pretty good idea what the only program
the machine will ever run will look like, and may introduce some odd features
(such as OR-ing an address with data to form a branch target address) if
they make the machine cheaper or faster.
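
For concreteness, here is a sketch in C of the kind of trick meant above.
The alignment rule and the constants are invented for illustration:

    /* Branch target formed by OR-ing a base address with data, as a
     * cheap hardware substitute for an adder.  This only behaves like
     * addition when the base is aligned so the OR-ed bits can never
     * carry: here base must be a multiple of 256 and the data must
     * fit in 8 bits, and it is up to the microcoder (or compiler!)
     * to guarantee that. */
    unsigned long branch_target(unsigned long base, unsigned long data)
    {
        return base | (data & 0xff);
    }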


Michael Weaver
Signetics/Philips Components
811 East Arques Avenue
Sunnyvale CA 94086 USA
Phone: (408) 991-3450
Usenet: ...!mips!prls!weaver

jac@paul.rutgers.edu (J. A. Chandross) (04/22/89)

cquenel@polyslo.CalPoly.EDU (34 more school days) writes:
>
>What if your machine only runs micro-code ?  (This is not an idle
>question).  
>

weaver@prls.UUCP (Michael Weaver) writes:
> If your machine runs only microcode, it will generally be much simpler 
> to generate code for it than a machine that uses microcode to implement
> an instruction set.

This is indeed the case.

Instruction sets are generally written once, but executed many, many times.
In order to deliver the highest performance you will likely want to write
the code by hand.  Besides, most microcoded instruction sets, even the
VAX's, are relatively simple compared to the features afforded by a true
VLIW (i.e. horizontally microcoded) machine.

However, if you want to generate user-customizable instruction sets,
or have user programs written entirely in microcode, you will run into
the problem of how to generate the microcode from a high-level language.
It is bad enough having to debug the hardware with hand-written programs;
forcing users to write in microcode means the top executives of your
company are going to be selling real estate in 6 months.

However, programming disadvantages aside, high-performance microcoded
machines are likely to be the wave of the future.  It is only with 
microcoded machines that you can take maximal advantage of your hardware. 

The RISC machines have merely proven what microarchitects have known
since time immemorial: keep it single cycle; don't put a feature in
if it will slow things down (even if your marketing people insist);
don't put it in if you can make better use of the hardware elsewhere;
use parallelism to improve performance; keep the hardware busy all of
the time; and so on.  And the devil take anyone who wants to program
it by hand.

(Of course, there are additional issues for microprogrammed machines, like
leaving out pipelining because it makes it hard to write compilers for the
machine as well as introducing needless complexity, handling branches
intelligently, etc.)

I'll construct a hypothetical machine to show what sort of performance
gains it delivers and to demonstrate the demands it places on the compiler:

2 ALUs, conventional design, drivable in parallel
4 increment/decrement units; operations:
	add/subtract {1,2,4,nothing} to register
memory access unit:
	{read,write} {8,16,32} bits; offset is {register, constant, none}
branch unit:
	jump, call subroutine, return from subroutine
registers:
	64 always accessible
	64 accessible only through ALU A
	64 accessible only through ALU B

The most efficient code will use all these resources at the same time.
Any compiler that generates code for such a machine will require some
sort of data flow analysis to determine how the various fields (i.e. an
ALU op, a branch, etc.) can be compacted together to produce optimal
code.  For instance, the sequence:

	while(foo->next != NULL) {
		foo = foo->next;
		bar++;
		}

could compile into code like:

R0 = foo
R1 = offset for next

	loop:	alu_1(compare(R0, NULL))
		branch(equal, done);
		R0 = read(R0 + R1, Long)
		increment(R2,1)
		goto loop;
	done:

But this is extremely inefficient.  Instead, we can compact it to a
two-instruction loop:
	loop:	alu_1(compare(R0, NULL)) branch(equal, done);
		R0 = read(R0 + R1, Long) increment(R2,1) goto loop;
	done:

Now, when you add in the complexity of folding in the instructions before
and after the loop, the compiler must understand a great deal about the
target machine.  After all, you now have scheduling problems.  Recall
that some registers are only accessible on certain ALUs.  (These would
be used to store commonly used constants.)  You can also have resource
conflicts if various fields in your instruction are overlapped.  For
instance, you might discover that you typically do 1 ALU operation and a
memory operation, or 2 ALU operations.  This would allow you to overlap
the field for a memory operation with one of the ALU fields.  The problem
grows as you add hardware.  However, you can get performance with this sort
of machine that you couldn't get out of a RISC chip.
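
To make the compaction step concrete, here is a minimal sketch in C of a
greedy "first fit" compactor for the hypothetical machine above.  Each
micro-operation carries a bitmask of the instruction fields it occupies
and of the registers it reads and writes; an operation is placed in the
earliest microword whose fields are free, after any word that writes a
register it reads, and never above a branch.  All the names and encodings
are invented, and real compactors (see the Micro proceedings) do far
more, but this reproduces the two-word loop above.

    #include <stdio.h>

    /* instruction fields (resources) of the hypothetical machine */
    #define ALU_A   (1u << 0)
    #define MEM     (1u << 1)
    #define BRANCH  (1u << 2)
    #define INC     (1u << 3)

    struct mop {                    /* one micro-operation        */
        const char *text;
        unsigned    res;            /* fields it occupies         */
        unsigned    reads, writes;  /* register sets, as bitmasks */
    };

    #define NWORDS 64
    static unsigned    used[NWORDS];     /* fields used per word       */
    static unsigned    wrote[NWORDS];    /* registers written per word */
    static const char *slot[NWORDS][8];  /* ops packed into each word  */
    static int         nslot[NWORDS], nwords, barrier;

    /* Greedy first fit: place op in the earliest word that is at or
     * after the last branch, after any word writing a register the op
     * reads, and whose fields do not conflict.  No overflow handling:
     * this is a sketch, not a compiler. */
    static void place(const struct mop *op)
    {
        int w, i, earliest = barrier;

        for (i = 0; i < nwords; i++)
            if ((wrote[i] & op->reads) && earliest < i + 1)
                earliest = i + 1;           /* true data dependence    */
        for (w = earliest; used[w] & op->res; w++)
            ;                               /* first word w/ free fields */
        used[w]  |= op->res;
        wrote[w] |= op->writes;
        slot[w][nslot[w]++] = op->text;
        if (w + 1 > nwords)
            nwords = w + 1;
        if (op->res & BRANCH)
            barrier = w + 1;    /* later ops stay below the branch */
    }

    int main(void)
    {
        /* the loop body from the example; R0,R1,R2 = bits 0,1,2 */
        static const struct mop ops[] = {
            { "alu_1(compare(R0, NULL))", ALU_A,  1u, 0u },
            { "branch(equal, done)",      BRANCH, 0u, 0u },
            { "R0 = read(R0 + R1, Long)", MEM,    3u, 1u },
            { "increment(R2, 1)",         INC,    4u, 4u },
            { "goto loop",                BRANCH, 0u, 0u },
        };
        int i, w, s;

        for (i = 0; i < 5; i++)
            place(&ops[i]);
        for (w = 0; w < nwords; w++) {
            printf("word %d:", w);
            for (s = 0; s < nslot[w]; s++)
                printf("  %s", slot[w][s]);
            printf("\n");
        }
        return 0;
    }

Run on the five operations of the loop body, this prints exactly the
two-word loop shown above.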

While the compiler problems are large, they are not insurmountable.  
Compilers have been written that generate tolerable code for machines like
this.  You need look no farther than the Multiflow or ELI-512 for proof.

It is not clear to me exactly what model the current crop of commercial
retargetable microcode compilers use.  The research ones, i.e. the only
ones that reveal their private parts to the world, tend to take a 
simplistic view of the world.  I suspect that the commercial ones are 
more hype than substance, although I would be delighted to be proven 
wrong.


Jonathan A. Chandross
Internet: jac@paul.rutgers.edu
UUCP: rutgers!paul.rutgers.edu!jac

aarons@syma.sussex.ac.uk (Aaron Sloman) (04/28/89)

tassos@rti.UUCP (Tassos Markas) writes:

> Date: 14 Apr 89 01:20:12 GMT
> Organization: Research Triangle Institute, RTP, NC
>
> I'm looking for independent architecture compilers.
> This compiler should accept a high level langauage (I would prefer C) and
> produce microcode that can be easily retargeted for any system architecture.
>

The Poplog two-level virtual machine, with the two levels linked by a
machine-independent and language-independent compiler, provides
relatively easy portability for incremental compilers for a range of
high level languages, though not C at present.  It also makes it
relatively easy to add a new high level language that immediately
runs (with a rich environment) on all the target architectures.

I append a more detailed description.  I hope it is of some  interest,
and I  apologise  for its  length.  Although Sussex  University  has a
commercial interest  in  Poplog I  have  tried to  avoid  raising  any
commercial issues.
                         ---------------------

Poplog provides development  tools for  a range  of languages:  Common
Lisp, Prolog, ML and POP-11 (a Lisp-like language with a more readable
Pascal-like syntax). It also provides tools for adding new incremental
compilers. It might be possible to add an incremental compiler for  C,
though it would not run very  fast. However, from late 1989 we  expect
to give users access to a C-like extension to Pop-11 that is used  for
developing and  porting Poplog.  It is  not  quite as  fast as  C  but
provides far more facilities.

Before I describe  porting I need  to explain how  the running  system
works. The mechanisms described below were designed and implemented by
John Gibson, at Sussex University.

All the languages in Poplog compile to a common virtual machine, the
Poplog VM, which is then compiled to native machine code.  First, an
over-simplified description:

The Poplog system allows different  languages to share a common  store
manager, and common data-types, so that a program in one language  can
call another  and  share  data-structures.  There  is  also  a  common
interface to the host operating system and an "external" interface, to
non-Poplog languages  (C,  Fortran,  etc). The  Poplog  languages  are
incrementally compiled for rapid  development and testing:  individual
procedures can rapidly be  compiled, tested, modified and  re-compiled
and are immediately automatically linked in to the rest of the system,
old versions being garbage collected if no longer pointed to.

The languages are all implemented using a set of tools for adding  new
incremental compilers. These tools include procedures for breaking  up
a text stream into items, and tools for planting VM instructions  when
procedures are compiled. These tools are used by the Poplog developers
to implement  the four  Poplog languages  but are  also available  for
users to implement new languages suited to particular applications.

All this makes it  possible to build a  range of portable  incremental
compilers  for  different  sorts  of  programming  languages.  POP-11,
PROLOG,  COMMON  LISP  and  ML  all  compile  to  a  common   internal
representation, and share  machine-specific run-time code  generators.
Thus several different machine-independent "front ends" for  different
languages can share  a machine-specific "back  end" which compiles  to
native machine  code, which  runs far  more quickly  than if  the  new
language had been interpreted.

The actual story  is more  complicated: there are  two Poplog  virtual
machines, a high level and a low level one, both of which are language
independent and machine  independent. The high  level VM has  powerful
instructions, which  makes  it convenient  as  a target  language  for
compilers for high level  languages. This includes special  facilities
to  support  Prolog  operations,   dynamic  and  lexical  scoping   of
variables, procedure  definitions,  procedure  calls,  suspending  and
resuming processes, and so on.  Because these are quite  sophisticated
operations, the mapping from the Poplog  VM to native machine code  is
still fairly complex.

So there is a machine-independent and language-independent
intermediate compiler which compiles from the high level VM to a
low level VM, doing a considerable amount of optimisation on the way.
A machine-specific back-end then translates the low-level VM to native
machine code, except when  porting or re-building  the system. In  the
latter case the final stage is translation to assembly language.  (See
diagram below.)

The bulk of the core Poplog  system is written in an extended  dialect
of POP-11, with provision for C-like addressing modes, for efficiency.
We call it SYSPOP. The system  sources, mostly written in SYSPOP,  are
also compiled to the high-level VM, and then to the low level VM.  But
instead of  then  being translated  to  machine code,  the  low  level
instructions are automatically translated  to assembly language  files
for the  target machine.  This is  much easier  than producing  object
files, because there is a fairly straightforward mapping from the low
level  VM  to  assembly  language,  and  the  programs  that  do   the
translation don't have  to worry  about formats for  object files:  we
leave that to the assembler and linker supplied by the manufacturer.

In fact, the system sources need facilities not available to users, so
the two  intermediate  virtual  machines  are  slightly  enhanced  for
SYSPOP. The following diagram summarises the situation.

             {POP-11, COMMON LISP, PROLOG, ML, SYSPOP, etc}
                                    |
                               Compile to   [language specific]
                                    |
                                    V
                             [High level VM]
                          (extended for SYSPOP)
                                    |
                          Optimise & compile to
                                    |
                                    V
                             [Low level VM]
                          (modified for SYSPOP)
                                    |
                         Compile (translate) to     [machine specific]
                                    |
                                    V
                      [Native machine instructions]
                       [or assembler - for SYSPOP]

So for  ordinary users  compiling  or re-compiling  procedures  during
software development the built in  machine code generator is used  and
compilation is very fast, with no linking required. For rebuilding the
whole system  the  back end  is  changed to  generate  assembler,  and
rebuilding is much slower. But it does not need to be done very often.

All the compilers and translators are implemented in Poplog (mostly in
POP-11). Only the last stage is machine specific. The low level VM  is
at a level that makes it possible on the VAX, for example, to generate
approximately one machine instruction per low level VM instruction. So
writing the code  generator for  something like  a VAX  or M68020  was
relatively easy.  For  a  RISC  machine the  task  is  a  little  more
complicated.
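
As a concrete (if toy) illustration of the structure in the diagram
above, the following C sketch expands invented high level VM
instructions into invented low level VM instructions and then emits one
assembler line per low level instruction, roughly the one-to-one ratio
reported for the VAX.  None of this is Poplog's real instruction set;
every name and mnemonic here is made up.

    #include <stdio.h>

    /* invented high level VM: what the language front ends target */
    enum hvm_op { H_PUSHC, H_CALL };

    /* invented low level VM: what the machine-specific back end sees */
    enum lvm_op { L_LOADI, L_PUSH, L_JSR };

    /* Machine-independent middle stage: expand one high level VM
     * instruction into low level VM instructions; returns how many. */
    static int expand(enum hvm_op op, int arg,
                      enum lvm_op out[], int outarg[])
    {
        switch (op) {
        case H_PUSHC:                 /* push a constant on the stack */
            out[0] = L_LOADI; outarg[0] = arg;  /* constant -> accum. */
            out[1] = L_PUSH;  outarg[1] = 0;    /* push the accum.    */
            return 2;
        case H_CALL:                  /* call procedure number arg    */
            out[0] = L_JSR;   outarg[0] = arg;
            return 1;
        }
        return 0;
    }

    /* Machine-specific back end: one assembler line per low level VM
     * instruction.  Only this table changes when porting. */
    static void emit(enum lvm_op op, int arg)
    {
        static const char *mnem[] = { "loadi", "push", "jsr" };
        printf("\t%s\t%d\n", mnem[op], arg);
    }

    int main(void)
    {
        enum hvm_op prog[] = { H_PUSHC, H_CALL };
        int         parg[] = { 42, 7 };
        enum lvm_op low[4];
        int         larg[4], i, j, n;

        for (i = 0; i < 2; i++) {
            n = expand(prog[i], parg[i], low, larg);
            for (j = 0; j < n; j++)
                emit(low[j], larg[j]);  /* or write assembler text to
                                           a file when rebuilding     */
        }
        return 0;
    }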

Porting to a new computer requires  the run-time "back end", i.e.  the
low level  VM compiler,  to be  changed and  also the  system-building
tools which output assembly language programs for the target  machine.
There are  also a  few  hand-coded assembly  files  which have  to  be
re-written for each machine. Thereafter  all the high level  languages
have   incremental    compilers   for    the   new    machine.    (The
machine-independent  system  building  tools  perform  rather  complex
tasks, such as  creating a  dictionary of procedure  names and  system
variables that have to be accessible to users at run time. So  besides
translating system source files, the tools create additional assembler
files and  also check  for consistency  between the  different  system
source files.)

The Poplog  VM provides  a varied,  extendable set  of data-types  and
operations thereon, including facilities for logic programming,  list,
record and array processing, 'number crunching', sophisticated control
structures (e.g. co-routines), 'active variables' and 'exit  actions',
that is  instructions executed  whenever  a procedure  exits,  whether
normally or abnormally.  Indefinite precision  arithmetic, ratios  and
complex numbers are accessible  to all the  languages that need  them.
Both  dynamic  and  lexical  scoping  of  variables  are   provided. A
tree-structured  "section"  mechanism  (partly  like  packages)  gives
further support for modular design.

External modules (e.g. programs  in C or  Fortran) can be  dynamically
linked in  and  unlinked.  A  set  of  facilities  for  accessing  the
operating system is also provided.

The VM facilities are relatively easy to port to a range of  computers
and operating systems because the core system is mostly implemented in
SYSPOP, and is largely machine independent. Only the machine-dependent
portions mentioned above (e.g. run-time code generator, and translator
from low level  VM to  assembler), plus  a small  number of  assembler
files need be changed for a  new machine (unless the operating  system
is also new). Since the translators are all written in a high level AI
language, altering them is relatively easy.

Porting requires compiling all the SYSPOP system sources, to  generate
the corresponding new assembler files, then moving them and the
hand-made assembler files to the new machine, where they are assembled
then linked. The  same process  is used to  rebuild the  system on  an
existing machine when new features are added deep in the system. Much
of the system is in source libraries compiled as needed by users, and
modifying those components does not require re-building.

Using this mechanism an experienced programmer with no prior knowledge
of Poplog or the target  processor was able to  port Poplog to a  RISC
machine in about 7 months.  But for the usual crop of bugs in the
operating system, assembler, and other software of the new machine, the
actual porting time would have been shorter. In general, extra time is
required for user  testing, producing  system specific  documentation,
tidying up loose ends etc.

Thus 7 to 12 months' work ports incremental compilers for four
sophisticated languages, a screen editor, and a host of utilities. Any
other languages implemented by users using the compiler-building tools
should also run immediately. So  in principle this mechanism  allows a
fixed  amount  of  work  to  port  an  indefinitely  large  number  of
incremental  compilers.  Additional  work  will  be  required  if  the
operating system  is different  from  Unix or  VMS,  or if  a  machine
specific window  manager  has  to  be provided.  This  should  not  be
necessary for workstations supporting X-windows.

POPLOG is too big for 80286-based PCs. Currently it runs on

    VAX (VMS/Unix),
    Sun2, Sun3, Sun4(SPARC), Sun386i (Road-runner),
    HP 9000 300 series workstations with HPUX
    Apollo 680?0 workstations with BSD Unix
    Sequent Symmetry with Dynix
    Orion 1/05 (with Clipper). This version is not supported at present.

Aaron Sloman,
School of Cognitive and Computing Sciences,
Univ of Sussex, Brighton, BN1 9QN, England
    INTERNET: aarons%uk.ac.sussex.cogs@nsfnet-relay.ac.uk
              aarons%uk.ac.sussex.cogs%nsfnet-relay.ac.uk@relay.cs.net
    JANET     aarons@cogs.sussex.ac.uk
    BITNET:   aarons%uk.ac.sussex.cogs@uk.ac
        or    aarons%uk.ac.sussex.cogs%ukacrl.bitnet@cunyvm.cuny.edu

    UUCP:     ...mcvax!ukc!cogs!aarons
            or aarons@cogs.uucp