martins@rwthinf.uucp (Martin Schoenert) (02/22/89)
Suppose I want to write a new compiler. Portability is a very important goal, maximum speed is not. The language is unspecified, I tend to think of a compiler that would support multiple front ends. One possibilty is to have the compiler generate instructions for a virtual machine. These instructions are interpreted by a virtual machine emulator. So porting the compiler to a new machine or new architecture just requires writing the virtual machine emulator. The UCSD Pascal compiler follows this approach. This emulator could even be written in a high level language like C. MIT CScheme compiles to bytecode, which is then interpreted by a C program. The GNU C compiler takes a different route. A large collection of definitions and code fragments descirbes an architecture. Porting the compiler to a new architecture requires to write a new set of definitions. I have never tried it, but it seems to be not so easy. My idea is a reduced machine description. It consists of the following functions, which emit corresponding instructions for say a MC68000: load( size, dest, src, offset ) Load a word, signed halfword, unsigned halfword, signed byte or unsigned byte from the address pointed to by the <src> register plus the signed <offset> into the <dest> register. store( size, src, dest, off ) Store a word, halfword or byte from the <src> register at the address pointed to by the <dest> register plus the signed <offset>. add( dest, src1, src2, imm ) If <imm> == 0 then add the contents of register <src1> and <src2> and put the result into register <dest>. If <imm> == 1 then add the immediate value <src2> to the register <src1> and put the result into register <dest>. sub( dest, src1, src2, imm ) Like add, but subtract <src2> from <src1>. neg( dest, src1 ) Store the two's complement of register <src1> into register <dest>. and( dest, src1, src2, imm ) or( dest, src1, src2, imm ) xor( dest, src1, src2, imm ) Like add, but compute the bitwise logical operations. not( dest, src1 ) Store the inverse of register <src1> into register <dest>. shl( dest, src1, src2, imm ) shrl( dest, src1, src2, imm ) shra( dest, src1, src2, imm ) Shifts left or right, logical or arithmetical the value of register <src1> by <src2> position into register <dest>. label( string ) Insert a label (for a branch) into the instruction sequence. branch( cond, label ) Insert a conditional or unconditional (cond==0) branch into the instruction sequence. Conditions are: true, overflow, equal, greater than signed, greater than unsigned, ... call( label ) Subroutine call. enter( isLeaf, regsUsed ) Generates the subroutine enter code sequence, saves the registers and depending on <isLeaf> the return address (if the architecture puts it into a register.) return( isLeaf, regsUsed ) Generates the subroutine exit code. This is completed by a definition giving the number of registers that the compiler may use, whether the processor is basically a two or three address processor, and in some form a description which registers can be used for what purposes. The structure of this `virtual instruction set' is of course very much influenced by RISC architectures. I think for a portable compiler it has many advantages. Since only one addressing mode is used and all arithmetic operations are done in registers mapping this virtual instruction set to the real instruction set of a computer is an easy task. In fact I wrote down a definition for the MC68000 in a day. This even included some peephole optimizations, like turning move off(r1),r2;add r2,r3 into add off(r1),r3. For RISC processors like a Sun Sparc or MIPS R2000 the definition should even be easier. The code generator should be much easier than a code generator that is able to support various architectures directly, as the GNU C compiler. All papers on RISC processor claim that writing a compiler for such a instruction set is easier and offers more ways for optimizing than a compiler for a complex instruction set. Has anyone tried this approach ? Has anyone rejected is as too expensive ? How much of the maximal performance (say defined by the speed of the code generated by the GNU C compiler) of a MC68000 or a VAX could such a compiler achieve ? For C or similar languages ? For Lisp ? ! Martin Schoenert, martins@rwthinf.uucp, fm@dacth51.bitnet, +49 241 804547 ! ! Lehrstuhl D Mathematik, RWTH, Templergraben 64, D 51 Aachen, West Germany ! [My limited understanding of the IBM PL.8 compiler, the one that brought us register allocation by graph coloring, suggests that this is something like the scheme that they use. Real machines have warts that make this difficult, in particular they have irregular register sets -- on an IBM 360 the first operand of a divide has to be in an even-odd pair of registers, and on the various Intel architectures the shift count for a shift or rotate has to be in the CX register. -John] -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { decvax | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request