[comp.sys.amiga.emulations] Not an emulator, but a compiler

jkp@cs.HUT.FI (Jyrki Kuoppala) (03/07/91)

In article <4992@mindlink.UUCP>, Chris_Johnsen@mindlink (Chris Johnsen) writes:
>     Another approach I've heard of is to "compile" the code to be emulated
>into native machine code.  This would involve a front-end program which would
>read the target machine's program and analyze the instructions.  For instance,
>if the "compiler" detects an instruction that does a move between two of the
>emulated machine's registers, it would simply generate a move instruction in
>the emulating machine's code.  It could generate either a translated assembly
>language source file or a machine-language file ready to load into the
>emulating machine.  This would require the "compilation" process to be run once
>on the program to be emulated, and you'd then run the output of this
>"compiler."  There are special tricks to consider here, such as resolving
>addresses - you couldn't just copy the memory addresses across because the
>emulated routines would likely be a different size.  It might be easier to
>generate a label (e.g. Axxxx where xxxx is the hex address in question) in an
>assembly source file and let the emulating machine's assembler sort it all out.
>
>     I've never actually seen this process in action, but it's another
>possibility.  --CJG
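The label trick Chris describes can be sketched in a few lines.  Here's
a toy translator (Python, purely illustrative - the two-byte "MOVE" and
"JMP" opcodes are made up, not any real CPU) that emits a label of the
form Axxxx at every original address, so branch targets stay symbolic
and the native assembler sorts out the new addresses:

```python
# Toy binary-to-assembly translator (hypothetical instruction set,
# NOT a real CPU): emits one label per original address so the native
# assembler can resolve branch targets after the code changes size.

def translate(code):
    """code: bytes of the emulated program; returns assembly lines."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        out.append("A%04X:" % pc)           # label = original hex address
        if op == 0x01 and pc + 1 < len(code):   # MOVE Rs,Rd (2 bytes)
            regs = code[pc + 1]
            src, dst = regs >> 4, regs & 0x0F
            out.append("        move.l  d%d,d%d" % (src, dst))
            pc += 2
        elif op == 0x02 and pc + 1 < len(code):  # JMP addr (2 bytes)
            target = code[pc + 1]
            out.append("        jmp     A%04X" % target)  # symbolic target
            pc += 2
        else:                               # unknown byte: keep it as data
            out.append("        dc.b    $%02X" % op)
            pc += 1
    return out

for line in translate(bytes([0x01, 0x12, 0x02, 0x00])):
    print(line)
```

The jmp goes to the label A0000, not to a numeric address, which is
exactly the point - the translated routine can be any size.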

Actually, I read an article in Byte about this being done as a
commercial thingy - they would compile various PC software packages to
run on Risc / Unix machines.  I don't remember that much about it.

The first problem is to do flow analysis, that is, to figure out which
parts of the program are data and which are code - this is not quite
simple, esp. if the code does nasty tricks like self-modifying code
etc.  I remember the PC-to-RISC compiling people cooperated with the
original software authors' organization on this.
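The static part of that analysis can be sketched as a reachability
walk from the entry point - mark everything you can reach as code, and
call the rest data.  Here's a toy version (the opcode table is made
up; real x86 decoding is much hairier, and self-modifying code or
computed jumps defeat this entirely, which is why the cooperation was
needed):

```python
# Naive flow-analysis sketch for a made-up instruction set: mark bytes
# reachable from the entry point as code, everything else as data.
# Computed jumps and self-modifying code defeat this approach.

SIZES = {0x01: 2, 0x02: 2, 0x03: 2, 0xFF: 1}  # opcode -> length (toy)
JUMPS = {0x02}      # unconditional jump: successor is the target only
BRANCHES = {0x03}   # conditional branch: target AND fall-through
HALTS = {0xFF}

def find_code(code, entry=0):
    is_code = [False] * len(code)
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        if pc >= len(code) or is_code[pc]:
            continue                        # out of range or already seen
        op = code[pc]
        size = SIZES.get(op, 1)
        for i in range(pc, min(pc + size, len(code))):
            is_code[i] = True               # mark instruction bytes as code
        if op in HALTS:
            continue
        if (op in JUMPS or op in BRANCHES) and pc + 1 < len(code):
            worklist.append(code[pc + 1])   # follow the jump target
        if op not in JUMPS:
            worklist.append(pc + size)      # fall through to next insn
    return is_code

prog = bytes([0x03, 0x04, 0xFF, 0x99, 0x01, 0x12, 0xFF])
print(find_code(prog))
```

In the example the 0x99 byte at address 3 is never reached, so it gets
classified as data.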

A simulator would be an excellent help in this process.  It could
figure out when references to hardware chips are happening, whether
self-modifying code is used, what parts of the code are executed, etc.
The simulator would then write this information out for the compiler
to use, so the compiler knows which parts can be straightforwardly
compiled to normal target code and for which parts calls to library
routines need to be generated.  The original program should be run in
as many ways as possible so that all of the code gets exercised.
Perhaps the simulator could also follow conditional jumps in both
directions when used in this way.  Some heuristics would probably be
needed.
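The core of such a tracing simulator is small.  A toy sketch (again
with a made-up mini instruction set, nothing like a real 8086): it
records which addresses actually execute, and flags any store that
lands on an already-executed address as suspected self-modification:

```python
# Tracing-simulator sketch (toy instruction set): run the program and
# log coverage, plus any store that hits an address we already
# executed - i.e. potential self-modifying code for the compiler.

def trace(mem, entry=0, max_steps=1000):
    mem = bytearray(mem)
    executed, self_mod = set(), set()
    regs = [0] * 16
    pc = entry
    for _ in range(max_steps):
        executed.update({pc, pc + 1})        # two-byte instructions
        op, arg = mem[pc], mem[pc + 1]
        if op == 0x01:                       # MOVE Rs,Rd
            regs[arg & 0x0F] = regs[arg >> 4]
            pc += 2
        elif op == 0x04:                     # LOAD-IMM R0,imm
            regs[0] = arg
            pc += 2
        elif op == 0x05:                     # STORE R0 -> [addr]
            if arg in executed:
                self_mod.add(arg)            # writing over executed code!
            mem[arg] = regs[0]
            pc += 2
        elif op == 0xFF:                     # HALT
            break
        else:
            pc += 2                          # unknown opcode: skip (toy)
    return executed, self_mod

prog = bytes([0x04, 0x2A, 0x05, 0x01, 0xFF, 0x00])
executed, self_mod = trace(prog)
print(sorted(executed), sorted(self_mod))
```

Here the store to address 1 overwrites a byte of an instruction that
already ran, so address 1 ends up in the self-modification set - just
the kind of spot the compiler would need a library-call fallback for.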

The compiling process itself, assuming everything the code does is
relatively simple output from a C compiler or something (and the flow
analysis pass is done), should not be that hard.  I've thought about
making a gcc front end for the 8086 or any other processor you want to
use.  That way gcc's excellent optimization capabilities could be
used, and you could easily compile the code for any processor /
machine gcc is capable of targeting (and that's quite a few).

By the way, the only emulator I ever wrote was a gawk script for a
hypothetical processor with a dozen or so instructions and a 256-byte
address space, used by a lecturer.  We were given a few hundred zeros
and ones and asked to find out what the program does - well, I
wasn't going to do it by hand, was I?  It even turned out to be
self-modifying code ;-).

I wonder how fast an 8086 emulator written in gawk would run ...

//Jyrki