jkp@cs.HUT.FI (Jyrki Kuoppala) (03/07/91)
In article <4992@mindlink.UUCP>, Chris_Johnsen@mindlink (Chris Johnsen) writes:
> Another approach I've heard of is to "compile" the code to be emulated
>into native machine code. This would involve a front-end program which would
>read the target machine's program and analyze the instructions. For instance,
>if the "compiler" detects an instruction that does a move between two of the
>emulated machine's registers, it would simply generate a move instruction in
>the emulating machine's code. It could generate either a translated assembly
>language source file or a machine-language file ready to load into the
>emulating machine. This would require the "compilation" process to be run once
>on the program to be emulated, and you'd then run the output of this
>"compiler." There are special tricks to consider here, such as resolving
>addresses - you couldn't just copy the memory addresses across because the
>emulated routines would likely be a different size. It might be easier to
>generate a label (e.g. Axxxx where xxxx is the hex address in question) in an
>assembly source file and let the emulating machine's assembler sort it all out.
>
> I've never actually seen this process in action, but it's another
>possibility. --CJG

Actually, I read an article in Byte about this being done commercially - a
company that would compile various PC software packages to run on RISC / Unix
machines. I don't remember much more about it.

The first problem is flow analysis, that is, figuring out which parts of the
program are data and which are code - this is not at all simple, especially
if the code does nasty tricks like modifying itself. I recall the PC-to-RISC
compiling people worked in cooperation with the original software authors on
this.

A simulator would be an excellent help in this process. It could figure out
when references to hardware chips happen, whether self-modifying code is
used, what parts of the code are executed, and so on.
The simulator would then write this information out for the compiler to use,
so the compiler knows which parts can be straightforwardly compiled to
normal target code and for which parts calls to library routines need to be
generated. The original program should be run in as many ways as possible so
that all of the code gets covered. Perhaps the simulator, when used in this
way, could also follow conditional jumps in both directions. Some heuristics
would probably be needed.

The compiling process itself - assuming everything the code does is
relatively simple, like the output from a C compiler, and the flow-analysis
pass is done - should not be that hard. I've thought about making a gcc
front end for the 8086 or any other processor you want to emulate. This way
gcc's excellent optimization capabilities could be used, and you could
easily compile the code for any processor / machine gcc is capable of
targeting (and that's quite a few).

By the way, the only emulator I ever did was a gawk script for a
hypothetical processor with a dozen or so instructions and a 256-byte
address space, used by a lecturer. We were given a few hundred zeros and
ones and asked to find out what the program does - well, I wouldn't do it by
hand, would I? The code even turned out to be self-modifying ;-). I wonder
how fast an 8086 emulator written in gawk would run ...

//Jyrki