[comp.sys.amiga.emulations] Emulator Mechanics Transpiler

Chris_Johnsen@mindlink.UUCP (Chris Johnsen) (03/11/91)

        Thank you all for the valued input into this discussion to date,  both
pro and con.  I must say I find it very stimulating.

        When discussing this form of platform-porting translator/compiler with
anyone, I find the need for a short word to use.  In a humble attempt to coin
a descriptive term, may I suggest transpiler?

Charlie Gibbs (Charlie_Gibbs@mindlink.UUCP), Jean-Noel Moyne (jnmoyne@lbl.gov),
Dave Clemans (dclemans@mentorg.com), Dwight Hubbard (uunet.uu.net!easy!lron),
Pete Ashdown (pashdown@javelin.es.com) and Jyrki Kuoppala (jkp@cs.HUT.FI)
confirm that some research and even program development has indeed been done
in this direction.

Eddy Carroll (ecarroll@maths.tcd.ie) has written such a transpiler, as a
final-year project, with some success and some reservations.

Chris Gray (cg@ami-cg.UUCP) suggests that the most practicable route to
designing a viable transpiler would be to make it interactive.  BTW Chris, I
very much enjoyed your compiler articles in the Amiga Transactor.

Ray Cromwell (rjc@pogo.ai.mit.edu) suggested an interesting idea, a sort of
Usenetware combined development effort; he also thinks reasonable execution
speed can be achieved.

        Those are what I perceive to be the ideas supporting the transpiler
concept.  The statements of contrary considerations are more voluminous.  These
appear to fall into a number of categories.
          o  Self-modifying code
          o  Separating code from data
          o  Determining video access
          o  Stack handling
          o  Jump table problems
          o  Handling overlay segments

Ian Farquhar (ifarqhar@sunb.mqcc.mq.oz.au) suggested that a compiled module be
run concurrently with an emulator-type section so that, in some parallel way,
any references within the source executable, which would also be loaded during
runtime of the emulation, could be validated.  It could be argued that this
should be placed on the pro side of the ledger, but the incurred overhead
during execution would be large.  This function, I had imagined initially,
would be carried out during the transpiler phase and not attached to, or
burdening, the runtime execution.

Brad Pepers (pepers@enme1.ucalgary.ca), Jyrki Kuoppala (jkp@cs.HUT.FI) and
Jonathan David Abbey (jonabbey@cs.utexas.edu) were somewhat concerned with
self-modifying code.  There were a significant number of voices that dismissed
this concern.  Personally, I wouldn't worry about it.  If a particular program
used this technique, for whatever reason, I would accept the fact that not all
programs can be transpiled.

Ian Farquhar (ifarqhar@sunb.mqcc.mq.oz.au), Dave Clemans (dclemans@mentorg.com),
Sullivan (Radagast@cup.portal.com), Jean-Noel Moyne (jnmoyne@lbl.gov) and Chris
Gray (cg@ami-cg.UUCP) raised concerns about distinguishing code from bytes of
the data persuasion.  This is what I had thought would be the biggest stumbling
block.  I hadn't thought there would be so many others. :-)  It's great to hear
how long the road is before you begin the journey.
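
For anyone who wants the code/data problem in concrete form, here is a minimal
sketch of the usual starting point: trace outward from the entry point and
treat only the bytes actually reached as code.  The four-opcode instruction
set and the name find_code are invented for illustration; a real 8086 decoder
is far larger, and this is not how any transpiler mentioned in this thread is
known to work.

```python
# Toy instruction set, invented for this sketch: opcode -> (name, length, kind)
TOY_ISA = {
    0x01: ("NOP", 1, "next"),
    0x02: ("JMP", 2, "jump"),    # operand byte = absolute target
    0x03: ("JZ", 2, "branch"),   # may jump to the target or fall through
    0x04: ("RET", 1, "stop"),
}


def find_code(image, entry):
    """Return the set of offsets occupied by instructions reachable from entry."""
    starts = set()      # offsets where a traced instruction begins
    code = set()        # every byte belonging to a traced instruction
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(image) and pc not in starts:
            info = TOY_ISA.get(image[pc])
            if info is None:
                break                       # not an opcode: abandon this path
            _name, length, kind = info
            if pc + length > len(image):
                break                       # instruction runs off the image
            starts.add(pc)
            code.update(range(pc, pc + length))
            if kind in ("jump", "branch"):
                worklist.append(image[pc + 1])   # follow the target later
            if kind in ("jump", "stop"):
                break                       # nothing falls through
            pc += length
    return code


# JZ 6 / JMP 8 / two stray bytes / NOP / RET / RET
image = bytes([0x03, 0x06, 0x02, 0x08, 0x01, 0x04, 0x01, 0x04, 0x04])
code = find_code(image, 0)
data = sorted(set(range(len(image))) - code)
```

Offsets 4 and 5 hold bytes that look like NOP and RET, but nothing ever
reaches them, so they end up classed as data.  A jump through a register or a
table gives the tracer nothing to follow, which is exactly where the remaining
objections come in.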

Udi Finkel (finkel@TAURUS.BITNET) and Eddy Carroll (ecarroll@maths.tcd.ie) are
concerned about determining when a memory access is to video RAM.  This, I
believe, can be solved if it is dealt with in the same manner as the code/data
resolution.  Please read on.

Eddy Carroll (ecarroll@maths.tcd.ie) points out problems with processor stack
handling.  How did you resolve this with the transpiler you developed?  It
occurs to me that every emulator, whether it be a transpiler or an interpretive
type, must handle the stack properly.  To me, it would be more difficult to
approach this problem if I were writing an emulator, as opposed to a
transpiler, since an emulator's tighter executable size constraints would
inhibit producing an intelligent stack handler.  On the 68000 the stack would
be handled as word/longword only.  Any bytewise stack functions within the
source executable would be transpiled into wordwise handling.  A problem, but
surmountable.
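
As a rough illustration of the wordwise idea, the sketch below models a target
stack that only ever moves in 16-bit steps, with byte pushes widened to fill a
whole word slot.  The class name WordStack and its methods are hypothetical,
and this is only a model of the bookkeeping, not generated 68000 code.

```python
class WordStack:
    """Model of a stack whose pointer only moves in 16-bit words."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.sp = size              # grows downward and always stays even

    def push_word(self, value):
        self.sp -= 2
        self.mem[self.sp:self.sp + 2] = (value & 0xFFFF).to_bytes(2, "big")

    def pop_word(self):
        value = int.from_bytes(self.mem[self.sp:self.sp + 2], "big")
        self.sp += 2
        return value

    def push_byte(self, value):
        # A source-side byte push is widened to occupy a full word slot.
        self.push_word(value & 0xFF)

    def pop_byte(self):
        return self.pop_word() & 0xFF
```

The catch is that any source code doing its own arithmetic on the stack
pointer, assuming one byte per pushed byte, would now find things at the wrong
offsets, so such accesses would need rewriting as well.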

Chris Gray (cg@ami-cg.UUCP) and Eddy Carroll (ecarroll@maths.tcd.ie) raised the
problem of coping with jump or branch tables properly.  Weighty point.  This I
would describe as a grey area between code and data: the table may hold actual
jump instructions or a list of label locations, with the code to be emulated
calculating an offset into it.  This must be handled in a similar fashion to
the code/data recognition problem.

Dave Clemans (dclemans@mentorg.com), Kurt Tappe (JKT100@psuvm.psu.edu) and
Jonathan David Abbey (jonabbey@cs.utexas.edu) were concerned with the handling
of overlay segments.  This bothers me too.  With the Amiga, it would be
possible either to use overlays, in the case of very large programs, or to
convert to all-in-one programs by consolidating the overlay hunks.  The main
problem is determining that overlays are employed, then loading and transpiling
them.  This is closely related to the code/data determination problem.

Dwight Hubbard (uunet.uu.net!easy!lron) suggests that this would work, kind of
like DSM.  I have DSM and must say it was a model for me when contemplating
this idea.  Chris Gray (cg@ami-cg.UUCP) believes that an interactive compiler
holds some promise.  In one of my initial messages to Charlie Gibbs I
entertained the reality of having to fall back on human intervention at the
level of the transpiler's output source code.  The need for this would be
determined at runtime, when the emulated program failed to operate correctly
or at all.  A programmer would be required to fix the problem.

        What I interpret the huge human parallel processor the net represents
to be saying is that human intervention must be employed.  My only reservation
in this regard is that there are interpretive emulators that run a wide
variety of software on-the-fly.  To me, this feat is more difficult than a
transpiler, which can ruminate over the executable for as long as, say, a
raytracer.

        To incorporate the above concepts, and arrive at a workable resolution,
what is required is an expert system, probably requiring a resident expert (not
software, though brain matter is soft), to resolve the code/data, video access,
jump/branch table and overlay problems.  I don't suppose that the average
emulator user is prepared to deal with or even understand the problems we have
been discussing.  I don't think it would be worthwhile writing a transpiler
unless it could be made to work in some standalone fashion.  Feed it the
source executable and out pops the Amiga version.

        Mind shifting stage left...

        I have a suggestion, since it is not considered practicable to write a
transpiler that will work unassisted, that may resolve the perceived problems
using some of the concepts from this message thread.

        First, an off-the-wall analogy.  Here he goes again. :-)

        Consider a desirable program, running on another platform, useful to
an Amiga user, but with a very large dongle attached.  In fact, this useful
program must be run on the dongle.  This is inconvenient.  What do I do if I
lose my dongle?  I propose you think of this as a form of copy protection for
a moment, obstructing the Amiga owner who also owns a program that runs on
another platform.  Follow me so far?  The legitimate program for the other
platform cannot be used on the favoured Amiga machine, so it needs to be
de-copy-protected.  What do you use if you wish to back up and/or deprotect
software?  A copier program.  Most copiers can be employed by users who have
little knowledge of copy protection, yet they succeed in making the copy.  The
more difficult protection schemes require "brain files" which are written by
experts to achieve this end.  End of transpiler/copy-protection analogy.

        Take a basic transpiler, such as the one Eddy Carroll wrote, and add a
toolkit.  The transpiler would do its best to translate and compile the source
executable.  If this failed, an expert, not just the average user, would either
run the transpiler in expert mode, or a debug tool from the toolkit, which
would work through any problem areas.  The result of this would be an expert
transpiler file.  The beauty of this approach is that any user could run the
transpiler, given access to the expert file.  These expert files could be
included with the release package and any new ones could be included in any
updates or shared by conventional electronic means as PD.
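
To make the expert file idea a little more tangible, here is one possible
shape for it, purely as a sketch: a plain text file of address ranges that an
expert has classified, which the transpiler consults before falling back on
its own guesses.  The line format (KIND START END in hex) and the function
names are my own assumptions, not an existing format.

```python
def parse_expert_file(text):
    """Parse hint lines such as 'data 0x0100 0x01ff'; '#' starts a comment."""
    hints = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        kind, start, end = line.split()
        hints.append((kind, int(start, 16), int(end, 16)))
    return hints


def classify(address, hints, default="unknown"):
    """Return the expert's verdict for an address, or default if none applies."""
    for kind, start, end in hints:
        if start <= address <= end:
            return kind
    return default


EXAMPLE = """
# hints written by an expert for one particular program (made-up addresses)
code      0x0000 0x00ff
data      0x0100 0x01ff   # string table
jumptable 0x0200 0x021f
"""
hints = parse_expert_file(EXAMPLE)
```

An ordinary user would never edit such a file, only receive it alongside the
transpiler, in the same way a copier's brain file is used.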

What do you think?

csj

 No really officer, I wasn't speeding, just keeping a safe distance in front of
the car behind me!

Usenet: a542@mindlink.UUCP Phone: (604)853-5426 FAX: (604)854-8104

ecarroll@maths.tcd.ie (Eddy Carroll) (03/12/91)

In article <5097@mindlink.UUCP> Chris_Johnsen@mindlink.UUCP (Chris Johnsen)
writes:
>Eddy Carroll (ecarroll@maths.tcd.ie) points out problems with processor stack
>handling.  How did you resolve this with the transpiler you developed?  It
>occurs to me that every emulator whether it be a transpiler, or interpretive
>type must handle the stack properly.  To me, it would be more difficult to
>approach this problem if I were writing an emulator, as opposed to a
>transpiler, as less executable size constraints would inhibit producing an
>intelligent stack handler.  On the 68000 the stack would be handled as
>word/longword only.  Any bytewise stack functions within the  source executable
>would be transpiled into wordwise handling.  A problem, but surmountable.

Well, there were a number of things in my version I was hoping to implement
that I didn't have time to; one of these was the stack handling you mention
above. My solution would have been to constantly monitor stack usage while
a function is being traced (i.e. what is pushed on/popped off). Whenever
the current function called another function, a record would be made for this
second function listing the state of the stack at that time (i.e., what sort
of parameters you might expect to find on it). This information could then be
used while transpiling the second function, to make sure the right items on
the stack were accessed. This still leaves a problem with functions like
printf() which can be called at different times with different parameters.
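
A minimal sketch of that bookkeeping, assuming the traced code has already been
reduced to simple (op, arg) tuples; the tuple form and the function names are
invented here and are not taken from my actual program.

```python
from collections import defaultdict


def trace_calls(instructions):
    """Record the shape of the stack seen at every call to each function."""
    stack = []                      # kinds of items currently pushed
    seen = defaultdict(set)         # callee name -> set of stack snapshots
    for op, arg in instructions:
        if op == "push":
            stack.append(arg)       # arg describes the item, e.g. "ptr"
        elif op == "pop":
            stack.pop()
        elif op == "call":
            seen[arg].add(tuple(stack))
    return seen


def varying_callees(seen):
    """Functions called with more than one stack shape (the printf() case)."""
    return sorted(name for name, shapes in seen.items() if len(shapes) > 1)


trace = [
    ("push", "ptr"), ("call", "puts"), ("pop", None),
    ("push", "word"), ("push", "ptr"), ("call", "printf"),
    ("pop", None), ("pop", None),
    ("push", "ptr"), ("call", "printf"), ("pop", None),
]
seen = trace_calls(trace)
```

Here puts is always seen with one pointer on the stack, so its parameters can
be pinned down, while printf turns up with two different shapes and gets
flagged as needing special treatment.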

Another way of handling it would be to maintain a separate stack for
pushing things onto. This would get around the problem of using the system
stack, but you would have to manually copy the PC to that stack every time
you made a subroutine call. The end result would probably be more robust
though.
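
Roughly what I have in mind, as a toy model only: the translated program keeps
its own stack, and each call explicitly places the return address on it. The
five-operation mini language is made up for this sketch.

```python
def run(program):
    """Execute a list of (op, arg) tuples using a private stack."""
    stack = []          # separate from the host's system stack
    output = []
    pc = 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "out":
            output.append(stack.pop())
        elif op == "call":
            stack.append(pc)        # manual copy of the return address
            pc = arg
        elif op == "ret":
            pc = stack.pop()        # return address comes off the same stack
        elif op == "halt":
            return output
        else:
            raise ValueError("unknown operation: %r" % (op,))


program = [
    ("push", 5),    # 0
    ("call", 4),    # 1
    ("out", None),  # 2: emits the 5 pushed before the call
    ("halt", None), # 3
    ("push", 9),    # 4: subroutine starts here
    ("out", None),  # 5
    ("ret", None),  # 6
]
```

Because data and return addresses share the private stack exactly as they did
on the source machine, code that peeks at its own return address keeps
working, at the cost of the extra copy on every call.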

>        Take a basic transpiler, such as the one Eddy Carroll wrote, and add a
>toolkit.  The transpiler would do its best to translate and compile the source
>executable.  If this failed, an expert, not just the average user, would either
>run the transpiler in expert mode, or a debug tool from the toolkit, which
>would work through any problem areas.  The result of this would be an expert
>transpiler file.  The beauty of this approach is that any user could run the
>transpiler, given access to the expert file.  These expert files could be
>included with the release package and any new ones could be included in any
>updates or shared by conventional electronic means as PD.
>
>What do you think?

Interesting idea. On a smaller scale, I originally intended to include some
library recognition code so that, for example, a call to a Turbo C graphics
library function (assuming Turbo C has a graphics library) would be
recognised, and rather than translate the function an instruction at a
time, the whole could be replaced by equivalent Amiga code (which might use
the blitter or whatever).
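
In outline, the recognition step could be as simple as comparing the bytes at
a call target against a table of known function bodies. The sketch below
assumes exact byte matching; the byte patterns and replacement names are
invented placeholders, not real Turbo C library code or real Amiga routines.

```python
# Known library function bodies -> name of a native replacement.
# Both the patterns and the names are made up for this example.
SIGNATURES = {
    bytes([0x55, 0x8B, 0xEC, 0xB4, 0x00, 0xCD, 0x10]): "native_set_mode",
    bytes([0x55, 0x8B, 0xEC, 0xB4, 0x0C, 0xCD, 0x10]): "native_put_pixel",
}


def recognise(image, offset, signatures=SIGNATURES):
    """Return the replacement name if a known body starts at offset."""
    for pattern, replacement in signatures.items():
        if image.startswith(pattern, offset):
            return replacement
    return None


image = bytes([0x90, 0x90]) + bytes([0x55, 0x8B, 0xEC, 0xB4, 0x0C, 0xCD, 0x10])
```

Exact matching would be brittle in practice, since linked library code
contains addresses that differ from program to program, so a real table would
probably need wildcard bytes.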

Elsewhere, you mentioned that it seemed more difficult to write an emulator
than a transpiler; I think you'll find you're mistaken. The difference is
roughly comparable to writing a language interpreter as opposed to a
compiler (except that the transpiler is even more difficult than a real
compiler because there are so few constraints on the input it has to deal
with).

In another message, you asked if I would be willing to share my efforts so
far with the rest of the world. Yes, certainly I would. I don't really
feel it's in a state ready for public consumption though, so I've no desire
to send it off to c.s.a. or Fred Fish. If anyone wants to play with it
though, I can mail out copies. It won't be much use to anyone not interested
in the source code though; as it stands, it gets bogged down too easily to
transpile anything significant. (If anyone wants a quick'n'dirty 8086
disassembler on the other hand, it will do that quite well.)

Eddy
--
Eddy Carroll           ----* Genuine MUD Wizard  | "You haven't lived until
ADSPnet:  cbmuk!cbmuka!quartz!ecarroll           |    you've died in MUD!" 
Internet: ecarroll@maths.tcd.ie                  |   -- Richard Bartle

Chris_Johnsen@mindlink.UUCP (Chris Johnsen) (03/13/91)

 Eddy Carroll (ecarroll@maths.tcd.ie) writes:

> In another message, you asked if I would be willing to share my efforts so
> far with the rest of the world. Yes, certainly I would. I don't really feel
> it's in a state ready for public consumption though, so I've no desire to
> send it off to c.s.a. or Fred Fish. If anyone wants to play with it though,
> I can mail out copies. It won't be much use to anyone not interested in the
> source code though; as it stands, it gets bogged down too easily to
> transpile anything significant. (If anyone wants a quick'n'dirty 8086
> disassembler on the other hand, it will do that quite well.)

Please place me on your mailing list for a copy of your transpiler, Eddy.  Were
you thinking conventional or electronic mail?  I'm certainly willing to send
you media or expenses for your efforts.  My usenet address is below.  If you'd
prefer to use the postal service, will you post your address?  I'll get
something off in the mail right away.  Thanks, net-friend!

Oh yes, and what exactly is a  ----* Genuine MUD Wizard  anyway? :-)

 A glass half filled with water is full; half with water, half with air.

Usenet: a542@mindlink.UUCP  Phone: (604)853-5426  FAX: (604)854-8104

<LEEK@QUCDN.QueensU.CA> (03/13/91)

In article <1991Mar12.102921.23420@maths.tcd.ie>, ecarroll@maths.tcd.ie (Eddy
Carroll) says:
>
>>        Take a basic transpiler, such as the one Eddy Carroll wrote, and add a
>>toolkit.  The transpiler would do its best to translate and compile the source
>>executable.  If this failed, an expert, not just the average user, would
>.....

How about having the transpiler program try its best to translate 80x86
stuff into 68000 code, and if it thinks that a portion of the code is likely
to be self-modifying or unresolved at translation time, leave it as is.
At run time, run that part through an interpreter...  The overall speed
won't be as good as translating the whole thing, but I think it is a
good trade-off.  If the code is too much of a mess (self-modifying code all
over the place), run it through an emulator.
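
Something like the following dispatch loop is what I am picturing.  It is only
a sketch: the names are made up, and translated blocks are stood in for by
ordinary functions that return the address of the next block to run.

```python
def run_hybrid(start, translated, interpret, state):
    """Run from start, using translated blocks where they exist.

    translated maps a block address to a function(state) -> next address.
    interpret(address, state) -> next address handles everything else.
    A next address of None ends the run.
    """
    trace = []
    pc = start
    while pc is not None:
        block = translated.get(pc)
        if block is not None:
            trace.append((pc, "native"))
            pc = block(state)
        else:
            trace.append((pc, "interpreted"))
            pc = interpret(pc, state)
    return trace
```

The translator only has to be sure about the blocks it does translate;
anything doubtful simply stays out of the table and takes the slow path.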


K. C. Lee
Elec. Eng. Grad. Student  <- Disclaims almost everything :)

jnmoyne@lbl.gov (Jean-Noel MOYNE) (03/13/91)

             Good summary of the current situation!!

As for a name, Transpiler sounds fine to me (and has a "Transp..er" flavor 
too (-:).

I also think this "brain file" system is the best way. I think the 
Video-RAM access is one of the tricky points. Since the position of this 
RAM is the same on all the PC clones, even very clean programs access it 
directly for speed, and you're not in an emulator that will trap these 
moves. Of course, the use of an MMU may be a solution, but when you're 
running a transpilled program (is that the right spelling? sorry, I'm 
still learning English (-:), you shouldn't rely on catching memory 
exceptions like that, especially on the Amiga.
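
When the segment value is known at translation time, the check itself is 
simple. Here is a small sketch of it, using the standard PC layout (physical 
address = segment * 16 + offset, video memory between A0000 and BFFFF, colour 
text mode at B8000 with two bytes per character cell). The function names and 
the 80x25 screen size are my assumptions for the example.

```python
VIDEO_START = 0xA0000
VIDEO_END = 0xBFFFF
TEXT_BASE = 0xB8000      # colour text mode buffer
COLUMNS = 80
ROWS = 25


def physical(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def is_video(segment, offset):
    return VIDEO_START <= physical(segment, offset) <= VIDEO_END


def text_cell(segment, offset):
    """Map a colour text mode address to (row, column, is_attribute_byte)."""
    index = physical(segment, offset) - TEXT_BASE
    if not 0 <= index < COLUMNS * ROWS * 2:
        raise ValueError("address is outside the text screen")
    cell, part = divmod(index, 2)
    row, column = divmod(cell, COLUMNS)
    return row, column, part == 1
```

The hard part remains the case where the segment register is loaded from a 
value computed at run time, which is where a brain file could tell the 
transpiler what the program really does.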

          As for the stack problems, I also think that since you look at 
the source code, and control the translated generation, you can easily map 
any sort of move.b (a7)[+-] into a couple of 68K instructions. As you say, 
it's not easy but feasible to control the stack yourself (I don't think 
doing something like a fake stack, and relying on the 68020/030 ability to 
access words at odd addresses, is a good solution; as with the use of an 
MMU, it'd be much better if the generated code could run on a simple 68000).

         Jump table problems are also tricky, but I think that could be 
handled in the first pass of the transpiler. Overlays are a problem nobody 
has offered a solution for yet. I don't know about overlays on the PC, do 
they use the OS a lot ? (in that case you can try to trap them in the 
bios.lib or something like that).

          JNM (just agreeing (-:).

--
These are my own ideas (not LBL's)

ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar) (03/13/91)

In article <5097@mindlink.UUCP> Chris_Johnsen@mindlink.UUCP (Chris Johnsen) writes:
>Ian Farquhar (ifarqhar@sunb.mqcc.mq.oz.au) suggested that a compiled module  be
>run concurrent with an emulator type section so that in some parallel  way, any
>references within the source executable, which would also be loaded during
>runtime of the emulation, could be validated.  It could be argued that this
>should be placed on the pro side of the ledger, but the incurred overhead
>during execution would be large.  This function, I had imagined initially,
>would be carried out during the transpiler phase and not attached to, or
>burdening, the runtime execution.

No, I suggested that an image of the original code, without translation,
be carried around with the compiled code, with indexes into the compiled
code's equivalent sections that would allow jump tables and so forth to
function correctly.  This would be a memory-intensive but low-time-overhead
way of resolving problems.  It should also be noted that such a
system would allow trapping of addresses that need to be handled by
special code (hardware locations etc.), and that the image is also used
for data storage, meaning that once loaded and allocated, the program
would be unlikely to need any further allocation of heap.

This system solves two problems.  The first is data/code differentiation:
if you accidentally compile some data, it is no great problem, though a
much more minor problem remains if a couple of opcodes are missed because
the compiler, having accidentally compiled data, takes a real opcode to be
the operand of a false one.  The second is the problem of jump tables
and jumping to locations stored in registers.
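
A rough sketch of the arrangement, with all names invented for illustration:
the untranslated image stays in memory, and an index maps original addresses
to their compiled equivalents, so a jump table entry read from the image can
still be followed.

```python
class TranslatedProgram:
    """Original image plus an index from original addresses to compiled code."""

    def __init__(self, image, entry_points):
        self.image = bytearray(image)            # untranslated, doubles as data
        self.entry_points = dict(entry_points)   # original address -> routine

    def read_word(self, address):
        """Read a 16-bit little-endian word, as the 8086 stores them."""
        return int.from_bytes(self.image[address:address + 2], "little")

    def indirect_jump(self, address):
        """Resolve a run-time jump target given as an original address."""
        try:
            return self.entry_points[address]
        except KeyError:
            raise LookupError("no compiled code for address %#06x" % address)

    def jump_via_table(self, table, index):
        """Follow entry number index of a table of 16-bit code addresses."""
        return self.indirect_jump(self.read_word(table + 2 * index))


# Four filler bytes, then a two-entry table holding 0x0010 and 0x0020.
image = bytes([0, 0, 0, 0, 0x10, 0x00, 0x20, 0x00])
prog = TranslatedProgram(image, {0x10: "routine_a", 0x20: "routine_b"})
```

The lookup costs one table access per indirect jump, and a failed lookup is
the natural place to hook in special handling for hardware locations or for
code that was missed.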

--
Ian Farquhar                      Phone : + 61 2 805-9400
Office of Computing Services      Fax   : + 61 2 805-7433
Macquarie University  NSW  2109   Also  : + 61 2 805-7420
Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au

jnmoyne@lbl.gov (Jean-Noel MOYNE) (03/14/91)

 Not to mention I'd be interested too ...

In article <5122@mindlink.UUCP> Chris_Johnsen@mindlink.UUCP (Chris 
Johnsen) writes:
> Oh yes, and what exactly is a  ----* Genuine MUD Wizard  anyway? :-)

             Where (Level>9) is true. (now, go and transpile that !)
 MUD means Multi-User Dungeon, a Dungeons and Dragons style game where all 
the players log in to the machine by telnet, and play together. It's very 
addictive ! And doesn't run on MS-DOS (-:

            JNM

--
These are my own ideas (not LBL's)