[comp.arch] Object Translator

schow@leibniz.uucp (Stanley Chow) (05/10/89)

In article <GRUNWALD.89May9113443@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>In his picture, when you have a new architecture with, say, more registers,
>different delay costs or deeper pipelines, you translate your .o files and/or
>your final binaries. This is essentially what scoreboarding is doing, albeit
>dynamically. There are some limits to this approach, some obvious, some not
>so obvious.
>
>Comments?

Well, it would depend a lot on how much you expect the object translator to
do. Some things are clearly impossible, some may be just inefficient, and some
may even be useful.

To take the impossibles first, I would be really impressed if anyone could
take the context-switching code for one machine and translate it into
functional context-switching code for a different machine. Similarly, there
are many problems for which the optimal solution requires different algorithms
on different architectures. Yes, I would be impressed if someone could take
i860 code and translate it for the Z80.

Just to show that this comparison of the Z80 and i860 is not silly, consider
chess programs. For quite a long time, the best chess programs were written
by a husband & wife team (I forget their names) for the Z80 (or was it the
6502?). The commercial products (the Chess Challengers, I think) would
compete with programs running on Crays and other supercomputers. Do you
seriously think an object code translator could achieve that kind of
performance on a 6502/Z80?

There are, however, valid uses for object translators. For example, Hunter
Systems will translate MS-DOS programs to run on Unix boxes. From what I
understand, this is done for support cost reasons! It is cheaper for a
software house to support one version for MS-DOS than it is to support
multiple versions for different systems. This is strictly a support cost
vs. translation cost trade-off. I have not heard performance or optimization
as a sales pitch for Hunter systems. Anyone with better knowledge?

As far as object translation relates to comp.arch, I think it is at best
a kludge. Some architectures may well need it, since, as you (or Hennessy)
point out, different delay costs and the lack of scoreboarding make life very
interesting over the long term.
 
It would also be interesting to look at the complexity of an "Optimizing
Translator". Any OT would first have to discover the real intent of the
program, then optimize it for the new architecture. This is a much harder
problem than an optimizing compiler faces when the source code is available.
Consider all the work that has gone into language design to make it easy
for compilers to optimize!

Basically, the object code (for any architecture) is a very poor medium
for communicating algorithm or program intent. Most emulators have trouble
just faithfully mimicking the target system! Optimizing translators sound
very hard to me. :-)

Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831		     ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
I am just a small cog in a big machine. I don't represent nobody.

mash@mips.COM (John Mashey) (05/10/89)

In article <491@bnr-fos.UUCP> schow%BNR.CA.bitnet@relay.cs.net (Stanley Chow) writes:
>In article <GRUNWALD.89May9113443@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>>In his picture, when you have a new architecture with, say, more registers,
>>different delay costs or deeper pipelines, you translate your .o files and/or
>>your final binaries. This is essentially what scoreboarding is doing, albeit
>>dynamically. There are some limits to this approach, some obvious, some not

>As far as object translation relates to comp.arch, I think it is at best
>a kludge. Some architectures may well need it, since, as you (or Hennessy)
>point out, different delay costs and the lack of scoreboarding make life very
>interesting over the long term.
 
>Basically, the object code (for any architecture) is a very poor medium
>for communicating algorithm or program intent. Most emulators have trouble
>just faithfully mimicking the target system! Optimizing translators sound
>very hard to me. :-)

Actually, a lot of object-code translation has been used already:
	1) MOXIE: MIPs On a vaX Instruction Emulator - 
		xlated MIPS code -> VAX code to let us get fast execution
		before MIPS chips existed (i.e., faster than a regular
		simulator)
	2) PIXIE - MIPS -> MIPS, adding profiling information and
		address-trace gathering code
	3) Various things at Ardent Computer for debugging and other
		reasons
	4) A PIXIE variant done at Berkeley to convert MIPS code -> SPARC
		code (!)
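For anyone who hasn't seen pixie-style rewriting, the basic-block-counting idea fits in a few lines. Everything below is invented for illustration (a three-opcode toy ISA rather than MIPS object code): find the block heads, put a counting probe in front of each, and relocate the branch targets that the inserted probes shift.

```python
# Toy sketch of pixie-style basic-block counting.  The "ISA" here is
# invented; real pixie chews on MIPS object code and has much more to
# relocate (jump tables, indirect branches, and so on).

def find_block_heads(prog):
    """A basic block starts at the entry point, at every branch
    target, and right after every branch."""
    heads = {0}
    for i, (op, arg) in enumerate(prog):
        if op == "bnz":        # branch if the accumulator is nonzero
            heads.add(arg)     # the target starts a block
            heads.add(i + 1)   # so does the fall-through
    return heads

def instrument(prog):
    """Insert a counting probe at each block head, then relocate the
    branch targets that the probes have shifted."""
    heads = find_block_heads(prog)
    new_prog, remap = [], {}
    for i, insn in enumerate(prog):
        remap[i] = len(new_prog)            # branches to i land on the probe
        if i in heads:
            new_prog.append(("count", i))   # counts entries to block i
        new_prog.append(insn)
    return [("bnz", remap[arg]) if op == "bnz" else (op, arg)
            for (op, arg) in new_prog]

def run(prog, acc):
    """Execute the toy program and return per-block execution counts."""
    counts, pc = {}, 0
    while pc < len(prog):
        op, arg = prog[pc]
        if op == "count":
            counts[arg] = counts.get(arg, 0) + 1
        elif op == "dec":
            acc -= 1
        elif op == "bnz" and acc != 0:
            pc = arg
            continue
        elif op == "halt":
            break
        pc += 1
    return counts

# A three-iteration countdown loop: block 0 runs 3 times, the exit
# block (old address 2) runs once.
loop = [("dec", 0), ("bnz", 0), ("halt", 0)]
assert run(instrument(loop), acc=3) == {0: 3, 2: 1}
```

The same mechanics (an old-address to new-address remap applied to every control transfer) carry over to the address-trace gathering mentioned above.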

Needless to say, the MS-DOS emulation is important commercially.
Finally, the most important commercial application I know of is
HP's use of various techniques to run HP3000 object code on the HP PA machines.

On the other hand, be careful in interpreting John's remarks as a
claimed intent for what MIPSco is doing.
In particular, Motorola & co are persistent in claiming that the world
will fall apart for MIPS if the timings of the floating-point operations
change, despite the fact that it has clearly been stated many times
that we have complete interlocking on ALL of the multi-cycle operations.
Really, the only things that don't have interlocking are loads and
equivalents (i.e., move-from-coprocessor), and they all have a 1-cycle
delay that is predictable to the compilers.  The (Without) in
Microprocessor (Without) Interlocked Pipeline Stages, which may have
been appropriate for the Stanford MIPS, is pretty much irrelevant
when it comes to MIPSco MIPS.

As I've said here before, if we ended up with loads that had another
cycle of latency, we'd build a machine with an interlock on the extra
cycle.  If we decided to put in load interlocks, that would
be upward-compatible, although we'd likely compile 3rd-party executables
R3000-style forever. (Of course, if we did add load interlocks at
some point, and if there got to be more of those machines around, at some
point maybe we'd start advising people to compile for that, and then do
a reverse-translate on R3000 machines!)

If the timings of floating-point operations
are different (and they are) in forthcoming products, the existing object
code works fine.  However, even with completely interlocked and/or
scoreboarded code, you STILL want the compilers to be as aggressive
as possible.  Fortunately, the way most of these things work, if you
try to optimize for the version with the longest latencies, it usually
works pretty well for ones with shorter latencies as well.  To see this,
suppose you had a 5-cycle FP multiply, and so you'd been generating code
that tried to issue 4 more instructions before using the result of the
multiply.  If the multiply expanded to 10 cycles, the compiler folks
would try to work harder and find more things to do while the multiply
was running, which wouldn't usually hurt the machine with the 5-cycle
multiply.  It's just a question of the number of stall cycles, and
it's obvious that it almost always pays to spread the computation of a
multi-cycle result and the use of that result as far apart as possible.
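The arithmetic in that argument can be sketched with a deliberately crude model (single issue, one producer, `gap` independent instructions before the first use); the 5- and 10-cycle figures are the ones from the text, but the model itself is a simplification, not MIPSco's actual pipeline.

```python
# Crude model of the scheduling argument: all that matters is the gap
# (in issue slots) between a multi-cycle op and the first use of its
# result.  Single issue, no other hazards assumed.

def stalls(latency, gap):
    """Stall cycles when a result of the given latency is consumed
    `gap` issue slots after the producing instruction."""
    return max(0, latency - 1 - gap)

# Code scheduled for the 5-cycle multiply (4 independent instructions
# before the use): perfect there, 5 stalls if the multiply grows to 10.
assert stalls(5, gap=4) == 0
assert stalls(10, gap=4) == 5

# Code scheduled for the 10-cycle multiply runs stall-free on *both*
# machines: padding never hurts the shorter-latency implementation.
assert stalls(10, gap=9) == 0
assert stalls(5, gap=9) == 0

# The same model covers the 1-cycle load delay: one independent
# instruction after the load and nothing waits.
assert stalls(2, gap=1) == 0
```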

This, of course, is not remotely a new issue: any of the long-lived
computer product lines has faced this, especially those that
cover a range of implementation technologies, such as VAXen or S/360s.
The solutions are the same, except that the simplicity of RISC-style
instructions makes it marginally easier to manipulate object code.
Our experience with these methods tends to make us more willing to
consider object code translation as one more trick to use when it makes
sense, and it's really not that weird once you get used to it.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

grunwald@flute.cs.uiuc.edu (05/10/89)

The object code recompilation is implicitly assumed to be
intra-architecture only, or intra-architecture-family at best. The
goal would be to take advantage of, e.g., new registers, different
pipe delays and the like. Translation of Z80<->6502 is a red herring
in this context.

The reason that this is interesting is that:
	(a) as Brian Case pointed out, register loads/stores occur at
	    different pipe slots, and thus are advantageous even w.r.t.
	    on-chip cache designs.

	(b) It's been stated that the MIPS design can be justified by
	    ``well, someday we'll have more registers if we need them'',
	    which raises the question of ``oh yeah, well how will old
	    code use them?'' (usually asked by register window fans).

Things people have pointed out by mail:

(1) Yuck, more versions of binaries? Each re-compiled binary will take more
    disk space.

	Sadly, this is true, but you wouldn't have to recompile *everything*,
	since few programs are really CPU bound anyway.

(2) Software Management Nightmare. Would vendors still support those .o files?
    What about bug reports and warranty?

	Good point, and probably, from a commercial standpoint, the strongest.

(3) Why not just recompile the source?

	Another good point. Since you're usually not given the .o files
	anyway, and if you are, you're probably also given the source,
	or should be able to convince the .o provider to recompile for you.

Although I realise this might just be a case of ``The Boss Expositing
At A Talk,'' would anyone at MIPS care to join the fray?
--
Dirk Grunwald
Univ. of Illinois
grunwald@flute.cs.uiuc.edu

aoki@faerie.Berkeley.EDU (Paul M. Aoki) (05/10/89)

In article <19162@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>Actually, a lot of object-code translation has been used already:
 [ three commercial examples ]
>	4) A PIXIE variant done at Berkeley to convert MIPS code -> SPARC
>		code (!)

Actually, the MIPS to SPARC translator (done by Fred Horman and Mike Yang)
chewed on assembler files (.s -> .s, not .o -> .o).

Hmm.  Are there any examples of object code translation that *haven't*
been done by MIPS or using MIPS processors .. ?  :-)
--
Paul M. Aoki		aoki@postgres.Berkeley.EDU	  ...!ucbvax!aoki
CS Division, Dept. of EECS // UCB // Berkeley, CA 94720	  (415) 642-1863

bcase@cup.portal.com (Brian bcase Case) (05/11/89)

>>[Dirk Grunwald's (Hi Dirk!) comments on Hennessy's comments about
>>binary translation]

>To take the impossibles first, I would be really impressed if anyone could
>take the context-switching code for one machine and translate it into
>functional context-switching code for a different machine.

Hennessy, and anyone else considering binary translation, was talking
about translating the binaries for *application* programs, not things
like OS kernels.

>There are, however, valid uses for object translators. For example, Hunter
>systems will translate MS-DOS programs to run on Unix boxes. From what I
>understand, this is done for support cost reasons! It is cheaper for a
>software house to support one version for MS-DOS than it is to support
>multiple versions for different systems. This is strictly a support cost
>vs. translation cost trade-off. I have not heard performance or optimization
>as a sales pitch for Hunter systems. Anyone with better knowledge?

I have done some insulti, I mean, consulting for Hunter Systems (but only
mundane things).  It is true that some of their customers are interested 
in keeping their support problems minimized.  Translating one version for
many hosts is better than supporting many versions.  XDOS can do quite
well with respect to performance, at least compared to the dynamic schemes
used by Phoenix and Insignia Solutions.

Hunter Systems' XDOS is *not* an automatic translator.  It requires "key
files" that are almost entirely human-generated (or at least human-supervised).
This is because automatic *static* translation is *impossible*
without more information than is present in executable images.  Some
people appear to be addressing this issue somewhat by retaining more
info in the executable.  I've written about this before; if you want to
know more, let me know.  (The hard problem is indirect branches.)
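The indirect-branch problem is easy to make concrete with a toy translator; all the instruction names below are invented, and the point is only that direct targets can be rewritten at translate time while indirect ones force a lookup at run time.

```python
# Invented toy translator: each old instruction becomes two new ones
# (a stand-in for real code expansion), so every address shifts.

def translate(old_prog):
    addr_map, new_prog = {}, []
    for old_addr, insn in enumerate(old_prog):
        addr_map[old_addr] = len(new_prog)
        new_prog.append(insn)
        new_prog.append(("nop",))           # the expansion
    # Direct branches carry their target in the instruction, so the
    # translator can rewrite them statically...
    for i, insn in enumerate(new_prog):
        if insn[0] == "jmp":
            new_prog[i] = ("jmp", addr_map[insn[1]])
    # ...but an indirect branch ("jr") reads its target from a register
    # at run time.  All a static translator can do is emit the map and
    # make the translated code consult it on every indirect jump.
    return new_prog, addr_map

new, addr_map = translate([("jmp", 2), ("nop",), ("jr", "r1")])
assert new[0] == ("jmp", 4)    # direct target fixed at translate time
assert addr_map[2] == 4        # what the runtime lookup must answer for "jr"
```

When the register turns out to hold an address the translator never mapped (computed jumps, jump tables read as data), the lookup has no answer, which is roughly where the hand-made key files come in.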

> It would also be interesting to look at the complexity of an "Optimizing
>Translator". Any OT would first have to discover the real intent of the
>program, then optimize it for the new architecture. This is a much harder
>problem than an optimizing compiler where the source code is available.
>Consider all the work that has gone into language design to make it easy
>for compilers to optimize!

This is why I was calling for one or a few "universal" (boy, is that an
overworked term) intermediate languages to be used for application
distribution.  This would allow much more freedom for the development of
new architectures.  This would keep us all in business!
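As a sketch of what such a distribution format buys, here is one shipped intermediate form lowered two different ways at install time. The IR and both toy backends are invented, and the accumulator backend handles only left-to-right expression chains.

```python
# One shipped program in an invented stack-style IR.
IR = [("push", 2), ("push", 3), ("mul",), ("push", 4), ("add",)]   # 2*3 + 4

def lower_to_stack_machine(ir):
    """A stack target takes the IR essentially unchanged."""
    return [" ".join(str(f) for f in insn) for insn in ir]

def lower_to_acc_machine(ir):
    """A one-accumulator target: same program, different code shape,
    chosen by the installer rather than by the application vendor.
    Handles only left-to-right chains, which is all this sketch needs."""
    code, stack = [], []
    for insn in ir:
        if insn[0] == "push":
            stack.append(insn[1])
        else:
            b, a = stack.pop(), stack.pop()
            if a != "acc":
                code.append(f"li {a}")      # load left operand
            code.append(f"{insn[0]} {b}")   # operate into the accumulator
            stack.append("acc")
    return code

assert lower_to_stack_machine(IR) == ["push 2", "push 3", "mul", "push 4", "add"]
assert lower_to_acc_machine(IR) == ["li 2", "mul 3", "add 4"]
```

The vendor supports one IR image; each architecture gets native code without anyone shipping N binaries, which is the ANDF idea in miniature.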

>Basically, the object code (for any architecture) is a very poor medium
>for communicating algorithm or program intent. Most emulators have trouble
>just faithfully mimicking the target system! Optimizing translators sound
>very hard to me. :-)

Well, I've written a couple of optimizing translators.  They can be
straightforward or very complex, depending on the architectures in question.

Object code is a poor intermediate language (which is what it is being
asked to serve as with respect to object code translation).  One way to
look at optimization is that its purpose is to lose information.
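A constant folder is the simplest demonstration of that information loss; the (op, left, right) expression representation here is invented.

```python
# A toy constant folder: two different source-level intents collapse to
# the same code, after which no translator can tell them apart.

def fold(expr):
    """Collapse constant subexpressions bottom-up."""
    if not isinstance(expr, tuple):
        return expr                      # a leaf: constant or variable
    op, a, b = expr
    a, b = fold(a), fold(b)
    if isinstance(a, int) and isinstance(b, int):
        return {"add": a + b, "mul": a * b}[op]
    return (op, a, b)

# "x + 2*3" and "x + (4+2)" become indistinguishable:
assert fold(("add", "x", ("mul", 2, 3))) == ("add", "x", 6)
assert fold(("add", "x", ("add", 4, 2))) == ("add", "x", 6)
```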

mash@mips.COM (John Mashey) (05/11/89)

In article <13553@pasteur.Berkeley.EDU> aoki@faerie.Berkeley.EDU (Paul M. Aoki) writes:
>In article <19162@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>Actually, the MIPS to SPARC translator (done by Fred Horman and Mike Yang)
>chewed on assembler files (.s -> .s, not .o -> .o).
OOPS! thanx for the fix.

>Hmm.  Are there any examples of object code translation that *haven't*
>been done by MIPS or using MIPS processors .. ?  :-)
As noted in earlier mail on this, at least the MS-DOS translators
and the HP PA stuff are practical commercial things.  Many emulation
techniques rely on "on-the-fly" object translation to speed emulation.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

wsmith@m.cs.uiuc.edu (05/11/89)

>If we decided to put in load interlocks, that would
>be upward-compatible, although we'd likely compile 3rd-party executables
>with R3000-style forever. (Of course, if we did add load interlocks at
>some point, and if there got to be more of those machines around, at some
>point maybe we'd start advising peopel to compile for that, and then do
>a reverse-translate on R3000-machines!)
>...
>Our experience with these methods tends to make us more willing to
>consider object code translation as one more trick to use when it makes
>sense, and it's really not that weird once you get used to it.

This sounds like a version control nightmare.   It probably requires
more sophistication than current software engineering tools under
UNIX are able to provide.

The instant you have binaries that look like they should work and they
fail in possibly subtle ways, chaos is likely to ensue quickly for the 
system administrator or software developers trying to port to these
systems.   Before you worry about making it super-fast, you have to
guarantee that it will run correctly.

Bill Smith
wsmith@cs.uiuc.edu
uiucdcs!wsmith

brett@neptune.AMD.COM (Brett Stewart) (05/11/89)

In article <491@bnr-fos.UUCP> schow%BNR.CA.bitnet@relay.cs.net (Stanley Chow) writes:
>There are, however, valid uses for object translators. For example, Hunter
>systems will translate MS-DOS programs to run on Unix boxes. From what I
>understand, this is done for support cost reasons! It is cheaper for a
>software house to support one version for MS-DOS than it is to support
>multiple versions for different systems. This is strictly a support cost
>vs. translation cost trade-off. I have not heard performance or optimization
>as a sales pitch for Hunter systems. Anyone with better knowledge?
>  (Stuff omitted)
> It would also be interesting to look at the complexity of an "Optimizing
>Translator". Any OT would first have to discover the real intent of the
>program, then optimize it for the new architecture. This is a much harder
>problem than an optimizing compiler where the source code is available.
>Consider all the work that has gone into language design to make it easy
>for compilers to optimize!
>
>Basically, the object code (for any architecture) is a very poor medium
>for communicating algorithm or program intent. Most emulators have trouble
>just faithfully mimicking the target system! Optimizing translators sound
>very hard to me. :-)

Three papers on emulation technology were presented at COMPCON this
year.  They are:

   The XDOS Binary Code Conversion System, John Banning, Hunter Systems Inc.
   The Design and Development of a Software Emulator, Henry Nash,
	Insignia Solutions Inc.
   Logical Compute Services - An Architecture Extensible Application
	Environment, Luther Johnson, Phoenix Technologies, Ltd.

I can attest that all of the products of these companies 'work' at
some level of performance.  Discussion following the presentation of
the papers was lively.

Although approaches differ, these are arguably "Optimizing
Translators".  Executable code images are the input to each of them.

Mr. Johnson participates in the OSF SIG on ANDF - Architecture
Neutral Distribution Format - in a technical leadership capacity.
OSF recently posted a request for technology in this area.  You
might profitably pursue this discussion with the OSF SIG on the
ANDF.

mash@mips.COM (John Mashey) (05/12/89)

In article <3300066@m.cs.uiuc.edu> wsmith@m.cs.uiuc.edu writes:
(notes on object translation).

>The instant you have binaries that look like they should work and they
>fail in possibly subtle ways, chaos is likely to ensue quickly for the 
>system administrator or software developers trying to port to these
>systems.   Before you worry about making it super-fast, you have to
>guarantee that it will run correctly.

If one wants to do this [remember, I was explaining what I thought Hennessy
was talking about, originally, not what MIPSco necessarily had in mind],
one does things like assign magic numbers appropriately to object
files, to prevent wrongful execution.  Also, some of this stuff, if the
proper things are done in executables, is no more complicated than things
people already do, like:
	dynamically-linked libraries
	what debuggers have to do when mucking around in an object
	having multiple versions of FP-handling code for different
	FP units [thank goodness this is rapidly going the way of the
	dodo in microprocessor-land, what with: MIPS, SPARC, 88K,
	80486, and 68040 all having 1 form of FP instructions apiece.]
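The magic-number guard itself is a few lines; the tag values below are invented (real systems define theirs in places like a.out.h).

```python
# Sketch of the magic-number guard: the loader reads a tag from the
# front of the image and refuses executables built for a pipeline
# model this machine doesn't implement.  Tag values are invented.
import struct

MAGIC_R3000_STYLE = 0x0162   # hypothetical: scheduled for R3000 timing
MAGIC_INTERLOCKED = 0x0163   # hypothetical: assumes load interlocks

def can_execute(image, machine_magics):
    """Accept the image only if its magic is one this machine runs."""
    if len(image) < 2:
        return False
    (magic,) = struct.unpack(">H", image[:2])
    return magic in machine_magics

old_machine = {MAGIC_R3000_STYLE}                      # no interlocks
new_machine = {MAGIC_R3000_STYLE, MAGIC_INTERLOCKED}   # upward-compatible

image = struct.pack(">H", MAGIC_INTERLOCKED) + b"...text segment..."
assert can_execute(image, new_machine)
assert not can_execute(image, old_machine)   # wrongful execution prevented
```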
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086