schow@leibniz.uucp (Stanley Chow) (05/10/89)
In article <GRUNWALD.89May9113443@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>In his picture, when you have a new architecture with, say, more registers,
>different delay costs or deeper pipelines, you translate your .o files and/or
>your final binaries. This is essentially what scoreboarding is doing, albeit
>dynamically. There are some limits to this approach, some obvious, some not
>so obvious.
>
>Comments?

Well, it would depend a lot on how much you expect the object translator
to do. Some things are clearly impossible, some may be just inefficient,
and some may even be useful.

To take the impossibles first, I would be really impressed if anyone could
take the context-switching code for one machine and translate it into
functional context-switching code for a different machine. Similarly, there
are many problems for which the optimal solution requires different
algorithms on different architectures. Yes, I would be impressed if someone
could take i860 code and translate it for a Z80.

Just to show that this comparison of the Z80 and i860 is not silly,
consider the chess programs. For quite a long time, the best chess programs
were written by a husband-and-wife team (I forget their names) for Z80 (or
is it 6502?) chips. The commercial products (the Chess Challengers, I
think) would compete with programs running on Crays and other
supercomputers. Do you seriously think an object code translator could
achieve that performance on a 6502/Z80?

There are, however, valid uses for object translators. For example, Hunter
Systems will translate MS-DOS programs to run on Unix boxes. From what I
understand, this is done for support-cost reasons! It is cheaper for a
software house to support one version for MS-DOS than it is to support
multiple versions for different systems. This is strictly a support cost
vs. translation cost trade-off. I have not heard performance or
optimization as a sales pitch for Hunter Systems. Anyone with better
knowledge?
As far as object translation relates to comp.arch, I think it is at best a
kludge. Some architectures may well need it since, as you (or Hennessy)
point out, different delay costs and lack of scoreboarding make life very
interesting over the long term.

It would also be interesting to look at the complexity of an "Optimizing
Translator". Any OT would first have to discover the real intent of the
program, then optimize it for the new architecture. This is a much harder
problem than the one an optimizing compiler faces, where the source code is
available. Consider all the work that has gone into language design to make
it easy for compilers to optimize!

Basically, the object code (for any architecture) is a very poor medium for
communicating algorithm or program intent. Most emulators have trouble just
faithfully mimicking the target system! Optimizing translators sound very
hard to me. :-)

Stanley Chow        BitNet:  schow@BNR.CA
BNR                 UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831               ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
I am just a small cog in a big machine. I don't represent nobody.
mash@mips.COM (John Mashey) (05/10/89)
In article <491@bnr-fos.UUCP> schow%BNR.CA.bitnet@relay.cs.net (Stanley Chow) writes:
>In article <GRUNWALD.89May9113443@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>>In his picture, when you have a new architecture with, say, more registers,
>>different delay costs or deeper pipelines, you translate your .o files and/or
>>your final binaries. This is essentially what scoreboarding is doing, albeit
>>dynamically. There are some limits to this approach, some obvious, some not
>As far as object translation relates to comp.arch, I think it is at best
>a kludge. Some architectures may well need it since, as you (or Hennessy)
>point out, different delay costs and lack of scoreboarding make life very
>interesting over the long term.
>Basically, the object code (for any architecture) is a very poor medium
>for communicating algorithm or program intent. Most emulators have trouble
>just faithfully mimicking the target system! Optimizing translators sound
>very hard to me. :-)

Actually, a lot of object-code translation has been used already:
	1) MOXIE: MIPs On a vaX Instruction Emulator - translated MIPS code
	   -> VAX code to let us get fast execution before MIPS chips
	   existed (i.e., faster than a regular simulator)
	2) PIXIE - MIPS -> MIPS, adding profiling information and
	   address-trace gathering code
	3) Various things at Ardent Computer for debugging and other reasons
	4) A PIXIE variant done at Berkeley to convert MIPS code -> SPARC
	   code (!)
Needless to say, the MS-DOS emulation is important commercially. Finally,
the most important commercial application I know of is HP's use of various
techniques to run HP3000 object code on the HP PA machines.

On the other hand, be careful in interpreting John's remarks as a claimed
intent for what MIPSco is doing.
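[The PIXIE-style rewriting mentioned above can be sketched roughly. This is
a toy model only: real PIXIE rewrites MIPS machine code in place and fixes
up branch targets, whereas here instructions are invented (op, operand)
tuples and the "count" pseudo-op is made up for illustration.]

```python
# Toy sketch of PIXIE-style instrumentation: splice a profiling counter
# bump in front of every basic-block head, leaving the original
# instructions otherwise untouched.

def instrument(instructions, block_heads):
    """Return a new instruction list with a ("count", i) hook before
    each instruction index i that begins a basic block."""
    out = []
    for i, insn in enumerate(instructions):
        if i in block_heads:
            out.append(("count", i))   # profiling hook for this block
        out.append(insn)
    return out

# A four-instruction toy program; blocks begin at index 0 (entry) and
# index 3 (the branch fall-through/target).
program = [("load", "r1"), ("add", "r2"), ("beq", 0), ("store", "r3")]
traced = instrument(program, block_heads={0, 3})
```

The essential point is that the rewriter only needs to recognize block
boundaries, not understand program intent, which is why this kind of
same-architecture translation is so much easier than the "Optimizing
Translator" Stanley Chow describes.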
In particular, Motorola & co. are persistent in claiming that the world
will fall apart for MIPS if the timings of the floating-point operations
change, despite the fact that it has clearly been stated many times that we
have complete interlocking on ALL of the multi-cycle operations. Really,
the only things that don't have interlocking are loads and equivalents
(i.e., move-from-coprocessor), and they all have a 1-cycle delay that is
predictable to the compilers. The (Without) in Microprocessor (Without)
Interlocking Pipeline Stages, which may have been appropriate for the
Stanford MIPS, is pretty much irrelevant when it comes to MIPSco MIPS. As
I've said here before, if we ended up with loads that had another cycle of
latency, we'd build a machine with an interlock on the extra cycle.

If we decided to put in load interlocks, that would be upward-compatible,
although we'd likely compile 3rd-party executables R3000-style forever.
(Of course, if we did add load interlocks at some point, and if there got
to be more of those machines around, at some point maybe we'd start
advising people to compile for that, and then do a reverse-translate on
R3000 machines!) If the timings of floating-point operations are different
(and they are) in forthcoming products, the existing object code works
fine.

However, even with completely interlocked and/or scoreboarded code, you
STILL want the compilers to be as aggressive as possible. Fortunately, the
way most of these things work, if you optimize for the version with the
longest latencies, it usually works pretty well for the ones with shorter
latencies as well. To see this, suppose you had a 5-cycle FP multiply, and
so you'd been generating code that tried to issue 4 more instructions
before using the result of the multiply.
If the multiply expanded to 10 cycles, the compiler folks would try to work
harder and find more things to do while the multiply was running, which
wouldn't usually hurt the machine with the 5-cycle multiply. It's just a
question of the number of stall cycles, and it almost always pays to spread
the computation of a multi-cycle result and the use of that result as far
apart as possible.

This, of course, is not remotely a new issue: any of the long-lived
computer product lines has faced it, especially those that cover a range of
implementation technologies, such as VAXen or S/360s. The solutions are the
same, except that the simplicity of RISC-style instructions makes it
marginally easier to manipulate object code. Our experience with these
methods tends to make us more willing to consider object code translation
as one more trick to use when it makes sense, and it's really not that
weird once you get used to it.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
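[The latency argument above reduces to simple arithmetic, sketched here
with an invented stall_cycles model: the result of an operation with a
given latency becomes usable latency-1 cycles after issue, and each
independent "filler" instruction the compiler schedules in between covers
one of those cycles.]

```python
# Back-of-the-envelope model of scheduling for multi-cycle latencies:
# code scheduled for the *longest* latency also runs stall-free on
# machines with shorter latencies.

def stall_cycles(latency, fillers):
    # Result is ready (latency - 1) cycles after issue; each independent
    # filler instruction hides one cycle of that gap.
    return max(0, (latency - 1) - fillers)

# Scheduled for a 5-cycle multiply (4 fillers): no stalls.
assert stall_cycles(5, 4) == 0
# The same code on a 10-cycle multiply exposes 5 stall cycles.
assert stall_cycles(10, 4) == 5
# Rescheduled for the 10-cycle machine (9 fillers), the 5-cycle
# machine still runs without a single stall.
assert stall_cycles(5, 9) == 0
```

This is the asymmetry Mashey points to: extra fillers never hurt the
short-latency machine, they just arrive early.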
grunwald@flute.cs.uiuc.edu (05/10/89)
The object code recompilation is implicitly assumed to be
intra-architecture only, or intra-architecture-family at best. The goal
would be to take advantage of, e.g., new registers, different pipe delays
and the like. Translation of Z80 <-> 6502 is a red herring in this context.

The reason this is interesting is that:
(a) As Brian Case pointed out, register loads/stores occur at different
    pipe slots, and thus are advantageous even w.r.t. on-chip cache
    designs.
(b) It's been stated that the MIPS design can be justified by ``well,
    someday we'll have more registers if we need them,'' which begs the
    question of ``oh yeah, well how will old code use them?'' (usually
    raised by register window fans).

Things people have pointed out by mail:
(1) Yuck, more versions of binaries? Each re-compiled binary will take
    more disk space. Sadly, this is true, but you wouldn't have to
    recompile *everything*, since few programs are really CPU bound
    anyway.
(2) Software management nightmare. Would vendors still support those .o
    files? What about bug reports, warranty? Good point, and probably,
    from a commercial standpoint, the strongest.
(3) Why not just recompile the source? Another good point. You're usually
    not given the .o files anyway, and if you are, you're probably also
    given the source, or should be able to convince the .o provider to
    recompile for you.

Although I realise this might just be a case of ``The Boss Expositing At A
Talk,'' would anyone at MIPS care to join the fray?
--
Dirk Grunwald
Univ. of Illinois
grunwald@flute.cs.uiuc.edu
aoki@faerie.Berkeley.EDU (Paul M. Aoki) (05/10/89)
In article <19162@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>Actually, a lot of object-code translation has been used already:
	[ three commercial examples ]
>	4) A PIXIE variant done at Berkeley to convert MIPS code -> SPARC
>	   code (!)

Actually, the MIPS to SPARC translator (done by Fred Horman and Mike Yang)
chewed on assembler files (.s -> .s, not .o -> .o).

Hmm. Are there any examples of object code translation that *haven't*
been done by MIPS or using MIPS processors .. ? :-)
--
Paul M. Aoki		aoki@postgres.Berkeley.EDU	...!ucbvax!aoki
CS Division, Dept. of EECS // UCB // Berkeley, CA 94720		(415) 642-1863
bcase@cup.portal.com (Brian bcase Case) (05/11/89)
>>[Dirk Grunwald's (Hi Dirk!) comments on Hennessy's comments about
>>binary translation]
>To take the impossibles first, I would be really impressed if anyone can
>take the context switching code for one machine and translate it into
>functional context switching code for a different machine.

Hennessy, and anyone else considering binary translation, was talking
about translating the binaries for *application* programs, not things like
OS kernels.

>There are, however, valid uses for object translators. For example, Hunter
>Systems will translate MS-DOS programs to run on Unix boxes. From what I
>understand, this is done for support cost reasons! It is cheaper for a
>software house to support one version for MS-DOS than it is to support
>multiple versions for different systems. This is strictly a support cost
>vs. translation cost trade-off. I have not heard performance or optimization
>as a sales pitch for Hunter Systems. Anyone with better knowledge?

I have done some insulting, I mean, consulting for Hunter Systems (but
only mundane things). It is true that some of their customers are
interested in keeping their support problems minimized. Translating one
version for many hosts is better than supporting many versions. XDOS can
do quite well with respect to performance, at least compared to the
dynamic schemes used by Phoenix and Insignia Solutions.

Hunter Systems' XDOS is *not* an automatic translator. It requires "key
files" that are nearly completely human-generated (or at least
human-supervised). This is because automatic *static* translation is
*impossible* without more information than is present in executable
images. Some people appear to be addressing this issue somewhat by
retaining more info in the executable. I've written about this before; if
you want to know more, let me know. (The hard problem is indirect
branches.)

> It would also be interesting to look at the complexity of an "Optimizing
>Translator".
>Any OT would first have to discover the real intent of the
>program, then optimize it for the new architecture. This is a much harder
>problem than an optimizing compiler where the source code is available.
>Consider all the work that has gone into language design to make it easy
>for compilers to optimize!

This is why I was calling for one or a few "universal" (boy, is that an
overworked term) intermediate languages to be used for application
distribution. This would allow much more freedom for the development of
new architectures. This would keep us all in business!

>Basically, the object code (for any architecture) is a very poor medium
>for communicating algorithm or program intent. Most emulators have trouble
>just faithfully mimicking the target system! Optimizing translators sound
>very hard to me. :-)

Well, I've written a couple of optimizing translators. They can be
straightforward or very complex, depending on the architectures in
question. Object code is a poor intermediate language (which is what it is
being asked to be with respect to object code translation). One way to
look at optimization is that its purpose is to lose information.
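[The indirect-branch problem Brian Case flags can be made concrete with a
toy model. Everything here is invented for illustration: the address map,
the addresses, and the two translate functions. The point is only that a
static translator can relocate every branch target it can *see*, but a
jump through a register targets an original-code address computed at run
time, which the translator cannot in general enumerate.]

```python
# Toy static translator: a map from original code addresses to the
# addresses of their translated equivalents.
address_map = {0x100: 0x8000, 0x104: 0x8010}

def translate_direct(target):
    # Direct branches: the target is a constant in the instruction, so
    # the translator rewrites it at translation time.
    return address_map[target]

def translate_indirect(register_value):
    # Indirect branches: the target is an *original* address computed at
    # run time.  Without extra information (e.g., Hunter's human-made
    # "key files") there is no guarantee it appears in the map at all.
    if register_value not in address_map:
        raise LookupError("indirect branch to untranslated address")
    return address_map[register_value]
```

A usage sketch: `translate_direct(0x100)` resolves fine, but an indirect
branch to an address the translator never saw (say `0x102`) raises an
error, and that is exactly the case the human-generated key files exist
to rule out.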
mash@mips.COM (John Mashey) (05/11/89)
In article <13553@pasteur.Berkeley.EDU> aoki@faerie.Berkeley.EDU (Paul M. Aoki) writes:
>Actually, the MIPS to SPARC translator (done by Fred Horman and Mike Yang)
>chewed on assembler files (.s -> .s, not .o -> .o).

OOPS! Thanks for the fix.

>Hmm. Are there any examples of object code translation that *haven't*
>been done by MIPS or using MIPS processors .. ? :-)

As noted in earlier mail on this, at least the MS-DOS translators and the
HP PA stuff are practical commercial things. Many emulation techniques
also rely on "on-the-fly" object translation to speed emulation.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
wsmith@m.cs.uiuc.edu (05/11/89)
>If we decided to put in load interlocks, that would
>be upward-compatible, although we'd likely compile 3rd-party executables
>with R3000-style forever.  (Of course, if we did add load interlocks at
>some point, and if there got to be more of those machines around, at some
>point maybe we'd start advising people to compile for that, and then do
>a reverse-translate on R3000-machines!)
>...
>Our experience with these methods tends to make us more willing to
>consider object code translation as one more trick to use when it makes
>sense, and it's really not that weird once you get used to it.

This sounds like a version control nightmare. It probably requires more
sophistication than current software engineering tools under UNIX are able
to provide.

The instant you have binaries that look like they should work and then
fail in possibly subtle ways, chaos is likely to ensue quickly for the
system administrator or software developers trying to port to these
systems. Before you worry about making it super-fast, you have to
guarantee that it will run correctly.

Bill Smith
wsmith@cs.uiuc.edu
uiucdcs!wsmith
brett@neptune.AMD.COM (Brett Stewart) (05/11/89)
In article <491@bnr-fos.UUCP> schow%BNR.CA.bitnet@relay.cs.net (Stanley Chow) writes:
>There are, however, valid uses for object translators. For example, Hunter
>Systems will translate MS-DOS programs to run on Unix boxes. From what I
>understand, this is done for support cost reasons! It is cheaper for a
>software house to support one version for MS-DOS than it is to support
>multiple versions for different systems. This is strictly a support cost
>vs. translation cost trade-off. I have not heard performance or optimization
>as a sales pitch for Hunter Systems. Anyone with better knowledge?
> (Stuff omitted)
> It would also be interesting to look at the complexity of an "Optimizing
>Translator". Any OT would first have to discover the real intent of the
>program, then optimize it for the new architecture.
> (Stuff omitted)

Three papers on emulation technology were presented at COMPCON this year:

	The XDOS Binary Code Conversion System,
	  John Banning, Hunter Systems Inc.
	The Design and Development of a Software Emulator,
	  Henry Nash, Insignia Solutions Inc.
	Logical Compute Services - An Architecture Extensible Application
	  Environment, Luther Johnson, Phoenix Technologies, Ltd.

I can attest that the products of all of these companies 'work' at some
level of performance. Discussion following the presentation of the papers
was lively. Although the approaches differ, these are arguably "Optimizing
Translators".
Executable code images are the input to each of them.

Mr. Johnson participates in the OSF SIG on ANDF - Architecture Neutral
Distribution Format - in a technical leadership capacity. OSF recently
posted a request for technology in this area. You might profitably pursue
this discussion with the OSF SIG on ANDF.
mash@mips.COM (John Mashey) (05/12/89)
In article <3300066@m.cs.uiuc.edu> wsmith@m.cs.uiuc.edu writes:
(notes on object translation)
>The instant you have binaries that look like they should work and they
>fail in possibly subtle ways, chaos is likely to ensue quickly for the
>system administrator or software developers trying to port to these
>systems. Before you worry about making it super-fast, you have to
>guarantee that it will run correctly.

If one wants to do this [remember, I was explaining what I thought
Hennessy was talking about originally, not what MIPSco necessarily had in
mind], one does things like assign magic numbers appropriately to object
files, to prevent wrongful execution.

Also, really, some of this stuff, if proper things are done in the
executables, is no more complicated than things people already do, like:
	dynamically-linked libraries
	what debuggers have to do when mucking around in an object
	having multiple versions of FP-handling code for different FP units
	  [thank goodness this is rapidly going the way of the dodo in
	  microprocessor-land, what with MIPS, SPARC, 88K, 80486, and 68040
	  all having one form of FP instructions apiece.]
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
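[The magic-number safeguard Mashey mentions can be sketched in a few lines.
The tag values and the can_execute loader check below are invented for this
illustration; real a.out-style magic numbers live in the executable's file
header, and the real loader logic is of course more involved.]

```python
# Sketch of magic numbers preventing wrongful execution: each binary
# carries a format tag, and the loader refuses binaries scheduled for a
# pipeline model the CPU does not implement.

MAGIC_R3000 = 0x0162        # hypothetical tag: code with load delay slots
MAGIC_INTERLOCKED = 0x0163  # hypothetical tag: code assuming load interlocks

def can_execute(file_magic, cpu_accepts):
    """Loader check: run only binaries whose magic this CPU model accepts."""
    return file_magic in cpu_accepts

# An R3000 accepts only R3000-scheduled code; a later machine with load
# interlocks (upward-compatible, as described above) could accept both.
assert can_execute(MAGIC_R3000, {MAGIC_R3000})
assert not can_execute(MAGIC_INTERLOCKED, {MAGIC_R3000})
assert can_execute(MAGIC_R3000, {MAGIC_R3000, MAGIC_INTERLOCKED})
```

This is the cheap half of Bill Smith's correctness guarantee: a mismatched
binary fails loudly at exec time instead of subtly at run time.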