gm@amdahl.amdahl.com (G. M. Harding) (12/09/87)
Rainer Glaschick of Nixdorf Computer AG offers a good summary of the issues involved in deciding on a compiler's output format (assembly source vs. relocatable object). As the Manager of UTS Languages (UTS is our Unix port) at Amdahl Corp., I'm mulling over such issues right now. (No, I'm not announcing any new products, and wouldn't even if I had any, and furthermore am not trying to imply that I either do or don't have any, but realistically, compilers have to evolve just like other programs, and it pays to plan ahead.)

My personal bias (for this is a highly subjective and emotion-laden area) favors the generation of assembler source code. For one thing, that is standard Unix practice. For another thing, as Rainer says, you have to offer the option of generating source anyway, and it's redundancy on an immense scale to incorporate assembler-like logic (and possibly even disassembler-like logic) in the compiler's back end. After all, the whole spirit of Unix is: One task, one program.

But though I consider these reasons persuasive, I do not consider them compelling. The real reasons for generating assembler source are:

(a) It's less confusing for the person who writes the back end. Compilers have enough bugs as it is; it can only make things worse if the engineer responsible for code generation has to deal with an additional level of complexity. Besides, from a development standpoint, separating the two types of translation allows work on the compiler and the assembler to proceed in parallel.

(b) If you emit source, you can (and jolly well should) design the compiler to read stdin and write stdout. Then, you can actually run it as a stand-alone, interactive program. Needless to say, this simplifies compiler debugging enormously. For example, on UTS/580, you can type:

    $ /lib/ccom
    static int j = 17;
            .data
            ds      0f
    j:      dc      f'17'
            .data
            . . .
    ^D

I agree that compile-time speed is an important design consideration (though nowhere near as important as run-time speed). Since the "cc" program invokes compiler and assembler separately, it requires two separate text translation steps, and thus is less than optimally efficient. But modern computational speeds are making this less and less of an issue, especially (subtle plug) on Amdahl machines. Bear in mind, too, that the C compiler is the most important software generation tool on any Unix system, and a language like C compiles fairly efficiently simply by virtue of its assembler-like constructs. It's quite possible, given today's technology, to produce an optimizing C compiler which will generate code almost as tight as a skilled assembler programmer could write, and once you have a good C compiler, you write all your other compilers in C.

Rainer is mistaken about one thing: You can, indeed, pass debugging information from compiler to assembler. The assembler must be prepared to accept this information (usually in the form of special pseudo-ops), but it isn't terribly difficult to modify the assembler accordingly.

An 8086 compiler which gives the user grief over a symbol named AL is an extremely poor compiler. It certainly isn't up to Unix standards (which, according to one's religious convictions, may or may not be saying much). The usual convention of prepending an underbar to non-local symbols is straightforward enough for me, and completely eliminates symbol clashes.

Foregoing opinions (C) 1987 by G. M. Harding; may not be attributed, reattributed, rehashed, or repeated without an express written disavowal of any meaningful content whatsoever.

--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
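
[The underbar convention mentioned at the end of the article above can be sketched in a few lines. This is only an illustration, not code from any real compiler: the function names `asm_symbol` and `clashes` are made up here, and the register list is a subset of the 8086 register names. The convention only works if the assembler reserves no name that itself begins with an underbar.]

```python
# Illustrative subset of 8086 register names that an assembler reserves.
RESERVED = {"AX", "BX", "CX", "DX", "AL", "AH", "BL", "BH",
            "CL", "CH", "DL", "DH", "SI", "DI", "SP", "BP"}


def asm_symbol(c_name: str) -> str:
    """Map a non-local C identifier to its assembler-level symbol
    by prepending an underbar (hypothetical helper)."""
    return "_" + c_name


def clashes(symbol: str) -> bool:
    """True if the symbol collides with a reserved register name.
    The comparison ignores case, as many 8086 assemblers do."""
    return symbol.upper() in RESERVED


if __name__ == "__main__":
    for name in ("AL", "count", "sp"):
        sym = asm_symbol(name)
        print(f"{name:6} clashes={clashes(name)!s:5} -> {sym:7} clashes={clashes(sym)}")
```

[A C variable called AL would clash if emitted verbatim, but the emitted symbol _AL cannot, because no reserved name in this set starts with an underbar.]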
rab@mimsy.UUCP (Bob Bruce) (12/10/87)
An indirect advantage of generating intermediate assembly code is that this encourages the design of fast and simple assemblers. This usually makes them much easier for human users as well. I have written assembly language programs on many machines, under several operating systems, and I am very impressed by the elegant simplicity of Unix assemblers.

-bob

[You would hope so, though Unix assemblers have been no more immune to the feature disease than any other. On the other hand, none of them have been as bad as IBM OS/360 Assembler F, which made four separate passes over the source. -John]
preston@BBN.COM (12/12/87)
In article <777@ima.ISC.COM> you write:
>My personal bias ...favors the generation of assembler source code.
>...you have to offer the option of generating source anyway, and it's
>redundancy on an immense scale to incorporate assembler-like logic (and
>possibly even disassembler-like logic) in the compiler's back end.

The main objection to generating assembler source is the fact that a large fraction of the time in the assembler is spent in lexical analysis, especially in the case of simple, fast assemblers. It seems kind of silly to convert the compiler's notion of the instructions that should be generated into text, and then have the assembler convert the text back into an internal representation.

At the same time, it seems a duplication of effort to have _both_ the assembler and compiler know the grittier details of generating instructions and object files. (You've heard this before.)

It would seem that you could answer both objections by having the back end of the compiler generate a simple token stream. The token stream would be identical to what the assembler would have gotten from lexical analysis of assembler source text.

For example, if you thought of a conventional compiler/assembler combination as:

    compiler:
        <front end of compiler>
        <instruction to text conversion>
    assembler:
        <text to token stream (lexical analysis)>
        <back end of assembler>

A compiler that generated a token stream could look like:

    compiler:
        <front end of compiler>
        <instruction to token conversion>
    (assembler:)
        <back end of assembler>

The point is to make the token stream format as simple as possible, essentially identical to what the assembler would have derived from the lexical analysis of equivalent assembler source. This keeps the interface between the compiler and assembler as "narrow" as possible.

A simple token stream format would make it trivial to write a token stream to text filter.
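
A rough sketch of the idea, under stated assumptions: the token kinds (LABEL, OP, ARG, EOL), the one-statement-per-line syntax, and the `word` pseudo-op are all invented for illustration and do not describe any particular assembler. `lex` stands in for the assembler's lexical analysis, `tokens_to_text` for the token stream to text filter, and `compile_store` for a compiler back end that emits tokens directly.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); kinds: LABEL, OP, ARG, EOL


def lex(source: str) -> List[Token]:
    """Assembler front end: turn assembler source text into tokens."""
    tokens: List[Token] = []
    for line in source.splitlines():
        words = line.replace(",", " ").split()
        if not words:
            continue  # skip blank lines
        if words[0].endswith(":"):
            tokens.append(("LABEL", words[0][:-1]))
            words = words[1:]
        if words:
            tokens.append(("OP", words[0]))
            tokens.extend(("ARG", w) for w in words[1:])
        tokens.append(("EOL", ""))
    return tokens


def tokens_to_text(tokens: List[Token]) -> str:
    """Token stream to text filter: regenerate readable source."""
    lines: List[str] = []
    label, op, args = "", "", []
    for kind, text in tokens:
        if kind == "LABEL":
            label = text + ":"
        elif kind == "OP":
            op = text
        elif kind == "ARG":
            args.append(text)
        elif kind == "EOL":
            lines.append(f"{label}\t{op}\t{', '.join(args)}".rstrip())
            label, op, args = "", "", []
    return "\n".join(lines) + "\n"


def compile_store(name: str, value: int) -> List[Token]:
    """Stand-in compiler back end: emit tokens with no text step.
    'word' is a hypothetical data-definition pseudo-op."""
    return [("LABEL", name), ("OP", "word"), ("ARG", str(value)), ("EOL", "")]


if __name__ == "__main__":
    stream = compile_store("j", 17)
    text = tokens_to_text(stream)
    print(text, end="")
    # Lexing the generated text gives back the compiler's own tokens.
    print(lex(text) == stream)
```

Because the filter and the lexer are inverses on this format, the same token stream can feed the assembler back end directly or be turned into text for a human to read.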
Other interesting combinations are:

To generate assembler source:

    <front end of compiler>
    <instruction to token conversion>
    <token stream to text filter>

To debug the assembler front end:

    <text to token stream (lexical analysis)>
    <token stream to text filter>

I'm not claiming that any of this is original. In fact, I'll bet that some of the compilers out there are implemented in this way.

--
Preston L. Bannister
USENET     : ucbvax!trwrb!felix!preston
BIX        : plb
CompuServe : 71350,3505
GEnie      : p.bannister
johnl@ima.UUCP (12/20/87)
..> Compilers generating assembly or object code

An example: PCC compilers usually generate assembly, so there are utilities like "inline" in BSD 4.3 that perform inline expansion of assembly language functions. They give C access, via what looks like a function call, to all the special instructions of your machine, without paying the overhead of an assembly language wrapper.

E.g., I just finished inlining all the special instructions on my machine, notably double precision arithmetic, access to special registers, and a few special arithmetic operations like Shift and Count Zeroes - FFS. All of these instructions fell naturally into our calling sequence, so they could be directly inlined to look like a function.

Of course, it's nicer when the compiler does inlining, but I haven't seen cross-language inlining yet.

Providing assembly language intermediate code permits the creation of special purpose optimizations like BSD inline, without having to learn the internal structure of the compiler.

And now I'm going to look at writing an improved assembly to assembly language pipeline optimizer and code reorganizer. But, I don't want to learn the innards of our compiler to do this.

Andy "Krazy" Glew. Gould CSD-Urbana. 1101 E. University, Urbana, IL 61801
aglew@mycroft.gould.com    ihnp4!uiucdcs!ccvaxa!aglew    aglew@gswd-vms.arpa
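
[The kind of assembly-to-assembly filter described above can be sketched roughly as follows. This is not how BSD inline is implemented; it is a minimal illustration in which the `call` mnemonic, the register names, and the `ffs` instruction template are all hypothetical. A real tool has to match the machine's actual calling sequence and deal with argument passing.]

```python
from typing import Dict, List

# Hypothetical templates: routine name -> instructions replacing the call.
TEMPLATES: Dict[str, List[str]] = {
    "_ffs": ["\tffs\tr0, r0"],
}


def expand_inline(asm_text: str, templates: Dict[str, List[str]] = TEMPLATES) -> str:
    """Replace 'call <routine>' lines with the routine's inline template.
    Every other line passes through unchanged."""
    out: List[str] = []
    for line in asm_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "call" and fields[1] in templates:
            out.extend(templates[fields[1]])
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    compiler_output = "\tmov\tr0, r1\n\tcall\t_ffs\n\tcall\t_printf\n"
    print(expand_inline(compiler_output), end="")
```

[The filter sits between compiler and assembler and needs to know nothing about the compiler's internals, only the text it emits.]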