[comp.compilers] Compiler Output

gm@amdahl.amdahl.com (G. M. Harding) (12/09/87)

     Rainer Glaschick of Nixdorf Computer AG offers a good
summary of the issues involved in deciding on a compiler's
output format (assembly source vs. relocatable object). As the
Manager of UTS Languages (UTS is our Unix port) at Amdahl
Corp., I'm mulling over such issues right now. (No, I'm not
announcing any new products, and wouldn't even if I had any,
and furthermore am not trying to imply that I either do or
don't have any, but realistically, compilers have to evolve
just like other programs, and it pays to plan ahead.)

     My personal bias (for this is a highly subjective and
emotion-laden area) favors the generation of assembler source
code. For one thing, that is standard Unix practice. For
another thing, as Rainer says, you have to offer the option
of generating source anyway, and it's redundancy on an immense
scale to incorporate assembler-like logic (and possibly even
disassembler-like logic) in the compiler's back end. After all,
the whole spirit of Unix is: One task, one program.

     But though I consider these reasons persuasive, I do not
consider them compelling. The real reasons for generating
assembler source are:

     (a)  It's less confusing for the person who writes the
back end. Compilers have enough bugs as it is; it can only make
things worse if the engineer responsible for code generation
has to deal with an additional level of complexity. Besides,
from a development standpoint, separating the two types of
translation allows work on the compiler and the assembler to
proceed in parallel.

     (b)  If you emit source, you can (and jolly well should)
design the compiler to read stdin and write stdout. Then, you
can actually run it as a stand-alone, interactive program.
Needless to say, this simplifies compiler debugging enormously.
For example, on UTS/580, you can type:

               $ /lib/ccom
               static int j = 17;
               .data
               ds       0f
               j:
               dc f'17'
               .data
                  .
                  .
                  .
               ^D
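
A minimal sketch of the filter structure described in (b), written in Python purely for illustration. It assumes a toy grammar that accepts only `static int NAME = INTEGER;` lines and copies the 370-style output shape from the transcript above. The names `compile_line`, `compile_stream` and `main` are hypothetical, not part of any real ccom. Because the translator takes any line iterator and any output stream, the same code runs interactively on stdin/stdout or under a test harness with in-memory buffers.

```python
import re
import sys
from typing import Iterable, List, TextIO

# Hypothetical toy grammar: only "static int NAME = INTEGER;" is accepted.
DECL = re.compile(r"\s*static\s+int\s+([A-Za-z_]\w*)\s*=\s*(-?\d+)\s*;\s*$")


def compile_line(line: str) -> List[str]:
    """Translate one declaration into 370-style assembler lines."""
    m = DECL.match(line)
    if m is None:
        raise SyntaxError(f"unsupported input: {line!r}")
    name, value = m.groups()
    # Align to a fullword (ds 0f), define the label, then the initialized word.
    return [".data", "ds       0f", f"{name}:", f"dc f'{value}'"]


def compile_stream(src: Iterable[str], out: TextIO) -> None:
    """Read source lines and write assembler text: a classic Unix filter."""
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        for asm in compile_line(line):
            out.write(asm + "\n")
        out.flush()  # show output immediately when run interactively


def main() -> None:
    compile_stream(sys.stdin, sys.stdout)
```

Wired up through `main()`, it behaves like the `/lib/ccom` session: type a declaration, get assembler back, end with ^D.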

     I agree that compile-time speed is an important design
consideration (though nowhere near as important as run-time
speed). Since the "cc" program invokes compiler and assembler
separately, it requires two separate text translation steps,
and thus is less than optimally efficient. But modern compu-
tational speeds are making this less and less of an issue,
especially (subtle plug) on Amdahl machines. Bear in mind, too,
that the C compiler is the most important software generation
tool on any Unix system, and a language like C compiles fairly
efficiently simply by virtue of its assembler-like constructs.
It's quite possible, given today's technology, to produce an
optimizing C compiler which will generate code almost as tight
as a skilled assembler programmer could write, and once you
have a good C compiler, you write all your other compilers
in C.

     Rainer is mistaken about one thing: You can, indeed, pass
debugging information from compiler to assembler. The assembler
must be prepared to accept this information (usually in the
form of special pseudo-ops), but it isn't terribly difficult
to modify the assembler accordingly.
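
One way this can look, as a hedged Python sketch: the compiler interleaves a line-number pseudo-op with the instructions it emits, and the assembler strips the pseudo-op while building an offset-to-line table for the debugger. The `.line` directive, the fixed 4-byte instruction size and the function names are assumptions made for illustration; real assemblers define their own debugging directives and encodings.

```python
from typing import Dict, List, Optional, Tuple

INSN_SIZE = 4  # assumption: every instruction occupies 4 bytes


def emit_with_lines(stmts: List[Tuple[int, List[str]]]) -> List[str]:
    """Compiler side: precede each statement's code with a hypothetical .line pseudo-op."""
    out: List[str] = []
    for src_line, instrs in stmts:
        out.append(f".line {src_line}")
        out.extend(instrs)
    return out


def assemble(asm: List[str]) -> Tuple[List[str], Dict[int, int]]:
    """Assembler side: drop .line pseudo-ops, map instruction offsets to source lines."""
    code: List[str] = []
    line_table: Dict[int, int] = {}
    current: Optional[int] = None
    for text in asm:
        if text.startswith(".line "):
            current = int(text.split()[1])
            continue
        if current is not None:
            line_table[INSN_SIZE * len(code)] = current
        code.append(text)
    return code, line_table
```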

     An 8086 compiler which gives the user grief over a symbol
named AL is an extremely poor compiler. It certainly isn't up
to Unix standards (which, according to one's religious convic-
tions, may or may not be saying much). The usual convention of
prepending an underbar to non-local symbols is straightforward
enough for me, and completely eliminates symbol clashes.
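
To see why the underbar convention works, here is a small Python sketch. It assumes a case-insensitive 8086 assembler that reserves register names (the set below lists only registers, not every mnemonic), and `external_name` and `clashes_with_register` are hypothetical helpers. The prefix works only because no reserved name begins with an underscore, so a prefixed C identifier can never collide with one.

```python
# 8086 register names an assembler might reserve (registers only, not mnemonics).
REGISTERS = {
    "AL", "AH", "AX", "BL", "BH", "BX", "CL", "CH", "CX", "DL", "DH", "DX",
    "SI", "DI", "BP", "SP", "CS", "DS", "ES", "SS",
}


def external_name(c_name: str) -> str:
    """Map a C global identifier to its assembler-level symbol."""
    return "_" + c_name


def clashes_with_register(symbol: str) -> bool:
    """True if the assembler would read the symbol as a register (case-insensitive)."""
    return symbol.upper() in REGISTERS
```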

Foregoing opinions (C) 1987 by G. M. Harding; may not be
attributed, reattributed, rehashed, or repeated without an
express written disavowal of any meaningful content whatsoever.
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

rab@mimsy.UUCP (Bob Bruce) (12/10/87)

An indirect advantage of generating intermediate assembly code is
that this encourages the design of fast and simple assemblers.
This usually makes them much easier for human users as well.
I have written assembly language programs on many machines, under
several operating systems, and I am very impressed by the
elegant simplicity of unix assemblers.
	-bob

[You would hope so, though Unix assemblers have been no more immune to
the feature disease than any other.  On the other hand, none of them have
been as bad as IBM OS/360 Assembler F which made four separate passes over
the source.  -John]

preston@BBN.COM (12/12/87)

In article <777@ima.ISC.COM> you write:

>My personal bias ...favors the generation of assembler source code.
>...you have to offer the option of generating source anyway, and it's
>redundancy on an immense scale to incorporate assembler-like logic (and
>possibly even disassembler-like logic) in the compiler's back end.

The main objection to generating assembler source is the fact that a large
fraction of the time in the assembler is spent in lexical analysis,
especially in the case of simple, fast assemblers.  It seems kind of
silly to convert the compiler's notion of the instructions that should
be generated into text, and then have the assembler convert the text
back into an internal representation.

At the same time, it seems a duplication of effort to have _both_ the
assembler and compiler know the grittier details of generating
instructions and object files.  (You've heard this before.)

It would seem that you could answer both objections by having the back
end of the compiler generate a simple token stream.  The token stream
would be identical to what the assembler would have gotten from lexical
analysis of assembler source text.
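
A minimal Python sketch of such a token stream, under stated assumptions: the token kinds, the toy 370-style mnemonics and the function names are all hypothetical. The compiler's generator builds `Tok` tuples directly, and the assembler's first pass (binding labels to addresses) consumes them without ever seeing text, so no lexical analysis happens between the two.

```python
from typing import Dict, List, NamedTuple


class Tok(NamedTuple):
    kind: str   # "LABEL", "OP", "REG", "SYM", "NUM", "COMMA" or "EOL"
    value: str


def gen_sum(label: str, dst: str, a: str, b: str) -> List[Tok]:
    """Compiler back end: emit tokens for dst = a + b with no text formatting."""
    toks = [Tok("LABEL", label), Tok("EOL", "")]
    for op, operand in (("l", a), ("a", b), ("st", dst)):
        toks += [Tok("OP", op), Tok("REG", "r1"), Tok("COMMA", ","),
                 Tok("SYM", operand), Tok("EOL", "")]
    return toks


def assign_addresses(toks: List[Tok], insn_size: int = 4) -> Dict[str, int]:
    """Assembler back end, pass 1: bind each label to the location counter."""
    symtab: Dict[str, int] = {}
    loc = 0
    for t in toks:
        if t.kind == "LABEL":
            symtab[t.value] = loc
        elif t.kind == "OP":
            loc += insn_size  # assumption: fixed-size instructions
    return symtab
```

A real system would more likely use a compact binary record than Python tuples; the point is only that the interface carries exactly what the assembler's lexer would otherwise have produced.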

For example, if you thought of a conventional compiler/assembler
combination as:

	compiler:
		<front end of compiler>
		<instruction to text conversion>
	assembler:
		<text to token stream (lexical analysis)>
		<back end of assembler>

A compiler that generated a token stream could look like:

	compiler:
		<front end of compiler>
		<instruction to token conversion>
	(assembler:)
		<back end of assembler>

The point is to make the token stream format as simple as possible,
essentially identical to what the assembler would have derived from the
lexical analysis of equivalent assembler source.  This keeps the
interface between the compiler and assembler as "narrow" as possible.

A simple token stream format would make it trivial to write a token
stream to text filter.

Other interesting combinations are, to generate assembler source:

		<front end of compiler>
		<instruction to token conversion>
		<token stream to text filter>

To debug the assembler front end:

		<text to token stream (lexical analysis)>
		<token stream to text filter>
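
The pipelines above can be sketched in Python under similar assumptions: a toy assembler syntax of `label:` lines and `op reg,operand` lines, with hypothetical names `lex` and `to_text`. `lex` plays the role of the assembler's lexical analysis and `to_text` is the token stream to text filter. Running one into the other is the front-end debugging pipeline, and a round trip that reproduces the input makes a cheap regression check.

```python
import re
from typing import List, NamedTuple


class Tok(NamedTuple):
    kind: str   # "LABEL", "OP", "REG", "SYM", "NUM", "COMMA" or "EOL"
    value: str


TOKEN_RE = re.compile(r"""
      (?P<LABEL>[A-Za-z_]\w*:)   # label definition, colon included
    | (?P<REG>r\d+\b)            # register such as r1
    | (?P<NUM>-?\d+)
    | (?P<SYM>[A-Za-z_.]\w*)     # opcode or symbol; position decides which
    | (?P<COMMA>,)
    | (?P<WS>[ \t]+)
""", re.VERBOSE)


def lex(text: str) -> List[Tok]:
    """Assembler front end: text to token stream."""
    toks: List[Tok] = []
    for line in text.splitlines():
        pos, expect_op = 0, True
        while pos < len(line):
            m = TOKEN_RE.match(line, pos)
            if m is None:
                raise SyntaxError(f"cannot lex {line[pos:]!r}")
            kind, value, pos = m.lastgroup, m.group(), m.end()
            if kind == "WS":
                continue
            if kind == "LABEL":
                toks.append(Tok("LABEL", value[:-1]))
                continue
            if kind == "SYM" and expect_op:
                kind = "OP"  # first word after any label is the opcode
            expect_op = False
            toks.append(Tok(kind, value))
        toks.append(Tok("EOL", ""))
    return toks


def to_text(toks: List[Tok]) -> str:
    """Token stream to text filter."""
    out, line = [], ""
    for t in toks:
        if t.kind == "EOL":
            out.append(line.rstrip() + "\n")
            line = ""
        elif t.kind == "LABEL":
            line += t.value + ":"
        elif t.kind == "OP":
            line += (" " if line else "") + t.value + " "
        else:
            line += t.value
    return "".join(out)
```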

I'm not claiming that any of this is original.  In fact, I'll bet that
some of the compilers out there are implemented in this way.
-- 
Preston L. Bannister
USENET	   :	ucbvax!trwrb!felix!preston
BIX	   :	plb
CompuServe :	71350,3505
GEnie      :	p.bannister

johnl@ima.UUCP (12/20/87)

..> Compilers generating assembly or object code

An example: PCC compilers usually generate assembly, so there are
utilities like "inline" in BSD 4.3 that perform inline expansion
of assembly language functions.  They give C code access, through
what looks like an ordinary function call, to all the special
instructions of your machine, without paying the overhead of an
assembly language wrapper.

E.g., I just finished inlining all the special instructions on my machine,
notably double precision arithmetic, access to special registers, and a
few special arithmetic operations like Shift and Count Zeroes (FFS).
All of these instructions fell naturally into our calling sequence,
so they could be directly inlined to look like a function.
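
A rough Python sketch of the inline-expansion idea as a post-pass over assembler text. Everything specific here is an assumption: the `call _name` syntax, the mnemonics in the table, and the premise (which holds on the machine described above) that arguments and results already sit where the calling sequence puts them, so each call can be swapped for the instruction body verbatim. This is not a reconstruction of the actual 4.3BSD tool.

```python
import re
from typing import Dict, List

# Hypothetical inline table: function symbol -> instructions that replace the
# call. Mnemonics are illustrative, not any real instruction set.
INLINE_TABLE: Dict[str, List[str]] = {
    "_ffs": ["ffs r0,r0"],      # find first set bit, result in r0
    "_getsr": ["movsr r0"],     # read a special register into r0
}

CALL_RE = re.compile(r"^(\s*)call\s+(_\w+)\s*$")


def inline_expand(asm: List[str], table: Dict[str, List[str]] = INLINE_TABLE) -> List[str]:
    """Replace calls to known functions with their bodies; pass the rest through."""
    out: List[str] = []
    for line in asm:
        m = CALL_RE.match(line)
        if m and m.group(2) in table:
            indent = m.group(1)
            out.extend(indent + insn for insn in table[m.group(2)])
        else:
            out.append(line)
    return out
```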

Of course, it's nicer when the compiler does inlining, but I haven't seen
cross-language inlining yet. Providing assembly language intermediate
code permits the creation of special purpose optimizations like BSD inline,
without having to learn the internal structure of the compiler.

And now I'm going to look at writing an improved assembly-to-assembly
pipeline optimizer and code reorganizer. But I don't want to
learn the innards of our compiler to do this.
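
As an illustration of how small such a pass can start, here is a hedged Python sketch of one classic peephole rule applied to assembler text: a load that immediately follows a store of the same register to the same location is redundant. The `st`/`l` syntax is a hypothetical 370-flavored notation and the function name is invented. The rule assumes the location is ordinary memory (not a device register), and a label between the two lines, which could make the load a branch target, blocks the rewrite.

```python
import re
from typing import List

STORE = re.compile(r"^\s*st\s+(r\d+),(\w+)\s*$")
LOAD = re.compile(r"^\s*l\s+(r\d+),(\w+)\s*$")


def drop_reload_after_store(asm: List[str]) -> List[str]:
    """Delete "l rN,x" when the previous line is "st rN,x" (same rN and x)."""
    out: List[str] = []
    for line in asm:
        load = LOAD.match(line)
        store = STORE.match(out[-1]) if out else None
        if load and store and load.groups() == store.groups():
            continue  # register already holds the value just stored
        out.append(line)
    return out
```

Because both input and output are assembler text, a pass like this slots in between compiler and assembler without touching either.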

Andy "Krazy" Glew. Gould CSD-Urbana.    1101 E. University, Urbana, IL 61801   
aglew@mycroft.gould.com    ihnp4!uiucdcs!ccvaxa!aglew    aglew@gswd-vms.arpa