[comp.compilers] Who should convert literals to integers?

Tom.Lane@ZOG.CS.CMU.EDU (08/22/88)

> Does anyone else think that converting a series of digits into an integer
> is inappropriate for a lexical analyser? It seems to be a very common
> thing to do, but I can see practically no advantages to it, and several
> disadvantages.

The main reason for converting constants to binary is so the compiler
can do arithmetic on them.  Somebody already mentioned constant folding,
but nobody has yet pointed out the most crucial case where the compiler
must do this: where the constants in question are array subscript
bounds.  You *must* do arithmetic at compile time to do storage
allocation!  (Unless you want to use a dope vector and run-time
storage allocation for every array, which is mighty expensive.)
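
As a rough illustration of the array-bounds case, here is a minimal C sketch of the arithmetic a compiler has to do before it can lay out static storage. The names array_decl, array_storage and allocate_static are hypothetical, and the sketch assumes the bounds reach the allocator as digit strings and fit in a host long.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical declaration record: bounds still in the text form
   the lexer saw, plus the element size in bytes. */
struct array_decl {
    const char *lo_text;   /* lower bound as written in the source */
    const char *hi_text;   /* upper bound as written in the source */
    long elem_size;        /* bytes per element */
};

/* Storage needed by the array: (hi - lo + 1) * elem_size.
   The subtraction and multiplication are the compile-time
   arithmetic that cannot be done on unconverted digit strings. */
long array_storage(const struct array_decl *d)
{
    long lo = strtol(d->lo_text, NULL, 10);
    long hi = strtol(d->hi_text, NULL, 10);
    assert(hi >= lo && d->elem_size > 0);
    return (hi - lo + 1) * d->elem_size;
}

/* Hand out the next free offset in a static data area and
   advance the free pointer past the new array. */
long allocate_static(long *next_free, const struct array_decl *d)
{
    long offset = *next_free;
    *next_free += array_storage(d);
    return offset;
}
```

For a declaration like "array [1..100]" of 4-byte elements, array_storage yields 400, and the next variable is placed at offset 400. Without converting "1" and "100" to binary somewhere in the compiler, neither number can be computed.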

I once worked on a cross-compiler that ran on a 16-bit-integer machine
but produced code for a 32-bit-integer machine.  Integer constants
smaller than 32k were converted to binary, but we left larger ones in
text form until the assembly pass.  Users weren't allowed to declare
arrays of more than 32k elements...  [That compiler also left floating
point constants in text form, mainly for accuracy reasons: the
machines' floating point formats differed.]

-- 
				tom lane
Internet: tgl@zog.cs.cmu.edu
UUCP: <your favorite internet/arpanet gateway>!zog.cs.cmu.edu!tgl
BITNET: tgl%zog.cs.cmu.edu@cmuccvma
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

markhall@pyramid.pyramid.com (Mark Hall) (08/30/88)

In article <2299@ima.ima.isc.com> Tom.Lane@ZOG.CS.CMU.EDU writes:
>
>The main reason for converting constants to binary is so the compiler
>can do arithmetic on them.  Somebody already mentioned constant folding,
>but nobody has yet pointed out the most crucial case where the compiler
>must do this: where the constants in question are array subscript
>bounds.  You *must* do arithmetic at compile time to do storage
>allocation!  (Unless you want to use a dope vector and run-time
>storage allocation for every array, which is mighty expensive.)
>
>[stuff deleted] Users weren't allowed to declare
>arrays of more than 32k elements...  [stuff deleted]
>

If anyone out there is writing a compiler from scratch, please
don't do what is described above.  Whenever compile-time
arithmetic is required, calls should be made to a retargetable
module which can carry out the computation in the same precision
and semantics of the target machine.  If the host has more
precision than the target, then the representation possibly
`must' be strings, but it's far from impossible to do arithmetic
on strings!  Other possibilities exist.  One might be able to
successfully represent target `int's using host double-precision
floating point (this worked for one host-target pair that I
wrote a compiler for).  On pg. 97 of:

	%A William Waite
	%A Gerhard Goos
	%T Compiler Construction
	%I Springer-Verlag
	%C New York, NY
	%D 1984

there is a more elaborate description of how (and why) this can be done.
You might not think your compiler will get targeted to another product
line, but just when you least suspect it, the CEO will drop in and insist
that it be done in 1 month!
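
A minimal sketch of what such a retargetable module might look like in C, for the easy case where the host is wider than the target. Everything here is an assumption made for illustration: the names target, tgt_wrap, tgt_add and tgt_mul are hypothetical, the target is assumed to use two's-complement integers of at most 32 bits that wrap silently on overflow, and the host is assumed to have a 64-bit integer type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target description: width of the target's int. */
struct target {
    unsigned int_bits;   /* 1..32 in this sketch */
};

/* Reduce a host value to the target's two's-complement range. */
int64_t tgt_wrap(const struct target *t, int64_t v)
{
    uint64_t mask, sign, u;

    assert(t->int_bits >= 1 && t->int_bits <= 32);
    mask = ((uint64_t)1 << t->int_bits) - 1;
    sign = (uint64_t)1 << (t->int_bits - 1);
    u = (uint64_t)v & mask;                  /* keep the low int_bits bits */
    /* Sign-extend: flipping the sign bit and subtracting it maps
       [2^(n-1), 2^n) onto the negative half of the range. */
    return (int64_t)(u ^ sign) - (int64_t)sign;
}

/* Fold a + b and a * b as the target would compute them.
   Operands are assumed to be in the target's range already,
   so the host-side results cannot overflow 64 bits. */
int64_t tgt_add(const struct target *t, int64_t a, int64_t b)
{
    return tgt_wrap(t, a + b);
}

int64_t tgt_mul(const struct target *t, int64_t a, int64_t b)
{
    return tgt_wrap(t, a * b);
}
```

With int_bits set to 16, folding 32767 + 1 gives -32768, as the target would compute at run time, rather than the 32768 that plain host arithmetic would produce. A target that traps on overflow, or a host narrower than the target, would need a different body behind the same interface.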
[From markhall@pyramid.pyramid.com (Mark Hall)]

tgl@zog.cs.cmu.edu (Tom Lane) (09/05/88)

In article <2299@ima.ima.isc.com> I wrote:
>I once worked on a cross-compiler that ran on a 16-bit-integer machine
>but produced code for a 32-bit-integer machine.  Integer constants
>smaller than 32k were converted to binary, but we left larger ones in
>text form until the assembly pass.  Users weren't allowed to declare
>arrays of more than 32k elements...

In article <2370@ima.ima.isc.com>, markhall@pyramid.pyramid.com (Mark Hall)
replied:
>If anyone out there is writing a compiler from scratch, please
>don't do what is described above.  Whenever compile-time
>arithmetic is required, calls should be made to a retargetable
>module which can carry out the computation in the same precision
>and semantics of the target machine.  If the host has more   [less? TL]
>precision than the target, then the representation possibly
>`must' be strings, but it's far from impossible to do arithmetic
>on strings!

I agree *in principle*.  In practice there are some other considerations.
The compiler I described was a bootstrap system, which we fully intended to
scrap once we had a stable development platform on the target hardware.
In that context, building a multiple-precision integer arithmetic package
just wasn't worth the effort; the compiler's (not very severe) restrictions
could be lived with.

For a production cross-compiler, it would make sense to do things as
Hall suggests.  Note that the implications are extensive: for example, the
offsets to local variables in a procedure's stack frame would need to be
target-integers.  Thus doing it right impacts the compiler's symbol table,
as well as virtually all aspects of code generation.  In currently popular
systems programming languages, the notational inconveniences alone would be
a tremendous problem ("add(x,convert_int(1))" instead of "x+1").
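
To make the notational cost concrete, here is a small C sketch of a 32-bit target-integer held as two 16-bit halves, roughly what a 16-bit host would need. The type tint and the functions tint_from_small and tint_add are hypothetical stand-ins for the "convert_int" and "add" mentioned above, and only addition is shown.

```c
#include <stdint.h>

/* Hypothetical 32-bit target integer, stored as two 16-bit halves
   so that a 16-bit host can manipulate it. */
typedef struct {
    uint16_t hi;   /* upper 16 bits */
    uint16_t lo;   /* lower 16 bits */
} tint;

/* Widen a small host integer to a target integer (sign-extending). */
tint tint_from_small(int16_t v)
{
    tint r;
    r.lo = (uint16_t)v;
    r.hi = (v < 0) ? 0xFFFFu : 0u;
    return r;
}

/* Two's-complement addition with carry from the low half. */
tint tint_add(tint a, tint b)
{
    tint r;
    uint16_t carry;

    r.lo = (uint16_t)(a.lo + b.lo);        /* wraps modulo 2^16 */
    carry = (r.lo < a.lo) ? 1u : 0u;       /* wrapped, so carry out */
    r.hi = (uint16_t)(a.hi + b.hi + carry);
    return r;
}
```

Every place the compiler would have written "offset + 1" now reads tint_add(offset, tint_from_small(1)), and the same goes for comparison, multiplication, and printing, each of which needs its own routine.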

Dealing with floating-point arithmetic is much harder.  For instance,
I recall reading horror stories about early Fortran systems in which
compile-time conversion of a floating point constant could give a
different result than run-time input of the same character string.
A cross-compiler that does constant-expression folding is going to have
a very hard time ensuring that it gets exactly the same result as run-time
evaluation would.  (This may get easier in future, as more machines
are built to the IEEE floating-point standards.)  The only good aspect
of the situation is that cross-compilers are usually used for
development of systems software, in which optimization of floating point
arithmetic isn't much needed.  Therefore, the problem can be bypassed
by passing F.P. constants through in text form, and not attempting to
precompute any constant F.P. expressions... which is exactly what we did,
as did some other recent posters.  (Then the only problem is correctly
converting F.P. constants to bit strings in the cross-assembler.)

The article that started this discussion proposed the pass-through,
"hands-off" approach for *all* constants, integer as well as floating point.
The point I tried to make is that the semantics of programming languages
often require the compiler to do calculations with integer constants; so the
fully hands-off approach is not workable.  (Compile-time floating-point
operations are never required in Pascal or C; I'm not too sure about Ada.)
Hall's point is that having to do calculations does not mean having to
assume that host-integers are the same as target-integers.  This is a valid
point, and is probably the right attitude to take in a production quality
cross-compiler; but the cost is not trivial.


wendyt@pyrps5.pyramid.com (Wendy Thrash) (09/09/88)

In article <2492@ima.ima.isc.com> tgl@zog.cs.cmu.edu (Tom Lane) writes:
>A cross-compiler that does constant-expression folding is going to have
>a very hard time ensuring that it gets exactly the same result as run-time
>evaluation would.  (This may get easier in future, as more machines
>are built to the IEEE floating-point standards.) . . .
>[T]he problem can be bypassed by passing F.P. constants through in text form,
>and not attempting to precompute any constant F.P. expressions . . .  (Then
>the only problem is correctly converting F.P. constants to bit strings in the
>cross-assembler.)

Actually, IEEE 754 raises new questions about compile-time floating point
while it's answering some old ones.  For example, since rounding mode can
affect the value of a conversion, and rounding mode can be set at runtime
(though not easily, in most languages) one could argue that conversions from
character strings (e.g. 1.0) into f.p. numbers (e.g. 0x3f800000) should be
done at runtime, not at compile or assembly time.

As for performing f.p. arithmetic on constants at compile time, I have mixed
feelings.  It's true that constant folding could clear up garbage left over
from the use of #define, but it certainly defeats any attempts I may be making
to control rounding mode.  Moreover, I'm concerned about the application of
arithmetic "identities" at compile time: if I write  y = (x - 1.0) + 1.0;
there's a very good reason for it, and I don't want the compiler to mess it up,
no matter what it is allowed to do by the language definition.  Please, at
least honor my parentheses in floating-point computations.  If you're ignoring
parentheses in the course of optimization, give me a way to stop you from
doing that, without disabling optimization completely.
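
A small C sketch of why (x - 1.0) + 1.0 is not the same thing as x in floating point. The function name round_trip is hypothetical, and the sketch assumes IEEE 754 double precision; the volatile temporary is only there to discourage the compiler from simplifying the expression.

```c
/* Evaluates (x - 1.0) + 1.0 in the order written.
   The intermediate x - 1.0 is rounded to a double, which throws
   away any bits of x too small to survive next to 1.0. */
double round_trip(double x)
{
    volatile double t = x - 1.0;   /* rounded intermediate */
    return t + 1.0;
}
```

For x = 0.5 the result is 0.5, but for x = 1e-20 the subtraction rounds to exactly -1.0 and the result is 0.0. A compiler that rewrites the expression to plain x changes the answer.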

Remember that f.p. numbers are not quite real numbers.  For example,
	double x;
	if (x != x) do_something();
can result in a call to do_something() if x is a NaN (IEEE 754 not-a-number).
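
A runnable version of the same point, as a hedged sketch: the helper names make_nan and differs_from_itself are hypothetical, and the code assumes IEEE 754 arithmetic with no "fast math" style options enabled.

```c
/* Produce a NaN at run time. The volatile keeps the division
   from being evaluated (or rejected) at compile time. */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;        /* 0/0 is NaN under IEEE 754 */
}

/* Nonzero exactly when x is a NaN, since a NaN compares
   unequal to everything, itself included. */
int differs_from_itself(double x)
{
    return x != x;
}
```

An optimizer that treats x != x as always false, which would be a valid identity for integers, silently removes the only portable NaN test available without library support.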

Floating-point code is strange stuff.  Many battles are yet to be fought
between compiler writers and f.p. users.  (See the recent discussion begun
by David Hough's comments on ANSI C in comp.lang.c.)  Please take care not
to optimize f.p. codes into meaninglessness.
[From wendyt@pyrps5.pyramid.com (Wendy Thrash)]