[comp.compilers] Why can't we build a C compiler?

guthery@uunet.uu.net (Scott Guthery) (12/19/88)

"C is not a `very high level' language, nor a `big' one ...".   Its syntax 
and semantics are well-defined.  A C compiler is probably a relatively small 
and not a particularly complicated computer program.  We have constructed 
a number of very powerful tools to help us build C compilers.  We have
an extensive theoretical and practical compiler literature going back at 
least 30 years.  We have tens of thousands of man-years of experience in trying
to build C compilers.  And yet I claim there does not exist a compiler for
the C programming language.  There are a lot of programs that are close but 
close shouldn't count.  Every program I know of which calls itself a C 
compiler is at current version level greater than 1.00 and will correct a
couple more bugs in the next release.

Why is this?  What does it say about us and our craft?  Is it even possible 
to build a C compiler?  If so, when do you think there will be one?   How
will we recognize it if it does happen to come into being? 

Finally, if after hundreds of attempts we can't build a little 10,000-line
utility for ourselves, why in the world do we think we can build all the
programs we work on every day?  We are certainly kidding the folks that 
pay us and we're also doing a pretty good job of kidding ourselves.

+*+*+*+*+*+*+*+*+*+*+*+*+ Austin Code Works +*+*+*+*+*+*+*+*+*+*+*+*+*+*+**+*+
Domain:  acw!guthery@uunet.uu.net         Snail: 11100 Leafwood Lane          
Office:  guthery%asc@sdr.slb.com          FAX:   +1 (512) 258-1342
FidoNet: 1:382/12                         Voice: +1 (512) 258-0785            
Packet:  N5MDE @ KB5PM                    TELEX: 446370 (austincodewrks)
[This point is well taken.  I would think that with all of the theory
available to compiler writers, compilers would be prime targets for software
engineering techniques such as formal verification, of parts at least.  Beats
me why not.  -John]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

worley@EDDIE.MIT.EDU (Dale Worley) (12/20/88)

I think that part of the problem is that C is not all that well
defined.  There are numerous tricky spots in the language that (until
the advent of ANSI C) were not standardized.  Consider the different
tricks that were used to concatenate and stringize tokens.  These were
so horrible that the ANSI committee eliminated them entirely and
invented the # and ## operators out of whole cloth.

Another cute one is the following:

	typedef int x;		/* 1: "x" as a typedef name */
	struct x {		/* 2: "x" as a structure tag */
		x x;		/* 3: the type "x"; 4: a member named "x" */
		int counter;
	};

Is this legal?  According to Harbison and Steele, "structure and union
field names are in a different overloading class than objects and
typedef names".  As I read this, it means that the fourth occurrence
of "x" above is legitimate.  Am I right?  Who knows?

Again, the requirement that declarations govern not the entire
containing block, but only the portion of the block following the
declaration leads to tricky points.  In Ada this convention led to
many paragraphs defining exactly where the declared object became
accessible.  In C the question was simply ignored.

Etc. etc.  Compounding this is that the de-facto standard, Kernighan
and Ritchie, is not written as a genuine language reference manual,
but as a tutorial.  In practice, C has been defined by "what the compiler
does", which leads to numerous ambiguities and inconsistencies.

Compare the definition of C to that of Algol 68 -- In Algol 68 the
compilation task is relentlessly well-defined, although it's not
always clear that it's *possible*.  But a compiler for a language of
C's complexity defined with the formality of Algol 68 would be a
class project, not an engineering feat.

     Finally, if after hundreds of attempts we can't build a little
     10,000-line utility for ourselves, why in the world do we think we
     can build all the programs we work on every day?

     We are
     certainly kidding the folks that pay us and we're also doing a
     pretty good job of kidding ourselves.

Actually, the poor customer is doing OK -- He has to get the work out
the door, and a compiler that is 99.5% correct is far more useful to
him than none at all.

But this leads to a concept: Not only should we design a language to
be easily comprehensible to the user, but also easily comprehensible
to the compiler.  (These goals should be synergistic, since things
that are hard to parse are likely to be hard to read as well.)  Some
guidelines are:

Similar-looking tokens should not have different grammatical uses
depending on how they are declared.  Examples are C identifiers
(typedef names and objects) and Algol 68 bold-words (mode names and
operators).

A declaration should be effective throughout the entire block in which
it appears, rather than starting at the point of declaration.  This
makes it impossible to write a one-pass compiler, but it simplifies
the definition of the language semantics, and makes it *far* easier to
formalize the semantics.

Avoid features that can only be defined at the lower levels of
abstraction (e.g., tokenization, parsing).  For instance, the C
preprocessor is *impossible* to define except as a pre-pass before
parsing.  This makes it hard to build, e.g., an incremental compiler
for C.  (Unfortunately, the preprocessor is really great for making
code portable.  There is something that should be studied here...)

Dale
--
Not, of course, the opinions of my employer.
Dale Worley, Compass, Inc.                      mit-eddie!think!compass!worley

worley@EDDIE.MIT.EDU (Dale Worley) (12/20/88)

Another rule for decent language design:

If the grammar that you want to present to the user in the reference
manual isn't LALR(1), the language design should be considered
deficient.

Following rules like this in language design should get rid of much of
the work of compiler writing, since they eliminate features of the
language that make compiler writing hard, but don't add anything
useful for the user.

Dale
--
Not, of course, the opinions of my employer.
Dale Worley, Compass, Inc.                      mit-eddie!think!compass!worley

pardo@june.cs.washington.edu (David Keppel) (12/20/88)

acw!guthery@uunet.uu.net (Scott Guthery) writes:
>[Why can't we build a C compiler?]

I certainly have to agree with the text of Scott's article.  A couple
of things to think about:

* One standard implementation of P() and V() (semaphore operations)
  was proposed, I believe, in the early 60's.  It is 20 lines of code
  in nearly any language (even assembly!), yet it took 15+ years and
  dozens (hundreds?) of implementations before we all found out that
  there was a little itsy-bitsy bug in it.  One line was in the wrong
  procedure and every once in a while you'd lose a race.

* Proofs of correctness are very useful, particularly in things such
  as P() and V(), but they are hard to do straight and hard to
  automate.  My advisor of days gone by told me ``proofs of
  correctness, done by humans, anyway, are subject to bugs just like
  the programs that they are supposed to prove.''  Good point.

* The implementor is often not sure of the specification of some
  detail of the protocol[*] they are trying to implement.  One look
  at comp.std.c should be enough to convince you that there are lots
  of ``gray areas'' in the ``standard'' C.  It is essentially
  impossible to fully specify all but the simplest protocols (even if
  just to remember to say ``behavior is implementation-defined''),
  and, given current systems, a too-detailed specification can
  sometimes lead to performance problems (e.g., using IEEE
  floating-point math vs. Cray floating-point math).

Footnote [*]: essentially all programs can be thought of as protocols.

Realize, also, that a compiler is a bit of an unusual example.  The
code generator for a given computer is ``useless'' == untestable for
any other computer, and many parts of the compiler depend on the
correctness of the code generator.  In one sense this makes Scott's
points even more relevant: it is even more important to ``get it right
the first time''.  On the other hand, the points that Scott made are
all true of machine-independent programs as well.  Well-known examples
are, perhaps, harder to find, but more people might agree about the
nature of some of the bugs.

	;-D on  ( Followup discussions of SDI to ??? )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

nick@lfcs.ed.ac.uk (Nick Rothwell) (12/20/88)

>"C is not a `very high level' language, nor a `big' one ...".   Its syntax 
>and semantics are well-defined.

I don't think this is the case, although I'm probably out of touch
with C standards. I thought there were problems with aliasing and so
on which weren't properly resolved. And I won't even mention the
hornet's nest that's opened when you start to think of all the
interactions with the macro preprocessor.

John's comment:
>[This point is well taken.  I would think that with all of the theory
>available to compiler writers, compilers would be prime targets for software
>engineering techniques such as formal verification, of parts at least.  Beats
>me why not.  -John]

This is the approach we've taken with the ML project. Standard ML is,
to my knowledge, the only language to have a complete formal semantics
(which is *not* huge - 97 pages). Does Ada have one? How big is it?
The semantics serves as a reference for building compilers - there
are, I believe, 3 compilers respecting the semantics.  I'm not going
to claim that they're all bug-free or absolutely conformant yet - ML
is a language very different from conventional languages like C or
PASCAL, and the implementation techniques are still very new - but the
task of compiler development is possible with small teams of compiler
writers (by small, I mean 1 or 2 persons).

		Nick.
--
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcvax!ukc!lfcs!nick

peterd@june.cs.washington.edu (Peter C. Damron) (12/22/88)

I think that Scott has some good points about the state of compiler
technology. Reliability and correctness have generally taken a back seat to
performance, efficiency, and expediency. However, I think that several issues
are mis-stated or mis-understood.

In article <3070@ima.ima.isc.com>, acw!guthery@uunet.uu.net (Scott Guthery) writes:
> "C is not a `very high level' language, nor a `big' one ...".

It is big enough to cause problems.

> Its syntax and semantics are well-defined.

While I would agree that the syntax is well defined, I cannot say the same
about the semantics.  Note the difficulty in standardizing C.  There is
no formal semantic specification for the language.

Likewise, there is no formal semantic specification of the target machines.
This lack of formal semantics means that we cannot verify correctness of the
compiler even if we had the tools to do it. Of course, we cannot prove any but
the smallest of programs correct anyway.

> A C compiler is probably a relatively small and not a particularly
> complicated computer program.

I would argue that compilers are one of the most complex classes of computer
programs in regular use. Operating systems are the only class of programs I
can think of that are probably more complex.  [How about data bases?  -John]

> We have constructed a number of very powerful tools to help us
> build C compilers.

The purpose of tools is to increase our assurance that the implementation
is correct.  Since no single tool builds the complete compiler,
the interactions between different tools can still cause problems.

We are working on better tools for compilers, but it takes time (and money).
Thirty or so years is not very long in the development of a technology.

> We have an extensive theoretical and practical compiler literature
> going back at least 30 years.

> Is it even possible to build a C compiler?
> How will we recognize it if it does happen to come into being? 

These are good questions. Although we may achieve the first, we may never be
able to prove it. We don't yet have the techniques to do the second, and we
may never.

Writing compilers, as with all software, is an engineering effort.
We try to build something that works within constraints on time and performance.

Testing is the closest we can come to assuring ourselves of the correctness
of the compiler implementation. But testing is not well formalized either.
Complete testing might be too expensive and time consuming anyway.

I think that we have made enough progress in formal semantics that we can
specify the semantics of the language or the computer, but the specification
is as large and complex as a program, and is thus subject to error.

We do not have tools (or theory?) to compare a sentence in one semantic
domain to a sentence in another semantic domain.  Thus we cannot compare
a translation with the original to verify that we have preserved
the original semantics.

Peter C. Damron
Dept. of Computer Science, FR-35
University of Washington
Seattle, WA  98195

peterd@cs.washington.edu
{ucbvax, decvax, ...}!uw-beaver!uw-june!peterd

henry@zoo.toronto.edu (Henry Spencer) (12/22/88)

>  ... My advisor of days gone by told me ``proofs of
>  correctness, done by humans, anyway, are subject to bugs just like
>  the programs that they are supposed to prove.''  Good point.

Note, in particular, an old but still relevant paper:  "Observations
of Fallibility in Applications of Modern Programming Methodologies",
Gerhart and Yelowitz, IEEE Transaction on Software Engineering, Sept 76.
This study found errors in published programs by folks like Wirth, Naur,
London, and Wulf -- some in programs used as examples of how to prove
correctness!  Considering the unquestionable qualifications of their
authors, the multiple reviews involved in publishing in a textbook or
refereed journal, and the relatively small and tractable problems
attacked, this doesn't exactly fill one with confidence about rigorous
verification of a compiler.  At least, not without extensive software
assistance.

                                     Henry Spencer at U of Toronto Zoology
                                 uunet!attcan!utzoo!henry henry@zoo.toronto.edu

seanf@sco.uucp (Sean Fagan) (12/24/88)

In article <3070@ima.ima.isc.com> acw!guthery@uunet.uu.net (Scott Guthery) writes:
>And yet I claim there does not exist a compiler for
>the C programming language.  There are a lot of programs that are close but 
>close shouldn't count.  Every program I know of which calls itself a C 
>compiler is at current version level greater than 1.00 and will correct a
>couple more bugs in the next release.

Because people like to put in little features, extend it in various ways,
get it to do more and better things.  Code generation is still an art, not
a science, and one man's code generation algorithm might be another man's
nightmare.

pcc does a *real* good job, if you don't allow the optimizer to get in.  It
also is one of the least-featured compilers around.  Small C generates
more-or-less bug-free code for the subset it works with.  Microsoft C, when
it actually generates code instead of hitting an assert(*), usually
generates good code.
------------
* I work on Microsoft C, and thus see more of the bugs than most other
people do.  Yes, there are some constructs *guaranteed* to get an assert or
core-dump in the current version, and yes, we do try to get rid of them.  It
is still a decent compiler; it generates good code for the machine it's
targeting, and I like it.  End of comment 8-).
-------------
A non-featured C compiler doesn't have much of a chance any more.  It either
has to be faster, or generate better code, or have its own little
enhancements, or the writers feel it won't sell.  Each of these introduces a
larger possibility of error.

Now, flame time:
  You think it's easy?  Fine, you write a nice, dpANSI-conformant C compiler,
and make sure it has *no* bugs.  Then, you try to sell it for the amount
you'll need to recover the costs you spent developing it.  It ain't easy,
buddy.  Nor is it fun.
End flame.
-- 
Sean Eric Fagan  | "Merry Christmas, drive carefully and have some great sex."
seanf@sco.UUCP   |     -- Art Hoppe
(408) 458-1422   | Any opinions expressed are my own, not my employers'.
[I suppose that pcc does a good job at generating correct code, but it
does such a bad job of generating good code that I gather that fewer and
fewer Unices still use it, except perhaps in heavily modified form.  And
nobody said it was easy, the original question was (more or less) why is
it so hard?  But I don't understand why Microsoft still refuses to give
out their compiler bug list.  It's extremely annoying to call them to report
a bug only to be told that they already know about it but there's no way I
could have known that.  Thanks a lot.  -John]

daveb@lethe.uucp (David Collier-Brown) (12/27/88)

> I think that Scott has some good points about the state of compiler
> technology. Reliability and correctness have generally taken a back seat to
> performance, efficiency, and expediency.

  I wonder if we're also still suffering from what the CISC people
initially proposed was doing us in: a large semantic gap.

  I admit to knowing (a subset of) C relatively well, and a few
machine architectures.  By rights, I should be able to write a code
generator for the language, mapping it into the machine
instruction-sequences which "best" describe it.  Yet I know from
experience that I cannot predict with any degree of correctness a
best translation for various idioms in all contexts, and in my
student days had some fun even getting all the cases I **did** do
correct...

  There are at least three reasons for this:
	1) I'm dumber than I think I am (trivially true),
	2) The case analysis is larger than expected, or
	3) I just made more mistakes than I should have.
  Experience has shown that about half of the problem was #3.  But
what about the other 50%?

  I propose that the case analysis is difficult because it is large,
and that one of the reasons it is large is that there are really a
large number of describable levels of detail between a description
of the machine (formal or informal) and the description of the
language I want to map to the machine.
  This is, as I said, not a new claim.  It was one of the ones that
led to companies like DEC and Honeywell defining CISC (and vCISC)
machines.  I'm not necessarily sure they narrowed the gap, mind you.

  Taking a leaf from the book of the specification-writers, I wonder
if a series of translations (not just the few intermediate forms we
currently use) would make the problem more tractable.  If you use
levels of the Larch Shared Language to define an implementation of a
task, I wonder if you cannot do exactly the same thing with the
levels of a translation.

--dave (yes, I recollect leveled programming-language families,
        but I also recollect they were slightly primitive, too) c-b
-- 
 David Collier-Brown.  | yunexus!lethe!dave
 Interleaf Canada Inc. |
 1550 Enterprise Rd.   | He's so smart he's dumb.
 Mississauga, Ontario  |       --Joyce C-B

olender@rachmaninov.CS.ColoState.EDU (Kurt Olender) (12/29/88)

David Collier-Brown makes a good point.  One of the main factors that makes
producing a good+correct code generator so difficult is the huge case analysis
required to map a given programming language construct to the best equivalent
code on most machine architectures.

Wulf makes a good case for reducing this case analysis by proper
considerations when designing the architecture of the machine in the first
place[1].  This does not imply a CISC philosophy.  He argues that it is not
the semantic gap but the irregularities and non-orthogonalities in the
architecture that dramatically inflate the number of cases with which a code
generator must deal.  In fact, he claims that hardware designers' attempts to
reduce the semantic gap have increased the case analysis required.  The higher
level operations provided do not always map directly to every language or are
too general to always be efficient, so that case analysis is required to
determine when the higher level operations are usable or efficient and,
when they are not, what else to do.

Half the blame, anyway, goes to the hardware designers. (The other half
certainly belongs with language designers and ad hoc semantics.)

[1] William A. Wulf, "Compilers and Computer Architecture", Computer,
July 1981, pp. 41-48.
--------------------------------------------------------
|Kurt Olender               | Computer Science Dept.   |
|olender@cs.colostate.edu   | Colorado State Univ.     |
|303-491-7015               | Fort Collins, CO 80523   |

frode@m2cs.naggum.se (Frode Odegard) (12/29/88)

In article <3099@ima.ima.isc.com>, nick@lfcs.ed.ac.uk (Nick Rothwell) writes:
> This is the approach we've taken with the ML project. Standard ML is,
> to my knowledge, the only language to have a complete formal semantics ...
> (which is *not* huge - 97 pages). Does Ada have one? How big is it?

In the upcoming ISO standard for Modula-2, VDM is used to define the semantics
[as well as the syntax]. This will increase the safety of compilers. In
England they are working on a system which will produce and prove a compiler
for a language specified in VDM [I think this is in connection with the
Alvey-project]. I expect that there is and will be serious work done on
proving Modula-2 programs, taking advantage of the fact that the language is
defined in VDM.

BTW, I think a VDM standard is on its way :-)

			- Frode Odegard,  M2CS
PS: I think the worst "hack" in the VDM definition of Modula-2
    is how continuity is handled in connection with coroutines.
    I heard that the VDM ";" combinator is being overloaded ;->
    I'm not sure this is really a "hack"....my VDM stinks.
[From Frode Odegard <frode@m2cs.naggum.se>]

jbs@fenchurch.mit.edu (Jeff Siegal) (01/04/89)

In article <3099@ima.ima.isc.com> Nick Rothwell <nick@lfcs.ed.ac.uk> writes:
>Standard ML is,
>to my knowledge, the only language to have a complete formal semantics
>(which is *not* huge - 97 pages). Does Ada have one? How big is it?

Scheme has a formal semantics which is about 3 or 4 pages long.

It probably doesn't serve much purpose to list other languages which
are formally defined (I'm sure there are some), but I did want to
point out that they exist.

Jeff Siegal

jc@uunet.uu.net (Juergen Christoffel) (01/05/89)

  [...]
> [as well as the syntax]. This will increase the safety of compilers. In
> England they are working on a system which will produce and prove a compiler
> for a language specified in VDM [I think this is in connection with the
  [...]

Proving a compiler? OK, generating them by machine and checking them
out by machine will make them better or more reliable than hand-written
ones once the generator/checker is working properly. But with
all those proofs done by machine: they will be valid only if the
program doing the proofs is correct as well?! How do you prove
that? Where does this 'bootstrapping' bottom out? Back
at set theory and the formal definition of the integers?

========================================================================
  Juergen Christoffel, jc@gmdzi.UUCP, (++49 2241) 14-2421
  German National Research Laboratory for Computer Science (GMD)
  Schloss Birlinghoven, Postfach 1240, D-5205 St. Augustin 1, FRG
========================================================================
[Having tried to read the PL/I standard, I have to view with scepticism that
any formal specification of a non-trivial language could be credibly bug free,
at least using the kind of formalisms we have today.  -John]

glcowin@Central.Sun.COM (Greg Cowin) (01/19/89)

Dale Worley writes the following:
> I think that part of the problem is that C is not all that well
> defined.  

And later writes the following: 
> Another rule for decent language design:
> If the grammar that you want to present to the user in the reference
> manual isn't LALR(1), the language design should be considered
> deficient.
> Following rules like this in language design should get rid of much of
> the work of compiler writing, since they eliminate features of the
> language that make compiler writing hard, but don't add anything
> useful for the user.

PROBLEM
-------
Overlooking the real problem of defining a programming language 
seems to be very common.  The problem is providing a formal
semantic definition.  This problem is not easily solved, although
it can be done.  

GRAMMARS
--------
I do not think that limiting ourselves to LALR(1) grammars 
is going to solve the problem.  Also, why limit ourselves 
only to LALR(1) grammars?  In fact, to some degree the grammar
is insignificant.  Formalization of syntax is a simpler problem
than the formalization of semantics.

C 
-
Is C a (somewhat) portable language because of a formalization of its
semantics?  If it were, we would not be having these quirky
problems.

Could it be portable because the UN*X operating system has been ported
to many different architectures?  Yes -- isn't the UN*X operating
system a big test suite?  Even though it is big, it is still 
inadequate.  Note: I believe the preprocessor has also helped 
portability; therefore, the portability is somewhat inherent.


SOLUTION
--------  
The solution is not an easy one.  Presently, we usually describe
semantics informally, and formal specifications of semantics can be
complex and arcane.  Although I do not have a solution, it is 
an area that we should give more attention.  I certainly plan
to give it more attention.  After all, it is interesting and
a major part of the process that we enjoy.

SOMETHING TO THINK ABOUT
------------------------ 
Are C compilers lacking?  Or is it the formal specifications of the
C programming language?  Are not our compilers dependent on the
language specification?

-----------------------------------------------------------------------
Greg Cowin                     occrsh!uokmax!glcowin   or
University of Oklahoma         occrsh!uokmax!gpsun!glc   

limonce@pilot.njin.net (Tom Limoncelli) (01/24/89)

In article <3210@ima.ima.isc.com> uokmax!glcowin@Central.Sun.COM (Greg Cowin) writes:

> The solution is not an easy one.  Presently, we usually describe
> semantics informally and formal specification of semantics can be
> complex and arcane.  Although I do not have a solution, it is 
> an area that we should give more attention.  I certainly plan
> to give it more attention.  After all, it is interesting and
> a major part of the process that we enjoy.

One of the major break-throughs of Algol/Pascal/C-like languages is
that their syntax was completely specified in BNF (Backus-Naur Form)
or a BNF-like syntax notation.  This allowed definitive specifications
and since it was machine readable it was possible to make
parser generators for languages that could be specified in BNF (or
BNF-like) notation.

Specifying *semantics* can be done without English (or a natural
language).  I can't remember the author (Michael Gordon, I believe) but
there's a book called "The Denotational Description of Programming
Languages" which is a good beginners' guide to the notation.  Yes, it is
a bit difficult to learn and takes quite a talent to actually use, but
you can describe all of the C language (and the libraries!) in this
notation.  Don't ask ME to do it, but I'm sure someone could.  Semantics
are quite difficult.

Anyway... the point that I wanted to bring up is that in the next
10 years *someone* is going to come up with a new style of semantic notation.
This will be an easier notation to use, to read, and it will truly be
a break-through if it is machine-readable and can be used to make
automatic code-generators.

Now all we have to do is sit back and wait for someone to invent it!  :-)
-- 
 Tom Limoncelli -- tlimonce@drunivac.Bitnet -- limonce@pilot.njin.net
            Drew University -- Madison, NJ -- 201-408-5389
[One problem that I still haven't seen adequately addressed is the mapping
of abstract semantics onto real hardware.  With grammars, the grammar-
interpreting machine that you build via LR(1) or whatever perfectly implements
the grammar that it interprets.  With semantics, you have ugly problems with
the behavior that the hardware wants to give you, e.g. overflow, alignment,
rounding, and although it is possible to generate code that implements
arbitrary semantics, the cost of doing so if the semantics is very far from
that preferred by the hardware makes most of us compiler writers throw up
our hands and give you what the hardware gives you. -John]
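[A small illustration of that mismatch, mine and not from any
particular compiler: suppose the formal semantics says integers are
unbounded, but the target machine has 32-bit two's-complement
registers.  Implementing the "ideal" semantics faithfully means the
generated code must fight the hardware on every operation; accepting
what the hardware gives you is cheap, but it changes the language's
meaning on overflow.  -John]

```python
def add_ideal(a, b):
    """What an abstract semantics with unbounded integers prescribes."""
    return a + b

def add_hw32(a, b):
    """What a 32-bit two's-complement add instruction actually gives you:
    wrap the sum to 32 bits, then reinterpret the bit pattern as signed."""
    s = (a + b) & 0xFFFFFFFF
    return s - 0x100000000 if s >= 0x80000000 else s

INT_MAX = 2**31 - 1
print(add_ideal(INT_MAX, 1))   # 2147483648
print(add_hw32(INT_MAX, 1))    # -2147483648 (hardware overflow wraps)
```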

rayt@RELAY.CS.NET (R.) (01/26/89)

In article <3210@ima.ima.isc.com> Greg Cowin, in reference to C, 
indicates that semantics is a major problem in ANY language
definition, since the present methods of specifying it are either
informal or "complex and arcane"; one can hardly disagree.
However, he continues, "[i]n fact, to some degree the grammar
is insignificant. Formalization of syntax is a simpler problem
than the formalization of semantics." My comment here is that
one method of defining the semantics is THROUGH a grammar; in
my case, Van Wijngaarden grammars, which give one the full power
of the recursively enumerable languages. A major problem with such
systems is efficiency, but techniques are becoming available which
allow such grammars to be parsed in polynomial time. The theoretical
power is available; the trick, it seems, is to actually FIND such
a grammar definition for ordinary programming languages. This is
an area of present research.
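[A toy illustration of why that extra power matters, mine rather than
Ray's: the language { a^n b^n c^n : n >= 1 } is the classic example of
something no context-free grammar can generate, yet a two-level (Van
Wijngaarden) grammar describes it easily, because a metanotion can
carry the count n across all three runs -- much as a language's static
semantics must carry a declaration's type to every use.  Checking
membership procedurally is trivial:  -John]

```python
def in_anbncn(s):
    """Recognize { a^n b^n c^n : n >= 1 }, a context-sensitive
    (non-context-free) language: equal-length runs of a, b, then c."""
    n = len(s) // 3
    return n >= 1 and len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

print(in_anbncn("aabbcc"))   # True
print(in_anbncn("aabbc"))    # False
```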


Ray Tigg               3755 Riverside Dr.
Cognos Incorporated    Ottawa, Ontario
(613) 738-1440 x5013   CANADA  K1G 3Z4

UUCP: uunet!mitel!sce!cognos!rayt

kurt@tc.fluke.com (Kurt Guntheroth) (01/26/89)

A Dissenting Opinion

I view this whole argument as specious.  Why can't we build a C compiler?
Who shall say that we have not?  The compilers we build for C are as perfect
as (and often more perfect than) any other artifice of man.  Do we build skyscrapers
with no mistakes?  Of course not.  There are thousands of missing rivets,
wiring that is incorrect, panels that don't quite meet.  The result is still
arguably a skyscraper, even though it is not the ideal one.  Do we not find
'bugs' in a new car that must be returned to the dealer for warranty service?
Can we find no examples of great art that nevertheless have weak passages,
awkward angles or colors that seem ill-considered?

Formal semantics come and go.  They are like programming languages.  They
capture parts of the language but ignore other parts.  Semantics themselves
are subject to incorrectness and inappropriateness.  To be understandable
and provable (and provable should appear in quotes since there is a whole
branch of philosophy dedicated to what that word means) they must be simple
and brief.  But then they compromise on the fidelity with which they represent
the real world.  Most semantics capture the complexity of a language compiler
about as completely as a simple parabola captures the motion of a baseball
through the air; very well sometimes, and very poorly other times.

Let us argue about which semantic models are most useful for generating
compilers that work as we expect.  Let us debate the merits of methods for
generating code without mistakes.  Let us not waste our breath whining about
how man can never achieve the perfect parabola of the ideal, but instead talk
about how to make our practice better.

Can we move the subject "Why can't we build a C compiler?" to talk.philosophy
and concentrate on what is important?
Kurt Guntheroth
John Fluke Mfg. Co., Inc.
{uw-beaver,decvax!microsof,ucbvax!lbl-csam,allegra,ssc-vax}!fluke!kurt
kurt@tc.fluke.COM