[comp.lang.misc] Cheap implementations of languages

peter@ficc.uu.net (Peter da Silva) (04/09/90)

In article <14312@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> Non-segmented byte-addressible machine don't exhibit the failing that you
> give as the reason for limiting pointer values to only lie within the
> allocated object!  Apparently, you are willing to reinterpret the context
> of this discussion to fit your preconceptions any time you feel like it.

Nah, I'm not *in* this discussion. I'm just sitting on the side pointing out
things you guys miss. Like, the fact that C can run at all in such a minimal
implementation has proven to be one of it's strengths. How many PD Fortran
compilers are there?

> You imply that many implementations actually
> allocate an extra element at the end for the 'dispensation' on pointer
> values at one past the end.

No, that was someone else. But, yes, many low-end implementations on machines
like the IBM-PC have done so. And the IBM-PC is probably the major C user base
right now... and it's a segmented byte-addressible machine. C doesn't fit so
well into it, but it's easy enough to make it look like a flat address space,
subject to some limitations, so long as you allocate those extra bytes for
breathing room.

> Are you saying that this implementation should be regarded as anything but
> poor?

No. But it's also cheap. And look at what it's got to work with.

> By the way, pascal doesn't _have_ pointer arithmetic.  So this whole issue
> doesn't arise there.

Pascal is also such a limited language that it's heavily fragmanted.
Portability between compilers... let along machines... is practically non-
existent. Because to do much of anything interesting in it you need to
use extensions. You don't write in "Pascal". You write in Turbo-pascal, or
Oregon pascal/2, or UCSD Pascal, or JRT Pascal, or Alice, or whatever...
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

jlg@lambda.UUCP (Jim Giles) (04/10/90)

From article <YAT2FR7ggpc2@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> [...]                 Like, the fact that C can run at all in such a minimal
> implementation has proven to be one of it's strengths. How many PD Fortran
> compilers are there?

I've heard this old saw for years now.  But, strangely, not often from
a compiler writer (and, on such rare occasions, said compiler can't
supply any specifics).  The fact is that C is not simpler to implement
than the other procedural languages in general use.  In some ways it
is harder to implement.  Since you pick Fortran as the point of comparison,
I will look at both.

I)  The Compiler Frontend

  a)  Lexical scanning and parsing.

      Fortran requires extra work in the lexical scanner to distinguish
      keywords in some contexts.  This is because blanks are not significant
      and are optional.  The Fortran lexer must be able to tell Do10I=5.10
      from Do 10 I=5,10.  The method for doing this is simple and widely
      known, but still, you'd have to say that C was easier to scan and
      parse.

      Note that the two languages are equally easy to parse from all
      other points of view.  Both are LR(1) and both have LALR or SLR
      grammers which can be pressed into service (usually by arbitrary
      elimination of shift-reduce conflicts by hand).

  b)  Symbol table manipulation.

      In Fortran, there are only two types of symbol: local and global.
      The local symbols are divided into the usual intrinsic data types,
      and the global symbols are either external modules or are common
      blocks.  The symbol table in Fortran compiling is built from
      scratch for each module compiled and is a simple object.

      C (like Pascal) has a nested scoping mechanism.  Each object
      declares _some_ local symbols and inherits any defined in an
      outer scope that do not conflict with the new local names.
      This requires C compilers to maintain a somewhat more complicated
      symbol table that Fortran requires.  Once again, the method for
      doing this is simple and widely known, but still, you'd have to
      say that Fortran was easier to implement in this regard.

This is pretty much all the difference (in difficulty) to implement
the compiler frontend for the two languages.  The two languages are
identical feature-for-feature to the rest of the compiler.  That is,
features which both languages have are equally hard to implement (and
might be done identically in the case of shared backend compiler
developements).  So, the remaining differences have to do only with
the differences in features.

II) Feature differences.

  a) Pointer call by reference vs. Fortran parameter passing.

     C allows parameter passing by reference through pointers.
     The usual Fortran implementation is to pass all args by
     reference (although all by value/result is also allowed
     by the standard).  In C, references are explicitly pointers
     and are allowed to be aliased.  In Fortran, the references
     are implicit and aliasing is NOT permitted.  This means
     that C procedures contain a lot of spurious dependencies
     in the intermediate code that Fortran intermediate code
     doesn't have.  This makes C much harder to optimize as
     well.

     Now, since this discussion is about simplicity of implementation,
     there may be NO optimization performed and so, no difference in
     implementation difficulty.  But, even a simple compiler may want
     to do SOME optimization - so this gives a slight not toward
     Fortran as simpler.

  b) Structs, Unions, dynamic memory, pointers, recursive procedures ....

     C has em, Fortran doesn't.  Fortran is easier to implement
     on all of these counts.

Slight reflection will suffice to find even more things is category
IIb.  In fact, from an unbiased examination, Fortran - not C- looks
to be the simpler to implement.  Actually, Fortran is among the
simplest to implement of all procedural languages.

This is not to say that Fortran is better off without IIb features
or nested scoping.  In fact Fortran _should_ have structs, unions,
dynamic memory, and recursive procedures (to name a few).  But the
lack of these features _does_ make Fortran easier to implement.

Note: you actually picked the worst example for your argument.  If
you had picked Pascal or Modula as your comparison, I would have
had to conclude that the implementations were about the same in
term of difficulty.  But the fact is, I've just been through this
same argument over the net with someone who claimed Fortran 90 was
too big a language to ever be considered.  I pointed out that the
new standard will have a _subset_ of the features in C.  They are
somewhat better designed and more consistently expressed - but there
are no new Fortran features which C doesn't already have a version
of.  (In fact, Fortran 90 makes better use of many features:  for
example, both C and Fortran 90 have function prototypes and the
attendant implementation difficulties.  But the Fortran 90 standard
makes use of these to provide generic functions which C doesn't have.
Both language have the same implementation overhead but Fortran gets
more out of it.)

J. Giles

diamond@tkou02.enet.dec.com (diamond@tkovoa) (04/10/90)

In article <YAT2FR7ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:

>I'm not *in* this discussion. I'm just sitting on the side pointing out
>things you guys miss.

Gee, everyone can play this game.

>How many PD Fortran compilers are there?

Hundreds.  Once upon a time, software had no legal protection.
And those compilers ran on machines a lot smaller* than PCs, too.
Anyone want to port some of them?  If you can find manuals for
the machine languages they were implemented in, that is.

*well, not smaller in physical size, but....

-- 
Norman Diamond, Nihon DEC     diamond@tkou02.enet.dec.com
This_blank_intentionally_left_underlined________________________________________

peter@ficc.uu.net (Peter da Silva) (04/10/90)

> > [...]                 Like, the fact that C can run at all in such a minimal
> > implementation has proven to be one of it's strengths. How many PD Fortran
> > compilers are there?

> I've heard this old saw for years now.  But, strangely, not often from
> a compiler writer (and, on such rare occasions, said compiler can't
> supply any specifics).

Well, I took my compiler construction course from the man who wrote the UNIX
fortran-77 compiler. He spent some time discussing the drawbacks of
Fortran to a compiler writer. I subsequently spent many pleasurable hours
playing with a public domain C compiler. I have never seen a public domain
Fortran compiler.

If Fortran is so easy, where are the compilers for PCs? There are several
times as many C compilers available.

C is not just a "little" easier to parse, it's a LOT easier to parse. And
if you're just going to do a dumb code-generator the parser and symbol
table are about the only things to worry about.

You also brushed aside lots of problems with the Fortran compiler. How
about equivalences? How about commons? How about compiling the good old
FORMAT statement (you can't just push it off to a subroutine, because you
need to have smarts in there to handle stuff like "WRITE (6,*) ...")? How
do you *prevent* arrays from being aliased (answer, you don't... not for a
dumb compiler)? How about ...

Fortran's completely ad-hoc nature really doesn't make it much fun.

Tell you what... you write a PD Fortran compiler and we can compare it
in complexity with a PD C compiler and see who wins.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

mph@lion.inmos.co.uk (Mike Harrison) (04/10/90)

In article <14317@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>...                    The fact is that C is not simpler to implement
>than the other procedural languages in general use.  In some ways it
>is harder to implement.  Since you pick Fortran as the point of comparison,
>I will look at both.
>[Lots of detail deleted]

While I think that your comparison is fair, you have missed out some tricky 
bits of Fortran, which *might* change the perspective.
For example:

  - DATA statements, with implied DO, which can be quite complex when you 
    have nested cases.

  - EQUIVALENCE statements, which (given overlapping EQUIVALENCEd arrays) can 
    set up quite complex constraint problems.

As you say, these are all well understood etc..., but they DO make Fortran a
bit more complex than you implied.

Mike,



Michael P. Harrison - Software Group - Inmos Ltd. UK.
-----------------------------------------------------------
UK : mph@inmos.co.uk             with STANDARD_DISCLAIMERS;
US : mph@inmos.com               use  STANDARD_DISCLAIMERS;

mjs@hpfcso.HP.COM (Marc Sabatella) (04/10/90)

Normally, even when I don't agree with Jim Giles (which is to say,
most of the time) he at least makes sense.  This time however,...

>II) Feature differences.
>
>  a) Pointer call by reference vs. Fortran parameter passing.
>
>  b) Structs, Unions, dynamic memory, pointers, recursive procedures

What about

c) math intrinsics and built-in I/O (including implied do-loops)

d) equivalence

e) run-time dimensionable arrays

f) entry statements



Then there are the curious two statements

>but there
>are no new Fortran features which C doesn't already have a version
>of.

and

>the Fortran 90 standard
>makes use of these to provide generic functions which C doesn't have.

Well, the first statement is just plain wrong, and the second statement
is a counterexample.  Fortran 90 a subset of C?  (Barternder, give me one of
whatever he's having!)  Or did we also forget about array slices and the
bizarre control constructs which can be used to manipulate them?

--------------
Marc Sabatella (marc@hpmonk.fc.hp.com)
Disclaimers:
	2 + 2 = 3, for suitably small values of 2
	Bill and Dave may not always agree with me

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <5486@ganymede.inmos.co.uk>, by mph@lion.inmos.co.uk (Mike Harrison):
> [...]
> While I think that your comparison is fair, you have missed out some tricky 
> bits of Fortran, which *might* change the perspective.
> For example:
>   - DATA statements, with implied DO, which can be quite complex when you 
>     have nested cases.

Why should these be any harder to implement than nested run-time loops?
Actually, I've implemented a parser for do-loop style data initialization
for an I/O library.  It was not particularly difficult.

> [...]
>   - EQUIVALENCE statements, which (given overlapping EQUIVALENCEd arrays) can 
>     set up quite complex constraint problems.

Equivalence is about as difficult (or easy) to implement as C's untagged
unions are.  At least with respect to constraint problems, the two features
are nearly identical.  After all, C also has to anticipate that arrays
might be aliased in an overlapping fashion through the union construct.

When comparing languages, one must have an eye for what features are
really correspondent.  For the most part, procedural languages tend
to all have pretty much the same features.  From a human perspective
these language features may _seem_ vastly different in form and
convenience.  From the compiler's perspective, they are all just
about the same.

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <15U2GTBggpc2@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> [...]
> Well, I took my compiler construction course from the man who wrote the UNIX
> fortran-77 compiler. He spent some time discussing the drawbacks of
> Fortran to a compiler writer. I subsequently spent many pleasurable hours
> playing with a public domain C compiler. I have never seen a public domain
> Fortran compiler.

It is completely irrelevant who tought your compiler construction course.
Unless you can give _explicit_ features that are hard to implement in
Fortran but are easy to implement in C, your argument is vacuous heresay.

As for puiblic domain software - its worth about as much as you pay for
it.  There are few Fortran compilers in the public domain for PC's (I've
seen two), because Fortran users are a demanding lot.  They require
performance.  Only now do most of them feel that PC's are _starting_
to become _real_ machines.  Up to now, the main market for PC's has
been word processing and/or spreadsheet-business applications - neither
one really in the class of problems that Fortran users are involved in.

> [...]
> If Fortran is so easy, where are the compilers for PCs? There are several
> times as many C compilers available.

That's the other major application for PC's - experimental and educational
use.  It is no coincidence that the the current fad language among comp-
sci types is the most common on the inexpensive hardware.  Up till about
3 years ago, the leader was Pascal (the previous fad language).  It's
interesting that in terms of language design, Pascal and C fall on the
spectrum on opposite sides of Fortran - fads seem to seek the extremes.

In light of the fact that PC's are not taken seriously by the Fortran
community and the applications of PC's are not very Fortran-ish, it's
surprising that there as many Fortran compilers as there are.  It can't
be THAT big a market.

> [...]
> C is not just a "little" easier to parse, it's a LOT easier to parse.

Oh?  Both have LR(1) grammars.  The grammars are of about the same
complexity.  They both take about the same ammount of time to shove
through the automatic parser generator.  So what's the problem?

Oh, Fortran's lexer is about half a page longer.  Big deal.

> [...]                                                          And
> if you're just going to do a dumb code-generator the parser and symbol
> table are about the only things to worry about.

In which case, C is definitely harder since implementing a multilevel
symbol table for C's nested scope rules MORE than offsets the lexer
problem with Fortran.

> [...]
> You also brushed aside lots of problems with the Fortran compiler. How
> about equivalences?

Semantically, the same problems are presented by C's unions.

> [...]               How about commons?

What about them?  All data within common blocks are locally declared
in every procedure.  The _only_ differrence is that common references
are relocated relative to a different block than static local variables
are.

> [...]                                   How about compiling the good old
> FORMAT statement (you can't just push it off to a subroutine, because you
> need to have smarts in there to handle stuff like "WRITE (6,*) ...")?

I usually parse it by calling the parse routine in the Fortran I/O
library.  The compiler only needs to know whether the format had an
error or not.  The compiler _could_ be built with the format grammar
and on corresponding action routines.  Besides, it's not particularly
harder (or even different) than readf/printf formats in C.  If your
claim is that C is simple to implement, then the I/O support library
must be something you claim is easy too.

> [...]                                                              How
> do you *prevent* arrays from being aliased (answer, you don't... not for a
> dumb compiler)? How about ...

Right, a dumb compiler lets aliased arrays be used unsafely - just like
a C compiler.  The difference is that it's illegal in Fortran, it's
just dangerous in C.  On a dumb compiler, there's no optimization anyway,
so the Fortran aliased code breaks _exactly_ where the C code does.  Of
course, if your compiler _does_ do optimization, Fortran will run more
efficiently because Fortran is willing to let illegal code break.

> [...]
> Fortran's completely ad-hoc nature really doesn't make it much fun.

And C's completely ad-hoc nature _does_ make it fun?  Remember, C is a
language which started out with obvious syntax ambiguities (ie. A=-5;).
And now you're trying to tell me that it's _NOT_ ad-hoc?  Give me a
break!

> [...]
> Tell you what... you write a PD Fortran compiler and we can compare it
> in complexity with a PD C compiler and see who wins.

No bet.  I'm writing a compiler (but not for Fortran).  I'm not going
to make my compiler public domain (I intend to get paid for my work -
or go bust and be unsuccessful :-).  In terms of code speed, the Fortran
would probably win.  In terms of compiler complexity, they'd be about
the same since any compiler I write will be based on a flexible compiler
"skeleton" which could be used for most languages (Fortran, C, Pascal,
etc.).  The reason for this is that language design is a favorite
study of mine - I want any compiler I write to be flexible enough
to try out new features easily.

In terms of popularity, the C compiler would win - until it's
fad runs out.  But, even then, the Fortran wouldn't be popular, some
other fad language will take the spotlight.  Languages are seldom
popular for their technical merits (which is why I'm perfectly willing
to consider the possibility that my Nemesis compiler will not be
successful).  If languages _were_ popular because of merit, I can
think of a _lot_ of languages (some _very_ old) that deserve the
spotlight more that C or Fortran (or Pascal, Modula, Ada, etc.).

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <8960014@hpfcso.HP.COM>, by mjs@hpfcso.HP.COM (Marc Sabatella):
> [...]
>>II) Feature differences.
> [...]
> What about
> c) math intrinsics and built-in I/O (including implied do-loops)

This is a discussion of implementation difficulty.  Math intrinsics CAN
be implemented as procedure calls -just as C does.  Built-in I/O _IS_
implemented with procedure calls on every system I've ever worked with
- just as C does.  Implied do-loops are a red herring in this discussion:
they are simply do loops that are completely simulated at compile time.
What's so hard about that?

> [...]
> d) equivalence

Already discussed in other postings.  In C, can you say 'union'?  I thought
so.

> [...]
> e) run-time dimensionable arrays

Again, in the context of simplicity of implementation, these are no harder
than statically declared arrays.  The dimension values are just variables
instead of constants.  Now, in an _optimizing_ compiler, these dynamic
array bounds limit constant folding to an extent (the bounds aren't constant).

> [...]
> f) entry statements

Again, not hard to implement in a _simple_ compiler.  It is the same
implementation as the normal procedure entry but with a jump around the
entry sequence code.

> [...]
> Then there are the curious two statements
> 
>>but there
>>are no new Fortran features which C doesn't already have a version
>>of.
> 
> and
> 
>>the Fortran 90 standard
>>makes use of these to provide generic functions which C doesn't have.
> 
> Well, the first statement is just plain wrong, and the second statement
> is a counterexample.  Fortran 90 a subset of C?

Am I the only one in this discussion who pays attention to the _context_
of the conversation.  In terms of implementation, the _new_ features
are similar to features already found in ANSI C.  Generic functions
require the same _implementation_ effort that function protocols do.
So, you have to add argument type tags to the symbol table entries
of generic functions, and the loader has to obey these type tags.
You have to do the same thing with function protocols.  The difference
is that the loader for Fortran 90 keeps looking if the type tags don't
match, the C loader issues a fatal error message.  The same difference
applies to matching function _references_ from within the code.

> [...]                   Or did we also forget about array slices and the
> bizarre control constructs which can be used to manipulate them?

No, I didn't forget.  The effort to turn all these into the equivalent
simple loops is trivial.  Such a dumb implementation would not be widely
approved (being slow).  But, we're talking about how difficult the feature
is to implement _simply_.

J. Giles

seanf@sco.COM (Sean Fagan) (04/11/90)

In article <14317@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>The fact is that C is not simpler to implement
>than the other procedural languages in general use.  In some ways it
>is harder to implement.  

I beg to differ.  One of the reasons C is easier to deal with is the lack of
the various keywords and semantic "gotchas" that other languages have.

>  a)  Lexical scanning and parsing.

Not a problem, really, for either language.  In fact, for any sane language
(i.e., anything other than PL/I 8-)), this isn't too difficult.

>  b)  Symbol table manipulation.

Same thing.  This is pretty easy.  Once you differentiate between local and
global, going the extra step for nested levels is fairly simple.

>     In C, references are explicitly pointers
>     and are allowed to be aliased.  In Fortran, the references
>     are implicit and aliasing is NOT permitted.  This means
>     that C procedures contain a lot of spurious dependencies
>     in the intermediate code that Fortran intermediate code
>     doesn't have.

This does not make the implementation any more difficult; it makes
implementing it *well* a bit more difficult.  But, then, the kinds of
optimization needed to improve the code are fairly easy.

>  b) Structs, Unions, dynamic memory, pointers, recursive procedures ....
>
>     C has em, Fortran doesn't.  Fortran is easier to implement
>     on all of these counts.

Really?  What happened to EQUIVALENCEs?  Sounds like unions to me.  Pretty
disgusting, actually.  COMMON blocks are also disgusting.

struct's and union's are very easy to implement:  you just reuse your symbol
table stuff, mentioned above, and voila!  Dynamic memory is a feature of the
pointers (the compiler most likely knows nothing about malloc et al).
I don't understand your point about recursive procedures.  There is no
difference, on most modern machines, between a function calling itself and a
function calling another procedure.  In FORTRAN *or* C.  The major
exceptions tend to be machines designed by Seymour Cray, but that argument
really belongs in comp.arch 8-).

There are a lot of strange things in fortran.  Wish I had any of my books on
it so I could write some code to really confuse people, but, off the top of
my head, I can think of one:  the various parameters to open or close.
Unix's model is *much* cleaner, and it's pretty ugly.

Implementing a simple C compiler, which still is largely ANSI compliant
really is not that difficult.  Say, a semester project for an undergrad.
Implementing a FORTRAN compiler which is also ANSI compliant is considerably
more difficult (note that FORTRAN doesn't make a distinction between
language and library, so you have to write all of the routines, while you
don't in C), call it 2-4 for the same student.

-- 
-----------------+
Sean Eric Fagan  | "It's a pity the universe doesn't use [a] segmented 
seanf@sco.COM    |  architecture with a protected mode."
uunet!sco!seanf  |         -- Rich Cook, _Wizard's Bane_
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

mph@lion.inmos.co.uk (Mike Harrison) (04/11/90)

In article <14323@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <5486@ganymede.inmos.co.uk>, by mph@lion.inmos.co.uk (Mike Harrison):
>> ...
>>   - DATA statements, with implied DO, which can be quite complex when you 
>>     have nested cases.
>
>Why should these be any harder to implement than nested run-time loops?
>Actually, I've implemented a parser for do-loop style data initialization
>for an I/O library.  It was not particularly difficult.
I should have given a bit more context here.
I recently did some work on this for the Transputer, which exposed two 
connected problems:

  - the way our loader works, we don't usually load initialised data areas
    direct, rather we load code to contruct the initialised area,

  - an implied DO type DATA statement such as:

      DATA ((X(J,I), I = 1, J), J = 1, 5) / 1 * 0, 2 * 1, 3 * 2, 4 * 3, 5 * 4  /
    
    initialises a triangular segment of the array thus:

      0
      1 1
      2 2 2
      3 3 3 3
      4 4 4 4 4

Now it clearly easy to write down good code to do this, but what if we change
the statement to:

      DATA ((X(J,I), I = 1, J), J = 1, 5) / 5 * 0, 4 * 1, 3 * 2, 2 * 3, 1 * 4 /
    
so that the initialisation is:

      0
      0 0
      0 0 1
      1 1 1 2
      2 2 3 3 4

then the generation of 'good' code is much harder. (And that example is not a
*particularly* bad one - it is easy to write difficult cases).

The real problem is analysing the statement to find the common (sensible) cases
for which you should generate good code.

> ...
>When comparing languages, one must have an eye for what features are
>really correspondent.  For the most part, procedural languages tend
>to all have pretty much the same features.  
It's often the way such features may be combined in the language (together with
interactions with the target architecture) which makes things more difficult in
some cases than others.

On the more general topic of 'pointers v arrays' (or even Fortran v C), I agree
with most of what you have said. 
The problem is not pointers 'per se', it is:
    - how pointer 'values' can be generated (ie. what you can 'point to'),
    - how pointer 'values' can be manipulated.

Thus, 'typed' pointer values (cf. Ada 'access type' or Algol68 'refs') with
restricted operations are not (usually) dangerous.
(and in languages which incorporated the 'recursive combinator' or 'letrec'
style declarations, explicit pointers are unnecessary).

Mike,


Michael P. Harrison - Software Group - Inmos Ltd. UK.
-----------------------------------------------------------
UK : mph@inmos.co.uk             with STANDARD_DISCLAIMERS;
US : mph@inmos.com               use  STANDARD_DISCLAIMERS;

peter@ficc.uu.net (Peter da Silva) (04/11/90)

In article <14323@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> Equivalence is about as difficult (or easy) to implement as C's untagged
> unions are.

Not so. There is no requirement I'm aware of that unions actually overlap.
It's legal to effectively "#define union struct".

Also, even if you do have overlapping unions, C doesn't require the
implementation to overlap them in any particular way. Fortran does.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

meissner@osf.org (Michael Meissner) (04/11/90)

In article <14334@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:

| From article <8960014@hpfcso.HP.COM>, by mjs@hpfcso.HP.COM (Marc Sabatella):
| > [...]
| >>II) Feature differences.
| > [...]
| > What about
| > c) math intrinsics and built-in I/O (including implied do-loops)
| 
| This is a discussion of implementation difficulty.  Math intrinsics CAN
| be implemented as procedure calls -just as C does.  Built-in I/O _IS_
| implemented with procedure calls on every system I've ever worked with
| - just as C does.  Implied do-loops are a red herring in this discussion:
| they are simply do loops that are completely simulated at compile time.
| What's so hard about that?
| 
| > [...]
| > d) equivalence
| 
| Already discussed in other postings.  In C, can you say 'union'?  I thought
| so.

NO.  Fortran equivalences are much harder to implement than unions.
Unions always start off at offset 0 from the start of the union.  In
FORTRAN, equivalences require much more support within the compiler,
in order to support things like equivalencing A[1] to B[100] and
equivalencing C to A[20].  Check out section 7.9 in the Red Dragon
book for more details.

| > [...]
| > e) run-time dimensionable arrays
| 
| Again, in the context of simplicity of implementation, these are no harder
| than statically declared arrays.  The dimension values are just variables
| instead of constants.  Now, in an _optimizing_ compiler, these dynamic
| array bounds limit constant folding to an extent (the bounds aren't constant).

Again no.  To implement arrays with static bounds, you only need to
keep the bounds (as integer constants) within the symbol table.  For
full run-time dimensionable arrays, you must keep around an indication
that the array is dynamic, and where the bound is stored.  You must
expand this bound expression within the tree as you are evaluating
arrays and such.  PL/I is much harder in this respect, since it allows
variable sized arrays within structures, and all members following the
array must be appropriately offset.

--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so

meissner@osf.org (Michael Meissner) (04/11/90)

In article <5610@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:

| In article <14317@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
| >The fact is that C is not simpler to implement
| >than the other procedural languages in general use.  In some ways it
| >is harder to implement.  
| 
| I beg to differ.  One of the reasons C is easier to deal with is the lack of
| the various keywords and semantic "gotchas" that other languages have.
| 
| >  a)  Lexical scanning and parsing.
| 
| Not a problem, really, for either language.  In fact, for any sane language
| (i.e., anything other than PL/I 8-)), this isn't too difficult.

No Fortran 77 is much harder to lex, and somewhat easier to parse.  In
Fortran, spaces outside of hollerith constants and strings are
immaterial, which means that you must have a smart lexer, and can't
use the normal lex-type tools.  The classic case is:

	DO 10 I=1.15		/* Assign 1.15 to var DO10I */

		vs.

	DO 10 I=1,15		/* Loop 15 times */

In C the lexer must be a little smart, because it has to know about
typedef names and return a different token.  C++ is somewhat more
difficult than C on this regard.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so

jlg@lambda.UUCP (Jim Giles) (04/12/90)

From article <MEISSNER.90Apr11112529@curley.osf.org>, by meissner@osf.org (Michael Meissner):
> [...]
> Again no.  To implement arrays with static bounds, you only need to
> keep the bounds (as integer constants) within the symbol table.  For
> full run-time dimensionable arrays, you must keep around an indication
> that the array is dynamic, and where the bound is stored.  You must
> expand this bound expression within the tree as you are evaluating
> arrays and such.

Hmm.  That's how I had in mind doing ALL array references.  I expect the
constant folding optimization to find out later that the bounds of
a static array are fixed.  I don't like to do things twice.

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/12/90)

From article <MEISSNER.90Apr11112947@curley.osf.org>, by meissner@osf.org (Michael Meissner):
> [...]
> No Fortran 77 is much harder to lex, and somewhat easier to parse.  In
> Fortran, spaces outside of hollerith constants and strings are
> immaterial, which means that you must have a smart lexer, and can't
> use the normal lex-type tools.  The classic case is:
> 	DO 10 I=1.15		/* Assign 1.15 to var DO10I */
> 		vs.
> 	DO 10 I=1,15		/* Loop 15 times */


Hmm. Seems like I mentioned this very problem in my original 'cheap
implementation' article.  No, I take that back: I think my loop had
some other bound than 15.

If this is the extent to which Fortran is 'harder' than C, then I
claim that there's no important differences at all.  Lexers (even
ad-hoc ones) are _easy_.


J. Giles

pcg@aber-cs.UUCP (Piercarlo Grandi) (04/13/90)

In article <15U2GTBggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:

  I have never seen a public domain Fortran compiler. [ ...]  If Fortran is
  so easy, where are the compilers for PCs? There are several times as many
  C compilers available. [ ... ] Tell you what... you write a PD Fortran
  compiler and we can compare it in complexity with a PD C compiler and see
  who wins.

I hate to say it, but for the sake of exactness, I understand that there is
a good quality fortran to C compiler/translator (f2c), just like p2c is a
good quality pascal/modula 2 to C translator. I also think that the MIPS
Fortran compiler is actually a fortran to C translator (but of course it is
proprietary).

On the more general subject of the thread: I have been amused by this
discussion of Fortran vs. C optimization; first of all C was not born as an
application language, but as a system implementation language (MOHLL).  As
such attempts to convert/misrepresent it into a safe, application oriented
good-for-all tool are just funny. Fortran is the C of numerical calculations; as far
as these are concerned, it is more application oriented, but always ata
pretty low level.

To me, having a low level program in C or Fortran, an optimizer that tries
to understand it and decompile it into an higher level form, apply program
transformations to the high level form, and then compile it to machine
language, is just a laughable waste of time and talent. Were it not for the
'dusty deck' problem, which means lots of dollars. For example, if you have
a classic a matrix multiply 3 nested loop, on a vector machine the most
efficient nesting order of such loops is different from a non vector machine.
You want to say 'matrix multiply', not give one possible implementation, and
then have the optimizer "understand" it is a matrix multiply, and then
generate the code corresponding to a more efficient (for that machine)
equivalent version.

As a research endeavour, or for points of principle, there do exist higher
level, application oriented languages, and surely others could be designed,
that provide an optimizer with the opportunity to devise intelligent
strategies for efficient execution, without the decompilation phase, which is
always fraught with hazards and inefficiencies.

My position on this is:

1) If you want to delegate optimization to a program, have a language
designed for this. Decompilation is hazardous. Code generation is already
difficult enough.

2) If you want to turn dusty decks into higher level, more easily optimizable
language, use a programmer's assistant, source rewriting tools, whatever.
Don't confuse a code reorganizer with a compiler's code generator.

3) Leave alone low level languages like C which have been designed to give
programmer control over the generated code, where the programmer knows the
compiler will be a faithful, dumb, *obedient* servant of the source code.

Discussing whether C or Fortran can more easily be decompiled into an higher
level more application oriented language is IMNHO very silly, especially if
the discussion centers around such microscopic issues as to whether element
access to vectors should be via pointers or subscripts, when there is almost
never the need to access each element one by one to express numeric codes
(oh APL!).
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

budd@mist.cs.orst.edu (Tim Budd) (04/14/90)

I have been following this discussion only loosely, but Peter Grandi's
comment on APL permits me an opening to climb on my soapbox.

I have been for several years convinced that APL should be a wonderful
language for parallel processors, particularly SIMD machines but maybe also
MIMD machines (see my article in TOPLAS in 1984).  I am convinced that 
the cost of parallel hardware is coming down, and the accessibility of
such hardware to ``normal'' programmers will increase dramatically in the
next few years, and yet acceptance of such hardware will be slow as long as
programmers need to think about parallelism.  That is, programmers have
enough to think about as it is, and if they need to provide explicit
parallel directives in their language then only the committed will do it.
Ergo we need languages that have lots of parallelism implicit, such as APL.

The catch - making a compiler for a high level language such as APL you can 
never get all the efficiency of a low level language, such as C or Assembly.
Currently the people using parallel hardware are very committed to getting
the last iota of speed out of their machines, and if you say something like
``I can write my program in APL and spend a tenth the time in writing, but
it executes only half as fast as your C version'' you're going to get
laughed out of the room.  I feel like I'm in the 1950's again with the
assembly/HLL debate.  I'm waiting for ten years to go by, so I can say ``I
told you so''.  In the meantime, I'll keep plugging away at my algorithms
for APL and parallel hardware.
--tim budd, budd@cs.orst.edu