[comp.lang.modula2] Translating M2 -> C

dcw@doc.ic.ac.uk (Duncan C White) (07/16/87)

I have a proposal which I would like people to consider, and comment on:
I will outline it as a series of observations, leading to the proposal:

1).	every system worth talking about has a C compiler.

2).	often, the C compiler is heavily optimised.

3).	C is often referred to as a high-level assembler.
	What people omit is the important word 'portable'.

4).	Compilers conventionally produce assembler code.

5).	Why not bring all these together, and write a portable M2 compiler
	which generates C as its intermediate code?  This would imply no
	tedious mucking around with new backends for different processors.
	This is the approach used by C++...

So, there is the proposal...  is the translation likely to be incredibly
difficult ?  What about processes ?  Libraries ?  Any other pitfalls I should
watch out for ?  Anyone else interested in cooperating on such a project ?
[Is it completely irrelevant, or crackpot... all suggestions considered]

Personally, I would much prefer to write such a compiler in C [on the grounds
that it then becomes easier to port the compiler to other systems]

Obviously, you'd get a free M2->C translator for the one-off conversion...

	Duncan.

-----------------------------------------------------------------------------
JANET address : dcw@uk.ac.ic.doc| Snail Mail :  Duncan White,
--------------------------------|               Dept of Computing,
  This space intentionally      |               Imperial College,
  left blank......              |               180 Queen's Gate,
  (paradoxical excerpt from     |               South Kensington,
  IBM manuals)                  |               London SW7
----------------------------------------------------------------------------
Tel: UK 01-589-5111 x 4982/4991
----------------------------------------------------------------------------

crb@SUN.COM (Chuck Bilbe) (07/17/87)

In regard to Duncan White's recent note proposing a Modula-2 to C translation
scheme --- no, it isn't a crackpot scheme.  That would make me a crackpot.

I envisioned and supervised just such a project at Hewlett-Packard
Logic Systems Division in 1984.  We built the compiler by "retargeting"
the Zurich M2M compiler (written in Pascal) and driving it with a shell script
under HP-UX.

Here are some of the things we found out:

PRO
  Up and running in two (!) months.
  Object code quality not shabby at all.  C makes a fine assembly language.
  Use of C register variables provided good support for a simulated Stack Pointer
     and simulated Frame Pointer.
  "Intermediate" object code (e.g. C) transportable to virtually any machine.
  Semantic complexity of the "code generator" (C producer) is quite low.
  It was easy to arrange for inter-language ( C <==> Modula-2 ) calling.
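
As a rough illustration of the simulated Stack Pointer and Frame Pointer
mentioned in the list above, generated C might look something like the sketch
below.  Everything in it is an assumption made for illustration: the names
m2_stack, m2_sp, m2_fp and SumTo, the frame layout, and the use of a flat array
for the parallel stack are invented here and are not taken from the HP compiler.

```c
#include <assert.h>

/* Hypothetical parallel stack holding Modula-2 activation records. */
#define M2_STACK_WORDS 1024
static long m2_stack[M2_STACK_WORDS];
static long *m2_sp = m2_stack;   /* simulated Stack Pointer: next free word */
static long *m2_fp = m2_stack;   /* simulated Frame Pointer: current frame  */

/* Possible translation of a hypothetical
 *   PROCEDURE SumTo(n: INTEGER): INTEGER;  VAR i, total: INTEGER;
 * Assumed frame layout:
 *   fp[0] = caller's FP (as an index), fp[1] = n, fp[2] = i, fp[3] = total
 */
long SumTo(long n)
{
    register long *fp = m2_sp;     /* new frame starts at the old SP */
    register long *sp = fp + 4;    /* reserve four words for this frame */
    long result;

    assert(sp <= m2_stack + M2_STACK_WORDS);  /* parallel stack overflow */
    fp[0] = (long)(m2_fp - m2_stack);         /* save caller's FP */
    m2_fp = fp;
    m2_sp = sp;

    fp[1] = n;
    fp[3] = 0;                                /* total := 0 */
    for (fp[2] = 1; fp[2] <= fp[1]; fp[2]++)  /* FOR i := 1 TO n DO */
        fp[3] += fp[2];                       /*   total := total + i */
    result = fp[3];

    m2_sp = fp;                    /* pop the frame */
    m2_fp = m2_stack + fp[0];      /* restore caller's FP */
    return result;
}
```

Keeping the two pointers in register locals is what lets the C compiler treat
frame accesses as cheap indexed loads and stores; standard C does not allow
register variables at file scope, so each function copies them in and out.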

CON
  Extremely slow (we're talking molasses in January) compilation rates.
  Virtually no support for decent high-level debugging (info lost in translation)
  Difficult, politically, to convince people it was a "real" compiler.
  Since C doesn't support nested scopes, activation records couldn't be on C stack,
    and a "parallel" stack had to be implemented as a linked list of structures.
  Since C doesn't support name conflicts at outer level, symbolic info was lost because
    names had to be hoked-up (e.g. InOut.WriteLn  might be "InOut00002WriteLn") so the
    linker wouldn't complain.
  We had some difficulty translating function calls within expressions while still preserving
    the original evaluation ordering.
  We never did figure out (nor were we really interested in) a decent implementation of
    coroutines.
  We had some difficulties with the C compiler, stretching it in ways it wasn't used to.
    (overflowing name tables, too many labels, ... etc.)
  Don't expect the resulting C code to be understandable or readable by humans.  It isn't.
    Of course, that's often true of human-produced C code  (:-)  
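
To make the nested-scope and name-mangling points above concrete, here is a
minimal sketch of how a nested procedure might be flattened.  It assumes a
hypothetical module Demo with a procedure Outer(x) that holds a local variable
total and a nested procedure Add(y) doing "total := total + y".  The struct
Frame layout, the helper names, and the mangled names are all made up for this
example; they are not the HP compiler's actual scheme.

```c
#include <assert.h>
#include <stdlib.h>

/* One activation record on the "parallel" stack (illustrative layout). */
struct Frame {
    struct Frame *dynamic_link;  /* caller's frame: this forms the linked list */
    struct Frame *static_link;   /* frame of the lexically enclosing procedure */
    long slot[2];                /* parameters and locals */
};

static struct Frame *top = NULL; /* most recent activation record */

static struct Frame *push_frame(struct Frame *static_link)
{
    struct Frame *f = malloc(sizeof *f);
    assert(f != NULL);
    f->dynamic_link = top;
    f->static_link = static_link;
    top = f;
    return f;
}

static void pop_frame(void)
{
    struct Frame *f = top;
    top = f->dynamic_link;
    free(f);
}

/* Demo.Outer.Add, moved to file scope under a made-up mangled name.
 * slot[0] = y.  The enclosing frame is passed explicitly. */
static void Demo00002Add(struct Frame *enclosing, long y)
{
    struct Frame *f = push_frame(enclosing);
    f->slot[0] = y;
    f->static_link->slot[1] += f->slot[0];   /* total := total + y */
    pop_frame();
}

/* Demo.Outer: slot[0] = x, slot[1] = total. */
long Demo00001Outer(long x)
{
    struct Frame *f = push_frame(NULL);
    long result;

    f->slot[0] = x;
    f->slot[1] = f->slot[0];                 /* total := x */
    Demo00002Add(f, 1);
    Demo00002Add(f, 2);
    result = f->slot[1];                     /* RETURN total */
    pop_frame();
    return result;
}
```

Because total lives in a heap-allocated frame rather than in a C local, Add can
reach it through the static link, at the cost of the readability and debugging
information noted in the list.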

Overall, it was worth doing.  I do not consider it a way to achieve a production
compiler, but it is not a bad way to bootstrap a real one.
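
The evaluation-ordering difficulty listed under CON can be seen in a small
sketch.  C leaves the order in which the operands of "+" are evaluated
unspecified, so a translator that wants function calls to happen in source
order can hoist them into temporaries.  The functions F and G, the trace
buffer, and the helper names are hypothetical, chosen only to make the call
order observable.

```c
#include <assert.h>
#include <string.h>

static char trace[8];            /* records the order of calls */

static void note(char c)
{
    size_t n = strlen(trace);
    assert(n + 2 <= sizeof trace);   /* room for c and the terminator */
    trace[n] = c;
    trace[n + 1] = '\0';
}

void reset_trace(void) { trace[0] = '\0'; }
const char *call_trace(void) { return trace; }

/* Two functions with visible side effects. */
static int F(void) { note('F'); return 1; }
static int G(void) { note('G'); return 2; }

/* Direct translation of "x := F() + G()": the C compiler may call
 * F and G in either order. */
int sum_direct(void)
{
    return F() + G();
}

/* Translation with temporaries: each assignment is a full statement,
 * so F is always called before G. */
int sum_ordered(void)
{
    int t1, t2;

    t1 = F();
    t2 = G();
    return t1 + t2;
}
```

Both versions return the same value; only sum_ordered guarantees the order of
the side effects, which matters when F and G touch shared state.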

-- Chuck Bilbe

lmjm@doc.ic.ac.UK (07/17/87)

   Date: 16 Jul 87 16:31:05 GMT
   From: Duncan C White <eagle!icdoc!dcw@ucbvax.berkeley.edu>
   Organization: Dept. of Computing, Imperial College, London, UK.

   1).	every system worth talking about has a C compiler.
True, true.

   2).	often, the C compiler is heavily optimised.
Not quite so true - but getting better - eg: GCC.

   3).	C is often referred to as a high-level assembler.
	   What people omit is the important word 'portable'.
Hmmmmmmmm

   4).	Compilers conventionally produce assembler code.
Most Unix compilers do anyway.

   5).	Why not bring all these together, and write a portable M2 compiler
	   which generates C as its intermediate code?  This would imply no
	   tedious mucking around with new backends for different processors.
	   This is the approach used by C++...

When I last had to port a M2 compiler to a new machine I thought about
this.  The recent message from Chuck Bilbe covers the problems I was
able to think of.  I instead chose a slightly different route.

Most Unix machines (not all) support a two pass C compiler commonly
called PCC (Portable C Compiler).  The second pass is the code
generator and is also used by Pascal and Fortran compilers.  I took
the ETH 4 pass compiler and converted its 4th pass to output a file
suitable for PCC's code generator to use.  This means the M2 compiler is
pretty portable - so long as you have PCC on your target machine - but
dodges many of the problems of going to C.  Since the machine it was
targeted for was blindingly fast I never bothered with optimisation
but I get some for free by using the standard C optimiser (which works
on the code generated by PCC).  Given the structure of the ETH
compiler, adding in simple lifetime analysis is relatively
straightforward.

This is a simplified description but should give you the gist of it.
Obviously this approach is not as generally useful as Duncan's.  But
let's face it, there are a lot of Unix boxes out there now.

An idea I was toying with would be to do something similar to the above
scheme except going to the GNU optimising C compiler as this is likely
to become available on a wide range of machines.

	Lee.

--
UKUUCP SUPPORT  Lee McLoughlin
	"What you once thought was only a nightmare is now a reality!"

Janet: lmjm@uk.ac.ic.doc, lmcl@uk.ac.ukc
DARPA: lmjm@doc.ic.ac.uk (or lmjm%uk.ac.ic.doc@cs.ucl.ac.uk)
Uucp:  lmjm@icdoc.UUCP, ukc!icdoc!lmjm

steve@hubcap.UUCP (Steve ) (07/17/87)

In article <8707171628.AA02971@odysseus.sun.com>, crb@SUN.COM (Chuck Bilbe) writes:
> In regard to Duncan White's recent note proposing a Modula-2 to C translation
> scheme --- no, it isn't a crackpot scheme.  That would make me a crackpot.

I had two graduate students do a modula to C preprocessor in the spring of
'86.  Our experiences were much the same as Chuck described.  It was a great
exercise for the students (Two very good ones).


Steve                                      steve@hubcap.clemson.edu
(aka D. E. Stevenson),                     dsteven@clemson.csnet
Department of Computer Science,            (803)656-5880.mabell
Clemson University, Clemson, SC 29634-1906

bpendlet@esunix.UUCP (Bob Pendleton) (07/20/87)

in article <482@ivax.doc.ic.ac.uk>, dcw@doc.ic.ac.uk (Duncan C White) says:
> 
> 4).	Compilers conventionally produce assembler code.
> 
This statement is only true in the UNIX(TM) world. It still just plain blows
me away when I encounter people who have never used a compiler that didn't
generate assembly code. My experience, from measurements made on assemblers I've
written and used, is that 40 to 60 percent of assembly time is lexical
analysis. Formatting and writing assembly code in a compiler can add 10 percent or
more to compilation times; in non-optimizing compilers it can be 25%.

Generating a linkable file directly is a major performance win. Why does the
UNIX world tolerate such slow compilation?

What Duncan suggests will work. But it will be sloooow.
> 
> 	Duncan.
> 
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,ihnp4,allegra}!decwrl!esunix!bpendlet
Alternate:     {ihnp4,seismo}!utah-cs!utah-gr!uplherc!esunix!bpendlet
        I am solely responsible for what I say.

abbas@CORWIN.CCS.NORTHEASTERN.EDU (07/20/87)

I saw your note regarding the M2 to C translator.  This is to inform you that one
of my students has developed such a system and we are putting the final touches
to it.

There will also be a technical report discussing the issues and how we resolved them.

For example, how would you unnest modules and procedures?
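
One possible answer for the module half of that question, offered only as a
sketch and not as the approach taken in the report: a local module's variables
can become file-scope statics under qualified names, and the module body can
become an initialisation function that the enclosing module calls before its
own body runs.  The module Main.Counter, its variable count, its procedure
Next, and the underscore naming scheme are all hypothetical.

```c
#include <assert.h>

/* Assumed Modula-2 source:
 *   MODULE Main;
 *     MODULE Counter;
 *       EXPORT Next;
 *       VAR count: INTEGER;
 *       PROCEDURE Next(): INTEGER;
 *       BEGIN INC(count); RETURN count END Next;
 *     BEGIN count := 0
 *     END Counter;
 *   ...
 */

/* Main.Counter.count, unnested to file scope under a qualified name. */
static long Main_Counter_count;
static int Main_Counter_initialised = 0;

/* Body of local module Counter (BEGIN count := 0 END). */
void Main_Counter_init(void)
{
    Main_Counter_count = 0;
    Main_Counter_initialised = 1;
}

/* Main.Counter.Next */
long Main_Counter_Next(void)
{
    assert(Main_Counter_initialised);  /* module body must have run first */
    Main_Counter_count++;              /* INC(count) */
    return Main_Counter_count;
}
```

The "static" storage class keeps count hidden from other C files, which
roughly mirrors the fact that it is not in the module's EXPORT list.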

If you are interested I will send you a copy of the Tech. report when it is
ready (very soon).  The system runs on Macintosh and translates to LightspeedC;
the design enables us to get translation to other flavors of C very easily without
modifying the code at all!

--Abbas Birjandi

US Mail: College of Computer Science
	 Northeastern University
	 360 Huntington Ave,
	 Boston MA, 02115
	tel:(617) 964-3077

ronald@csuchico.UUCP (Ronald Cole) (08/07/87)

In article <482@ivax.doc.ic.ac.uk>, dcw@doc.ic.ac.uk (Duncan C White) writes:
> 
> I have a proposal which I would like people to consider, and comment on:
> I will outline it as a series of observations, leading to the proposal:
  ...
> 5).	Why not bring all these together, and write a portable M2 compiler
> 	which generates C as its intermediate code?

Duncan,

	I have been working on this project in my spare time for the last
year now.  My compiler is a four pass compiler implemented using yacc and
lex in C.  Translating Modula-2 to C is a lot easier than translating C
to Modula-2.  I am generating a very portable C as per J.E. Lapin's new
portable C book.  If you are interested in more information send me email.

							Ron

-- 
Ronald Cole				| uucp:     ihnp4!csun!csuchic!ronald
AT&T 3B5 System Administrator		| PhoneNet: ronald@csuchico.edu
@ the #_1_ party school in the nation:	| voice     (916) 895-4635
California State University, Chico	"It's O.K." -Hal Landon Jr., Eraserhead