[comp.os.minix] Minix needs a C compiler with source

greg@ait.trl.oz (Greg Aumann) (02/19/90)

It is becoming more and more obvious that Minix needs a C compiler with
source that can be distributed without restrictions and modified
easily.  Problems caused by the current ACK compiler are that it is
difficult to get bugs fixed.  There is little hope of seeing desirable
extensions such as ANSI conformance etc.  Also you cannot look at the
source and learn about compilers.  The original intention of using
minix to teach OS courses or for self study could apply equally well to
compilers if the source were available and readable.  Note when I write
compiler I also mean to include an assembler and a loader.

The ideal minix compiler would be small so that it would fit in 64k
(probably with multiple passes),  it would have the same front end for
the ST and the PC, be distributed on the minix disks in source form (or
at least on the net in source form), and it would be easy to modify to
add say floating point and ANSI conformance for a start and to port to
new architectures.

The ACK compiler kit is out as it is much too big and expensive.  The
source generated by the kit is also of little or no use as it is
unmodifiable.  Gcc is also too big (although this may not apply in a
few more years).

It seems that there are really two good candidates.  One is for some
minix fiend to write one and release it.  This may be the intention of
Bruce Evans who has released binaries for a PC compiler that he is
working on.  A second alternative is the Sozobon C compiler.  I don't
know a great deal about it but my understanding is that it generates
code for the 68000, is fairly portable, and fits the above criteria.

Would someone who knows more about the Sozobon compiler please comment
on its suitability for minix and how easy it would be to modify the
backend to generate code for the 8086 and 80386 etc.  Also could Bruce
please comment on what he intends for his compiler.

Personally I think the best solution would be to find an existing
compiler that is close to what we want and modify it.  This is because
writing a compiler from scratch is a very large task and shouldn't be
underestimated.  Would anyone who knows of other possibly suitable
compilers please also comment.

This article is an invitation for comments.  Hopefully it will end in a
consensus of how to get the sort of compiler that we want and need for 
minix.

Greg Aumann
-------------------------------------------------------------------------
Artificial Intelligence Systems         ACSnet:    greg@trlamct.trl.oz
Telecom Research Laboratories           Internet:  greg@trlamct.trl.oz.au
Melbourne, AUSTRALIA                    Voice:     +61 3 541 6222
-------------------------------------------------------------------------

croes@fwi.uva.nl (Felix A. Croes) (02/19/90)

In article <1050@trlluna.trl.oz> greg@ait.trl.oz (Greg Aumann) writes:
>It is becoming more and more obvious that Minix needs a C compiler with
>source that can be distributed without restrictions and modified
>easily.  Problems caused by the current ACK compiler are that it is
>difficult to get bugs fixed.  There is little hope of seeing desirable
>extensions such as ANSI conformance etc.  Also you cannot look at the
>source and learn about compilers.  The original intention of using
>minix to teach OS courses or for self study could apply equally well to
>compilers if the source were available and readable.  Note when I write
>compiler I also mean to include an assembler and a loader.
The current compiler will be replaced by an ANSI C compiler in 2.0 - again, not
in source. On all other points, I agree.

[description of ideal Minix compiler deleted]
The ideal Minix compiler would be public domain (of course).

>The ACK compiler kit is out as it is much too big and expensive.  The
>source generated by the kit is also of little or no use as it is
>unmodifiable.  Gcc is also too big (although this may not apply in a
>few more years).
Gcc is too big, period. The ACK idea is fine, when trimmed down to what it
really is all about: using EM as an intermediate language.

[description of possible candidates deleted]
>
>Personally I think the best solution would be to find an existing
>compiler that is close to what we want and modify it.  This is because
>writing a compiler from scratch is a very large task and shouldn't be
>underestimated.  Would anyone who knows of other possibly suitable
>compilers please also comment.
It seems to me that writing a compiler is never underestimated by anyone in this
newsgroup, rather the reverse applies. I propose the following: step by step
replace the existing ACK compiler by a public domain version.
A friend of mine is presently working on an ANSI C front end. Another friend is
working on a 68000 code generator. I have already written a loader for Minix ST
(shouldn't be too difficult to port it to the PC, once asld is split in a loader
and an assembler), and I am thinking about writing a C++ front end.

Comments?

--

Felix Croes    (croes@fwi.uva.nl)

HBO043%DJUKFA11.BITNET@CUNYVM.CUNY.EDU (Christoph van Wuellen) (02/19/90)

As to compilers with source:
1.) The Sozobon compiler/optimizer/assembler/linker is freely available, I
have tested it and found only two errors (but: they were catastrophic when
hosting the compiler on a Sun386i workstation).
BUT: I see little chance to hack an Intel 8088 version from it.
(and: I won't bet it fits into 64K).

2.) I have written a 68000 compiler derived from some raw material I got
by email (original author: M. Brandt). It now handles signed and unsigned
char/short/long and float/double, with double being a synonym for float.
I have compiled MINIX with it successfully during a MINIX port I've
completed now, but I feel there are some problems left. In a few weeks
I will send the compiler to the referees. It does everything in core,
avoiding intermediate files; it is optimizing and maps frequently occurring
expressions onto registers, thus yielding 1008 (Version 2.1) dhrystones
with or without the register attribute.

3.) I agree, we should have the source code of our compilers

4.) Perhaps the ACK code generators are not the best, implementing
    virtual stack machines on register CPUs
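
The stack-machine point can be made concrete with a small sketch. The
opcodes below are hypothetical (they are not real EM instructions), and the
interpreter only stands in for what a naive code generator would emit for
a + b * c: every operand travels through a memory stack, whereas a register
CPU could keep the temporary in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini stack machine; these opcodes are NOT real EM. */
enum op { OP_PUSH, OP_ADD, OP_MUL };

struct insn {
    enum op op;
    long arg;               /* used by OP_PUSH only */
};

#define STACK_MAX 16

/* Interpret n instructions and return the single value left on the stack. */
long run_stack(const struct insn *code, size_t n)
{
    long stack[STACK_MAX];
    size_t sp = 0;          /* index of the next free slot */
    size_t i;

    for (i = 0; i < n; i++) {
        switch (code[i].op) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = code[i].arg;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        }
    }
    assert(sp == 1);
    return stack[0];
}

/* The same expression with the temporary held in a "register" (a local). */
long run_registers(long a, long b, long c)
{
    long r0 = b * c;
    return a + r0;
}
```

Feeding run_stack the sequence PUSH a, PUSH b, PUSH c, MUL, ADD gives the
same result as run_registers(a, b, c), but with five stack accesses where
the register version needs none for its temporary.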

/Christoph van Wuellen

DN5@psuvm.psu.edu (02/19/90)

There was a book recently released called (I believe) _Compiler Construction
in C_.  It contains source for a YACC clone, a LEX clone, and a C compiler.
As this is a book for a compiler course, perhaps the author would be
agreeable to having his compiler ported over to Minix?

Note: I have not seen the book, only saw mention of it in comp.compilers
and people there seemed to be impressed with it.  As soon as I can afford
it, I plan to get a copy.

                       D. Jay Newman
                       dn5@psuvm.psu.edu

stailey@iris613.gsfc.nasa.gov (Ken Stailey) (02/19/90)

In article <429@fwi.uva.nl> croes@fwi.uva.nl (Felix A. Croes) writes:
>In article <1050@trlluna.trl.oz> greg@ait.trl.oz (Greg Aumann) writes:
>The current compiler will be replaced by an ANSI C compiler in 2.0 - again, not
>in source. On all other points, I agree.
>
Will the new compiler be available for ST MINIX too?
>

INET stailey@iris613.gsfc.nasa.gov
UUCP {backbone}!dftsrv!iris613!stailey

ZZASSGL%cms.manchester-computing-centre.ac.uk@nsfnet-relay.ac.uk (02/20/90)

 
Perhaps a first step would be to port one of the many Small C
compilers onto Minix. OK, you would not be able to compile Minix but
at least it gives everyone a base to work from.
 
Geoff.
UTS Sys Admin
mcc

nfs@notecnirp.Princeton.EDU (Norbert Schlenker) (02/20/90)

I'm going to batch some comments regarding this thread.

|In article <429@fwi.uva.nl> croes@fwi.uva.nl (Felix A. Croes) writes:
|In article <1050@trlluna.trl.oz> greg@ait.trl.oz (Greg Aumann) writes:
|>It is becoming more and more obvious that Minix needs a C compiler with
|>source that can be distributed without restrictions and modified
|>easily.  Problems caused by the current ACK compiler are that it is
|>difficult to get bugs fixed.  There is little hope of seeing desirable
|>extensions such as ANSI conformance etc.  Also you cannot look at the
|>source and learn about compilers.  The original intention of using
|>minix to teach OS courses or for self study could apply equally well to
|>compilers if the source were available and readable.  Note when I write
|>compiler I also mean to include an assembler and a loader.
|
|The current compiler will be replaced by an ANSI C compiler in 2.0 - again, not
|in source. On all other points, I agree.
|
|[description of ideal Minix compiler deleted]
|The ideal Minix compiler would be public domain (of course).

Here's another vote in favour of all of the above.  As for the ANSIness of the
2.0 compiler, that is a secondary consideration.  The big problem with the 
Minix compilers is that source is not really available and that there is no
real facility for bug fixes.  I have always received polite responses to my
bug reports from Andy and/or Ceriel; almost invariably, the bugs have been
reported previously by others and are fixed in the next release.  BUT the next
release is just too far away, much too far away.  I have almost resorted to 
cross-compilation under DOS, but have resisted so far.  I know that many others
have simply given up on the ACK compiler.

|Gcc is too big, period. The ACK idea is fine, when trimmed down to what it
|really is all about: using EM as an intermediate language.

Agreed.

|...
|A friend of mine is presently working on an ANSI C front end. Another friend is
|working on a 68000 code generator. I have already written a loader for Minix ST
|(shouldn't be too difficult to port it to the PC, once asld is split in a loader
|and an assembler), and I am thinking about writing a C++ front end.
|...
|Felix Croes    (croes@fwi.uva.nl)

Fine ideas.  I believe that the 2.0 cc will have separate assembler and loader,
after which I think the process becomes much simpler.  Felix's loader would be
ported fairly easily (at least it looked that way to me when I saw it).  An
assembler, while by no means trivial in a general sense (e.g. if you want useful
macros), isn't hard once you have an object file format to translate to.  And
once we have that, a code generator that maps EM code to assembly cannot be too
far behind (aren't all you 80386 owners using Minix tired of the inability to
use 32 bit facilities?).  As for cpp/cem, I would happily leave them as is, not
wanting personally to get involved in all that grotty stuff.

I can easily imagine Minix cc (PC version) being almost entirely replaced in
short (and reverse) order.

In article <11528@nigel.udel.EDU> INFO-MINIX@UDEL.EDU writes:
|As to compilers with source:
|The sozobon compiler/optimizer/assembler/linker is freely available, I
|have tested it and found only two errors (but: they were catastrophic when
|hosting the compiler on a Sun386i workstation).
|BUT: I see little chance to hack an Intel 8088 version from it.
|(and: I won't bet it fits into 64K).

Having looked at the Sozobon compiler myself, I have to say that I see
little hope as well.  I had hopes of using it, but the code generation and
register allocation, nominally cleanly separated in the code, are actually
spread throughout the compiler.  There also seems to be heavy dependence on
the large number of registers and orthogonality of the 680x0 instruction set
(not that I think those are bad things - but 80x86 CPUs don't qualify in
either regard).

The Sozobon compiler is also not ANSI, and making it ANSI looks like a lot of
work.

With regard to the size, that's not really a problem.  I couldn't get PC Minix
cc to compile the Sozobon compiler, but there is no problem compiling it under
DOS (using Microsoft C) and the total executable size is only ~90K (with all the
debugging code included).  Quite a creditable piece of work.

|2.) I have written a 68000 compiler derived from some raw material I got
|by email (original author: M. Brandt). It now handles signed and unsigned
|char/short/long and float/double, with double being a synonym for float.
|I have compiled MINIX with it successfully during a MINIX port I've
|completed now, but I feel there are some problems left. In a few weeks
|I will send the compiler to the referees. It does everything in core,
|avoiding intermediate files; it is optimizing and maps frequently occurring
|expressions onto registers, thus yielding 1008 (Version 2.1) dhrystones
|with or without the register attribute.

Interesting.  How hard would it be to add code generation for the 80x86 family?

Norbert

R21014%UQAM.bitnet@ugw.utcs.utoronto.ca (Luc Dupuy) (02/20/90)

On Mon, 19 Feb 90 13:24:22 GMT <DN5@PSUVM.PSU.EDU> said:
>There was a book recently released called (I believe) _Compiler Construction
>in C_.  It contains source for a YACC clone, a LEX clone, and a C compiler.
>As this is a book for a compiler course, perhaps the author would be
>agreeable to having his compiler ported over to Minix?
>
>Note: I have not seen the book, only saw mention of it in comp.compilers
>and people there seemed to be impressed with it.  As soon as I can afford
>it, I plan to get a copy.
>
>                       D. Jay Newman
>                       dn5@psuvm.psu.edu


Could you give a more precise reference to the book :
Author
Editor
Publisher
etc.

I would appreciate it; thank you very much.

Salutations amicales,
luc dupuy
Centre d'analyse de textes par ordinateur
universite du quebec a montreal
r21014@uqam.bitnet

ghelmer@dsuvax.uucp (Guy Helmer) (02/20/90)

In article <24304@princeton.Princeton.EDU>, nfs@notecnirp.Princeton.EDU (Norbert Schlenker) writes:
> I'm going to batch some comments regarding this thread.
> ....
> 
> |2.) I have written a 68000 compiler derived from some raw material I got
> |by email (original author: M. Brandt). It now handles signed and unsigned
> |char/short/long and float/double, with double being a synonym for float.
> |I have compiled MINIX with it successfully during a MINIX port I've
> |completed now, but I feel there are some problems left. In a few weeks
> |I will send the compiler to the referees. It does everything in core,
> |avoiding intermediate files; it is optimizing and maps frequently occurring
> |expressions onto registers, thus yielding 1008 (Version 2.1) dhrystones
> |with or without the register attribute.
> 
> Interesting.  How hard would it be to add code generation for the 80x86
> family?
> 
> Norbert

It would be a good challenge.  The code generator is well separated from the
rest of the compiler.  I think it would be tough to get really hot code,
but with the 80x86 register set it's always been hard to get hot code out
of a compiler.  I'm waiting for the compiler to come through the referees
list, and as soon as it does, I'll merge the changes into my copy and
get an Intel code generator in it.
-- 
Guy Helmer                              ...!uunet!loft386!dsuvax!ghelmer
Dakota State University Computing Services           helmer@sdnet.bitnet
Software Engineering: "'How to program if you cannot.'" - Dijkstra

HBO043%DJUKFA11.BITNET@cunyvm.cuny.edu (Christoph van Wuellen) (02/21/90)

(about how difficult it would be to add 80x86 code generation for my
compiler)

I dislike Intel processors very much, so I won't do anything on it.
I have put some assumptions in the expression parser that longs are more
or less compatible with pointers (when doing pointer additions, the ints
are cast to long, the longs are multiplied and the result is supposed
to be an acceptable offset for a pointer).
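
The assumption described above can be sketched in C as follows. The function
names are made up for illustration and this is not code from the compiler
itself; it only shows the scaling step (int index widened to long, long
multiply, result used as a pointer offset) that a port to a target with
different int, long and pointer widths would have to revisit.

```c
#include <assert.h>

/* Byte offset for "p + index": the int index is widened to long and
   multiplied by the element size, also as a long. */
long scaled_offset(int index, long elem_size)
{
    assert(elem_size > 0);
    return (long)index * elem_size;
}

/* Apply the offset to a base address.  This silently assumes that a long
   is an acceptable pointer offset on the target machine. */
char *pointer_add(char *base, int index, long elem_size)
{
    return base + scaled_offset(index, elem_size);
}
```

For an array of longs, pointer_add((char *)a, 3, sizeof a[0]) yields the
address of a[3], which is what the compiler has to produce for a + 3.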

My personal opinion is: Forget about 8088,8086,80286 and use 80386 in
32-bit mode.

P.S. I would like to send the compiler to the referees and have to uuencode
the compressed tar file. How many compression bits may I use (13?).

/Christoph van Wuellen, Bochum, Germany.

HBO043%DJUKFA11.BITNET@cunyvm.cuny.edu (Christoph van Wuellen) (02/21/90)

To those who write a new code generator (68000) for the ACK compiler:

Recently I tuned a code generator for a 68000 C-compiler and so I expect
a leap of 150..200 Dhrystones/sec from the following single change:

- There is frequently the situation of multiplying two longs which were
  cast from short. The discussion on the speedup by changing _mli.s makes
  clear how important this pattern is (e.g. with pointer additions).
  I found it trivial to map this on a 68000 muls instruction, avoiding
  a library call (there is a similar pattern for the mulu instruction).
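
The pattern in question looks roughly like this at the C level. The helper
names are invented for illustration; muls_model and mulu_model are only
portable models of what the 68000 muls and mulu instructions compute
(16 x 16 -> 32 bit signed and unsigned products), not generated code.

```c
#include <stdint.h>

/* The source-level pattern: both operands are shorts widened to long.
   Without special handling this becomes a call to a long-multiply
   library routine. */
long mul_widened(short a, short b)
{
    return (long)a * (long)b;
}

/* Model of muls: signed 16 x 16 -> 32 bit product.  The product of two
   16-bit values always fits in 32 bits, so nothing is lost. */
int32_t muls_model(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Model of mulu: unsigned 16 x 16 -> 32 bit product. */
uint32_t mulu_model(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}
```

Since mul_widened and muls_model agree for every pair of shorts, a code
generator that recognises the widening casts can emit a single multiply
instruction instead of the library call.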

This change let my compiler jump from 850 to 1000 dhrystones, an optimal
structure assignment strategy let it jump from 700 to 850 dhrystones (but
structure assignments are not that important outside of dhrystone).

So, in case the person hacking on a new code generator might miss this:
will he (she?) take care of muls instructions?

(The code generator is the only place where to do it since it is the only
part that knows about CPU instructions)
/Christoph van Wuellen

hbetel@watserv1.waterloo.edu (Heather) (02/23/90)

In article <11550@nigel.udel.EDU> ZZASSGL%cms.manchester-computing-centre.ac.uk@nsfnet-relay.ac.uk writes:
>
> 
>Perhaps a first step would be to port one of the many Small C
>compilers onto Minix. OK, you would not be able to compile Minix but
>at least it gives everyone a base to work from.

 The problem with that is that it seems to me that one of the worst parts of
a compiler is its lexical box, and that is the fundamental difference between
an ANSI C and a small C compiler.  This is a pretty messy part to change. It
tends to involve very large and complex finite state machines. (read "not nice
to change, esp. when you didn't write it in the first place") The difference in
code generators should be fairly minor, so by porting a small-C we have 
not won much.
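
To give an idea of what such a finite state machine looks like, here is a
minimal hand-written sketch that classifies a single token as an identifier
or an integer constant. It is not taken from any of the compilers discussed;
the state and token names are invented, and suffixes such as L, character
constants, floating point and operators are all left out. A real scanner
needs many more states, which is where the mess comes from.

```c
#include <ctype.h>

enum tok { T_ERROR, T_IDENT, T_DEC, T_OCT, T_HEX };

/* States of the scanner; S_HEXX means "seen 0x, no hex digit yet". */
enum state { S_START, S_IDENT, S_ZERO, S_DEC, S_OCT, S_HEXX, S_HEX };

/* Classify the whole string s as one token; T_ERROR if it is malformed. */
enum tok classify(const char *s)
{
    enum state st = S_START;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        switch (st) {
        case S_START:
            if (isalpha(c) || c == '_') st = S_IDENT;
            else if (c == '0') st = S_ZERO;
            else if (isdigit(c)) st = S_DEC;
            else return T_ERROR;
            break;
        case S_IDENT:
            if (!(isalnum(c) || c == '_')) return T_ERROR;
            break;
        case S_ZERO:
            if (c == 'x' || c == 'X') st = S_HEXX;
            else if (c >= '0' && c <= '7') st = S_OCT;
            else return T_ERROR;
            break;
        case S_DEC:
            if (!isdigit(c)) return T_ERROR;
            break;
        case S_OCT:
            if (!(c >= '0' && c <= '7')) return T_ERROR;
            break;
        case S_HEXX:
        case S_HEX:
            if (!isxdigit(c)) return T_ERROR;
            st = S_HEX;
            break;
        }
    }

    switch (st) {
    case S_IDENT: return T_IDENT;
    case S_ZERO:  return T_OCT;    /* a lone "0" is an octal constant in C */
    case S_DEC:   return T_DEC;
    case S_OCT:   return T_OCT;
    case S_HEX:   return T_HEX;
    default:      return T_ERROR;  /* empty input, or "0x" with no digits */
    }
}
```

Even this toy needs seven states for two token classes; every new rule
means touching several of the cases at once.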
 The other point in my mind may be closed-minded, but I think that if you can't
make a compiler of your own, then you probably can't do too good a job of 
heavily modifying someone else's. Then again, I can see how one might say the
same of operating systems...

evans@ditsyda.oz (Bruce Evans) (02/25/90)

In article <24304@princeton.Princeton.EDU> nfs@notecnirp.UUCP (Norbert Schlenker) writes:
>Here's another vote in favour of all of the above.  As for the ANSIness of the
>2.0 compiler, that is a secondary consideration.  The big problem with the 

Leave the ANSI compilers to the big boys, or use gcc. It is a lot of work to
write a compiler, and much harder when a detailed standard has to be met.

>Minix compilers is that source is not really available and that there is no
>real facility for bug fixes.  I have always received polite responses to my
>bug reports from Andy and/or Ceriel; almost invariably, the bugs have been
>reported previously by others and are fixed in the next release.  BUT the next

Gcc is impressive:) in this respect as in others. There have been about 4
versions in the last year and you can read a 180K list of changes for bugs
and improvements over that period. I doubt gcc had relatively more bugs
than ACK a year ago.

>release is just too far away, much too far away.  I have almost resorted to 
>cross-compilation under DOS, but have resisted so far.  I know that many others
>have simply given up on the ACK compiler.

Felix Croes writes:
>|Gcc is too big, period. The ACK idea is fine, when trimmed down to what it
>|really is all about: using EM as an intermediate language.

The small compilers don't seem to use much technology (lex, yacc, rtl or
intermediate languages). This keeps them small at the expense of flexibility
and time to write. Mine avoids intermediate steps for speed too.
---

Here is a short review of various worthwhile "free" compilers that I know
about. I run a 32-bit Minix system on a 386. I wrote a compiler for this
(bcc) and use it for most things. I use gcc for code with ANSI C, bitfields
or floating point, and to get better error reports and (rarely) faster
binaries.

Most of the free compilers are for the 68000. I ran them but could not test
the output. Quite likely I did not set them up to best advantage.

PDC (68000)
-----------
/* PDC Compiler - A Freely Distributable C Compiler for the Amiga
 *                Based upon prior work by Matthew Brandt and Jeff Lydiatt.
 *
 * PDC Compiler 3.30 Copyright (C) 1989 Paul Petersen and Lionel Hummel.
 * PDC Software Distribution (C) 1989 Lionel Hummel and Paul Petersen.
 *
 * This code is freely redistributable upon the conditions that this 
 * notice remains intact and that modified versions of this file not be 
 * distributed as part of the PDC Software Distribution without the express
 * consent of the copyright holders.
 */
 
This appears to have the same base as Christoph van Wuellen's compiler. I
got it in a zoo package with a lot of other stuff: (output from du)
 
30	PDC/Bind
65	PDC/CCX
44	PDC/Dasm
154	PDC/LibSrc/Math
16	PDC/LibSrc/Misc
18	PDC/LibSrc/Startup
46	PDC/LibSrc/StdIO
42	PDC/LibSrc/StdLib
62	PDC/LibSrc/StringLib
63	PDC/LibSrc/SysIO
417	PDC/LibSrc
26	PDC/Libr
69	PDC/Make
1	PDC/PDC/manx_include
492	PDC/PDC
148	PDC/bin
1300	PDC

PDC/bin contains a compiler (PDC) (only). This contains a preprocessor
but not a compiler driver. It was compiled with bcc after a little editing.
 
Sozobon (68000)
---------------
/* Copyright (c) 1988 by Sozobon, Limited.  Author: Johann Ruegg
 *
 * Permission is granted to anyone to use this software for any purpose
 * on any computer system, and to redistribute it freely, with the
 * following restrictions:
 * 1) No charge may be made other than reasonable charges for reproduction.
 * 2) Modified versions must be clearly marked as such.
 * 3) The authors are not responsible for any harmful consequences
 *    of using this software, even if they result from defects in it.
 */
 
I have an incomplete version from comp.sources.atari.st. 
 
208	soz/hcc
136	soz/top
139	soz/bin
486	soz

The bin directory contains a compiler (hcc) and a peephole optimizer (top).
These were compiled a while ago; I forget how.

gcc (alliant, convex, i386, i860, m68k, m88k, mips, nsc32k, pyr, sparc,
     spur, tahoe, vax)
----------------------
Gcc was written by Richard Stallman and many elves. The copyright is
readily available and too big to include here.

1770	gcc/config
8309	gcc

This includes about 1MB of objects. It is a really good compiler, but too
big to run on an 80286 or worse.

bcc (6809, 8086, 80386)
-----------------------
This is not exactly free (yet). Binaries are free.

24	as/work
17	as/bin
17	as/obj
12	as/6809
413	as

2	ld/6809
93	ld

56	sc/.examples
2	sc/6809
549	sc

The sizes include objects and some junk. This once ran and compiled itself
(in 4 minutes) in 40K text+data and 16K heap+stack on a 6809. It should be
easily portable to a 68000 at the expense of poor code generation (1 data
register. The 80*86 code is harmed surprisingly little by this).
---

To see how much space these take, I compiled kernel/tty.c - the biggest
program in the kernel. I reduced the stack allocation for everything
to find the minimum.

  text	  data	   bss	 stack
127852	 13248	  5200	160000	PDC
 81128	  3688	  2652	225000	hcc	(sozobon)
498384	  7336	 18224	270000	cc1	(main pass of gcc)
 50316	  2072	 13484	170000	cpp	(preprocessing pass of gcc)
 64656	  5472	 11700	135000	sc	(mine)
 58864	  5100	  6832	 30000	sc	(16-bit, needs separate cpp to fit)

Everything except gcc was compiled with bcc, so these sizes would shrink
20% with a better compiler or one making more space optimizations. Gcc
was compiled with itself and suffers a 10% size penalty from my assembler
not being able to determine branch lengths.
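
A quick way to read the size table against the 64K limit is sketched below.
It assumes that all four columns are in bytes and that a 16-bit target can
either put everything in one 64K segment (combined I&D) or give text its own
64K segment with data, bss and stack sharing a second one (separate I&D).
Both the function names and that memory model are assumptions made for
illustration, not something stated in the table.

```c
#define SEG_64K 65536L

/* Combined I&D: text, data, bss and stack all share one 64K segment. */
int fits_combined(long text, long data, long bss, long stack)
{
    return text + data + bss + stack <= SEG_64K;
}

/* Separate I&D: text gets one 64K segment; data, bss and stack share
   another. */
int fits_separate(long text, long data, long bss, long stack)
{
    return text <= SEG_64K && data + bss + stack <= SEG_64K;
}
```

Under those assumptions the 16-bit sc line (58864, 5100, 6832, 30000) fits
only with separate I&D: the data side adds up to 41932 bytes, while the grand
total of 100796 bytes is well over a single 64K segment.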

Compile times and word counts for the output (tty.s) (with no optimization)
were

 real   user+sys  lines  words  chars
 15     13.89     2774   5319   40168   PDC
  9      6.46     2383   4641   33872   hcc (might have -O)
 16      9.11     2279   5809   39672   cpp+cc1
  3:-)   2.41     3121   5831   34140   sc (mine)
  3      1.64                           sc (16-bit, on preprocessed file)
  8      5.22                           cpp (ACK cpp pass for 16-bit sc)
 27     22.34                           Minix cc

Differences in the disk cache size and state make the real times untrustworthy.
-- 
Bruce Evans		evans@ditsyda.oz.au