roman@gaudi.ccsf.caltech.edu (Roman Salvador) (11/01/90)
I would like to find out roughly how long it takes to write a compiler (e.g. a Fortran 90 one). Does anybody have any statistics or experiences? I would also like to know how the time breaks down among the different parts (i.e. lexical analysis, parsing, error checking, code generation, debugging + testing).

Thanks, Roman (internet: roman@gaudi.caltech.edu)

[It took me about a year part-time to write an F77 compiler based on the old Ritchie PDP-11 C compiler, but with no significant optimization beyond that needed to compile i=i+1 into one instruction. I suspect F90 would be considerably harder. By far the most work was the I/O system and all the nit-picking to catch all the special cases I forgot. -John]
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.
meissner@osf.org (11/06/90)
| From: roman@gaudi.ccsf.caltech.edu (Roman Salvador)
|
| I would like to find out the time it takes (more or less) to write a
| compiler (e.g. a Fortran 90 one).

It depends on lots of different factors, such as:

  - the skill level and experience of the person(s) doing the work;
  - the complexity of the language proper;
  - the complexity of the library (if provided);
  - the tools available;
  - doing 'just' a front end and using an existing back end, or writing
    the whole ball of wax;
  - the target machine(s);
  - the desired optimization level(s).

As another data point, I wrote the front end for the Data General MV/Eclipse C compiler from scratch, to plug into the common language code generator and optimizer. It took about 9 months from the start of the project until I had a reasonable compiler (and another couple of years until it was rock solid). I seem to recall the library took about 4-6 months for the first cut, but since I didn't do much of this work, and we eventually rewrote a lot of it to tie in with a UNIX emulation library developed elsewhere, I may be off on this. I had a free hand in writing the front end, since the previous person on the project had left before doing much code development and I nuked what was left.

Some things that I remember about this:

  - I had to develop the compiler in PL/1, and it took some time to learn
    the language.
  - I had to relearn C, since I had last used the V6 C compiler, and the
    language had changed since then.
  - I was still a new programmer (I had graduated about 1-2 years
    previously).
  - The parser generator that I used had some severe restrictions in the
    early days, since it originally ran in 16-bit mode and could not
    handle 16 levels of precedence plus the rest of C, so I had to do
    operator precedence in PL/1 rather than in the grammar.
  - The reference manual (K&R) had some subtle flaws in it: you couldn't
    derive, from the grammar, a function that returned a pointer to a
    function and took some arguments (a short illustration follows at
    the end of this message).
  - Convincing the code generator and optimizer to add new features that
    C needed (notably the ?: operator).
  - Convincing the debugger to expand the common symbol table format to
    handle user-defined types and C-specific items.

--
Michael Meissner   email: meissner@osf.org   phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142
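To see the kind of declaration Meissner is referring to, here is a minimal C sketch (the names f, x, and handler are made up for illustration; this is my example, not his PL/1 front end):

    /* f takes an int and returns a pointer to a function that
       itself takes an int and returns an int.  This is the shape
       of declaration that, per Meissner, could not be derived
       from the K&R grammar even though compilers accepted it. */
    int (*f(int x))(int);

    /* A typedef makes the same type easier to read: */
    typedef int handler(int);
    handler *g(int x);          /* same shape as f */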
holliday@csgrad.cs.vt.edu (Glenn Holliday) (11/06/90)
As a truly weird data point: after completing a design for a graphics-to-database compiler, the funding organization forced us to do the coding in Cobol. Yes, it _is_ possible to write a compiler in any language, but it took us more than a year to get it up and running in Cobol.

Glenn Holliday   holliday@csgrad.cs.vt.edu
marcel@vlsi.cs.caltech.edu (Marcel van der Goot) (11/06/90)
In <1990Oct31.180632.3201@nntp-server.caltech.edu> Roman Salvador
(roman@gaudi.ccsf.caltech.edu) asks

> I would like to find out the time it takes (more or less) to write a
> compiler (e.g. a Fortran 90 one). Does anybody have any statistics or
> experiences?

During the past year and a half (well, actually almost two years already) I have been working on a C-based programming language for message-passing multicomputers, ``mcc'' (soon to be released ...). The project involved the design of the language, writing a compiler for it, and writing a thesis about it. Unfortunately, I haven't really kept track of the time the different parts took me (shame on me), so my statistics are rather imprecise, but I thought my experiences might be of some help. The following account is rather long. My apologies for that.

* Introduction

The language is an extension of ANSI C, with new constructs for defining, creating, and connecting processes, for sending and receiving messages, and for selecting communications. C was chosen because it is relatively small (we thought) and widely used. The compiler is supposed to be rather machine-independent and produces (standard) C rather than machine code, hence no time was spent on building machine-code generators or code optimizers --- I assume you have to actually generate assembly (or machine) code.

* My background when I started the project

4 years of undergraduate study in computer science in the Netherlands (which rather emphasized formal methods, compared with US schools). Included was a two-term class about compiler construction, one term of which was spent extending a simple compiler. I was familiar with lexical scanners and parser generators, but had used different ones than lex and yacc. At Caltech I had taken a one-term class about ``logic programming and compiler construction,'' which was mostly devoted to the (construction of a) compiler for ``strand'' (a parallel programming language, sort-of but not quite a logic language). I guess my specialty is concurrent programming, not compiler writing.

* Prototype

Coming up with the basic ideas for the new language constructs, plus writing a prototype, took me about 2 months. The prototype implemented only a limited part of the language (e.g., only integers could be sent between processes), and was mostly intended to demonstrate that one could indeed compile the language reasonably well into standard C. The prototype was written using lex and yacc, with most of the program flow controlled by the code statements included in the grammar. Basically no type checking was done, and the output was produced in a rather ad hoc fashion.

* The real compiler

I then started to write the real compiler, for the whole language. I had decided to do all the parsing and type checking etc., rather than just some pattern matching with a cpp-type preprocessor. I had also decided to do everything from scratch, rather than take, say, gcc, and hack it. Long discussions about the wisdom of these decisions are possible. (I have no idea whether there are F90 compilers around that you could change to suit your needs, or whether you also have to do everything from scratch.) My main experiences with this:

- C is much bigger than I thought. Parsing C is hard (C is somewhat context-sensitive with respect to type names; a short illustration appears just before the Advice section below), and type checking is complicated by many automatic conversions and special cases.

- Most books on compiler construction don't explain enough.
  Lexical scanning and parsing are very well understood; everyone knows how to do it, where to start, what to be careful about, etc. But then you have to do type checking. It sounds so trivial, and is always presented as such, but I soon found that it isn't. With relatively large programs it is very easy to choose a data structure and then find out a month later that you don't have enough information available to make one particular check for one particular operator.

- Lack of tools. We only had lex and yacc. I quickly found that putting all the compiler code in the grammar led to inconvenient control flow, and decided to just build a parse tree (a sketch of the difference appears at the end of this message). I then wrote my own tool for describing operations in terms of this tree (when I have time, I may one day put this tool in the public domain). However, I think it is worth quite some effort to find out about other compiler-writing tools that already exist. (And, if you have enough code, writing your own tools pays off.)

- Compiling to C instead of assembly didn't help as much as expected. The main reason, I think, is that the extensions all have to do with parallel programming, and therefore almost any new construct is more complicated to translate than any of the standard C constructs.

Writing this compiler (including the above-mentioned tool) took me about 7.5 months. Very roughly, I'd say 2.5 months parsing, 3.5 months type checking, 1 month for the run-time system (0.5?). The compiler was then used in a class about concurrent programming, which more or less served to debug it. I'm proud to say that actually very few programming errors were found. Most of them were in the run-time system, which isn't very big but involves important things like process scheduling. The run-time system is not really related to compiler construction; I don't think you need anything like it for F90 (on the other hand, you need a library). At about 6000 lines, the compiler is much bigger than anything I wrote before (considering that I wrote it all, rather than modifying existing code), and also bigger than I had expected.

* A better compiler

I did find some errors and shortcomings in the compiler that were not easy to fix --- things like inconvenient data structures with not quite the right information; sometimes awkward modularization (use of program-generating tools sometimes makes this worse); etc. I therefore basically rewrote the compiler, although I copied substantial parts. This version is supposed to be more or less final. It took me about 5 months (well, it isn't quite finished yet), interwoven with about 3 months of writing my master's thesis.
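To illustrate the earlier point that C is context-sensitive with respect to type names, here is a minimal example of my own (not from mcc; the names T, f, and x are made up):

    /* With the typedef in scope, "T * x" declares x as a
       pointer to T. */
    typedef int T;

    void f(void)
    {
        T * x = 0;      /* a declaration of x */
        (void)x;
    }

    /* Had T instead been an ordinary variable, e.g.
           int T, x;
       the identical tokens "T * x;" would parse as an expression
       statement multiplying T by x.  A context-free grammar alone
       cannot distinguish the two cases; the lexer must consult the
       symbol table to tell the parser whether T names a type. */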
* Advice

(If you conclude from the above that I obviously did everything completely wrong, then this advice probably isn't worth much --- let me know if that's the case, though.)

- Spend time doing serious planning (that's rather obvious; I knew it, but still the sheer size of the project and the program took me by surprise).

- Look around for tools other than lexical-scanner generators and parser generators. Quite likely it pays off.

- You can use a prototype to get some idea of what modules, data structures, etc. you need, but write the actual program from scratch (for heated discussions about this, write to comp.software-eng).

- Be prepared to rewrite the whole program --- almost certainly you'll find at the end, after having used the compiler somewhat, that there are some things you'd like to add that don't fit in, or that parts of the compiler are needlessly complicated, etc.

- Make detailed documentation, even for parts that only you are going to look at. For instance, write down exactly what code you generate for each construct, for all different situations. Without such detailed descriptions it is hard to convince yourself (or others) that each part of the program is correct, and that all the parts fit well together.

- If possible, don't do it alone. Although a program this size can be written by a single person (I've been told that real programs are at least 10^5 lines --- how do they do that?), I don't think it is a good idea. It helps if there is someone with whom you can really discuss all the details of the program, rather than just general ideas. It helps to get you through slumps when someone else needs your code now. I think that with two people the additional burden of having to cooperate shouldn't be too large, and is in fact probably beneficial.

Well, that was quite a lot. I'm sorry that I cannot give a more precise account of the time spent on each of the parts. Hope it helps.

Marcel van der Goot
marcel@vlsi.cs.caltech.edu

[When writing my F77 compiler, I found the library, particularly the I/O library, to be as much work as the compiler itself. -John]
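Here is the sketch promised under ``Lack of tools'' above: the difference between doing all the work inside yacc's semantic actions and merely building a parse tree there, so that type checking and code generation become ordinary passes over the finished tree. The node layout and names below are my own illustration, not mcc's.

    #include <stdio.h>
    #include <stdlib.h>

    enum op { OP_NUM, OP_ADD, OP_MUL };

    struct node {
        enum op op;
        int value;                   /* OP_NUM leaves only      */
        struct node *left, *right;   /* interior nodes only     */
    };

    static struct node *mknode(enum op op, struct node *l, struct node *r)
    {
        struct node *n = malloc(sizeof *n);
        n->op = op; n->left = l; n->right = r; n->value = 0;
        return n;
    }

    static struct node *mkleaf(int v)
    {
        struct node *n = mknode(OP_NUM, NULL, NULL);
        n->value = v;
        return n;
    }

    /* A separate pass over the finished tree; this toy evaluator
       stands in for type checking or code generation. */
    static int walk(const struct node *n)
    {
        if (n->op == OP_NUM) return n->value;
        return n->op == OP_ADD ? walk(n->left) + walk(n->right)
                               : walk(n->left) * walk(n->right);
    }

    int main(void)
    {
        /* 2 + 3 * 4, as a yacc action like
             { $$ = mknode(OP_ADD, $1, $3); }
           would build it. */
        struct node *t = mknode(OP_ADD, mkleaf(2),
                                mknode(OP_MUL, mkleaf(3), mkleaf(4)));
        printf("%d\n", walk(t));     /* prints 14 */
        return 0;
    }

With this style the grammar actions stay one line long, and the inconvenient control flow Marcel describes moves into passes the compiler writer fully controls.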
ge@sci.kun.nl (Ge' Weijers) (11/07/90)
>[When writing my F77 compiler, I found the library, particularly the I/O
>library, to be as much work as the compiler itself. -John]

On the subject of runtime support, I'd like to point out a very interesting article on the RTS for a functional language by Andrew Appel:

    Andrew W. Appel, "A Runtime System,"
    Lisp and Symbolic Computation, no. 3, 1990, pp. 343-380.

The size especially is impressive: < 2000 lines of C + assembler, including garbage collection. I/O is done by writing the library in the functional language itself (Standard ML).
--
Ge' Weijers                                  Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,     (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1
6525 ED Nijmegen, the Netherlands            tel. +3180652483 (UTC-2)
bright@nazgul.UUCP (Walter Bright) (11/08/90)
In article <1990Oct31.180632.3201@nntp-server.caltech.edu> roman@gaudi.ccsf.caltech.edu (Roman Salvador) writes:
< I would like to find out the time it takes (more or less) to write a
<compiler (i.e. a Fortran 90 one). ...
It pretty much depends on whether you are writing a research toy or plan to
sell the compiler. If you plan on selling it, triple all times you expect.
From personal experience, I started writing a C compiler in 1982. 2 years
later, it was shipped (Datalight C). I have been working on it steadily
ever since. It will never be finished.
A global optimizer takes 1 year to write, and then 1 year to debug out in
the field. Some companies never get it to work reliably :-).
A C compiler is about twice as complex as a Pascal compiler. C++ is twice
as complex as C. Starting from a working C compiler, it takes 1 to 2 years
to add C++ capability to it.
pohua@polaris.crhc.uiuc.edu (Pohua P. Chang) (11/09/90)
I have been involved with the IMPACT-I C compiler project for the past 3+ years. Exact dates have not been recorded, but here is a very imprecise estimate.

The front end, including semantic analysis, took about 4 man-months to code, but subsequently more than another 6 months to accommodate some program/system specifics, e.g. initializing a typedef struct. A friend of mine coded a code generator for the DECstation (MIPS-R2000) in about 6 months. He spent much of his time on instruction-set-specific optimizations, e.g. constant preloading, and converting some compare-branch sequences to compare-against-zero and to faster branches (beq, bne). Fine-tuning the register allocator was also a very difficult task, because it required us to modify the decision component of some code optimizations. So, if we add up the time to implement the front end and a code generator, it takes about 1 man-year to establish a reasonable compiler framework.

Implementing local code optimization is a simple task and may be done in just a few weeks. Implementing global code optimization is quite time-consuming because of the need to build a large set of FAST dataflow analysis modules. If getting the compiler out quickly is the main concern, I would say skip global code optimization and go straight to loop optimizations. Given more time, spend some time on machine-dependent code optimizations. Be prepared to spend as much time on debugging as on coding the code optimizer; debugging assembly code can be a great pain, especially after midnight. 6 man-months is a very conservative estimate of the time needed to build a naive code optimizer that does not produce embarrassing code.

If your project requires your compiler to produce high-quality code, I would recommend that the register allocator be carefully fine-tuned. On top of a graph-coloring algorithm, reuse parameter registers, the frame pointer, spill registers, and other reserved registers whenever possible. Also, take the most advantage of the caller/callee-save convention. Registers are a limiting resource (at least on the MIPS-R2000). Many code optimizations have to be toned down so that they don't introduce too many live ranges. If possible, integrate the constant preloading optimization into the register allocation algorithm.

In order to achieve a performance level close to that of the best commercial C compilers, such as the MIPS C compiler, many other implicit costs become apparent. For example, without a fairly powerful memory disambiguator, accesses to some global scalar variables and fields cannot be moved out of loops. For another example, general loop unrolling (with non-constant loop bounds) can introduce some overhead cost, and should be applied only if the number of iterations is large enough; but how do we get that information at compile time? (A sketch of this case appears right after the list below.) A code optimizer as powerful as that of the MIPS C compiler would take man-years.

Other than classical code optimizations, there are some interesting features that may be useful:

  1. function inline expansion (make sure the register allocator is good
     enough);
  2. integrating a feedback loop into the compiler, in the form of an
     automatic profiler;
  3. selective code duplication that allows frequently executed regions
     to be customized.
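Here is the promised sketch of the unrolling situation, in C for concreteness (my own illustration; the unrolling factor and names are made up): with a non-constant trip count, the compiler must emit a remainder loop and extra bookkeeping, which only pays off when n turns out to be large --- exactly the information the compiler lacks.

    /* Original loop: n is unknown at compile time. */
    void scale(double *a, double k, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            a[i] *= k;
    }

    /* Unrolled by 4: the main loop does four iterations' work per
       trip, and a remainder loop handles the n % 4 leftovers.  For
       small n, the extra test and the remainder loop are pure
       overhead. */
    void scale_unrolled(double *a, double k, int n)
    {
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            a[i]     *= k;
            a[i + 1] *= k;
            a[i + 2] *= k;
            a[i + 3] *= k;
        }
        for (; i < n; i++)      /* remainder loop */
            a[i] *= k;
    }

This is one place where the profile feedback mentioned in item 2 above helps: measured trip counts can tell the optimizer which loops are worth unrolling.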
We have implemented an automatic profiler in three forms: 1. source code preprocessing, 2. high-level intermediate form, and 3. assembly code level. They are equally effective (a minimal sketch of the first form follows at the end of this message). For a person who totally understands the structure of the compiler, it takes just weeks to implement the profiler and integrate it into the compiler. However, modifying the decision components of the code optimizer to use the profile information, as well as that from loop analysis, can be tricky.

The time required to implement function inline expansion depends on whether you want an intra-file or an inter-file expander. Implementing inter-file function inline expansion is difficult, and the benefit of function inline expansion for conventional processors is not dramatic, so don't try this early in your project.

Well, that's my 2 cents' worth.

Pohua Paul Chang
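To make the source-code-preprocessing form concrete, here is a minimal sketch of the kind of counter insertion such a profiler might perform. This is my own illustration, not IMPACT's actual output; the file name, counter layout, and block numbering are all made up. The instrumented branches bump a per-basic-block counter, and an exit handler dumps the counts for the compiler to read back on the next run.

    #include <stdio.h>
    #include <stdlib.h>

    #define NBLOCKS 2               /* chosen by the preprocessor */
    static long bb_count[NBLOCKS];  /* one counter per basic block */

    static void dump_counts(void)
    {
        /* The optimizer reads this file back and uses the counts to
           decide which branches and loops to favor. */
        FILE *fp = fopen("profile.out", "w");
        int i;
        if (fp == NULL) return;
        for (i = 0; i < NBLOCKS; i++)
            fprintf(fp, "%d %ld\n", i, bb_count[i]);
        fclose(fp);
    }

    int main(void)
    {
        int i, y = 0;
        atexit(dump_counts);
        for (i = -5; i < 5; i++) {
            if (i > 0) { bb_count[0]++; y += 2 * i; }  /* inserted */
            else       { bb_count[1]++; y -= i;      } /* inserted */
        }
        printf("y = %d\n", y);
        return 0;
    }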
beckmann@das.harvard.edu (Gary Beckmann) (11/10/90)
This should maybe go into alt.hack, but since I didn't do it myself and since I know how long it took, I'm sending it to comp.compilers. This was not a production-quality compiler.

I introduced a friend of mine to the wonders of C a few years back. His company (whose name and location will remain secret--for obvious reasons as you read on) would not support the buying of a compiler for his IBM PC clone. All he had was BASIC. So he wrote a C compiler in BASIC. It took him seven months, working on the side.

He was dissatisfied with the performance of the compiler, so he then proceeded to write a BASIC compiler in C!! (My neck still aches from the head-shaking I did when he informed me of his plans.) That took him only three months -- he claimed the speed was due to his familiarity with BASIC and the relatively simple flavor of BASIC with which he was working.

He was pleased with this "project" and decided to port it all to an Apple II which he had. This port was never completed -- I believe he failed due to the radical differences in BASIC.

To repeat: this was NOT a production-quality compiler (I think he even had some restrictions on the syntax). Hope I haven't horrified too many . . .

Gary Beckmann
beckmann@das.harvard.edu
Chuck_Lins.SIAC_QMAIL@gateway.qm.apple.com (Chuck Lins) (11/13/90)
I have been working on an Object Oberon compiler in my spare time. While Object Oberon is a much simpler language than Fortran 90, I have kept track of the hours invested in this project. First some background and a brief status report.

Background
----------
The front end and the shell for a back end were obtained from ETH Zurich at a cost of 1000 Swiss francs. It contained a portable scanner, lexical analyzer, parser, and abstract-syntax-tree builder. The front end covers the language Oberon; Object Oberon is a superset of Oberon. The back-end shell consists of an abstract-syntax-tree traversal module and stubs for calls to a lower-level code generator. The job of the compiler writer is to fill in these stubs as well as to add a machine-dependent low-level code generator. The compiler itself is written in Oberon. The target processor is the Motorola MC68000 family. Target hardware is the Macintosh (tm). Target programming environment is MPW (Macintosh Programmer's Workshop).

Status
------
I started with a copy of the MacOberon system from ETH. This is a standalone application running on the Macintosh. It contains an Oberon compiler in object form. Thus the first task was to write a cross-compiler running under MacOberon but producing object code for MPW. Once this was done, I created another copy of the compiler source code (based on the cross-compiler). This version was modified to run under MPW and produce code for MPW. The main task here was replacing calls to the Oberon environment/libraries with calls to semantically equivalent or similar MPW libraries. As of today (11/12/90) I have completed work on the cross-compiler and have a working compiler running under MPW. The compiler compiles itself under MPW.

How Long It Took
----------------
To date, I have spent 399.5 hours on both the cross-compiler and the native compiler. I started on July 24, 1990. After 286 hours of work, on Oct 15 I got the native compiler to link successfully. After another 44 hours of debugging I was able to start using the native compiler to compile itself. 35 more hours of debugging and the native compiler was able to compile itself. Since then I've been using the native compiler to add enhancements to itself as well as to improve its code generation.

What Gave Me Problems
---------------------
It took me quite a while to get the comparison and branching instructions correct. Mapping the source-language operands onto the hardware instructions was very time-consuming. I remember spending quite a bit of my time in MacsBug (the machine-level debugger) single-stepping machine instructions from a known 'good' point in the compiler. You had to do this when you jumped off into space somewhere. I had one nasty bug in referencing VAR parameters: unfortunately, you'd sometimes overwrite part of the return address (or destroy a pointer) and jump into the twilight zone. The good part was that I found four other bugs while searching for this one. Another source of errors was my failure to check for special cases of variable addressing modes, specifically things like VAR parameters and accessing parameters at outer nesting levels.

What Went Well
--------------
When I did the initial (paper) design, I reduced the 68000 addressing modes to four categories: immediate operand, register value, memory operand, and condition code. Then I created a table for each AST node (e.g. integer ADD, logical AND, procedure CALL, etc.). Each AST node has (at most) two operands and one result operand.
I then made N rows covering all the operands as input and output values. I was influenced here by Horspool's ideas in "An Alternative to the Graham-Glanville Code-Generation Method," IEEE Software, May 1987. An example table for integer addition would be as follows (where op = operator, res = result, l = left subtree, r = right subtree, R = register, O = operand, I = immediate, and CC = condition code):

    op   res   l   r
    ----------------
    +    R     R   O
    +    R     O   O
    +    R     R   R
    +    R     O   R
    +    R     R   I
    +    R     I   R
    etc.

Each row was then filled in with the assembly-language source statements necessary to produce the proper result. This was where I forgot the special cases: the generic operand sometimes needed object code to produce a simpler addressing mode. It was easy to fix once I recognised the problem (even though I had all the code written for the code generator). (A small sketch of this table-driven scheme appears at the end of this message.)

I think using trees worked out well. It was relatively easy to generate correct code from them. It may not be the most highly optimized object code, but it's not terrible. At present, the compiler compiles the largest of its source modules (63K) in about 75 seconds. Improvements remain in the area of array indexing on parameters: my current code involves a multiply, but I can take advantage of type information to eliminate most of these, which should yield a good performance improvement.

Now I can concentrate on optimizations, adding the object-oriented extensions, and the other capabilities necessary for a production-quality compiler.

Chuck Lins
lins@apple.com
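Here is the promised sketch of the table-driven scheme, rendered in C to make it concrete. This is my own illustration under made-up names (struct rule, the %l/%r placeholder notation, the emit templates), not Lins's Oberon source: each row maps an operator and the categories of its two subtrees to an instruction template and the category of the result.

    /* Operand categories from the paper design:
       R = register, O = operand, I = immediate, CC = condition code. */
    enum opnd { R, O, I, CC };

    /* One row of the code-generation table. */
    struct rule {
        char        op;          /* AST operator, e.g. '+'            */
        enum opnd   res, l, r;   /* result, left, right categories    */
        const char *emit;        /* 68000-flavored template; %l and %r
                                    are placeholder notation invented
                                    here for the two operands          */
    };

    static const struct rule add_rules[] = {
        { '+', R, R, O, "ADD.L   %r,%l" },
        { '+', R, O, O, "MOVE.L  %l,D0\nADD.L   %r,D0" },
        { '+', R, R, R, "ADD.L   %r,%l" },
        { '+', R, R, I, "ADDI.L  #%r,%l" },
        /* ... remaining operand combinations, as in the table above */
    };

    /* The tree walker finds the row matching the operator and the
       categories of the two subtrees, emits the template with the
       placeholders filled in, and records that the result is now in
       category res.  The special cases Lins mentions arise when an
       operand must first be reduced to a simpler addressing mode
       before any row applies. */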
rst@cs.hull.ac.uk (Rob Turner) (11/15/90)
Chuck Lins writes:

> [stuff about the Object Oberon compiler he has written]
>
> Now I can concentrate on optimizations, adding the object oriented
> extensions, and the other capabilities necessary for a production quality
> compiler.

Does this mean that many of the optimizations that will eventually appear in the compiler have not been thought about yet? In my experience, most compilers have been written in this way: they are designed without consideration of optimizations and/or language enhancements, and these are then tacked on later, often resulting in a less than elegant end product. Maybe if things like optimization were taken into consideration at the outset, a lot of compilers would be designed differently.

This is in no way a criticism of Chuck's compiler. It is just a thought.

Rob
--
Robert Turner | rst@cs.hull.ac.uk
Department of Computer Science
University of Hull
Hull HU6 7RX
England