roman@gaudi.ccsf.caltech.edu (Roman Salvador) (11/01/90)
I would like to find out the time it takes (more or less) to write a compiler (i.e. a Fortran 90 one). Does anybody have any statistics or experiences?. And also, about the time for the different parts of it (i.e. lexical analysis, parsing, error checking, code generation, debugging + testing). Thanks, Roman (internet: roman@gaudi.caltech.edu) [It took me about a year part-time to write an F77 compiler based on the old Ritchie PDP-11 C compiler, but with no significant optimization beyond that needed to compile i=i+1 into one instruction. I suspect F90 would be considerably harder. By far the most work was the I/O system and all the nit picking to catch all the special cases I forgot. -John] -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.
marcel@vlsi.cs.caltech.edu (Marcel van der Goot) (11/06/90)
In <1990Oct31.180632.3201@nntp-server.caltech.edu> Roman Salvador (roman@gaudi.ccsf.caltech.edu) asks > I would like to find out the time it takes (more or less) to write a > compiler (i.e. a Fortran 90 one). Does anybody have any statistics or > experiences?. During the past one-and-a-half year (well, actually almost two years already) I have been working on a C-based programming language for message-passing multi-computers, ``mcc'' (soon to be released ...). The project involved the design of the language, writing a compiler for it, and writing a thesis about it. Unfortunately, I haven't really kept track of the time the different parts took me (shame on me), so my statistics are rather imprecise, but I thought my experiences might be of some help. The following account is rather long. My apologies for that. * Introduction The language is an extension of ansi C, with new constructs for defining, creating, and connecting processes, for sending and receiving messages, and for selecting communications. C was chosen because it is relatively small (we thought) and widely used. The compiler is supposed to be rather machine-independent and produces (standard) C rather than machine-code, hence no time was spent on building machine-code generators or code-optimizers --- I assume you have to actually generate assembly- (or machine-) code. * My background when I started the project 4 years of undergraduate study in computer science in the Netherlands (which rather emphasized formal methods, compared with US schools). Included was a two-term class about compiler construction, one term of which was spent extending a simple compiler. I was familiar with lexical scanners and parser generators, but had used different ones than lex and yacc. At Caltech I had taken a one-term class about ``logic programming and compiler construction,'' which was mostly devoted to the (construction of a) compiler for ``strand'' (a parallel programming language, sort-of but not quite a logic language). I guess my specialty is concurrent programming, not compiler writing. * Prototype Coming up with the basic ideas for the new language constructs, plus writing a prototype took me about 2 months. The prototype implemented only a limited part of the language (e.g., only integers could be sent between processes), and was mostly intended to demonstrate that one could indeed compile the language reasonably well into standard C. The prototype was written using lex and yacc, with most of the program flow controlled by the code statements included in the grammar. Basically no type-checking was done, and the output was produced in a rather ad-hoc fashion. * The real compiler I then started to write the real compiler, for the whole language. I had decided to do all the parsing and type-checking etc., rather than just some pattern-matching with a cpp-type preprocessor. I had also decided to do everything from scratch, rather than take, say, gcc, and hack it. Long discussions about the wisdom of these decisions are possible. (I have no idea whether there are F90 compilers around that you could change to suit your needs, or whether you also have to do everything from scratch.) My main experiences with this: - C is much bigger than I thought. Parsing C is hard (C is somewhat context-sensitive with respect to type names), type-checking is complicated by many automatic conversions and special cases. - Most books on compiler construction don't explain enough. Lexical scanning and parsing are very well understood, everyone knows how to do it, where to start, what to be careful about, etc. But now you have to do type-checking. It sounds so trivial, and is always displayed as such, but I soon found that it isn't. With relatively large programs it is very easy to choose a data-structure and then find out a month later that you don't have enough information available to make one particular check for one particular operator. - Lack of tools. We only had lex and yacc. I quickly found that putting all the compiler code in the grammar led to inconvenient control-flow, and decided to just build a parse-tree. I then wrote my own tool for describing operations in terms of this tree (when I have time, I may one day put this tool in the public domain). However, I think it is worth quite some effort to find out about other compiler writing tools that already exist. (And, if you have enough code, writing your own tools pays off.) - Compiling to C instead of assembly didn't help as much as expected. The main reason, I think, is that the extensions all have to do with parallel programming, and therefore almost any new construct is more complicated to translate than any of the standard C constructs. Writing this compiler (including the above-mentioned tool) took me about 7.5 months. Very roughly, I'd say 2.5 months parsing, 3.5 months type-checking, 1 month for the run-time system (0.5 ?). The compiler was then used in a class about concurrent programming, which more or less served to debug it. I'm proud to say that actually very few programming errors were found. Most of them were in the run-time system, which isn't very big but involves important things like process scheduling. The run-time system is not really related to compiler construction, I don't think you need anything like it for F90 (on the other hand, you need a library). With about 6000 lines, the compiler is much bigger than anything I wrote before (considering that I wrote it all, rather than modifying existing code), and also bigger than I had expected. * A better compiler I did find some errors and short-comings in the compiler that were not easy to fix. Things like inconvenient data structures, with not quite the right information; sometimes awkward modularization (use of program-generating tools sometimes makes this worse); etc. I therefore basically rewrote the compiler, although I copied substantial parts. This version is supposed to be more or less final. It took (well, it isn't quite finished yet) me about 5 months, interwoven with about 3 months of writing my master's thesis. * Advice (If you conclude from the above that I obviously did everything completely wrong then probably this advice isn't much worth --- let me know if that's the case, though.) - Spend time doing serious planning (that's rather obvious; I knew it, but still the sheer size of the project and the program took me by surprise). - Look around for other tools than lexical scanner-generators and parser-generators. Quite likely it pays off. - You can use a prototype to get some idea of what modules, data-structures, etc. you need, but write the actual program from scratch (for heated discussions about this, write to comp.software-eng). - Be prepared to rewrite the whole program --- almost certainly you'll find at the end, after having used the compiler somewhat, that there are some things you'd like to add that don't fit in, or that parts of the compiler are needlessly complicated, etc. - Make detailed documentation, even for parts that only you are going to look at. For instance, write down exactly what code you generate for each construct, for all different situations. Without such detailed descriptions it is hard to convince yourself (or others) that each part of the program is correct, and that all the parts fit well together. - If possible, don't do it alone. Although a program this size can be written by a single person (I've been told that real programs are at least 10^5 lines --- how do they do that?), I don't think it is a good idea. It helps if there is someone with whom you can really discuss all the details of the program, rather than just general ideas. It helps to get you through slumps when someone else needs your code now. I think that with two persons the additional burden of having to cooperate shouldn't be too large, and is in fact probably beneficial. Well, that was quite a lot. I'm sorry that I cannot give a more precise account of the time spent on each of the parts. Hope it helps. Marcel van der Goot marcel@vlsi.cs.caltech.edu [When writing my F77 compiler, I found the library, particularly the I/O library, to be as much work as the compiler itself. -John] -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!esegue!compilers. Meta-mail to compilers-request.