eric@snark.uu.net (Eric S. Raymond) (05/07/90)
Yes, you probably thought it would never happen. But here it is, in all its hideous glory...C-INTERCAL!

That's right. A real, live INTERCAL compiler written in portable C. Took me a bit more than two days. See, it generates C...I let cc do the *hard* part.

I caught an error in the INTERCAL manual's sample program with this compiler (missing #). It even has an optimizer that does constant folding.

For your enlightenment and edification, here are three of the doc files in my release:

READ.ME:
--------------------------------------------------------------------------
                        C-INTERCAL (v 0.1)

This package is an implementation of the language INTERCAL designed by Don Woods and James Lyon, who have since spent most of twenty years trying to live it down.

The implementation was created by Eric S. Raymond (...!uunet!snark!eric) during a fit of lunacy from which he has since mostly recovered.

The files included are:

READ.ME        -- this file
intercal.man   -- The INTERCAL manual (read this next!)
THEORY         -- some notes on the internals of the INTERCAL compiler
BUGS           -- notes pertaining to this release
Makefile       -- makefile for the INTERCAL compiler
lexer.l        -- the lexical analyzer specification (in LEX)
ick.y          -- the grammar specification (in YACC)
ick.h          -- compilation types and defines
feh.c          -- INTERCAL-to-C code generator
fiddle.c       -- the INTERCAL operators
lose.[ch]      -- INTERCAL compile- and run-time error handling
ick-wrapper.c  -- the driver for generated C-from-INTERCAL code
cesspool.c     -- the INTERCAL runtime support code
cesspool.h     -- interface fr. generated code to the INTERCAL runtime support
sample.i       -- a sample INTERCAL program (from the manual)

You want a man page? Man pages are for wimps. To compile an INTERCAL program `foo.i' to executable code, just do

        ick foo.i

There's a -d option that leaves the generated `foo.c' in place for inspection (suppressing compilation to machine code), and an -O option that enables the (hah!) optimizer. Other than that, yer on yer own.
Report bugs, if you absolutely must, to the author. Or post them to alt.intercal.bugs. Or something.
--------------------------------------------------------------------------

THEORY:
--------------------------------------------------------------------------
                   INTERCAL IMPLEMENTOR'S NOTES

This C-INTERCAL compiler is a very conventional implementation using YACC and LEX. Each line of INTERCAL is translated into a C if()-then; the guard part is used to implement abstentions and RESUMES, and the arm part translates the `body' of the corresponding INTERCAL statement.

The generated C code is plugged into the template file ick-wrapper.c inside main(). It needs to be linked with cesspool.o, fiddle.o and lose.o. Cesspool.o is the code that implements the storage manager; fiddle.o implements the INTERCAL operators; and lose.o is the code that generates INTERCAL's error messages.

The abstain[] array in the generated C is used to track line and label abstentions; if member i is on, the statement on line i is being abstained from. Labels are mapped to line numbers in the code checker, just before optimization.

The gerund variables at the top of the template file are used to implement gerund abstentions (ABSTAIN FROM CALCULATING etc). The guard part of each generated C statement checks the appropriate variable.

RESUMES are implemented by setting skipto to the location at which execution is to resume, then jumping to the top of the program and skipping all the guard/arm pairs before the proper NEXT.

The parser builds an array of tuples, one for each INTERCAL statement. Most tuples have node trees attached. Once all tuples have been generated, the compile-time checker and optimizer phases can do consistency checks and expression-tree rewrites. Finally, the tuples are ground out as C code by the emit() function.

The optimizer does constant folding for all five operators. It also checks for the idiom for `test for nonzeroness'.
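
As an illustration only (this is not the code in fiddle.c, and every function name below is made up for this sketch), the five operators the optimizer folds can be written in plain C roughly as follows. The definitions follow the INTERCAL manual as I read it: mingle interleaves the bits of two 16-bit operands, select packs together the bits of its first operand picked out by the 1 bits of its second, and each unary operator combines a value with itself rotated right by one bit. The explicit width argument is an assumption of the sketch, there to show where the operand size enters.

```c
#include <stdint.h>

/* Mingle: interleave two 16-bit values into 32 bits.
   Bit i of a lands at bit 2i+1, bit i of b at bit 2i. */
uint32_t mingle(uint16_t a, uint16_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 16; i++) {
        r |= (uint32_t)((a >> i) & 1u) << (2 * i + 1);
        r |= (uint32_t)((b >> i) & 1u) << (2 * i);
    }
    return r;
}

/* Select: keep the bits of a where mask has a 1 bit and
   pack them together at the low end of the result. */
uint32_t select_bits(uint32_t a, uint32_t mask)
{
    uint32_t r = 0;
    int out = 0;                      /* next free bit in the result */
    for (int i = 0; i < 32; i++) {
        if ((mask >> i) & 1u) {
            r |= ((a >> i) & 1u) << out;
            out++;
        }
    }
    return r;
}

/* Rotate x right by one bit within a field of `width` bits (16 or 32). */
uint32_t rotr1(uint32_t x, int width)
{
    uint32_t field = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    x &= field;
    return ((x >> 1) | ((x & 1u) << (width - 1))) & field;
}

/* Unary AND, OR, XOR: combine each bit with its neighbour, wrapping
   around at the top of the field. */
uint32_t unary_and(uint32_t x, int width) { return x & rotr1(x, width); }
uint32_t unary_or(uint32_t x, int width)  { return x | rotr1(x, width); }
uint32_t unary_xor(uint32_t x, int width) { return x ^ rotr1(x, width); }
```

With these definitions, select_bits(179, 201) is 9 and unary_or(77, 16) is 32879, while unary_or(77, 32) is 2147483759: the wrapped-around bit lands in a different place depending on the width.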

Calculations are fully type-checked at compile time; they have to be because (as I read the manual) the 16- and 32-bit versions of the unary ops do different things. The only potential problem here is that the typechecker has to assume that :m ~ :n has the type of :n (32-bit) even though the result might fit in 16 bits.

At run-time everything is calculated in 32 bits. When INTERCAL-72 was designed 32 bits was expensive; now it's cheap. Really, the only reason for retaining a 16-bit type at all is for the irritation value of it (yes, C-INTERCAL *does* enforce the 16-bit limit on constants).
--------------------------------------------------------------------------

BUGS:
--------------------------------------------------------------------------
NEW FEATURES IN C-INTERCAL

1. As a convenience to all you junior birdmen out there, `NINER' is accepted as a synonym for `NINE' in INTERCAL input.

2. The COME FROM statement is now compiled. You may write

        PLEASE COME FROM (n)

and the effect will be that whenever execution reaches statement label n, it will immediately transfer control to the statement following the COME FROM. Conditional COME FROM is possible; ABSTAIN FROM COMING FROM and REINSTATE COMING FROM are both valid. DON'T COME FROM is a no-op (until reinstated). Finally, NEXTING to the label `target' of an un-abstained COME FROM *will* cause control to be transferred to the statement following the COME FROM.

BUGS

1) INTERCAL would be intrinsically a crock even if it worked right.

2) The INTERCAL-72 * syntax is not implemented, because I couldn't figure out how the frotz it's supposed to work. This means C-INTERCAL can't compile the INTERCAL-72 system library. Isn't compatibility wonderful?

3) Error-checking could be improved. Not all the errors listed in the manual are actually detected (of those listed in lose.h, E123, E621, E632, E579, E436, E017, E275, E182, E129, E139, and E778 are implemented).
In this respect C-INTERCAL follows nobly in the tradition of many production compilers.

4) Some of the runtime library is stubbed out. I'll let someone *else* implement the "butchered Roman numerals" and the fleepin' dynamic array handling.

TO DO

1. Test this loser on something other than the sample program.

2. Add more optimization templates, esp. the idioms for &, |, ^, ~.

3. Switch for output in "clockface" mode, for superstitious users who believe writing "IV" upside-down offends IVPITER and would rather see IIII.

4. Input format internationalization -- allow WRITE IN input digits in major languages such as Nahuatl, Tagalog, Sanskrit, and Basque.

5. Forget this @!%$#! crock and take a long vacation.
--------------------------------------------------------------------------

I will release this puppy (and entirely wash my hands of it) once I've cleared up a few technical questions with the language's originators.
--
Eric S. Raymond = ...!uunet!snark!eric (mad mastermind of TMN-Netnews)