[comp.misc] C-INTERCAL progress report

eric@snark.uu.net (Eric S. Raymond) (05/07/90)

Yes, you probably thought it would never happen. But here it is, in all its
hideous glory...C-INTERCAL!

That's right. A real, live INTERCAL compiler written in portable C. Took me
a bit more than two days. See, it generates C...I let cc do the *hard* part.

I caught an error in the INTERCAL manual's sample program with this compiler
(missing #). It even has an optimizer that does constant folding.

For your enlightenment and edification, here are three of the doc files in
my release:

READ.ME:
--------------------------------------------------------------------------
			C-INTERCAL (v 0.1)

This package is an implementation of the language INTERCAL designed by Don
Woods and James Lyon, who have since spent most of twenty years trying to
live it down.

The implementation was created by Eric S. Raymond (...!uunet!snark!eric)
during a fit of lunacy from which he has since mostly recovered. The files
included are:

READ.ME		-- this file
intercal.man	-- The INTERCAL manual (read this next!)
THEORY		-- some notes on the internals of the INTERCAL compiler
BUGS		-- notes pertaining to this release

Makefile	-- makefile for the INTERCAL compiler
lexer.l		-- the lexical analyzer specification (in LEX)
ick.y		-- the grammar specification (in YACC)
ick.h		-- compilation types and defines
feh.c		-- INTERCAL-to-C code generator
fiddle.c	-- the INTERCAL operators
lose.[ch]	-- INTERCAL compile- and run-time error handling
ick-wrapper.c	-- the driver for generated C-from-INTERCAL code
cesspool.c	-- the INTERCAL runtime support code
cesspool.h	-- interface from generated code to the INTERCAL runtime support

sample.i	-- a sample INTERCAL program (from the manual)

You want a man page? Man pages are for wimps. To compile an INTERCAL
program `foo.i' to executable code, just do

	ick foo.i

There's a -d option that leaves the generated `foo.c' in place for
inspection (suppressing compilation to machine code), and an -O option
that enables the (hah!) optimizer. Other than that, yer on yer own.

Report bugs, if you absolutely must, to the author. Or post them to
alt.intercal.bugs. Or something.
--------------------------------------------------------------------------
THEORY:
--------------------------------------------------------------------------
		INTERCAL IMPLEMENTOR'S NOTES

This C-INTERCAL compiler is a very conventional implementation using YACC and
LEX. Each line of INTERCAL is translated into a C if()-then; the guard part
is used to implement abstentions and RESUMES, and the arm part translates the
`body' of the corresponding INTERCAL statement.
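A hand-written sketch of the guard/arm shape for a three-statement fragment may help here. The arrays onespot[] and abstained[] and the function run_fragment() are hypothetical names invented for illustration. This is not literal ick output, just the pattern described above expressed in C.

```c
/* Hypothetical sketch; names are invented, not literal ick output.
 * INTERCAL source:
 *   1: DO .1 <- #5
 *   2: DON'T .1 <- #7
 *   3: PLEASE DO .2 <- #3
 * Each statement becomes one if(): the guard decides whether the
 * statement runs at all, and the arm carries out its body. */
static unsigned short onespot[3];           /* .1 and .2 in slots 1 and 2 */
static int abstained[4] = { 0, 0, 1, 0 };   /* DON'T starts out abstained */

void run_fragment(void)
{
    if (!abstained[1]) { onespot[1] = 5; }  /* guard; arm */
    if (!abstained[2]) { onespot[1] = 7; }  /* guard fails, arm skipped */
    if (!abstained[3]) { onespot[2] = 3; }
}
```

After run_fragment(), .1 holds 5 and .2 holds 3: the DON'T statement begins life abstained, so its guard fails.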

The generated C code is plugged into the template file ick-wrapper.c inside
main(). It needs to be linked with cesspool.o, fiddle.o and lose.o. Cesspool.o
is the code that implements the storage manager; fiddle.o implements the
INTERCAL operators; and lose.o is the code that generates INTERCAL's error
messages.

The abstain[] array in the generated C is used to track line and label
abstentions; if member i is on, the statement on line i is being abstained
from. Labels are mapped to line numbers in the code checker, just before
optimization.
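Because labels are resolved to line numbers before any code is emitted, an ABSTAIN FROM (label) can be ground out as a store into a fixed abstain[] slot. The sketch below assumes a hypothetical label table in which (10) sits on statement 2 and (20) on statement 4. The names struct label_entry, label_to_line() and emit_abstain() are illustrative, not the ones in the C-INTERCAL sources.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical label table built by the parser: label (10) is on
 * statement 2 and label (20) is on statement 4. */
struct label_entry {
    unsigned label;   /* INTERCAL label, e.g. 10 for (10) */
    int line;         /* statement number it is attached to */
};

static const struct label_entry labels[] = {
    { 10, 2 },
    { 20, 4 },
};

/* Code-checker step: map a label to its line number, or -1 if the
 * program has no such label. */
int label_to_line(unsigned label)
{
    for (size_t i = 0; i < sizeof labels / sizeof labels[0]; i++)
        if (labels[i].label == label)
            return labels[i].line;
    return -1;
}

/* With the mapping done at compile time, ABSTAIN FROM (label) becomes a
 * store into one fixed abstain[] slot.  Returns the number of chars
 * written, or -1 for an unknown label. */
int emit_abstain(char *buf, size_t size, unsigned label)
{
    int line = label_to_line(label);
    if (line < 0)
        return -1;
    return snprintf(buf, size, "abstain[%d] = 1;", line);
}
```

With that table, emit_abstain(buf, sizeof buf, 20) writes the text abstain[4] = 1; into buf.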

The gerund variables at the top of the template file are used to implement
gerund abstentions (ABSTAIN FROM CALCULATING etc). The guard part of each
generated C statement checks the appropriate variable.
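A minimal sketch of a gerund guard, assuming one global flag per statement type. The names abstain_calculating, abstain_line[], onespot1 and run_gerund_demo() are hypothetical. The point is only that an assignment's guard tests both its own line's slot and the flag for its gerund.

```c
/* Hypothetical sketch; all names are invented.  INTERCAL source:
 *   1: DO ABSTAIN FROM CALCULATING
 *   2: DO .1 <- #9
 * One flag per gerund; only CALCULATING is needed here. */
static int abstain_calculating;   /* set by ABSTAIN FROM CALCULATING */
static int abstain_line[3];       /* per-line abstentions, lines 1..2 */
static unsigned short onespot1;   /* the variable .1 */

void run_gerund_demo(void)
{
    /* 1: the arm just turns the gerund flag on */
    if (!abstain_line[1]) { abstain_calculating = 1; }
    /* 2: a calculation, so its guard also checks the gerund flag */
    if (!abstain_line[2] && !abstain_calculating) { onespot1 = 9; }
}
```

After run_gerund_demo(), .1 is still 0: line 1 turned the CALCULATING flag on, so the assignment on line 2 was skipped.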

RESUMES are implemented by setting skipto to the location at which execution
is to resume, then jumping to the top of the program and skipping all the
guard/arm pairs before the proper NEXT.
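Standard C has no computed goto, so generated code cannot simply jump to a return address held in a variable; rescanning from the top with a "skip until" marker is one portable way around that. The sketch below assumes a four-statement program. Only skipto comes from the notes above; REACHED(), next_stack[], trace[] and run_resume_demo() are invented for illustration, and this is not C-INTERCAL's actual generated code.

```c
#include <assert.h>

/* Hypothetical sketch.  INTERCAL source:
 *   1:      DO (10) NEXT
 *   2:      DO GIVE UP
 *   3: (10) DO ...something...
 *   4:      DO RESUME #1
 * trace[] records which statement bodies ran, in order. */
static int trace[16];
static int ntrace;

static int next_stack[80];   /* return points pushed by NEXT (size arbitrary) */
static int next_sp;
static int skipto;           /* 0: run normally; n: skip up to statement n */

/* Guard for statement n: if we were skipping toward n, stop skipping
 * here; otherwise run only when no skip is in progress. */
#define REACHED(n) (skipto == (n) ? (skipto = 0, 1) : skipto == 0)

void run_resume_demo(void)
{
top:
    if (REACHED(1)) {                 /* DO (10) NEXT */
        trace[ntrace++] = 1;
        next_stack[next_sp++] = 2;    /* resume at the following statement */
        goto L10;
    }
    if (REACHED(2)) {                 /* DO GIVE UP */
        trace[ntrace++] = 2;
        return;
    }
L10:
    if (REACHED(3)) {                 /* (10) DO ...something... */
        trace[ntrace++] = 3;
    }
    if (REACHED(4)) {                 /* DO RESUME #1 */
        trace[ntrace++] = 4;
        assert(next_sp > 0);          /* sketch does not handle an empty stack */
        skipto = next_stack[--next_sp];
        goto top;                     /* rescan from the top, skipping */
    }
}
```

A call to run_resume_demo() records statements 1, 3, 4, 2 in trace[]: the NEXT jumps forward to (10), the RESUME sets skipto to 2 and restarts at the top, and statement 1 is passed over because skipto is nonzero.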

The parser builds an array of tuples, one for each INTERCAL statement. Most
tuples have node trees attached. Once all tuples have been generated,
the compile-time checker and optimizer phases can do consistency checks
and expression-tree rewrites. Finally, the tuples are ground out as C code
by the emit() function.
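The tuple and node-tree arrangement can be sketched as follows. struct node, struct tuple, emitf() and this emit() are hypothetical stand-ins (the real types are declared in ick.h and the real emitter lives in feh.c), and the runtime names mingle() and iselect() that appear in the generated text are likewise assumptions.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical types; the real ones are declared in ick.h. */
enum opcode { CONSTANT, ONESPOT, MINGLE, SELECT };

struct node {
    enum opcode op;
    unsigned value;             /* constant value, or .n variable number */
    struct node *lhs, *rhs;     /* operands of MINGLE and SELECT */
};

enum stmt_type { GETS, GIVE_UP };

struct tuple {                  /* one per INTERCAL statement */
    enum stmt_type type;
    unsigned target;            /* GETS: which .n is assigned */
    struct node *expr;          /* GETS: the right-hand side tree */
};

static char out[512];           /* the C text ground out so far */
static size_t outlen;

/* Append formatted text to out[], truncating quietly when it fills. */
static void emitf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out + outlen, sizeof out - outlen, fmt, ap);
    va_end(ap);
    if (n > 0)
        outlen += (size_t)n < sizeof out - outlen
                      ? (size_t)n : sizeof out - outlen - 1;
}

/* Walk an expression tree, emitting calls to (assumed) runtime helpers. */
static void emit_expr(const struct node *np)
{
    switch (np->op) {
    case CONSTANT:
        emitf("%u", np->value);
        break;
    case ONESPOT:
        emitf("onespot[%u]", np->value);
        break;
    case MINGLE:
    case SELECT:
        emitf("%s(", np->op == MINGLE ? "mingle" : "iselect");
        emit_expr(np->lhs);
        emitf(", ");
        emit_expr(np->rhs);
        emitf(")");
        break;
    }
}

/* Grind one tuple out as a guard/arm pair for statement number line. */
void emit(const struct tuple *tp, int line)
{
    emitf("if (!abstain[%d]) { ", line);
    switch (tp->type) {
    case GETS:
        emitf("onespot[%u] = ", tp->target);
        emit_expr(tp->expr);
        emitf("; ");
        break;
    case GIVE_UP:
        emitf("exit(0); ");
        break;
    }
    emitf("}\n");
}
```

Emitting, as statement 1, a tuple that assigns the mingle of #3 and .2 to .1 appends the line if (!abstain[1]) { onespot[1] = mingle(3, onespot[2]); } to out[]. Keeping the trees around until the very end is what lets the checker and optimizer rewrite them before any text exists.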

The optimizer does constant folding for all five operators. It also recognizes
the idiom for `test for nonzeroness'.
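Here is one way the folding could look for the two binary operators. In this sketch mingle() interleaves two 16-bit values (the first operand's bits land in the odd positions), and iselect() pulls out the bits of its first operand that sit under 1 bits of the second and packs them at the low-order end. The node layout and function names are hypothetical, and the three unary operators are left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mingle: a's bits go to the odd positions of the result, b's to the
 * even ones.  Operands wider than 16 bits are not allowed; this sketch
 * simply asserts. */
uint32_t mingle(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    assert(a <= 0xFFFF && b <= 0xFFFF);
    for (int i = 0; i < 16; i++) {
        r |= ((a >> i) & 1u) << (2 * i + 1);
        r |= ((b >> i) & 1u) << (2 * i);
    }
    return r;
}

/* Select: take the bits of a that sit under 1 bits of b and pack them
 * together at the low-order end of the result. */
uint32_t iselect(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    int j = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            r |= ((a >> i) & 1u) << j++;
    return r;
}

/* Hypothetical expression node, reduced to what folding needs. */
enum opcode { CONSTANT, MINGLE, SELECT };

struct node {
    enum opcode op;
    uint32_t value;             /* meaningful when op == CONSTANT */
    struct node *lhs, *rhs;     /* operands of MINGLE and SELECT */
};

/* Fold bottom-up: a binary node whose operands are both constants is
 * replaced in place by a constant holding the computed result. */
void fold(struct node *np)
{
    if (np->op == CONSTANT)
        return;
    fold(np->lhs);
    fold(np->rhs);
    if (np->lhs->op == CONSTANT && np->rhs->op == CONSTANT) {
        np->value = np->op == MINGLE
                        ? mingle(np->lhs->value, np->rhs->value)
                        : iselect(np->lhs->value, np->rhs->value);
        np->op = CONSTANT;
        np->lhs = np->rhs = NULL;
    }
}
```

For example, folding a select of #179 and #201 replaces the node with the constant 9, and mingling #65535 with #0 folds to 2863311530.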

Calculations are fully type-checked at compile time; they have to be, because
(as I read the manual) the 16- and 32-bit versions of the unary ops do
different things. The only potential problem here is that the typechecker
has to assume that :m ~ :n has the type of :n (32-bit) even though the
result might fit in 16 bits. At run time everything is calculated in 32
bits. When INTERCAL-72 was designed, 32 bits was expensive; now it's cheap.
Really, the only reason for retaining a 16-bit type at all is for the
irritation value of it (yes, C-INTERCAL *does* enforce the 16-bit limit
on constants).
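To see why the width matters, here is a sketch of the unary operators under one common reading of the manual: each applies its binary operation to the operand and to the operand rotated right by one bit, with the rotation wrapping at bit 15 or bit 31. The names unary(), rotr1() and enum unop are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum unop { UNARY_AND, UNARY_OR, UNARY_XOR };

/* Rotate x right by one bit inside a field `width` bits wide. */
static uint32_t rotr1(uint32_t x, int width)
{
    return ((x & 1u) << (width - 1)) | (x >> 1);
}

/* Apply the binary op to x and x rotated right by one, so that each
 * result bit combines a pair of neighbouring operand bits and the pair
 * straddling the ends wraps around.  The wrap point (bit 15 or bit 31)
 * is why the 16- and 32-bit versions give different answers. */
uint32_t unary(enum unop op, uint32_t x, int width)
{
    assert(width == 16 || width == 32);
    assert(width == 32 || x <= 0xFFFF);   /* 16-bit operands only */
    uint32_t y = rotr1(x, width);
    switch (op) {
    case UNARY_AND: return x & y;
    case UNARY_OR:  return x | y;
    default:        return x ^ y;
    }
}
```

Unary AND of 77 comes out as 4 at either width, but unary OR of 77 is 32879 as a 16-bit value and 2147483759 as a 32-bit one, because the operand's low-order 1 bit wraps into a different top bit. A typechecker that guessed the width wrong would silently produce the other answer.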

--------------------------------------------------------------------------
BUGS:
--------------------------------------------------------------------------
		NEW FEATURES IN C-INTERCAL

1. As a convenience to all you junior birdmen out there, `NINER' is accepted as
   a synonym for `NINE' in INTERCAL input.

2. The COME FROM statement is now compiled. You may write

   PLEASE COME FROM (n) 

   and the effect will be that whenever execution reaches statement label n,
   it will immediately transfer control to the statement following the
   COME FROM. Conditional COME FROM is possible; ABSTAIN FROM COMING FROM and
   REINSTATE COMING FROM are both valid. DON'T COME FROM is a no-op (until
   reinstated). Finally, NEXTING to the label `target' of an un-abstained
   COME FROM *will* cause control to be transferred to the statement following
   the COME FROM.
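One way a compiler that sees every tuple up front could handle COME FROM is to hang a guarded jump off the statement carrying the target label. The sketch assumes that the labelled statement's own body runs before control leaves (the text above says only that control transfers when execution reaches the label), and that the hook answers only to the COME FROM's own abstention state. All names are hypothetical.

```c
/* Hypothetical sketch; all names are invented.  INTERCAL source:
 *   1: (5) DO .1 <- #1
 *   2:     DO .1 <- #2
 *   3:     PLEASE COME FROM (5)
 *   4:     DO .2 <- #4
 * Statement 3 comes from label (5), so a guarded jump to the statement
 * after the COME FROM is emitted right after statement 1's arm.
 * A DON'T COME FROM would simply start with abstain_cf[3] set. */
static int abstain_cf[5];          /* per-line abstentions, lines 1..4 */
static int abstain_comingfrom;     /* set by ABSTAIN FROM COMING FROM */
static unsigned short spot[3];     /* .1 and .2 in slots 1 and 2 */

void run_comefrom_demo(void)
{
    /* 1: (5) DO .1 <- #1 */
    if (!abstain_cf[1]) { spot[1] = 1; }
    /* hook for 3: COME FROM (5), unless that statement is abstained */
    if (!abstain_cf[3] && !abstain_comingfrom)
        goto after_come_from;

    /* 2: DO .1 <- #2, skipped whenever the COME FROM fires */
    if (!abstain_cf[2]) { spot[1] = 2; }

    /* 3: reached in sequence, a COME FROM does nothing */
after_come_from:
    /* 4: DO .2 <- #4 */
    if (!abstain_cf[4]) { spot[2] = 4; }
}
```

With nothing abstained, run_comefrom_demo() leaves .1 at 1 because statement 2 is jumped over; after setting abstain_comingfrom, the same call leaves .1 at 2. Since the jump is attached to the labelled statement itself, arriving at (5) by NEXTing would trigger the same transfer.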

			BUGS

1) INTERCAL would be intrinsically a crock even if it worked right.

2) The INTERCAL-72 * syntax is not implemented, because I couldn't figure out
   how the frotz it's supposed to work. This means C-INTERCAL can't
   compile the INTERCAL-72 system library. Isn't compatibility wonderful?

3) Error-checking could be improved. Not all the errors listed in the
   manual are actually detected (of those listed in lose.h, E123, E621, E632,
   E579, E436, E017, E275, E182, E129, E139, and E778 are implemented). In this
   respect C-INTERCAL follows nobly in the tradition of many production
   compilers.

4) Some of the runtime library is stubbed out. I'll let someone *else*
   implement the "butchered Roman numerals" and the fleepin' dynamic
   array handling.

			TO DO

1. Test this loser on something other than the sample program.

2. Add more optimization templates, esp. the idioms for &, |, ^, ~.

3. Switch for output in "clockface" mode, for superstitious users who
   believe writing "IV" upside-down offends IVPITER and would rather
   see IIII.

4. Input format internationalization -- allow WRITE IN input digits in
   major languages such as Nahuatl, Tagalog, Sanskrit, and Basque.

5. Forget this @!%$#! crock and take a long vacation.
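TO DO item 3 is easy enough to sketch for ordinary Roman numerals. This covers only the classical notation for 1 to 3999, not the "butchered Roman numerals" mentioned under BUGS, and roman() with its clockface flag is a hypothetical interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: write n (1..3999) as an ordinary Roman numeral
 * into buf, which must hold at least 16 chars.  With clockface set, 4 is
 * written IIII rather than IV; other subtractive pairs are kept. */
void roman(unsigned n, int clockface, char *buf)
{
    static const struct { unsigned value; const char *digits; } table[] = {
        { 1000, "M" }, { 900, "CM" }, { 500, "D" }, { 400, "CD" },
        { 100, "C" },  { 90, "XC" },  { 50, "L" },  { 40, "XL" },
        { 10, "X" },   { 9, "IX" },   { 5, "V" },   { 4, "IV" },
        { 1, "I" },
    };
    size_t len = 0;

    assert(n >= 1 && n <= 3999);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (clockface && table[i].value == 4)
            continue;               /* leave the 4 to be spelled IIII */
        while (n >= table[i].value) {
            for (const char *p = table[i].digits; *p; p++)
                buf[len++] = *p;
            n -= table[i].value;
        }
    }
    buf[len] = '\0';
}
```

So roman(4, 0, buf) gives IV, roman(4, 1, buf) gives IIII, and roman(1990, 0, buf) gives MCMXC.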
--------------------------------------------------------------------------

I will release this puppy (and entirely wash my hands of it) once I've
cleared up a few technical questions with the language's originators.
-- 
      Eric S. Raymond = ...!uunet!snark!eric  (mad mastermind of TMN-Netnews)