[comp.sys.atari.8bit] Repost: CC8 documentation

smk@cbterra.ATT.COM (Stephen Kennedy) (11/04/87)

[ This is a repost of the V2.3 CC8 documentation.  I've fixed some
  grammatical and spelling errors and appended the notes for V2.3b,
  the last posted version of the compiler. - Steve ]

CC8 DOCUMENTATION

This compiler is an upgraded version of the Deep Blue C (DBC) compiler
which supports more features of standard C and is much faster.  It
generates the same kind of pseudo code as DBC and ACE C and can
be used with either's linker and runtime engine.  I recommend using ACE
C for its speed and expanded features.  This document gives a brief
overview of the extended features of the compiler along with its
limitations.

This document is not intended as a C tutorial, nor is it intended
as a tutorial for DBC or ACE C.  Features such as assembly language
protocol, built-in routines, and floating point depend solely upon the
type of linker and runtime system you are using, and NOT upon this compiler.

PREPROCESSING

CC8 supports all preprocessor directives except #line.  This includes
macros with arguments, #ifdef, #ifndef, #if, and #undef.  In addition,

    o #include files may be nested (currently only 2 deep)

    o comments may be placed anywhere

    o the TAB character may be used as white space

    o the predefined symbols __LINE__ (replaced by the current line
      number) and __FILE__ (replaced by the current file name) are
      supported

    o #ifdef, #ifndef, and #if may be nested (5 deep maximum).

Caveats:

    - "bizarre" usage of #define macros may not work as expected (or
      at all)

    - in the "skipped" section of an #ifdef-#else-#endif group, comments
      are not processed.  Thus it is not a good idea to "comment out"
      #else's or #endif's.
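
A minimal sketch of the features listed above, written in present-day
standard C so it can be checked on a modern compiler.  It has not been
run through CC8; on the Atari the braces would be typed as $( and $).
The names SQUARE, ATARI, DEBUG_LEVEL, TRACE_DEPTH, here_line, and
here_file are made up for illustration.

```c
#define SQUARE(n) ((n) * (n))     /* macro with an argument */
#define ATARI 1
#define DEBUG_LEVEL 2

#ifdef ATARI                      /* conditionals may nest (5 deep in CC8) */
#if DEBUG_LEVEL == 2
#define TRACE_DEPTH SQUARE(3)
#else
#define TRACE_DEPTH 0
#endif
#else
#define TRACE_DEPTH -1
#endif

/* Return the line number substituted for __LINE__. */
int here_line(void)
{
    return __LINE__;
}

/* Return the file name substituted for __FILE__. */
char *here_file(void)
{
    return __FILE__;
}
```

Because both ATARI and DEBUG_LEVEL are defined here, TRACE_DEPTH
expands to SQUARE(3).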

VARIABLE DECLARATIONS

CC8 supports more complex type constructions than DBC or ACE C, e.g.,
pointers to pointers.  In addition, the types struct, union, and enum 
have been added and variables can be declared static.

    o multidimensional arrays may be declared.

    o functions may be declared as returning char, int, or pointer
      to anything.  Pointers to functions, arrays of pointers to
      functions, etc, may be declared.  Note:  a pointer to a
      function must now be used correctly:

	(*pf)();

	not

	(pf)();

    o variables declared as type "enum x" are really declared as
      type "int"; the name "x" is ignored.  Note that user
      assignment of numbering scheme is implemented, e.g.,

	enum colors $( red = 2, white = 0, blue $);

      assigns red = 2, white = 0 and blue = 1.

    o external array declarations of the form

	extern int array_name[];

      are now valid.

    o in expressions, the sizeof and type cast operators may be
      used.
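
The declarations above can be sketched together as follows.  This is
present-day standard C for checking on a modern compiler, not tested
under CC8; type $( and $) for the braces on the Atari.  The names
table, twice, call_twice, and table_bytes are hypothetical.

```c
enum colors { red = 2, white = 0, blue };   /* blue follows white: 1 */

extern int table[];        /* bound supplied by the definition below */
int table[4];

int twice(int n)
{
    return n + n;
}

int (*pf)(int);            /* pointer to a function returning int */

int call_twice(int n)
{
    pf = twice;
    return (*pf)(n);       /* the explicit (*pf) form CC8 asks for */
}

int table_bytes(void)
{
    return (int) sizeof(table);   /* sizeof and a type cast */
}
```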

Caveats:

    - no float/double declarations (but a limited floating point is
      available when ACE C's linker and runtime engine are used)

    - all struct/unions must have a tag name

    - all global symbols, struct/union tags and components are drawn
      from the same name pool.

    - no bit fields

    - duplicate declarations of a variable are not reported as errors,
      but they do not work properly either.  For example,

      int a; int a;

      will silently declare "a" twice and confuse the linker.

COMPILE TIME EVALUATION OF CONSTANT EXPRESSIONS

CC8 supports a limited form of compile time evaluation of
constant expressions.  A constant expression is an integer constant,
a character constant, sizeof, or a sequence of constant expressions
joined by one or more operators from a single one of the following
groups: "|", "^", "&", "==", "!=", ">> <<", "+ -", "* / %".  It may
also be prefixed by one of the unary operators "-", "!", "$-".
Evaluation follows the normal rules of operator precedence, which can
be overridden by parentheses.  Examples:

    x = 12*24 + 13;	=>	x = 301;

    x = 12 + 13 + y;	=>	x = 25 + y;

    x = y + 12 + 13;	=>	x = y + 12 + 13; (1)

    x = y + 12*13;	=>	x = y + 156; (2)

    x = y + (12 + 13);	=>	x = y + 25; (3)

    1. compiler "gives up" after finding non-constant 'y'.

    2. '*' is in a different precedence group than '+' and thus "12*13"
       appears to the compiler as a separate expression.

    3. Parentheses force compiler to treat "12 + 13" as a subexpression.

Constant expressions may be used in array declarations, initializers,
and #if.
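
A short sketch of the three places a constant expression may appear,
plus the practical consequence of the folding rules: put the constants
first, or parenthesize them.  It is written in present-day standard C
and has not been tried under CC8.  ROWS, COLS, BIG_SCREEN, screen,
limit, offset, and offset_grouped are illustrative names only.

```c
#define ROWS 12
#define COLS 24

#if ROWS * COLS > 255            /* constant expression in #if */
#define BIG_SCREEN 1
#else
#define BIG_SCREEN 0
#endif

char screen[ROWS * COLS + 13];   /* array bound: 301 */
int  limit = ROWS * COLS + 13;   /* initializer: 301 */

int offset(int y)
{
    /* Constants first: CC8 can fold 12 + 13 before it meets y. */
    return 12 + 13 + y;
}

int offset_grouped(int y)
{
    /* Parentheses make 12 + 13 a separate subexpression to fold. */
    return y + (12 + 13);
}
```

Both functions return the same value on any compiler; the difference
under CC8 would only be in how much arithmetic is left for run time.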

INITIALIZERS

CC8 supports initialization of global variables of type char, int,
pointer, array, struct/union, and enum.  The syntax is a subset of that
allowed by standard C.

Rules:

    1.  Simple types (char, int, pointer) can be initialized by
	a constant expression.  The constant expression must not
	be enclosed inside $( $).  Absolutely no type checking is
	done.  The definition of constant expression is extended
	to include:

	    o address of static variable
	    o static array names
	    o function names
	    o "character string"

    2.  Aggregate types can be initialized by the following
	construct:

	$( init-element-1, init-element-2, ..., init-element-m $)

	a. if fewer initializers than elements appear, the rest of
	   the aggregate is initialized to zero.

	b. if the number of elements of an array is not specified,
	   then it will be set to the number of initializers.

	c.  a character array may be initialized by a string.

    Examples:

    int x = 23*22;

    char *p = "hello";
    char q[10] = "hello";	/* q[6] ... q[9] are set to '\0' */

    int f();
    int (*pf)() = f;

				/* array bound set to 5 */
    char *weekdays[] = $( "Mon", "Tues", "Wed", "Thu", "Fri" $);

    struct x $(
	char *symbol;
	int  value;
	$);
    struct x tab[] = $(
	$( "word1", 4 $),
	$( "word2", 8 $),
	$( "word3", 13 $)
	$);

Caveats:

    - local variables cannot be initialized (bad programming practice
      anyway)

    - unions cannot be initialized (properly, that is)
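
Rules 2a and 2b, and the usual way around the first caveat, can be
sketched as below.  This is present-day standard C, not verified under
CC8 (type $( and $) for the braces there).  The names pair, pairs,
primes, count_primes, and sum_primes are hypothetical.

```c
struct pair {
    char *name;
    int   code;
};

/* Three initializers for five elements: pairs[3] and pairs[4]
   are filled with zeros (rule 2a). */
struct pair pairs[5] = {
    { "word1", 4 },
    { "word2", 8 },
    { "word3", 13 }
};

/* No bound given, so it becomes the number of initializers (rule 2b). */
int primes[] = { 2, 3, 5, 7 };

int count_primes(void)
{
    return (int) (sizeof(primes) / sizeof(primes[0]));
}

int sum_primes(void)
{
    int i;
    int total;          /* a local cannot carry an initializer ... */

    total = 0;          /* ... so assign it in a statement instead */
    for (i = 0; i < count_primes(); i++)
        total = total + primes[i];
    return total;
}
```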

GOTOS AND LABELS

CC8 supports the dreaded goto statement and labels.  Note:  the compiler
cannot distinguish between labels and local variables, so a label should
not be given the same name as a local variable.
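
A small sketch of a goto used to leave a loop early.  Since labels and
local variables are not told apart, the label here is presumably best
given a name no local uses.  This is present-day standard C, untested
under CC8; sample, first_zero, and done are made-up names.

```c
int sample[4] = { 3, 7, 0, 9 };

/* Return the index of the first zero in v[0..n-1], or -1 if none. */
int first_zero(int *v, int n)
{
    int i;
    int result;

    result = -1;
    for (i = 0; i < n; i++)
        if (v[i] == 0) {
            result = i;
            goto done;      /* label name differs from every local */
        }
done:
    return result;
}
```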

ERROR HANDLING

Error messages are printed in the typical Unix C fashion:

    file-name, line #: error message

In addition, the line causing the error is printed.  Note:  if the error
occurs near the end of a line, the compiler will usually flag the next line.

This compiler, like DBC and ACE C, does not know how to intelligently
recover from syntax errors.  For this reason, the compiler will halt
compilation after 8 errors.

The compiler no longer exits directly to DOS on fatal errors (e.g., too
many errors) so that the user may jump directly to an editor using
the command line feature described below.

I would appreciate knowing if the compiler hangs after finding an error
or a certain sequence of errors.

INTERNAL LIMITS AND TABLE SIZES

Lest we forget this is a 48K 8-bit implementation of C, here are
some of the more important limits:

1.  7000 bytes of "global space" -- shared by:

    - macro definitions
    - each unique non-macro symbol name (15 bytes + length of symbol)
    - global symbol type info

2.  256 bytes of "local space"

    - local symbol type info
    - only 128 local symbols may be declared.

3.  A source line may not be longer than 254 characters after macro
    expansion.  WARNING:  this limit is not checked!

4.  An expression cannot generate more than 512 bytes of p-code
    instructions.  An expression violating this limit would be
    ridiculously huge.  WARNING:  this limit is not checked!

5.  512 bytes of "string space"

    - This table was 3000 bytes in DBC and probably ACE C too.
    - A way around the size constraint is to use initializers:

	char dummy1[] = " ... ";
	char dummy2[] = " ... ";
	     ...

6.  2300 bytes of stack space

    The compiler has been written to conserve stack space, so stack
    overflows should be a rarity.  This is fortunate because --
    WARNING:  this limit is not checked!  Stack overflow will overwrite
    screen memory (but it's possible compilation will complete normally).

7.  #define macros may have up to 128 arguments.

CC8 uses memory from MEMLO to MEMTOP and locations $480 to $6FF.  MEMLO
must not exceed $2BFF.
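
To get a feel for the global space limit in item 1, the per-symbol
cost (15 bytes + length of symbol) can be turned into a rough budget.
This is a helper meant to run on any modern C compiler, not on the
Atari.  It ignores macro definitions and type info, which share the
same 7000 bytes, so the result is only an upper bound.  The names
GLOBAL_SPACE, SYMBOL_OVERHEAD, symbol_cost, and max_symbols are
invented for the sketch.

```c
#include <string.h>

#define GLOBAL_SPACE 7000      /* bytes of global space (item 1) */
#define SYMBOL_OVERHEAD 15     /* bytes charged per unique symbol name */

/* Bytes of global space taken by one non-macro symbol name. */
int symbol_cost(char *name)
{
    return SYMBOL_OVERHEAD + (int) strlen(name);
}

/* Upper bound on the number of symbols of a given length, assuming
   nothing else (macros, type info) uses the global space. */
int max_symbols(int name_length)
{
    return GLOBAL_SPACE / (SYMBOL_OVERHEAD + name_length);
}
```

For instance, a five-character name costs 20 bytes, so at most 350
such names could fit even before macros and type info are counted.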

SPEED

CC8 is much faster than DBC and slightly faster than ACE C (15% -
30% depending on type of program).  Compilation time is affected
most by the presence of #defines.  You should expect compilation
times of 15 to 30 seconds for small programs, and up to 2 minutes for
large programs with many #defines.  Larger source files are possible,
but run the risk of running out of internal table space.  Some sample
times, in seconds:

    Program Description		CC8	ACEC	DBC
    -------------------		---	-----	---
    238 lines, no #defines	58	80 (1)	551
    same, one #define		66 (2)	77 (1)	-

    482 lines + 222 #include'd
    many #defines + comments	107	-	-

Note:  these timings were made on an 800XL with a 1050 running DOS 2.5
       with write verify on.

(1) I can't explain this anomaly.

(2) One #define makes this much difference because the compiler
    skips part of the preprocessing phase when a program contains no
    #defines at all.

COMMAND LINE FEATURES

For users of DOSes that take a long time to reload (such as DOS 2.X), a
way to load and run files directly from the command line has been
provided.

    o ^L<RETURN>           runs file called "D1:LINK.COM"
    o ^Lfilename<RETURN>   runs file called "filename"
			   ("D1:" and ".COM" default)
    o ^E<RETURN>           runs file called "D1:EDIT.COM"

Note this feature is in its infancy.

    1. The control characters print as their equivalent graphic characters.

    2. ^L runs the ACE C and DBC linkers, and ^E runs SpeedScript.  Nothing
       else has been tested.

    3. "Theoretically" the load routine understands INIT addresses.

MISCELLANEOUS

Switch statement handling has been reworked.  The last clause need not
end in "break", "continue", or "return".  In addition, the "default"
clause need not be the last (this was a bug in DBC).  On the down side,
you are limited to 100 "case"s per switch statement, and switch statements
may be safely nested only four or five deep.  I'll wait until someone
complains before I attempt to fix this.
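
A sketch of a switch that relies on both relaxations: "default" comes
first, and the last clause simply runs off the end.  It is written in
present-day standard C and has not been compiled with CC8; classify is
a hypothetical name.

```c
/* Classify a character: 0 = other, 1 = digit, 2 = blank. */
int classify(int c)
{
    int kind;

    kind = 0;
    switch (c) {
    default:                /* "default" need not come last */
        kind = 0;
        break;
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        kind = 1;
        break;
    case ' ':
    case '\t':
        kind = 2;           /* last clause: no "break" required */
    }
    return kind;
}
```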

More efficient code is now generated for the && and || operators.

Local variables may be declared only at the beginning of a compound
statement (DBC allowed them anywhere).  However, no two local variables
declared inside a function may have the same name (this is a
departure from standard C).  Variables declared with a storage class of
"register" are treated the same as "auto".

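A minimal sketch of block-level locals under the naming restriction.
Standard C would let both inner blocks reuse one name; here each gets
its own so the function stays within the CC8 rule.  This is
present-day standard C, untested under CC8; span, diff1, and diff2 are
made-up names.

```c
/* Return the distance between a and b. */
int span(int a, int b)
{
    int result;

    if (a > b) {
        int diff1;          /* declared at the start of the block */

        diff1 = a - b;
        result = diff1;
    } else {
        int diff2;          /* a fresh name, not a second "diff1" */

        diff2 = b - a;
        result = diff2;
    }
    return result;
}
```
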
All C reserved words are recognized by the compiler.  Thus a variable
cannot be named "float" or "entry."

Characters with the 8th bit set that are not part of character constants, 
strings, or comments will cause the compiler to crash.

FASTC.COM should work on CC8 output.

I have been told that the compiler also works with the LightSpeed C
linker and runtime engine.

FUTURE

1. Improve error handling

2. Add ability to pass files to compile in command line for SpartaDOS
   and compatible DOS.

ACKNOWLEDGEMENTS

Thanks to Harald Striepe and Marc Appelbaum for their beta testing.

Thanks to Mark VandeWettering and Greg Koolbeck for their input.

AUTHOR

Steve Kennedy

1895 Fountainview Ct
Columbus, OH  43232

cbosgd!smk		(seismo, decwrl, and ihnp4 know about cbosgd)

--------------------------------------------------------------------------

The following is a list of fixed bugs, enhancements, and known problems:

Bug fixes:

    1.  Empty expressions in a for statement are now accepted, e.g.,

	    for(;;) $( $)

    2.  Constant expressions are now valid after "case", e.g.,
        "case 2*3."  (fixes "case -1:" problem)

    3.  "*a" if a is an array no longer generates an error.

    4.  A function argument declaration of "type var[]" is now
        converted to "type *var."  Previously, the compiler generated
	bad code which caused a lock-up when the resulting program
	was run (e.g., programs using "getname()" in ACECIO.C).
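
The four fixes can be exercised with a sketch like the following.  It
is present-day standard C and has not been run through CC8 (use $( and
$) for the braces there).  The names scores, total, first_score,
count_to_three, and kind are hypothetical.

```c
int scores[3] = { 10, 20, 30 };

/* "int v[]" as a parameter is treated as "int *v" (fix 4). */
int total(int v[], int n)
{
    int i;
    int sum;

    sum = 0;
    for (i = 0; i < n; i++)
        sum = sum + v[i];
    return sum;
}

int first_score(void)
{
    return *scores;         /* "*a" where a is an array (fix 3) */
}

int count_to_three(void)
{
    int n;

    n = 0;
    for (;;) {              /* all three expressions empty (fix 1) */
        n++;
        if (n == 3)
            break;
    }
    return n;
}

int kind(int c)
{
    switch (c) {
    case -1:                /* constant expressions after "case" (fix 2) */
        return 0;
    case 2*3:
        return 6;
    }
    return 1;
}
```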

Enhancements:

    1.  The expressions "a[x]" and "x[a]" are both valid and
	equivalent provided one of x or a is a pointer or array.

    2.  structs without explicit tag names are now legal, e.g.,

	    "struct $( .... $) y;"

    3.  The compiler now recognizes the keywords "short" and "long."
	Note that "int" = "short" = "short int" = "long" = "long int"
	= 2 bytes.  "Long" declarations produce the warning
	"long == short."

    4.  The compiler now recognizes the keyword "unsigned" and
	will generate unsigned comparison code for <, <=, >, or >=
	when one or both operands are unsigned.

	Caveats:

	    - You'll have to write your own routines to print these
	      correctly (no %u in ACE C or DBC printf).

	    - I'm not sure *, /, or % work properly on unsigned numbers.
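
One possible shape for such a print routine is sketched below.  Given
the second caveat, it avoids *, /, and % entirely and relies only on
unsigned comparison and subtraction.  This is present-day standard C
and has not been tried under CC8; tens, numbuf, and utodec are made-up
names, and the value is assumed to fit in 16 bits (0 to 65535).

```c
/* Powers of ten that fit in 16 bits, largest first. */
unsigned tens[5] = { 10000, 1000, 100, 10, 1 };

char numbuf[6];     /* room for "65535" plus the terminating '\0' */

/* Write n (0..65535) as decimal text into buf and return its length.
   buf must hold at least 6 characters. */
int utodec(unsigned n, char *buf)
{
    char *start;
    int i;
    char digit;

    start = buf;
    for (i = 0; i < 5; i++) {
        digit = '0';
        while (n >= tens[i]) {      /* unsigned comparison */
            n = n - tens[i];
            digit = digit + 1;
        }
        /* Skip leading zeros, but always emit the units digit. */
        if (digit != '0' || buf != start || i == 4)
            *buf++ = digit;
    }
    *buf = '\0';
    return (int) (buf - start);
}
```

The resulting string can then be handed to printf with %s, e.g.,
utodec(x, numbuf) followed by printf("%s", numbuf).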

Known problems (send mail or post if you want to add to the list):

    1.  Not all escape sequences recognized by ACE C are recognized by
	CC8.  (\u, \d, \l, \r, \e)

    2.  Compiler doesn't like "register x" although it will accept
	"register int x."  Note that "register" doesn't do anything
	special anyway.