smk@cbterra.ATT.COM (Stephen Kennedy) (08/04/87)
[ This is the CC8 documentation.  Uuencoded version of compiler binary
  follows.  The decoded compiler should be 24346 bytes in length or 194
  single density sectors + 96 bytes -- smk ]

CC8 DOCUMENTATION

This compiler is an upgraded version of the Deep Blue C (DBC) compiler
which supports more features of standard C and is much faster.  It
generates the same kind of pseudo code as DBC and ACE C and can be used
with either's linker and runtime engine.  I recommend using ACE C for
its speed and expanded features.

This document gives a brief overview of the extended features of the
compiler along with its limitations.  This document is not intended as
a C tutorial, nor is it intended as a tutorial for DBC or ACE C.
Features such as assembly language protocol, built-in routines, and
floating point depend solely upon the type of linker and runtime system
you are using, and NOT upon this compiler.

PREPROCESSING

CC8 supports all preprocessor functions except for #line.  This
includes macro functions, #ifdef, #ifndef, #if, and #undef.  In
addition,

  o #include files may be nested (currently only 2 deep)
  o no restriction on placement of comments
  o the TAB character may be used as white space.
  o predefined symbols __LINE__ (replaced by current line number) and
    __FILE__ (replaced by current file name).
  o #ifdef, #ifndef, and #if may be nested (5 deep maximum).

Caveats:

  - "bizarre" usage of #define macros may not work as expected (or at
    all)
  - in the "skipped" section of an #ifdef-#else-#endif group, comments
    are not processed.  Thus it is not a good idea to "comment out"
    #else's or #endif's.

VARIABLE DECLARATIONS

CC8 supports more complex type constructions than DBC or ACE C, e.g.,
pointers to pointers.  In addition, the types struct, union, and enum
have been added and variables can be declared static.

  o multidimensional arrays may be declared.
  o functions may be declared as returning char, int, or pointer to
    anything.
    Pointers to functions, arrays of pointers to functions, etc, may be
    declared.  Note: a pointer to a function must now be used
    correctly:

        (*pf)();    not    (pf)();

  o variables declared as type "enum x" are really declared as type
    "int"; the name "x" is ignored.  Note that user assignment of
    numbering scheme is implemented, e.g.,

        enum colors $( red = 2, white = 0, blue $);

    assigns red = 2, white = 0 and blue = 1.
  o external array declarations of the form

        extern int array_name[];

    are now valid.
  o in expressions, the sizeof and type cast operators may be used.

Caveats:

  - no float/double declarations (but a limited floating point is
    available when ACE C's linker and runtime engine are used)
  - all struct/unions must have a tag name
  - all global symbols, struct/union tags and components are drawn from
    the same name pool.
  - no bit fields
  - identical declarations of a variable neither cause an error nor
    work reasonably.  For example,

        int a;
        int a;

    will silently declare "a" twice and confuse the linker.

COMPILE TIME EVALUATION OF CONSTANT EXPRESSIONS

CC8 supports a limited form of compile time evaluation of constant
expressions.  A constant expression is defined as any expression
containing integer constants, character constants, sizeof, or a
constant expression separated by one or more operators in one of the
following groups,

    "|", "^", "&", "==", "!=", ">> <<", "+ -", "* / %",

or prefixed by one of the unary operators, "-", "!", "$-".  Evaluation
follows normal rules of operator precedence and can be overridden by
parentheses.

Examples:

    x = 12*24 + 13;       =>  x = 301;
    x = 12 + 13 + y;      =>  x = 25 + y;
    x = y + 12 + 13;      =>  x = y + 12 + 13;    (1)
    x = y + 12*13;        =>  x = y + 156;        (2)
    x = y + (12 + 13);    =>  x = y + 25;         (3)

  1. compiler "gives up" after finding non-constant 'y'.
  2. '*' is in a different group than '+' and appears to the compiler
     as a separate expression.
  3. Parentheses force compiler to treat "12 + 13" as a subexpression.
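As a quick cross-check of the folding examples above, here is a minimal
sketch in standard C.  Braces are written in the usual way rather than
as $( $), and the function names are hypothetical, chosen only for this
illustration.  Each function returns the value of one example
expression as typed.  Any C compiler must produce the same value
whether or not it folds the constants, so the sketch only confirms the
arithmetic; it says nothing about how CC8 implements folding.

```c
/* Hypothetical helpers, one per example in the text.  The comments
   give the form CC8 is described as producing. */

/* x = 12*24 + 13;     described result: x = 301 */
int example_all_constant(void)
{
    return 12*24 + 13;
}

/* x = 12 + 13 + y;    described result: x = 25 + y */
int example_leading_constants(int y)
{
    return 12 + 13 + y;
}

/* x = y + 12 + 13;    described result: left as written (note 1) */
int example_trailing_constants(int y)
{
    return y + 12 + 13;
}

/* x = y + 12*13;      described result: x = y + 156 (note 2) */
int example_product(int y)
{
    return y + 12*13;
}

/* x = y + (12 + 13);  described result: x = y + 25 (note 3) */
int example_parenthesized(int y)
{
    return y + (12 + 13);
}
```

Cases (1) and (3) compute the same value.  The only difference, going
by the notes above, is that the parenthesized form gives CC8 a constant
subexpression it can evaluate at compile time, so grouping trailing
constants by hand may be worthwhile.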
Constant expressions may be used in array declarations, initializers,
and #if.

INITIALIZERS

CC8 supports initialization of global variables of type char, int,
pointer, array, struct/union, and enum.  The syntax is a subset of that
allowed by standard C.

Rules:

  1. Simple types (char, int, pointer) can be initialized by a constant
     expression.  The constant expression must not be enclosed inside
     $( $).  Absolutely no type checking is done.  The definition of
     constant expression is extended to include:

       o address of static variable
       o static array names
       o function names
       o "character string"

  2. Aggregate types can be initialized by the following construct:

       $( init-element-1, init-element-2, ..., init-element-m $)

     a. if fewer initializers than elements appear, the rest of the
        aggregate is initialized to zero.
     b. if the number of elements of an array is not specified, then it
        will be set to the number of initializers.
     c. a character array may be initialized by a string.

Examples:

    int x = 23*22;
    char *p = "hello";
    char q[10] = "hello";    /* q[6] ... q[9] are set to '\0' */

    int f();
    int (*pf)() = f;

    /* array bound set to 5 */
    char *weekdays[] = $( "Mon", "Tues", "Wed", "Thu", "Fri" $);

    struct x $(
        char *symbol;
        int value;
    $);

    struct x tab[] = $(
        $( "word1", 4 $),
        $( "word2", 8 $),
        $( "word3", 13 $)
    $);

Caveats: local variables cannot be initialized (bad programming
practice anyway)

GOTOS AND LABELS

CC8 supports the dreaded goto statement and labels.  The compiler does
not distinguish between labels and

ERROR HANDLING

Error messages are printed in the typical Unix C fashion:

    file-name, line #: error message

In addition, the line causing the error is printed.  Note: if the error
occurs near the end of a line, the compiler will usually flag the next
line.

This compiler, like DBC and ACE C, does not know how to intelligently
recover from syntax errors.  For this reason, the compiler will halt
compilation after 8 errors.
The compiler no longer exits directly to DOS on fatal errors (e.g., too
many errors) so that the user may jump directly to an editor using the
command line feature described below.

I would appreciate knowing if the compiler hangs after finding an error
or a certain sequence of errors.

INTERNAL LIMITS AND TABLE SIZES

Lest we forget this is a 48K 8 bit implementation of C, here are some
of the more important limits:

  1. 7000 bytes of "global space" -- shared by:
       - macro definitions
       - each unique non-macro symbol name (15 bytes + length of
         symbol)
       - global symbol type info

  2. 256 bytes of "local space"
       - local symbol type info
       - only 128 local symbols may be declared.

  3. A source line may not be longer than 254 characters after macro
     expansion.  WARNING: this limit is not checked!

  4. An expression cannot generate more than 512 bytes of p-code
     instructions.  An expression violating this limit would be
     ridiculously huge.  WARNING: this limit is not checked!

  5. 512 bytes of "string space"
       - This table was 3000 bytes in DBC and probably ACE C too.
       - A way around the size constraint is to use initializers:

           char dummy1[] = " ... ";
           char dummy2[] = " ... ";
           ...

  6. 2300 bytes of stack space
     The compiler has been written to conserve stack space, so stack
     overflows should be a rarity.  This is fortunate because --
     WARNING: this limit is not checked!  Stack overflow will overwrite
     screen memory (but it's possible the compile will complete
     normally).

  7. #define macros may have up to 128 arguments.

CC8 uses memory from MEMLO to MEMTOP and locations $480 to $6FF.  MEMLO
must not exceed $2BFF.

SPEED

CC8 is much faster than DBC and slightly faster than ACE C (15% - 30%
depending on type of program).  Compilation time is affected most by
the presence of #defines.  You should expect compilation times of 15 to
30 seconds for small programs, and up to 2:00 for large programs with
many #defines.  Larger source files are possible, but run the risk of
running out of internal table space.
Some sample times:

    Program Description           SC        ACEC      DBC
    -------------------           --        -----     ---
    238 lines, no #defines        58        80 (1)    551
    same, one #define             66 (2)    77 (1)    -
    482 lines + 222 #include'd,
    many #defines + comments      107       -         -

Note: these timings were made on an 800xl with a 1050 running DOS 2.5
with write verify on.

  (1) I can't explain this anomaly.
  (2) The reason one #define makes so much difference is that the
      compiler takes advantage of the zero #defines to skip part of the
      preprocessing phase.

COMMAND LINE FEATURES

For users of DOSes that take a long time to reload (such as DOS 2.X), a
way to load and run files directly from the command line has been
provided.

  o ^L<RETURN> runs file called "D1:LINK.COM"
  o ^Lfilename<RETURN> runs file called "filename" ("D1:" and ".COM"
    default)
  o ^E<RETURN> runs file called "D1:EDIT.COM"

Note this feature is in its infancy.

  1. The control characters print as their equivalent graphic
     characters.
  2. ^L runs the ACE C and DBC linkers, and ^E runs SpeedScript.
     Nothing else has been tested.
  3. "Theoretically" the load routine understands INIT addresses.

MISCELLANEOUS

Switch statement handling has been reworked.  The last clause need not
end in "break", "continue", or "return".  In addition, the "default"
clause need not be the last (this was a bug in DBC).  On the down side,
you are limited to 100 "case"s per switch statement, and switch
statements may be safely nested only four or five deep.  I'll wait
until someone complains before I attempt to fix this.

More efficient code is now generated for the && and || operators.

Local variables may be declared at the beginning (not just anywhere as
in DBC) of compound statements.  However no two local variables
declared inside a function may have the same name (this is a departure
from standard C).

Variables declared with a storage class of "register" are the same as
"auto".

All C reserved words are recognized by the compiler.  Thus a variable
cannot be named "float" or "entry."
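The switch statement rules described at the start of this section can
be illustrated with a small sketch in standard C.  The function name
classify and its return codes are hypothetical; braces are written in
the usual way here rather than as $( $).  The "default" clause comes
first and the last clause has no "break", both of which the text above
says CC8 accepts.  The local variable is assigned rather than
initialized, in line with the caveat that local variables cannot be
initialized.

```c
/* Hypothetical classifier: returns 1 for blank or tab, 2 for newline,
   and 3 for anything else. */
int classify(int c)
{
    int kind;               /* declared at the start of the block */

    kind = 0;               /* assigned, not initialized */
    switch (c) {
    default:                /* "default" need not be the last clause */
        kind = 3;
        break;
    case ' ':
    case '\t':
        kind = 1;
        break;
    case '\n':
        kind = 2;           /* last clause: no "break" required */
    }
    return kind;
}
```

A call such as classify('\t') takes the shared blank/tab clause, while
any character without its own "case" falls to "default" even though
that clause is written first.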
Characters with the 8th bit set that are not part of character
constants, strings, or comments will cause the compiler to crash.

FASTC.COM should work on CC8 output.

I have been told that the compiler also works with the LightSpeed C
linker and runtime engine.

FUTURE

  1. Improve error handling
  2. Add ability to pass files to compile in command line for SpartaDOS
     and compatible DOS.

ACKNOWLEDGEMENTS

Thanks to Harald Striepe and Marc Appelbaum for their beta testing.
Thanks to Mark VandeWettering and Greg Koolbeck for their input.

AUTHOR

Steve Kennedy
1895 Fountainview Ct
Columbus, OH 43232
cbosgd!smk (seismo, decwrl, and ihnp4 know about cbosgd)