cagney@chook.ua.oz (Andrew Cagney - aka Noid) (10/12/89)
[I've been asked to repost this mail to the net (by my 'boss's boss' :-)
so here it is. :-]

The problem is using preprocessors other than the MINIX ACK CPP for
compiling MINIX, in particular ANSI C style preprocessors. This is
especially important because the next version of MINIX will probably run
a preprocessor over the assembler files to provide a macro facility.
Unfortunately the semantics of CPP are poorly defined. This also
highlights other potential problems when changing from old C to ANSI C.

[The original problem]

> #if x == y
> << An error message that's annoying for TurboC >>
> #endif
>

[Mr X has noted that Turbo C loses characters because of the single
quote.]

Hm, I've noticed this also. Compare its quote with the quotes of a
correct character constant, that is, 's'. I believe that TC is grabbing
the first quote and then eating everything until the second quote of the
character literal.

Some other examples: what about '\ ' instead of '\n', or how about
'\000000000000000000040'? (1)

Looking at the LRM :-), under the description of a character constant:

    A character constant is a sequence of one or more characters
    enclosed in single-quotes, as in 'x' or 'ab'.

Hence

    'a'
    'abc'
    'abcedfghijklmnopqurestavasdfkljasepoifjaposie'

are all valid character constants! Support for the last one is a little
`implementation defined'.

What about '\ '? Well, there are two contrary (?) notes. The first:

    Constraints
    A new-line character shall not appear in a character constant.

The second:

    [.... \t \a \n ....]
    If an escape sequence other than one of these is encountered, the
    behaviour is implementation-defined.

I'd say that the first rule is the one to go by.

OK, so enough of the rules. To conclude, Turbo C's preprocessor is
correct in flagging the line above as an error. The error recovery it
uses on encountering an open quote, discarding everything until the
next quote, is terrible, but LRMs are not in the business of defining
error recovery :-).
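The rules on octal escapes and character constants can be checked with a few lines of ANSI C. What follows is only a sketch, assuming a conforming compiler and an ASCII execution character set; the function names are invented for illustration and appear nowhere in MINIX. It shows that an octal escape stops after three digits (which is why the long '\000...040' example is \000 followed by more characters) and that a character constant has type int in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, named here only for illustration. */

/* An octal escape takes at most three digits, so "\0000" holds the
 * character \000, then the digit '0', then the terminating NUL. */
size_t octal_escape_size(void)
{
    return sizeof("\0000");
}

/* '\040' is octal 40, i.e. decimal 32: a space, assuming ASCII. */
int octal_forty(void)
{
    return '\040';
}

/* In C (unlike C++) a character constant has type int, not char. */
size_t char_constant_size(void)
{
    return sizeof('a');
}

/* Collects the three observations as assertions. */
void check_character_constants(void)
{
    assert(octal_escape_size() == 3);
    assert(octal_forty() == ' ');
    assert(char_constant_size() == sizeof(int));
}
```

Calling check_character_constants() from a main() of your own should pass silently on an ASCII system. Multi-character constants such as 'ab' are left out on purpose, since their value is implementation-defined.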
[GCC's cccp flags the open quote as an error but does not discard the
text.]

In the case of MINIX, lines such as the one above, or more typically
assembler lines (before they are fed to the preprocessor) such as

    jmp .fat        | that's a dot in a label

are invalid for parsing with an ANSI CPP. The quote is not closed. :-(

Problem 2     How CPP handles undefined characters.
=========

The '#' character, e.g.

    mov bx, #CLICK_SIZE    ! that # character for immediate mode.

The '#' character is not a valid character in ANSI C outside of
preprocessor directives. But what should preprocessors do with it?

    Turbo C's CPP discards the character :-(.
    GNU's cccp simply ignores it (echoes a #).

I can't find any reference on this one in the LRM. Any ideas?

Problem 3     Macro expansion and ##
=========

This example is included for completeness. It highlights another
potential problem with using different CPPs. It was not part of the
original piece of mail I sent.

In fsck there are the macros:

    #define quote(x) x
    #define nextarg(t) (*argp.quote(u_)t++)

They are then used as:

    nextarg(char)

The macro expansion using ACK cpp produces:

    (*argp.u_char++)

For this to work with ANSI C the macro needs to be defined as:

    #define nextarg(t) (*argp.u_ ## t ++)

				Andrew Cagney
				cagney@cs.ua.oz

[REF Draft programming language C, WORKING DOCUMENT, Feb 19, *1986*.
I'm afraid I can't find a more recent one. :-(]

(1) If you're wondering, that is actually the character \000 followed
by the characters 000.....40 :-)
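For Problem 3, the ANSI form of the macro can be tried out on its own. This is a sketch only: the union below is a made-up stand-in for whatever argp really is in fsck (its declaration is not shown above), and second_char() is a hypothetical helper. An ANSI preprocessor works on tokens, so the old quote(u_)t trick leaves two separate tokens, u_ and char, whereas ## pastes them into the single identifier u_char.

```c
#include <assert.h>

/* Hypothetical stand-in for the fsck argument pointer; the real
 * declaration of argp is not shown in the post. */
union argptr {
    const char *u_char;
    const int  *u_int;
};

/* ANSI form from the post: ## pastes u_ and the macro argument into
 * one token, so nextarg(char) expands to (*argp.u_char ++). */
#define nextarg(t) (*argp.u_ ## t ++)

/* Steps over the first char of buf and returns the second one.
 * The macro needs a variable literally named argp in scope. */
char second_char(const char *buf)
{
    union argptr argp;

    argp.u_char = buf;
    (void) nextarg(char);   /* consume the first character */
    return nextarg(char);   /* fetch the second character */
}
```

For instance, second_char("xy") gives 'y'. Running only the preprocessor over the file (cc -E on most Unix compilers) is a quick way to compare what different CPPs make of nextarg(char).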