[comp.os.minix] Problems with ANSI C vs ACK C for preprocessing.

cagney@chook.ua.oz (Andrew Cagney - aka Noid) (10/12/89)

[I've been asked to repost this mail to the net (by my "boss's boss" :-)
so here it is. :-]

The problem is using preprocessors other than the MINIX ACK CPP for compiling
MINIX, in particular ANSI C style preprocessors. This is especially important
because the next version of MINIX will probably run a preprocessor over
the assembler files to provide a macro facility. Unfortunately the semantics
of CPP are poorly defined. This also highlights other
potential problems when changing from old C to ANSI C.

[The original problem]
> #if	x == y
>   << An error message that's annoying for TurboC >>
> #endif
>
[Mr X has noted that Turbo C loses characters because of the single
quote.]

Hm, I've noticed this also. Compare the quote in "that's" with the quotes of
a correct character constant, such as 's'. I believe that TC is grabbing the
first quote and then eating everything up to the next quote it finds, as if
the text in between were a character literal.
Some other examples:
What about '\
' instead of '\n', or how about '\000000000000000000040'. (1)
 
Looking at the LRM :-) Under the description of a character.

	'A character constant is a sequence of one or more characters enclosed
	in single-quotes, as in 'x' or 'ab'.

Hence 'a' 'abc' 'abcedfghijklmnopqurestavasdfkljasepoifjaposie' are
all valid character constants! The support of the last one is a little
`implementation defined'.

What about `\
` Well there are two contrary (?) notes. The first

	Constraints
		A new-line character shall not appear in a character
		constant.

The second

	[.... \t \a \n ....]
	If an escape sequence other than one of these is encountered,
	the behaviour is implementation-defined.

I'd say that the first rule is the one to go by.
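
As a side note on the backslash-newline case: as far as I understand the
ANSI standard as published (not the 1986 draft quoted here), a backslash
immediately followed by a newline is deleted before the line is split into
tokens, so '\<newline>' would end up as an empty '' rather than a constant
holding a newline. Either way it is an error. The sketch below shows the same
splicing inside a string literal, where it is legal. The names spliced and
spliced_length are made up for illustration.

```c
#include <string.h>

/* The backslash ending the first line of the literal is followed directly
   by a newline; both are removed before the literal is tokenized, so the
   string is "abcd". The continuation must start in column 0, otherwise the
   leading white space would become part of the string. */
static const char spliced[] = "ab\
cd";

/* Length of the spliced literal, not counting the terminating NUL. */
size_t spliced_length(void)
{
    return strlen(spliced);
}
```

With an ANSI compiler spliced_length() should return 4, since neither the
backslash nor the newline survives.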

Ok so enough of the rules.

To conclude, Turbo-C's preprocessor is correct in flagging the line
above as an error. The error recovery it uses on
encountering an open quote, discarding everything until the next quote,
is terrible, but LRMs are not in the business of defining error recovery
:-).

[Gcc's cccp flags the open quote as an error but does not discard the text]

For MINIX this means that lines such as the one above or, more typically,
assembler lines like the following, which are fed to the preprocessor:

	jmp .fat	| that's a dot in a label

are invalid input for an ANSI CPP. The quote is not closed. :-(
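
To see why such a line upsets a preprocessor that tokenizes its input, here
is a minimal sketch of a quote scanner. It is not how any particular CPP is
implemented. has_unterminated_quote is a hypothetical helper that only tracks
' and " and backslash escapes, and it knows nothing about comments.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if `line` contains a ' or " that is
   opened but never closed before the end of the line, 0 otherwise.
   Assumption: comments are not recognised, only quotes and backslashes. */
int has_unterminated_quote(const char *line)
{
    char open = 0;      /* 0 = outside a literal, else the opening quote */
    size_t i;

    for (i = 0; line[i] != '\0'; i++) {
        char c = line[i];

        if (open) {
            if (c == '\\' && line[i + 1] != '\0')
                i++;            /* skip the escaped character */
            else if (c == open)
                open = 0;       /* matching quote closes the literal */
        } else if (c == '\'' || c == '"') {
            open = c;           /* a literal starts here */
        }
    }
    return open != 0;
}
```

On the assembler line above it returns 1, because the ' in "that's" never
closes. A line that happens to contain two apostrophes returns 0, which is
roughly the pairing-up behaviour seen from Turbo C.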

Problem 2	How CPP handles undefined characters.
=========

The '#' character.

eg
	mov bx, #CLICK_SIZE	! that # character for immediate mode.

The '#' character is not a valid character
in ANSI C. But what should preprocessors do with it? Turbo's CPP
discards the character :-(. Gnu's cccp simply ignores it (echoes the #).
I can't find any reference on this one in the LRM.

Any ideas?

Problem 3	Macro expansion and ##
=========

This example is included for completeness. It highlights another potential
problem with using different CPPs. It was not part of the original piece
of mail I sent.

In fsck there are the macros:
	#define quote(x)	x
	#define nextarg(t)	(*argp.quote(u_)t++)
they are then used as:
	nextarg(char)
the macro expansion using ACK cpp produces:
	(*argp.u_char++)
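An ANSI preprocessor works on tokens, so quote(u_)t leaves u_ and char as two
separate tokens instead of gluing them into u_char. For this to work with
ANSI C the macro needs to be defined as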
	#define nextarg(t)	(*argp.u_ ## t ++)
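
A minimal sketch of the ## form in use, assuming an ANSI compiler. The union
layout, the file-scope argp and the function sum_ints are invented for the
example and are not copied from fsck. Only the nextarg macro is as given
above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the union used in fsck; the real layout
   is not shown in the post. */
union argptr {
    char *u_char;
    int  *u_int;
};

static union argptr argp;

/* ANSI form: ## pastes u_ and the argument into a single identifier. */
#define nextarg(t)	(*argp.u_ ## t ++)

/* Adds up n ints by walking argp.u_int with nextarg(int). */
int sum_ints(int *v, size_t n)
{
    int total = 0;

    argp.u_int = v;
    while (n-- > 0)
        total += nextarg(int);  /* expands to (*argp.u_int ++) */
    return total;
}
```

Here nextarg(int) expands to (*argp.u_int ++), so each call fetches the
current int and steps the pointer on.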

				Andrew Cagney
				cagney@cs.ua.oz


[REF Draft programming language C, WORKING DOCUMENT, Feb 19, *1986*
     I'm afraid I can't find a more recent one. :-(]

(1)	If you're wondering, that is actually the character
	\000 followed by the characters 000.....40 :-)
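
The footnote can be checked with a string literal, which avoids the
implementation-defined multi-character constant. A small sketch, assuming an
ANSI compiler; demo and the two helper functions are illustrative names only.
An octal escape takes at most three digits, so "\00040" is the character
\000 followed by '4' and '0'.

```c
#include <stddef.h>

/* "\00040": the escape stops after three octal digits (\000),
   leaving the ordinary characters '4' and '0'. */
static const char demo[] = "\00040";

/* Number of characters in demo, not counting the terminating NUL. */
size_t demo_length(void)
{
    return sizeof demo - 1;
}

/* Character at position i of demo (i must be less than sizeof demo). */
char demo_char(size_t i)
{
    return demo[i];
}
```

demo_length() gives 3 here: one NUL character from the escape plus the two
digits that were left over.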