osd7@homxa.UUCP (Orlando Sotomayor-Diaz) (05/13/85)
From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>

mod.std.c Digest        Sun, 12 May 85        Volume 6 : Issue 8

Today's Topics:
                Alternate character sets
                How does ## work...
----------------------------------------------------------------------

Date: Thu, 9 May 85 21:48:10 edt
From: Kevin Martin <ihnp4!watmath!kpmartin>
Subject: Alternate character sets
To: std-c@cbosgd

Has there been any thought of supporting alternate character sets
(i.e. other than the character set used for 'c's and "string"s)?
At least one C compiler, the Bell Labs GCOS compiler, has them
(BCD `string`s), and many relatives of this compiler "know" about
grave accents and complain if you use them.

This would allow simpler use of Honeywell's BCD, CDC's funny
64-character set, and also the dreaded rad50 character set used on
many 16 and 32 bit machines.

It's *not* something new, and it's *not* 'syntactic sugar'.

                Kevin Martin, UofW Software Development Group.
(They'll only re-write their linkers if the new ones can read old
object files, and the old files say neat things like "$ object" in
BCD :-))

[ The source character set must contain 52 letters (English alphabet,
  upper and lower case), the ten decimal digits, and the following
  29 graphic characters:

      ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~

  Notice that the grave accent (`) is not required, though a
  particular implementation may add it. Some of the 29 graphic
  characters above can be mapped from trigraphs, as discussed here
  some time ago. -- Mod. -- ]

------------------------------

Date: Thu, 9 May 85 21:36:25 edt
From: Kevin Martin <ihnp4!watmath!kpmartin>
Subject: How does ## work...
To: std-c@cbosgd

I have only heard vague descriptions of what ## does: It concatenates
tokens. However, it appears to do so after formal parameter
replacement in a macro. But the body of a macro is already tokenized
(see the definition for #define). So ## must un-tokenize (back into a
character stream) and re-tokenize.
The question is: How many tokens after the ## does it un-tokenize?
Consider:
        #define foo(n) n ## 32Ugly
The macro prototype consists of the tokens:
        (formal parameter 1) '##' (unsigned constant 32) (identifier 'gly')
Now we call the macro:
        foo(22.)
The new token sequence becomes:
        (floating constant '22.') '##' (unsigned constant 32) (id 'gly')
Now, what is the resulting token sequence?
        (floating constant '22.32') (identifier 'Ugly') ?

A clean way of avoiding these problems is to give a stricter
definition of '##': It joins *exactly* two tokens into *exactly* one.
No leftovers. This would make my example erroneous, since the 'U'
would be left over after re-scanning '22.32'.

Or does the draft standard already say this?

                Kevin Martin, UofW Software Development Group

[ "Macro names found in a macro argument are replaced appropriately.
  A comma in the replacement token sequence does not change the
  actual number of arguments to the macro. After all replacements
  have taken place, each instance in the definition of a ## token is
  deleted, and the tokens preceding and following it are concatenated
  to a single token." Section C.8.2, p. 49, Draft 85-008.

  I'm not sure the paragraph above is sufficient to answer the
  question. Any comments? -- Mod -- ]

------------------------------

End of mod.std.c Digest - Sun, 12 May 85 20:01:49 EDT
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.
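
On the ## question above, the "exactly two tokens into exactly one"
reading can be tried out on a present-day compiler. What follows is
only a minimal sketch, and it assumes a compiler that follows the
later ANSI/ISO C rules, not Draft 85-008. Under those rules, ##
pastes two preprocessing tokens, and the result must itself be a
single valid preprocessing token. The names PASTE, STR, XSTR,
pasted_number, pasted_identifier and check_pasting are invented for
this illustration and appear in neither the draft nor the posts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macros; none of these names come from the draft. */
#define PASTE(a, b) a ## b   /* join exactly two tokens into one        */
#define STR(x)      #x       /* turn the argument's spelling into a string */
#define XSTR(x)     STR(x)   /* expand the argument first, then stringize  */

/* Pastes the tokens 22. and 32 and returns the spelling of the result. */
const char *pasted_number(void)
{
    return XSTR(PASTE(22., 32));
}

/* Pastes two identifier halves into the single identifier "counter". */
int pasted_identifier(void)
{
    int PASTE(coun, ter) = 7;   /* declares: int counter = 7; */
    return counter;
}

/* Self-check: each paste above yielded exactly one token. */
void check_pasting(void)
{
    assert(strcmp(pasted_number(), "22.32") == 0);
    assert(pasted_identifier() == 7);
}
```

Calling check_pasting() from a main function runs both checks. The
sketch deliberately pastes 22. with 32, not with 32Ugly: under the
later rules 32Ugly is scanned as one preprocessing number, so
foo(22.) would paste to the single spelling 22.32Ugly, which is then
not a valid constant.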