osd@hou2d.UUCP (Orlando Sotomayor-Diaz) (10/28/85)
From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>

mod.std.c Digest        Mon, 28 Oct 85        Volume 11 : Issue 3

Today's Topics:
    Suggested ANSI C changes for international character set support
    the environment (2nd try)
----------------------------------------------------------------------

Date: Mon, 21 Oct 85 17:09:31 PDT
From: ihnp4!l5!gnu (John Gilmore)
Subject: Suggested ANSI C changes for international character set support
To: ihnp4!cbosgd!std-c

Something just occurred to me while thinking about how to define a
version of stdio that works with 16-bit characters. I don't think we
should consider adding a "long char" type to the language, but a few
changes will need to occur to make wide characters easy to program with.

The problem that came up is how to write strings, e.g. those fed to
printf, for printw. (Printw ["printworld" or "printwide"] would be a
"printf" designed to handle the full international character set.)
What is needed is a way to initialize an array of 16-bit values (shorts)
with a string, as in:

	char string[] = "The value is %d\n";
versus
	short string[] = "The value is %d\n";

This doesn't work in today's C compilers, though I haven't seen a
particular reason why it shouldn't. I see no reason not to make

	float string[] = "abc";

assign the values 97., 98., and 99., and I suggest that in general, an
initializer like "abc" be standardized as exactly equivalent to
{'a', 'b', 'c'} for ALL types.

There is also the question of what to do about character constants in
expressions: how do you make "short" character strings instead of "char"
character strings, or indicate that '%' means a short '%' rather than a
char '%'? (Replace % with your favorite Chinese glyph.) I suggest that
a trailing letter do this, as is currently done for long integer
constants. If 37L works, why not "foo"L or "foo"S or "foo"C?
Similarly, 'a'L and 'a'S and 'a'C, where C is the default as now.

There is a slight wrinkle in saying that "abc" is equivalent to
{'a', 'b', 'c'}.
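
The equivalence in question can already be spelled out by hand in C as
it stands. A minimal sketch, assuming an ASCII execution character set
and assuming that the terminating zero of a char string would carry over
to other element types (the post does not say); the names c_str, s_str,
f_str, and short_len are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* What compiles today: a char array initialized from a string literal.
   The terminating '\0' is included, so the array has 4 elements. */
const char c_str[] = "abc";

/* The proposal would let `short s_str[] = "abc";` mean the same thing
   element by element.  Written out by hand instead; the trailing zero
   is an assumption, not something the post specifies. */
const short s_str[] = { 'a', 'b', 'c', '\0' };

/* The float case from the text: each element holds the character code
   converted to float (97., 98., 99. under ASCII). */
const float f_str[] = { 'a', 'b', 'c', '\0' };

/* Count the elements before the terminating zero in an array of
   shorts, in the manner of strlen(). */
size_t short_len(const short *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

With these definitions short_len(s_str) yields 3, the same count that
strlen(c_str) gives for the char version. The trailing-letter forms
such as "foo"S are not shown because they are a proposal, not existing
syntax.
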
In a 16-bit character set, 'a' must be the first *character* that
appears after the ", not just the first *byte*. C compilers which are
written assuming 8-bit input characters (and which support strings of
characters wider than 8 bits) must run their strings thru a conversion
routine to get long values for internal use. The conversion routine
comes from stdio, since stdio will need it for I/O.

An example:

	short w[] = "chinese";

(insert 7 Chinese letters in place of "chinese") would be tokenized and
stored inside the compiler as a 7-element string, so that the resulting
array <w> would be an array of 7 shorts. Meanwhile,

	char c[] = "chinese";

would be tokenized the same way, but the resulting array <c> might be an
array of 20 chars, since each of the 7 glyphs requires more than one
8-bit char to hold it. This lets programs use the entire character set,
either via encoded byte strings or via true wide character processing,
whichever is more convenient.

------------------------------

Date: Mon, 14 Oct 85 13:06:29 PDT
From: UCLA Computer Club <cc1@LOCUS.UCLA.EDU>
Subject: the environment (2nd try)
To: cbosgd!std-c@LOCAL.Berkeley.EDU

As long as things are being improved in the standard, let's fix the
environment:

	getenv()--return a string
	addenv()--add a string
	firstenv()--
	nextenv()-- these 2 will step through the entire environment
	clrenv()--delete the entire environment
	delenv()--delete one item from the environment

Require system() to put information into the environment, and require
the initialization defaults (such as buffering of stdout, next name to
be returned by tmpnam, etc.) to be initialized from the environment (if
information is present).

Incidentally, how about a new call to complement perror()? It would
return a string of the form "prog1: prog2: prog3" if it had been invoked
by prog3 running under prog2 which is running under prog1.
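
One way such a call might assemble its string, sketched under
assumptions: the variable name PROGCHAIN and the functions build_chain()
and current_chain() are hypothetical, since the post names neither a
variable nor the call itself. Each program would append its own name to
whatever chain it inherited:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical environment variable; the post does not name one. */
#define CHAIN_VAR "PROGCHAIN"

/* Append `prog` to the inherited chain, giving "prog1: prog2: prog3".
   `inherited` may be NULL or empty when no parent left an entry.
   Returns 0 on success, -1 if `out` (of size `outsz`) is too small. */
int build_chain(const char *inherited, const char *prog,
                char *out, size_t outsz)
{
    size_t have, need;

    assert(prog != NULL && out != NULL);

    have = (inherited != NULL) ? strlen(inherited) : 0;
    /* inherited text, ": " separator if needed, prog, and the '\0' */
    need = have + (have ? 2 : 0) + strlen(prog) + 1;
    if (need > outsz)
        return -1;

    if (have)
        sprintf(out, "%s: %s", inherited, prog);
    else
        strcpy(out, prog);
    return 0;
}

/* Read the inherited chain from the environment with the standard
   getenv() and extend it with this program's name. */
int current_chain(const char *prog, char *out, size_t outsz)
{
    return build_chain(getenv(CHAIN_VAR), prog, out, outsz);
}
```

A program would then export the new value before running a child; how
it does so (the proposed addenv(), or whatever the host system offers)
is left out of the sketch.
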
The information would be passed in the environment, could be cleared by
interactive shells, and would help determine just where an error
occurred.

		Michael Gersten

------------------------------

End of mod.std.c Digest - Mon, 28 Oct 85 08:25:13 EST
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.