[mod.std.c] mod.std.c Digest V11#3

osd@hou2d.UUCP (Orlando Sotomayor-Diaz) (10/28/85)

From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>


mod.std.c Digest            Mon, 28 Oct 85       Volume 11 : Issue   3

Today's Topics:
   Suggested ANSI C changes for international character set support
                      the environment (2nd try)
----------------------------------------------------------------------

Date: Mon, 21 Oct 85 17:09:31 PDT
From: ihnp4!l5!gnu (John Gilmore)
Subject: Suggested ANSI C changes for international character set support
To: ihnp4!cbosgd!std-c

Something just occurred to me while thinking about how to define a
version of stdio that works with 16-bit characters.  I don't think we
should consider adding a "long char" type to the language, but a
few changes will need to occur to make wide characters easy to program
with.

The problem that came up is how to write strings, e.g. those fed to
printf, for printw.  (Printw ["printworld" or "printwide"] would be a
"printf" designed to handle the full international character set.)
What is needed is a way to initialize an array of 16-bit values
(shorts) with a string, as in:

char  string[] = "The value is %d\n";
   versus
short string[] = "The value is %d\n";

This doesn't work in today's C compilers, though I haven't seen a
particular reason why it shouldn't.  I see no reason not to make

float string[] = "abc";

assign the values 97., 98., and 99., and I suggest that in general,
an initializer like "abc" be standardized as exactly equivalent to
{'a', 'b', 'c'}  for ALL types.
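
As a rough illustration of what the proposed equivalence would save, here
is what has to be written today: one brace-list element per character, or
a run-time copy.  The names wide_abc, float_abc and widen are made up for
this sketch.  The brace lists follow the wording above and carry no
terminating zero, whereas a string initializer for an unsized char array
also stores one.

```c
#include <assert.h>
#include <stddef.h>

/* Today's spelling: each character listed separately.
 * These are the arrays the proposal would let us write as "abc". */
const short wide_abc[]  = { 'a', 'b', 'c' };
const float float_abc[] = { 'a', 'b', 'c' };

/* Hypothetical run-time alternative: copy a char string into an
 * array of shorts, one element per char, writing at most max
 * elements.  Returns the number of elements written; no
 * terminating zero is stored. */
size_t widen(short *dst, const char *src, size_t max)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL);
    while (src[n] != '\0' && n < max) {
        /* go through unsigned char so 8-bit codes stay positive */
        dst[n] = (short)(unsigned char)src[n];
        n++;
    }
    return n;
}
```

With the proposal in force, both brace lists would collapse to "abc" and
the copy routine would be unnecessary for constant strings.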

There is also the question of what to do about character constants in
expressions: how do you make "short" character strings instead of
"char" character strings, or indicate that '%' means a short '%' rather
than a char '%'?  (replace % with your favorite Chinese glyph.)

I suggest that a trailing letter do this, as is currently done for long
integer constants.  If  37L  works, why not  "foo"L  or  "foo"S  or
"foo"C?  Similarly,  'a'L  and  'a'S  and  'a'C, where C is the default
as now.
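
No compiler accepts the suffix spelling, so it cannot be shown running.
For comparison only, the sketch below uses the leading-L spelling that C as
later standardized provides for wide constants, with wchar_t from
<stddef.h> as the element type.  The names narrow, wide and wide_length
are made up for the example, and the prefix form is a substitute for the
suffix suggested above, not the same thing.

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t, size_t */

/* The same three letters as an ordinary string and as a wide string. */
const char    narrow[] = "foo";
const wchar_t wide[]   = L"foo";

/* Count the elements before the terminating zero of a wide string. */
size_t wide_length(const wchar_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

Both arrays hold four elements (three letters and a terminating zero);
only the element type differs.  A wide character constant is written the
same way, as in L'f'.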

There is a slight wrinkle in saying that "abc" is equivalent to {'a',
'b', 'c'}.  In a 16-bit character set, 'a' must be the first
*character* that appears after the ", not just the first *byte*.  C
compilers which are written assuming 8-bit input characters (and
which support strings of characters wider than 8 bits) must run their
strings through a conversion routine to get long values for internal use.
The conversion routine comes from stdio, since stdio will need it for I/O.
An example:

short w[] = "chinese";

(insert 7 chinese letters in place of "chinese") would be tokenized and
stored internal to the compiler as a 7-element string, so that the
resulting array <w> would be an array of 7 shorts.  Meanwhile, 

char c[] = "chinese";

would be tokenized the same way, but the resulting array <c> might be
an array of 20 chars, since each of the 7 glyphs requires more than one
8-bit char to hold it.  

This lets programs use the entire character set, either via encoded byte
strings or via true wide character processing, whichever is more convenient.
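
A minimal sketch of the kind of conversion routine meant here, under an
assumed encoding that is NOT taken from any real character set: a byte
with its high bit set is the first half of a two-byte character, and any
other byte is a character by itself.  The name bytes_to_wide is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding (an assumption for this sketch only):
 *   byte < 0x80   one-byte character
 *   byte >= 0x80  lead byte; the next byte completes the character
 * Converts the zero-terminated byte string src into at most max
 * 16-bit values in dst and returns how many characters were stored.
 * Conversion stops early at a lead byte with nothing after it. */
size_t bytes_to_wide(unsigned short *dst, size_t max,
                     const unsigned char *src)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL);
    while (*src != 0 && n < max) {
        unsigned short wc = *src++;

        if (wc & 0x80) {
            if (*src == 0)
                break;                      /* truncated pair */
            wc = (unsigned short)((wc << 8) | *src++);
        }
        dst[n++] = wc;
    }
    return n;
}
```

Under this assumed encoding 7 glyphs occupy 14 bytes, so the char array
would have 14 elements (plus a terminator) and the short array 7.  A
different encoding gives a different byte count, which is why the figure
of 20 above is only a "might be".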

------------------------------

Date: Mon, 14 Oct 85 13:06:29 PDT
From: UCLA Computer Club <cc1@LOCUS.UCLA.EDU>
Subject: the environment (2nd try)
To: cbosgd!std-c@LOCAL.Berkeley.EDU

 
As long as things are being improved in the standard, let's fix the environment:
getenv()--return a string
addenv()--add a string
firstenv()--
nextenv()--these two together step through the entire environment
clrenv()--delete the entire environment
delenv()--delete one item from the environment
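
To make the list concrete, here is one possible shape for these calls,
kept in a private table rather than the real process environment.  The
argument lists, the return values, the "NAME=value" storage and the
64-entry limit are all guesses; only the function names come from the list
above.  env_value stands in for getenv() so as not to collide with the
library routine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ENV_MAX 64                 /* arbitrary limit for the sketch */

static char *env_tab[ENV_MAX];     /* malloc'd "NAME=value" strings */
static int   env_count;            /* entries in use */
static int   env_cursor;           /* position for firstenv/nextenv */

/* Index of the entry called name, or -1 if there is none. */
static int env_find(const char *name)
{
    size_t len = strlen(name);
    int i;

    for (i = 0; i < env_count; i++)
        if (strncmp(env_tab[i], name, len) == 0 && env_tab[i][len] == '=')
            return i;
    return -1;
}

/* Value of name, or NULL (plays the part of getenv). */
const char *env_value(const char *name)
{
    int i = env_find(name);
    return i < 0 ? NULL : env_tab[i] + strlen(name) + 1;
}

/* Delete one item; 0 on success, -1 if it was not present. */
int delenv(const char *name)
{
    int i = env_find(name);

    if (i < 0)
        return -1;
    free(env_tab[i]);
    env_tab[i] = env_tab[--env_count];   /* fill the hole with the last entry */
    return 0;
}

/* Add an item, replacing any old value; 0 on success, -1 on failure. */
int addenv(const char *name, const char *value)
{
    char *entry;

    assert(name != NULL && value != NULL);
    entry = malloc(strlen(name) + strlen(value) + 2);
    if (entry == NULL)
        return -1;
    strcpy(entry, name);
    strcat(entry, "=");
    strcat(entry, value);
    delenv(name);                        /* drop the old value, if any */
    if (env_count == ENV_MAX) {
        free(entry);
        return -1;
    }
    env_tab[env_count++] = entry;
    return 0;
}

/* Next "NAME=value" string, or NULL when the table is exhausted. */
const char *nextenv(void)
{
    return env_cursor < env_count ? env_tab[env_cursor++] : NULL;
}

/* Restart the scan and return the first entry, or NULL if empty. */
const char *firstenv(void)
{
    env_cursor = 0;
    return nextenv();
}

/* Delete the entire table. */
void clrenv(void)
{
    while (env_count > 0)
        free(env_tab[--env_count]);
}
```

A scan would be written
for (e = firstenv(); e != NULL; e = nextenv()).  The sketch does not try
to cope with addenv() or delenv() being called in the middle of a scan; a
real definition would have to say what happens then.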

Require system() to put information into the environment, and require the
initialization defaults (such as buffering of stdout, next name to be
returned by tmpnam, etc.) to be initialized from the environment (if
information is present).
 
Incidentally, how about a new call to complement perror()? It would return
a string of the form "prog1: prog2: prog3" if it had been invoked by
prog3 running under prog2 which is running under prog1. The information
would be passed in the environment, could be cleared by interactive shells,
and would help determine just where an error occurred.
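
A sketch of the string-building half of such a call.  The name
chain_append and its arguments are made up; the idea is that each program
fetches the inherited chain from some agreed environment entry (the entry
name is left open here), appends its own name, and hands the result on to
whatever it runs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "inherited: prog" in out, or just "prog" when nothing was
 * inherited (NULL or empty string, e.g. after an interactive shell
 * cleared it).  out must not overlap inherited.
 * Returns 0 on success, -1 if the result did not fit in outsz bytes. */
int chain_append(char *out, size_t outsz,
                 const char *inherited, const char *prog)
{
    int n;

    assert(out != NULL && prog != NULL);
    if (inherited != NULL && strlen(inherited) > 0)
        n = snprintf(out, outsz, "%s: %s", inherited, prog);
    else
        n = snprintf(out, outsz, "%s", prog);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

Called in turn by prog1, prog2 and prog3, each passing its result down,
this builds up the "prog1: prog2: prog3" form described above.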

			Michael Gersten

------------------------------

End of mod.std.c Digest - Mon, 28 Oct 85 08:25:13 EST
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.