[net.lang.c] Does C depend on ASCII?

nishri@utcsstat.UUCP (Alex Nishri) (05/03/84)

Does anyone have any experience or comments about the dependence of
programs written in C on the ASCII character representation?  Could most
programs written in C be run under a different character representation
scheme?  What about the Unix system itself?

(For a completely different scheme, consider EBCDIC.  The numerics collate
after the alphabetics, so 'a' < '1' in EBCDIC.  EBCDIC also has holes in
the alphabetic sequence: 'a' + 1 is equal to 'b', but 'i' + 1 is not
equal to 'j'.  In fact, 'i' + 8 equals 'j'.)
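
One quick way to see which scheme a host uses is a test along these
lines (a sketch only; it assumes nothing beyond a C compiler and the
standard I/O library):

	#include <stdio.h>

	/* Report whether the lower-case letters are contiguous
	   (true in ASCII, false in EBCDIC) and how 'a' collates
	   against '1'. */
	int main()
	{
		char *alpha = "abcdefghijklmnopqrstuvwxyz";
		int i, contiguous = 1;

		for (i = 1; i < 26; i++)
			if (alpha[i] != alpha[i - 1] + 1)
				contiguous = 0;

		printf("letters %s contiguous; 'a' %c '1'\n",
			contiguous ? "are" : "are not",
			'a' < '1' ? '<' : '>');
		return 0;
	}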

Alex Nishri
University of Toronto
 ... utcsstat!nishri

gwyn@brl-vgr.ARPA (Doug Gwyn ) (05/12/84)

Traditionally, C has used the host computer's "native" character set
(how can a convention be "native"? you ask; yet it really is).
However, many programs written in C implicitly assume that the
character set is ASCII, although the language doesn't guarantee this.

I seem to recall that the C Language Standards Committee addressed
this question, but I don't remember whether they decided that ASCII
is the "official" C character set.

For my own use in those few cases where the character codes are
important, I have the following lines in my standard header file:
/* integer (or character) arguments and value: */
/* THESE PARTICULAR DEFINITIONS ARE FOR ASCII HOSTS ONLY */
#define tohostc( c )	(c)		/* map ASCII to host char set */
#define tonumber( c )	((c) - '0')	/* convt digit char to number */
#define todigit( n )	((n) + '0')	/* convt digit number to char */

The idea is to use toascii() to map the native input characters to
internal ASCII form, although you then have to do the same to the
C character constants against which the mapped input characters are
to be compared (or else use numerical ASCII codes).  Then on output
one uses tohostc() to map the internal form back to native chars.
Obviously there is non-negligible run-time overhead if the host
character set is not ASCII but something stupid like EBCDIC, but I
am willing to live with this in order not to have to change my source
code when I port it to a non-ASCII machine (just the standard header
needs to be changed).
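
In use the pattern looks something like this (a sketch only, with
tohostc() defined for an ASCII host exactly as above):

	#include <stdio.h>
	#include <ctype.h>		/* toascii() */

	#define tohostc( c )	(c)	/* ASCII host: identity map */

	int main()
	{
		int c;

		while ((c = getchar()) != EOF) {
			c = toascii(c);			/* native -> internal ASCII */
			if (c == toascii('\t'))		/* map constants the same way */
				c = toascii(' ');	/* e.g. turn tabs into blanks */
			putchar(tohostc(c));		/* internal ASCII -> native */
		}
		return 0;
	}

On an EBCDIC host only the header changes; presumably something like

	extern char _a2e[];	/* hypothetical ASCII -> EBCDIC table */
	#define tohostc( c )	(_a2e[(c) & 0377])

where _a2e is a hypothetical translation table supplied elsewhere; the
source code that uses the macros stays exactly the same.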

gam@proper.UUCP (Gordon Moffett) (05/22/84)

Virtually ALL the application programs on UTS written in C assume
that ASCII is the base character set.  In fact, many of the
programs you are familiar with on other architectures are just
the same on UTS.  (But see below about type ``char''.)

The ``virtually'' refers to two cases (that I know of) where EBCDIC
is used: in device drivers for EBCDIC-based devices (like 3270's
(IBM tubes)), and in programs that read/write volume labels on tapes
or disks.  The drivers do EBCDIC <--> ASCII translation, and the
volume labels are artifacts of an Amdahl-compatible environment.

The applications (and for the most part systems) programmer need
never be aware of EBCDIC on UTS.

Oh, by the way, the type ``char'' is unsigned under UTS on the 370
architecture, so for all you people who've been writing:

	char c;
	while ((c = getchar()) != EOF) ...

... you have frustrated my work very much ....
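
For the record, the portable idiom declares c as an int, so that EOF,
which is -1, can be distinguished from every real character; with an
unsigned char the test against EOF can never succeed and the loop
never terminates, and with a signed char it stops early on a
legitimate 0377 byte:

	int c;		/* int, not char: must also hold EOF */
	while ((c = getchar()) != EOF) ...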


UTS is a registered trademark of Amdahl Corporation.