nishri@utcsstat.UUCP (Alex Nishri) (05/03/84)
Does anyone have any experience or comments about the dependability of programs written in C on the ASCII character representation? Could most programs written in C be run on a different character representation scheme? What about the Unix system itself?

(For a completely different scheme, consider EBCDIC. The numerics collate after the alphabetics, so 'a' < '1' in EBCDIC. Also, EBCDIC has holes in the alphabetic sequence: 'a' + 1 is equal to 'b', but 'i' + 1 is not equal to 'j'. In fact 'i' + 8 equals 'j'.)

Alex Nishri
University of Toronto
... utcsstat!nishri
gwyn@brl-vgr.ARPA (Doug Gwyn ) (05/12/84)
Traditionally C has used the host computer's "native" character set (how can a convention be "native"? you ask; yet it really is). However, many programs written in C implicitly assume that the character set is ASCII, although the language doesn't guarantee this. I seem to recall that the C Language Standards Committee addressed this question, but I don't remember whether they decided that ASCII is the "official" C character set.

For my own use, in those few cases where the character codes are important, I have the following lines in my standard header file:

	/* integer (or character) arguments and value: */
	/* THESE PARTICULAR DEFINITIONS ARE FOR ASCII HOSTS ONLY */
	#define tohostc( c )	(c)		/* map ASCII to host char set */
	#define tonumber( c )	((c) - '0')	/* convt digit char to number */
	#define todigit( n )	((n) + '0')	/* convt digit number to char */

The idea is to use toascii() to map the native input characters to internal ASCII form, although you then have to do the same to the C character constants against which the mapped input characters are to be compared (or else use numerical ASCII codes). Then on output one uses tohostc() to map the internal form back to native chars. Obviously there is non-negligible run-time overhead if the host character set is not ASCII but something stupid like EBCDIC, but I am willing to live with this in order to not have to change my source code when I port it to a non-ASCII machine (just the standard header needs to be changed).
gam@proper.UUCP (Gordon Moffett) (05/22/84)
Virtually ALL the application programs on UTS written in C assume that ASCII is the base character set. In fact, many of the programs you are familiar with on other architectures are just the same on UTS (but -- see below about type ``char''). The ``virtually'' refers to two cases (that I know of) where EBCDIC is used: in device drivers for EBCDIC-based devices (like 3270's (ibm tubes)), and in programs that read/write volume labels on tapes or disks. The drivers are doing EBCDIC <--> ASCII translations, and the volume labels are artifacts of an Amdahl-compatible environment. The applications (and for the most part systems) programmer need never be aware of EBCDIC on UTS.

Oh, by the way, the type ``char'' is unsigned in the UTS/370 architecture, so for all you people who've been writing:

	char c;
	while ((c = getchar()) != EOF)
		...

... you have frustrated my work very much ....

UTS is a registered trademark of Amdahl Corporation.