[net.internat] International Character

vandome@imag.UUCP (Gerard Vandome) (05/01/86)

I would like to clarify the definition of an "international character".

First, be careful with words such as:
	character, char, byte, integer, letter, string ...

For example, SVID 1 states in GETC(BA_LIB) that:

"the function getc returns the next character (i.e., byte) ... "

although it is declared as: int getc(stream).

Secondly, with ISO 646 (US ASCII) no problem arises, because each
character corresponds to exactly one byte. The fact that the result
is a signed integer rather than an unsigned one (as might be expected)
allows the test for EOF (generally -1).
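
To illustrate the point about EOF, here is a minimal sketch in C. The
names mem_getc and count_bytes are hypothetical (they are not SVID
functions); mem_getc only imitates the return convention of getc on an
in-memory buffer, so that the example does not depend on a file.

```c
#include <stddef.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical in-memory stand-in for getc(): returns the next byte as a
   value in 0..255, or EOF when the buffer is exhausted. */
int mem_getc(const unsigned char *buf, size_t len, size_t *pos)
{
    if (*pos >= len)
        return EOF;
    return buf[(*pos)++];   /* unsigned char promotes to a non-negative int */
}

/* Count the bytes up to end of input. Because c is an int, the byte
   0xFF (255) is never confused with EOF (a negative value). */
size_t count_bytes(const unsigned char *buf, size_t len)
{
    size_t pos = 0, n = 0;
    int c;

    while ((c = mem_getc(buf, len, &pos)) != EOF)
        n++;
    return n;
}
```

If c were stored in a plain char instead, a byte with all bits set could
compare equal to EOF on machines where char is signed, and the loop would
stop too early.
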

Consider the following problem in CONV(BA_LIB):

int toupper(c) (with int c), called for example under ISO 8859/1 with the
character c = ll, must return Ll in Spanish, although ISO 8859/1 has no
single code for either of them.
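
A possible way to picture the problem is a string-valued counterpart of
toupper. This is only a sketch: first_letter_len and letter_toupper are
hypothetical names, and treating "ll" as one letter is an assumption taken
from the example above, not something SVID defines.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the first "letter" of s, under the ASSUMPTION that
   the digraph "ll" counts as one letter. Returns 0 at end of string. */
size_t first_letter_len(const char *s)
{
    if (s[0] == '\0')
        return 0;
    if (strncmp(s, "ll", 2) == 0)
        return 2;
    return 1;
}

/* Hypothetical string-valued counterpart of toupper(): writes the
   capitalized form of the first letter of s into out (at least 3 bytes)
   and returns out. "ll" becomes "Ll", as in the example above. */
char *letter_toupper(const char *s, char *out)
{
    size_t n = first_letter_len(s);
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = s[i];
    out[n] = '\0';
    if (n > 0)
        out[0] = (char)toupper((unsigned char)out[0]);
    return out;
}
```

The result needs a buffer because it may be longer than one byte, which is
exactly what the int toupper(int) interface cannot express.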

In an international version of UNIX, what should a "character" be?
	with the ISO 8859/1 (Latin 1) code
	with the CCITT (teletext) code, where a character may be composed of
			a diacritical sign followed by a letter
	with the JIS 6226 (Japanese) code, where a character occupies 2 bytes
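
To show how the number of characters can differ from the number of bytes,
here is a small sketch in C. The packing rule is an assumption made only
for illustration (a byte with the high bit set starts a 2-byte character);
it is not taken from the JIS or CCITT documents, and count_chars is a
hypothetical name.

```c
#include <stddef.h>

/* Count characters in buf under an ASSUMED packing: a byte with the high
   bit set starts a 2-byte character, any other byte is a 1-byte
   character. A truncated 2-byte character at the end is counted once. */
size_t count_chars(const unsigned char *buf, size_t len)
{
    size_t i = 0, n = 0;

    while (i < len) {
        i += (buf[i] & 0x80) ? 2 : 1;
        n++;
    }
    return n;
}
```

With this rule a 3-byte buffer holding one 2-byte character followed by
the letter A contains 2 characters, so a routine that counts bytes and a
routine that counts characters no longer agree.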


QUESTIONS:

- What is the size in bytes of a character?

- Is that question a real question?

- Are double letters such as "ij" in Dutch or "'e" in teletext code
considered to be one character?

- Is an international character a signed or an unsigned character?

I would be pleased to receive your comments on this topic.

Pascal BEYLS   BULL France   EUNET : mcvax!vmucnam!echbull!xopen