keld@diku.UUCP (Keld J|rn Simonsen) (07/22/85)
<>
A while ago there was some discussion in these groups on international UNIX. I missed it due to faults in our news system. (I still wonder why.) Here is my two cents' worth.

The EUUG also has a Standards Committee on International UNIX. We are looking forward to cooperating with the /usr/group/UK group. There is a meeting on this in connection with the EUUG Copenhagen Conference, scheduled for Thursday 12th September 1985.

As Leif Samuelson noted, some chars in what you think is ASCII, but in reality is ISO 646-1983, are reserved for national use, namely the twelve (12) chars:

        #$@[\]^`{|}~

Various European National Standardisation Boards have adopted character representations different from ASCII on (in total) all the abovenamed positions. So these should not be thought of as generally useful for international software: any of these characters will generate weird output in at least one major European area.

Yes, we need to be able to have variable names with these characters. ANSI C does not allow this, but it allows a representation of nine of the abovenamed chars in *trigraph* form: ?? is used as a lead-in to define:

         #    [    \    ]    ^    {    |    }    ~
        ??=  ??(  ??/  ??)  ??'  ??<  ??!  ??>  ??-

$@` are not used (at the moment) in ANSI C. Personally I do not like the choice of ? as lead-in char, as it is graphically quite dominating; maybe .. would have been better, but the trigraph scheme is quite general and OK to me. If we then could use the national chars in variable names, C could become a quite useful programming language :-)
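
As a hedged illustration of the trigraph table in the post above, here is a minimal C sketch of a pass that turns each of the nine trigraphs back into its single character. The name `decode_trigraphs` and the two lookup strings are assumptions made for this example; this is not how any particular compiler implements the translation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Third characters of the nine trigraphs and the characters they stand
   for, in the order given in the post. The two strings are parallel.
   (Hypothetical names, chosen for this sketch only.) */
static const char TRI_THIRD[] = "=(/)'<!>-";
static const char TRI_CHAR[]  = "#[\\]^{|}~";

/* Copy src to dst, replacing each trigraph with its single character.
   dst must have room for strlen(src) + 1 bytes; decoding never makes
   the text longer. Returns the length of the decoded string. */
size_t decode_trigraphs(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        const char *hit = NULL;

        /* A trigraph is two question marks plus one listed character. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0')
            hit = strchr(TRI_THIRD, src[2]);

        if (hit != NULL) {
            dst[n++] = TRI_CHAR[hit - TRI_THIRD];
            src += 3;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

When calling it from C source, write each question mark as `\?` (for example `"char foo \?\?(100\?\?);"`), so that a compiler with trigraphs enabled does not translate the string literal itself before the function ever sees it.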
foust@gumby.UUCP (07/22/85)
> Yes, we need to be able to have variable names with these characters.
> ANSI C does not allow this, but it allows a representation of nine of
> the abovenamed chars in *trigraph* form: ?? is used as a lead-in to
> define:
>
>  #    [    \    ]    ^    {    |    }    ~
> ??=  ??(  ??/  ??)  ??'  ??<  ??!  ??>  ??-
>
> $@` are not used (at the moment) in ANSI C.
> Personally I do not like the choice of ? as lead-in char as it
> is graphically quite dominating, maybe .. was better,
> but the trigraph scheme is quite general and OK to me.
> If we then could use the national chars in variable names, C could
> become a quite useful programming language :-)

But just think what this would do for an international obfuscated C contest! Anybody want to translate this year's entries?

--
----------
John Foust
"I used to be disgusted, but now I'm just amused"
minow@decvax.UUCP (Martin Minow) (07/23/85)
Keld Joern Simonsen suggests, probably with tongue in cheek, that C would be a useful programming language if only European users could use their full national character set in identifiers.

To my knowledge, no commercially available computer language -- including a few developed in Scandinavia such as Algol 60 (for Trask and Besk), Algol-Genius (for the Datasaab machines) and Simula (for Dec PDP10s) -- permits national letters in variable names, so the marketplace hasn't exactly mandated their inclusion.

I would also point out that national replacement character sets are being superseded by the Draft ISO/ANSI/ECMA 8-bit character set called Latin 1. Latin 1 has a unique representation for the national letters of the major European languages and, once the initial problems of going from a seven-bit character set to an eight-bit set have been solved, should prove to be a much simpler representation to deal with for international products.

Martin Minow (fil.kand. Stockholms Universitet)
decvax!minow
levy@ttrdc.UUCP (Daniel R. Levy) (07/25/85)
What do you do about the punctuation marks [], which are used in C to denote arrays? Wouldn't they come out screwy in some international ASCII dialects? Something like

	char foo ??(100??)

or what?
--
 -------------------------------    Disclaimer: The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer, my pets, my plants, my boss, or the
| at&t computer systems division |  s.a. of any computer upon which I may hack.
|        skokie, illinois        |
|          "go for it"           |  Path: ..!ihnp4!ttrdc!levy
 --------------------------------       or: ..!ihnp4!iheds!ttbcad!levy
trb@masscomp.UUCP (Andy Tannenbaum) (07/25/85)
I don't think it's necessary to hack up languages to allow funny (uhm, international...) characters as variable names.

It is important, though, to allow international character sets in the user interface. This is a totally different problem, and it's a problem that Hewlett Packard seems to be addressing. Within the past year, there have been several articles in the HP Journal which address the issues involved in the international software marketplace. For example, Wilson and Shaw, "Designing Software for the International Market," HP Journal Sept 1984 is an overview. There have also been articles which discussed sorting, hyphenation, spelling correction, maintaining multi-language prompt-string databases, date formats, etc.

By the way, the HP Journal is free from HP, and often contains interesting and timely information on their products and engineering. It's nice to see a free publication which isn't useless. I think you can get put on the list by mailing to

	HP Journal
	3000 Hanover Street
	Palo Alto, CA 94304
	USA

Andy Tannenbaum
Masscomp
Westford, MA
(617) 692-6200 x274
zap@ttds.UUCP (Svante Lindahl) (07/25/85)
["For you, for you, for you, I came for you" -- Bruce Springsteen, "For you"]

In article <93@decvax.UUCP> minow@decvax.UUCP (Martin minow) writes:
>Keld Joern Simonsen suggests, probably with tongue in cheek,
>that C would be a useful programming language if only European users
>could use their full national character set in identifiers.
>
>To my knowledge, no commercially available computer language --
>including a few developed in Scandinavia such as Algol 60 (for
>Trask and Besk), Algol-Genius (for the Datasaab machines) and
>Simula (for Dec PDP10s) permit national letters in variable
>names, so the marketplace hasn't exactly mandated their inclusion.

The PDP-10 Simula compiler does allow the lowercase national characters { (a w/ umlaut, :a), | (o w/ umlaut, :o) and } (a with a circle on top, Oa).

>Martin Minow (fil.kand. Stockholms Universitet)
>decvax!minow

Svante Lindahl (fil.kand. Stockholms Universitet)
--
Svante Lindahl, NADA, KTH (Dept of Numerical Analysis and Computer Science at the Royal Institute of Technology)
UUCP: {decvax,philabs,seismo}!{mcvax,ukc,unido}!enea!ttds!zap
ARPA: mcvax!enea!ttds!zap@seismo.ARPA
  or  Svante_Lindahl_NADA%QZCOM.MAILNET@MIT-MULTICS.ARPA
rjh@ihlpa.UUCP (Randolph J. Herber) (07/31/85)
> >To my knowledge, no commercially available computer language --
> >including a few developed in Scandinavia such as Algol 60 (for
> >Trask and Besk), Algol-Genius (for the Datasaab machines) and
> >Simula (for Dec PDP10s) permit national letters in variable
> >names, so the marketplace hasn't exactly mandated their inclusion.
> (a with a circle on top, Oa).
>
> >Martin Minow (fil.kand. Stockholms Universitet)
> >decvax!minow
> UUCP: {decvax,philabs,seismo}!{mcvax,ukc,unido}!enea!ttds!zap

IBM PL/I does allow three "national alphabet" characters in variable names: $ (dollar sign), @ (at sign), and # (pound or number sign).

Randolph J. Herber, Amdahl Senior Systems Engineer, at AT&T Bell Labs, Naperville, IL, 312-979-6553 or 800-843-7467 extension 1075
bilbo.jbrown@ucla-locus.ARPA (Jordan Brown) (10/24/85)
A couple of notes on the message from Erik Fair (ucbvax!fair):

Unfortunately, you CAN'T build a good international character set. Some of those silly European countries have the same character in several languages, but sort the character in different places in each language. They also have interesting constructs like characters that sort as two characters, and pairs of characters that sort as single characters. That is, there might be a character @ which sorts as "xy", so that @m sorts right after xylophone and before xyn. Similarly, they sometimes say that the pair ll sorts as a single character; I don't remember where.

Character set is not (or should not be) a very basic assumption. Aren't there EBCDIC UNIXes out there? Most of the system is (should be) completely independent of the character set. The only place you should have problems will be programs which make assumptions about arithmetic on characters, or about the range of values characters take on. (Note that C promises that all characters are non-negative (this is not to say that all possible values of a char variable are non-negative, however))

What characters does the kernel (for instance) know and care about? Slash (/), Null (\0), and maybe Dot (.) in the main body of the kernel; a few control characters in the tty drivers. No big deal. There will be work, but it shouldn't be too bad.

Much more grunt work is involved in isolating the messages for translation. People writing code commercially should keep this in mind. Keep your messages in a separate module, or better yet in an external file. Try to make the code flexible about exactly how long messages are; the length will vary dramatically when you translate the message, and English is usually the most terse language.

Wouldn't it be easier to convince the Europeans to speak English? :-)
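
The advice about keeping messages in a separate module can be sketched roughly as follows. This is only an illustration under stated assumptions: the identifiers `msg_id`, `lang_id`, `message` and `message_length`, and the sample German strings, are invented for this example and do not come from any system mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message identifiers; a real program would have many more. */
enum msg_id { MSG_NO_FILE, MSG_DONE, MSG_COUNT };

/* Hypothetical language selector. */
enum lang_id { LANG_EN, LANG_DE, LANG_COUNT };

/* All user-visible text lives in this one table, so a translator only
   has to touch this module (or an external file loaded into it). */
static const char *const MESSAGES[LANG_COUNT][MSG_COUNT] = {
    { "file not found",       "done"   },   /* LANG_EN */
    { "Datei nicht gefunden", "fertig" },   /* LANG_DE */
};

/* Look up a message; the rest of the program never embeds literal text. */
const char *message(enum lang_id lang, enum msg_id id)
{
    assert((int)lang >= 0 && (int)lang < LANG_COUNT);
    assert((int)id >= 0 && (int)id < MSG_COUNT);
    return MESSAGES[lang][id];
}

/* Callers size buffers and columns from the actual text, not from a
   constant that happened to fit the English wording. */
size_t message_length(enum lang_id lang, enum msg_id id)
{
    return strlen(message(lang, id));
}
```

In this sample table the German text for `MSG_NO_FILE` is six characters longer than the English one, which is the kind of variation the post warns about when it says message lengths change on translation.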
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/25/85)
Just a reminder: international != European
fair@ucbarpa.BERKELEY.EDU (Erik E. &) (10/27/85)
The point about the kernel not having much ASCII dependent code is well taken; however, I was thinking of (and expounding upon) the whole of UNIX, which most specifically includes the entire lot of ASCII ridden utility programs. For an inaccurate survey (it will grossly underestimate the number of programs that will have to be changed), do something like this:

	cd /usr/src
	egrep -l 'include.*ctype.h' *[ch] */*[ch] | wc -l

This will give you the number of files that include `ctype.h' which is explicitly ASCII dependent.

Someone else said that different European languages (we're ignoring the Orient for the moment) use the same glyph or letter in different order, or with completely different meaning, and therefore an international character set can't be done. If that's the case, then each country will end up writing its own version of UNIX based on the national character set (like the French have done, and the Japanese are doing now).

The real goal we're shooting for is the international exchange of information. There's nothing stopping the Europeans or the Japanese from building their own computers on incompatible character sets. And they do so. However, it is hard, slow and tedious to translate data from a Japanese computer in (say) kanji, to English/ASCII.

I think that we're attacking the wrong problem, however. Instead of attempting the technological solution by teaching computers some large `n' number of languages, we should attack the basal cultural problem by developing and widely teaching a common intermediate language.

Esperanto, anyone?

Erik E. Fair
ucbvax!fair
fair@ucbarpa.BERKELEY.EDU
sambo@ukma.UUCP (Father of micro-ln) (10/30/85)
In article <2400@brl-tgr.ARPA> bilbo.jbrown@ucla-locus.ARPA (Jordan Brown) writes:
>Unfortunately, you CAN'T build a good international character set.
>Some of those silly European countries have the same character in
>several languages, but sort the character in different places in each
>language. They also have interesting constructs like characters that
>sort as two characters, and pairs of characters that sort as single
>characters. That is, there might be a character @ which sorts as "xy",
>so that @m sorts right after xylophone and before xyn. Similarly, they
>sometimes say that the pair ll sorts as a single character; I don't
>remember where.
>

I guess I would like to see some examples of the above. Are you saying that in some language, the order of the letters might be "a b c ...", whereas in some other language, the order might be "a c b ..."? What pair of languages is like this? Also, in which language is some single character considered as two characters?

I speak Spanish and some French. Without thinking very much, something like the double "l" (which at least in Honduras is pronounced the same as a "y") would need to be treated as a single character, but written out as two characters. The problem is in capitalizing it. There need to be two forms for the uppercase double "l": "LL" and "Ll". This would mean that there would be two different codes for the uppercase double "l". Again, without thinking very much, this is the same situation as with vowels, since they may have an accent.

Disclaimer: I am not an expert on International Unix.
--
Samuel A. Figueroa, Dept. of CS, Univ. of KY, Lexington, KY 40506-0027
ARPA: ukma!sambo<@ANL-MCS>, or sambo%ukma.uucp@anl-mcs.arpa, or even anlams!ukma!sambo@ucbvax.arpa
UUCP: {ucbvax,unmvax,boulder,oddjob}!anlams!ukma!sambo, or cbosgd!ukma!sambo
"Micro-ln is great, if only people would start using it."
bilbo.jbrown@ucla-locus.ARPA (Jordan Brown) (10/30/85)
> gwyn@brl:
> international != European

true, but European is a subset of international.

> ucbvax!fair:
> grep 'ctype.h' *
> finds ASCII-dependent programs

Not true at all. "isalpha", "isupper", and most of the others are explicitly NOT ASCII dependent. They exist to allow independence from ASCII. Sure, they are implemented in an ASCII-dependent way, but if you want to change the charset, all you need to do is change ctype.h and the library routine(s) (if any). In fact, for one of the implementations of ctype.h, all you need to do is to change a table of character types.
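
To make the distinction concrete, here is a small hedged C sketch contrasting a letter test that hard-codes the character set with one that goes through `isalpha`. The function names `count_letters_by_range` and `count_letters` are invented for this example; only `isalpha` itself is a standard library call.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Character-set dependent: assumes the letters form two contiguous
   runs. That holds for ASCII, but not for EBCDIC, where the letters
   are not contiguous and the ranges also cover non-letters. */
size_t count_letters_by_range(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if ((*s >= 'a' && *s <= 'z') || (*s >= 'A' && *s <= 'Z'))
            n++;
    return n;
}

/* Character-set independent: the knowledge of which codes are letters
   stays inside the library's ctype implementation. The cast keeps the
   argument in the range isalpha accepts even where char is signed. */
size_t count_letters(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;
    return n;
}
```

On an ASCII machine in the default "C" locale both functions agree; the point of the second form is that only the library needs to change if the character set does.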
piet@mcvax.UUCP (Piet Beertema) (10/30/85)
>Wouldn't it be easier to convince the Europeans to speak English? :-)
Far easier would it be to get all Americans to speak Dutch... :-)
--
Piet Beertema, CWI, Amsterdam
(piet@mcvax.UUCP)
andy@cheviot.uucp (Andy Linton) (10/31/85)
In article <864@mcvax.UUCP> piet@mcvax.UUCP (Piet Beertema) writes:
>
>	>Wouldn't it be easier to convince the Europeans to speak English? :-)
>Far easier would it be to get all Americans to speak Dutch... :-)
>

I agree with piet but....

Wouldn't Gaelic be a better choice - hardly anyone knows any so we all start out equal (I have already started). There are only sixteen characters in the alphabet with two extra symbols (the sineadh fada, which looks like the French acute accent, and the inclusion of the letter 'h' to indicate aspiration of the preceding letter). We may even be able to reduce the number of bits in a byte!

All the Americans who claim Irish or Scottish extraction will have an inherent ability to master this as it is part of their unconscious folk heritage(:-).

I don't want my culture (Anglo-Irish) swamped by the American one any more than the rest of the Europeans do. After all 'Live the difference' doesn't have the same ring to it in English as in French.

Slainte mhaith,
Andy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SENDER : Aindrias Mac Giolla Fhionntain
PHONE  : +44 632 329233
POST   : Computing Lab, University of Newcastle upon Tyne, UK, NE1 7RU
ARPA   : andy%cheviot.newcastle.ac.uk@ucl-cs.ARPA
JANET  : andy@uk.ac.newcastle.cheviot
UUCP   : <UK>!ukc!cheviot!andy
*** Ni fui moran beagan d'aon rud, ach is fui moran beagan ceille. ***
ds@warwick.UUCP (Douglas Spencer) (10/31/85)
Xpath: warwick snow snow ubu

>>Wouldn't it be easier to convince the Europeans to speak English? :-)
>Far easier would it be to get all Americans to speak Dutch... :-)

That's easier than teaching the Americans to speak *English*.. :-)
--
+-------------------------+-------------------------------+
| Douglas Spencer         | ..seismo!mcvax!ukc!warwick!ds |
| Mathematics Institute   +-------------------------------+
| University of Warwick   | 'Stop youth! your thinking is |
| Coventry                | muddy, turbid and confused !' |
| CV4 7AL                 | Who said it, what story ?     |
| England                 | please email, big prizes      |
+-------------------------+-------------------------------+
radzy@calma.UUCP (Tim Radzykewycz) (10/31/85)
In article <2344@ukma.UUCP> sambo@ukma.UUCP (Father of micro-ln) writes:
>In article <2400@brl-tgr.ARPA> bilbo.jbrown@ucla-locus.ARPA (Jordan Brown) writes:
>>Unfortunately, you CAN'T build a good international character set.
>>Some of those silly European countries have the same character in
>>several languages, but sort the character in different places in each
>>language. They also have interesting constructs like characters that
>>sort as two characters, and pairs of characters that sort as single
>>characters. That is, there might be a character @ which sorts as "xy",
>>so that @m sorts right after xylophone and before xyn. Similarly, they
>>sometimes say that the pair ll sorts as a single character; I don't
>>remember where.
>I guess I would like to see some examples of the above. Are you saying
>that in some language, the order of the letters might be "a b c ...",
>whereas in some other language, the order might be "a c b ..."? What
>pair of languages is like this? Also, in which language is some single
>character considered as two characters?

Basically, yes. That's the general idea. If you go through your archives for net.nlang for about the last 3 or 4 weeks, you can get about 6 examples of alphabets, at least two of which have "letters out of sequence".

One other way of looking at this (let's see how far ahead of myself I can get) is to think of the reasons for the international character set:

1. consistent sorting
2. consistent pred/succ operations
3. no special characters in one language that are printable chars in another

Well, reason 2 says we can't have gaps in the letters for *any* language. Reason 3 says languages with smaller alphabets can't use the extra chars. Reason 1 says everything has to be in order.

So let's take a look at 3 character sets (english, spanish and german)

a b c d e f g h i j k l    m n o p q r s   t u v w x y z  <- english
a b c d e f g h i j k l ll m n o p q r s   t u v w x y z  <- spanish
a b c d e f g h i j k l    m n o p q r s B t u v w x y z  <- german

(pardon me if any of this is wrong, but at least it makes the point, even if it *is* wrong.)

So the letters (E:m-z,S:ll-z,G:m-z) are all different, and we're still on the latin alphabet (How about cyrillic?).

Aside: I strongly recommend that anyone seriously interested in international [issues|unix] read net.nlang. It is not too difficult to cull the garbage from it and read only the relevant articles, such as the ones I mentioned above.

Please send flames to /dev/null and discussions to me or the net.

>I speak Spanish and some French. Without thinking very much, something
>like the double "l" (which at least in Honduras is pronounced the same
>as a "y") would need to be treated as a single character, but written
>out as two characters. The problem is in capitalizing it. There need
>to be two forms for the uppercase double "l": "LL" and "Ll". This would
>mean that there would be two different codes for the uppercase double
>"l". Again, without thinking very much, this is the same situation as
>with vowels, since they may have an accent.

I assume this is all an argument to support the original article, however I don't think that was clear the way it was written.
--
Tim (radzy) Radzykewycz, The Incredible Radical Cabbage
calma!radzy@ucbvax.ARPA
{ucbvax,sun,csd-gould}!calma!radzy
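
The Spanish case in the post above, where the pair "ll" is ordered as a single letter between l and m, can be sketched in C as a comparison over collation elements rather than over bytes. This is a simplified illustration with assumptions: lowercase input only, no accents, and the names `next_weight` and `compare_spanish` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char LETTERS[] = "abcdefghijklmnopqrstuvwxyz";

/* Read one collation element from *s, advance *s past it, and return
   its weight. Each plain letter weighs twice its position in LETTERS,
   which leaves the odd slot after 'l' free for the digraph "ll".
   Returns -1 at the end of the string. */
static int next_weight(const char **s)
{
    const char *p = *s;
    const char *hit;

    if (*p == '\0')
        return -1;
    if (p[0] == 'l' && p[1] == 'l') {
        *s = p + 2;                      /* two bytes, one element */
        return (int)(strchr(LETTERS, 'l') - LETTERS) * 2 + 1;
    }
    *s = p + 1;
    hit = strchr(LETTERS, *p);
    /* Anything that is not a lowercase letter sorts after the letters. */
    return hit != NULL ? (int)(hit - LETTERS) * 2
                       : 1000 + (unsigned char)*p;
}

/* Compare two lowercase words, treating "ll" as a letter between l and
   m. Returns a negative, zero or positive value like strcmp. */
int compare_spanish(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        int wa = next_weight(&a);
        int wb = next_weight(&b);

        if (wa != wb)
            return wa < wb ? -1 : 1;
        if (wa == -1)
            return 0;                    /* both strings ended together */
    }
}
```

Under this rule "luz" sorts before "llama", whereas a plain byte comparison with `strcmp` puts "llama" first, so the same stored bytes need different comparison logic depending on the language.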
radzy@calma.UUCP (Tim Radzykewycz) (11/01/85)
In article <864@mcvax.UUCP> piet@mcvax.UUCP (Piet Beertema) writes:
>
>>Wouldn't it be easier to convince the Europeans to speak English? :-)
>Far easier would it be to get all Americans to speak Dutch... :-)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Piet Beertema, CWI, Amsterdam
> (piet@mcvax.UUCP)

True. Look at what attempts to be English. :-) :-)

How about:
	Possible to get all Europeans to speak English, but those Americans would never go for it. :-) :-)

Maybe:
	Why get the Europeans to speak English? We never talk to them anyway. :-} :-}
--
Tim (radzy) Radzykewycz, The Incredible Radical Cabbage
calma!radzy@ucbvax.ARPA
{ucbvax,sun,csd-gould}!calma!radzy
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/01/85)
> > international != European
>
> true, but European is a subset of international.

So are Japanese and Chinese. How are you going to fix that by playing with the ASCII character set?
pete@kvvax4.UUCP (Peter J Story) (11/01/85)
In article <> sambo@ukma.UUCP (Father of micro-ln) writes:
>that in some language, the order of the letters might be "a b c ...",
>whereas in some other language, the order might be "a c b ..."?
>What pair of languages is like this?

Norwegian and Swedish, which have three extra characters that you can't represent on your terminal but that on mine use the ASCII positions {|}, in an order that depends on the language. In Norwegian the order is as given above. In Swedish it is }{|. And then there are Danish and Finnish, which I don't know offhand.

>Also, in which language is some single character considered as two

How about the German character that looks like a beta, which is "ss" in the nearest transliteration. Or u with an umlaut diacritical mark, which at least in some historical texts must sort as if it were ue. Unless someone in Germany corrects my too old knowledge.
--
Pete Story
{decvax,philabs}!mcvax!kvport!kvvax4!pete
A/S Kongsberg Vaapenfabrikk, PO Box 25, N3601 Kongsberg, Norway
Tel: + 47 3 739644 Tlx: 71491 vaapn n
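
The Norwegian and Swedish example can be sketched in C by making the alphabet order a parameter of the comparison instead of relying on the numeric value of each code. This is a hedged illustration only: it assumes the 7-bit national sets described in the post, where the three extra letters sit in the positions of { | } and follow z, and the names `NORWEGIAN`, `SWEDISH`, `rank_of` and `compare_in` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Alphabet order per language, as described in the post: the same
   three codes, ordered {|} in Norwegian and }{| in Swedish. */
static const char NORWEGIAN[] = "abcdefghijklmnopqrstuvwxyz{|}";
static const char SWEDISH[]   = "abcdefghijklmnopqrstuvwxyz}{|";

/* Position of c in the alphabet; characters that are not in the
   alphabet all rank just past its end. */
static size_t rank_of(const char *alphabet, char c)
{
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;

    return p != NULL ? (size_t)(p - alphabet) : strlen(alphabet);
}

/* Compare a and b by rank in the given alphabet. Returns a negative,
   zero or positive value like strcmp. */
int compare_in(const char *alphabet, const char *a, const char *b)
{
    assert(alphabet != NULL && a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        size_t ra = rank_of(alphabet, *a);
        size_t rb = rank_of(alphabet, *b);

        if (ra != rb)
            return ra < rb ? -1 : 1;
        if (*a != *b)                    /* both outside the alphabet */
            return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
        a++;
        b++;
    }
    if (*a == '\0' && *b == '\0')
        return 0;
    return *a == '\0' ? -1 : 1;          /* the shorter string sorts first */
}
```

With these tables the code in position { sorts before the one in position } for Norwegian and after it for Swedish, so one fixed numeric ordering of the codes cannot serve both languages.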
dave@ecrcvax.UUCP (David Morton) (11/01/85)
References: <2400@brl-tgr.ARPA> <mcvax.864>
Reply-To: dave@ecrcvax.UUCP (David Morton)
Organization: European Computer-Industry Research Centre, Munchen, W. Germany

>	>Wouldn't it be easier to convince the Europeans to speak English? :-)
>Far easier would it be to get all Americans to speak Dutch... :-)
>

You must be joking, they still cannot speak Gringlish properly :-
--
Dave Morton
Tel. (49) 89 - 92699 - 139
CSNET: dave%ecrcvax.uucp@germany.csnet
UUCP: decvax!mcvax!unido!ecrcvax!dave
rcd@opus.UUCP (Dick Dunn) (11/04/85)
>>>Wouldn't it be easier to convince the Europeans to speak English? :-)
>>Far easier would it be to get all Americans to speak Dutch... :-)
>That's easier than teaching the Americans to speak *English*.. :-)

If you guys keep it up, we'll bring the Australians into this just so that we Americans will have someone to pick on... :-)
--
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Never attribute to malice what can be adequately explained by stupidity.
zben@umd5.UUCP (11/05/85)
>>>Wouldn't it be easier to convince the Europeans to speak English? :-)
>>Far easier would it be to get all Americans to speak Dutch... :-)
>That's easier than teaching the Americans to speak *English*.. :-)

Why don't we all learn Hebrew - that way if G*d does show up, we will be able to talk to him(?).

Does anybody remember the control string to make a Heathkit go into right-to-left mode?
--
Ben Cranston ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben zben@umd2.ARPA
zben@umd5.UUCP (11/05/85)
>>>Wouldn't it be easier to convince the Europeans to speak English? :-)
>>Far easier would it be to get all Americans to speak Dutch... :-)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>True. Look at what attempts to be English. :-) :-)
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The really interesting thing is that it makes perfect sense to a literate American reader. (I know, roger, null set :-) One feels a sense of oddness while reading them, but the meaning is certainly clear enough.

The first would be perfect American English if written in the negative:

>>Wouldn't it be far easier to get <random predicate>

The second implies to me that the referent is actually alive and attempting to pass itself off somehow!

This might be something like the English idiom "on a plane" giving foreign readers a mental picture of riding on the *outside* of the fuselage of the plane, or of asking for "the milk" to mean "give me all the milk in creation".

Not to mention "Throw your father down the stairs his hat!" :-)
--
Ben Cranston ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben zben@umd2.ARPA
khasin@hcrvx2.UUCP (Khasin Teow) (11/06/85)
this topic should have some interesting issues to discuss, and instead has deteriorated into a flurry of smart-alec comments! i can't believe this! this is net.unix, not net.smartcomments. there has been talk about high phone bills for netnews, and my company is talking about either cutting off the news or reducing the number of news groups. if these stupid comments keep pouring in, i can see that there is no way i can support this news group. (flame off)

sorry folks, although i could easily purge all articles under this subject, i didn't want to miss some "real" articles on this subject.