gnu@l5.uucp (John Gilmore) (10/14/85)
The issue of the 8th bit is not the real problem. It's clear that all the programs that hack the 8th bit will have to be rewritten. The ideal objective is for the same binaries to run anywhere in the world, in any font or language or currency or date/time format. [For now, let's not get off into currency/date/time conversions, and just talk about character set representation issues.]
What will cause a LOT of grief is fitting the large Asian character sets in. I saw a memo purported to come from somewhere in AT&T that seemed to be a mix of realism and brain damage. Some of the brain damage included:
* a "long char" data type for C -- haven't they ever heard of "short"?
* No "locking" select-character-set codes embedded in data streams (like what you'd send to a terminal to enter the "graphics character set"). Instead, they had two different ways to encode extended character sets (beyond 8-bit), and a bit OUTSIDE THE DATA STREAM (eg in the inode of a disk file) that said which format a file was in. The two formats were for places where 8-bit or >8-bit character sets were the norm.
I don't think either of those is a viable idea, but I'm not sure that a single representation will suffice UNLESS there are locking character set selections (so the first few bytes of your file would describe its default character sets, if strange). Once you open that can, various other worms come out, like making sure those specs get propagated when you cut and paste in an editor, etc.
It's quite a job when you realize that unless ALL the Unix utilities process Asian characters as characters, the system will lose. Any volunteers to hack grep for 16-bit characters encoded in an 8-bit data stream with case shifts? Of course stdio would be modified to encode and decode the extended character set, and that will do much of the work for us.
Maybe that should be our first research project -- a public domain stdio that defines a standard programming interface to 16-bit characters and a standard datastream representation for them.
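John's proposed stdio layer would have to carry the shift state along with each open stream, so that callers see whole characters and never the shift codes themselves. Below is a minimal C sketch under stated assumptions: SO (0x0E) enters a two-byte mode and SI (0x0F) leaves it, those two byte values never occur as character bytes, and the names `stream16`, `putc16` and `getc16` are hypothetical, not taken from any real library.

```c
#include <stdio.h>

#define SO 0x0E      /* assumed shift-out code: enter two-byte mode */
#define SI 0x0F      /* assumed shift-in code: back to one-byte mode */
#define EOF16 (-1L)  /* end of file, or a truncated two-byte character */

/* Hypothetical wrapper: a stdio stream plus its locking-shift state. */
struct stream16 {
    FILE *fp;
    int wide;        /* 1 while the stream is in two-byte mode */
};

/* Write one character. Codes up to 0xFF go out as one byte, larger codes
   as two bytes (high byte first). A shift code is emitted only when the
   mode actually changes. Returns 0, or EOF on a write error. */
int putc16(unsigned c, struct stream16 *s)
{
    int need_wide = c > 0xFF;

    if (need_wide != s->wide) {
        if (putc(need_wide ? SO : SI, s->fp) == EOF)
            return EOF;
        s->wide = need_wide;
    }
    if (need_wide && putc((c >> 8) & 0xFF, s->fp) == EOF)
        return EOF;
    return putc(c & 0xFF, s->fp) == EOF ? EOF : 0;
}

/* Read one character, silently consuming any shift codes. */
long getc16(struct stream16 *s)
{
    int b;

    for (;;) {
        b = getc(s->fp);
        if (b == EOF)
            return EOF16;
        if (b == SO) { s->wide = 1; continue; }
        if (b == SI) { s->wide = 0; continue; }
        break;
    }
    if (!s->wide)
        return b;

    int lo = getc(s->fp);
    if (lo == EOF)
        return EOF16;            /* stream ended inside a pair */
    return ((long)b << 8) | lo;
}
```

Because the writer starts in one-byte mode and shifts only when needed, a file of plain ASCII comes out byte-for-byte unchanged. A fuller interface would also return the stream to one-byte mode before closing it; that step is omitted here.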
tmb@talcott.UUCP (Thomas M. Breuel) (10/15/85)
In article <191@l5.uucp>, gnu@l5.uucp (John Gilmore) writes:
> What will cause a LOT of grief is fitting the large Asian character
> sets in. I saw a memo purported to come from somewhere in AT&T that
[...]
> It's quite a job when you realize that unless ALL the Unix utilities
> process Asian characters as characters, the system will lose. Any
What do you mean? Most UN*X utilities are programming utilities, and nobody is going to program in Chinese characters. And the demands of Chinese and Japanese word processing are so utterly different that a completely new kind of user interface and a completely new set of utilities is needed anyhow (sort, grep, &c don't really make sense with Kanji or are extremely tricky to do. And how do you propose the shell should deal with Kanji? And should file names be allowed to have Chinese characters in them???).
As I see it, the most straightforward solution to the 'internationalisation' problem is to leave the programming and system utilities alone (that also means not to put vertical bars into your logname...) and to provide special-purpose word processors for word processing in your favourite natural language. This need not be a complete departure from nroff/troff-style word processing (to which I am, incidentally, very attached), but could be some extension to your favourite editor that translates nroff escape sequences to and from your terminal's representation.
Even if you managed to cook up an operating system that was capable of dealing with all kinds of Asian characters at all levels, nobody in the western hemisphere would want to, or even could, run it. In addition, it would still have to be able to communicate with all these old-fashioned things like ARPA, BITNET, System V, VT100's &c.
Thomas.
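Thomas's remark that grep is "extremely tricky" with Kanji, and John's question about 16-bit characters carried in an 8-bit stream, can be made concrete. The C sketch below assumes, purely for illustration, a stream in which SO (0x0E) switches into a two-byte mode and SI (0x0F) switches back, loosely in the style of ISO 2022 locking shifts; neither poster specifies an encoding, and all function names are hypothetical. It shows why an unmodified byte-oriented search is not enough.

```c
#include <stddef.h>
#include <string.h>

#define SO 0x0E  /* assumed shift-out code: enter two-byte mode */
#define SI 0x0F  /* assumed shift-in code: return to one-byte mode */

/* Decode a byte stream with locking shifts into 16-bit characters.
   One-byte mode: each byte is a character. Two-byte mode: each pair of
   bytes is one character, high byte first. Assumes SO and SI never occur
   as character bytes. Returns the character count, or -1 if the stream
   ends inside a pair or the output buffer is too small. */
long decode_shifted(const unsigned char *in, size_t n,
                    unsigned short *out, size_t outcap)
{
    size_t i = 0, k = 0;
    int wide = 0;                       /* current shift state */

    while (i < n) {
        unsigned char b = in[i];
        if (b == SO) { wide = 1; i++; continue; }
        if (b == SI) { wide = 0; i++; continue; }
        if (k == outcap)
            return -1;
        if (wide) {
            if (i + 1 >= n)
                return -1;              /* truncated two-byte character */
            out[k++] = (unsigned short)((b << 8) | in[i + 1]);
            i += 2;
        } else {
            out[k++] = b;
            i++;
        }
    }
    return (long)k;
}

/* What an unmodified 8-bit grep effectively does: compare raw bytes. */
int bytes_contain(const unsigned char *hay, size_t n,
                  const unsigned char *pat, size_t m)
{
    for (size_t i = 0; i + m <= n; i++)
        if (memcmp(hay + i, pat, m) == 0)
            return 1;
    return 0;
}

/* The character-aware alternative: compare decoded characters. */
int chars_contain(const unsigned short *hay, size_t n,
                  const unsigned short *pat, size_t m)
{
    for (size_t i = 0; i + m <= n; i++)
        if (memcmp(hay + i, pat, m * sizeof *pat) == 0)
            return 1;
    return 0;
}
```

For example, the bytes SO 0x30 0x21 0x34 0x41 SI hold two 16-bit characters, 0x3021 and 0x3441. A byte search for the ASCII string "!4" (0x21 0x34) reports a match that straddles the two characters, while the search over decoded characters correctly finds nothing.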
edwards@uwmacc.UUCP (mark edwards) (10/15/85)
In article <527@talcott.UUCP> tmb@talcott.UUCP (Thomas M. Breuel) writes:
>In article <191@l5.uucp>, gnu@l5.uucp (John Gilmore) writes:
>> What will cause a LOT of grief is fitting the large Asian character
>> sets in. I saw a memo purported to come from somewhere in AT&T that
>[...]
>> It's quite a job when you realize that unless ALL the Unix utilities
>> process Asian characters as characters, the system will lose. Any
>
>What do you mean? Most UN*X utilities are programming utilities,
>and nobody is going to program in Chinese characters. And the demands of
>Chinese and Japanese word processing are so utterly different that
>a completely new kind of user interface and a completely new set of
>utilities is needed anyhow (sort, grep, &c don't really make sense with
>Kanji or are extremely tricky to do. And how do you propose does the
>shell deal with Kanji? And should file names be allowed to have Chinese
>characters in them???).
>
>As I see it, the most straightforward solution to the 'internationalisation'
>problem' is to leave the programming and system utilities alone (that
>also means not to put vertical bars into your logname...) and to provide
>special purpose word-processors for word-processing in your favourite
>natural language.
>
>Even if you managed to cook up an operating system which were capable
>of dealing with all kinds of Asian characters at all levels, nobody in the
>western hemisphere would want to, or even could, run it. In addition,
>it would still have to be able to communicate with all these old
>fashioned things like ARPA, BITNET, System V, VT100's &c.
>
> Thomas.
It seems to me that this is just the problem. Look at our Big Automobile companies. A few years ago with the fabricated oil shortages the JAPANESE were the only ones to see the value in small cars. Now they have a good percentage of OUR (the U.S.) market. Look at the stereo market, the TV, VCR, CAMERA, ETC....
There are more JAPANESE and CHINESE than all those who natively speak English. This attitude will continue the ORIENTAL invasion of our markets. I agree finding solutions to the CHINESE character sets is a very difficult problem. But stopping the ORIENTAL (JAPANESE, KOREAN, TAIWANESE, CHINESE and others that might exist) invasion in native markets should transcend typical thinking approaches. We the computer people have the ability to do this.
Has anyone seen a JAPANESE word processor in action? They have 4 types of characters: Kanji (the Chinese characters), KANA (KATAKANA and HIRAGANA, alphabets of sorts), and they use a fair amount of English (and other foreign) words in their texts. Some of the important research for upcoming computer generations has to do with NATURAL LANGAUGE. Should we just lay down and pass the wealth of our future generations over to the EAST?
Another question: Don't these Unix Utilities output messages of some sort in text, in natural language? Would you have the rest of the world learn and use English just because WE the Americans (and other Western countries) are so narrow-minded that we will not consider other usages of characters in our computers? COME ON!! This is net.international!!
After all, we Computer Scientists take the difficult problems, define them and come up with viable solutions. Let's not pass off difficult problems by just ignoring them. I can assure you the JAPANESE will not because they can't.
MARK
**********************************************************************
These views are solely my own and possibly reflect no one else's.
---
When given the choice of two evils, I always try the one I haven't tried before. -- MAE WEST
long@ittatc.ATC.ITT.UUCP (H. Morrow Long [Systems Center]) (10/17/85)
> other foriegn words) words in their texts. Some of the important research
> for upcoming computer generations has to do with NATURAL LANGAUGE. Should
                                                           ^^^^^^^^^
> we just lay down and pass the wealth of our future generations over to the
> EAST.
>
I agreed with the message of this article but NATURAL LANGUAGE should begin at home.
-- 
H. Morrow Long
ITT-ATC Systems Center, 1 Research Drive Shelton, CT 06484
Phone #: (203)-929-7341 x. 634
path = {allegra bunker ctcgrafx dcdvaxb dcdwest ucbvax!decvax duke ittral milford mit-eddie psuvax1 qumix sii supai tmmnet yale}!ittatc!long
tmb@talcott.UUCP (Thomas M. Breuel) (10/17/85)
In article <1558@uwmacc.UUCP>, edwards@uwmacc.UUCP (mark edwards) writes:
> This attitude will continue the ORIENTAL invasion of our markets. I
> agree finding solutions to the CHINESE character sets is a very difficult
> problem. But stopping the ORIENTAL ( JAPANESE, KOREAN, TAIWANESE, CHINESE
> and others that might exist) invasion in native markets should transcend
> typical thinking approaches. We the computer people have the ability
> to do this.
I am not sure I understand how incorporating foreign (oriental) character sets into an operating system can help stop 'the oriental invasion' (if such a thing exists at all). Why don't you elaborate?
> Another question: Don't these Unix Utilities output messages of some
> sort in text- natural language. Would you have the rest of the world
> learn and use English just because, WE the Americans (and other Western
> countries) are so narrow minded that we will not consider other usages
> of characters in our computers. COME ON !! This is net.international!!
The rest of the world is learning and using English. Personally, I think English is far from ideal as a universal language, but it was established by historical accident and not by conscious choice.
Any attempt at 'internationalising' UN*X is pretty much doomed to fail. Likewise, any attempt at 'internationalising' programming environments is doomed to fail. Symbols and identifiers in programming languages are usually mnemonically chosen words or abbreviations. In UN*X, the name of a user program is at the same time an identifier in other programs (shell scripts), and its output serves both as a user interface and as input for other programs. This is one of the main strengths of the UN*X way of operating system architecture (what? you mean it was designed???). The only way to provide a user-friendly, nationalised interface in UN*X is to write something which translates between the UN*X names and identifiers and the language the user understands.
From personal experience, I can tell you, though, that most foreigners prefer not to use such interfaces.
> After all, we Computer Scientists take the difficult problems, define
> them and come up with viable solutions. Lets not pass off difficult
> problems by just ignoring them. I can assure you the JAPANESE will not
> because they can't.
The Japanese will not ignore the problem of how to represent their language on their computers because they have to solve this problem for their own good. If you are really into selling computers into the Japanese market, then you should also concern yourself with this problem. If you want to make competitive products for the American market, you had better ignore it. The Chinese writing system is a very special problem (for computers, not for people) and demands a very special solution.
Thomas.
gnu@l5.uucp (John Gilmore) (10/18/85)
In article <527@talcott.UUCP>, tmb@talcott.UUCP (Thomas M. Breuel) writes:
> What do you mean? Most UN*X utilities are programming utilities,
> and nobody is going to program in Chinese characters.
:-) Of course nobody in China would ever program in Chinese. They'd just learn English because it's the natural language for talking to computers.
> utilities is needed anyhow (sort, grep, &c don't really make sense with
> Kanji or are extremely tricky to do.
I don't claim to know how to do them; I just claim that in Japan people will want to grep their text files, the same way we do. And they certainly do have a sorting order (if not more than one), as we do.
> And should file names be allowed to have Chinese
> characters in them???
Of course file names should have Chinese characters. Why deny the essential benefit of a file system (a way to organize data with names) to the people who happen to speak and write in Chinese? The file system code really doesn't care what those bytes of name MEAN; it just remembers name<->data correspondences. (Certainly the code that implements the file system and its utilities is currently making some assumptions about file names, and those will need changing. That's what this group is for!)
> As I see it, the most straightforward solution to the 'internationalisation'
> problem' is to leave the programming and system utilities alone (that
> also means not to put vertical bars into your logname...)
Given the choice of buying a system that lets me use my *name* as my login name, or one that forbids it, other things equal I know which one I will buy...or design, build and sell.
> and to provide
> special purpose word-processors for word-processing in your favourite
> natural language.
These already exist and are not the subject of this newsgroup. We're talking about extending all the benefits of Unix (I presume you think Unix is a nice environment to work and play in, yes?)
to people who speak and write differently than you do in Murray Hill. The local-language word processor problem is pretty well licked, though some of the solutions (eg Japanese) still cost many yen more than an American word processor.
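John's point that the file system "doesn't care what those bytes of name MEAN" has two exceptions at the level of path syntax: '/' separates components and NUL ends the string, so neither byte can occur inside a name. A short C sketch shows how an encoding choice can collide with that rule. It assumes a hypothetical raw encoding that stores each 16-bit character as two bytes, high byte first, with no shift codes; the function names are made up for this example, and other restrictions of particular systems (name length limits, 8th-bit handling) are not modelled.

```c
#include <stddef.h>

/* A name is usable as a path component only if it is non-empty and
   contains neither '/' (the separator) nor NUL (the terminator).
   Only these two path-syntax rules are checked here. */
int valid_component(const unsigned char *name, size_t len)
{
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (name[i] == '/' || name[i] == '\0')
            return 0;
    return 1;
}

/* Hypothetical raw encoding: each 16-bit character becomes two bytes,
   high byte first. Returns the number of bytes written to out. */
size_t encode_raw16(const unsigned short *chars, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(chars[i] >> 8);
        out[2 * i + 1] = (unsigned char)(chars[i] & 0xFF);
    }
    return 2 * n;
}
```

Under this raw encoding a character such as 0x302F yields a '/' byte, and any code below 0x100 yields a NUL byte, so such names cannot be stored even though the kernel is otherwise indifferent to what the bytes mean. An encoding whose multi-byte characters avoid both byte values does not have this problem.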
tmb@talcott.UUCP (Thomas M. Breuel) (10/18/85)
In article <198@l5.uucp>, gnu@l5.uucp (John Gilmore) writes:
> :-) Of course nobody in China would ever program in Chinese. They'd just
> learn English because it's the natural language for talking to computers.
I think that our design of programming languages has been influenced strongly by our natural languages. The question of lexical analysis doesn't really make sense in Chinese, for example. Of course you could design a programming language that uses Chinese characters as its terminal symbols. Given how primitive the vocabulary of programming languages is, and how unrelated the 'mnemonic names' are to the real-life meanings of the words, it is hardly worth it, though.
> These [special purpose word processors] already exist
> and are not the subject of this newsgroup. We're
> talking about extending all the benefits of Unix (I presume you think
> Unix is a nice environment to work and play in, yes?) to people who
> speak and write differently than you do in Murray Hill.
As I posted before, I think it is impossible to implement an international user interface at a low level in UN*X, because one of the strengths of UN*X is that the user interface is identical with an interactive programming language. If you change the user interface (e.g. change the name of 'grep' to something else), all shell scripts using 'grep' will break. Remember what trouble it caused when someone decided that the extra space in 'date' output was ugly and did away with it? This reliance upon fixed output formats, fixed names, &c is not bad programming, but a logical consequence of the UN*X philosophy.
Now, that doesn't mean that you couldn't cook up a shell for UN*X that encodes Chinese characters in ASCII (for file names) and translates between Japanese commands and 'English' commands. But once you change anything lower level than that, like the names of system utilities or the output format of almost any program in UN*X, you can forget about sharing software.
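Thomas's idea of a shell that "encodes Chinese characters in ASCII (for file names)" needs some reversible mapping. One hypothetical scheme is sketched in C below: printable ASCII other than '%' and '/' passes through unchanged, and every other character becomes '%' followed by exactly four upper-case hexadecimal digits. The scheme and the function names are illustrative assumptions, not something proposed in the thread.

```c
#include <stddef.h>
#include <stdio.h>

/* Encode 16-bit characters as an ASCII name. Returns the number of bytes
   written (excluding the final NUL), or 0 if out is too small. */
size_t name_to_ascii(const unsigned short *chars, size_t n,
                     char *out, size_t outcap)
{
    size_t k = 0;

    if (outcap == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        unsigned c = chars[i];
        if (c >= 0x20 && c < 0x7F && c != '%' && c != '/') {
            if (k + 1 >= outcap)
                return 0;
            out[k++] = (char)c;               /* plain ASCII passes through */
        } else {
            if (k + 5 >= outcap)
                return 0;
            snprintf(out + k, 6, "%%%04X", c); /* '%' plus four hex digits */
            k += 5;
        }
    }
    out[k] = '\0';
    return k;
}

static int hexval(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Inverse mapping. Returns the character count, or -1 on a malformed
   escape or a full output buffer. */
long ascii_to_name(const char *s, unsigned short *out, size_t outcap)
{
    size_t k = 0;

    while (*s != '\0') {
        if (k == outcap)
            return -1;
        if (*s == '%') {
            unsigned c = 0;
            for (int d = 1; d <= 4; d++) {
                int v = hexval(s[d]);
                if (v < 0)
                    return -1;                /* also stops at an early NUL */
                c = (c << 4) | (unsigned)v;
            }
            out[k++] = (unsigned short)c;
            s += 5;
        } else {
            out[k++] = (unsigned char)*s;
            s++;
        }
    }
    return (long)k;
}
```

Escaping '%' itself keeps the mapping reversible, and escaping '/' keeps every encoded name a legal path component. The cost is length: each non-ASCII character takes five bytes, which matters on file systems with short name limits.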
Altogether:
-- I doubt that a hybrid system that understands all character sets, all string orders, all national date and time conventions, &c. has a chance in the west, because of the overhead and cost involved. Maybe a system that can handle just all Roman character set based languages has a chance, although I even doubt that...
-- File exchange, program exchange, networking, or any other kind of communication between machines with different character sets is a nightmare and very likely not to work. Just the difference in byte order between the VAX and the PDP is causing lots of grief already.
Conclusions:
-- Before you start screwing around with UN*X, please make a backup copy so that there is still something working around when you are done (anyone have a copy of 4.1 :-).
-- Of course, in an ideal world, everybody could sit down at his terminal, type to the computer in his natural language, and the computer would automatically do the rest. Now, I am not opposed to that idea; I would just like to hear more reasonable proposals of how to do it. And, honestly, I don't think that you can begin by hacking namei or by starting to put funny characters into your logname. If you are realistic, you have to come up with something that works on top of existing operating systems (shudder); if you are revolutionary, you have to present a completely new concept, but you can hardly call it UN*X anymore, as most of what makes UN*X a fast and efficient system to work with is intimately related to its data structure: the ASCII text file, composed of English alphabetic characters.
Thomas.
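Thomas's byte-order aside can be made concrete. Both machines store a 16-bit word low byte first, but they differ for 32-bit longs: the VAX stores the least significant byte first, while the PDP-11 (presumably the PDP Thomas means) stores the most significant 16-bit word first, each word low byte first. The C sketch below, with hypothetical function names, reads the same four bytes both ways.

```c
#include <stdint.h>

/* VAX order: least significant byte first. */
uint32_t long_from_vax(const unsigned char b[4])
{
    return (uint32_t)b[0]       | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* PDP-11 order: most significant 16-bit word first, but each word is
   stored low byte first (the 16-bit layout the two machines share). */
uint32_t long_from_pdp11(const unsigned char b[4])
{
    uint32_t high = (uint32_t)b[0] | (uint32_t)b[1] << 8;
    uint32_t low  = (uint32_t)b[2] | (uint32_t)b[3] << 8;
    return high << 16 | low;
}
```

The PDP-11 stores 0x0A0B0C0D as the bytes 0B 0A 0D 0C; read back on a VAX, those bytes give 0x0C0D0A0B, with the two 16-bit halves swapped.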
ellis@spar.UUCP (Michael Ellis) (10/18/85)
>It seems to me that this is just the problem. Look at our Big Automobile
>companies. A few years ago with the fabricated oil shortages the
>JAPANESE were the only ones to see the value in small cars.
That viewpoint overlooks our own stupidity. We are to blame for our arrogant assumption that things would continue to favor the `American way' -- wasteful overconsumption and contemptuous misappraisal of foreign, especially non-European, nations. Overbearing complacence is the deadliest symptom of the disease called being #1.
>Now they have a good percentage of OUR (the U.S.) market. Look at the stereo
>market, the TV, VCR, CAMERA, ETC.... There are more JAPANESE and CHINESE
>then all those who natively speak English. This attitude will continue the
>ORIENTAL invasion of our markets.
Good for them! It only goes to show that the capitalist system might really work. The Chinese and Japanese are eager to understand our culture. How many Americans even care to learn the languages of our honorable competitors? Hubris provokes nemesis.
-michael
minow@decvax.UUCP (Martin Minow) (10/19/85)
Digital has sold a Japanese-language version of Ultrix (Unix) in Japan for some time now. It uses the VT80 Kanji terminal to display English, Katakana (syllabic) or Kanji. There are also Japanese-language versions of other DEC operating systems. IBM sells a Japanese-language version of the IBM-PC.
The VT80 terminal contains a built-in ROM with the most popular Kanji representations. It also has a RAM that can be down-line loaded with other representations. When the VT80 receives an "unknown" Kanji, it sends an escape sequence to the host computer (which is interpreted by the terminal subsystem) requesting a display representation of that character.
Martin Minow
decvax!minow
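Martin describes three tiers: a ROM of common glyphs, a down-line loadable RAM area, and a request to the host for anything unknown. The lookup order can be sketched in C as below. The glyph size, the number of RAM slots, the round-robin replacement and the host callback are all assumptions made for illustration; the actual VT80 escape sequence is not given here, so the round trip to the host is modelled as a hypothetical function pointer.

```c
#include <stddef.h>
#include <string.h>

#define GLYPH_BYTES 32  /* assumed: 16x16 dots, one bit per dot */
#define RAM_SLOTS   4   /* assumed size of the down-line loadable area */

typedef struct {
    unsigned short code;               /* character code */
    unsigned char bits[GLYPH_BYTES];   /* its dot pattern */
} glyph;

/* Hypothetical hook standing in for the escape-sequence round trip to
   the host. Fills bits and returns 0 if the host supplied a pattern. */
typedef int (*host_request_fn)(unsigned short code,
                               unsigned char bits[GLYPH_BYTES]);

typedef struct {
    const glyph *rom;          /* built-in table of common characters */
    size_t rom_count;
    glyph ram[RAM_SLOTS];      /* patterns loaded from the host */
    int used[RAM_SLOTS];
    size_t next;               /* slot to overwrite next (round robin) */
    host_request_fn ask_host;
} glyph_store;

/* Look in ROM, then RAM; only an unknown character costs a host request. */
const unsigned char *find_glyph(glyph_store *t, unsigned short code)
{
    for (size_t i = 0; i < t->rom_count; i++)
        if (t->rom[i].code == code)
            return t->rom[i].bits;
    for (size_t i = 0; i < RAM_SLOTS; i++)
        if (t->used[i] && t->ram[i].code == code)
            return t->ram[i].bits;

    unsigned char tmp[GLYPH_BYTES];
    if (t->ask_host == NULL || t->ask_host(code, tmp) != 0)
        return NULL;                   /* host could not supply it */
    glyph *slot = &t->ram[t->next];
    slot->code = code;
    memcpy(slot->bits, tmp, GLYPH_BYTES);
    t->used[t->next] = 1;
    t->next = (t->next + 1) % RAM_SLOTS;
    return slot->bits;
}

/* Stand-in host for trying the sketch: counts requests and returns a
   dummy pattern filled with the low byte of the code. */
int host_requests = 0;
int demo_host(unsigned short code, unsigned char bits[GLYPH_BYTES])
{
    host_requests++;
    memset(bits, code & 0xFF, GLYPH_BYTES);
    return 0;
}
```

The pattern is fetched into a temporary buffer first, so a failed request never overwrites a glyph already cached in RAM. A second lookup of the same character is served from RAM without another host request.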
eugene@ames.UUCP (Eugene Miya) (10/23/85)
>
> >Now they have a good percentage of OUR (the U.S.) market. Look at the stereo
> >market, the TV, VCR, CAMERA, ETC.... There are more JAPANESE and CHINESE
> >then all those who natively speak English. This attitude will continue the
> >ORIENTAL invasion of our markets.
>
Permit me to make a comment about this statement. This does not deal with internationalization directly, but it does deal with discussing this topic. Fortunately, our site had a copy of the parent article, because the above can easily be misconstrued. I am talking about the use of loaded words like "ORIENTAL invasion." I realize the speaker has a good deal of respect for the Japanese, but his words can easily be taken out of context, as his entire text was not quoted.
Recently, I was returning from a meeting in Montreal, and behind me sat two members from a major military weapons facility (to go unnamed) who attended the same meeting. They were discussing the supercomputer race with a non-meeting passenger on the plane, talking about how, if the US did not keep up with the Japanese, US industry/defense, everything would be at the mercy (too kind a word for genitals) of the Japanese. At which point, I turned around to join what I thought would be an interesting discussion. The speaker promptly shut up.
If you are going to discuss invasions, let us not forget that WWII hysteria drove my relatives into detention centers and created witch hunts in the 1950s. Do we begin by turning the US into a totalitarian state because of economic hardship? Perhaps I should quit working for the US government because I might be suspected of being an economic spy. Perhaps you want all those descended from the Far East to jump off cliffs to remove all doubt? I rarely like to think about "the color of my skin," but recent protectionist attitudes have my guard up. I don't regard my comments as a flame, but rather as a defense of civil liberties.
To repeat, I don't regard the above as a personal attack, but everyone discussing internationalisation had best put a good foot forward (or I should say hand), as this group is read around the world.
--eugene miya
NASA Ames Research Center
{hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene
emiya@ames-vmsb