jr@inset.UUCP (Jim R Oldroyd) (10/08/85)
One of the things I preceive from discussions on international UNIX, is that many people are merely thinking in terms of enhancing existing software to solve one particular aspect of a much larger multi-faceted problem. I first asked the question: "What do we REALLY want?". Ignore issues of implementation for a moment - consider the situation we find ourselves faced with. It would be useful for me if I could, not only edit a file containing English text, but also intersperse at will text in a different language. Simple. OK, but what if I suggest that the other language is Arabic? And then, I want to go on and combine this with characters in different fonts and point sizes. I also have a printer with a programable character set. I have taken the trouble to ``design'' a character containing the company logo, another with a pointed hand, bullets, pound signs etc. I look at them as ordinary single characters and I have a need to manipulate files containing these characters using my editor, grep, sort and so on. I do not want to learn strange hieroglyphics like \(*p for the Greek character pi, or the sequence \o'\(is_-' to get a British Sterling symbol. These are a few points. To get what I want I will need not only new utilities, but also special hardware such as a bitmapped display terminal. You can see that I need an extremely large and variable character set. It is not possible to construct that set out of existing character sets, nor to expect a final set to remain static. I believe that the time is now ripe for the computer world to take a jump from the traditional viewpoint and realize that users' requirements in these days of networks and typesetters are already far ahead of anything that an enhanced character set can provide. -- ++ Jim R Oldroyd ++ jr@inset.UUCP ++ ..!mcvax!ukc!inset!jr
ncx@cheviot.uucp (Lindsay F. Marshall) (10/09/85)
In article <723@inset.UUCP> jr@inset.UUCP (Jim R Oldroyd) writes:
>I believe that the time is now ripe for the computer world
>to take a jump from the traditional viewpoint and realize
>that users' requirements in these days of networks and
>typesetters are already far ahead of anything that an
>enhanced character set can provide.

The mention of typesetters shows the way to go. Instead of the concept of a character set, we need the printer's concept of a 'font'*. OK, that makes programming a little difficult (!), as you have to know which font you're in so as to understand the significance of a character. But it does allow you to manage user-defined symbols in a much cleaner way, and it may mean that you can get away without having to extend the number of bits in a character to cope with every possible character set in the world, as your fonts need only contain the characters that you want them to.
------------------------------------------------------------------------------
Lindsay F. Marshall, Computing Lab., U of Newcastle upon Tyne, Tyne & Wear, UK
ARPA  : lindsay%cheviot.newcastle.ac.uk@ucl-cs.arpa
JANET : lindsay@uk.ac.newcastle.cheviot
UUCP  : <UK>!ukc!cheviot!lindsay
-------------------------------------------------------------------------------
* fount n. A complete assortment of types of one sort, with all that is
  necessary for printing that kind of letter. - Also (esp. USA) font.
  [Fr. fonte - fondre - L. fundere, to cast]
  Chambers Twentieth Century Dictionary, 1966 edition.
bill@inset.UUCP (Campbell) (10/09/85)
In article <723@inset.UUCP> jr@inset.UUCP (Jim R Oldroyd) writes:
>..... users' requirements in these days of networks and
>typesetters are already far ahead of anything that an
>enhanced character set can provide.

Here is just one example. Almost all sites running UNIX, of any flavour, will have the "standard" date command, plus the "standard" ctime and asctime routines in libc. In other words, they are forced to use American date formats. In the light of recent correspondence in net.general (USA != world), it will come as no surprise that most citizens of the world find the inflexibility of the date routines disappointing, or even offensive.

However, it is NOT going to be sufficient to provide fixed-length translations of the string portions of ctime output, to give, for example,

	Lun Sep 16 12:23:05 1985

There are two problems this does not address. Firstly, one chooses a date format to suit a particular purpose. In some countries the layout of a date is required, for some legal purposes, to follow a fixed format, which is very unlikely to be that given by ctime. It may be that local requirements mean the ctime output format has to be changed. A quick survey shows that of the 300-odd tools programs distributed with UNIX, 45 use ctime, and just over half of those depend on the various words in the string being at *fixed* locations. Note that these are just the tools programs; no applications were looked at. There is going to be a porting problem here.

Secondly, rather a lot of people are based in areas which do not use the 24-hour GMT clock, let alone the Gregorian calendar. Anyone got any ideas on what internationalisation could do (cheaply!) for them? Surely, in MCMLXXXV, it's not beyond the wit of man. :-)
stephen@dcl-cs.UUCP (Stephen J. Muir) (10/10/85)
In article <725@inset.UUCP> bill@inset.UUCP (Bill Fraser-Campbell) writes:
>Almost all sites running UNIX, of any flavour, will have the
>"standard" date command, plus the "standard" ctime, asctime routines in libc.
>In other words, they are forced to use American date formats.
>
>There are two problems this does not address. Firstly, one chooses a date
>format to suit a particular purpose. In some countries the layout of a
>date is required, for some legal purposes, to follow a fixed format, which is
>very unlikely to be that given by ctime. It may be that local requirements
>mean the ctime output format has to be changed. A quick survey shows that of
>the 300-odd tools programs distributed with UNIX, 45 use ctime, and just over
>half of those depend on the various words in the string being at *fixed*
>locations. Note that these are just the tools programs, no applications
>were looked at. There is going to be a porting problem here.
>
>Secondly, rather a lot of people are based in areas which do not use the
>24 hour GMT clock, let alone the Gregorian calendar. Anyone got any ideas
>on what internationalisation could do (cheaply !) for them ? Surely, in
>MCMLXXXV it's not beyond the wit of man. :-)

The answer is quite simple. Put all the date conversion routines in the kernel, with system calls for user programs to fetch the date in string form or whatever. This way, local changes can be made to the kernel code to accommodate variations, without having to recompile any programs. Of course, this kernel code would run in user mode (where possible) so as not to lock out other processes.

I stress that the kernel *must* still store the time internally in GMT. This is so that, e.g., tar tapes will have the correct time when taken to another system.
--
UUCP:  ...!seismo!mcvax!ukc!dcl-cs!stephen   | Post: University of Lancaster,
DARPA: stephen%lancs.comp@ucl-cs             |       Department of Computing,
JANET: stephen@uk.ac.lancs.comp              |       Bailrigg, Lancaster, UK.
Phone: +44 524 65201 Ext. 4599               |       LA1 4YR
Project: Alvey ECLIPSE Distribution          |
crs@lanl.ARPA (10/10/85)
> I believe that the time is now ripe for the computer world
> to take a jump from the traditional viewpoint and realize
> that users' requirements in these days of networks and
> typesetters are already far ahead of anything that an
> enhanced character set can provide.

While we are at it, would it be asking too much for common sense and the needs of touch typists to prevail in keyboard design? The IBM-PC keyboard is well known. That of the VT-220 isn't much better. Who designs these layouts? Have they ever typed?

Item: What is the one key that is used to enter *every* single line of text? The return key! Why, then, stick every off-the-wall key you can think of between the home keys and the return key? One key between the home keys and the return key is acceptable; two is too many. I happen to be typing this on a VT-220, on which it is the vertical-bar/back-slash key. I don't recall what it is on the IBM-PC, but it is used no more often. The vertical bar and the back-slash are used fairly often in Unix, but *not* as often as the return key. Why not put it *outside* the return key? [Or where suggested at the end of the last item.]

Item: The caps-lock key is used infrequently, usually once before typing at least a full word, and often *only* once at the beginning of a session to change to all caps for the *entire* session. The control key, on the other hand, is used on a key-by-key basis. That is to say, the control key must be held down for *every* single control character you want to type. Why, then, put the lock key between the home keys and the control key instead of the other way about (VT-220)?

Item: The lock key on the VT-220 is a caps lock, not a *shift* lock. Why, then, move the angle brackets to a separate key, so that comma and period are *both* the unshifted and *shifted* versions of their respective keys? This is done on typewriter keyboards because *typewriters* have *shift* lock, not caps lock. On computer terminals, I can think of no reason not to put the angle brackets at shift-comma and shift-period. This would eliminate an unnecessary key. [Perhaps this would have been a good place to put vertical bar and back-slash.]

I'm sure that the designer of this keyboard (and all the others) thought that he or she had good reasons for using this layout. I happen to disagree. Perhaps keyboard designers should all be required to learn touch typing, and then should be required to spend many hours typing on a prototype of their creations before being allowed to select a final design.
--
All opinions are mine alone...

Charlie Sorsby
...!{cmcl2,ihnp4,...}!lanl!crs
crs@lanl.arpa
lee@rochester.UUCP (Lee Moore) (10/10/85)
> I believe that the time is now ripe for the computer world
> to take a jump from the traditional viewpoint and realize
> that users' requirements in these days of networks and
> typesetters are already far ahead of anything that an
> enhanced character set can provide.
>
> ++ Jim R Oldroyd

I think you may as well trash Unix then. There are enough problems with getting 8-bit characters, let alone a universal character set. I think the best you can hope for is an 8-bit ISO character set that will cover Western Europe.

If you want to see an approach to universality that was done from scratch, check out the Xerox Star. It essentially uses a 16-bit character set that encodes many national character sets, including all of Western Europe, Greek, Russian and Japanese*. Documents can contain any mix of languages. Since Xerox won a Voice of America contract, they have been producing a new alphabet a month. Last month was Amharic, a language of Ethiopia.

	lee

* Side note on Japanese... you can't solve all of it. Xerox is following the standards produced by the JIS.
--
TCP/IP:   lee@rochester.arpa
UUCP:     {seismo, allegra, decvax, cmcl2, topaz, harvard}!rochester!lee
XNS:      Lee Moore:CS:Univ Rochester
Phone:    +1 (716) 275-7747, -5671
Physical: 43 01' 40'' N, 77 37' 49'' W
--
11 months 'till I drop off the face of the earth.
roy@phri.UUCP (Roy Smith) (10/10/85)
Referring to the VT-220:
> I'm sure that the designer of this keyboard (and all the others)
> thought that he or she had good reasons for using this layout. I
> happen to disagree.
>
> Charlie Sorsby, {cmcl2,ihnp4}!lanl!crs

I agree with Charlie. After an infinite variety of keyboard layouts from various manufacturers, DEC had finally come up with a de-facto standard with the VT-100 layout (also on the LA-120s, etc.). So why change it? The return key is off in right field somewhere, the business with the "<>,." keys loses badly, and if you are running emacs, you have to go hunting for the escape key.

But then, what does this have to do with internationalization?
--
Roy Smith <allegra!phri!roy>
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
mike@erix.UUCP (Mike Williams) (10/11/85)
In article <723@inset.UUCP> jr@inset.UUCP (Jim R Oldroyd) writes:
>I first asked the question: "What do we REALLY want?". Ignore
>issues of implementation for a moment - consider the situation
>we find ourselves faced with.

It all depends on what we want to use UNIX for. I use UNIX as a collection of programming tools. However, if I wanted to use UNIX for word processing, I might buy a good word processing package. If that's all I wanted to do, I could write a special shell for word processing. I think that these programs themselves could deal with national character sets without disturbing standard UNIX.

But really, I wouldn't do it that way at all. I would buy a Mac. Word processor users don't want UNIX-type interfaces. Mice and icons and such like are so much better. Of course I could build these on top of UNIX, but why bother when I can buy other systems like SmallTalk with all these things anyway?

I suppose that all this rambling is just asking, "Is there any point in an international UNIX?".

Mike Williams.
...mcvax!enea!erix!mike

[PS: all these opinions are of course my own and do not represent those of my employer, my wife, my child or my cat.]
tmb@talcott.UUCP (Thomas M. Breuel) (10/11/85)
In article <668@dcl-cs.UUCP>, stephen@dcl-cs.UUCP (Stephen J. Muir) writes:
> The answer is quite simple. Put all the date conversion routines in the
> Kernel code with system calls for user programs to fetch the date in string
> form or whatever. This way, local changes can be made to the kernel code to
> accommodate variations, without having to recompile any programs. Of course,
> this kernel code would run in user mode (where possible) so as not to lock-out
> other processes.

You don't really mean to put date conversion routines into the *kernel*, do you? The 4.2 kernel is already much too big, too unwieldy, and has far too many system calls. I am sure it is the same for other modern versions of UN*X (in fact I know it for certain).

The point at which country-specific routines are combined with a program is usually linking. This means that you don't have to re-compile, but just re-link your binaries if you want to generate them for a different country. It seems to me that what you really want is run-time linking for the functionality and shared locked libraries for the efficiency, an addition to UN*X that is justified on grounds other than this national date format silliness.

If you want to see what generality in terms of character sets, dates, string comparisons, &c does to an operating system, just look at the M*cIntosh ROM. It is a mess, and not much is gained by it, since most (American) programs don't use the facilities provided by the operating system anyhow (i.e. string comparisons are, of course, done numerically).

	Thomas.
flaps@utcs.uucp (Alan J Rosenthal) (10/13/85)
In article <725@inset.UUCP> bill@inset.UUCP (Bill Fraser-Campbell) writes:
>However, it is NOT going to be sufficient to provide fixed length translations
>of the string portions of ctime output, to give, for example,
>
>	Lun Sep 16 12:23:05 1985
>
>There are two problems this does not address. Firstly, one chooses a date
>format to suit a particular purpose. In some countries the layout of a
>date is required, for some legal purposes, to follow a fixed format, which is
>very unlikely to be that given by ctime. It may be that local requirements
>mean the ctime output format has to be changed. A quick survey shows that of
>the 300-odd tools programs distributed with UNIX, 45 use ctime, and just over
>half of those depend on the various words in the string being at *fixed*
>locations. Note that these are just the tools programs, no applications
>were looked at. There is going to be a porting problem here.

How about two routines: a ctime which returns American date format, and a locctime which returns local date format, or some such? And also a routine adatetolocdate (ugh) which converts a given date to local date format, whatever that may be?

Then any software can use ctime to pick out bits of the date by fixed location, still remaining portable, but date(1) and anything else can use locctime. Furthermore, old software which was unfriendly (i.e. uses only American date format) would still work, and probably be easily patched, perhaps with adatetolocdate. In fact, programs like readnews could convert to local date with this, even though the news article only contains an American-style date.

I might be missing something important, not being very familiar with this kind of stuff, but it seems like a good idea which would not cause transition problems.

Alan J Rosenthal
decvax!utzoo!utcs!flaps
--
Note: I am not employed by University of Toronto Computer Science Department or Computer Services, or anything else that would come to mind.
inc@fluke.UUCP (Gary Benson) (10/15/85)
> Would it be asking too much for common sense and
> the needs of touch typists to prevail in keyboard design?
>
> ... [ many examples ] ...
>
> I'm sure that the designer of this keyboard (and all the others)
> thought that he or she had good reasons for using this layout. I
> happen to disagree.
>
> Perhaps keyboard designers should all be required to learn touch
> typing and then should be required to spend many hours typing on a
> prototype of their creations before being allowed to select a final
> design.
>
> Charlie Sorsby

Hear, hear! Those who design *anything* should be required to use it! This is particularly true of human interface items such as keyboards, displays, and "error" messages.

A few weeks ago, I was complaining to a programmer about how cumbersome the thing was to use, and was told, "Work with it a while - you'll get used to it." Well, I'm getting sick and tired of having to "get used to it". As soon as someone tells me that, a little voice inside me says, "Uh oh. Another schlock job."

Keyboards and where you put the different keys are perfect candidates for ergonomists, but somehow the old attitude prevails that says, "The qwerty keyboard is too universally familiar to change". Horse pucky. And horse pucky to every "designer" who never even talks to a person who will be using his product.
--
Gary Benson * John Fluke Mfg. Co. * PO Box C9090 * Everett WA * 98206 MS/232-E
= = {allegra} {uw-beaver} !fluke!inc = = (206)356-5367
_-_-_-_-_-_-_-_-ascii is our god and unix is his profit-_-_-_-_-_-_-_-_-_-_-_
rcd@opus.UUCP (Dick Dunn) (10/16/85)
> >I believe that the time is now ripe for the computer world
> >to take a jump from the traditional viewpoint and realize
> >that users' requirements in these days of networks and
> >typesetters are already far ahead of anything that an
> >enhanced character set can provide.
>...
> The mention of typesetters shows the way to go. Instead of a concept of
> character set we need the printer's concept of 'font'*...

It's true that we need to understand the world of typesetting, and also that it gives us some clues about how to proceed from where we are today. But be careful: the concepts of `font' and `character set' are two entirely different ideas. To specify printed material, you specify (among other things) the characters to be printed and the font to be used in printing them. The abstraction `character' is meaningful quite independent of the font used to represent characters. For example, spelling and collating are done without regard to font. Consider ligatures, in the sense they are used in typesetting English, if you need to sort out your ideas about characters and fonts. A character is some magic abstraction of an atomic entity at some level. A font provides a specific set of physical realizations (concrete notations) for certain characters.

HOWEVER, the printer's concept of font (or fount, over there) illustrates the peril of considering the character set as a simple, immutable concept. One ordinary font might have 150 or so characters. The total number of possible characters is in the thousands (at least?). What do we use for a character set? If we choose some small (<200) set of common characters, how do we represent the rest? If we attempt to choose some large (>>1000) set of characters, we come up with the questionable idea that 90%+ of our characters will not be representable in any given font we pick! Some sort of hybrid approach seems necessary somehow (waffle, waffle). And we certainly don't want to have to deal with more than one representation of a given character (e.g., for different fonts) when we are only interested in the underlying information rather than the presentation. (The "content" vs. "presentation" distinction is a key one!)
--
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Simpler is better.
peter@graffiti.UUCP (Peter da Silva) (10/16/85)
> But really, I wouldn't do it that way at all. I would buy a Mac. Word
> processor users don't want UNIX type interfaces. Mice and icons and such
> like are so much better. Of course I could build these on top of UNIX, but
> why bother when I can buy other systems like SmallTalk with all these things
> anyway?

Because you can't get SmallTalk (if you ever considered UNIX to be a resource hog, have a look at SmallTalk some time), and the Mac user interface is running on a horrid CP/M-like operating system. UNIX is an excellent base on which to build all sorts of special-purpose systems: it's small and fast, expert-friendly (remember, someone has to write the user-friendly interface), and widely available.
seifert@hammer.UUCP (Snoopy) (10/16/85)
In article <960@erix.UUCP> mike@erix.UUCP (Mike Williams) writes:
>But really, I wouldn't do it that way at all. I would buy a Mac. Word
>processor users don't want UNIX type interfaces. Mice and icons and such
>like are so much better.

That's one opinion. I wouldn't buy a Mac. I find the mouse and those cutsey-pooh icons a pain to use. Some people find the Mac-style interface easier to use, some find the Unix-style interface easier to use. There's room for both.

What was the first thing Unix was used for? TEXT PROCESSING! There are all sorts of nice utilities for dealing with text. One thing I haven't seen that would be nice is to take emacs, add the functionality of ditroff, and use a bit-mapped display that shows a full page of text, just as it will come out of the laser printer. Does something like this exist?

As far as character sets go, it would seem that 16 bits (65536 possible characters) should be more than enough. About 9000 for Chinese and 7000 for Japanese, plus all the European languages, some math and other symbols, and there should be room left over for some simple graphics characters. In fact, 15 bits should be enough, leaving one bit for parity or flagging.

Snoopy
tektronix!tekecs!doghouse.TEK!snoopy
peter@graffiti.UUCP (Peter da Silva) (10/19/85)
> > Perhaps keyboard designers should all be required to learn touch
> > typing and then should be required to spend many hours typing on a
> > prototype of their creations before being allowed to select a final
> > design.
> >
> > Charlie Sorsby
>
> Hear, hear! Those who design *anything* should be required to use it!

This... isn't always possible. While I'm often amazed at the junk that other programmers produce and call finished products, I'm embarrassed at some of the stuff that I've been forced to release before it's really ready and debugged. I always try to spend at least a couple of days *after* I'm satisfied just looking for the limits. All too often I've been pulled from a project before I've had a chance to do this.

Keyboards, now. Let's not start that up again. I'm pretty sure everyone's already been through the "why didn't IBM put a Selectric-style keyboard on the IBM-PC, and why is everyone following that schlock design?" debate too many times...
henry@utzoo.UUCP (Henry Spencer) (10/20/85)
> As far as character sets go, it would seem that 16 bits (65536
> possible characters) should be more than enough. About 9000
> for Chinese, and 7000 for Japanese, plus all the European
> languages, some math and other symbols, and there should be
> room left over for some simple graphics characters...

The trouble with this (and the other similar proposals) is that it asks the Western world to pay a factor of 2 in storage overhead for the sake of the Asian character sets. This will never sell. Most of the sites that would be affected will never want to store *anything* written in Japanese or Chinese. Why should they pay double the storage price (and bandwidth price) for the ability to do so? The only reason that the new 8-bit ISO standard isn't going to cause major disruption (except in a few sloppy Unix programs) is that the 8th bit is already there, and largely unused, in existing machines.

Solving the problems of the Asian languages is a laudable goal, but I am not convinced that we know how to do it effectively. The new ISO set will be an important step towards solving the problems of the Western languages, and this may be all we can realistically hope for in the short term. "The best is the enemy of the good."
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
seifert@hammer.UUCP (Snoopy) (10/23/85)
In article <6066@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> As far as character sets go, it would seem that 16 bits (65536
>> possible characters) should be more than enough. About 9000
>> for Chinese, and 7000 for Japanese, plus all the European
>> languages, some math and other symbols, and there should be
>> room left over for some simple graphics characters...
>
>The trouble with this (and the other similar proposals) is it asks the
>Western world to pay a factor of 2 in storage overhead for the sake of
>the Asian character sets. This will never sell. Most of the sites that
>would be affected will never want to store *anything* written in Japanese
>or Chinese. Why should they pay double the storage price (and bandwidth
>price) for the ability to do so?

I don't like having to use up more bits per character either, but I can't see any way around it. Non-fixed-length characters would be a real mess, as someone pointed out. (Previous to his/her article I had been thinking that this would be a great idea.) For some applications, data compaction could be done.

>The only reason that the new 8-bit ISO standard isn't going to cause
>major disruption (except in a few sloppy Unix programs) is that the
>8th bit is already there, and largely unused, in existing machines.

Better than nothing, but it doesn't solve the whole problem. How many times do you want to change the standard? I'd like to see it changed once, correctly, and be done with it.

The problem is that changing the standard character set is going to be a really major change. In addition to changing software, there are all those terminals that need to change. Whatever we do, there is going to be a LONG period of time when we have to deal with *both* standards, which will likely be a mess. We have to introduce the new without blowing away the old. Maybe the old standard won't go away at all. We seem to survive with both ASCII and EBCDIC, with Beta and VHS, with metric and SAE hardware, etc., etc.

Also note that memory/disk costs are dropping. Sixteen-bit chars do not sound as outrageous as they did a few years ago. (I know, I know, they'll never be as cheap as we'd like.) We definitely need to make an improvement. What we have now is not good enough. A change, any change, is going to be painful. We have a chance to do it right. Let's go for it!
--------------------------
Regarding this newsgroup: apparently the new policy is that any group gets killed unless created with the permission of the US net-lords. It appears that Europe isn't allowed to create groups based on the consensus of a conference. -sigh- Whoever's counting, add one vote for net.international, or net.unix.international, or whatever. No, *don't* create a Europe-only group; that's taking a step backwards.
--------------------------
Snoopy, waiting for the day I'm forced to buy a Chinese-English dictionary to read my e-mail.
(ihnp4 | decvax | allegra | ???) !tektronix!tekecs!doghouse.TEK!snoopy
jr@inset.UUCP (Jim R Oldroyd) (10/23/85)
In article <960@erix.UUCP> mike@erix.UUCP (Mike Williams) writes:
>It all depend on for what we want to use UNIX. I use UNIX as a collection of
>programming tools. However if I wanted to use UNIX for word processing, I
>might buy a good word processing package.

This is EXACTLY the point I was making. Mike goes on to say that it is up to each piece of software to interpret the internal codes in whatever way suits it best. But what happens if one wishes to use the same input files for different applications?
--
++ Jim R Oldroyd  ++  jr@inset.UUCP  ++  ..!mcvax!ukc!inset!jr
dave@enmasse.UUCP (Dave Brownell) (10/25/85)
In article <311@graffiti.UUCP> peter@graffiti.UUCP (Peter da Silva) writes:
>> ... Of course I could build these on top of UNIX, but
>> why bother when I can buy other systems like SmallTalk with all these things
>> anyway?
>
> Because you can't get SmallTalk (if you ever considered UNIX to be a resource
> hog, have a look at SmallTalk some time), and the Mac user interface is
> running on a horrid CP/M-like operating system.

You CAN TOO get SmallTalk on a Mac!!! I've seen it, and it actually looks good (thanks, Mark!) on a 1 Mb Mac with a hard disk. Though you're right about SmallTalk needing memory -- I wouldn't want to run it on a 512K Mac, or without a hard disk. But it IS available, and from Apple at that. (See discussions on INFO-MAC, and the next SMUG meeting.)
--
David Brownell
EnMasse Computer Corp
...!{harvard,talcott,genrad}!enmasse!dave
gnu@l5.uucp (John Gilmore) (10/28/85)
In article <6066@utzoo.UUCP>, henry@utzoo.UUCP (Henry Spencer) writes:
>> As far as character sets go, it would seem that 16 bits (65536
>> possible characters) should be more than enough...
>
> The trouble with this (and the other similar proposals) is it asks the
> Western world to pay a factor of 2 in storage overhead for the sake of
> the Asian character sets.

I think the proposals are that a coding scheme for text be defined which allows 16-bit characters to be escape-coded into an 8-bit text stream. The arguments mostly center on what kind of coding scheme would fit both the needs of few-16-bit-char folks and few-8-bit-char folks without wasting too much storage for either. Internally to an international program, characters would be 16 bits, but stdio routines (printw, fprintw, sscanw, etc.) would encode to a byte stream on the way in and out. ("w" for "world" or "wide".)

(Hmm, the non-Unix-opsys people have been looking for a way to tell when we Unixoids are reading or writing a text file versus a binary file... now that we propose encoding our own text files, they will have the clue.)
kimcm@diku.UUCP (Kim Christian Madsen) (10/28/85)
In article <1581@hammer.UUCP> tekecs!doghouse.TEK!snoopy writes: >Problem is, that changing the standard character set is going to be >a really major change. In addition to changing software, there's >all those terminals that need to change. Can you imagine a keyboard with 65535 different characters available, *WOUW* (-; Well, I work with keyboard layouts with app. 200 different visible characters available, by pressing ctrl, alt and certain dead keys (like accent keys) to obtain the many characters within a sensible keyboardsize. The major obstacle to bypass is not the terminals (you can just build the character proms big enough!) but the keyboards. If you have all the fancy characters at hand, you bet some will want to use them. Having each key on the keyboard to represent more than 4 different characters is too frustrating (I have enough trouble finding the correct character with only *FOUR* different characters for each key!!!) And I would certainly not be too satisfied with a keyboard which fills all of my desk Yes, there certainly is a need for a International Standard of lettering, I would like to be able to use the correct way of adressing a person in another country with another alphabeth, whether its japanese, french, danish or whatever. But I see no easy way of doing this (if the human interface is going to be friendly). Maybe we shall have to wait for the computer which understands human speech, and then translates the spoken word into the proper characters! However some advance is still possible, if we restrict the characters to those build upon the LATIN characters (ABCDE...etc) we can do it with easy to remember keys, like Olivetti has done with their M24, where you can hit the (dead) key ' and then an e and get an e with an accent aigu. This can be done with all the accents and the like ( ' ` ^ " ~ u v o ) and thereby increase the number of characters a great deal! 
We might be able to create a full European character set including the
characters used in Eastern Europe.  However, using Latin, Japanese,
Chinese, Arabic, Hebrew and other character types simultaneously on the
same keyboard isn't going to work well.

	Kim Chr. Madsen
	kimcm@diku.uucp
robert@erix.UUCP (Robert Virding) (10/30/85)
In article <224@l5.uucp> gnu@l5.uucp (John Gilmore) writes:
>In article <6066@utzoo.UUCP>, henry@utzoo.UUCP (Henry Spencer) writes:
>> > As far as character sets go, it would seem that 16 bits (65536
>> > possible characters) should be more than enough...
>>
>> The trouble with this (and the other similar proposals) is it asks the
>> Western world to pay a factor of 2 in storage overhead for the sake of
>> the Asian character sets.

The way I see it, *everyone* will have to pay a price to allow "foreign"
character sets.  And anyway, there are more Asians than people of
European origin, so it seems only just. :-)

>I think the proposals are that a coding scheme for text be defined which
>allows 16-bit characters to be escape-coded into an 8-bit text stream.
>The arguments mostly center on what kind of coding scheme would fit both
>the needs of few-16-bit-char folks and few-8-bit-char folks without
>wasting too much storage for either.

Wow, this sounds like trying to convert ITS-Emacs' 9-bit ASCII into
7-bit sequences, but 7 bits worse.  Talk about breaking existing
programs.  And who is to say that the *English* alphabet should be the
one in the 8-bit set?

>Internally to an international program, characters would be 16 bits,
>but stdio routines (printw, fprintw, sscanw, etc) would encode to a
>bytestream on the way in and out. ("w" for "world" or "wide").

Does this mean there will be two basically different types of programs
that handle text?  And will these two worlds be able to communicate with
each other through text files?

This sounds a little like "we have to accept that the rest of the world
may like to use their own language, but <strong expletive> if we English
speakers are going to have to change anything for their sakes".

	Robert Virding @ L M Ericsson, Stockholm
	UUCP: {decvax,philabs,seismo}!mcvax!enea!erix!robert
peter@graffiti.UUCP (Peter da Silva) (10/31/85)
> However to use LATIN, japanese, chinese, arabic, hebraian and other
> characters types simultaneously on the same keyboard isn't going to
> work well.

What you need for the multiple-languages problem is a dynamic keyboard.
For instance, you could use an LCD touch-screen for the keyboard and
display the currently-selected character set...  This would also solve
the problem of switching between Qwerty, Dvorak, and some of the more
exotic layouts that have been proposed.

As for the problem of the large number of glyphs in oriental character
sets, I believe there are already systems that address it.  They provide
a katakana keyboard, and after a word is entered a selection of kanji
characters is displayed for the operator to choose the correct one from.
(I hope I have the correct terms here.)  Combining the two ideas, you
could have the *keyboard* itself change to one containing the possible
kanji for a given word after entering each word.

I know there are UNIX sites in Japan.  Are there any on this net?
-- 
Name: Peter da Silva   Graphic: `-_-'
UUCP: ...!shell!{graffiti,baylor}!peter
IAEF: ...!kitty!baylor!peter
seifert@hammer.UUCP (Snoopy) (11/03/85)
In article <18@diku.UUCP> kimcm@diku.UUCP (Kim Christian Madsen) writes:
> Can you imagine a keyboard with 65535 different characters available,
> *WOUW* (-;

I'd rather not, actually.  One possibility is that the terminal can
*display* any character, but the keyboard remains a reasonable size.
There could be an optional second keyboard with additional characters.

> Maybe we shall have to wait for the computer which understands human
> speech, and then translates the spoken word into the proper characters!

This is bound to be unsuitable for many environments.  It would be nice
to have around when it *was* usable, though!  It might be especially
handy for portable computers, where the keyboard is already the limiting
factor for compactness.

> However to use LATIN, japanese, chinese, arabic, hebraian and other
> characters types simultaneously on the same keyboard isn't going to
> work well.

I'd like to see a 'standard western keyboard' that had all the
characters for English, German, French, Swedish, etc., plus Greek for
math/engineering and for APL, a few of the common math symbols, and of
course copyright and trademark symbols for all the net lawyers. :-)
A few simple graphics characters would be nice.  This is possible on a
keyboard of reasonable size.

Soft keys would allow easy access to a 'few' other characters.  (The
keyboard I'm using now has 24 (!) extra soft keys.  I wish I could load
them up with alphas and umlauts and such.)  These can be downloaded from
the host easily enough.

If you need another language like Hebrew or Chinese, plug in a second
keyboard.  (Presumably in, say, China, the Chinese keyboard would be the
primary, and the western keyboard the optional one.)  Is that better?

Auf Wiedersehen,
Snoopy (ECS RONIN #901)
tektronix!tekecs!doghouse.TEK!snoopy
donn@hpfcla.UUCP (11/04/85)
I'm not sure about ANSI, but both ISO and JIS have standards for
character font selection.  (So does GOST (that's Russia, for those who
care), I believe.)  Before carrying the discussion of character sets any
further, it's probably a good idea to conduct it in the context of
existing standards.  These standards do NOT solve all (or anything like
all) of the problems, but any proposal inconsistent with them is doomed
to fail, because government standards (usually NOT in the US) endorse
them.  In particular, the ESC character is used in conjunction with SI
and SO for a lot of mixed-font data.

I don't have copies of the relevant standards handy, and I'm not enough
of an expert to talk sensibly about the technical issues, but pragmatic
reality says that these standards have to be considered.

Donn Terry
HP Ft. Collins
ihnp4!hpfcla!donn
(303)226-3800 x2367

P.S.  Honeywell (Arizona??) used to print a multi-colored chart of all
the character set standards current at the time.  It included ASCII/ISO,
JIS, GOST, and (gasp) EBCDIC (at least).  It summarized all the
exceptions and national conventions, and had citations to the relevant
standards.  Does anyone know if they've kept it up, or if there is an
equivalent I could get?
franka@mmintl.UUCP (Frank Adams) (11/04/85)
In article <988@erix.UUCP> robert@erix.UUCP (Robert Virding) writes:
>In article <224@l5.uucp> gnu@l5.uucp (John Gilmore) writes:
>>I think the proposals are that a coding scheme for text be defined which
>>allows 16-bit characters to be escape-coded into an 8-bit text stream.
>>The arguments mostly center on what kind of coding scheme would fit both
>>the needs of few-16-bit-char folks and few-8-bit-char folks without
>>wasting too much storage for either.
>
>Wow, this sounds like trying to convert ITS-Emacs' 9-bit ascii into
>7-bit sequences, but 7 bits worse.  Talk about breaking existing
>programs.  Ans who is to say that the *english* alphabet should be in
>the 8-bit set?

I think you miss the point here.  Certainly the 8-bit code should
support the basic Roman alphabet and reasonable extensions to it.  This
will cover all the European languages except Greek and those using the
Cyrillic alphabet.  (What to do about those, as well as Arabic, is not
obvious.)  What is not included is the Japanese and Chinese ideographs,
which do not fit in an 8-bit code even by themselves.  Doubling the size
of all text files is just not a viable option.

Let me make a more concrete proposal for a standard (although still
pretty vague).  One needs an escape character from an 8-bit ASCII code.
The obvious choice for this is decimal 255 (hex FF).  Following the
escape byte would be a byte identifying the function.  Functions
include:

  * The following two bytes are a 16-bit character.
  * Change into 16-bit mode.
  * Specify the alphabet to be used for subsequent characters (e.g.,
    Greek, Cyrillic, Arabic, etc.)

The same two-byte sequences can be used as escapes from the 16-bit mode.
Thus, if 01 is the function code for the Roman alphabet, the 16-bit
"character" FF01 would mean "drop into 8-bit mode, using the Roman
alphabet".  This would mean two bytes of overhead per file for documents
using a different alphabet.  I do not think this is an unacceptable
overhead.
Now, this would leave the default to be the Roman alphabet.  This is de
facto discriminatory, but the reasons for it are not.  The cost of
converting to a non-upward-compatible format is large.  (The cost of
converting to an upward-compatible format is large enough that it will
be a problem.)

>This sounds a little like "we
>have to accept that the rest of the world may like to use their own
>language, but <strong expletive> if we english speakers are going to
>have to change anything their sakes".

Yeah, it does sound a bit like that.  And there are people who feel that
way.  But there are also good economic reasons for finding an
upward-compatible solution.  And regardless of the reasons, if you don't
make it easy for the English speakers to adopt the standard, they won't,
and the effort will fail, or at best be much less successful than it
could have been for many years.

I think success in this endeavor is much more important than keeping to
any absolute standards of fairness.  (Absolute is a key word in that
sentence.  Some minimum of fairness is what this is all about.)

Frank Adams               ihpn4!philabs!pwa-b!mmintl!franka
Multimate International   52 Oakland Ave North   E. Hartford, CT 06108
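For concreteness, the single-character part of this escape scheme could
look something like the following sketch (present-day Python, purely as
illustration; the function code 0x02 for "next two bytes are one 16-bit
character" is my own invention, since the proposal doesn't assign
numbers, and the locking-mode functions are omitted):

```python
ESC = 0xFF        # the proposed escape byte (decimal 255)
FN_CHAR16 = 0x02  # assumed function code: next two bytes are one 16-bit char

def encode(codepoints):
    """Encode a sequence of 16-bit code points into an 8-bit stream."""
    out = bytearray()
    for cp in codepoints:
        if cp < 0x100 and cp != ESC:
            out.append(cp)  # plain 8-bit character, stored as-is
        else:
            # escape byte, function byte, then the 16-bit value big-endian
            out += bytes([ESC, FN_CHAR16, cp >> 8, cp & 0xFF])
    return bytes(out)

def decode(stream):
    """Invert encode(); assumes a well-formed stream."""
    cps, i = [], 0
    while i < len(stream):
        b = stream[i]
        if b == ESC and stream[i + 1] == FN_CHAR16:
            cps.append((stream[i + 2] << 8) | stream[i + 3])
            i += 4
        else:
            cps.append(b)
            i += 1
    return cps
```

Note that plain 8-bit text passes through untouched, which is the whole
point of the upward-compatibility argument above; only characters
outside the 8-bit range (and 0xFF itself) cost the four-byte escape.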
roy@phri.UUCP (Roy Smith) (11/06/85)
> One needs an escape character from an 8-bit Acsii code. [...] Following
> the escape byte would be a byte identifying the function.  Functions
> include: [...] Specify the alphabet to be used for subsequent characters
> (e.g., Greek, Cyrillic, Arabic, etc.)

Just to play devil's advocate for a minute, let's say you have a file in
Greek, with the first couple of bytes being the "locking shift to Greek"
function.  Guess what breaks:

Tail -- you can't get the last 10 lines of a file if you don't read the
whole file and track the shift commands.

Grep -- you're looking for all lines containing pi-iota-gamma; should
grep track the shift commands and surround each output line by "locking
shift to Greek" and "back to English"?  If you do it that way, and run
"grep ^ < greek1 > greek2", the greek[12] files will not cmp the same
because the second will have lots of extraneous shift commands.  Do you
now need a shift-optimizing filter to put files into canonical form?

I'm sure there are more examples, but you get the idea.
-- 
Roy Smith <allegra!phri!roy>
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
craig@dcl-cs.UUCP (Craig Wylie) (11/06/85)
In article <18@diku.UUCP> kimcm@diku.UUCP (Kim Christian Madsen) writes:
> Can you imagine a keyboard with 65535 different characters available,
> *WOUW* (-;

The main problem here is obviously one of size (awards for statement of
the bleeding obvious need not be presented).  People seem to want a
keyboard that will not only display the required characters on the tops
of the keys but is also general enough to handle n different character
sets.  Note that as well as there being additional characters on a
French keyboard compared to a British one, even some of the standard
characters are in different places.

What we need is a re-configurable keyboard: of usable size, but with all
needed characters displayed, as they appear on the screen, on the tops
of the keys.  OK.  Implant a matrix of LCDs on top of each key and
display the character that it represents.  If a large enough matrix were
available, then surely most characters could be displayed.  If a set
contains more characters than available keys, then one key changes
between 'pages' of characters.  This needs more thought, as a language
with 5000 characters would have far too many 'pages' to be useful, but
heuristics could probably be devised to help.

The internal representation of character sets is a problem to be
resolved at another time (stay tuned).

Craig.
mikeb@inset.UUCP (Mike Banahan) (11/07/85)
It is worth noting that to provide support for languages with a very
large repertoire of ``characters'', such as Chinese, it is not common
practice to use a particularly large keyboard.  The technique normally
employed for data entry in such languages is different.  Typically it is
done by entering a phonetic equivalent of the word that is wanted, using
a small number of characters: a phonetic notation for Chinese, using
Roman characters, is already well established.  The terminal has enough
intelligence to search its dictionary of characters and to display
several alternatives from the large character set which correspond more
or less closely to the phonetic input.  The user selects the one wanted
and carries on.

This sounds slow, but as far as I remember it is recognised as being one
of the quickest ways of actually inputting ideograms.  Anyhow, the
upshot is that you can input Chinese using standard keyboards.  The
terminal display and intelligence have to be upgraded considerably, but
that is pretty simple nowadays.  The terminal I'm using now isn't that
much less intelligent than our Vax (and a lot less overloaded!).

Forget all those pictures in the silly papers of Chinese typewriters
with a keyboard the size of a table.
-- 
Mike Banahan, Technical Director, The Instruction Set Ltd.
mcvax!ukc!inset!mikeb
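The dictionary-search-and-select loop described above amounts to a
lookup keyed on the phonetic spelling followed by a menu choice.  A toy
sketch (the romanizations and candidate glyph lists here are
illustrative only, not a real lexicon):

```python
# Toy dictionary mapping a romanized syllable to candidate ideographs.
# A real terminal would hold a far larger table in its firmware.
CANDIDATES = {
    "ma": ["馬", "麻", "媽"],
    "shan": ["山", "衫"],
}

def candidates(phonetic):
    """Return the menu of glyphs the terminal would offer the user."""
    return CANDIDATES.get(phonetic, [])

def select(phonetic, choice):
    """Simulate the operator picking entry number `choice` from the menu."""
    menu = candidates(phonetic)
    return menu[choice] if 0 <= choice < len(menu) else None
```

The point of the design is that the keyboard stays small: all the
complexity lives in the dictionary and the one extra selection
keystroke.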
greg@hwcs.UUCP (Greg Michaelson) (11/07/85)
> In article <18@diku.UUCP> kimcm@diku.UUCP (Kim Christian Madsen) writes:
>
> > Can you imagine a keyboard with 65535 different characters available,
> > *WOUW* (-;
>
> I'd rather not, actually.  One possibility is that the terminal can
> *display* any character, but the keyboard remains a reasonable size.
> There could be an optional second keyboard with additional characters.

There's a research project (Southampton Uni???) which is putting light
matrix displays into keys to show the characters the keys currently
activate.  When a new character set is programmed, the key displays get
updated.  With a character set menu it might be possible to pull down
huge character sets in manageable chunks.  Maybe an intelligent system
could learn/predict when different characters are used in different
contexts?
mats@fortune.UUCP (Mats Wichmann) (11/08/85)
Okay, I don't know if anyone has posted this; we seem to be getting
things very sporadically here, so I may have missed it.  However:

There is an ISO standard for "code extension techniques" (ISO 2022)
which is supposed to address these wonderful issues.  It starts from
7-bit ASCII (very important, because they use the 8th bit...).  There
are two ways to shift character sets: "single-shift" and
"locking-shift".  Single shift is like pressing the SHIFT or CONTROL key
on your terminal - it has to be done for each character.  Locking shift
puts you into a different mode until an unlock sequence comes along.

The AT&T internationalization proposal is based on this idea, but uses
only single-shift, and basically follows these two rules:

    1.  If the high-order bit of an 8-bit byte is turned off, the 8-bit
        sequence comes from an ASCII character set.

    2.  If the high-order bit is turned on, the 8-bit sequence is
        non-ASCII and should be interpreted as belonging to one of the
        three local character sets.  The exact character set it belongs
        to depends on the internal coding method and whether it was
        preceded by a single-shift character.

There will be special "single-shift characters" which signify one- or
two-byte following sequences (the two magic cookies which select this
would be "SS2" = 0x8e and "SS3" = 0x8f).

The above is a major condensation, and only represents the proposal as I
understand it.  The reference document is: "Information Processing - ISO
7-bit and 8-bit Coded Character Sets - Code Extension Techniques", ISO
2022-1982(E).

I am relatively new to this game, so if anyone has sensible objections
to this scheme, I would love to be educated.  This sort of suggestion
does not, of course, tackle issues like sorting at all; it merely
suggests how to represent the data, not what you can do with it.

Mats Wichmann
Fortune Systems
{ihnp4,hplabs,dual}!fortune!mats
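As I read the proposal, a decoder would look roughly like the sketch
below (Python used purely for illustration; the convention that SS2
introduces one byte and SS3 two bytes is my assumption, since the
summary above leaves the byte counts open):

```python
SS2, SS3 = 0x8E, 0x8F  # the two single-shift "magic cookies"

def decode(stream):
    """Split a byte stream into (charset, value) tokens.

    Assumptions: bytes with the high bit clear are ASCII; a high-bit
    byte with no shift prefix belongs to the default local set; SS2
    shifts the next byte into set 2, SS3 the next two bytes into set 3.
    """
    tokens, i = [], 0
    while i < len(stream):
        b = stream[i]
        if b < 0x80:
            tokens.append(("ascii", b))
            i += 1
        elif b == SS2:
            tokens.append(("set2", stream[i + 1]))
            i += 2
        elif b == SS3:
            tokens.append(("set3", (stream[i + 1] << 8) | stream[i + 2]))
            i += 3
        else:
            tokens.append(("local", b))
            i += 1
    return tokens
```

Because every shift is single (it covers only the character that
follows), a decoder never carries mode across more than a couple of
bytes, which is exactly the property the locking-shift critics in this
thread are asking for.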
peter@graffiti.UUCP (Peter da Silva) (11/10/85)
> > One needs an escape character from an 8-bit Acsii code. [...] Following
> > the escape byte would be a byte identifying the function.  Functions
> > include: [...] Specify the alphabet to be used for subsequent characters
> > (e.g., Greek, Cyrillic, Arabic, etc.)

ASCII is really a 7-bit code.  Thus if the 8th bit is set, then this
byte and perhaps the next can be considered escaped info.  I don't
believe that locking shifts are a good idea, though, since they make it
hard to take an arbitrary lump of text and tell what it means.  Since
it's been established that there is no way of implementing a general
foreign-language sort without table look-up and perhaps more involved
heuristics (to handle Dutch "ij", for example) anyway, why not do
something like this...

	0xxxxxxx           Normal ASCII
	10xxxxxx           Foreign ROMAN characters
	11xxxxxx xxxxxxxx  Kanji or other extended character

...and just stuff all the foreign variants into the 64 extra characters
this makes available for the purpose.  I know I said something like this
before, but nobody seems to have noticed, and I am sufficiently
egocentric to believe that there is something to it...
-- 
Name: Peter da Silva   Graphic: `-_-'
UUCP: ...!shell!{graffiti,baylor}!peter
IAEF: ...!kitty!baylor!peter
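To make the three byte patterns concrete, here is a sketch of an
encoder/decoder (Python, for illustration; mapping code points below
0xC0 to themselves and 0xC0-0x3FFF to the two-byte form is one arbitrary
choice among several the posting would permit):

```python
def encode(cp):
    """Encode one code point (0 .. 0x3FFF) under the three-range scheme:
    0xxxxxxx = ASCII, 10xxxxxx = the 64 'foreign Roman' codes,
    11xxxxxx xxxxxxxx = a 14-bit extended character."""
    if cp < 0xC0:
        return bytes([cp])                       # 0xxxxxxx or 10xxxxxx
    return bytes([0xC0 | (cp >> 8), cp & 0xFF])  # 11xxxxxx xxxxxxxx

def decode(stream):
    """Invert encode() over a whole stream; assumes well-formed input."""
    cps, i = [], 0
    while i < len(stream):
        b = stream[i]
        if b < 0xC0:
            cps.append(b)
            i += 1
        else:
            cps.append(((b & 0x3F) << 8) | stream[i + 1])
            i += 2
    return cps
```

The leading bits of each byte classify it unambiguously, so the decoder
needs no mode at all: any byte can be identified as ASCII, foreign
Roman, or the start of a two-byte character by inspection.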
mark@cbosgd.UUCP (Mark Horton) (11/11/85)
The Japanese Kanji character set can be input in the same phonetic way
as was described for Chinese.  (You type 2 or 3 Roman letters which
phonetically sound like the syllable you want, and it turns into the
(unique) Katakana glyph for that syllable.  You do this for every
syllable in the word and then press a special key, and something
consults a (big) table and finds all the glyphs that sound like that.
It puts up a menu, which often has 2-6 choices, on an extra line at the
bottom of the terminal.  You pick one and it goes up on the screen.)

I'm told there are about 60000 Kanji characters, and a few tens of
thousands more Chinese characters (I can't remember the exact numbers).
However, a subset that fits in 14 bits is in common use, and they are
willing to restrict themselves to this subset.

There are apparently already official standards for encoding Kanji in
16 bits, intermixed with ASCII.  It seems that you take the 14 bits and
put them in two bytes, each byte with the 8th bit on.  Having two
consecutive bytes with the parity bit on means it's a Kanji character.
A single parity character might have a different international meaning.
This doesn't break tail or grep.  I don't know what they do if there are
two European characters in a row, but I gather there is some standard
way of dealing with this.  The only mode needed is attached to the
keyboard, so it can tell if you're typing in Roman or Katakana.

By the way, I've seen several references to a function "printw" with an
assumption that this would be a 16-bit printf.  I'd like to point out
that the name "printw" has already been taken by curses, which is
present in both 4BSD and System V.  (printw means "print window.")  I'm
not even convinced that such a function is needed, since the existing
standards seem oriented toward streams of 8-bit bytes.  I don't think
stdio cares whether a character is Kanji or Roman; that's between the
application and the terminal.  Regular old printf works fine.

Mark Horton

P.S.
Everybody agrees that this group should exist and should be distributed
worldwide, but the name "net.internat" is terrible.  Let's settle the
issue of whether it's to be moderated (I understand we have a volunteer
to be the moderator) and then call it either net.international or
mod.international.
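The two-consecutive-high-bit convention described earlier in the thread
can be sketched as follows (illustrative only; the actual JIS code
assignments differ).  Note why tail and grep survive: a newline is an
ASCII byte, a two-byte pair never spans one, so a decoder can restart
correctly at any line boundary without tracking mode from the start of
the file:

```python
def decode(stream):
    """Split a stream into ('ascii', byte) and ('kanji', 14-bit) tokens.

    Convention: two consecutive bytes with the high bit set form one
    Kanji character; the 7 low bits of each byte supply the 14 bits.
    Assumes a well-formed stream starting on a character boundary.
    """
    tokens, i = [], 0
    while i < len(stream):
        b = stream[i]
        if b < 0x80:
            tokens.append(("ascii", b))
            i += 1
        else:
            tokens.append(("kanji", ((b & 0x7F) << 7) | (stream[i + 1] & 0x7F)))
            i += 2
    return tokens
```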
req@warwick.UUCP (Russell Quin) (11/11/85)
[...]
>What we need is a re-configurable keyboard.  Of usable size but with all
>needed charaters displayed, as they appear on the screen, on the tops
>of the keys.
	[ suggests using LCDs ]
[...]
>If a set contains more characters than available keys then one key changes
>between 'pages' of characters.

I have enough problems coping with modes in editors (a lot of software
seems to have at least two modes where keys typed are interpreted
differently), without having to worry about what mode the *keyboard* is
in as well!  This sort of information must be duplicated on the screen
if it is to be useful at all.  In any event, I don't look at both screen
and keyboard when typing -- usually just the screen, in fact, unless the
terminal is unfamiliar to me (like this one).

Another problem -- look at the buttons on your keyboard.  Are they
clean?  Not only do fingers conceal the keytops, but dirt wouldn't help
either, nor would the difficulty of getting an adequate connection to
the tiny display as it moves up and down.

There seem to be several other issues involved:

1.  People using different alphabets need different sets of characters
    available.  A French keyboard without a cedilla is as useful as a
    Finnish one with a cedilla but no umlaut.

2.  Portability -- it isn't helpful if a program uses the grave accent
    (e.g. Bourne Shell) and this happens to print as a Pound Sterling
    symbol on your device.  So it would be good if the same characters
    always printed in the same way.

3.  Big alphabets -- there are already too many characters to fit onto a
    sane keyboard, but a big problem comes when there are *many*
    characters.  One possible solution here that has already been
    mentioned involves using multiple-character names for symbols and
    having a routine to turn these into/out of an internal
    representation.  The characters would be stored in a homogeneous
    way, so grep-like tools would work.  This would help for maths
    symbols, too.
Which leads up to

4.  Mixed alphabets -- what does grep '[a-deltaC-OMEGA]' file mean?
    What about grep '[alpha-epsilon ALPHA-EPSILON aleph yod
    Man-In-House-With-Dog]'?  It seems sensible not to define the
    meaning of ranges over mixed alphabets (e.g. [aleph-delta]), so a
    character's alphabet would have to be obvious from the internal
    representation.

By the time we get this far, we seem to be moving away from a
good-old-ASCII computer system and towards a cross between a graphics
machine and a typesetter!  Since presumably not all machines would ever
have access to all alphabets, there are huge portability problems.  Has
anyone built a machine that goes even partway towards addressing these
areas?  TeX or Troff in the tty driver...  [0.5 :-)]  Perhaps we would
do better not to try to address the huge oriental alphabets in this way
at all -- the benefits don't seem worthwhile.

>The internal representation of character sets is a problem to be resolved at
>another time (stay tuned).
> Craig.

I feel that the representation is important.  A standard will not be
useful if it can't be implemented.

- Russell
-- 
... mcvax!ukc!warwick!req     (req@warwick.UUCP)
... mcvax!ukc!warwick!frplist (frplist@warwick.UUCP)
friend: someone one seems to be able to tolerate at the moment
spw2562@ritcv.UUCP (11/13/85)
In article <422@graffiti.UUCP> peter@graffiti.UUCP (Peter da Silva) writes:
> 0xxxxxxx           Normal ASCII
> 10xxxxxx           Foreign ROMAN characters
> 11xxxxxx xxxxxxxx  Kanji or other extended character
>...and just stuff all the foreign variants into the 64 extra characters this
>makes available for the purpose.
>I am sufficiently egocentric to believe that there is something to it...

This is a good idea - even if you are egocentric 8-).

Alternately, use a 16-bit code, as has been mentioned, keeping the lower
byte the same as the current ASCII standard, but setting bits in the
upper byte to indicate different variants of the base character.  This
would allow stripping the upper byte off, leaving only 8 bits, without
changing which basic character it is.  As for totally unique characters,
they could be arbitrarily assigned.

If this has been suggested before, my apologies for mentioning it again.
I just now started reading this newsgroup.

==============================================================================
Steve Wall @ Rochester Institute of Technology
USnail: 6675 Crosby Rd, Lockport, NY 14094     Unix 4.2 BSD
Usenet: ...!ritcv!spw2562                      VAX/VMS 4.2
BITNET: SPW2562@RITVAXC                        Voice: Yell "Hey Steve!"
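The upper-byte-variant idea in miniature (the particular bit assignments
below are hypothetical; the posting doesn't fix them):

```python
# Hypothetical variant-bit assignments for the upper byte.
ACUTE  = 0x0100
GRAVE  = 0x0200
UMLAUT = 0x0400

def add_variant(ch, variant):
    """Combine an 8-bit base character with a variant bit."""
    return ord(ch) | variant

def base_char(code16):
    """Strip the variant bits, leaving the plain lower-byte character."""
    return code16 & 0xFF
```

The attraction is that a degraded rendering is always one mask away: a
program (or printer) that knows nothing about accents can apply
base_char() and still produce legible text.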
franka@mmintl.UUCP (Frank Adams) (11/15/85)
In article <2004@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>	Just to play devil's advocate for a minute, let's say you have a
>file in greek, with the first couple of bytes being the "locking shift to
>greek" function.  Guess what breaks:
>
>	Tail -- you can't get the last 10 lines of a file if you don't read
>the whole file and track the shift commands.

Yeah, you can't.  You can read from the end back to the last (permanent)
shift command; anything preceding that is OK.  Of course, this
frequently means reading the whole file backwards.

>	Grep -- you're looking for all lines containing pi-iota-gamma;
>should grep track the shift commands and surround each output line by
>"locking shift to greek" and "back to English"?  If you do it that way,
>and run "grep ^ < greek1 > greek2", the greek[12] files will not cmp the
>same because the second will have lots of extraneous shift commands.  Do
>you now need a shift-optimizing filter to put files into canonical form?

Actually, I would use the sixteen-bit format internally.  You use a
standard routine to read the file and convert it to sixteen-bit form,
and another standard routine to write the file, optimizing the shifts.
This takes care of this whole class of problems.

>	I'm sure there are more examples, but you get the idea.

I never said it would be easy.  Just easier and more practical than
throwing away ASCII entirely, or having each non-ASCII character
preceded by an escape sequence.  While critical comments such as this
are welcome, alternative suggestions would be more so.

Frank Adams               ihpn4!philabs!pwa-b!mmintl!franka
Multimate International   52 Oakland Ave North   E. Hartford, CT 06108
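The read-convert-rewrite pipeline suggested here is essentially a
canonicalizer: decode the shifted stream into (alphabet, character)
pairs, then re-encode, emitting a shift only when the alphabet actually
changes.  A sketch (the 0xFF escape and the function codes 0x01/0x02 are
hypothetical placeholders, as in the earlier proposal):

```python
SHIFT = 0xFF                                  # hypothetical locking-shift escape
ALPHABETS = {0x01: "roman", 0x02: "greek"}    # hypothetical function codes

def to_internal(stream, alphabet="roman"):
    """Decode a shifted 8-bit stream into (alphabet, byte) pairs."""
    out, i = [], 0
    while i < len(stream):
        if stream[i] == SHIFT:
            alphabet = ALPHABETS[stream[i + 1]]
            i += 2
        else:
            out.append((alphabet, stream[i]))
            i += 1
    return out

def to_stream(pairs, alphabet="roman"):
    """Re-encode, emitting a shift only when the alphabet changes."""
    codes = {name: code for code, name in ALPHABETS.items()}
    out = bytearray()
    for alph, b in pairs:
        if alph != alphabet:
            out += bytes([SHIFT, codes[alph]])
            alphabet = alph
        out.append(b)
    return bytes(out)
```

Running any shifted file through to_stream(to_internal(f)) puts it into
canonical form, which answers Roy's "grep ^" objection: two files with
the same characters come out byte-identical no matter how many redundant
shifts either contained.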
peter@graffiti.UUCP (Peter da Silva) (11/18/85)
> In article <422@graffiti.UUCP> peter@graffiti.UUCP (Peter da Silva) writes:
> > 0xxxxxxx           Normal ASCII
> > 10xxxxxx           Foreign ROMAN characters
> > 11xxxxxx xxxxxxxx  Kanji or other extended character
>
> Alternately, use a 16 bit code, as has been mentioned, keeping the lower
> byte the same as the current ascii standard, but setting bits in the
> upper byte

That would also work, but it wouldn't address the problem of storage,
which is what I was attempting to address.  It's already been mentioned
that most people aren't willing to put up with a factor-of-two increase
in the size of their text files just to satisfy the Japanese.  The
reason for the ethnocentric use of ASCII as the base character set is
that most of the world's computers are in the US...
-- 
Name: Peter da Silva   Graphic: `-_-'
UUCP: ...!shell!{graffiti,baylor}!peter
IAEF: ...!kitty!baylor!peter
henry@utzoo.UUCP (Henry Spencer) (11/19/85)
> ...I have enough problems coping with modes in editors ...
> without having to worry about what mode the *keyboard* is in as well!
> This sort of information must be duplicated on the screen if it is to be
> useful at all...  Another problem -- look at the buttons on your keyboard.
> Are they clean?  Not only do fingers conceal the keytops, but dirt
> wouldn't help either, as well as the difficulty of getting an adequate
> connection to the tiny display as it moves up & down.

Actually, these problems can be solved by a sneaky trick.  You put an
angled glass plate over the keyboard, in your line of sight to it but
high enough that it does not obstruct hand access.  Then you put a
monitor in the right place so that the image of the monitor face seen in
the glass is superimposed on the keyboard.  Presto: keytop displays that
are dirt-proof and can be seen *through* your fingers.  No tricky
connection problems either.  It's been tried, and it works pretty well.

It doesn't solve the problem of wanting to touch-type, though.
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry