erik@srava.sra.co.jp (Erik M. van der Poel) (04/10/91)
I'm directing followups to comp.std.internat.

John Gilmore writes:
> And my windows all use ISO Latin 1. If Torbj|rn would send the
> umlauted letter in that standardized character set, it would look right
> in both the States and in Sweden.

Have you ever tried to send yourself a message in Latin-1? Did it
work? And even if *you* have a reasonable version of sendmail (one
that doesn't strip the 8th bit), what makes you so certain that
Torbj|rn's message and anyone else's won't pass through a site that
*does* strip the 8th bit?

Also, what's so "standardized" about ISO Latin-1? What makes it more
standard than, say, Latin-2?
--
Erik M. van der Poel                              erik@sra.co.jp
Software Research Associates, Inc., Tokyo, Japan  TEL +81-3-3234-2692
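A minimal Python sketch of the failure mode being described. The function name strip_eighth_bit is hypothetical, not taken from sendmail or any real relay; it simply models a hop that is not 8-bit clean by clearing bit 7 of every byte. In ISO 8859-1, ö is 0xF6 and the guillemets « » are 0xAB and 0xBB, so with bit 7 cleared they turn into 'v', '+' and ';'. (The "|" in "Torbj|rn" comes from a different mechanism: the Swedish 7-bit ISO 646 variant puts ö at 0x7C, which a US-ASCII display shows as "|".)

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Simulate a 7-bit mail hop by clearing the high bit of every byte."""
    return bytes(b & 0x7F for b in data)


# Latin-1 text as the sender's terminal would produce it.
sent = "Torbjörn «like this»".encode("iso8859_1")
received = strip_eighth_bit(sent)

# Every byte is now 7-bit ASCII, but the text has been silently altered.
print(received.decode("ascii"))  # Torbjvrn +like this;
```

The corruption is silent: the result is perfectly valid ASCII, so nothing downstream can tell that the original characters were lost.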
randall@Virginia.EDU (Randall Atkinson) (04/11/91)
John Gilmore originally wrote:
% And my windows all use ISO Latin 1. If Torbj|rn would send the
% umlauted letter in that standardized character set, it would look right
% in both the States and in Sweden.

In article <1110@sranha.sra.co.jp>,
Erik M. van der Poel <erik@srava.sra.co.jp> responded:
>Have you ever tried to send yourself a message in Latin-1? Did it
>work? And even if *you* have a reasonable version of sendmail (one
>that doesn't strip the 8th bit), what makes you so certain that
>Torbj|rn's message and anyone else's won't pass through a site that
>*does* strip the 8th bit?

It does work for a fair and ever increasing subset of the Internet.
BITNET doesn't do very well with it. Clearly we need to move towards
8-bit and 16-bit and 32-bit transparent mail transport mechanisms.
Fortunately there are a number of possible transport mechanisms out
there to choose from, some of which are already 8-bit transparent.

>Also, what's so "standardized" about ISO Latin-1? What makes it more
>standard than, say, Latin-2?

ISO 8859/1 is NOT any "more standard" than ISO 8859/2; however, sites
in the US are in fact migrating towards ISO 8859/1 from US ASCII, and
most sites in the US are NOT migrating towards ISO 8859/2 (though they
might support it on the side as vendors begin to). The languages most
commonly used in the US are covered by ISO 8859/1, and the languages
supported by ISO 8859/2 are less commonly used (again, in the US as a
whole).

Note that ISO Latin-1 is ISO 8859/1, the 8-bit character set used for
Western European languages. ISO Latin-2 is ISO 8859/2, the 8-bit
character set for Central and Eastern European languages.

Clearly we need to add additional information to the header of mail
messages to indicate which character set to use. I'm not sure of the
current state of the Internet protocols (RFC 822 et al.) with respect
to this.
If there isn't the equivalent of a "Character-set:" header yet, serious
consideration should be given to adding one with clearly defined
values for at least the existing ANSI and ISO character sets. Character
sets that should have a defined string to use with such a header field
include at least:

    ASCII
    ISO 8859/1
    ...
    ISO 8859/N   (where N is the last defined set)
    ISO 10646    (once it gets completed)

The Internet is the dominant mail transport network at present, partly
because so many other networks gateway with it. Getting the Internet
to support such needs would be a big step in the right direction.
Perhaps someone on the IETF can comment on their current activities
in this area??

Ran Atkinson
randall@Virginia.EDU
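To illustrate why such a header matters, here is a Python sketch. The header name "Character-set:" comes from the proposal above, but the exact label strings, the ASCII default, and the mapping to Python codec names are assumptions made for illustration, not part of any standard. The same byte, 0xB1, is the plus-minus sign in ISO 8859/1 but a-ogonek in ISO 8859/2, so without a label the receiver can only guess.

```python
import unicodedata

# Hypothetical label table for the proposed "Character-set:" header.
# Keys are upper-cased header values; values are Python codec names.
CHARSET_CODECS = {
    "ASCII": "ascii",
    "ISO 8859/1": "iso8859_1",
    "ISO 8859/2": "iso8859_2",
}


def decode_body(headers: dict, body: bytes) -> str:
    """Decode a message body using the (hypothetical) Character-set header.

    Assumes ASCII when the header is absent and rejects unknown labels
    instead of guessing. Simplification: the header-name lookup here is
    case-sensitive, unlike real mail headers.
    """
    label = headers.get("Character-set", "ASCII").strip().upper()
    codec = CHARSET_CODECS.get(label)
    if codec is None:
        raise ValueError(f"unknown Character-set: {label!r}")
    return body.decode(codec)


body = b"\xb1"  # a single byte whose meaning depends on the character set
for label in ("ISO 8859/1", "ISO 8859/2"):
    text = decode_body({"Character-set": label}, body)
    print(label, "->", unicodedata.name(text))
# ISO 8859/1 -> PLUS-MINUS SIGN
# ISO 8859/2 -> LATIN SMALL LETTER A WITH OGONEK
```

With no header, the sketch falls back to ASCII, so an unlabeled 8-bit body raises UnicodeDecodeError rather than being quietly misread.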
dlv@cunyvms1.gc.cuny.edu (Dimitri Vulis, CUNY GC Math) (04/12/91)
In article <1991Apr10.172756.4991@murdoch.acc.Virginia.EDU>, randall@Virginia.EDU (Randall Atkinson) writes:
> ISO 10646 (once it gets completed)

"Unicode" seems both more practical and more realistic.

>Ran Atkinson
>randall@Virginia.EDU

Dimitri Vulis, D&M
BITNET: DLV@CUNYVMS1
Internet: DLV@CUNYVMS1.GC.CUNY.EDU
Snail: Department of Mathematics/Box 330
       City University of New York Graduate Center
       33 West 42 Street
       New York, NY 10036-8099 USA
enag@ifi.uio.no (Erik Naggum) (04/12/91)
In article <1110@sranha.sra.co.jp> erik@srava.sra.co.jp (Erik M. van der Poel) writes:

   John Gilmore writes:
   > And my windows all use ISO Latin 1. If Torbj|rn would send the
   > umlauted letter in that standardized character set, it would look right
   > in both the States and in Sweden.

   Have you ever tried to send yourself a message in Latin-1? Did it
   work? And even if *you* have a reasonable version of sendmail (one
   that doesn't strip the 8th bit), what makes you so certain that
   Torbj|rn's message and anyone else's won't pass through a site that
   *does* strip the 8th bit?

Relax, we're working on that. It doesn't really take an 8-bit SMTP
data path to get this done, although many think it would be kind of
useful. Please don't confuse the word width of the transport layer
(7 bits) with the word width of the transported data (e.g. 8 bits).

   Also, what's so "standardized" about ISO Latin-1? What makes it more
   standard than, say, Latin-2?

I don't think anyone is discussing which is the "more" standardized
part of the ISO 8859 family; it's just that ISO 8859-1 has been
adopted by more people in more places than any other part has, partly
because it's better organized (IMO).
As an example, using guillemot quotes +like this;, if you get + and ;,
you didn't benefit from ISO 8859-1 right now. Maybe in the future.
--
I don't need ISO 8859-1 to spell my name. Thanks, mom & dad.
--
[Erik Naggum]                    <enag@ifi.uio.no>
Naggum Software, Oslo, Norway    <erik@naggum.uu.no>
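The distinction between transport width and data width can be made concrete: 8-bit data can cross a 7-bit path if it is first encoded into ASCII. The Python sketch below uses quoted-printable, through the standard-library quopri module; this is the encoding MIME later standardized, chosen here for illustration and not necessarily what the people "working on that" had in mind. The helper names to_7bit and from_7bit are hypothetical. Bytes above 0x7F become "=" followed by two hex digits, so mostly-ASCII text stays readable on the wire.

```python
import quopri


def to_7bit(data: bytes) -> bytes:
    """Encode 8-bit text with quoted-printable so every byte fits in 7 bits."""
    return quopri.encodestring(data)


def from_7bit(data: bytes) -> bytes:
    """Reverse to_7bit, recovering the original 8-bit bytes."""
    return quopri.decodestring(data)


sent = "Torbjörn «like this»".encode("iso8859_1")
wire = to_7bit(sent)

# Every encoded byte is below 0x80, so a hop that clears bit 7 changes nothing ...
assert all(b < 0x80 for b in wire)
# ... and the receiver decodes back to exactly what was sent.
assert from_7bit(wire) == sent
print(wire.decode("ascii"))
```

The cost is size and readability: each non-ASCII byte takes three bytes on the wire, which is acceptable for mostly-Latin text but poor for binary data.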
rja7m@calico.cs.Virginia.EDU (Ran Atkinson) (04/12/91)
UNICODE isn't a sufficient solution as it doesn't fully support (for example) Vietnamese. DIS 10646 is a sufficient solution. I wish it were otherwise, but I have to live in the real world...
eliot@chutney.rtp.dg.com (Topher Eliot) (04/12/91)
In article <1991Apr10.172756.4991@murdoch.acc.Virginia.EDU>, randall@Virginia.EDU (Randall Atkinson) writes:
|> In article <1110@sranha.sra.co.jp>,
|> Erik M. van der Poel <erik@srava.sra.co.jp> responded:
|> >Have you ever tried to send yourself a message in Latin-1? Did it
|> >work? And even if *you* have a reasonable version of sendmail (one
|> >that doesn't strip the 8th bit), what makes you so certain that
|> >Torbj|rn's message and anyone else's won't pass through a site that
|> >*does* strip the 8th bit?
|> It does work for a fair and ever increasing subset of the Internet.
|> BITNET doesn't do very well with it. Clearly we need to move towards
|> 8-bit and 16-bit and 32-bit transparent mail transport mechanisms.

I expected to see someone else post a more authoritative answer, but
since none has been forthcoming, I will venture. The folks who work on
such things have been considering the 8-bit, different-codeset issues
as part of a much larger picture that includes such things as graphics
and other binary information in mail. Since those are harder problems,
solutions won't arrive all that quickly. There is a mailing list on
this subject; if you really need it, I can probably dig out a lead on
how to get onto that mailing list.

|> Fortunately there are a number of possible transport mechanisms out
|> there to choose from, some of which are already 8-bit transparent.

Ack! "Fortunately"? There is an ancient curse: "may you live in
interesting times". I think its modern equivalent is "may you have
many standards to choose from".
--
Topher Eliot                   Data General DG/UX Internationalization
(919) 248-6371                 62 T. W. Alexander Dr., Research Triangle Park, NC 27709
eliot@dg-rtp.dg.com            {backbone}!mcnc!rti!dg-rtp!eliot
Obviously, I speak for myself, not for DG.
dlv@cunyvms1.gc.cuny.edu (Dimitri Vulis, CUNY GC Math) (04/14/91)
In article <ENAG.91Apr12040930@maud.ifi.uio.no>, enag@ifi.uio.no (Erik Naggum) writes:
>As an example, using guillemot quotes +like this;, if you get + and ;,

'Guillemet'. This word was misspelled by some jerk from Adobe, and now
no one knows how to spell it right. :)
dlv@cunyvms1.gc.cuny.edu (Dimitri Vulis, CUNY GC Math) (04/14/91)
In article <1991Apr12.123302.17817@murdoch.acc.Virginia.EDU>, rja7m@calico.cs.Virginia.EDU (Ran Atkinson) writes:
>UNICODE isn't a sufficient solution as it doesn't fully support (for
>example) Vietnamese. DIS 10646 is a sufficient solution.

*NOT TRUE*. Unicode supports Vietnamese. You either don't know what
you're talking about or you're lying.

On the other hand, Cyrillic support in 10646 totally sucks, and Unicode
got it right. Our proposed comments on what's wrong with the Cyrillic
in 10646 are 12 pages long. :)

>
>I wish it were otherwise, but I have to live in the real world...

Well, it's certainly easier to get a copy of Unicode than of 10646 to
see for oneself what's in it and what's not...
amanda@visix.com (Amanda Walker) (04/16/91)
rja7m@calico.cs.Virginia.EDU (Ran Atkinson) writes:
   UNICODE isn't a sufficient solution as it doesn't fully support (for
   example) Vietnamese. DIS 10646 is a sufficient solution.
Yup. Unfortunately, I suspect that Unicode is doomed to popularity,
thanks to support from major U.S. manufacturers (IBM, Apple, etc.).
--
Amanda Walker amanda@visix.com
Visix Software Inc. ...!uunet!visix!amanda
--
I am the Imp of the Perverse (knowing this won't help you, either).
enag@ifi.uio.no (Erik Naggum) (04/18/91)
Dimitri, I'm sure we all need your angry comments to help us reach a
consensus in this admittedly delicate matter.

Unicode is not the answer. ISO DIS 10646 is not the answer. As I see
it, it's too early for an answer. However, we need a reference
standard, just as we needed one for the Latin script (that one is
ISO 6937-2). I think ISO DIS 10646 will make a great reference
standard due to its clean design, but it will not make it into
production use because its current implementation of that design is
lacking, and it may never see widespread use except as an interchange
standard. I don't think that's bad at all.

Let's try to fix the problem, not hide behind the pretension that the
_other_ party's problems are sufficiently bigger than ours. And please
remember that we're dealing with international politics and diplomacy
in ISO DIS 10646, while Unicode is pretty much free of that. This is a
problem we can't fix.
gwb@crosfield.co.uk (George Battrick) (04/19/91)
In article <1991Apr14.024739.3042@timessqr.gc.cuny.edu> dlv@cunyvms1.gc.cuny.edu writes:
"In article <ENAG.91Apr12040930@maud.ifi.uio.no>, enag@ifi.uio.no (Erik Naggum) writes:
">As an example, using guillemot quotes +like this;, if you get + and ;,
"
"'Guillemet'. This word was misspelled by some jerk from Adobe, and now
"no one knows how to spell it right. :)
"

This is more amusing than I had realised.

The <<French>> quotation marks, as approximated on the line above, are
indeed called "guillemets": the pronunciation is approximately
"gee-uh-may" (hard "g" as in "get").

But there *is* a word "guillemot". It's pronounced "gilly-mott" (again
a hard "g"), and it's a sea-bird of the awk family.
[No: nothing to do with the Unix "awk" :-) ]
--
George Battrick
Crosfield Electronics Ltd, Hemel Hempstead HP2 7RH, U.K.
gwb@cel.uucp  -or-  gwb@crosfield.co.uk  -or-  ...!{mcsun,ukc,uunet}!cel!gwb
phone: +44 442 230000 ext 3638   fax: +44 442 232301   telex: 827530 CROSEL G
#include <disclaimer.std>
"Remember, George: this is no time to go wobbly!"
enag@ifi.uio.no (Erik Naggum) (04/22/91)
In article <9504@sun101.crosfield.co.uk> gwb@crosfield.co.uk (George Battrick) writes:

   In article <1991Apr14.024739.3042@timessqr.gc.cuny.edu> dlv@cunyvms1.gc.cuny.edu writes:
   >In article <ENAG.91Apr12040930@maud.ifi.uio.no>, enag@ifi.uio.no (Erik Naggum) writes:
   >>As an example, using guillemot quotes +like this;, if you get + and ;,
   >'Guillemet'. This word was misspelled by some jerk from Adobe, and now
   >no one knows how to spell it right. :)

   This is more amusing than I had realised.

   The <<French>> quotation marks, as approximated on the line above, are
   indeed called "guillemets": the pronunciation is approximately
   "gee-uh-may" (hard "g" as in "get").

   But there *is* a word "guillemot". It's pronounced "gilly-mott" (again
   a hard "g"), and it's a sea-bird of the awk family.

I looked this up in my French-English dictionary, and the stupid Brit
who "translated" it managed to make it into "inverted commas". Geez.
Guillemot is indeed a bird, also in English, according to the same
dictionary.

Then again, Norwegians call `"' "goose-eyes", so one more bird didn't
look overly strange to me. :-)
daniels@parc.xerox.com (Andy Daniels) (04/25/91)
In article <1991Apr12.123302.17817@murdoch.acc.Virginia.EDU> rja7m@calico.cs.Virginia.EDU (Ran Atkinson) writes:
>UNICODE isn't a sufficient solution as it doesn't fully support (for
>example) Vietnamese. DIS 10646 is a sufficient solution.
>

Sufficient for you, perhaps, but not for me. By your criteria, DIS 10646
doesn't support Rhade, a close neighbor of Vietnamese, nor does it
support Navajo. Moving away from Latin, where's Tamil? Where's Tibetan?

If you're looking for your favorite combination of Latin character +
applied accents as a single character in Unicode, you've missed the
point. Just about all such combinations that you find in there are
included as a result of political compromise. The "pure Unicode"
approach is that if you want, for instance, 'A' with a circumflex and
underdot, you emit exactly those three characters; it's up to your
rendering software to display the composite glyph correctly. (That's
funny, I seem to have described a Vietnamese character that Unicode
"doesn't support.")

One can argue endlessly (in fact, people do) about just which set of
characters are "base" letters and which are applied marks, but the
situation in the real world is that new characters are formed in the
Latin (and, to some extent, Cyrillic) scripts by putting random marks
on other characters that already exist. If you try to enumerate all of
the legal possibilities, you're bound to have somebody come up to you
the day after you've sent your standard to the publisher and tell you,
"but you don't have e-diaeresis-rude." You can try to include
optimizations for your favorite set, but you will then invariably
offend the people who use the first ones you've left out.
--
Andy.
--
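The "pure Unicode" approach described above can be tried with Python's standard unicodedata module. Note that this uses the Unicode data shipped with a modern Python, which postdates the drafts under discussion by many years. The sketch builds Andy's example, 'A' plus combining dot below (U+0323) plus combining circumflex (U+0302), and asks NFC normalization whether a precomposed form exists. It also tries an arbitrary combination, q with diaeresis, which has no single code point and stays a two-character sequence for the renderer to handle. The helper compose and the constant names are hypothetical, chosen for this illustration.

```python
import unicodedata


def compose(base: str, *marks: str) -> str:
    """Return base + combining marks in NFC form (precomposed if one exists)."""
    return unicodedata.normalize("NFC", base + "".join(marks))


DOT_BELOW = "\u0323"   # COMBINING DOT BELOW
CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT
DIAERESIS = "\u0308"   # COMBINING DIAERESIS

# Andy's example: three characters in, one precomposed code point out.
viet = compose("A", DOT_BELOW, CIRCUMFLEX)
print(len(viet), unicodedata.name(viet))
# 1 LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW

# Canonical ordering makes the order of these two marks irrelevant.
assert compose("A", CIRCUMFLEX, DOT_BELOW) == viet

# An arbitrary combination with no precomposed form stays a sequence
# that rendering software must draw as a single glyph.
q_umlaut = compose("q", DIAERESIS)
print(len(q_umlaut))  # 2
```

Either way, the decomposed sequence is a valid encoding of the character; the precomposed code point is an optimization, which is exactly the point being argued.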