[comp.protocols.tcp-ip] universality of Latin-1

randall@Virginia.EDU (Randall Atkinson) (04/11/91)

John Gilmore originally wrote:
% And my windows all use ISO Latin 1.  If Torbj|rn would send the
% umlauted letter in that standardized character set, it would look right
% in both the States and in Sweden.

In article <1110@sranha.sra.co.jp>, 
	Erik M. van der Poel <erik@srava.sra.co.jp> responded:

>Have you ever tried to send yourself a message in Latin-1? Did it
>work? And even if *you* have a reasonable version of sendmail (one
>that doesn't strip the 8th bit), what makes you so certain that
>Torbj|rn's message and anyone else's won't pass through a site that
>*does* strip the 8th bit?

It does work for a fair and ever increasing subset of the Internet.
BITNET doesn't do very well with it.  Clearly we need to move towards
8-bit and 16-bit and 32-bit transparent mail transport mechanisms.
Fortunately there are a number of possible transport mechanisms out
there to choose from, some of which are already 8-bit transparent.
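
As a rough illustration of the 8th-bit stripping discussed above, the
following Python sketch shows what happens to a Latin-1 encoded name when
a relay clears the high bit of every byte.  The function name
strip_eighth_bit is hypothetical and only models the effect; it is not
taken from sendmail or any other mailer.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Model a 7-bit relay: clear the high bit of every byte (hypothetical helper)."""
    return bytes(b & 0x7F for b in data)


# "Torbjörn" in ISO 8859/1: the o-umlaut is the single byte 0xF6.
original = "Torbj\u00f6rn".encode("latin-1")

# 0xF6 & 0x7F == 0x76, which is ASCII "v".
stripped = strip_eighth_bit(original)

print(original)  # b'Torbj\xf6rn'
print(stripped)  # b'Torbjvrn'
```

Pure ASCII text passes through such a relay unchanged, which is why the
problem only shows up once 8-bit character sets are in use.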

>Also, what's so "standardized" about ISO Latin-1? What makes it more
>standard than, say, Latin-2?

ISO 8859/1 is NOT any "more standard" than ISO 8859/2; however, sites
in the US are in fact migrating towards ISO 8859/1 from US ASCII, and
most sites in the US are NOT migrating towards ISO 8859/2 (though they
might support it on the side as vendors begin to).  The languages that
are most commonly used in the US are covered by ISO 8859/1, and the
languages supported by ISO 8859/2 are less commonly used (again in the
US as a whole).

Note that ISO Latin-1 is ISO 8859/1 which is the 8-bit character set
used for Western European languages.  ISO Latin-2 is ISO 8859/2 which
is the 8-bit character set for Eastern European languages.
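
A small Python sketch of why the distinction matters: the same 8-bit
byte decodes to different characters under the two sets, so a receiver
that guesses wrong displays the wrong letter.  The byte value 0xB1 is
just one convenient example.

```python
# One byte, two interpretations.
raw = b"\xb1"

as_latin1 = raw.decode("iso-8859-1")  # U+00B1 PLUS-MINUS SIGN
as_latin2 = raw.decode("iso-8859-2")  # U+0105 LATIN SMALL LETTER A WITH OGONEK

print(hex(ord(as_latin1)))  # 0xb1
print(hex(ord(as_latin2)))  # 0x105

# The ASCII range (0x00-0x7F) is identical in both sets.
assert b"mail".decode("iso-8859-1") == b"mail".decode("iso-8859-2")
```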

Clearly we need to add additional information to the header of mail 
messages to indicate which character set to use.  I'm not sure of
the current state of the Internet protocols (RFC 822 et al.) with
respect to this.  If there isn't the equivalent of a "Character-set:"
header yet, serious consideration should be given to adding one with
clearly defined values for at least existing ANSI and ISO character
sets.

Character sets that should have a defined string to use with such a
header field include at least:
	ASCII
	ISO 8859/1
	  ...
	ISO 8859/N  (where N is the last defined set)
	ISO 10646   (once it gets completed)
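
A minimal sketch, in Python, of how a receiver might use such a header.
Everything here is an assumption made for illustration: the header name
"Character-set:" is only the one floated above, the value strings
("ASCII", "ISO-8859-1", "ISO-8859-2") are hypothetical and not defined
by any standard, and the parsing ignores folded header lines and assumes
bare LF line endings.

```python
# Hypothetical header values mapped to Python codec names.
CHARSET_CODECS = {
    "ASCII": "ascii",
    "ISO-8859-1": "iso8859_1",
    "ISO-8859-2": "iso8859_2",
}


def decode_body(raw_message: bytes) -> str:
    """Decode a message body using a hypothetical Character-set: header."""
    head, _, body = raw_message.partition(b"\n\n")
    charset = "ASCII"  # assumed default when the header is absent
    for line in head.split(b"\n"):
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"character-set":
            charset = value.strip().decode("ascii").upper()
    codec = CHARSET_CODECS.get(charset)
    if codec is None:
        raise ValueError("unknown character set: " + charset)
    return body.decode(codec)


message = b"From: randall\nCharacter-set: ISO-8859-1\n\nTorbj\xf6rn\n"
print(decode_body(message))
```

Note that a header like this only tells the receiver how to interpret
the bytes; it does nothing for a message whose 8th bit was stripped in
transit.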

The Internet is the dominant mail transport network at present, partly
because so many other networks gateway with it.  Getting the Internet
to convert to supporting such needs would be a big step in the right
direction.  Perhaps someone in the IETF can comment on its current
activities in this area?

Ran Atkinson
randall@Virginia.EDU

dlv@cunyvms1.gc.cuny.edu (Dimitri Vulis, CUNY GC Math) (04/12/91)

In article <1991Apr10.172756.4991@murdoch.acc.Virginia.EDU>, randall@Virginia.EDU (Randall Atkinson) writes:
>        ISO 10646   (once it gets completed)
"Unicode" seems both more practical and more realistic.
>Ran Atkinson
>randall@Virginia.EDU
Dimitri Vulis, D&M
BITNET:            DLV@CUNYVMS1
Internet:          DLV@CUNYVMS1.GC.CUNY.EDU
Snail:             Department of Mathematics/Box 330
                   City University of New York Graduate Center
                   33 West 42 Street
                   New York, NY 10036-8099
                   USA

rja7m@calico.cs.Virginia.EDU (Ran Atkinson) (04/12/91)

UNICODE isn't a sufficient solution as it doesn't fully support (for
example) Vietnamese.  DIS 10646 is a sufficient solution.

I wish it were otherwise, but I have to live in the real world...

eliot@chutney.rtp.dg.com (Topher Eliot) (04/12/91)

In article <1991Apr10.172756.4991@murdoch.acc.Virginia.EDU>, randall@Virginia.EDU (Randall Atkinson) writes:
|> In article <1110@sranha.sra.co.jp>, 
|> 	Erik M. van der Poel <erik@srava.sra.co.jp> responded:
|> >Have you ever tried to send yourself a message in Latin-1? Did it
|> >work? And even if *you* have a reasonable version of sendmail (one
|> >that doesn't strip the 8th bit), what makes you so certain that
|> >Torbj|rn's message and anyone else's won't pass through a site that
|> >*does* strip the 8th bit?
|> It does work for a fair and ever increasing subset of the Internet.
|> BITNET doesn't do very well with it.  Clearly we need to move towards
|> 8-bit and 16-bit and 32-bit transparent mail transport mechanisms.

I expected to see someone else post a more authoritative answer, but since
none has been forthcoming, I will venture.  The folks who work on such things
have been considering the 8-bit, different-codeset issues, as part of a much
larger picture of including such things as graphics and other binary
information in mail.  Since those are harder problems, it means that they
won't have solutions all that quickly.  There is a mailing list on this
subject; if you really need it I can probably dig out a lead on how to get
onto that mailing list.

|> Fortunately there are a number of possible transport mechanisms out
|> there to choose from, some of which are already 8-bit transparent.
Ack!  "Fortunately"?  There is an ancient curse:  "may you live in interesting
times".  I think its modern equivalent is "may you have many standards to
choose from".  

-- 
Topher Eliot                           Data General DG/UX Internationalization
(919) 248-6371        62 T. W. Alexander Dr., Research Triangle Park, NC 27709
eliot@dg-rtp.dg.com                           {backbone}!mcnc!rti!dg-rtp!eliot
Obviously, I speak for myself, not for DG.

randall@Virginia.EDU (Ran Atkinson) (04/15/91)

Ran Atkinson originally wrote:
% Fortunately there are a number of possible transport mechanisms out
% there to choose from, some of which are already 8-bit transparent.

In article <1991Apr12.124741.11555@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
>Ack!  "Fortunately"?  There is an ancient curse:  "may you live in interesting
>times".  I think it's modern equivalent is "may you have many standards to
>choose from".  

I said what I meant, namely that there are several different transport
MECHANISMS (e.g. sendmail, MMDF, PMDF), not several different
transport PROTOCOLS.  The whole of the Internet uses the same mail
protocols and that is a good thing, but the availability of different
mechanisms to implement those protocols is also a good thing, especially
since some of the mechanisms are already 8-bit transparent, though not
all are.

I would like to see 8-bit transparency, with some kind of character
set definition, added to the protocol more rapidly than Eliot seems
to think likely.