[comp.mail.uucp] 8-bit mail

ct@dde.uucp (Claus Tondering) (05/12/89)

More and more companies (especially in Europe) are moving from a
7-bit pseudo-ASCII environment to an 8-bit environment (typically
based on the ISO 8859/1 character set). Our company has been using
this 8-bit character set for some years now. But we have problems with
E-mail. Within our organization uucp transfer of E-mail with 8-bit characters
works fine, but if our mail leaves the organization and goes to this
country's backbone machine, the 8th bit is removed from our letters.
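
For illustration, here is a minimal Python sketch of what such stripping does to ISO 8859/1 text. It is not taken from any actual mailer; the function name strip_eighth_bit is hypothetical, and it assumes the relay simply clears the high bit of every byte.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the high bit of every byte, as a 7-bit-only relay is assumed to do."""
    return bytes(b & 0x7F for b in data)


# Danish text encoded in ISO 8859/1: the letter ø is the single byte 0xF8.
original = "Rødgrød med fløde".encode("iso-8859-1")
stripped = strip_eighth_bit(original)

# 0xF8 with the high bit cleared is 0x78, which is ASCII "x".
print(stripped.decode("ascii"))  # prints "Rxdgrxd med flxde"
```

The damage cannot be undone at the receiving end: after stripping, a genuine "x" and a former "ø" are the same byte.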

The reason, I am told, is that E-mail is based on a set of RFCs that
specify 7-bit ASCII as the character set to use, and therefore characters
with the 8th bit set are stripped. Why must it be so? Uucp has no problems
with 8-bit characters, so why must we restrict ourselves to a standard
that is dying anyway?
-- 
Claus Tondering
Dansk Data Elektronik A/S, Herlev, Denmark
E-mail: ct@dde.dk    or    ...!uunet!mcvax!dkuug!dde!ct

prc@erbe.se (Robert Claeson) (05/16/89)

In article <557@Aragorn.dde.uucp> ct@dde.uucp (Claus Tondering) writes:

>More and more companies (especially in Europe) are moving from a
>7-bit pseudo-ASCII environment to an 8-bit environment (typically
>based on the ISO 8859/1 character set). Our company has been using
>this 8-bit character set for some years now. But we have problems with
>E-mail. Within our organization uucp transfer of E-mail with 8-bit characters
>works fine, but if our mail leaves the organization and goes to this
>country's backbone machine, the 8th bit is removed from our letters.

Some do, some don't. Our SysVR3.1 systems with sendmail don't strip the eighth
bit (I know for sure, we always use ISO 8859/1 within our organization), but
most (all?) BSD systems do.

>The reason, I am told, is that E-mail is based on a set of RFCs that
>specify 7-bit ASCII as the character set to use, and therefore characters
>with the 8th bit set are stripped. Why must it be so? Uucp has no problems
>with 8-bit characters, so why must we restrict ourselves to a standard
>that is dying anyway?

RFC is an American kind of "standard". ASCII is an American standard, too.
The Americans only need 7 bits, so don't expect the RFC to change within
this century. I guess we'll have to wait for X.400 to become more widespread
(anyone got a PD X.400 MUA/MTA?).

-- 
          Robert Claeson      E-mail: rclaeson@erbe.se
	  ERBE DATA AB

storm@texas.dk (Kim F. Storm) (06/08/89)

ct@dde.uucp (Claus Tondering) writes:

>More and more companies (especially in Europe) are moving from a
>7-bit pseudo-ASCII environment to an 8-bit environment (typically
>based on the ISO 8859/1 character set). Our company has been using
>this 8-bit character set for some years now. But we have problems with
>E-mail. Within our organization uucp transfer of E-mail with 8-bit characters
>works fine, but if our mail leaves the organization and goes to this
>country's backbone machine, the 8th bit is removed from our letters.

>The reason, I am told, is that E-mail is based on a set of RFCs that
>specify 7-bit ASCII as the character set to use, and therefore characters
>with the 8th bit set are stripped. Why must it be so? Uucp has no problems
>with 8-bit characters, so why must we restrict ourselves to a standard
>that is dying anyway?

Other people have pointed out the technical reasons for the stripping of the
eighth bit (the Danish backbone is a BSD-based system).  It has also been
pointed out that there is little hope that all 8-bit messages will
survive, and that we will have to live with this drawback until X.400
comes around and solves all our problems.

However, at least in Europe, where the need for 8-bit character sets is
greater than in the U.S., the backbones on EUnet should run mailers that
can forward 8-bit data unchanged (not the situation today).  If an end-site
cannot handle 8-bit data, that is the end-site's problem.

Even if the backbones and the end-sites did pass on 8-bit data, we would
still be faced with the problem that a message may be read on a terminal
with a different character set than the one it was written on.
The poster will have to tell which character set his message is written in,
and the recipient must use a terminal which can show this character set,
or his mail program must be able to remap the characters in a sensible way.
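
As a rough sketch of such remapping, a mail program could fall back to ASCII spellings when the terminal cannot show ISO 8859/1. The table and the function name remap_to_ascii below are hypothetical; the substitutions follow one common convention and are not a standard.

```python
# Hypothetical fallback table: ISO 8859/1 letters -> ASCII spellings.
ASCII_FALLBACK = {
    "æ": "ae", "ø": "oe", "å": "aa",
    "Æ": "AE", "Ø": "OE", "Å": "AA",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
}


def remap_to_ascii(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Decode raw bytes and replace non-ASCII characters for a 7-bit terminal."""
    text = raw.decode(charset)
    # Characters without a table entry are shown as "?" rather than dropped.
    return "".join(
        ch if ord(ch) < 128 else ASCII_FALLBACK.get(ch, "?") for ch in text
    )


print(remap_to_ascii("Rødgrød med fløde".encode("iso-8859-1")))
# prints "Roedgroed med floede"
```

This only works if the reader's program knows which character set the sender used, which is the point of the header discussed here.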

I think that a new header field would be required for this, e.g.

	Character-Set: 8859/1

and if that is missing, ASCII (8859/0?) is assumed.
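
A minimal Python sketch of how a reader program might honor such a field follows. "Character-Set" is only the header proposed in this post, not an existing standard; the CODECS table and the function name decode_message are hypothetical, and the message is assumed to use plain "\n" line endings with a blank line between header and body.

```python
# Hypothetical mapping from the proposed header values to Python codec names.
CODECS = {"8859/1": "iso-8859-1"}


def decode_message(raw: bytes) -> str:
    """Decode the body using the proposed Character-Set field, else ASCII."""
    head, _, body = raw.partition(b"\n\n")
    charset = "ascii"  # default when the field is missing or unknown
    for line in head.split(b"\n"):
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"character-set":
            label = value.strip().decode("ascii", errors="replace")
            charset = CODECS.get(label, "ascii")
    # Bytes that are invalid in the chosen set become U+FFFD, not an error.
    return body.decode(charset, errors="replace")


msg = b"From: storm@texas.dk\nCharacter-Set: 8859/1\n\nS\xf8ren\n"
print(decode_message(msg), end="")  # prints "Søren"
```

Without the field, the same body decodes as ASCII and the 0xF8 byte shows up as a replacement character, so the reader at least sees that something was lost.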

Is the concept of character sets defined in X.400, or are we going
to have the same problem there?

-- 
Kim F. Storm        storm@texas.dk        Tel +45 429 174 00
Texas Instruments, Marielundvej 46E, DK-2730 Herlev, Denmark
	  No news is good news, but nn is better!

recerik@alliant.uni-c.dk (Erik Bertelsen) (06/14/89)

The 1984 version of X.400 supports the Teletex and IA5 (ASCII) character
encodings. The 1988 version adds some more options, but not 8-bit encodings
like ISO 8859/1. To the best of my knowledge, X.400 in its current state will
not help us poor souls in countries whose alphabets have more letters than the
26 used in English. - sigh!

But making mail messages written in ISO 8859/1 (and other flavors of 8859) 
work would be very nice. As Kim suggests, we may have to invent yet another
header line to accomplish this - sigh again!

- erik