hansen@pegasus.att.com (Tony L. Hansen) (12/31/90)
<< From: hansen@pegasus.att.com (Tony L. Hansen)
<< By the way, the SMTP protocol doesn't permit 8-bit data. This limits
<< mailers which must send mail using that protocol.

< From: tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill)
< True. But there is no technical reason (other than short-sightedness)
< why SMTP has to strip off the 8th (high) bit. There are in fact
< working versions of sendmail that don't disturb the 8th bit.

I agree completely, there is no reason to limit SMTP to 7 bits.
Unfortunately, the standard currently REQUIRES the stripping and doing
anything else is non-standard. I would definitely support changing the
standard to allow an arbitrary 8-bit byte stream. This would also require
eliminating the limitation of 1000-character lines and anything else in the
standard which is not content transparent.

System V release 4 mail is completely content transparent. As long as the
transport media is capable of handling the mail, SVr4 mail will be able to
get it to you unchanged. Unfortunately, it can't do so over SMTP
connections.

Since this discussion is going somewhat away from the bounds of comp.text,
I've added comp.mail.misc to the Newsgroup list.

Tony Hansen
att!pegasus!hansen, attmail!tony
hansen@pegasus.att.com
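To make the effect of the stripping concrete, here is a minimal Python sketch of what a 7-bit-only hop does to an 8-bit octet. The function name strip_high_bit is made up for this illustration; it is not taken from sendmail or any other mailer.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the 8th (high) bit of every octet, as a 7-bit-only hop would."""
    return bytes(b & 0x7F for b in data)


if __name__ == "__main__":
    original = "\u00e1".encode("latin-1")  # small a with acute, octet 0xE1
    stripped = strip_high_bit(original)    # 0xE1 & 0x7F == 0x61, i.e. "a"
    print(original, stripped)
    # Plain 7-bit ASCII text passes through unchanged.
    print(strip_high_bit(b"plain text") == b"plain text")
```

The accented letter silently becomes a different, valid ASCII letter, so the recipient cannot tell that anything was lost.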
barmar@think.com (Barry Margolin) (12/31/90)
In article <1990Dec31.004055.10335@cbnewsk.att.com> hansen@pegasus.att.com (Tony L. Hansen) writes:
>System V release 4 mail is completely content transparent. As long as the
>transport media is capable of handling the mail, SVr4 mail will be able to
>get it to you unchanged.

What does it do when sending textual mail to a system that doesn't use
ASCII encoding, e.g. an IBM mainframe, or to a system with a different
newline convention (e.g. CRLF rather than LF)? SMTP places restrictions on
the characters that may appear in a message to support automated
translation during the transfer process.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
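The kind of automated translation meant here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented, and code page 037 stands in for "an IBM mainframe" because Python ships a codec for it; a real gateway might use a different EBCDIC variant.

```python
def to_smtp_lines(text: str) -> bytes:
    """Convert local text (LF or CRLF line ends) to CRLF-terminated ASCII."""
    normalized = text.replace("\r\n", "\n")
    return normalized.replace("\n", "\r\n").encode("ascii")


def to_ebcdic_host(wire: bytes) -> bytes:
    """Translate wire text for a hypothetical EBCDIC host (code page 037)."""
    local = wire.decode("ascii").replace("\r\n", "\n")
    return local.encode("cp037")


if __name__ == "__main__":
    wire = to_smtp_lines("Hello\nWorld\n")
    print(wire)
    print(to_ebcdic_host(wire))
```

Both steps rewrite octets, which is harmless for restricted text but would corrupt arbitrary binary data passed through the same path.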
keld@login.dkuug.dk (Keld Jørn Simonsen) (01/02/91)
hansen@pegasus.att.com (Tony L. Hansen) writes:

>< From: tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill)
>< True. But there is no technical reason (other than short-sightedness)
>< why SMTP has to strip off the 8th (high) bit. There are in fact
>< working versions of sendmail that don't disturb the 8th bit.

This introduces a problem with "embedded slashes", which are now
represented internally in Sendmail with the 8th bit set. Has anybody got
Sendmail patches to remedy this?

>I agree completely, there is no reason to limit SMTP to 7 bits.
>Unfortunately, the standard currently REQUIRES the stripping and doing
>anything else is non-standard. I would definitely support changing the
>standard to allow an arbitrary 8-bit byte stream. This would also require
>eliminating the limitation of 1000-character lines and anything else in the
>standard which is not content transparent.

I am much in favour of extending the character set supported by SMTP.
But you should be careful. What is the meaning of an 8-bit character?
Well, it depends on the character set employed. Today we know that only
7-bit ASCII is allowed. But with 8-bit mail, is the code 162 (hex A2,
octal 242) coming over the line a cent sign (as in ISO 8859-1:1987), a
"small o with acute accent" (as in IBM CP 437) or a "capital A with
circumflex" (as in HP Roman8)? This might become a real problem given
the current market shares in the UNIX market.

Just displaying the 8-bit data to a user may be very confusing. It may
even do strange things to your terminal equipment if an IBM codepage
character set is employed, as some of its characters occupy positions
that ISO 8859 and other vendors' character sets reserve for the upper
control characters.

Should one then just say "Use ISO 8859"? Well, which ISO 8859? There are
several parts: Latin 1, Latin 2 (eastern Europe), Greek, Cyrillic,
Arabic, Hebrew (among others). The above-mentioned code 162 has
different meanings in these different character sets.

ISO 8859-1 would be the natural choice (and is also specified in a
recent RFC on the "Encoding:" header). But is that fair? I think that is
like inventing a new ASCII, only capable of serving one region of the
world sufficiently, this time having Western Europe (EEC) and all of
North and South America covered. We should do something that could cover
the whole world.

It is also quite hard to persuade your manufacturers to change their
implementation character set, and even worse for equipment you have
already bought and installed. Some of this may even be running software
with no 8-bit capabilities!

I think it would be nice to be able to support all of these new and old
systems, and I have done an implementation of Sendmail capable of
supporting more than 60 character sets. It currently does not touch the
headers, only the mail body. A character that is not in the current
character set is encoded with a mnemonic code, for example a' for "small
a with acute". Thus even in ASCII you can get the message!

The sendmail patches are available with anon ftp in
dkuug.dk:pub/ch.shar and sm5.64.8+bit.pa (sm.8+bit.pa for 5.61). It's
about 100 kb; the Sendmail patch itself is under 100 lines, the rest is
the character set stuff. It has been running here at dkuug.dk since
Feb 90.

A new ISO standard is showing up: ISO 10646 (which has just been
published as a Draft International Standard (DIS)). This covers all
characters in the world, with very few exceptions. And the exceptions
are planned to be included in a later issue.

Actually Dan Oscarsson and I have been planning (mostly Dan) to do an
SMTP implementation for Sendmail negotiating 10646 for transmission, and
to write an RFC for this character set negotiation.

Keld Simonsen
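The ambiguity described in the post can be seen by decoding a single octet, hex A2 (decimal 162), under three of the character sets it names, using the codecs that ship with Python. The second half is a rough sketch of a mnemonic fallback in the spirit of the scheme described; it is not the code from the dkuug.dk patches. Only the a' mnemonic comes from the post, and the other table entries and all function names are assumptions made for illustration.

```python
import unicodedata

OCTET = bytes([0xA2])  # decimal 162


def interpretations(octet: bytes) -> dict:
    """Decode one octet under three 8-bit character sets."""
    codecs = ("latin-1", "cp437", "hp_roman8")
    return {name: octet.decode(name) for name in codecs}


# Mnemonic fallbacks. Only a' for "small a with acute" is from the post;
# the other two entries are illustrative assumptions.
MNEMONICS = {"\u00e1": "a'", "\u00f3": "o'", "\u00c2": "A>"}


def encode_with_mnemonics(text: str, charset: str) -> bytes:
    """Encode text, replacing characters the charset lacks by a mnemonic."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # Unknown characters without a table entry become "?".
            out += MNEMONICS.get(ch, "?").encode(charset)
    return bytes(out)


if __name__ == "__main__":
    for name, ch in interpretations(OCTET).items():
        print(f"{name:10} U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(encode_with_mnemonics("m\u00e1s", "ascii"))    # falls back to a'
    print(encode_with_mnemonics("m\u00e1s", "latin-1"))  # kept as one octet
```

The same character is sent as a single octet where the receiving character set has it and as a readable ASCII mnemonic where it does not.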
les@chinet.chi.il.us (Leslie Mikesell) (01/02/91)
In article <1990Dec31.013538.9473@Think.COM> barmar@think.com (Barry Margolin) writes:
>In article <1990Dec31.004055.10335@cbnewsk.att.com> hansen@pegasus.att.com (Tony L. Hansen) writes:
>>System V release 4 mail is completely content transparent. As long as the
>>transport media is capable of handling the mail, SVr4 mail will be able to
>>get it to you unchanged.

>What does it do when sending textual mail to a system that doesn't use
>ASCII encoding, e.g. an IBM mainframe, or to a system with a different
>newline convention (e.g. CRLF rather than LF)? SMTP places restrictions on
>the characters that may appear in a message to support automated
>translation during the transfer process.

But the automated translation can currently only work with text while
many mailers are now capable of attaching arbitrary binary data to
messages. Depending on the type of the content, a different
transformation (or none) may be desired. Assuming that the non-textual
portions are encapsulated with "Content-Type:" and "Content-Length:"
headers, it would be easy for the transport to determine what, if any,
transformation to use. In addition, an optional "Encoding-Method:"
header can allow temporary transformations to meet the character set
requirements of the transports. If the sending program had a way to
determine the capabilities of the recipient, encoding could be done
on-the-fly, using uuencode or atob, and thus only done where necessary
(but I don't know of anyone actually doing this yet...).

These issues are going to have to be addressed for messages originating
on X.400 systems anyway, so why not try to do it efficiently by adding
the equivalent functionality to SMTP/uucp mailers?

Les Mikesell
les@chinet.chi.il.us
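A rough Python sketch of the on-the-fly decision described above follows. The header names are the ones from the post; everything else is an assumption made for illustration: the content type values "text" and "binary", the eight_bit_clean flag standing in for "the capabilities of the recipient", the choice to count Content-Length on the body as sent, and the function names. Encoding uses the uuencode line format via the standard binascii module.

```python
import binascii


def uuencode_body(data: bytes) -> bytes:
    """Encode data as uuencode body lines (45 input octets per line)."""
    chunks = (data[i:i + 45] for i in range(0, len(data), 45))
    return b"".join(binascii.b2a_uu(chunk) for chunk in chunks)


def uudecode_body(body: bytes) -> bytes:
    """Reverse uuencode_body."""
    lines = body.splitlines(keepends=True)
    return b"".join(binascii.a2b_uu(line) for line in lines)


def prepare_part(content_type: str, data: bytes, eight_bit_clean: bool):
    """Return (headers, body), encoding only when the transport needs it."""
    has_high_bit = any(b >= 0x80 for b in data)
    needs_encoding = not eight_bit_clean and (
        content_type != "text" or has_high_bit
    )
    headers = {"Content-Type": content_type}
    body = data
    if needs_encoding:
        body = uuencode_body(data)
        headers["Encoding-Method"] = "uuencode"
    # Assumption: the length describes the body as actually transmitted.
    headers["Content-Length"] = str(len(body))
    return headers, body


if __name__ == "__main__":
    payload = bytes(range(256))
    for clean in (True, False):
        headers, body = prepare_part("binary", payload, eight_bit_clean=clean)
        print(clean, headers)
```

A receiver that sees the Encoding-Method header reverses the transformation, so the cost of encoding is paid only on hops that cannot carry the data unchanged.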