toddp@hp-ptp.HP.COM (Todd_Poynor) (02/28/90)
Message bodies containing the Horizontal Tab character (ASCII 9) pose quite a problem to Mail User Agents: it is impossible to know how to correctly reproduce the original behavior of the tab on a recipient's display device. That is, the tab stops defined at the sending user's terminal may not correspond to the tab stops defined at the terminal of the destination user(s). Although tab stops at every 8 character positions are fairly standard on UNIX systems, this is not the case for other flavors of hosts which send and receive Internet mail. Indeed, on certain hosts and display devices the tab character is not normally understood as a horizontal tab at all. Misaligned columns on reports, often to the point of near-incomprehensibility, are a constant annoyance to users in such a situation.

Two obvious solutions present themselves: avoid using the character in text messages which may be received by someone with differing tab stops, or include information with the message which informs the destination User Agent of the intended tab stops.

Avoidance can be accomplished by simply not pressing the Tab key when entering messages, but we creatures of habit usually find it hard to remember not to. Messages can be filtered through a process which locally expands tabs to blanks before dispatching the message, but again, it is difficult to remember to do this unless it is done automatically. If the filtering is performed automatically, it has the undesirable effect of corrupting certain verbatim-text usages, such as messages containing files to be transferred "as is". A familiar UNIX example of such corruption is the expansion of tabs within messages containing "shar" archived files, where the receiving process may detect that the received data does not match the data originally sent. Automatic expansion of tabs may be feasible if some means of preventing unwanted expansion is provided.
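As a rough illustration of the filtering approach and its hazard, here is a minimal Python sketch. It assumes UNIX-style stops every 8 columns; the `detab` helper and the sample body are hypothetical and not taken from any particular mailer.

```python
def detab(text: str, tabsize: int = 8) -> str:
    """Expand tabs to blanks, assuming a stop every `tabsize` columns."""
    # str.expandtabs restarts its column count after each newline.
    return text.expandtabs(tabsize)


# Hypothetical message body with tab-separated columns.
original = "name\tqty\nwidget\t12\n"
expanded = detab(original)

print(repr(expanded))
# The filtered body is no longer the body that was sent, which is what a
# verbatim transfer (a shar archive, say) cannot tolerate.
print(len(original), len(expanded))
```

The columns line up for a reader whose stops fall every 8 positions, but the character count changes, so any check on the receiving side that compares against the original data would fail.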
For the unusual case of mailing verbatim text it is perhaps not overly difficult to remember to include some sort of header information or text marker which inhibits message body modification. Taking a cue from the privacy enhancement RFCs, a text marker such as:

    -----TEXT PROTECTION BOUNDARY-----

could indicate the end of text subject to detabbing or any other conceivable text modifications. This marker may have to be recognized even within encapsulated messages (messages within messages, as per RFC 934), where a "- " would be prefixed to the marker. The marker could even be automatically generated by archiving software at the top of the archive.

Aside from the general impression of inelegance left by the text marker solution on many computer literates (including this author), the practice of automatic text modifications strikes some as a gross violation of data communications protocol. Although it can be argued that such tab expansion falls under the category of approved cross-host translations along with local character set translation, the general feeling is that the original content of the message should be preserved to the greatest extent possible.

For this reason, a preferred method might be to preserve the tab characters within the message, and include information in the message header which informs User Agents what the proper tab settings are. This information would normally correspond to the tab stops which were set at the sending user's terminal. For mail sent by automatic means, where no terminal can be identified with the creation of the message, either a local default may be given, or the information may be omitted, indicating that, as in the present-day situation, the tab behavior is up to interpretation by the destination.

A new header field is probably in order for this purpose. In absence of a standard, the user-defined field nomenclature of prefixing the field name with "X-" has been suggested for prototype implementations.
The proposed field syntax in RFC 822 notation is:

    tab-define = "X-Tab-Stops" ":" 1#(tab-posn / tab-incr)
    tab-incr   = "+" 1*DIGIT
    tab-posn   = 1*DIGIT

This syntax defines a field named "X-Tab-Stops" which takes as an argument a comma-separated list of numerical values, each optionally preceded with a plus sign. The list should include at least one of these values, and each value is a string of at least one decimal digit.

Each of these values is interpreted as the definition of the next tab stop in left-to-right order across the destination display. The interpretation of each value is as follows:

  o If it is a tab-posn (that is, is not preceded by a plus sign) the value is the character position of the next tab stop, where the first character in the line is numbered one.

  o If it is a tab-incr (preceded with a plus sign) the value is a number of characters relative to the character position of the preceding tab stop at which the next tab stop is to be set. If there has been no previous tab stop definition, meaning that this is the first item in the list, the increment is relative to character position 1. If it is the last item in the list this increment applies indefinitely, such that the effect is to have an infinite number of tab stops set from this position forward, each with this same character position increment between them.

So for UNIX users with tab stops every 8 characters this might appear as:

    X-Tab-Stops: 9, 17, 25, 33, 41, 49, 57, 65, 73

or more succinctly:

    X-Tab-Stops: +8

A typical setting for FORTRAN programmers might be:

    X-Tab-Stops: 7, +3

which sets the first tab at position 7, the start of the statement area, and every 3 positions thereafter.

One possible modification to the argument syntax is to delete the commas, using blanks as separators between items for efficiency (RFC 822 favors the shown syntax for lists).

This header field is intended to be interpreted by Mail User Agents at message viewing time.
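The field syntax and interpretation rules above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the names `parse_tab_stops`, `next_stop`, and `expand_line` are hypothetical, empty list elements are simply skipped, stops are assumed to increase strictly from left to right, and a tab beyond the last defined stop (when the list does not end with an increment) is assumed to become a single blank. The posting itself does not settle those last two points.

```python
import re
from typing import List, Optional, Tuple

ITEM = re.compile(r"(\+?)([0-9]+)")  # tab-incr or tab-posn


def parse_tab_stops(value: str) -> Tuple[List[int], Optional[int]]:
    """Parse a field body such as "7, +3" into (stops, repeat).

    stops  -- explicit 1-based tab stop columns, left to right
    repeat -- increment applied indefinitely after the last stop, or
              None when the list does not end with a tab-incr
    """
    items = [part.strip() for part in value.split(",") if part.strip()]
    if not items:
        raise ValueError("X-Tab-Stops needs at least one value")
    stops: List[int] = []
    repeat: Optional[int] = None
    for index, item in enumerate(items):
        match = ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"bad tab stop item: {item!r}")
        number = int(match.group(2))
        if match.group(1):
            # tab-incr: relative to the previous stop, or to column 1.
            position = (stops[-1] if stops else 1) + number
            if index == len(items) - 1:
                repeat = number
        else:
            # tab-posn: absolute column, first column numbered one.
            position = number
        # Assumption: each stop lies to the right of the one before it.
        if position < 1 or (stops and position <= stops[-1]):
            raise ValueError(f"tab stop out of order: {item!r}")
        stops.append(position)
    return stops, repeat


def next_stop(column: int, stops: List[int], repeat: Optional[int]) -> int:
    """Return the first tab stop strictly to the right of `column`."""
    for stop in stops:
        if stop > column:
            return stop
    if repeat is None:
        return column + 1  # assumption: no stop left, emit one blank
    last = stops[-1]
    return last + repeat * ((column - last) // repeat + 1)


def expand_line(line: str, stops: List[int], repeat: Optional[int]) -> str:
    """Expand tabs in one line of body text to blanks."""
    out = []
    column = 1  # 1-based position of the next character to be written
    for ch in line:
        if ch == "\t":
            target = next_stop(column, stops, repeat)
            out.append(" " * (target - column))
            column = target
        else:
            out.append(ch)
            column += 1
    return "".join(out)


stops, repeat = parse_tab_stops("7, +3")
print(repr(expand_line("\tX\tY", stops, repeat)))
```

With the FORTRAN setting, the first tab moves to column 7 and the second to column 10. With "+8" the result matches conventional 8-column expansion.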
Tab characters in the body are expanded as blanks, according to the tab stops defined in the field. Of course, use of other control characters or characters outside the standard printing subset may cause the User Agent to have an incorrect notion of the current character position at expansion time. This is not expected to be a problem in most text messages of the sort normally used in inter-host environments.

This solution addresses the problem of reading tabbed messages at the presentation level, which many feel is appropriate. Not specifically addressed is the problem of saving the message text in the local file system, where detabbing a particular message may be required or may be prohibited, depending on the intended use of the file. Conceivably, the same software which displays messages on terminals can perform the conversion into files, leaving execution of this software for file storage to user discretion.

Digests may require interpretation of the "X-Tab-Stops" field at each encapsulated message header by presentation software, or digestification software may convert encapsulated messages to a common tabbing convention.

A subject of controversy is whether gateways to foreign mail systems not adhering to any such tab stop representation should expand the tabs contained in the body according to the "X-Tab-Stops" field during transfer. Bearing the aforementioned warnings about data corruption in mind, this author recommends the data be passed unretouched; let the foreign community demand such a capability in that mailer if deemed of sufficient importance.

Obviously, the problem of tab representation is difficult to solve, perhaps more difficult than is warranted by the relatively minor consequence involved. If you have any thoughts on a simpler solution I welcome your suggestions.

Submitted for your approval,

Todd Poynor
HP Data Systems Operation
todd@hpepoc.hp.com
408/746-5185
Craig_Everhart@transarc.com (03/01/90)
It seems a shame to burn (human) cycles on a minor problem when there are so many larger fish to fry. I would refer the reader to RFC 1049 for the description of a Content-Type: header in messages that reaches far beyond simple tab-stop specification.

I've gotten reasonably used to viewing all messages in a variable-pitch font, switching to a fixed-pitch one only when I want to see the ASCII-graphic information that somebody has written. Yes, my mail reader assumes it knows what tab characters mean, but I could teach it other things by extending content-type rather than by inventing yet another header.

Todd Poynor has done an effective job of analyzing the problems that would arise in practice: encapsulation, shar'ing, gatewaying, and the like. I just wish he were looking at the Content-type: problems instead of a simple tab-stop one!

Craig
Makey@Logicon.COM (Jeff Makey) (03/01/90)
In article <1960003@hp-ptp.HP.COM> toddp@hp-ptp.HP.COM (Todd_Poynor) writes:
>If you have any thoughts on a simpler solution I welcome your suggestions.

People (not autonomous programs acting on behalf of people!) could convert tabs to the appropriate number of spaces before they send their mail. I do this and it works wonderfully.

:: Jeff Makey

Department of Tautological Pleonasms and Superfluous Redundancies Department
Disclaimer: Logicon doesn't even know we're running news.
Internet: Makey@Logicon.COM    UUCP: {nosc,ucsd}!logicon.com!Makey
toddp@hp-ptp.HP.COM (Todd_Poynor) (03/07/90)
From: Craig_Everhart@transarc.com
>It seems a shame to burn (human) cycles on a minor problem when there
>are so many larger fish to fry. I would refer the reader to RFC 1049
>for the description of a Content-Type: header in messages that reaches
>far beyond simple tab-stop specification.

RFC 1049 Content-Type syntax could indeed be used to specify tab presentation, since the resource-ref allows a local-part to be given, which in turn allows a quoted-string to be given. I like this suggestion.

At first I considered Content-Type inappropriate since it appeared to be concerned only with "larger fish": the content-type identifiers mentioned in the RFC are supposedly all that is needed to specify the desired appearance, that is, the identifiers name standard formats for which the interpretation should be clear. The syntax is not really geared toward supplying a more complex set of rules as required to interpret non-standardized "contents".

If the Content-Type field is to be used in this manner, then I suspect that presentation software will need to handle more than one of these fields in a message header, such that one can specify the behavior of tabs within the larger context of a document format, for example. Failing this, we require the screen appearance to be completely defined by a single content-type/ver-num/resource-ref tuple, probably relegating the tab-stop definition to an optional part of the resource-ref of a content-type named "PLAIN-TEXT". Perhaps this is sufficient for the tab-stop problem, but I won't be surprised if instances arise where the contents of a message require interpretation at both the text and overall organization levels.

And so on to the question of, "Are we analyzing to death a piddling little detail when more important issues abound, like mailing POSTSCRIPT files?". The final paragraph of my original posting anticipated such a viewpoint, and I freely admit that the accusation has merit.
Part of my motivation to discuss this problem was to see if concern over mismatched tab stops would strike a responsive chord in the USENET community. Such a reaction is lacking thus far. The issue is far closer to my heart than those specifically tackled in RFC 1049 since I work on a computer where tab stops are entirely a matter of user preference, and deal with the e-mail problem regularly. The issue has been brought up a number of times by various people on Hewlett-Packard's private newsgroups due to the common use here of systems to which the tab character is almost completely foreign. I can hardly believe such a situation is all that rare. Does anyone else believe this problem is worth discussing?

From: Makey@Logicon.COM (Jeff Makey)
>People (not autonomous programs acting on behalf of people!) could
>convert tabs to the appropriate number of spaces before they send
>their mail.

Not all users of mail systems are aware of the problem, and few of those who are aware are diligent about either avoiding the tab key or performing such a conversion. The basenote mentioned this. If relying on the user community never to use tabs is the most feasible solution then many people must learn to break a deeply ingrained habit. I do agree, of course, that simple abstinence is quite a clean solution from a technical standpoint, if not from a behavioristic one.

The bulk of this discussion has been concerned with doing the right thing if, alas, a tab character makes it through. If complete avoidance is desired then the next step is to decide how best to educate users about the problem, and how to encourage or enforce this avoidance.

^todd