npn@cbnewsl.att.com (nils-peter.nelson) (12/27/90)
Thanks to all those who responded to my request for international troff requirements. The responses were uniformly helpful and specific (if occasionally impatient with the backwardness evident in this little part of the world). In addition to public responses, I got private letters from Anders Thulin, Dan Berry, Dick Dunn, Heimir Thor Sverrisson, Chris Lewis, Alexios Zavras, Jaap, Robert Andersson, Steve Azmier, and a very appropriate paper from Keizer, Simonsen and Akkerhuis [KSA].
The consensus appears to be:
1. Allow all DWB components to read 8-bit characters as defined by ISO 8859-1, a.k.a. Latin-1. The editing and preparation of such documents is the province of 8-bit terminals, 8-bit editors, and not our concern. This requires that we remove all &177's.
2. Default behavior for troff should be "8-bit in, 8-bit out". The postprocessors will be rewritten to take this into account. In addition, we should allow a "-7b" option to force troff output to be in the ASCII (ISO 646, 7-bit) subset. This would permit mailing of ditroff output to the part of North America that hasn't caught on to ISO 8859.
3. Recognize two-character 7-bit escapes so that people who don't have 8-bit terminals can still create documents with the extra characters. [KSA] have proposed a reasonable standard convention which could serve as both input and output for troff (e.g., \(oa for 'aring'), but there are other proposals we will look at as well.
4. Reserve \C'string' and \N'number' for the truly odd characters that don't have a more convenient representation.
5. Hyphenation may present insurmountable problems; we'll see if anyone else (e.g., Knuth) has solved them. Worst case, however, is that we'll hyphenate badly, and you'll have to turn it off.
We will probably package this as DWB 3.2, which will be an "incremental" upgrade to DWB 3.1 (this means a minor fee for those who are DWB 3.1 licensees). Some of the work has already been completed, so the package should be ready around May 1991.
jjc@jclark.UUCP (James Clark) (12/31/90)
In article <1990Dec27.155046.14520@cbnewsl.att.com> npn@cbnewsl.att.com (nils-peter.nelson) writes:
> The consensus appears to be:
> 1. Allow all DWB components to read 8-bit characters as defined
> by ISO 8859-1, a.k.a. Latin-1. The editing and preparation of
> such documents is the province of 8-bit terminals, 8-bit editors,
> and not our concern. This requires that we remove all &177's.
groff already does this.
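A minimal C sketch of why the &177's matter, assuming they are masks with octal 0177 that keep only the low seven bits of each input byte. Under such a mask the Latin-1 byte for `a' with a ring (0xE5) silently becomes a plain `e' (0x65). The function names are hypothetical, not taken from DWB or groff.

```c
#include <assert.h>
#include <stddef.h>

/* The old 7-bit habit: keep only the low seven bits (c & 0177).
   For Latin-1 input this turns 0xE5 ('a' with ring) into 'e'. */
static unsigned char mask7(unsigned char c)
{
    return (unsigned char)(c & 0177);
}

/* Apply the mask to a whole buffer in place, as a 7-bit reader would.
   An 8-bit clean reader simply leaves the bytes alone. */
static void mask7_buf(unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = mask7(buf[i]);
}
```

Masking "s" followed by 0xE5 (Swedish "så") yields "se", which is exactly the damage an 8-bit clean reader avoids.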
> 2. Default behavior for troff should be "8-bit in, 8-bit out".
> The postprocessors will be rewritten to take this into account.
groff already does this.
> In addition, we should allow a "-7b" option to force troff
> output to be in the ASCII (ISO 646, 7 bit) subset. This would permit
> mailing of ditroff output to the part of North America that
> hasn't caught on to ISO 8859.
I'm unconvinced by this. What's wrong with using uuencode? In any
case, if you want to send a document to somebody, it would seem to me
to be better to send either the ditroff input file or the
postprocessor output (since the ditroff output is tailored to a
particular device anyway).
> 3. Recognize two-character 7-bit escapes so that people who
> don't have 8-bit terminals can still create documents with
> the extra characters. [KSA] have proposed a reasonable standard
> convention which could serve as both input and output for troff
> (e.g., \(oa for 'aring'), but there are other proposals we
> will look at as well.
groff already does this. It uses the two-character names described in
[KSA]. It would be a pity if DWB adopted an incompatible scheme.
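A sketch of what a 7-bit output mode could do with the names groff takes from [KSA]: every byte above 0177 is written as a \(xx escape, and everything else passes through. The table is a small illustrative subset using groff's names (\(oa for a-ring, \(:a for a-umlaut, and so on); the function name to_7bit and its error convention are hypothetical, not part of groff or DWB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A few Latin-1 code points and their two-character names. */
struct latin1_name {
    unsigned char code;
    const char *name;           /* written out as \(xx */
};

static const struct latin1_name names[] = {
    { 0xC5, "oA" },             /* A with ring */
    { 0xE5, "oa" },             /* a with ring */
    { 0xE4, ":a" },             /* a with umlaut */
    { 0xE9, "'e" },             /* e with acute */
    { 0xDF, "ss" },             /* sharp s */
    { 0xF8, "/o" },             /* o with stroke */
};

/* Copy src to dst using only 7-bit characters.  Returns 0 on success,
   -1 if a byte has no name in the table or dst is too small. */
static int to_7bit(const char *src, char *dst, size_t dstsize)
{
    size_t out = 0;
    assert(dstsize > 0);
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        char piece[5];          /* "\(xx" plus NUL */
        if (*p < 0200) {
            piece[0] = (char)*p;
            piece[1] = '\0';
        } else {
            const char *nm = NULL;
            for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
                if (names[i].code == *p) {
                    nm = names[i].name;
                    break;
                }
            if (nm == NULL)
                return -1;
            snprintf(piece, sizeof piece, "\\(%s", nm);
        }
        size_t len = strlen(piece);
        if (out + len + 1 > dstsize)
            return -1;
        memcpy(dst + out, piece, len);
        out += len;
    }
    dst[out] = '\0';
    return 0;
}
```

Because the same names are valid troff input, output produced this way could be read back in, which is the "both input and output" property mentioned above.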
> 4. Reserve \C'string' and \N'number' for the truly odd characters
> that don't have a more convenient representation.
groff takes this approach.
> 5. Hyphenation may present insurmountable problems; we'll see
> if anyone else (e.g. Knuth) has solved them. Worst case,
> however, is that we'll hyphenate badly, and you'll have to
> turn it off.
I believe groff has a good solution to the hyphenation problem.
Hyphenation works in terms of hyphenation codes. Initially, the
letters `a' to `z' have `a' to `z' as their hyphenation codes, and `A'
to `Z' have `a' to `z'. There's a request that allows you to specify
the hyphenation code for any normal or special character; for example,
.hcode \(^a a
would give `\(^a' (the name for `a' with a circumflex accent) a
hyphenation code of `a'. Groff uses the same hyphenation algorithm
that TeX does (invented by Frank Liang): the hyphenation process is
controlled by a set of hyphenation patterns; letters in the patterns
are interpreted as hyphenation codes. By supplying an appropriate
file of patterns and set of `hcode' requests, it should be possible to
make groff correctly hyphenate languages other than English.
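A compact C sketch of Liang-style pattern hyphenation driven by a hyphenation-code table, written for illustration only (it is not groff's code). Input characters are first mapped through hcode, so `A' to `Z' match the same patterns as `a' to `z'. A pattern such as "hy3ph" puts a value between letters; the largest value at each position wins and odd values allow a break. Treating a character with no code as making the word unhyphenatable is an assumption of this sketch, and minimum-prefix/suffix limits are omitted.

```c
#include <assert.h>
#include <string.h>

#define MAXWORD 63

/* hcode[c] is the hyphenation code of input byte c; 0 means "no code". */
static unsigned char hcode[256];

/* Initial state as described above: a-z map to themselves, A-Z to a-z. */
static void init_hcodes(void)
{
    memset(hcode, 0, sizeof hcode);
    for (int c = 'a'; c <= 'z'; c++) {
        hcode[c] = (unsigned char)c;
        hcode[c - 'a' + 'A'] = (unsigned char)c;
    }
}

/* Write word to out with '-' at each allowed break and return 1.
   If some character has no code, copy the word unchanged and return 0. */
static int hyphenate(const char *word, const char *const *pats, size_t npat,
                     char *out, size_t outsize)
{
    size_t n = strlen(word);
    assert(n <= MAXWORD && outsize >= 2 * n + 1);

    char coded[MAXWORD + 3];              /* ".word." as hyphenation codes */
    unsigned char val[MAXWORD + 3] = {0}; /* val[k]: value before coded[k] */
    int ok = 1;

    coded[0] = '.';
    for (size_t i = 0; i < n; i++) {
        unsigned char h = hcode[(unsigned char)word[i]];
        if (h == 0)
            ok = 0;
        coded[i + 1] = (char)h;
    }
    coded[n + 1] = '.';
    size_t len = n + 2;

    for (size_t p = 0; ok && p < npat; p++) {
        char letters[MAXWORD + 1];
        unsigned char digits[MAXWORD + 2] = {0};
        size_t nl = 0;
        /* Split the pattern into its letters and inter-letter digits. */
        for (const char *s = pats[p]; *s; s++) {
            if (*s >= '0' && *s <= '9') {
                digits[nl] = (unsigned char)(*s - '0');
            } else {
                assert(nl < MAXWORD);
                letters[nl++] = *s;
            }
        }
        /* Wherever the letters occur, keep the larger value. */
        for (size_t i = 0; i + nl <= len; i++) {
            if (memcmp(coded + i, letters, nl) != 0)
                continue;
            for (size_t j = 0; j <= nl; j++)
                if (digits[j] > val[i + j])
                    val[i + j] = digits[j];
        }
    }

    /* An odd value before word[k] (k >= 1) allows a break there. */
    size_t o = 0;
    for (size_t k = 0; k < n; k++) {
        if (ok && k > 0 && (val[k + 1] & 1))
            out[o++] = '-';
        out[o++] = word[k];
    }
    out[o] = '\0';
    return ok;
}
```

With the small pattern set Knuth uses as an example in The TeXbook (hy3ph, he2n, hena4, hen5at, 1na, n2at, 1tio, 2io, o2n), "hyphenation" comes out as hy-phen-ation and, through the initial codes, "HYPHENATION" as HY-PHEN-ATION. Setting hcode[0xE2] = 'a', the analogue of `.hcode \(^a a' for Latin-1 input, lets words containing that byte match the same patterns.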
> We will probably package this as DWB 3.2, which will be an
> "incremental" upgrade to DWB 3.1 (this means a minor fee for
> those who are DWB 3.1 licensees). Some of the work has already
> been completed, so the package should be ready around May 1991.
These features are in the currently released version of groff (0.6).
James Clark
jjc@jclark.uucp
jjc@ai.mit.edu
bruce@balilly.UUCP (Bruce Lilly) (01/02/91)
In article <JJC.90Dec31140544@jclark.jclark.UUCP> jjc@jclark.UUCP (James Clark) writes:
>In article <1990Dec27.155046.14520@cbnewsl.att.com> npn@cbnewsl.att.com (nils-peter.nelson) writes:
>groff already does this.
>
> In addition, we should allow a "-7b" option to force troff
> output to be in the ASCII (ISO 646, 7 bit) subset. This would permit
> mailing of ditroff output to the part of North America that
> hasn't caught on to ISO 8859.
>
>I'm unconvinced by this. What's wrong with using uuencode? In any
>case, if you want to send a document to somebody, it would seem to me
>to be better to send either the ditroff input file or the
>postprocessor output (since the ditroff output is tailored to a
>particular device anyway).
Uuencode/uudecode are not universally available. If the ditroff input file is in an 8-bit character set, it is unmailable via some mail transport software. Likewise for 8-bit output, hence the desire to restrict the output to a 7-bit character set.
The 'd' and 'i' in ditroff stand for "device" and "independent", respectively. Ditroff output is *not* tailored to any particular device. The ditroff output can be interpreted by postprocessors for specific devices. The same cannot necessarily be said of other text processors (perhaps including groff).
Postprocessor output *is* tailored to a specific device, hence is not suitable for widespread distribution. Also, note that some of these device-specific output formats (such as PostScript) are both extremely verbose (more so than ditroff output) and may include 8-bit characters.
--
Bruce Lilly
blilly!balilly!bruce@sonyd1.Broadcast.Sony.COM
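As an illustration of Bruce's point about mail transports, a sender can at least check whether a file is 7-bit clean before trusting it to software that may not pass 8-bit data. A minimal sketch; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if every byte is in the 7-bit range (0..0177),
   0 if any byte has the eighth bit (0200) set. */
static int is_7bit_clean(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (buf[i] & 0200)
            return 0;
    return 1;
}
```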
jaap@mtxinu.COM (Jaap Akkerhuis) (01/03/91)
In article <JJC.90Dec31140544@jclark.jclark.UUCP> jjc@jclark.UUCP (James Clark) writes:
>
> would give `\(^a' (the name for `a' with a circumflex accent) a
> hyphenation code of `a'. Groff uses the same hyphenation algorithm
> that TeX does (invented by Frank Liang): the hyphenation process is
> controlled by a set of hyphenation patterns; letters in the patterns
> are interpreted as hyphenation codes. By supplying an appropriate
> file of patterns and set of `hcode' requests, it should be possible to
> make groff correctly hyphenate languages other than English.
Not necessarily. This depends on the rules of the language. The hyphenation rules might treat the ``a^'' completely differently than an ``a'' for a given language. In that case, mapping is not good enough.
jaap
jay@silence.princeton.nj.us (Jay Plett) (01/03/91)
In article <1991Jan2.024946.10442@blilly.UUCP>, bruce@balilly.UUCP (Bruce Lilly) writes:
...
> The 'd' and 'i' in ditroff stand for "device" and "independent",
> respectively. Ditroff output is *not* tailored to any particular device.
> The ditroff output can be interpreted by postprocessors for specific
> devices.
...
That's misleading. Ditroff output is not only device dependent, it is dependent on a particular set of width tables for a particular device. Ditroff and the postprocessor MUST use the same set of width tables. Ditroff outputs motions that are derived from the width tables. Moreover, when a character does not exist in the current font, no font change is encoded in ditroff's output. Ditroff assumes that the postprocessor will not only have the same widths, but that it will also use the same strategy for noticing that a font change is necessary and for finding the same character in the same font that ditroff found it in.
The "di" in ditroff means that device dependence is bound at run-time rather than at compile-time.
> Postprocessor output *is* tailored to a specific device, hence is not
> suitable for widespread distribution.
If the same device will be used, there's no harm in distributing postprocessor output. It would be rash to distribute ditroff output and expect it to print correctly, quite possibly even among similar systems at the same site.
...jay
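A sketch of the kind of fallback rule Jay describes, which ditroff and the postprocessor would both have to implement identically, since the fallback leaves no trace in ditroff's output. The font structure and the search order (current font first, then fonts marked special, in mount order) are assumptions for illustration, not a description of ditroff's actual code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified mounted-font record. */
struct font {
    const char *name;
    const char *const *chars;   /* NULL-terminated list of character names */
    int special;                /* nonzero if searched as a fallback */
};

static int has_char(const struct font *f, const char *ch)
{
    for (const char *const *p = f->chars; *p; p++)
        if (strcmp(*p, ch) == 0)
            return 1;
    return 0;
}

/* Return the index of the font that supplies ch, or -1 if none does.
   Both programs must make the same choice, because no font change
   is written out when the fallback is taken. */
static int find_font(const struct font *fonts, int nfonts, int current,
                     const char *ch)
{
    assert(current >= 0 && current < nfonts);
    if (has_char(&fonts[current], ch))
        return current;
    for (int i = 0; i < nfonts; i++)
        if (fonts[i].special && has_char(&fonts[i], ch))
            return i;
    return -1;
}
```

If a postprocessor used a different order, or a different set of special fonts, the same ditroff output would put some characters in the wrong font without any error.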
peterson@lyle.austin.ibm.com (James L. Peterson/1000000) (01/03/91)
In article <679@silence.princeton.nj.us> jay@silence.princeton.nj.us (Jay Plett) writes:
>That's misleading. Ditroff output is not only device dependent, it
>is dependent on a particular set of width tables for a particular
>device. Ditroff and the postprocessor MUST use the same set of
>width tables. Ditroff outputs motions that are derived from the
>width tables. Moreover, when a character does not exist in the
>current font, no font change is encoded in ditroff's output.
>Ditroff assumes that the postprocessor will not only have the same
>widths, but that it will also use the same strategy for noticing
>that a font change is necessary and for finding the same character
>in the same font that ditroff found it in.
That's not really true. Ditroff dvi codes include the output motions that troff expects as a result of output. This means that the postprocessor need not have any width tables at all, IF the output device does not automatically move after each output character. Thus, troff dvi will include "46z37o29t", which means move 46 units, output a 'z', move 37 units, output an 'o', move 29 units, output a 't', ... ALL movement is explicit in the dvi file.
Contrast this with TeX dvi, which assumes that the postprocessor "knows" the same character widths as TeX did. TeX dvi would include only "zot" and would expect the postprocessor to know the width of the characters and move the "right" amount. This means that TeX dvi is smaller (it doesn't have to include all the movements that are almost always the same for all the characters), but it also means you have no idea where to put characters without the width tables. troff dvi, on the other hand, can easily be interpreted without the character width tables, or with other width tables. The position of each and every character is completely determined by the dvi file.
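To make Peterson's example concrete, here is a small decoder for the "two digits of motion, then a character" form (46z37o29t) that turns it into absolute positions with no width table at all. It is a sketch: only this one command form is handled, the struct and function names are hypothetical, and real ditroff output contains many other commands.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* One placed character and its absolute horizontal position. */
struct placed {
    char ch;
    long x;
};

/* Decode a run of "ddc" commands starting at position x0, storing at
   most max characters.  Returns the count, or -1 on malformed input. */
static int decode_ddc(const char *s, long x0, struct placed *out, int max)
{
    long x = x0;
    int n = 0;
    assert(out != NULL || max == 0);
    while (*s) {
        /* Need two digits followed by the character to print. */
        if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1])
            || s[2] == '\0')
            return -1;
        x += (s[0] - '0') * 10 + (s[1] - '0');  /* explicit motion */
        if (n == max)
            return -1;
        out[n].ch = s[2];
        out[n].x = x;
        n++;
        s += 3;
    }
    return n;
}
```

Decoding "46z37o29t" from position 0 places `z' at 46, `o' at 83 and `t' at 112, using nothing but the numbers in the file.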
Now if you don't have the right fonts (say, if you are printing on a different output device than the file was produced for), the characters will look strange, but each and every one will be in exactly the right place. I have used this in two ways:
(1) It is easy to write a screen display for troff dvi that will preview a document at a different resolution and with different fonts (screen fonts rather than printer fonts). The output is not as readable, but it shows the correct placement of the characters, so you can see margins and tables, and if expanded enough it is still quite readable.
(2) I can take documents that are meant for a typesetter at one site, ship them across the country, and print them with ease on a laser printer of a different resolution with different fonts. The postprocessor has to be aware of the difference in font tables and be willing to substitute or ask for font substitutes (but with troff's R, I, B fonts it is fairly easy to substitute; even the special two-character names are portable), and it must distinguish between the input resolution (of the dvi file) and the output resolution (of the output device), which may be different, and be prepared to scale accordingly.
Neither of these is possible/easy with TeX dvi, unless you have the exact same character font tables.
>If the same device will be used, there's no harm in distributing
>postprocessor output. It would be rash to distribute ditroff output
>and expect it to print correctly, quite possibly even among similar
>systems at the same site.
>
A lot of this statement depends on what "print correctly" means. If you mean print and look like it was formatted for the device that it was printed on, this is a reasonable statement, but if you mean "print correctly" to mean "put each character exactly where troff wanted it to be" (even though the characters are in a different font, and probably have different widths), troff dvi is great for this.
James L. Peterson (peterson@futserv.austin.ibm.com)
--
James L. Peterson
IBM Advanced Workstations Div.     !'s: cs.utexas.edu!ibmchs!peterson
11400 Burnet Road, MS 2812         @'s: @CS.UTEXAS.EDU:peterson@ibmchs.uucp
Austin, Texas 78758                !&@: ibmchs!peterson@CS.UTEXAS.EDU
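Peterson's second use depends on rescaling every position from the resolution the file was formatted for to the resolution of the printer actually used. A minimal sketch, assuming integer device units, non-negative positions and round-to-nearest; the function name and the resolutions in the example are hypothetical.

```c
#include <assert.h>

/* Map position x from in_res units per inch to out_res units per inch,
   rounding to the nearest output unit. */
static long scale_position(long x, long in_res, long out_res)
{
    assert(in_res > 0 && out_res > 0 && x >= 0);
    return (x * out_res + in_res / 2) / in_res;
}
```

For instance, a position of 112 units in a file formatted at 432 units per inch maps to 78 units on a 300 dpi printer (112 * 300 / 432 is about 77.8). Rounding each absolute position, rather than each individual motion, keeps small errors from accumulating along a line.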
jjc@jclark.UUCP (James Clark) (01/04/91)
In article <1991Jan2.231520.21468@mtxinu.COM> jaap@mtxinu.COM (Jaap Akkerhuis) writes:
> In article <JJC.90Dec31140544@jclark.jclark.UUCP> jjc@jclark.UUCP (James Clark) writes:
> >
> > would give `\(^a' (the name for `a' with a circumflex accent) a
> > hyphenation code of `a'. Groff uses the same hyphenation algorithm
> > that TeX does (invented by Frank Liang): the hyphenation process is
> > controlled by a set of hyphenation patterns; letters in the patterns
> > are interpreted as hyphenation codes. By supplying an appropriate
> > file of patterns and set of `hcode' requests, it should be possible to
> > make groff correctly hyphenate languages other than English.
> Not necessarily. This depends on the rules of the language. The
> hyphenation rules might treat the ``a^'' completely differently than
> an ``a'' for a given language. In that case, mapping is not good enough.
There is nothing in the groff scheme that constrains `a^' to have the same hyphenation code as `a'. A hyphenation code can be any single input character that isn't a digit or white space. For example, you could make the hyphenation code of `a^' (and `A^') the character which is `a^' in ISO 8859-1. You just have to make sure that the `hcode' requests match the conventions that were used in the generation of the hyphenation patterns.
James Clark
jjc@jclark.uucp
jjc@ai.mit.edu
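A sketch of Clark's point: patterns only ever see hyphenation codes, so whether `a^' behaves like `a' is decided entirely by the hcode table. Below, byte 0xE2 (`a^' in ISO 8859-1) first gets the code `a', as `.hcode \(^a a' would give it, and then its own code 0xE2, after which only patterns written with that code can match. The function names and the sample pattern "p1at" are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* hcode[c]: hyphenation code of input byte c (0 = none). */
static unsigned char hcode[256];

/* Translate word into hyphenation codes; returns 0 if a byte has none. */
static int to_codes(const char *word, char *codes, size_t size)
{
    size_t n = strlen(word);
    assert(size > n);
    for (size_t i = 0; i < n; i++) {
        unsigned char h = hcode[(unsigned char)word[i]];
        if (h == 0)
            return 0;
        codes[i] = (char)h;
    }
    codes[n] = '\0';
    return 1;
}

/* Do the pattern's letters (digits stripped) occur in the coded word? */
static int pattern_letters_match(const char *codes, const char *pattern)
{
    char letters[64];
    size_t nl = 0;
    for (const char *s = pattern; *s && nl + 1 < sizeof letters; s++)
        if (*s < '0' || *s > '9')
            letters[nl++] = *s;
    letters[nl] = '\0';
    return strstr(codes, letters) != NULL;
}
```

With the code `a', the word p, 0xE2, t matches a pattern written "p1at"; with the code 0xE2 it no longer does, and only a pattern containing 0xE2 itself applies. This is Jaap's case handled by choosing codes that agree with how the patterns were generated.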