[comp.text] TeX, DVI files, fonts, page selection

chris@mimsy.UUCP (Chris Torek) (09/20/87)

Lately I have seen some interest expressed about TeX and DVI files
and related issues such as fonts.  To dispel tales, disseminate
truth, define terms, and perhaps induce some other DTs, I have put
together this file.

TeX is a program for typesetting text, particularly mathematical
text.  _The_TeXbook_, by Donald E. Knuth, describes in great detail
what TeX does and how it goes about doing it.  _The_TeXbook_ is
the first in a five-volume series on Computers and Typesetting.
In the series is also _TeX:_The_Program_, which might be described
as an annotated source listing of TeX itself; two books on METAFONT;
and one on Knuth's Computer Modern fonts.  These are available from
Addison-Wesley; I do not have ordering information handy.

The TeX program itself is available from Stanford University, and
a version specifically for Unix from the University of Washington.  At
this time I believe the Unix TeX runs on 4.1, 4.2, and 4.3BSD on
Vaxen, on Sun 2s and 3s running SunOS 3.x, and on Pyramids.  No
doubt other ports are available; again, I have no details.  TeX is
written in a language called WEB, which contains both the source
to the program and the source to the annotated listing.  Two
auxiliary programs (Tangle and Weave) extract the program and
listing portion; the former creates a Pascal source file, and the
latter a TeX document suitable for formatting and printing.  The
Pascal produced is an extended version of a limited version of
standard J&W Pascal, avoiding `new' and `dispose' but using a
default clause in case statements.  Most Pascal compilers can handle
this with, if not ease, at least not too much tweaking.  There are,
however, translations of TeX into C.  Two of which I am aware are
Common TeX, by Pat Monardo, which I believe to be freely available,
and C-TeX, by Tomas Rokicki.  There are several versions that run
on IBM PCs.  As usual, I have no details on other ports of TeX.

TeX produces DVI files.  The format of these files is documented
in _TeX:_The_Program_ and in the source code for DVITYPE.  The
latter should be included in any TeX distribution, and in any case
is available from Stanford.  DVI stands for DeVice Independent; a
DVI file cannot be printed directly on any particular typesetter
or laser printer.  Instead, it is first converted by a `driver'.  There
is one driver for each kind of printer:  For instance, there is a
PostScript driver that will convert DVI files to the appropriate
commands for an Apple LaserWriter or other compatible PostScript
printer.  (There is another kind of DVI file that is produced by
ditroff.  This format is quite different from TeX's DVI; indeed,
the ditroff format hardly merits the appellation `device independent'.
It still requires a driver for conversion, but has embedded within
it assumptions about the printer that do not appear in a TeX DVI
file.  Not that it is a bad format---I just think calling it a DVI
file is misleading.)
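
To make `device independent' concrete, here is a minimal sketch in
Python, not taken from any TeX distribution.  It reads the preamble
of a TeX DVI file and works out how many device pixels one DVI unit
covers at a given resolution.  The function names are invented for
illustration, and the byte layout and conversion follow my reading
of the format documented in DVITYPE, so check them against that
source before relying on them.  Nothing about the printer appears in
the file itself: the resolution is supplied by the driver.

```python
import struct

PRE = 247  # opcode that opens every DVI file


def read_preamble(dvi):
    """Return (version, num, den, mag, comment) from raw DVI bytes."""
    if len(dvi) < 15 or dvi[0] != PRE:
        raise ValueError("not a DVI file: missing preamble")
    version = dvi[1]
    # Three big-endian signed 32-bit integers follow the version byte.
    num, den, mag = struct.unpack(">3i", dvi[2:14])
    # A one-byte length, then that many bytes of comment text.
    k = dvi[14]
    comment = dvi[15:15 + k].decode("ascii", "replace")
    return version, num, den, mag, comment


def pixels_per_dvi_unit(num, den, mag, dpi):
    """Pixels covered by one DVI unit on a device of `dpi` dots per inch.

    num/den gives the DVI unit in multiples of 1e-7 metre, and there
    are 254000 such multiples in an inch.  mag is the magnification
    times 1000.
    """
    return (num / 254000.0) * (dpi / den) * (mag / 1000.0)
```

With the values TeX normally writes (num 25400000, den 473628672,
mag 1000), 65536 DVI units make one printer's point, which comes out
to 300/72.27 pixels on a 300 dpi device.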

A TeX driver converts the device independent output file to a
particular device's format.  This is a threefold task, involving
reading pages from the DVI file, handling fonts, and decoding
`\special's.  The first is fairly straightforward: drivers merely
need follow the rules laid out by DVITYPE.  The remaining two are
complicated by a profusion of font formats and a lack of standards.
No two drivers, it seems, do the same thing with any given \special,
and \specials tend to be overly device dependent.  Specials such
as `include a PostScript program here' clearly cannot work on
printers that do not implement PostScript.  This is inherent in
the nature of a \special, of course, but there are some common
operations that should be done in a common way, such as drawing
lines and arcs and other simple graphics.

The font problems are not quite as difficult, but to some sites
they are more important.  A full set of TeX fonts could require more
than 30 megabytes of disk space.  By using a more compact format,
these same fonts shrink to less than 10 megabytes.  Many drivers
can handle only the least compact format.  Worse, some drivers
handle only one format, and some only another, forcing some sites
to keep two or more copies of every font.

The three standard font file formats for TeX are GF files, PK files,
and PXL files.  GF, or Generic Font, files use an intermediate
amount of space; PK, or PacKed, files use the least; and the obsolete
but still widely used PXL `pixel' files require the most space.
Typically a GF file will be about half the size of the corresponding
PXL file, and the same font in PK format will be about half the size
of the GF file, or roughly a quarter of the PXL file.  In addition, PK files are easier to decode
than GF files, being better engineered for unpacking.  Hence PK
format is the best standard format around.  (The drivers in my TeX
support code---something now called `ctex', although it has little
to do with TeX in C---handle all three font formats.)

Even the ability to read any of these font formats does not solve
another crucial problem.  Low resolution fonts, such as those for
300 dpi printers like the LaserWriter, depend heavily upon the
mechanical qualities of the printer.  There are two major kinds of
laser printers available today.  The Canon engine, used in the
LaserWriter, the Imagen 8/300, and the HP LaserJet, uses a process
called `write black', in which the laser is used to create black
spots on a white background.  The Raven engine used in the Xerox
2700 and in some DEC printers uses a process known as `write white':
the laser draws white spots on a black background.  Fonts designed
for write-black engines usually look thin and spidery when printed
on write-white engines, and the fonts that come with Unix TeX are
tuned for write-black engines.

Fortunately, current distributions of TeX also come with the METAFONT
program and the sources to the fonts themselves.  Those with
write-white engines can build METAFONT and create a `mode definition'
file for their printer, then rebuild all the fonts.  The task is
by no means painless, and there seems as yet to be no standard
write-white mode definition, but it can be done.  Unless. . . .
What happens if you have both a Canon-based printer and a Raven-based
printer?

My own solution to this, although the problem has not yet occurred
here, is a directive in the font configuration file.  My drivers
specify the appropriate engine type; my font lookup code matches
this against a `device specifier'.  A site in the situation described
above might include these lines in the configuration file:

	#	TYPE	SPEC	SLOP	PATH
	font	pk	canon	3	/usr/lib/tex/fonts/canon/%f.%mpk
	font	pk	raven	3	/usr/lib/tex/fonts/raven/%f.%mpk

An Imagen or PostScript driver would thus use Canon-tuned fonts,
while a Xerox 2700 driver would use Raven-tuned fonts.  `%f' and
`%m' turn into the base name of the font and the magnification,
such as `cmr10' and `300' for a 300 dpi rendition of CMR10.  The
specifier `*' matches anything:

	font	pk	*	3	/usr/lib/tex/fonts/%f.%mpk

will be used on any kind of print engine.
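
The lookup just described can be sketched in a few lines of Python.
This is an illustration only, not the actual ctex code: the function
names are made up, the SLOP field is parsed but left uninterpreted,
and trying entries in file order is an assumption.

```python
def parse_font_config(text):
    """Return a list of (type, spec, slop, path_template) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != 5 or fields[0] != "font":
            continue  # not a font directive
        _, ftype, spec, slop, template = fields
        entries.append((ftype, spec, int(slop), template))
    return entries


def candidate_paths(entries, engine, font, mag):
    """List (type, path) pairs a driver for `engine` could try, in order.

    An entry applies when its specifier equals the driver's engine
    type or is the wildcard `*'.  %f and %m expand to the font's base
    name and the magnification.
    """
    paths = []
    for ftype, spec, _slop, template in entries:
        if spec == "*" or spec == engine:
            path = template.replace("%f", font).replace("%m", str(mag))
            paths.append((ftype, path))
    return paths


CONFIG = """\
#	TYPE	SPEC	SLOP	PATH
font	pk	canon	3	/usr/lib/tex/fonts/canon/%f.%mpk
font	pk	raven	3	/usr/lib/tex/fonts/raven/%f.%mpk
font	pk	*	3	/usr/lib/tex/fonts/%f.%mpk
"""
```

Calling candidate_paths(parse_font_config(CONFIG), "canon", "cmr10", 300)
yields /usr/lib/tex/fonts/canon/cmr10.300pk first and then the
wildcard path /usr/lib/tex/fonts/cmr10.300pk; the raven entry is
skipped.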

Another question that comes up often is that of printing less than
a full TeX document.  If you have changed only one page, or need
to examine only a specific figure or table, it seems wasteful to
have to print an entire paper.  Printing just the page of interest
is so much more sensible.  Because a particular driver must read
pages from a DVI file anyway, it seems reasonable to have the driver
do the page selection.  A number of drivers do this.  This is, I
think, a mistake.  Page selection is, in practice, used rather
rarely, and it is hard to do well:  For example, did you want the
tenth page, or page 10?  If there is a page ix, page 10 may be the
22nd page in the DVI file.  Following Murphy's Law, drivers that
allow page selection will probably choose the tenth page when you
wanted page 10, and vice versa.

But obviously page selection is useful.  The trick is that it can
be done outside the driver.  A DVI file already consists of a series
of pages; it is quite feasible to read one DVI file, select a
subset, and write a new DVI file consisting only of the subset
pages.  The new DVI file can be fed to any driver, whether or not
that driver implements page selection, and the DVI selection program
can be written to understand the difference between page 19 and
page xix, or to allow such esoteric selections as `all of chapter
four, and the first page of the index too'.  My ctex distribution
includes this dviselect program.
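
The `tenth page' versus `page 10' distinction is easy to see in a
small sketch.  The Python below is not dviselect; it only shows how a
selection program might build a page index by following the back
pointers that chain the pages of a DVI file together.  The function
names are invented, and the byte offsets follow my reading of the
format described in DVITYPE.  Each page records the ten \count
registers in effect when it was shipped out; \count0 normally holds
the page number, and plain TeX's convention of a negative \count0 for
roman-numbered pages is assumed in the note that follows.

```python
import struct

BOP, POST, POST_POST, PAD = 139, 248, 249, 223


def page_numbers(dvi):
    """Return the count0 value of every page, in file order."""
    # The file ends with: post_post, pointer to post, id byte, 223s.
    i = len(dvi) - 1
    while i >= 0 and dvi[i] == PAD:
        i -= 1
    if i < 5 or dvi[i - 5] != POST_POST:
        raise ValueError("not a DVI file: no post_post found")
    (post,) = struct.unpack(">i", dvi[i - 4:i])
    if dvi[post] != POST:
        raise ValueError("bad postamble pointer")
    # The postamble starts with a pointer to the last page's bop.
    (bop,) = struct.unpack(">i", dvi[post + 1:post + 5])
    counts = []
    while bop != -1:
        if dvi[bop] != BOP:
            raise ValueError("bad page pointer")
        # bop is followed by ten 4-byte counts, then the previous bop.
        (c0,) = struct.unpack(">i", dvi[bop + 1:bop + 5])
        (bop,) = struct.unpack(">i", dvi[bop + 41:bop + 45])
        counts.append(c0)
    counts.reverse()  # we walked from the last page to the first
    return counts


def select(counts, wanted, absolute=False):
    """Return 0-based file positions of the pages selected.

    absolute=True  means `the wanted-th page in the file';
    absolute=False means `every page whose count0 equals wanted'.
    """
    if absolute:
        return [wanted - 1] if 1 <= wanted <= len(counts) else []
    return [pos for pos, c0 in enumerate(counts) if c0 == wanted]
```

Under the negative-count0 convention, page ix would be selected with
wanted=-9, which can never be confused with the ninth page in the
file or with arabic page 9.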

Other DVI-to-DVI transformations are possible and sensible.  After
splitting a file, you could concatenate the pieces in a different
order: `dviconcat' would concatenate a whole series of DVI files.
`dvisort' might rearrange pages for proper two-sided stacking.
`dvibooklet' could prepare a file so that it can be printed with
four logical pages per physical page, in such a way that several
8.5 by 11 inch pages could be folded down the center to make an
8.5 by 5.5 inch booklet.  (dvibooklet may be a special case of
dvisort.)  No doubt there are other possibilities that I have
missed.
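
As an illustration of the reordering a `dvibooklet' would need, here
is a small Python sketch.  No such program is described in any detail
above, so everything here is hypothetical: it assumes two logical
pages on each side of a sheet (four per sheet), sheets nested inside
one another and folded once down the center, and blank pages added
at the end to fill the last sheet.

```python
def booklet_order(n_pages):
    """Return the page layout for a folded booklet of n_pages pages.

    The result is a list of sheets, outermost first.  Each sheet is
    ((front_left, front_right), (back_left, back_right)), with pages
    numbered from 1 and None standing for a blank filler page.
    """
    total = -(-n_pages // 4) * 4  # round up to a multiple of four

    def page(k):
        return k if k <= n_pages else None

    sheets = []
    for i in range(total // 4):
        front = (page(total - 2 * i), page(2 * i + 1))
        back = (page(2 * i + 2), page(total - 2 * i - 1))
        sheets.append((front, back))
    return sheets
```

For an eight-page document the outer sheet carries pages 8 and 1 on
its front and 2 and 7 on its back, and the inner sheet carries 6 and
3, then 4 and 5.  A DVI-to-DVI program would emit the pages in that
order and leave the two-up placement to a later step.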

I hope I have managed to clear up some questions and influence
future DVI driver writers to solve the right problems.  Incidentally,
my font routines are available for anyone who would like the
flexibility of handling GF, PK, and PXL formats and multiple
print engines.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris