[net.text] Format of Xerox 2700 laser printer down-loaded fonts

tim@ubu.UUCP (Tim Clark) (12/10/84)

The University of Warwick Computer Unit are running a Xerox Corp. 2700
laser printer, connected to a VAX 11/780, running 4.2bsd. The software we run
on the VAX is `xroff'. Xroff is basically ditroff with a back-end for the
Xerox 2700, available from Image Network, 770 Mahogany Lane, Sunnyvale, CA
94086, phone (408) 746-3754. We gave up developing our own ditroff back-end
for the 2700: it is not bit-mapped, and it is difficult to get anything
complex out of it. The Image Network (xroff) software seems to get up to some
very clever tricks to squeeze the most out of the 2700.

I needed to be able to decode the Xerox 2700 down-load font information,
in order to add characters to fonts and to convert fonts from other
sources. Information on the format has proved difficult to get hold of,
so a little cryptology was called for. I've almost got the format cracked,
but there are a few things not yet decoded. I would love to hear from anyone
who can let me have the description of the format, or who can fill in the
gaps in what I've worked out so far.

Xroff holds its font files in a very similar form to what has to be
down-loaded to the 2700, except they are in binary rather than the
six-bit (converted to ASCII printable characters) format which is down-loaded.
I suspect there are some small differences between the format of xroff's font
database and the down-loaded information - but I haven't got access to the
source of xroff.

Anyway, here's what I've managed to work out so far:

In the description below the term "byte" is, I hope, unambiguous. The term
"word" is used to describe a 16-bit item, interpreted in the PDP-11 sense
(i.e. low significance byte first, high significance byte second). Xerox
obviously did their development work around a machine which ordered words
in this fashion. Where bits are numbered, bit 0 is the least significant
bit of a word or byte, and bit 15 or bit 7 is the most significant bit of
a word or byte (respectively).
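
A minimal sketch of this convention, in Python (the choice of language and
the helper names `word_at' and `bit' are illustrative assumptions, not part
of xroff or of any Xerox software):

```python
import struct

def word_at(data: bytes, index: int) -> int:
    """Return the 16-bit word at word index `index`, low byte first."""
    (value,) = struct.unpack_from("<H", data, 2 * index)
    return value

def bit(value: int, n: int) -> int:
    """Return bit n of value, where bit 0 is the least significant bit."""
    return (value >> n) & 1

# The bytes 16 29 (hex), in file order, form the word [2916].
sample = bytes([0x16, 0x29])
print("[%04X]" % word_at(sample, 0))
```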

If a value is given in hex, it will be surrounded by square brackets thus:
[0102]. Other values should be taken as decimal.

1. Producing the binary file
----------------------------
The Xerox fonts are down-line loaded as a series of characters in the range
question mark to tilde. These have ASCII values 63 to 126. Take 63 off the
decimal value of each character, yielding a value in the range 0 to 63. The
resulting 6-bit quantities are taken four at a time to yield three 8-bit
quantities (4 x 6 = 3 x 8 = 24 bits). The result of storing these in sequence
is the binary representation required.
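
The conversion might be sketched as follows in Python. Two things here are
assumptions rather than facts from the description above: that the first
6-bit value of each group of four supplies the most significant bits of the
24-bit group, and that characters outside the range 63 to 126 (newlines, for
instance) are framing that can be skipped. The function name
`decode_download' is made up for the illustration. If the output does not
begin with the bytes AA AA, the packing order assumed here is the first
thing to question.

```python
def decode_download(text: str) -> bytes:
    """Turn down-loaded characters ('?' to '~') back into binary bytes."""
    # Keep only characters with ASCII values 63..126 and take 63 off each.
    # ASSUMPTION: anything outside that range is framing and is skipped.
    sixes = [ord(c) - 63 for c in text if 63 <= ord(c) <= 126]
    out = bytearray()
    # Only complete groups of four 6-bit values are converted.
    for i in range(0, len(sixes) - 3, 4):
        a, b, c, d = sixes[i:i + 4]
        # ASSUMPTION: the first value holds the most significant 6 bits.
        group = (a << 18) | (b << 12) | (c << 6) | d
        out += group.to_bytes(3, "big")
    return bytes(out)
```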

2. Format of the binary file
----------------------------
The file consists of five sections:

        1. The header block
        2. A qualification? table
        3. The look-up table
        4. The bit patterns of the characters
        5. The trailer

3. The header block
-------------------
This is 24 words (48 bytes) long.

       word            use
        0       value [AAAA] as identification

        1       not fully solved yet, but bit 8 is 0 for a portrait
                font and 1 for a landscape font, and bit 10 is set if the
                (binary) file is more than [FFFF] bytes long.

        2       the least significant 16 bits of the length of the
                (binary) file. Normally 16 bits is enough to hold the
                length of the file.

        3-12    the name of the font in ASCII characters.

        13      unknown

        14      if bit 10 of word 1 is set, then bits 0-7 of this word
                contain the most significant 8 bits of the length of the
                (binary) file, expressed as a 24-bit item.

        15-19   unknown

        20-23   the revision level of the font in ASCII characters.
                (Believed to be for commentary purposes only).
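
As a rough sketch, the fields identified so far could be picked out like
this in Python. The function name and the dictionary keys are invented for
the illustration, and the unknown words (13 and 15-19) are simply ignored.
Word 1 is only partly understood, so the length in particular should be
treated with caution.

```python
import struct

HEADER_WORDS = 24

def parse_header(data: bytes) -> dict:
    """Pick out the header fields that have been identified so far."""
    words = struct.unpack_from("<%dH" % HEADER_WORDS, data, 0)
    if words[0] != 0xAAAA:
        raise ValueError("identification word is not [AAAA]")
    flags = words[1]
    length = words[2]
    if flags & (1 << 10):
        # Bits 0-7 of word 14 extend the length to a 24-bit item.
        length |= (words[14] & 0xFF) << 16
    return {
        "landscape": bool(flags & (1 << 8)),
        "length": length,
        # Words 3-12 are bytes 6..25; words 20-23 are bytes 40..47.
        "name": data[6:26].decode("ascii", "replace"),
        "revision": data[40:48].decode("ascii", "replace"),
    }
```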

4. The qualification table (perhaps)
------------------------------------
This is 256 bytes long, and follows immediately after the header block.

The meaning of this table is unknown. It is entirely zero in most of the
fonts we have, and mostly zero in the remainder. My guess is that the
information in here is one byte per character in the font, and somehow
qualifies the information for that character.

5. The look-up table
--------------------

  This follows immediately after the qualification table. It consists of
4-word entries, one per character in the font, and is terminated by an
entry of all zeroes. The look-up table is used by subtracting 32 from the
value of the ASCII character, and using the result as an index. Thus space is
index 0, exclamation mark is index 1, double quote is index 2, etc.

(Below, msb denotes most significant byte, and lsb least significant byte.
Remember, however, the PDP-11 style of byte ordering: the lsb comes first
and the msb second in the file, though when the word is written in hex the
msb is the left-hand pair of digits and the lsb the right-hand pair.)

       word             use

        0       the length of the pattern, in bytes.

        1       the lowest 16 bits of the location of the pattern,
                expressed as a 24-bit quantity, which is the offset into
                the (binary) file in bytes.

        2       msb - call this value "w". If w is zero, then there is
                      no pattern information. Otherwise, the value 63-w
                      gives the number of bytes in each row of the
                      character pattern.

                lsb - the highest 8 bits of the pattern location offset.
                      (see word 1). The value [FF] is sometimes found
                      here, and appears equivalent to [00] !

        3       msb - the amount to move the typing position on after
                      drawing this character, in units of rasters.

                lsb - how far the character is above (or, if negative,
                      below) the baseline (portrait) or from the typing
                      position (landscape), in two's complement. Units as
                      yet unknown (could be double or triple rasters).
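
A hedged sketch of decoding one entry, in Python. The name `lookup_entry'
and the dictionary keys are invented for the illustration. The table offset
assumes the 48-byte header is followed directly by the 256-byte
qualification table, as described above. The sketch does not check for the
all-zero terminating entry, so the caller must not index past the end of the
table.

```python
import struct

TABLE_OFFSET = 48 + 256  # header (24 words) + qualification table

def lookup_entry(data: bytes, char: str) -> dict:
    """Decode the look-up table entry for one ASCII character."""
    index = ord(char) - 32
    w0, w1, w2, w3 = struct.unpack_from("<4H", data, TABLE_OFFSET + 8 * index)
    w = w2 >> 8               # msb of word 2
    high = w2 & 0xFF          # lsb of word 2: top 8 bits of the offset
    if high == 0xFF:          # [FF] appears to be equivalent to [00]
        high = 0
    rise = w3 & 0xFF          # lsb of word 3, two's complement
    if rise >= 0x80:
        rise -= 0x100
    return {
        "length": w0,
        "offset": (high << 16) | w1,
        "row_bytes": 63 - w if w else 0,   # w == 0 means no pattern
        "advance": w3 >> 8,
        "rise": rise,
    }
```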

6. The bit patterns of the characters
-------------------------------------

After the look-up table are the patterns themselves. These are interpreted
according to the information in the look-up table.

7. The trailer
--------------

After the last indexed character pattern the file is padded out with words of
value [5555].
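
One simple way to drop the padding, sketched in Python (the function name is
made up for the illustration). Note the assumption: a final pattern that
itself ends in the bytes 55 55 would be trimmed as well, so working out the
end of the last pattern from the look-up table (offset plus length) would be
the safer test.

```python
def strip_trailer(data: bytes) -> bytes:
    """Remove trailing [5555] padding words from the binary file."""
    end = len(data)
    # A [5555] word is the byte pair 55 55 whichever byte comes first.
    while end >= 2 and data[end - 2:end] == b"\x55\x55":
        end -= 2
    return data[:end]
```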

8. Example
----------

Take the `Kosmos10-P' font as an example.

The 24-word header is as follows, in hex:

[ aaaa 0402 2916 6f4b 6d73 736f 3031 502d ]
[ 2020 2020 2020 2020 2020 0404 040f 0000 ]
[ 000a 0024 0032 bd20 3252 3432 3931 2020 ]

word 0 is [AAAA], word 2 is the length of the (binary) file, [2916] or
10518 bytes, words 3-12 give the name, "Kosmos10-P          ", and words 20-23
give the revision, "R22419  ".

The qualification table is all zeros, and the look-up table starts with:

[ 0002 0620 3DFF 1200 ]

This indicates that the space character (ASCII SP value 32 - 32 = 0) has a
pattern of length 2 bytes, starting at byte [620] or 1568. The value [3D]
(61) shows that it consists of rows 63 - 61 = 2 bytes wide. The important
thing here is that it should advance the typing position by [12] or 18
rasters, and it is drawn on the baseline with no offset [00].

Looking into the file at offset 1568, we find (not surprisingly!) [00 00], i.e.
the pattern drawn is blank.

Looking further down the look-up table at the entry with index 33 (ASCII A
value 65 - 32 = 33) we get:

[ 0074 106A 3BFF 1E00 ]

This means the pattern is of length [0074] or 116 bytes, starting at byte
[106A] or 4202. The value [3B] (59) shows that it consists of rows
63 - 59 = 4 bytes wide, giving 116 / 4 = 29 rows. It advances the typing
position by [1E] or 30 rasters, and is drawn on the baseline with no
offset [00].

Looking into the file at offset 4202, and taking four bytes per row we get:

e0000000 ***
f8000000 *****
ff000000 ********
ffc00000 **********
fff80000 *************
1ffe0000    ************
07ffc000      *************
01fff000        *************
01fffe00        ****************
01e7ff80        ****  ************
01e1ffe0        ****    ************
01e03ff8        ****       ***********
01e007fc        ****          *********
01e001fc        ****            *******
01e0007c        ****              *****
01e001fc        ****            *******
01e007fc        ****          *********
01e03ff8        ****       ***********
01e1ffe0        ****    ************
01e7ff80        ****  ************
01fffe00        ****************
01fff000        *************
07ffc000      *************
1ffe0000    ************
fff80000 *************
ffc00000 **********
ff000000 ********
f8000000 *****
e0000000 ***
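
A picture like the one above can be produced with a few lines of Python.
This is only a sketch: the function name is invented, and it assumes that
the bytes of a row are taken in file order with bit 7 of each byte as the
leftmost dot, which is how the dump above reads.

```python
def render_pattern(pattern: bytes, row_bytes: int) -> list:
    """Return one string per row, '*' for set bits and ' ' for clear bits."""
    rows = []
    for start in range(0, len(pattern), row_bytes):
        row = pattern[start:start + row_bytes]
        # ASSUMPTION: bytes in file order, bit 7 of each byte leftmost.
        bits = "".join(format(byte, "08b") for byte in row)
        rows.append(bits.replace("1", "*").replace("0", " ").rstrip())
    return rows

# Three of the rows from the pattern above, four bytes per row.
sample_rows = bytes.fromhex("e0000000" "f8000000" "1ffe0000")
for line in render_pattern(sample_rows, 4):
    print(line)
```

Here `row_bytes' is the value 63-w from word 2 of the look-up table entry,
and `pattern' is the slice of the binary file given by the entry's offset
and length.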

Tim Clark, Computer Unit, University of Warwick, Coventry, UK, CV4 7AL
phone: +44 203 24011 ext 2357
...ihnp4!cfg!ukc!ubu!tim
Arpa: T.CLARK%WKPA@UCL-CS