[net.micro.mac] Word Document Format

kc@rna.UUCP (Kaare Christian) (03/26/85)

I've recently acquired Microsoft WORD for the mac and I'm starting to
use it instead of MacWrite.  The virtues of Word are another story -
they've been discussed here and elsewhere.

One of the best features of MacWrite is a great program called
Write2Troff (w2t) that appeared on the net a few months ago.  W2t
converts a MacWrite document into a troff document, so one can print
MacWrite documents using a laser printer or photo-typesetter attached
to a Unix machine.  Unfortunately the Word environment is missing this
key feature.  Is anybody working on a similar program for WORD? A grad
student here at Rockefeller U. desperately needs this program. Respond
immediately if you have any leads.

I've investigated the internal format of Word files.  It's unlike any
other word processor document file that I've investigated.  Word files
contain three sections: a fixed length binary header, the text, and a
variable length binary format trailer. The header is somewhat
decipherable - for example the length of the text is encoded in one of
the first few words.  The text section of a Word document file is
completely clean - it simply contains carriage return delimited
paragraphs.  There aren't any embedded control codes.  The font, point
size, margins, and other format information appear to follow the text
in the trailing binary record.  Thus one must decode the format of the
trailing binary record in order to recover the formatting information
for the text.
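
The three-section layout described above can be sketched roughly as
follows.  This is only an illustration: the header size, the offset of
the text-length field, and its width are all guesses (the names
HEADER_SIZE and TEXT_LEN_OFFSET are hypothetical), and the only thing
taken for granted is that the 68000 stores multi-byte values
big-endian.

```python
import struct

# Hypothetical layout values; none of them are known from the file itself.
HEADER_SIZE = 128      # assumed size of the fixed length binary header
TEXT_LEN_OFFSET = 14   # assumed offset of the text-length field in the header


def split_word_file(data, header_size=HEADER_SIZE, len_offset=TEXT_LEN_OFFSET):
    """Split raw file bytes into (header, paragraphs, trailer)."""
    header = data[:header_size]
    # Assume a 4-byte big-endian length; the real field width is a guess.
    (text_len,) = struct.unpack_from(">I", header, len_offset)
    text_end = header_size + text_len
    if text_end > len(data):
        raise ValueError("text length field points past end of file")
    text = data[header_size:text_end]
    trailer = data[text_end:]
    # Paragraphs are delimited by carriage returns, with no control codes.
    paragraphs = text.split(b"\r")
    return header, paragraphs, trailer
```

With the right constants filled in, the returned trailer is the binary
record that still has to be decoded.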

The binary record is mysterious.  Perhaps it is encoded, or otherwise
processed to make life difficult.  (Why?) I saved a short file twice,
using the name 'a' the first time and the name 'b' the second. The text
sections were the same, but the trailing binary records were vastly
different. This implies that there is encryption, or perhaps that there
is random noise hiding the formatting information, or something even
more devious. Does anyone know the format of these things?
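
For anyone who wants to repeat the 'a' versus 'b' experiment, here is
a minimal sketch of the comparison.  The function name differing_runs
is my own, and it assumes the two trailing records have already been
extracted as byte strings.  It only reports where the records differ;
it does not explain why.

```python
def differing_runs(a, b):
    """Return (start, end) offset pairs where a and b differ.

    Only the common prefix length is compared; end is exclusive.
    """
    runs = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new run of differences begins
        elif start is not None:
            runs.append((start, i))  # the current run just ended
            start = None
    if start is not None:
        runs.append((start, common))
    return runs
```

Where the differences fall, and whether the two records even have the
same length, might help narrow down the possibilities.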

Question 3.  Word (version 2) on the PC includes a very nice program
that allows you to make your own printer drivers.  You can decode an
existing printer driver, change what you want, and then save the new
driver.  It's easy to use (for a programmer) and it is very powerful.
Enough hooks are provided to add a custom driver for a laser printer
such as our QMS.  Does anyone know the format of the Word printer
driver files on the Mac?  Are there any tools for building your own
Mac printer drivers?  The PC version of Word allows you to output Word
documents in a printer independent manner, with complete positioning
information in the output.  It resembles (perhaps it's a rip-off of) the
output of titroff. Is there any similar facility for the Mac?

The Microsoft customer hot-line (it's not very hot - no 800 number)
wasn't any help with these questions.  I didn't think they would be,
but it was worth a call.  Are any of you netlanders able to help?
Microsoft, I know you're listening?  I'd be glad to serve as a
clearinghouse for any info I receive. Replies should probably go
directly to me unless they are of interest to the entire civilized world.

Thanks, and happy decoding.

Kaare Christian   cmcl2!rna!kc  212-570-7672   1230 E. 63rd. NYC, NY 10021

kc@rna.UUCP (Kaare Christian) (07/03/85)

During the past few months several people have enquired about the internal
format of Microsoft Word document files on the mac.  Until now there has
not been a satisfactory method of decoding these files.

Yesterday I received my Word 1.05 update kit.  The update fixes a few
miscellaneous bugs, improves the support for the LaserWriter, and
contains a new convert utility that can convert Mac Word documents to PC Word
format and vice versa.  Although Word's Macintosh document format is a
mystery, the document format on the PC is much more understandable.  Thus
this convert utility may serve as an important first step in decoding Word
documents on the Macintosh.

I haven't tried this yet, and I probably won't for a week or so because of
the holiday.

Kaare Christian
cmcl2!rna!kc