[comp.sys.apple] AppleWorks Word Processor file format follows

F3U@PSUVM.BITNET (01/24/88)

I noticed that several people have requested the AppleWorks file
formats.  Well, what follows is the result of several days' work during
the summer.  This is just the description of the AWP file.  I haven't
quite figured out some of the information in the header yet, but
what is here is enough to reconstruct a damaged file.  With this
information I've written a program which automatically (well, almost)
fixes a damaged file.
     
Here is a description of an AppleWorks Word Processor file.  I've
torn apart a file and have come up with the following information.
     
     
                         Format of an AWP file
                  Copyright (C) 1987-8 Frank Uzzolino
                          All rights reserved
     
     
A.  Once upon a line...
     
    A line of an AWP file can have one of two formats depending
on what kind of line it is.  Format I is for printer option
codes such as NP, PW, and DS.  Format II is for an actual line
of text.
     
    Format I.
     
    Each line consists of two bytes, a Count byte (1st) and a
Code byte (2nd).  These are the only two bytes of the line.
     
    Example:
     
    Here is a string of bytes in the AWP file:
     
    0A D9 0C DB 02 E5 00 E7
     
    Which AppleWorks would show as:
     
    -------Left Margin: 1.0 inches
    -------Characters per Inch: 12 chars
    -------Lines per Inch: 2 lines
    -------Double Space
     
Following are all the printer option codes.
     
All bytes are in HEX unless otherwise specified.  Don't forget!
     
   Option   Hex Byte  Notes        Option   Hex Byte  Notes
            1st  2nd                        1st  2nd
   ------   --------  ----------   ------   --------  ----------
     PW     nn   D8                  DS     00   E7
     LM     nn   D9                  TS     00   E8
     RM     nn   DA                  NP     00   E9
     CI     ##   DB                  GB     00   EA
     P1     00   DC                  GE     00   EB
     P2     00   DD                  HE     00   EC
     IN     ##   DE                  FO     00   ED
     JU     00   DF                  SK     ##   EE
     UJ     00   E0                * PN     ##   EF   Page < 256
     CN     00   E1                  PE     00   F0
     PL     nn   E2                  PH     00   F1
     TM     nn   E3                  SM     ##   F2
     BM     nn   E4                * PN     ##   F3   Page >= 256
     LI     ##   E5                * EOP    ##   F4   Page < 256
     SS     00   E6                * EOP    ##   F5   Page >= 256
     
nn is in tenths of an inch.  (i.e.  a line of 50 D8 means a Platen
Width (D8) of 80 (decimal) tenths of an inch, or 8.0 inches)
     
## is an actual number.  (i.e.  05 DE means Indent(DE) 5 spaces)
     
Note that the Code byte FOLLOWS the Count byte.
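
To make the nn / ## distinction concrete, here is a minimal Python sketch
that decodes a run of Format I lines using the table above.  The names
TENTHS, NUMBER, FLAG, decode_option and decode_options are made up for
this illustration; they are not part of AppleWorks.  The PN and EOP codes
(EF, F3, F4, F5) are left out on purpose because their ## needs the
special handling described below.

```python
# Option tables transcribed from the list above (mnemonic per Code byte).
TENTHS = {0xD8: "PW", 0xD9: "LM", 0xDA: "RM",
          0xE2: "PL", 0xE3: "TM", 0xE4: "BM"}   # "nn": tenths of an inch
NUMBER = {0xDB: "CI", 0xDE: "IN", 0xE5: "LI",
          0xEE: "SK", 0xF2: "SM"}               # "##": an actual number
FLAG = {0xDC: "P1", 0xDD: "P2", 0xDF: "JU", 0xE0: "UJ", 0xE1: "CN",
        0xE6: "SS", 0xE7: "DS", 0xE8: "TS", 0xE9: "NP", 0xEA: "GB",
        0xEB: "GE", 0xEC: "HE", 0xED: "FO", 0xF0: "PE",
        0xF1: "PH"}                             # Count byte is 00


def decode_option(count, code):
    """Return (mnemonic, value) for one Format I line."""
    if code in TENTHS:
        return TENTHS[code], count / 10      # value in inches
    if code in NUMBER:
        return NUMBER[code], count
    if code in FLAG:
        return FLAG[code], None              # option carries no value
    raise ValueError(f"code {code:02X} is not in this sketch's tables")


def decode_options(data):
    """Decode a run of Format I lines: Count byte first, Code byte second."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    return [decode_option(data[i], data[i + 1])
            for i in range(0, len(data), 2)]
```

Feeding it the example string from above,
decode_options(bytes.fromhex("0A D9 0C DB 02 E5 00 E7")), gives
[('LM', 1.0), ('CI', 12), ('LI', 2), ('DS', None)].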
     
* For codes EF, F3, F4 & F5, ## can mean different things.  First, EOP
  is the End of Page marker put in BY AppleWorks when IT calculates
  page breaks from an oa-p or oa-k.  If the page is less than 256, the
  code is F4 with ## representing the actual page number.  If the page
  is greater than or equal to 256, then the code is F5 with ## being
  the actual page number minus 256.  Likewise for PN, if the page is
  less than 256, the code is EF and ## is the actual page number, and
  if greater than or equal to 256, the code is F3 and ## is page-256.
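
The page number rule can be written down in a few lines of Python.  This
is only a sketch: page_number and page_record are names invented here,
and the upper limit of 511 is my inference from ## being a single byte,
not something stated above.

```python
def page_number(count, code):
    """Recover the page number from a PN or EOP line."""
    if code in (0xEF, 0xF4):        # PN / EOP, page < 256
        return count
    if code in (0xF3, 0xF5):        # PN / EOP, page >= 256
        return count + 256
    raise ValueError(f"code {code:02X} is not a PN or EOP code")


def page_record(page, eop=False):
    """Build the (Count, Code) pair for a PN line, or an EOP line if eop."""
    # Assumption: one Count byte plus the 256 offset limits pages to 0..511.
    if not 0 <= page <= 511:
        raise ValueError("page number out of range for this encoding")
    low_code, high_code = (0xF4, 0xF5) if eop else (0xEF, 0xF3)
    if page < 256:
        return page, low_code
    return page - 256, high_code
```

For instance, page 300 as a PN line comes out as 2C F3, and reading
2C F3 back gives 300 again.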
     
Codes F6 through FF are invalid (as far as I can tell).
     
A line of:
          ## D0 is a blank line with just ## spaces on it.
          FF FF marks the end of the file.
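
A repair program has to decide what kind of two-byte line it is looking
at before doing anything else.  The sketch below sorts a byte pair into
the cases listed so far.  The function name classify and the label
strings are hypothetical, and anything not covered above (including the
start of a text line) simply comes back as "other".

```python
def classify(first, second):
    """Sort a (Count, Code) byte pair into the cases described above."""
    if (first, second) == (0xFF, 0xFF):
        return "end-of-file"         # FF FF marks the end of the file
    if second == 0xD0:
        return "blank"               # blank line holding `first` spaces
    if 0xD8 <= second <= 0xF5:
        return "option"              # printer option from the table
    if 0xF6 <= second <= 0xFF:
        return "invalid"             # F6..FF, as far as the author can tell
    return "other"                   # not covered by Format I
```

The FF FF test has to come first, since FF on its own falls inside the
invalid range.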
     
-------End of Format I.
     
    Format II.
     
    This is a little more tricky.  Each AWP line of text consists of
four header bytes, followed by the actual text and character
formatting codes.  Here goes:
     
                                            This byte is the value of
                                            byte 03, plus 2.  So if byte
                                            03 held a 3C, then this byte
          _________________________________ would be 3E.  This count does
          |                                 NOT include the hi bit setting
          |                                 of byte 03.  So if 03 were C2,
          |                                 the actual count would be 42,
          |                                 and this byte would equal 44.
          |
          |                                 This byte MUST be equal to
          |        ________________________ zero if this line is a valid
          |        |                        text line.
          |        |
          |        |                        This is the number of leading
          |        |                        spaces on the line.  i.e. if
          |        |        _______________ there were 15 blank spaces on
          |        |        |               this line before the actual
          |        |        |               text, this byte would be 0F.
          |        |        |
          |        |        |               This is the actual # of char-
          |        |        |               acters in the line, which in-
          |        |        |               cludes any character formatting.
          |        |        |               If this line is the last line
          |        |        |        ______ of the paragraph, then the high
          |        |        |        |      bit is set.  So if there were
          |        |        |        |      4E characters in the line, then
          |        |        |        |      this byte would be CE.
          |        |        |        |                              Start
          |        |        |        |                                of
          |        |        |        |     These are the real bytes  next
          |        |        |        |     of the line.              line
       ------   ------   ------   ------   ------  ------ // ------ ------
     
Offset:  00       01       02       03       04      05         nn    00
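
Read as code, the diagram above comes out roughly like this.  It is a
sketch only: parse_text_line and the dictionary keys are names I made up,
and it returns the leading space count separately from the text bytes,
since the diagram gives the count but does not show where those spaces
are stored.

```python
def parse_text_line(data, offset=0):
    """Parse one Format II line; return (fields, offset of the next line)."""
    header = data[offset:offset + 4]
    if len(header) != 4:
        raise ValueError("truncated header")
    byte00, byte01, leading, byte03 = header
    if byte01 != 0:
        raise ValueError("byte 01 must be zero on a text line")
    count = byte03 & 0x7F                  # character count, hi bit removed
    if byte00 != count + 2:
        raise ValueError("byte 00 should equal the count in byte 03 plus 2")
    body = data[offset + 4:offset + 4 + count]
    if len(body) != count:
        raise ValueError("truncated line")
    fields = {
        "leading_spaces": leading,
        "end_of_paragraph": bool(byte03 & 0x80),   # hi bit of byte 03
        "body": bytes(body),               # text plus formatting codes
    }
    return fields, offset + 4 + count
```

Since byte 00 is the count plus 2, it is also the number of bytes that
follow byte 01, so a damaged byte 03 can be cross-checked against it.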
     
Valid line characters:
     
Any ASCII character in the range from $20 to $7E is a valid character
in a Word Processor file.  In addition to these bytes, the values
$01 to $0C are used as character formatting codes as described below.
     
     01 -   Boldface Begin
     02 -   Boldface End
     03 -   SuperScript Begin
     04 -   SuperScript End
     05 -   SubScript Begin
     06 -   SubScript End
     07 -   Underline Begin
     08 -   Underline End
     09 -   Print Page Number
     0A -   Enter Keyboard
     0B -   Sticky Space
     0C -   Mail Merge
            If Omit Lines is yes, then the category is enclosed by [].
            If Omit Lines is no, then the category is enclosed by <>.
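
One way to use the list above is to split the bytes of a text line into
runs of printable text and formatting codes.  A minimal sketch follows;
CODES and tokenize are names chosen for the example, and any byte outside
$20-$7E and $01-$0C is treated as an error, which is my reading of the
paragraph above rather than something AppleWorks is known to enforce.

```python
CODES = {
    0x01: "Boldface Begin",    0x02: "Boldface End",
    0x03: "SuperScript Begin", 0x04: "SuperScript End",
    0x05: "SubScript Begin",   0x06: "SubScript End",
    0x07: "Underline Begin",   0x08: "Underline End",
    0x09: "Print Page Number", 0x0A: "Enter Keyboard",
    0x0B: "Sticky Space",      0x0C: "Mail Merge",
}


def tokenize(body):
    """Split line bytes into ("text", str) and ("code", name) tokens."""
    tokens = []
    run = []                        # printable characters since last code
    for b in body:
        if 0x20 <= b <= 0x7E:
            run.append(chr(b))
            continue
        if b not in CODES:
            raise ValueError(f"byte {b:02X} is not valid in a text line")
        if run:
            tokens.append(("text", "".join(run)))
            run = []
        tokens.append(("code", CODES[b]))
    if run:
        tokens.append(("text", "".join(run)))
    return tokens
```

A repair tool could run this over each line and flag any line that
raises, since a stray byte usually means the line header was misread.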
     
     
Auxiliary Type:
     
The Auxiliary Type field of the directory entry holds a special code
which specifies how the file name is displayed in the AppleWorks
list.  Since ProDOS allows only capital letters, numbers, and
the period in a file name, the spaces and lowercase characters of
the AppleWorks name must be converted to make a valid ProDOS filename.
     
AUX_TYPE:              hi byte                       lo byte
                -- -- -- -- -- -- --       -- -- -- -- -- -- -- --
     Char posn:  9 10 11 12 13 14 15        0  1  2  3  4  5  6  7
     
     If a bit is set (1) in a character position, then there are two
possibilities:
     
                 1.  Convert a space in AppleWorks name to a period
                     in the ProDOS file name.
                 2.  Convert a lowercase character in AppleWorks name
                     to uppercase character in the ProDOS file name.
     
     If the bit is clear (0) in a character position, the character is
already either a capital letter or a period and does not need converting.
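
Going the other way, from the ProDOS name back to the name AppleWorks
shows, could be sketched as below.  The function appleworks_name is
hypothetical, and it takes one true/false flag per character position
instead of the raw AUX_TYPE word, so it makes no assumption about which
bit of which byte belongs to which position.

```python
def appleworks_name(prodos_name, flags):
    """Rebuild the displayed name from a ProDOS name and per-character flags.

    flags[i] is True when the AUX_TYPE bit for character position i is set.
    """
    if len(flags) < len(prodos_name):
        raise ValueError("need one flag per character of the name")
    shown = []
    for ch, flagged in zip(prodos_name, flags):
        if not flagged:
            shown.append(ch)            # already as AppleWorks shows it
        elif ch == ".":
            shown.append(" ")           # period stood in for a space
        else:
            shown.append(ch.lower())    # capital stood in for lowercase
    return "".join(shown)
```

With the ProDOS name MY.LETTER and flags set on every position except
the M and the L, this gives back "My Letter".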
     
That's all folks...
     
     
Coming attractions "In the 'Works'"
     
            Part II  : As the Record Turns (ADB file format)
            Part III : Death of a Cellsman (ASP file format) (maybe)
     
     
Oh, by the way, this work is my own, and does in no way
reflect the views of the University, my parents, or my
roommate.
     
-------
**********************************************************************
*  Frank Uzzolino                       Penn State University        *
*  F3U at PSUVM (preferred)             7 Hamilton Hall              *
*  (or)at PSUVMB                        University Park, PA  16802   *
**********************************************************************