F3U@PSUVM.BITNET (01/24/88)
I noticed that several people have asked for the AppleWorks file
formats. Well, what follows is the result of several days' work during
the summer. This is just the description of the AWP file. I haven't
quite figured out some of the information in the header yet, but
what is here is enough to reconstruct a damaged file. With this
information I've written a program which automatically (well, almost)
fixes a damaged file.
Here is a description of an AppleWorks Word Processor file. I've
torn apart a file and have come up with the following information.
Format of an AWP file
Copyright (C) 1987-8 Frank Uzzolino
All rights reserved
A. Once upon a line...
A line of an AWP file can have one of two formats depending
on what kind of line it is. Format I is for printer option
codes such as NP, PW, and DS. Format II is for an actual line
of text.
Format I.
Each line consists of two bytes, a Count byte(1st) and a
Code byte(2nd). These are the only two bytes of the line.
Example:
Here is a string of bytes in the AWP file:
0A D9 0C DB 02 E5 00 E7
Which AppleWorks would show as:
-------Left Margin: 1.0 inches
-------Characters per Inch: 12 chars
-------Lines per Inch: 2 lines
-------Double Space
Following are all the printer option codes.
All bytes are in HEX unless otherwise specified. Don't forget!
Option  1st  2nd  Notes        Option  1st  2nd  Notes
------  ---  ---  ----------   ------  ---  ---  -----------
PW      nn   D8                DS      00   E7
LM      nn   D9                TS      00   E8
RM      nn   DA                NP      00   E9
CI      ##   DB                GB      00   EA
P1      00   DC                GE      00   EB
P2      00   DD                HE      00   EC
IN      ##   DE                FO      00   ED
JU      00   DF                SK      ##   EE
UJ      00   E0              * PN      ##   EF   Page < 256
CN      00   E1                PE      00   F0
PL      nn   E2                PH      00   F1
TM      nn   E3                SM      ##   F2
BM      nn   E4              * PN      ##   F3   Page >= 256
LI      ##   E5              * EOP     ##   F4   Page < 256
SS      00   E6              * EOP     ##   F5   Page >= 256
nn is tenths of an inch (e.g. a line of 50 D8 means a Platen
Width (D8) of 80 (decimal) tenths of an inch, or 8.0 inches).
## is an actual number (e.g. 05 DE means Indent (DE) 5 spaces).
Note that the Code byte FOLLOWS the Count byte.
* For codes EF, F3, F4 & F5, ## can mean different things. First, EOP
is the End of Page marker put in BY AppleWorks when IT calculates
page breaks from an oa-p or oa-k. If the page is less than 256, the
code is F4 with ## representing the actual page number. If the page
is greater than or equal to 256, then the code is F5 with ## being
the actual page number minus 256. Likewise for PN: if the page is
less than 256, the code is EF and ## is the actual page number, and
if greater than or equal to 256, the code is F3 and ## is page-256.
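The page-number rule above can be sketched in a few lines of Python.
This is only an illustration, not the author's repair program: the
names PAGE_CODES, encode_page and decode_page are made up for the
example, and the upper limit of 511 is an inference from ## being a
single byte.

```python
# Code bytes for the two page-carrying options: (page < 256, page >= 256).
PAGE_CODES = {"PN": (0xEF, 0xF3), "EOP": (0xF4, 0xF5)}


def encode_page(option, page):
    """Build the two-byte Format I line (## first, code second)."""
    low_code, high_code = PAGE_CODES[option]
    if not 0 <= page <= 511:
        # Assumption: ## is one byte, so page - 256 must fit in 0..255.
        raise ValueError("page number out of range")
    if page < 256:
        return bytes([page, low_code])
    return bytes([page - 256, high_code])


def decode_page(count, code):
    """Recover (option, page) from the two bytes of the line."""
    for option, (low_code, high_code) in PAGE_CODES.items():
        if code == low_code:
            return option, count
        if code == high_code:
            return option, count + 256
    raise ValueError("not a PN or EOP code")


print(encode_page("EOP", 300).hex(" "))   # 2c f5
print(decode_page(0x2C, 0xF5))            # ('EOP', 300)
```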
Codes F6 through FF are invalid (as far as I can tell).
A line of:
## D0 is a blank line with just ## spaces on it.
FF FF marks the end of the file.
-------End of Format I.
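To make Format I concrete, here is a minimal Python sketch that decodes
a run of two-byte option lines using the table above. It is a sketch
under assumptions, not a full AWP reader: it expects a buffer holding
only Format I lines (a real file mixes in Format II text lines), the
names decode_format1 and "BLANK" are invented for the example, and page
numbers for EF/F3/F4/F5 are returned as the raw ## byte.

```python
# Second byte -> option name, grouped by how the first byte is read.
TENTHS = {0xD8: "PW", 0xD9: "LM", 0xDA: "RM",
          0xE2: "PL", 0xE3: "TM", 0xE4: "BM"}
NUMBER = {0xDB: "CI", 0xDE: "IN", 0xE5: "LI", 0xEE: "SK", 0xEF: "PN",
          0xF2: "SM", 0xF3: "PN", 0xF4: "EOP", 0xF5: "EOP"}
NO_ARG = {0xDC: "P1", 0xDD: "P2", 0xDF: "JU", 0xE0: "UJ", 0xE1: "CN",
          0xE6: "SS", 0xE7: "DS", 0xE8: "TS", 0xE9: "NP", 0xEA: "GB",
          0xEB: "GE", 0xEC: "HE", 0xED: "FO", 0xF0: "PE", 0xF1: "PH"}


def decode_format1(data):
    """Decode consecutive two-byte Format I lines into (name, value) pairs."""
    lines = []
    for i in range(0, len(data) - 1, 2):
        count, code = data[i], data[i + 1]
        if (count, code) == (0xFF, 0xFF):        # end-of-file marker
            break
        if code in TENTHS:
            lines.append((TENTHS[code], count / 10))   # value in inches
        elif code in NUMBER:
            lines.append((NUMBER[code], count))        # raw ## value
        elif code in NO_ARG:
            lines.append((NO_ARG[code], None))         # first byte is 00
        elif code == 0xD0:
            lines.append(("BLANK", count))             # count = spaces
        else:
            raise ValueError(f"unknown option code {code:02X}")
    return lines


example = bytes.fromhex("0A D9 0C DB 02 E5 00 E7")
print(decode_format1(example))
# [('LM', 1.0), ('CI', 12), ('LI', 2), ('DS', None)]
```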
Format II.
This is a little more tricky. Each AWP line of text consists of
four header bytes, followed by the actual text and character
formatting codes. Here goes:
                                           This byte is the value of
                                           byte 03, plus 2. So if byte
                                           03 held a 3C, then this byte
             ____________________________  would be 3E. This count does
             |                             NOT include the hi bit setting
             |                             of byte 03. So if 03 were C2,
             |                             the actual count would be 42,
             |                             and this byte would equal 44.
             |
             |                             This byte MUST be equal to
             |       ____________________  zero if this line is a valid
             |       |                     text line.
             |       |
             |       |                     This is the number of leading
             |       |                     spaces on the line, e.g. if
             |       |       ____________  there were 15 blank spaces on
             |       |       |             this line before the actual
             |       |       |             text, this byte would be 0F.
             |       |       |
             |       |       |             This is the actual # of char-
             |       |       |             acters in the line, which in-
             |       |       |             cludes any character formatting.
             |       |       |             If this line is the last line
             |       |       |       ____  of the paragraph, then the high
             |       |       |       |     bit is set. So if there were
             |       |       |       |     4E characters in the line, then
             |       |       |       |     this byte would be CE.
             |       |       |       |
             |       |       |       |                                 Start
             |       |       |       |                                  of
             |       |       |       |     These are the real bytes    next
             |       |       |       |     of the line.                line
           ------  ------  ------  ------  ------  ------  //  ------  ------
Offset:      00      01      02      03      04      05          nn      00
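As a cross-check on the four header bytes, the following Python sketch
pulls one Format II line out of a buffer and verifies that byte 00
equals the low seven bits of byte 03 plus 2. The function name
parse_text_line and the returned dictionary keys are made up for this
example; it only illustrates the layout described above.

```python
def parse_text_line(data, offset=0):
    """Parse one Format II text line starting at data[offset]."""
    header = data[offset:offset + 4]
    if len(header) != 4:
        raise ValueError("truncated header")
    total, zero, indent, count = header
    if zero != 0:
        raise ValueError("byte 01 is not zero: not a text line")
    length = count & 0x7F                 # character count without hi bit
    end_of_paragraph = bool(count & 0x80) # hi bit marks last line of paragraph
    if total != length + 2:
        raise ValueError("byte 00 does not equal byte 03 + 2")
    body = data[offset + 4:offset + 4 + length]
    if len(body) != length:
        raise ValueError("truncated text")
    return {
        "indent": indent,                 # leading spaces on the line
        "end_of_paragraph": end_of_paragraph,
        "body": body,                     # text plus formatting codes
        "next": offset + 4 + length,      # where the next line starts
    }


line = bytes([0x07, 0x00, 0x02, 0x85]) + b"Hello"
print(parse_text_line(line))
```

A repair tool could walk a file by calling this repeatedly with the
returned "next" offset and flagging the first line whose check fails.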
Valid line characters:
Any ASCII character in the range $20 to $7E is a valid character
in a Word Processor file. In addition to these bytes, the values
$01 to $0C are used as character formatting codes, as described below.
01 - Boldface Begin
02 - Boldface End
03 - SuperScript Begin
04 - SuperScript End
05 - SubScript Begin
06 - SubScript End
07 - Underline Begin
08 - Underline End
09 - Print Page Number
0A - Enter Keyboard
0B - Sticky Space
0C - Mail Merge
If Omit Lines is yes, then the category is enclosed by [].
If Omit Lines is no, then the category is enclosed by <>.
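For illustration, here is a small Python sketch that turns the text
bytes of a line into a readable string, spelling out each formatting
code from the list above in braces. The function name render_line and
the brace notation are inventions for this example, and rejecting
every byte outside $20-$7E and $01-$0C is an assumption based on the
ranges given here.

```python
FORMAT_CODES = {
    0x01: "Boldface Begin",    0x02: "Boldface End",
    0x03: "SuperScript Begin", 0x04: "SuperScript End",
    0x05: "SubScript Begin",   0x06: "SubScript End",
    0x07: "Underline Begin",   0x08: "Underline End",
    0x09: "Print Page Number", 0x0A: "Enter Keyboard",
    0x0B: "Sticky Space",      0x0C: "Mail Merge",
}


def render_line(body):
    """Return the line text with formatting codes shown as {name}."""
    out = []
    for b in body:
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))                     # printable ASCII
        elif b in FORMAT_CODES:
            out.append("{" + FORMAT_CODES[b] + "}")
        else:
            raise ValueError(f"invalid byte ${b:02X} in text line")
    return "".join(out)


print(render_line(b"\x01Hi\x02 there"))
# {Boldface Begin}Hi{Boldface End} there
```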
Auxiliary Type:
The Auxiliary Type field of the directory holds a special code
which specifies how the file name is displayed in the AppleWorks
list. Since ProDOS allows for only Capital letters, numbers, and
the period in a file name, some conversion is needed to convert
the spaces and lowercase characters into a valid ProDOS filename.
AUX_TYPE:         hi byte                    lo byte
            -- -- -- -- -- -- --     -- -- -- -- -- -- -- --
Char posn:   9 10 11 12 13 14 15      0  1  2  3  4  5  6  7
If a bit is set(1) in a character position, then there are two
possibilities:
1. Convert a space in AppleWorks name to a period
in the ProDOS file name.
2. Convert a lowercase character in AppleWorks name
to uppercase character in the ProDOS file name.
If the bit is clear(0) in a character position, the character is
already either a capital letter or a period and does not need converting.
That's all folks...
Coming attractions "In the 'Works'"
Part II : As the Record Turns (ADB file format)
Part III : Death of a Cellsman (ASP file format) (maybe)
Oh, by the way, this work is my own, and in no way
reflects the views of the University, my parents, or my
roommate.
-------
**********************************************************************
* Frank Uzzolino Penn State University *
* F3U at PSUVM (preferred) 7 Hamilton Hall *
* (or)at PSUVMB University Park, PA 16802 *
**********************************************************************