F3U@PSUVM.BITNET (01/24/88)
I noticed that several people have asked for the AppleWorks file formats. What follows is the result of several days' work over the summer. It describes only the AppleWorks Word Processor (AWP) file. I haven't quite figured out some of the information in the file header yet, but what is here is enough to reconstruct a damaged file. Using it, I've written a program which automatically (well, almost) fixes a damaged file.

I tore apart a file and came up with the following information.

                      Format of an AWP file
               Copyright (C) 1987-8 Frank Uzzolino
                       All rights reserved

A. Once upon a line...

A line of an AWP file can have one of two formats, depending on what kind of line it is. Format I is for printer option codes such as NP, PW, and DS. Format II is for an actual line of text.

Format I.

Each line consists of two bytes, a Count byte (1st) and a Code byte (2nd). These are the only two bytes of the line.

Example: Here is a string of bytes in the AWP file:

     0A D9 0C DB 02 E5 00 E7

Which AppleWorks would show as:

     -------Left Margin: 1.0 inches
     -------Characters per Inch: 12 chars
     -------Lines per Inch: 2 lines
     -------Double Space

Following are all the printer option codes. All bytes are in HEX unless otherwise specified. Don't forget!

          Hex Byte                        Hex Byte
Option    1st  2nd   Notes      Option    1st  2nd   Notes
------    --------   ---------- ------    --------   -----------
PW        nn   D8               DS        00   E7
LM        nn   D9               TS        00   E8
RM        nn   DA               NP        00   E9
CI        ##   DB               GB        00   EA
P1        00   DC               GE        00   EB
P2        00   DD               HE        00   EC
IN        ##   DE               FO        00   ED
JU        00   DF               SK        ##   EE
UJ        00   E0               * PN      ##   EF    Page < 256
CN        00   E1               PE        00   F0
PL        nn   E2               PH        00   F1
TM        nn   E3               SM        ##   F2
BM        nn   E4               * PN      ##   F3    Page >= 256
LI        ##   E5               * EOP     ##   F4    Page < 256
SS        00   E6               * EOP     ##   F5    Page >= 256

nn is tenths of inches. (i.e. a line of 50 D8 means a Platen Width (D8) of 80 (decimal) tenths of inches, or 8.0 inches)

## is an actual number. (i.e. 05 DE means Indent (DE) 5 spaces)

Note that the Code byte FOLLOWS the Count byte.
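The Count/Code pairs above lend themselves to a simple table lookup. What follows is only a minimal Python sketch of that idea, not the repair program mentioned earlier. The names OPTION_CODES, decode_option and decode_pairs are made up for illustration, and the table is transcribed from the list above. The treatment of F3 and F5 follows the note on the starred codes (the stored number is the page number minus 256).

```python
# Code byte -> (option mnemonic, kind of Count byte), transcribed from the
# table in this post. Kinds: "tenths" = nn, "number" = ##, "flag" = 00.
OPTION_CODES = {
    0xD8: ("PW", "tenths"), 0xD9: ("LM", "tenths"), 0xDA: ("RM", "tenths"),
    0xDB: ("CI", "number"), 0xDC: ("P1", "flag"),   0xDD: ("P2", "flag"),
    0xDE: ("IN", "number"), 0xDF: ("JU", "flag"),   0xE0: ("UJ", "flag"),
    0xE1: ("CN", "flag"),   0xE2: ("PL", "tenths"), 0xE3: ("TM", "tenths"),
    0xE4: ("BM", "tenths"), 0xE5: ("LI", "number"), 0xE6: ("SS", "flag"),
    0xE7: ("DS", "flag"),   0xE8: ("TS", "flag"),   0xE9: ("NP", "flag"),
    0xEA: ("GB", "flag"),   0xEB: ("GE", "flag"),   0xEC: ("HE", "flag"),
    0xED: ("FO", "flag"),   0xEE: ("SK", "number"), 0xEF: ("PN", "number"),
    0xF0: ("PE", "flag"),   0xF1: ("PH", "flag"),   0xF2: ("SM", "number"),
    0xF3: ("PN", "number"), 0xF4: ("EOP", "number"), 0xF5: ("EOP", "number"),
}


def decode_option(count, code):
    """Return (mnemonic, value) for one two-byte Format I line."""
    if code not in OPTION_CODES:
        raise ValueError("unknown code byte: %02X" % code)
    name, kind = OPTION_CODES[code]
    if kind == "tenths":
        return name, count / 10.0      # tenths of an inch -> inches
    if kind == "number":
        if code in (0xF3, 0xF5):       # page numbers of 256 and up
            return name, count + 256
        return name, count
    return name, None                  # Count byte is 00 and carries no value


def decode_pairs(data):
    """Decode a run of Count/Code pairs (Count byte first, Code byte second)."""
    return [decode_option(data[i], data[i + 1])
            for i in range(0, len(data) - 1, 2)]


# The example bytes from this post:
print(decode_pairs(bytes.fromhex("0AD90CDB02E500E7")))
# -> [('LM', 1.0), ('CI', 12), ('LI', 2), ('DS', None)]
```

The sketch raises an error on any Code byte outside D8 through F5, so it is meant for runs of option lines only.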
* For codes EF, F3, F4 & F5, ## can mean different things. First, EOP is the End of Page marker put in BY AppleWorks when IT calculates page breaks from an oa-p or oa-k. If the page is less than 256, the code is F4 with ## representing the actual page number. If the page is greater than or equal to 256, then the code is F5 with ## being the actual page number minus 256. Likewise for PN: if the page is less than 256, the code is EF and ## is the actual page number; if the page is greater than or equal to 256, the code is F3 and ## is the page number minus 256.

Codes F6 through FF are invalid (as far as I can tell).

A line of:   ## D0   is a blank line with just ## spaces on it.

             FF FF   marks the end of the file.

-------End of Format I.

Format II.

This is a little more tricky. Each AWP line of text consists of four header bytes, followed by the actual text and character formatting codes. Here goes:

         ------ ------ ------ ------ ------ ------ // ------ ------
Offset:    00     01     02     03     04     05        nn     00

Offset 00        This byte is the value of byte 03, plus 2. So if byte 03
                 held a 3C, then this byte would be 3E. This count does NOT
                 include the hi bit setting of byte 03. So if 03 were C2,
                 the actual count would be 42, and this byte would equal 44.

Offset 01        This byte MUST be equal to zero if this line is a valid
                 text line.

Offset 02        This is the number of leading spaces on the line. i.e. if
                 there were 15 blank spaces on this line before the actual
                 text, this byte would be 0F.

Offset 03        This is the actual # of characters in the line, which
                 includes any character formatting. If this line is the
                 last line of the paragraph, then the high bit is set. So
                 if there were 4E characters in the line, then this byte
                 would be CE.

Offset 04 - nn   These are the real bytes of the line.

Offset 00        Start of next line.

Valid line characters:

Any ASCII character in the range from $20 to $7E is a valid character in a Word Processing file. In addition to these bytes, the values $01 to $0C are used as character formatting codes as described below.

     01 - Boldface Begin
     02 - Boldface End
     03 - SuperScript Begin
     04 - SuperScript End
     05 - SubScript Begin
     06 - SubScript End
     07 - Underline Begin
     08 - Underline End
     09 - Print Page Number
     0A - Enter Keyboard
     0B - Sticky Space
     0C - Mail Merge

If Omit Lines is yes, then the category is enclosed by []. If Omit Lines is no, then the category is enclosed by <>.

Auxiliary Type:

The Auxiliary Type field of the directory holds a special code which specifies how the file name is displayed in the AppleWorks list. Since ProDOS allows only capital letters, numbers, and the period in a file name, the spaces and lowercase characters of the AppleWorks name have to be converted to make a valid ProDOS filename.

AUX_TYPE:         hi byte                    lo byte
            -- -- -- -- -- -- --     -- -- -- -- -- -- -- --
Char posn:   9 10 11 12 13 14 15      0  1  2  3  4  5  6  7

If a bit is set (1) in a character position, then there are two possibilities:

     1. Convert a space in the AppleWorks name to a period in the ProDOS file name.
     2. Convert a lowercase character in the AppleWorks name to an uppercase character in the ProDOS file name.

If the bit is clear (0) in a character position, the character is already either a capital letter or a period and does not need converting.

That's all folks...

Coming attractions "In the 'Works'"
     Part II  : As the Record Turns   (ADB file format)
     Part III : Death of a Cellsman   (ASP file format) (maybe)

Oh, by the way, this work is my own, and in no way reflects the views of the University, my parents, or my roommate.
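P.S. For anyone who wants to experiment, here is a minimal Python sketch that pulls Formats I and II together by walking the lines of a file one at a time. It is an illustration under stated assumptions, not the repair program mentioned at the top. The names walk_records and plain_text are made up; the caller is assumed to have already skipped the file header, whose layout is not covered here; a line is treated as text exactly when its second byte is zero; and showing the Sticky Space code ($0B) as an ordinary space is my own choice for display.

```python
def walk_records(body):
    """Yield one tuple per line of an AWP file body (header already removed).

    ("code", count, code)              two-byte Format I line
    ("text", indent, last_line, raw)   Format II line; raw still holds
                                       the $01-$0C formatting codes
    Stops at the FF FF end-of-file marker.
    """
    pos = 0
    while pos + 1 < len(body):
        first, second = body[pos], body[pos + 1]
        if first == 0xFF and second == 0xFF:
            return                              # end of file
        if second != 0x00:
            yield ("code", first, second)       # Count byte, then Code byte
            pos += 2
            continue
        if pos + 4 > len(body):
            raise ValueError("truncated text header at offset %d" % pos)
        indent = body[pos + 2]                  # leading spaces
        length = body[pos + 3] & 0x7F           # count without the hi bit
        last_line = bool(body[pos + 3] & 0x80)  # hi bit: end of paragraph
        if first != length + 2:
            raise ValueError("byte 00 is not byte 03 plus 2 at offset %d" % pos)
        raw = bytes(body[pos + 4:pos + 4 + length])
        if len(raw) != length:
            raise ValueError("truncated text line at offset %d" % pos)
        yield ("text", indent, last_line, raw)
        pos += 4 + length


def plain_text(indent, raw):
    """Drop formatting codes and restore the leading spaces.

    Assumption: Sticky Space ($0B) is shown as a normal space.
    """
    chars = []
    for b in raw:
        if b == 0x0B:
            chars.append(" ")
        elif 0x20 <= b <= 0x7E:
            chars.append(chr(b))
    return " " * indent + "".join(chars)


# One option line (LM 1.0 inches), one 5-byte text line, then FF FF.
sample = bytes([0x0A, 0xD9, 0x07, 0x00, 0x02, 0x85]) + b"H\x01i\x02!" + b"\xff\xff"
for record in walk_records(sample):
    print(record)
```

The walker stops with an error at the first line whose header bytes disagree, which is one way to find where a damaged file goes wrong; a real repair tool would need to decide how to resynchronize after that point.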
-------
**********************************************************************
* Frank Uzzolino                          Penn State University      *
* F3U at PSUVM (preferred)                7 Hamilton Hall            *
* (or)at PSUVMB                           University Park, PA 16802  *
**********************************************************************