FATQW@USU.BITNET (02/03/88)
A R C

ARC/FLST/TREE
An IFF File Archive Format
prepared by Bryan Ford

There are three sections in this document. The first describes the ARC form, the second describes the FLST form, and the third describes the TREE form. Each is independent, but they make more sense when they are put together. It is the almost unlimited nesting and expansion capability of the IFF format that makes this file format possible. Thanks EA!

An archive file is made up of zero or more ARC chunks, with FLST chunks as their "children." In other words, the ARC chunks are the tree and branches, while the FLST chunks are the "leaves". Also, a TREE may be included at the beginning of an ARC to provide information about the archive which would normally require seeking through the entire archive. In addition, archives may be spread across multiple disks or other media for backup purposes.

Section 1: FORM ARC - Archive

The ARC form is a form for collecting more than one file into one file. It can also specify subdirectories to be created before it is unarced, and it can contain nested FORM ARCs as well as FLSTs. A TREE chunk may be put at the beginning of an ARC to describe the content of the file, so that a reader does not have to scan the entire archive.

SBDR - Subdirectory. This chunk contains the same information as the SPEC chunk in the FORM FLST, specifying a subdirectory for this ARC to be unarced into. If the specified subdirectory does not already exist, the unarcing program will create it in the directory specified by the user (or the directory that a parent ARC was unarced into). However, unarcing programs should be able to override this and unarc into the directory the user specified, ignoring this chunk. This chunk may or may not be included in the top ARC form, but it MUST be included in all sub-ARCs.

SORT - Sorting. This chunk tells the unarchiver how the files are sorted.
It contains one word of data specifying the sorting used: 0=no sorting, 1=filename, 2=date, 3=size, 4=sorting algorithm used, 5=percentage of space freed by compression. If there is no SORT chunk present, the file is assumed to be unsorted. This must ALWAYS reflect the truth. For example, if an unarchiver finds an archive with a SORT of 1 and it's looking for A*.*, it need not look any further once it reaches a B or whatever. Also remember that this sorting applies to this ARC and all sub-ARCs within this one, but no others.

ANAM - Archiver name. This contains a null-terminated string giving the name of the program that created this archive. This chunk, if included at all, should only be included in the top-level ARC chunk. It is provided only for the benefit of the user. Unarchivers will simply ignore this chunk, except to print its contents at the user's request. Whether an archiver writes this chunk is up to the programmer.

NEXT - Continuation chunk. This chunk is provided in order to split archives up over several volumes for backup purposes. It is used in combination with the PREV chunk. It holds a null-terminated string giving the complete path to the next continuation file. Each archive file is completely separate, except that they are linked by the NEXT and PREV chunks in the FORM ARC. Individual files may also be split between archives, in which case more than one archive is needed to extract a single file.

PREV - Previous chunk. This chunk points to the previous archive file in the chain. The NEXT chunk in the file pointed to by this PREV points to this file. In other words, the files form a doubly-linked list.

FORM TREE - The archive tree. There may be only one of these in any archive, although it's not required. If included, it should be in the top-level ARC, and there should be one for every archive in a chain (see NEXT), describing the files it contains.
They are provided to speed up access to the files in the archive by providing useful information at the beginning. This FORM must come before any ARCs or FLSTs. The format for the TREE is described in section 3. (Also from Miles Johnson)

FORM ARC - A child ARC. An ARC may contain other ARCs. This is useful for "sub-archives" which, when unarced, go into various directories automatically. A child ARC doesn't necessarily need a SBDR chunk, but it makes little sense otherwise.

FORM FLST - Files. These chunks contain the actual files which make up the archive. An archive need not contain any, and in some cases leaving them out is useful. For example, an archive may want to create several subdirectories which contain files, but put no files in the root. As another example, you may want a child ARC to be completely empty except for a SBDR chunk - for example a "saved games" directory without any saved games. In other words, this is useful for creating directories to be used later without putting any files in them.

Section 2: FORM FLST - File

These FORMs contain files, and make up the archive "meat". Each may contain several of the chunks described below. They may be split up into multiple parts for data recovery purposes, as well as split over volumes to facilitate making backups.

SPEC - Filespec. This chunk contains one longword giving the length of the decoded image, a three-longword DateStamp (days, mins, ticks), a longword for the protection bits, and a null-terminated filename. This is the only required chunk in the FORM FLST. The date is not changed by archiving or unarcing a file; only modifying the file changes the date.

CMNT - Comment. This is a null-terminated string containing any comment for the file. It should never be null - if the file doesn't have a comment, then this chunk shouldn't appear.

SPLT - File splitting. This chunk is only required if the file has been split in any way.
If it is not included, the default is zero for the first and third words, and one for the second (middle). It contains three words of data: the number of BODY chunks in previous archive files (denoted by the PREV chunk in the top ARC), the number of BODY chunks in this archive file, and the number of BODY chunks in archive files after this one (denoted by the NEXT chunk in the top ARC). This is actually a replacement for the older SECT chunk. It is used to split files into smaller, more manageable parts, and to split files between volumes for backup purposes.

If the first and third words are zero, this file has been split up, but not between volumes. If one or both are nonzero, this file has been split up over one or more separate archive files, so to extract it an unarchiver will have to access more than one archive. If the first word is nonzero, this file must be the first in this archive. If the last word is nonzero, it must be the last in the archive. Both are possible at the same time. Together the three words indicate how many BODY chunks come before this archive, in it, and after it. The unarchiver should use the NEXT and PREV chunks in the top ARC to find the other parts of this file.

Note: If a file is split up over several archive files, the same header chunks MUST be exactly duplicated in all the parts. This includes NAME, LEN, CMNT, PSWD, DATE, and all other header chunks except those used for controlling the splitting.

Splitting up files even within a single archive can be very valuable for data recovery. If part of a file is munged, the rest of the file may be salvaged. For example, an archiving program can break up the file into multiple compression chunks, and include a SPLT chunk with the number of compression chunks. Each compression chunk will contain, say, one page of data. Archivers should try to be intelligent when they split a file. For example, if it's a text file, split it after each page break. If it's an IFF file, split it between the chunks.
If one part is bad, the rest will still be put together, so the user might get part of the file back.

PROT - File protection. Warning: Hot subject. This chunk signifies to an unarchiver that this file is password protected, and that it should let the user enter a password and then decode the file according to the password entered. The password is not actually stored anywhere in the file, so hackers can't write programs which simply look through the archive and print the passwords. Crackers have to "crack". Anyway, this contains one longword, which is an identifier for the type of encoding used, optionally followed by data which applies to the particular encoding method selected. When the user enters a password, the unarchiver will decode the file according to the password entered - if the password is wrong, the file will simply be decoded as garbage. Passwords must be mapped to uppercase by archivers and unarchivers. As of yet, I don't know of any specific methods of encrypting files, so it's up to you guys to fill in the blanks!

CRC - CRC check. This chunk was modified from its original definition to accommodate multiple program sections. The chunk contains as many words of data as there are BODY chunks in this archive file - one CRC for each section. Note that this includes only the BODY chunks within this archive file, and not any that are in other archives, if the file is split between volumes. If there are too many CRC words, an unarchiver will ignore the extras. If there are too few, the unarchiver will check only the sections with CRCs supplied, and possibly give a warning to the user. If there is no CRC chunk, no checking will be done.

FORM ILBM - Icons. This is a standard ILBM picture form describing an icon for this file, if any. This form may also have an ARC-specific property chunk, SPEC, which has exactly the same format as the SPEC chunk described above, but without the filename. It contains the date, protection, etc. for the icon file.
Also, a CMNT chunk may appear in the icon chunk which is the comment for the .info file.

BODY - These chunks comprise the actual data of the file, or the data of one section of the file. There will be only one BODY chunk unless the file is broken up into sections (see SPLT for details). This chunk always contains one longword at the beginning, which is an identifier for the compression format, and the compressed bytes of the file. The actual data of the file will depend on the compression used. The predefined compression algorithms are defined below.

BODY/STOR - Storage without compression. This is usually used for very small files which would not gain anything in compression. The chunk's data is an exact duplicate of what will go into the file.

BODY/PACK - Packing. This algorithm is the simplest, and simply sticks repetitious bytes together.

BODY/LZIV - Lempel-Ziv encoding. This contains one byte which tells the "number of bits" used, followed by the data. Typically 12 (crunching) or 13 (squashing) bits. As of yet, I don't have the docs for this format, but as soon as I get them, I'll include them in a different doc file.

BODY/HUFF - Huffman encoding. I don't have any docs for this one either, so I welcome any mail containing docs on this.

BODY/TPAK - Text packing. This is my own format which I'm still working on. It will be specifically for documents and other human-type text. It will be able to crunch large documents down by a huge amount, but small ones won't do so well. It only works with text, as the 7th bit gets stripped, and it won't handle "words" more than 255 characters in length. Comments welcome as soon as I finish and post it, but not until then. :-)

When better compression algorithms come out, they may be added to the BODY specification. However, remember that old archivers won't be able to use new compression formats. Also, archiving programs aren't required to analyze files to make sure they're getting the maximum efficiency.
A user may want to just always use 12-bit Lempel-Ziv encoding, since it's the most often used. Archivers probably should have an option to disable analyzing the file.

Section 3: FORM TREE - Directory and file trees

This form may appear only in a top-level ARC. When used in an archive, it describes the content of that archive, and makes scanning the archive much faster. It may also be useful alone for creating directory trees. The chunks that apply specifically to ARCs are noted.

DRNM - Directory name. This contains a null-terminated string giving the name of this volume or directory. It must appear before any other chunks in the FORM TREE. It is required for all sub-trees, but not required (although recommended) for the top-level TREE.

FILE - Filespec. This chunk contains one longword giving the length of the file, followed by a null-terminated string which names a file in this directory or volume. It may be followed by any of the following "modifier" chunks which describe the file in more detail.

POS - Position. This chunk is valid only when used in archive files. It contains one longword of data, which is the absolute offset from the beginning of this archive file at which the FORM FLST for this file will be found. It may also be used after subdirectory descriptions (FORM TREEs), where it tells where in the archive file the corresponding FORM ARC will be found. Note that if there is any suspicion of data corruption, this should NOT be used by an unarchiver, since it uses absolute references into the file. Also, the user should have the option to disable both using these chunks when found, and writing them to files.

FORM TREE - Subdirectories. This describes a subdirectory within this directory. The name can be gotten from the DRNM chunk within this FORM.

Although this document is not copyrighted or anything, please don't redistribute it widely. This is because it's only a draft, and it will probably get changed, and we want EVERYONE to have the same thing.
So if you decide to send it to local BBSs or something, please take the responsibility of updating them too. Thanks.

Please feel free to email suggestions for this file. If someone would like to volunteer to cross-post this discussion to BIX, I would appreciate it, since I don't have access to BIX. And if anybody has the docs for Lempel-Ziv and Huffman encoding, or any other "interesting" formats, I'd appreciate it in my mailbox (address below).

History

date      author              changes
--------  ------------------  ---------------------------------------------
????      Bryan Ford          Gave birth to this file
01/08/88  Bryan Ford          +ANAM SECT, ~CRC
01/17/88  Bryan Ford          BODY=STOR+CRNC+PACK+SQEZ+SQSH, :-)8
01/20/88  Bryan Ford          +FORM_TREE NEXT PREV CMNT DATE, SPLT=~SECT, :-)8
01/22/88  Bryan Ford          SPEC=NAME+LEN, -LEVL, +PROT, BODY_HUFF=BODY_SQEZ
                              BODY_LZIV=BODY_CRNC+BODY_SQSH, +AICN
01/26/88  Bryan Ford          SPEC=SPEC+DATE, ~AICN
02/02/88  Bryan Ford          ~SPEC, ~SBDR, ~FORM_ILBM=~AICN, +BODY_TPAK, +SORT, :-)8
                               .   If you want to know what this means, it's a shorthand
                              /|\  for English which I created - I can send mail to
                               |   curious people.   -Bryan
                               |

THE END

Bryan Ford                     /////  A computer does what      \\\\\
Snail: 1790 East 1400 North   /////   you tell it to do, not     \\\\\
       Logan, UT 84321        \\\XX/// what you want it to do.   \\\XX///
Email: USU@FATQW.BITNET        \XXXX/  Murphy's Law Calendar 1986 \XXXX/
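
As a rough illustration of the SPEC and SPLT layouts from Section 2, here is a minimal Python sketch of how an unarchiver might decode them. It is not part of the draft. The function names parse_spec and parse_splt are hypothetical, and the sketch assumes that all numbers are big-endian (as is usual for IFF), that a "word" is 16 bits and a "longword" 32 bits, that the fields are unsigned, and that the 8-byte IFF chunk header has already been stripped from the data passed in.

```python
import struct

def parse_spec(data):
    """Decode the data portion of a SPEC chunk (assumed layout, see above)."""
    # Five longwords: decoded length, DateStamp (days, mins, ticks), protection bits.
    length, days, mins, ticks, prot = struct.unpack_from(">5L", data, 0)
    # The filename follows the 20 bytes of longwords and ends at the first NUL.
    end = data.index(b"\0", 20)
    return {
        "length": length,
        "date": (days, mins, ticks),
        "protection": prot,
        "name": data[20:end].decode("latin-1"),
    }

def parse_splt(data=None):
    """Return (BODYs before, BODYs here, BODYs after) for a FORM FLST."""
    if data is None:
        # No SPLT chunk: zero for the first and third words, one for the middle.
        return (0, 1, 0)
    # Three 16-bit words (the word size is an assumption).
    return struct.unpack_from(">3H", data, 0)

# Build a sample SPEC payload by hand and decode it again.
sample = struct.pack(">5L", 1234, 3650, 600, 0, 0) + b"readme.txt\0"
spec = parse_spec(sample)
before, here, after = parse_splt()
spans_volumes = before != 0 or after != 0
```

With no SPLT chunk present, spans_volumes comes out false, matching the rule that a file without SPLT sits in a single BODY chunk in a single archive file.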