FATQW@USU.BITNET (01/24/88)
A R C

ARC/FLST/TREE - An IFF File Archive Format
by Bryan Ford, plus lots of other people. Thanks for your comments!

There are three sections in this document. The first describes the ARC form, the second describes the FLST form, and the third describes the TREE form. Each is independent, but they make more sense when they are put together. It is the almost unlimited nesting and expansion capability of the IFF format that makes this file format possible. Thanks, EA!

An archive file is made up of zero or more ARC forms, with FLST forms as their "children." In other words, the ARC forms are the tree and branches, while the FLST forms are the "leaves." A TREE may also be included at the beginning of an ARC to provide information about the archive which would otherwise require seeking through the entire archive. In addition, archives may be spread across multiple disks or other media for backup purposes.

Section 1: FORM ARC - Archive

The ARC form collects more than one file into one file. It can also specify subdirectories to be created before its contents are unarced, and it can contain nested FORM ARCs as well as FLSTs. A FORM TREE may be put at the beginning of an ARC to describe the contents of the file without having to scan the entire archive.

SBDR - Subdirectory. This chunk contains a null-terminated string specifying a subdirectory for this ARC to be unarced into. If the specified subdirectory does not already exist, the unarcing program will create it in the directory specified by the user (or the directory that a parent ARC was unarced into). However, unarcing programs should be able to override this and unarc into the user's chosen directory, ignoring the SBDR chunk. This chunk may or may not be included in the top-level ARC form, but it MUST be included in all sub-ARCs.

ANAM - Archiver name. This contains a null-terminated string giving the name of the program that created this archive.
This chunk, if included at all, should only be included in the top-level ARC form. It is provided only for the benefit of the user: unarchivers will simply ignore it, except to print its contents at the user's request. Archivers may include this chunk or not, at the programmer's discretion.

NEXT - Continuation chunk. This chunk is provided in order to split archives across several volumes for backup purposes, and is used in combination with the PREV chunk. It contains a null-terminated string giving the complete path to the next continuation file. Each archive file is completely separate, except that the files are linked by the NEXT and PREV chunks in the FORM ARC. Individual files may be split between archives, making it necessary to have access to more than one archive in order to extract one file.

PREV - Previous chunk. This chunk points to the previous archive file in the chain. The NEXT chunk in the file pointed to by this PREV points back to this file. In other words, the files form a doubly-linked list.

FORM TREE - The archive tree. There may be at most one of these in any archive file, and it is not required. If included, it should be in the top-level ARC, and there should be one in every archive file of a chain (see NEXT), describing the files that file contains. Trees speed up access to the files in the archive by providing useful information at the beginning. This FORM must come before any ARCs or FLSTs. The format of the TREE is described in section 3. (Also from Miles Johnson)

FORM ARC - A child ARC. An ARC may contain other ARCs. This is useful for "sub-archives" which, when unarced, go into various directories automatically. A child ARC doesn't strictly need a SBDR chunk, but it makes little sense without one.

FORM FLST - Files. These forms contain the actual files which make up the archive. They are not required in an archive, and in some cases leaving them out is useful.
For example, an archive may want to create several subdirectories which contain files, but no files in the root. Or you may want a child ARC to be completely empty except for a SBDR chunk, for example a "saved games" directory without any saved games. In other words, this is useful for creating directories to be used later without putting any files in them.

Section 2: FORM FLST - File

These FORMs contain the files and make up the "meat" of the archive. Each may contain several of the chunks described below. A file may be split into multiple parts for data recovery purposes, as well as split across volumes to make backups easier.

SPEC - Filespec. This chunk contains a longword giving the length of the decoded image, followed by a null-terminated filename. This is the only required chunk in the FORM FLST.

CMNT - Comment. This is a null-terminated string containing any comment for the file. It should normally be shorter than about 40 characters.

DATE - Last modification date. Adding, extracting, or copying does not change this date; only changing the file's data does. The date is a standard AmigaDOS date consisting of three longwords. The first is the number of days elapsed since January 1, 1978. The second is the number of minutes elapsed since midnight of that day. The third is the number of "ticks" since the beginning of that minute, where a tick is 1/50th of a second. Not required, but recommended.

SPLT - File splitting. This chunk is only required if the file has been split in any way. It contains three words of data: the number of BODY chunks in previous archive files (found via the PREV chunk in the top ARC), the number of BODY chunks in this archive file, and the number of BODY chunks in later archive files (found via the NEXT chunk in the top ARC). If it is not included, the defaults are zero for the first and third words and one for the second (middle) word.
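For implementers, here is a minimal Python sketch of reading the header chunks of a FORM FLST and decoding the DATE and SPLT chunks described above. It is not part of this proposal. It assumes the usual IFF layout: a four-character ID, a big-endian 32-bit length, the data, and a pad byte after odd-length data. It also assumes that "words" are 16-bit big-endian and "longwords" 32-bit big-endian. The helper names (make_chunk, flst_chunks, find_chunk, read_splt, read_date, spans_volumes) are hypothetical. spans_volumes applies the rule stated for SPLT that a nonzero first or third word means the file continues in other archive files. Since IFF IDs are four characters, three-letter names such as CRC and POS would presumably be stored space-padded ("CRC ", "POS "). This sketch does not need them.

```python
import struct
from datetime import datetime, timedelta

AMIGA_EPOCH = datetime(1978, 1, 1)  # day 0 of an AmigaDOS date
SPLT_DEFAULT = (0, 1, 0)            # used when a FORM FLST has no SPLT chunk


def make_chunk(chunk_id, payload):
    """Build one IFF chunk: 4-byte ID, big-endian length, data, pad byte."""
    pad = b"\0" if len(payload) % 2 else b""
    return struct.pack(">4sI", chunk_id.encode("ascii"), len(payload)) + payload + pad


def flst_chunks(form):
    """Yield (chunk_id, payload) for each chunk inside a FORM FLST."""
    form_id, length, form_type = struct.unpack(">4sI4s", form[:12])
    if form_id != b"FORM" or form_type != b"FLST":
        raise ValueError("not a FORM FLST")
    pos, end = 12, 8 + length  # the FORM length counts the type ID and contents
    while pos + 8 <= end:
        chunk_id, size = struct.unpack(">4sI", form[pos:pos + 8])
        yield chunk_id.decode("ascii"), form[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # odd-length chunks carry one pad byte


def find_chunk(form, wanted):
    """Payload of the first chunk with ID `wanted`, or None if absent."""
    for chunk_id, payload in flst_chunks(form):
        if chunk_id == wanted:
            return payload
    return None


def read_splt(form):
    """(BODYs in earlier files, BODYs in this file, BODYs in later files)."""
    payload = find_chunk(form, "SPLT")
    return SPLT_DEFAULT if payload is None else struct.unpack(">HHH", payload[:6])


def spans_volumes(splt):
    """True if parts of the file live in other archive files of the chain."""
    return splt[0] != 0 or splt[2] != 0


def read_date(form):
    """Decode DATE (days since 1978-01-01, minutes, 1/50 s ticks), or None."""
    payload = find_chunk(form, "DATE")
    if payload is None:
        return None
    days, minutes, ticks = struct.unpack(">III", payload[:12])
    return AMIGA_EPOCH + timedelta(days=days, minutes=minutes, seconds=ticks / 50)


# Usage: a FLST for "Test" (1234 bytes decoded) split 2/3/0 across volumes.
spec = make_chunk("SPEC", struct.pack(">I", 1234) + b"Test\0")
date = make_chunk("DATE", struct.pack(">III", 3652, 90, 100))
splt = make_chunk("SPLT", struct.pack(">HHH", 2, 3, 0))
form = make_chunk("FORM", b"FLST" + spec + date + splt)
print(read_splt(form), spans_volumes(read_splt(form)))  # (2, 3, 0) True
print(read_date(form))  # 1988-01-01 01:30:02
```

Missing chunks fall back to the defaults given above. No SPLT means (0, 1, 0), and a missing DATE yields None rather than a guessed date.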
This is actually a replacement for the older SECT chunk. It is used to split files into smaller, more manageable parts, and to split files between volumes for backup purposes. If the first and third words are zero, the file has been split up (if at all) only within this archive file, not between volumes. If one or both is nonzero, the file has been split over more than one archive file, so to extract it an unarchiver will have to access more than one archive. If the first word is nonzero, this file must be the first in this archive; if the third word is nonzero, it must be the last in the archive. Both are possible at the same time. The three words indicate how many BODY chunks come before this archive file, in it, and after it. The unarchiver should use the NEXT and PREV chunks in the top ARC to find the other parts of the file.

Note: If a file is split over several archive files, the same header chunks MUST be exactly duplicated in all the parts. This includes SPEC, CMNT, PROT, DATE, and all other header chunks except those used for controlling the splitting.

Splitting up files even within a single archive can be very valuable for data recovery: if part of a file is munged, the rest of the file may be salvaged. For example, an archiving program can break the file into multiple compressed BODY chunks, each holding, say, one page of data, and include a SPLT chunk with the number of BODY chunks. Archivers should try to be intelligent when they split a file. For example, split a text file after each page break, and split an IFF file between its chunks. If one part is bad, the rest can still be put together, so the user might still get part of the file back.

PROT - File protection. Warning: hot subject. This chunk signals to an unarchiver that this file is password protected, and that it should let the user enter a password and then decode the file according to the password entered.
The password is not actually stored anywhere in the file, so hackers can't write programs which simply look through the archive and print the passwords; crackers have to actually "crack." The chunk contains one longword, an identifier for the type of encoding used, optionally followed by data which applies to the selected encoding method. When the user enters a password, the unarchiver decodes the file according to that password; if the password is wrong, the file is simply decoded as garbage. Passwords must be mapped to uppercase by archivers and unarchivers. As yet, I don't know of any specific methods of encrypting files, so it's up to you guys to fill in the blanks!

CRC - CRC check. This chunk was modified from its original definition to accommodate files split into multiple sections. It contains as many words of data as there are BODY chunks in this archive file: one CRC for each section. Note that this covers only the BODY chunks within this archive file, and not any in other archive files if the file is split between volumes. If there are too many CRC words, an unarchiver will ignore the extra ones. If there are too few, the unarchiver will check only the sections with CRCs supplied, and possibly warn the user. If there is no CRC chunk, no checking is done.

AICN - Amiga Icon. This chunk is specifically for the Amiga. It stores icon information for the file, in exactly the same format as .info files. Its purpose is to package icons with programs without needing two entries for every file. The advantage over storing raw .info files is that when the user gets an archive listing, he sees something like "Test (with icon)" instead of "Test" and "Test.info".

BODY - These chunks contain the actual data of the file, or the data of one section of the file. There will be only one BODY chunk unless the file is broken up into sections (see SPLT for details).
Each BODY chunk begins with one longword identifying the compression format, followed by the compressed bytes of the file. The rest of the data depends on the compression used. The predefined compression algorithms are listed below.

BODY/STOR - Storage without compression. This is usually used for very small files which would not gain anything from compression. The chunk's data is an exact duplicate of what will go into the file.

BODY/PACK - Packing. This algorithm is the simplest: it just collapses runs of repeated bytes.

BODY/LZIV - Lempel-Ziv encoding. This contains one byte giving the number of bits used, typically 12 (crunching) or 13 (squashing), followed by the data. As yet, I don't have the docs for this format, but as soon as I get them, I'll include them in a separate doc file.

BODY/HUFF - Huffman encoding. I don't have any docs for this one either, so I welcome any mail containing docs on it.

When better compression algorithms come out, they may be added to the BODY. However, these will NOT be forward compatible: programs which support them will not be compatible with programs which don't. Also, archiving programs aren't required to analyze files to make sure they're getting the maximum efficiency. A user may want to just always use 12-bit Lempel-Ziv encoding, since it's the most often used. Archivers probably should have an option to disable analyzing the file.

Section 3: FORM TREE - Directory and file trees

This form may appear only in a top-level ARC. When used in an archive, it describes the contents of that archive and makes scanning the archive much faster. It may also be useful on its own for creating directory trees. The chunks that apply specifically to ARCs are noted.

DRNM - Directory name. This contains a null-terminated string giving the name of this volume or directory. It must appear before any other chunks in the FORM TREE.
It is required for all sub-trees, but not required (although recommended) for the top-level TREE.

SPEC - Filespec. This chunk contains one longword giving the length of the file, followed by a null-terminated string naming a file in this directory or volume. It may be followed by any of the following "modifier" chunks, which describe the file in more detail.

POS - Position. This chunk is valid only in archive files. It contains one longword of data: the absolute offset from the beginning of this archive file at which the FORM FLST for this file will be found. It may also follow subdirectory descriptions (FORM TREEs), where it tells where in the archive file the corresponding FORM ARC will be found. Note that if there is any suspicion of data corruption, an unarchiver should NOT use this chunk, since it relies on absolute references into the file. Also, the user should have the option to disable both using these chunks when found and writing them to files.

FORM TREE - Subdirectories. This describes a subdirectory within this directory. The name can be found in the DRNM chunk within this FORM.

One final note: there is no requirement to sort archived files in any way, although archivers may want to sort them for the sake of the user.

Although this document is not copyrighted or anything, please don't redistribute it widely. It's only a draft, it will probably change, and we want EVERYONE to have the same version. So if you decide to send it to local BBSs or something, please take the responsibility of updating them too. Thanks.

Please feel free to email suggestions for this file, as well as post to Usenet. I don't have access to BIX, so although there is no requirement that you keep it to Usenet, I won't be able to respond on BIX unless somebody does some cross-posting for me.
Oh, and if anybody has the docs for Lempel-Ziv and Huffman encoding, or any other "interesting" formats, I'd appreciate them in my mailbox (address below).

History

date      author              changes
--------  ------------------  ---------------------------------------------
????      Bryan Ford          Gave birth to this file
01/08/88  Bryan Ford          +ANAM SECT, ~CRC
01/17/88  Bryan Ford          BODY=STOR+CRNC+PACK+SQEZ+SQSH, :-)8
01/20/88  Bryan Ford          +FORM_TREE NEXT PREV CMNT DATE, SPLT=~SECT, :-)8
01/22/88  Bryan Ford          SPEC=NAME+LEN, -LEVL, +PROT, BODY_HUFF=BODY_SQEZ
                              BODY_LZIV=BODY_CRNC+BODY_SQSH, +AICN

If you want to know what the "changes" column means, it's a shorthand for English which I created; I can send mail to curious people. -Bryan

THE END

Bryan Ford
Snail: 1790 East 1400 North
       Logan, UT 84321
Email: USU@FATQW.BITNET

"A computer does what you tell it to do, not what you want it to do."
    - Murphy's Law Calendar 1986
bryce@hoser.berkeley.edu (Bryce Nesbitt) (01/24/88)
In article <8801240527.AA15462@jade.berkeley.edu> FATQW@USU.BITNET writes:
>
> ...An IFF File Archive Format...
>
>...AICN - Amiga Icon. This chunk is specifically for the Amiga. It stores
>icon information for the file. It has exactly the same format as the .info
>files....

Please no! It is the same format as a "Disk Object", as retrieved by the "GetDiskObject()" library call. The de-arcer must put this back with the "PutDiskObject()" library call. ".info" files are merely a side effect of the current Workbench implementation. The only defined interface for making icons is "PutDiskObject()". This is all in the "icon.library". See the "Workbench" chapter in the RKM.

|\ /|  . Ack! (NAK, SOH, EOT)
{o O}  . bryce@hoser.berkeley.EDU -or- ucbvax!hoser!bryce (or try "cogsci")
 (")
  U    "As an engineer, I only set the value of a product... not the cost."
       -Bryce Nesbitt