haitex@pnet01.cts.com (Wade Bickel) (09/19/88)
cmcmanis%pepper@Sun.COM (Chuck McManis) writes:
>First what was your misunderstanding? IFF can be parsed fairly readily by
>most "types" of parsers primarily because its grammar is self-consistent.
>
>-> So the question I wish to pose is; Would you (the Amiga community)
>->reject a re-design of the current IFF standard?
>
>Yes if you decided to redesign it simply because you blew it while reading
>the documents. Please, don't be offended but there are already "other"...
>
>Please, let us know what the "flaw" is first and *then* ask us if we want

        Ok, here goes...

--------

        The problem with the current IFF is that it is not generic.  To be
more specific, a FORM specifier is not a chunk per se.  Under EA's
definition, an ILBM is defined as:

        +-----------------------------------+
        | 'FORM' size                       |
        +-----------------------------------+
        | 'ILBM'                            |
        +-----------------------------------+
        | +-------------------------------+ |
        | | 'BMHD' size                   | |
        | |  data                         | |
        | +-------------------------------+ |
        | | 'CMAP' size                   | |
        | |  data                         | |
        | +-------------------------------+ |
        |  pad bytes (if needed)            |
        +-----------------------------------+
        | 'BODY' size                       |
        |  data                             |
        +-----------------------------------+

        The difficulty is that the 'ILBM' specifier is a special case: it
has no size specifier.  This wreaks havoc on a generic parser.  It also
results in a nesting depth limitation (ie: BMHD cannot contain chunks).

        Another problem is that no bad-chunk management is done.  If any
chunk is bad, the whole file is bad.  Why not make a reasonable effort to
retain the valid chunks?  If the CMAP is messed up do we really need to
throw away the BODY?  Recovering the CMAP would, in many cases, take but
minutes using a tool such as Doug's Color Commander (Seven Seas' Software),
whereas an artist might lose hours of careful manipulation of the BODY.  By
allocating a bit in each chunk header this can be easily accommodated.

        Another problem is that there are no dirty chunk provisions.
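[For reference, the chunk layout under discussion -- a 4-byte ID, a
big-endian longword size, the data, and a pad byte when the size is odd --
can be sketched in C.  This is an illustrative sketch only; read_be32 and
skip_chunk are invented names, not part of any IFF library.]

```c
#include <stddef.h>

/* IFF sizes are stored big-endian; assemble one from four bytes. */
static unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Advance past one IFF '85 chunk starting at 'off': 4-byte ID, 4-byte
 * size, 'size' data bytes, plus a pad byte if the size is odd.
 * Returns the offset of the next chunk header. */
static unsigned long skip_chunk(const unsigned char *buf, unsigned long off)
{
    unsigned long size = read_be32(buf + off + 4);
    return off + 8 + size + (size & 1);
}
```

[Note how a generic scanner built on skip_chunk works for every ordinary
chunk but breaks at the bare 'ILBM' type ID, which has no size field --
exactly the special case complained about above.]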
I feel that dirty chunk tracking would be a valuable option.  Dirty chunks
would occur when, after finding some recognized chunks, unrecognized chunks
are encountered.  IFF '85 discards these chunks.  I propose that, as a user
option, unrecognized chunks be retained when a program modifies a partially
understood IFF '88 file.  This can be easily achieved by allocating two bits
in each chunk header.  When unrecognized chunks are written they're marked
as dirty, and any chunks which have been modified are also noted.  This
would allow programs with new or proprietary chunks to be made more
compatible with existing programs (certain paint programs come to mind...).

        { BTW: I got the idea for the need for dirty chunk handling from
Carolyn Scheppner, so don't tell me I'm off the wall on this one; I just
happen to agree with her and offer this as one solution.  I'm very open to
any better solutions. }

        In IFF '88 a LONGWORD (ie: 32 bits) would be included at the top of
all chunks to maintain the "status" of the chunk.  Consider the following
proposed IFF '88 format:

        +-----------------------------------+
        | 'FORM' size,status                |
        | +-------------------------------+ |
        | | 'ILBM' size,status            | |
        | | +---------------------------+ | |
        | | | 'BMHD' size,status        | | |
        | | |  data                     | | |
        | | +---------------------------+ | |
        | | | 'CMAP' size,status        | | |
        | | |  data                     | | |
        | | +---------------------------+ | |
        | | | 'BODY' size,status        | | |
        | | |  data                     | | |
        | | +---------------------------+ | |
        | +-------------------------------+ |
        +-----------------------------------+

        (Pad bytes not shown, but considered added at the end of any
odd-byte-length chunk; a checksum is assumed included at the end of each
chunk as well.)

        This format allows a generic parser to recognize 'FORM' and 'ILBM'
as just another chunk type.  More importantly, it allows a much simpler
parser design that is also much more versatile.  It is entirely possible to
place chunks within ANY chunk type.
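[The three flags envisioned for the proposed status LONGWORD ("bad",
"dirty", and "modified") might be carved out of it as bit masks.  The bit
positions below are invented purely for illustration; the proposal does not
assign them.]

```c
/* Hypothetical bit assignments within the proposed 32-bit chunk status. */
#define CKF_BAD      (1UL << 0)  /* chunk failed its checksum            */
#define CKF_DIRTY    (1UL << 1)  /* chunk carried along unrecognized     */
#define CKF_MODIFIED (1UL << 2)  /* chunk contents have been changed     */

/* Under the bad-chunk proposal, a reader salvages any chunk not marked
 * bad instead of discarding the whole file. */
static int chunk_salvageable(unsigned long status)
{
    return (status & CKF_BAD) == 0;
}
```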
Thus data structures such as B-Trees are easily and efficiently supported.
Example:

 +-----------------------------------------------------+
 | 'FORM' size,status                                  |
 | +-------------------------------------------------+ |
 | | '23BT' size,status                              | |
 | | +---------------------------------------------+ | |
 | | | 'NODE' size,status                          | | |
 | | | +-----------------------------------------+ | | |
 | | | | 'NDAT' size,status                      | | | |
 | | | |  data                                   | | | |
 | | | +-----------------------------------------+ | | |
 | | | | 'NODE' size,status                      | | | |
 | | | | +-------------------------------------+ | | | |
 | | | | | 'NDAT' size,status                  | | | | |
 | | | | |  data                               | | | | |
 | | | | +-------------------------------------+ | | | |
 | | | | | 'NODE' size,status                  | | | | |
 | | | | | +---------------------------------+ | | | | |
 | | | | | | 'NDAT' size,status              | | | | | |
 | | | | | |  data                           | | | | | |
 | | | | | +---------------------------------+ | | | | |
 | | | | | | NODEs, etc. etc. etc...         | | | | | |
 | | | | | |                                 | | | | | |
 | | | | | |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^| | | | | |
 | | | | | +---------------------------------+ | | | | |
 | | | | +-------------------------------------+ | | | |
 | | | +-----------------------------------------+ | | |
 | | | | 'NODE' size,status                      | | | |
 | | | | +-------------------------------------+ | | | |
 | | | | | {NDAT and 3 NODEs...etc., etc.}     | | | | |
 | | | | +-------------------------------------+ | | | |
 | | | +-----------------------------------------+ | | |
 | | | | 'NODE' size,status                      | | | |
 | | | | +-------------------------------------+ | | | |
 | | | | | {NDAT and 3 NODEs...etc., etc.}     | | | | |
 | | | | +-------------------------------------+ | | | |
 | | | +-----------------------------------------+ | | |
 | | +---------------------------------------------+ | |
 | +-------------------------------------------------+ |
 +-----------------------------------------------------+

        Among other things, this format would support quicker searches of
the file for a specific node, since nodes can be searched in a true
tree-like fashion.  However, this is not the point of the change.

        What I really want to do is create a purely data-driven mechanism,
as opposed to the code-driven one in the current IFF.  Rather than having to
write code to handle each type of occurrence, a structure would be
initialized at run time, and this would be passed to the Reader or Writer
parser to be handled.  In this way it would never be necessary to update the
Library(s).

        The following is a document specifying how the system is to work.

=============================================================================

                     Conceptual Design Specification
                    ---------------------------------

        Like its predecessor, IFF '88 is a recursive-descent parser design.
The primary difference between the old design and the new one is that while
IFF '85 was code-driven, IFF '88 is data-driven.  Whereas IFF '85
readers/writers require re-compilation of the source to accommodate format
updates, IFF '88 will not.  IFF '88 also incorporates a more natural
recursively nested format.

        Basically, IFF '88 will consist of a number of libraries.  In the
simplest scheme there would be two libraries, one containing two parsers
(read and write) and the other containing support routines.  In a more
complex scheme five libraries would be created: one for each parser, one for
each set of related support routines, and the fifth for routines shared by
both the reader and writer libraries.

        To use IFF '88 the developer will initialize a control structure (a
list of nodes) which will be used to read/write the files.
Effectively, your program will write a program, which will be used to write
or read the desired file.  Initialization of the data structures will be
simplified with routines provided in the support libraries.  Defining a
control structure will be achieved through calls much like those used to
initialize Intuition menu structures, with which most of us are quite
familiar.

        The IFF '88 parser design is generic and performs no error checking
on the validity of the control structure it is passed.  It will be the
responsibility of the developer to ensure that a valid control structure is
passed to the parser.


                        The Writer Mechanism
                     ------------------------

        In order to write a file an implementation first creates and
properly initializes a writer structure, then calls the writer function,
which parses the structure and writes the file.


                  ENTRIES in the Write Structure
               ----------------------------------

        The basic element of the writer/reader structure will henceforth be
called an "entry".  An entry in the writer structure is simply the following
record:

        StdProcPtr      = POINTER TO PROCEDURE(ADDRESS);

        WrtAlgParamsPtr = POINTER TO WrtAlgParams;
        WrtAlgParams    = RECORD
                            DataAddr  : ADDRESS;
                            ByteCount : LONGCARD;
                          END;

        WENTRY = RECORD
                   ckID     : ARRAY[0..3] OF CHAR;   (* 4-byte ID *)
                   ckStatus : LONGWORD;
                   PreCall, WrtAlg, PostCall : StdProcPtr;
                   PreData, WrtData, PostData : ADDRESS;
                   WLev     : WLevelPtr;  (* defined later in this doc *)
                 END;

        The fields have the following definitions:

        ckID    : 4-byte ID as defined in IFF '85.

        ckStatus: 32 bits to be used for flags and such.  I envision three
                  flags to be used for "bad", "dirty", and "modified" chunk
                  identification.

        WrtAlg  : The algorithm used to write the chunk contents referenced
                  by the "WrtData" field.  In the simplest case WrtAlg will
                  point at a standard WriteBytes routine.  This routine is
                  passed one parameter on the stack.  In this way
                  differences in compiler parameter-passing conventions can
                  be more easily resolved.

        PreCall : Normally NIL.
                  Used in special cases to execute a pre-write function,
                  which is passed the value held in "PreData" as its
                  parameter.

        PostCall: As for PreCall, but called after the call to WrtAlg.

        WrtData : Passed to the function pointed to by WrtAlg.  There is no
                  restriction on what this field is to be used for; however,
                  as a general convention it will hold the address of an
                  initialized WrtAlgParams record.

        PreData : As WrtData, but used in conjunction with PreCall.

        PostData: As WrtData, but used in conjunction with PostCall.

        WLev    : A pointer to a lower WLevel structure.  If this pointer
                  is NIL then this entry contains data and the other fields
                  of this entry are processed.  If it is not NIL the other
                  fields in this entry are ignored, and the WLevel structure
                  pointed at is parsed.  A variant record could also be
                  used, but this is easier and thus less prone to cause
                  undesired results.


                  LEVELs in the Write Structure
               ---------------------------------

        Levels in the write structure represent nesting control of the
file-writing mechanism.

        WLevelPtr = POINTER TO WLevel;
        WLevel    = RECORD
                      Entry : WENTRY;
                      Next  : WLevelPtr;
                    END;

        Using levels in the write structure is quite simple.  A level is
composed of any number of WLevel nodes, linked together in a list, and
defines how the parser should organize chunks.  The following example should
provide an efficient explanation of the operational mechanism.


          Parsing an Example Initialized Write Structure
        ---------------------------------------------------

        The parser is very simple.  The easiest way to describe its
function is through example, so...

        First we need something to parse, so consider the following
initialized structure for writing a simple ILBM.  The parser is passed a
WLevelPtr which we will call root.  Uninitialized fields are not shown.
Record types are shown in {} as in "{WLevel}" and are abstract (not part of
the actual data).  The contents of a record type are indented one space.
Sorry for the lack of graphics in this doc.
   root
    \
     \
      {WLevel}
       {WEntry}
        ckID = "FORM";
        WLev --------> {WLevel}
       Next = NIL;      {WEntry}
                         ckID = "ILBM";
                         WLev -----> {WLevel}
                        Next = NIL;  {WEntry}
                                      ckID = "BMHD";
                                      WrtAlg = ADR(WriteBytes());
                                      WrtData ---> {WrtAlgParams}
                                     Next          ADR(BitMapHdr);
                                      |            TSIZE(BitMapHdr);
                                      |
                                      V
                                     {WLevel}
                                      {WEntry}
                                       ckID = "CMAP";
                                       WrtAlg = ADR(WriteBytes());
                                       WrtData ---> {WrtAlgParams}
                                      Next          ADR(ColorTable);
                                       |            nColors;
                                       |
                                       V
                                      {WLevel}
                                       {WEntry}
                                        ckID = "BODY";
                                        WrtAlg = ADR(BodyWrtAlg());
                                        WrtData ---> {WrtAlgParams}
                                       Next          ADR(BitMap);
                                        |
                                        V
                                       NIL

        Effectively, each node in the level structure is a node in a simple
binary tree.  One of the descendant pointers is contained in the WLevel
structure and is used to establish lists of entries at the same level.  The
other descendant pointer, WLev, is contained in the WENTRY structure.  It
is used to establish lower levels, or to specify that the chunk contains
data (by being NIL).

        The reader is a bit more complicated, but follows the same general
principles.  The structure is more complex, allowing groupings of chunks.
Level pointers can be connected to higher levels, creating a recursive
reader.

        What all this buys us is versatility.  Because it is possible to
link user routines into the writer or reader structures, it is not
necessary to update the library to incorporate a new low-level algorithm,
such as a compression algorithm.  Also, LISTs and CATs are unnecessary;
simple extension through Levels is sufficient to write any file.  It would
probably be desirable to replace the "FORM" keyword with something new,
such as "NIFF" or "IF88".

        Sorry this is not well organized, but I already spent more of the
day on this than I have.  There is undoubtedly room for improvement --
suggestions?  If there is any interest I'll go into more detail.  Right now
I have to get back to X-Specs 3D stuff.

                                                        Thanks,

UUCP: {cbosgd, hplabs!hp-sdd, sdcsvax, nosc}!crash!pnet01!haitex
ARPA: crash!pnet01!haitex@nosc.mil
INET: haitex@pnet01.CTS.COM
Opinions expressed are mine, and not necessarily those of my employer.
shf@well.UUCP (Stuart H. Ferguson) (09/21/88)
Wade Bickel describes what he sees as problems with the current IFF
standard and advantages of going with a different structure.

| The problem with the current IFF is that it is not generic.

The gist of Wade's argument here is that the types of chunks allowed under
IFF 85 are limited and should be made more general by making the rules
simpler.

With IFF 85, chunks are defined as a four-byte identifier plus a longword
byte count followed by that many bytes of data (plus an optional pad byte
if the length is odd).  This can be represented as

        ID { data }

where ID is a four-character identifier (conforming to certain rules, like
no leading or embedded spaces, etc.) and the { data } construct gets
replaced by "# data [0]", where "#" is (long)sizeof(data) and [0] is the
optional pad byte.  This is all well known, and Wade doesn't want to change
it significantly except to add a status word containing bit flags and a
checksum at the end of each chunk.

'Data' can be any block of bytes, but in particular for IFF 85, if the ID
of the chunk is "FORM," "LIST," "CAT ," or "PROP," then 'data' is defined
as another four-character identifier followed by a series of *chunks*.
Thus the format is recursively defined.  An IFF file is a single FORM, LIST
or CAT chunk.  For those into grammars:

        IFF File   ::= FORM | LIST | CAT
        FORM       ::= "FORM" { ID Chunk* }
        CAT        ::= "CAT " { ID Chunk* }
        LIST       ::= "LIST" { ID PROP* Chunk* }
        PROP       ::= "PROP" { ID LocalChunk* }
        Chunk      ::= FORM | LIST | CAT | LocalChunk
        LocalChunk ::= ID { <chunkdata> }
        ID         ::= <four-character identifier>

Wade suggests that this structure is limiting because you can only specify
groups of chunks ("Chunk*") within a grouping chunk, namely one of type
FORM, LIST or CAT.  He wants a simpler grammar, one which allows nested
chunks within any chunk:

        File  ::= Chunk
        Chunk ::= ID { <chunkdata> | Chunk* }
        ID    ::= <four-character identifier>

Wade gives the example of an ILBM.
An IFF 85 ILBM looks like:

        "FORM" { "ILBM"
            "BMHD" { bitmap header }
            "CMAP" { color map }
            "BODY" { bitplane data }
        }

Wade's proposed format looks like this:

        "FORM" {
            "ILBM" {
                "BMHD" { bitmap header }
                "CMAP" { color map }
                "BODY" { bitplane data }
            }
        }

(I don't know why he retained the "FORM" identifier.  It seems redundant.)

Since the grammar description makes this appear to be a simplification of
the IFF standard, one which must have been obvious to its designers, the
question arises: how come IFF isn't like this to begin with?  Why did the
developers of the IFF 85 standard do it the way they did rather than this
apparently simpler way?

The answer is not trivial, and trips at times into the vague and bizarre,
but if I try to rectify some of the difficulties that result from using
Wade's "new" design, I find that I end up re-inventing IFF 85.

The driving consideration here is that we want to be able to extract from a
file whatever type of data we may be interested in.  For example, consider
the case of the ANIM format.  The first frame of an ANIM is stored
internally as an ILBM, like so:

        "FORM" { "ANIM"
            "FORM" { "ILBM"
                "BMHD" { ... }
                ... rest of the ILBM ...
            }
            ... rest of the ANIM ...
        }

Now suppose you have a paint program which doesn't have any understanding
of the ANIM format -- that is, it does not recognize the formtype ANIM nor
any of its internal chunks.  Such a program, if properly written, can still
retrieve the first frame of the ANIM as an ILBM by parsing the standard
part of the IFF grammar.  To the paint program, the ANIM file looks like
this:

        "FORM" { xxxx
            "FORM" { "ILBM"
                "BMHD" { ... }
                ... etc. ...
            }
            ... xxxx ...
        }

where "xxxx" represents parts of the file not recognized by the paint
program parser.  In contrast, Wade's style of ANIM would look like this:

        "ANIM" {
            "ILBM" {
                "BMHD" { ... }
                ... etc. ...
            }
            ...
        }

And the ILBM-understanding paint program reader would see this:

        xxxx {
            xxxx
        }

In other words, a parser which didn't understand the ANIM identifier would
not be able to look inside the ANIM chunk, because it cannot know whether
the chunk contains more chunks or just data.  A grammar like this is said
to be "context-sensitive" and is undesirable for obvious reasons.

A way to make the grammar "context-free" would be to add a bit to the
status word (now part of Wade's file format) to flag whether this chunk has
sub-chunks or not.  That way, if a reader doesn't understand the ANIM
identifier, it can still look inside this chunk for other chunks that it
may understand.  But what we've just done is to distinguish between
grouping and non-grouping chunks, just like IFF 85 does in distinguishing
between the FORM, LIST, CAT and PROP IDs and all other chunk IDs.

One can argue that Wade's method would be more general, since now any chunk
can be a grouping chunk.  There is certainly a danger of someone setting
the grouping bit inconsistently with the nature of the chunk.  This danger
does not exist with IFF 85, where the identifier is either FORM, LIST, CAT
or PROP or else it's not a grouping chunk.

So, having provided a mechanism for parsers to examine the internals of
Wade's format files without understanding the constituent identifiers, the
next problem is that of scope and context.  Our ILBM-seeking paint program
might delve deeply into the structure of some unknown file and locate an
ILBM someplace deep inside:

        xxxx {
            xxxx { }
            xxxx { }
            xxxx {
                xxxx { }
                "ILBM" {
        ... (rest of file unparsed) ...

The difficulty here is that the ILBM seeker has no way of determining
anything about the context of the ILBM it has found.  It can't know, for
example, if some other part of the file is modifying this ILBM in any way.
It also cannot be certain that the chunk "ILBM" isn't a specific internal
chunk for another chunk.
The end result of this is that all chunk identifiers must be treated the
same wherever they may occur.  While this might have some good
side-effects, it is a generally undesirable condition, primarily because of
the inevitable collisions that can occur when many concurrent developers
use a flat name-space, especially one as limited as four-character IDs.

To alleviate this problem, let's add another bit to the status word that
indicates whether this chunk is a root chunk -- that is, whether this chunk
can stand on its own, independent of its context.  Chunks without this bit
set would be dependent on their context and could not be considered
independently.  So if the internal ILBM chunk located above had this bit
set, then it would be safe to read it as its own bitmap image.

If you've been astute, you'll see that I've just re-invented "formtypes"
from the IFF 85 standard.  IFF provides a two-tier name-space that
effectively eliminates the possibility of collision: formtypes, and local
chunk types that depend on their formtype.  So, for example, the meaning of
a CMAP chunk within a FORM with type ILBM is different from a CMAP within a
formtype DRAW, or any other formtype for that matter.  The common
name-space is that of the formtypes, and there are many fewer formtypes
than there are chunk types.

Also, the use of the formtype ID within the FORM chunk gives the same
result as the "root" bit in the hypothetical Bickel file format.  When a
parser sees a FORM, it knows that this is a self-contained object being
used as part of a larger structure that it doesn't have to understand --
like the ILBM form within the ANIM form.

This hypothetical extension of Wade's format is formally equivalent to the
IFF 85 standard -- that is, anything you can do with one can be done with
the other.

* Gasp! *  Does that mean that IFF *doesn't* need to be changed?

Yup, that's exactly what that means.
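[To make the grouping/non-grouping distinction concrete: an IFF 85 reader
can decide whether to recurse into a chunk purely from its ID, which is
what keeps the grammar context-free.  A minimal sketch with invented helper
names, scanning an in-memory buffer instead of doing real file I/O:]

```c
#include <string.h>

/* In IFF '85 a chunk is a grouping chunk iff its ID is one of these. */
static int is_grouping(const unsigned char *id)
{
    return memcmp(id, "FORM", 4) == 0 || memcmp(id, "LIST", 4) == 0 ||
           memcmp(id, "CAT ", 4) == 0 || memcmp(id, "PROP", 4) == 0;
}

/* Count every chunk in a buffer, recursing into grouping chunks, whose
 * contents are a 4-byte type ID followed by sub-chunks. */
static unsigned count_chunks(const unsigned char *buf, unsigned long len)
{
    unsigned long off = 0;
    unsigned n = 0;

    while (off + 8 <= len) {
        unsigned long size = ((unsigned long)buf[off + 4] << 24) |
                             ((unsigned long)buf[off + 5] << 16) |
                             ((unsigned long)buf[off + 6] << 8)  |
                              (unsigned long)buf[off + 7];
        n++;
        if (is_grouping(buf + off))
            n += count_chunks(buf + off + 12, size - 4);
        off += 8 + size + (size & 1);  /* odd-sized chunks are padded */
    }
    return n;
}
```

[Nothing here depends on recognizing ILBM, BMHD, or any formtype; that is
the property the per-chunk "has sub-chunks" bit would have to be invented
to preserve.]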
As an example, consider Wade's example of a B-tree structure being encoded
into his proposed format:

        "FORM" {
            "23BT" {
                "NODE" {
                    "NDAT"
                    "NODE" { data and 3 node chunks }
                    "NODE" { data and 3 node chunks }
                    "NODE" { data and 3 node chunks }
                }
            }
        }

This could be encoded into IFF as:

        "FORM" { "23BT"
            "DATA" { data for this node }
            "FORM" { "23BT"
                "DATA" { more node data }
                "FORM" { another "23BT" FORM }
            }
            "FORM" { another "23BT" FORM }
            "FORM" { another "23BT" FORM }
        }

But the crux of Wade's proposal is this:

| What I really want to do is create a purely data-driven mechanism, as
| opposed to the code-driven one in the current IFF.  Rather than having
| to write code to handle each type of occurrence, a structure would be
| initialized at run time, and this would be passed to the Reader or
| Writer parser to be handled.  In this way it would never be necessary
| to update the Library(s).

He's primarily interested in the mechanism for reading/writing such files,
and writes about it at great length in his article.  If this mechanism were
useful for reading and writing "IFF 88" files, then it would be equally
applicable to existing IFF files just by changing the file grammar
slightly, as I did above.

| Like its predecessor, IFF '88 is a recursive-descent
| parser design.  The primary difference between the old design and
| the new one is that while IFF '85 was code-driven, IFF '88 is data-driven.

I don't get this.  First of all, there is nothing inherently
recursive-descent about IFF.  In fact, the iff.library that Leo and I are
developing is a finite-state-machine design, rather than recursive-descent.
(First attempts were recursive-descent because they are easier to write,
but this makes the client code messy ... anyway...)  Also, Wade has used
the phrase "data-driven" several times, but I don't see what he's talking
about.

| The Writer Mechanism
| ------------------------

What Wade describes here is basically a tree-like data structure to control
writing an IFF file.
The tree would presumably be traversed by the writer library code, and user
functions would be called to write the actual bytes of data.  Wade refers
to this as "data driven."

| Whereas IFF '85 readers/writers require re-compilation of the
| source to accommodate format updates, IFF '88 will not.

But I don't get it.  Sure, the data structure controls the chunk nesting,
but the actual business of writing bytes gets handled by user code, so
where's the extensibility?  I still have to have the code to write the
chunks in my program, which means re-compilation when something changes.

| In order to write a file an implementation first creates and
| properly initializes a writer structure, then calls the writer function,
| which parses the structure and writes the file.

Wade details an example of writing an ILBM using his mechanism.  The client
program builds a data structure representing the structure of the file to
be written, which includes function pointers for writing each chunk.

|    {WLevel}
|     {WEntry}
|      ckID = "FORM";
|      WLev --------> {WLevel}
|     Next = NIL;      {WEntry}
|                       ckID = "ILBM";
|                       WLev -----> {WLevel}
|                      Next = NIL;  {WEntry}
|                                    ckID = "BMHD";
|                                    WrtAlg = ADR(WriteBytes());
|                                    WrtData ---> {WrtAlgParams}
|                                   Next          ADR(BitMapHdr);
|                                    |            TSIZE(BitMapHdr);
        ... etc. ...

However, since the structure of the file and the code to write the actual
bytes are both provided by the client program, I fail to see how creating
this structure and passing it to a generic writer is any different from
just having the following piece of code in the client program:

        /* WriteILBM: bitmap, colormap */

        PushChunk (iff, ID_FORM, ID_ILBM);

        PushChunk (iff, ID_BMHD, 0);
        WriteBMHD (bitmap);
        PopChunk (iff);

        PushChunk (iff, ID_CMAP, 0);
        WriteCMAP (colormap);
        PopChunk (iff);

        PushChunk (iff, ID_BODY, 0);
        WriteBODY (bitmap);
        PopChunk (iff);

        PopChunk (iff);

(This is an actual example of the use of the iff.library.  RSN!)
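[One plausible way such Push/PopChunk calls could work internally -- this
is a guess at the mechanism, not the actual iff.library source, and it
omits the formtype handling a real FORM needs: the writer remembers where
each open chunk's size field sits and back-patches it when the chunk is
popped.]

```c
#include <string.h>

#define MAXDEPTH 16

typedef struct {
    unsigned char buf[256];          /* in-memory stand-in for the file  */
    unsigned long len;               /* bytes emitted so far             */
    unsigned long sizeat[MAXDEPTH];  /* offsets of unpatched size fields */
    int depth;
} Writer;

static void put_be32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

/* Open a chunk: emit the ID and a placeholder size to patch later. */
static void push_chunk(Writer *w, const char *id)
{
    memcpy(w->buf + w->len, id, 4);
    w->sizeat[w->depth++] = w->len + 4;
    put_be32(w->buf + w->len + 4, 0);
    w->len += 8;
}

/* Close a chunk: back-patch its size with the bytes written since,
 * then pad to an even length as IFF requires. */
static void pop_chunk(Writer *w)
{
    unsigned long at = w->sizeat[--w->depth];
    put_be32(w->buf + at, w->len - (at + 4));
    if (w->len & 1)
        w->buf[w->len++] = 0;
}
```

[With something like this, the nested Push/Pop sequence yields correctly
sized chunks without the client ever computing a length itself.]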
It seems that for either method I need to have the writing code in my
program, and I need to know the structure of the file I want.  If anything,
I would think that constructing a large tree data structure would be more
difficult than just having code to write the file directly.  What's the
advantage?

I'm genuinely interested in this *mechanism* for reading and writing files,
since it should work equally well for real IFF files.  If there are real
advantages to this, Wade, I missed them.  Could you provide an example of a
file format changing and user programs not needing to be recompiled?

On a separate issue, Wade talks about "dirty" chunks.

| [Another problem with IFF]
| is that there are no dirty chunk provisions.  I feel
| that dirty chunk tracking would be a valuable option.  Dirty chunks would
| occur when, after finding some recognized chunks, unrecognized chunks are
| encountered.  IFF '85 discards these chunks.  I propose that as a user
| option unrecognized chunks be retained when a program modifies a
| partially understood IFF '88 file.
        [ ... ]
| When unrecognized chunks are written they're marked as dirty,
| and any chunks which have been modified are also noted.

This issue is discussed somewhat in the EA IFF 85 specification, on page
B-31 of the Exec RKM.  Their conclusion is that the data universe
encompassed by IFF is too large to allow for standardization of the
possible interactions of the various types of data involved.

While this is an interesting and valid idea, it really makes life miserable
for programmers.  They have to retain chunks they don't need, don't
understand and can't use, just so they can write them out again, trying to
preserve the original IFF file as much as possible, only to fail much of
the time.  It also means that all programs need to fully support standard
chunks so that standard chunks will never be marked as dirty.
It also means that programs that use "non-standard" chunks need to make
some intelligent decisions about whether a chunk marked as "dirty" is good
within the context of a specific file.  It might be possible, but it could
also be a real headache.  I'm just not convinced that the advantages are
great enough to want to provide such a mechanism.

This facility can be provided for any new IFF formtypes, however, by
equipping them with a "MAP" chunk (or some such, but it should be
consistent across FORMs) which contains a list of the chunks in the file
and their status.  It is not possible or even desirable to retrofit this
capability into existing formats.
-- 
		Stuart Ferguson		(shf@well.UUCP)
		Action by HAVOC		(shf@Solar.Stanford.EDU)
jojo@astroatc.UUCP (Jon Wesener) (09/22/88)
	I'd like to see the dependency on byte-sex disappear, you know,
Big Indian, Little Indian.  IFF is only good on machines with the same
byte sex as the 68000.  This has fouled me up on a number of occasions.

	my 2 cents,
	--j
-- 
... {seismo | harvard } ! {uwvax | cs.wisc.edu} ! astroatc!jojo

"They weren't just any women... They were women fresh from Hell, with
 wisps of devil-smoke curling behind their ears and unholy vapors hiding
 in their tight leather dresses, looking for Mr. Right."
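[The usual cure for the byte-sex problem is to never write a native long
directly, but to serialize each longword byte-by-byte in IFF's defined
(68000-style, big-endian) order, so reader and writer agree regardless of
host CPU.  A sketch with invented names:]

```c
/* Store a 32-bit value in IFF's big-endian byte order, independent of
 * the host CPU's own byte order. */
static void store_be32(unsigned char out[4], unsigned long v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Load it back the same way; works on big- and little-endian hosts. */
static unsigned long load_be32(const unsigned char in[4])
{
    return ((unsigned long)in[0] << 24) | ((unsigned long)in[1] << 16) |
           ((unsigned long)in[2] << 8)  |  (unsigned long)in[3];
}
```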
phil@titan.rice.edu (William LeFebvre) (09/26/88)
In article <1199@astroatc.UUCP> jojo@astroatc.UUCP (Jon Wesener) writes:
>	I'd like to see the dependency on byte-sex disappear, you know,
>Big Indian, Little Indian.

That's ENDIAN, not indian.

			William LeFebvre
			Department of Computer Science
			Rice University
			<phil@Rice.edu>
haitex@pnet01.cts.com (Wade Bickel) (09/26/88)
Thanks Stu, for the explanation of IFF '85.  I had not looked at the
sub-specifier in that light, and now see that it is indeed a good idea.

        I still think it would be nice to incorporate the individual BAD
chunk handling and Dirty/Modified chunk bits (please remember this was to
be a user/program option), but it is not sufficient to warrant re-doing
IFF.  A MAP chunk is of course one alternate solution, but seems to me to
be clumsy and inefficient (what happens when parallel tracks of audio data
are being stored?).

>But the crux of Wade's proposal is this:
>
>| What I really want to do is create a purely data-driven mechanism, as
>| opposed to the code-driven one in the current IFF.  Rather than having
>| to write code to handle each type of occurrence, a structure would be
>| initialized at run time, and this would be passed to the Reader or
>| Writer parser to be handled.  In this way it would never be necessary
>| to update the Library(s).
>
>He's primarily interested in the mechanism for reading/writing such
>files, and writes about it at great length in his article.  If this
>mechanism were useful for reading and writing "IFF 88" files, then it
>would be equally applicable to existing IFF files just by changing the
>file grammar slightly, as I did above.
>

        Quite so.  I had originally intended to apply this to the IFF '85
standard.  Then, in my telephone discussion with you, I mistakenly came to
the conclusion that nesting control was limited in IFF '85, and thus the
first half of my proposal.

>Wade has used the phrase "data-driven" several times, but I don't see
>what he's talking about.

        Simple.  In the current IFF system the knowledge to read/write IFF
files is contained in the IFF code.  Thus it is "code-driven".  To extend
the system's knowledge you must modify the code.  In the "data-driven"
system I propose, the knowledge is contained in data (which may include
code).  Thus the data, not the code, is modified to extend the system's
knowledge.
>| The Writer Mechanism
>| ------------------------
>What Wade describes here is basically a tree-like data structure to
>control writing an IFF file.  The tree would presumably be traversed by
>the writer library code and user functions would be called to write the
>actual bytes of data.  Wade refers to this as "data driven."
>
>| Whereas IFF '85 readers/writers require re-compilation of the
>| source to accommodate format updates, IFF '88 will not.
>
>But I don't get it.  Sure, the data structure controls the chunk
>nesting, but the actual business of writing bytes gets handled by user
>code, so where's the extensibility?  I still have to have the code to
>write the chunks in my program which means re-compilation when something
>changes.

        Not so.  Only if you use some format which has not yet become part
of the standard library.  Under normal circumstances you would call
routines found in the standard support libraries.  Thus, a developer of a
new format includes a library of private support routines to be used with
the system.  It would also be possible (though a bit kludgy) to include
routines that are part of your active code (ie: not in a library).  An
AddFunc() type routine to add these calls to a table and assign an index
might be desirable.

>However, since the structure of the file and the code to write the
>actual bytes are both provided by the client program, I fail to see how
>creating this structure and passing it to a generic writer is any
>different from just having the following piece of code in the client
>program:
>
>    /* WriteILBM: bitmap, colormap */
>
>    PushChunk (iff, ID_FORM, ID_ILBM);
>
>    PushChunk (iff, ID_BMHD, 0);
>    WriteBMHD (bitmap);
>    PopChunk (iff);
>
>    PushChunk (iff, ID_CMAP, 0);
>    WriteCMAP (colormap);
>    PopChunk (iff);
>
>    PushChunk (iff, ID_BODY, 0);
>    WriteBODY (bitmap);
>    PopChunk (iff);
>
>    PopChunk (iff);
>
>(This is an actual example of the use of the iff.library.  RSN!)
>It seems that for either method I need to have the writing code in my
>program, and I need to know the structure of the file I want. If
>anything, I would think that constructing a large tree data structure
>would be more difficult than just having code to write the file
>directly. What's the advantage?

Think about the control structure from the point of view of the file
reader: it contains the sum of what the library and your code know
about IFF files.

Also, how about the following minor change in the way you handle file
writing:

    PushChunk (iff, ID_FORM, ID_ILBM);

    PushChunk (iff, ID_BMHD, 0);
    WriteBMHD (bitmap);
    PopChunk (iff);

    PushChunk (iff, ID_CMAP, 0);
    WriteBytes (colortable, size);      (* colortable is a ptr *)
    PopChunk (iff);

    PushChunk (iff, ID_BODY, 0);
    WriteBODY (bitmap);
    PopChunk (iff);

    PopChunk (iff);

Of course, these are supported types.  Let's pretend they're not.  In
that case it would look something like this:

    PushChunk (iff, ID_FORM, ID_ILBM);

    PushChunk (iff, ID_BMHD, 0);
    bmhd := MakeBMHD(bitmap);
    WriteBytes (bmhd, sizeofBMHD);
    PopChunk (iff);

    PushChunk (iff, ID_CMAP, 0);
    WriteBytes (colortable, sizeofCT);  (* colortable is a ptr *)
    PopChunk (iff);

    PushChunk (iff, ID_BODY, 0);
    WriteCustom (rtn, bitmap);          (* rtn is a ptr to code *)
    PopChunk (iff);

    PopChunk (iff);

The chunks are now all generic: the library supports them even though
they may not have existed at the time the library was compiled.  "rtn"
is the address of a routine expecting one parameter on the stack, in
this case a BitMap pointer.  In this instance it would be a routine in
the library, but it could be any routine.  I have not decided on the
best way to handle custom routines, and would like suggestions.  One
parameter on the stack, preserve everything but D0, seems good to me.
In practice it would probably be best to reference such routines
through a jump table.

>I'm genuinely interested in this *mechanism* for reading and writing
>files since it should work equally well for real IFF files.
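One reason PushChunk/PopChunk make the chunks "generic" is that the
library, not the client, can own the size bookkeeping: PushChunk writes
the ID and a placeholder size, and PopChunk seeks back and patches in
the real size (adding the pad byte IFF requires for odd lengths).  The
sketch below shows that mechanism under simplifying assumptions; note
it writes the size in native byte order, whereas real IFF sizes are
big-endian, and the function names only loosely mirror the examples
above.

```c
#include <stdio.h>

#define MAXDEPTH 16
static long sizePos[MAXDEPTH];   /* file offset of each pending size */
static int  depth = 0;

int PushChunk(FILE *fp, const char *id)
{
    unsigned int placeholder = 0;
    if (depth >= MAXDEPTH) return -1;     /* nesting too deep */
    fwrite(id, 1, 4, fp);                 /* four-byte chunk ID */
    sizePos[depth++] = ftell(fp);         /* remember where size goes */
    fwrite(&placeholder, 4, 1, fp);       /* size, patched by PopChunk */
    return 0;
}

int PopChunk(FILE *fp)
{
    long end;
    unsigned int size;
    if (depth == 0) return -1;            /* nothing to pop */
    end  = ftell(fp);
    size = (unsigned int)(end - sizePos[--depth] - 4);
    if (size & 1) { fputc(0, fp); end++; }   /* IFF pad byte */
    fseek(fp, sizePos[depth], SEEK_SET);
    fwrite(&size, 4, 1, fp);   /* native order; real IFF is big-endian */
    fseek(fp, end, SEEK_SET);
    return 0;
}
```

With this in the library, a client that writes chunk bodies via
WriteBytes or a custom routine never touches a size field itself.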
>If there are real advantages to this, Wade, I missed them. Could you
>provide an example of a file format changing and user programs not
>needing to be recompiled?

I had not pictured the file format changing.  Rather, I had been
thinking of how to handle new IFF data types.  With EA's code, I found
it somewhat frustrating to have to go in, modify parts of the code, and
re-compile it to add new chunks.  What I want to be able to do is
inform the system of a new data type (and the rules for reading it)
without touching the original source.

Why?  For one thing, having myriad copies of modified IFF reader code,
all based on the same originals but containing different mods, is a
mess.  More importantly, library management, should your IFF library be
made available, will be a mess if the library is constantly being
updated to support new data types.

Your example writer code looks superior to the writer-structure method
I described.  The structure really comes into use for the reading of
files, but since describing the writer was simpler than describing the
reader, and we have to have a format before we can discuss how to read
it, I chose to describe the writer.  Please take a look at the writer
system I described, and imagine using the same method for reading
files.  Some method of creating "groups" of items on the same level
would be desirable, which complicates the format, but that is
irrelevant as far as the theory goes.

Basically, a system of nodes is created which describes readable files
and how to read them.  These nodes reference *knowledge*, which can be
in the IFF library(s) or elsewhere.  In this way, programs using new
IFF data types need describe only the new knowledge to the system.  For
instance, let's suppose the library understands ILBMs but not ANIMs.
The program defines ANIM by first calling library routines to generate
a structure which reads ILBMs, then adds nodes, describing ANIMs and
how to read them, at the appropriate points.
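The node system described above might look something like this in C: a
tree whose nodes name a chunk, point at the routine that reads it, and
link to the chunks allowed inside it and beside it.  Grafting a subtree
is how a program would teach the reader about ANIM without recompiling
anything.  ParseNode, NewNode, and AddChild are invented names for this
sketch.

```c
#include <stdlib.h>
#include <string.h>

/* The "knowledge" a node references: a routine that reads one chunk. */
typedef long (*ReadFunc)(const unsigned char *data, long size);

struct ParseNode {
    char id[5];                  /* chunk/form ID, e.g. "ILBM"    */
    ReadFunc read;               /* how to read it; may live in a
                                    library or in the program     */
    struct ParseNode *child;     /* first chunk allowed inside    */
    struct ParseNode *next;      /* next chunk on the same level  */
};

struct ParseNode *NewNode(const char *id, ReadFunc read)
{
    struct ParseNode *n = calloc(1, sizeof *n);
    if (n) {
        strncpy(n->id, id, 4);
        n->read = read;
    }
    return n;
}

/* Grafting a node extends what the reader understands at run time. */
void AddChild(struct ParseNode *parent, struct ParseNode *kid)
{
    kid->next = parent->child;
    parent->child = kid;
}
```

A program defining ANIM would ask the library for its ready-made ILBM
subtree, create an ANIM node, and graft the ILBM subtree beneath it.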
Now the system understands and can read ANIMs.  Periodically, new types
would be added to the library as circumstances warrant.  Standard calls
to read and write standard types would also be in the library, to make
things easy for the novice and to act as jumping-off points for the
designer of new IFF data types.

----------------------------------------

>On a separate issue, Wade talks about "dirty" chunks.
>While this is an interesting and valid idea, it really makes life
>miserable for programmers. They have to retain chunks they don't need,
>don't understand and can't use, just so they can write them out again,
>trying to preserve the original IFF file as much as possible, only to
>fail much of the time. It also means that all programs need to fully
>support standard chunks so that standard chunks will never be marked as
>dirty. It also means that programs that use "non-standard" chunks need
>to make some intelligent decisions about whether a chunk marked as
>"dirty" is good within the context of a specific file. It might be
>possible, but it could also be a real headache. I'm just not convinced
>that the advantages are great enough to want to provide such a
>mechanism.

Simple: just turn it off.  It should be a switch in the library.  If
the switch is on when the library reads a file, it tracks dirty chunks:
when a file containing dirty chunks is read, the library looks at the
reader control structure (initialized by the using program) and
determines which chunks are bad.  If the switch is off, unrecognized or
dirty chunks are ignored on reads.

>This facility can be provided for any new IFF formtypes, however, by
>equipping them with a "MAP" chunk (or some such, but it should be
>consistent across FORMs) which contains a list of the chunks in the
>file and their status. It is not possible or even desirable to retro-
>fit this capability into existing formats.

True.  However, I think it was a mistake in the format not to leave a
WORD for this.
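The "just turn it off" behavior could be as small as a library-wide
mode bit plus a per-chunk flag, along these lines.  The flag bit, the
ChunkInfo structure, and the routine names are all hypothetical; this
only illustrates the switch semantics described above.

```c
#define CHUNKF_DIRTY (1 << 0)    /* the proposed per-chunk bit */

struct ChunkInfo {
    char id[5];
    int  flags;
};

static int trackDirty = 0;       /* the library-wide switch */

void SetDirtyTracking(int on) { trackDirty = on; }

/* Called when the reader hits a bad or unrecognized chunk.
 * Returns 1 if the chunk is kept (flagged dirty), 0 if dropped. */
int HandleBadChunk(struct ChunkInfo *c)
{
    if (!trackDirty)
        return 0;                /* switch off: chunk is ignored   */
    c->flags |= CHUNKF_DIRTY;    /* switch on: keep it, but marked */
    return 1;
}
```

Programs that never ask for tracking pay nothing; programs that do can
salvage, say, a good BODY alongside a damaged CMAP.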
Not leaving a word for the future was shortsighted.  Using a MAP chunk
will be messy, making dirty-chunk tracking harder than it would
otherwise be.

Also, what about data correction for chunks?  It won't be much longer
before musicians are using IFF to store CD-quality sound.  We can't be
throwing away a whole file just because one bit has gone bad in one
chunk.

                                                Thanks,

                                                        Wade.

UUCP: {cbosgd, hplabs!hp-sdd, sdcsvax, nosc}!crash!pnet01!haitex
ARPA: crash!pnet01!haitex@nosc.mil
INET: haitex@pnet01.CTS.COM
Opinions expressed are mine, and not necessarily those of my employer.