[comp.sys.amiga.tech] New IFF format details

haitex@pnet01.cts.com (Wade Bickel) (09/19/88)

cmcmanis%pepper@Sun.COM (Chuck McManis) writes:
>First what was your misunderstanding? IFF can be parsed fairly readily by
>most "types" of parsers primarily because it's grammar is self consistent.
>
>->        So the question I wish to pose is;  Would you (the Amiga community)
>->reject a re-design of the current IFF standard?
>
>Yes if you decided to redesign it simply because you blew it while reading
>the documents. Please, don't be offended but there are already "other"... 
>
>Please, let us know what the "flaw" is first and *then* ask us if we want

        Ok, here goes...

        --------

        The problem with the current IFF is that it is not generic.

        To be more specific, a FORM specifier is not a chunk per se.
Under EA's definition, an ILBM is defined as:


                +-----------------------------------+
                | 'FORM'        size                |
                +-----------------------------------+
                | 'ILBM'                            |
                +-----------------------------------+
                | +-------------------------------+ |
                | | 'BMHD'      size              | |
                | |             data              | |
                | +-------------------------------+ |
                | | 'CMAP'      size              | |
                | |             data              | |
                | +-------------------------------+ |
                | pad bytes (if needed)             |
                +-----------------------------------+
                | 'BODY'        size                |
                |               data                |
                +-----------------------------------+


        The difficulty is that the 'ILBM' specifier is a special case: it has
no size specifier.  This wreaks havoc on a generic parser.  It also results in
a nesting depth limitation (ie: BMHD cannot contain chunks).
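
        For reference, in C terms the two header shapes look like this
(a sketch of the layout only; the field names are mine):

	struct ChunkHeader {            /* ordinary chunk: 'BMHD', 'CMAP', ... */
	    char ckID[4];               /* four-character identifier           */
	    long ckSize;                /* byte count of the data that follows */
	};

	struct FormHeader {             /* the special case                    */
	    char ckID[4];               /* 'FORM'                              */
	    long ckSize;                /* size of formtype plus contents      */
	    char formType[4];           /* 'ILBM' -- an ID with no size field  */
	};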
	
	Another problem is that no bad chunk management is done.  If any chunk
is bad, the whole file is bad.  Why not make a reasonable effort to retain the
valid chunks?  If the CMAP is messed up, do we really need to throw away the
BODY?  Recovering the CMAP would, in many cases, take but minutes using a
tool such as Doug's Color Commander (Seven Seas' Software), whereas an artist
might lose hours of careful manipulation of the BODY.  By allocating a bit
in each chunk header this can be easily accommodated.

	Another problem is that there are no dirty chunk provisions.  I feel
that dirty chunk tracking would be a valuable option.  Dirty chunks would
occur when, after finding some recognized chunks, unrecognized chunks are
encountered.  IFF '85 discards these chunks.  I propose that as a user option
unrecognized chunks be retained when a program modifies a partially understood
IFF '88 file.  This can be easily achieved by allocating two bits in each
chunk header.  When unrecognized chunks are written they're marked as dirty,
and any chunks which have been modified are also noted.  This would allow
programs with new or proprietary chunks to be made more compatible with
existing programs (certain paint programs come to mind...).

     { BTW:  I got the idea for the need for dirty chunk handling from
	     Carolyn Scheppner, so don't tell me I'm off the wall on this
	     one, I just happen to agree with her and offer this as one
	     solution.  I'm very open to any better solutions. }
	      

	In IFF '88 a LONGWORD (ie: 32 bits) would be included at the top of
all chunks to maintain the "status" of the chunk.
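
	As an illustration (the flag names here are mine, nothing is
fixed yet), the three flags might be laid out like so:

	/* Proposed status longword -- bit names invented for illustration. */
	#define CKF_BAD       0x00000001L   /* chunk data known to be corrupt  */
	#define CKF_DIRTY     0x00000002L   /* written by a program that did   */
	                                    /* not understand this chunk       */
	#define CKF_MODIFIED  0x00000004L   /* contents changed since reading  */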

        Consider the following IFF '88 proposed format,

                +-----------------------------------+
                | 'FORM'         size,status        |
                | +-------------------------------+ |
                | | 'ILBM'       size,status      | |
                | | +---------------------------+ | |
                | | | 'BMHD'     size,status    | | |
                | | |            data           | | |
                | | +---------------------------+ | |
                | | | 'CMAP'     size,status    | | |
                | | |            data           | | |
                | | +---------------------------+ | |
                | | | 'BODY'     size,status    | | |
                | | |            data           | | |
                | | +---------------------------+ | |
                | +-------------------------------+ |
                +-----------------------------------+

                (pad bytes not shown, but assumed added at the end
                    of any odd-byte-length chunk; a checksum is assumed
                    included at the end of each chunk as well).

        This format allows a generic parser to recognize 'FORM' and 'ILBM' as
just another chunk type.  More importantly, it allows a much simpler parser
design that is also much more versatile.  It is entirely possible to place
chunks within ANY chunk type.  Thus data structures such as B-trees are
easily and efficiently supported.  Example:

            +-----------------------------------------------------+
            | 'FORM'                  size,status                 |
            | +-------------------------------------------------+ |
            | | '23BT'                size,status               | |
            | | +---------------------------------------------+ | |
            | | | 'NODE'              size,status             | | |
            | | | +-----------------------------------------+ | | |
            | | | | 'NDAT'            size,status           | | | |
            | | | |                   data                  | | | |
            | | | +-----------------------------------------+ | | |
            | | | | 'NODE'            size,status           | | | |
            | | | | +-------------------------------------+ | | | |
            | | | | | 'NDAT'         size,status          | | | | |
            | | | | |                data                 | | | | |
            | | | | +-------------------------------------+ | | | |
            | | | | | 'NODE'         size,status          | | | | |
            | | | | | +---------------------------------+ | | | | |
            | | | | | | 'NDAT'       size,status        | | | | | |
            | | | | | |              data               | | | | | |
            | | | | | +---------------------------------+ | | | | |
            | | | | | |  NODEs, etc. etc. etc...        | | | | | |
            | | | | | |                                 | | | | | |
            | | | | | |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^| | | | | |
            | | | | +-------------------------------------+ | | | |
            | | | +-----------------------------------------+ | | |
            | | | | 'NODE'             size,status          | | | |
            | | | | +-------------------------------------+ | | | |
            | | | | |  {NDAT and 3 NODEs...etc., etc.     | | | | |
            | | | | +-------------------------------------+ | | | |
            | | | +-----------------------------------------+ | | |
            | | | | 'NODE'             size,status          | | | |
            | | | | +-------------------------------------+ | | | |
            | | | | |  {NDAT and 3 NODEs...etc., etc.     | | | | |
            | | | | +-------------------------------------+ | | | |
            | | | +-----------------------------------------+ | | |
            | | +---------------------------------------------+ | |
            | +-------------------------------------------------+ |
            +-----------------------------------------------------+

	Among other things, this format would support quicker searches of
the file for a specific node, since nodes can be searched in a true tree-like
fashion.  However, this is not the point of the change.

	What I really want to do is create a purely Data driven mechanism, as
opposed to the Code driven one in the current IFF.  Rather than having to 
write code to handle each type of occurrence, a structure would be initialized
at run time, and this would be passed to the Reader or Writer parser to be
handled.  In this way it would never be necessary to update the Library(s).

        The following is a document specifying how the system is to work.


 =============================================================================



		     Conceptual Design Specification
                    ---------------------------------

	Like its predecessor, IFF '88 is a recursive-descent
parser design.  The primary difference between the old design and
the new one is that while IFF '85 was code driven, IFF '88 is data driven.
Whereas IFF '85 readers/writers require re-compilation of the
source to accommodate format updates, IFF '88 will not.  IFF '88 also
incorporates a more naturally recursive format.

	Basically, IFF '88 will consist of a number of libraries.  In the
simplest scheme there would be two libraries, one containing two parsers
(read and write) and the other containing support routines.  In a more
complex scheme five libraries would be created: one for each parser, one for
each set of related support routines, and a fifth for routines shared by
both the reader and writer libraries.

	To use IFF '88 the developer will initialize a control structure
(a list of nodes) which will be used to read/write the files.  Effectively,
your program will write a program, which will be used to write or read the
desired file.  Initialization of the data structures will be simplified with
routines provided in the support libraries.  Defining a control structure
will be achieved through calls much like those used to initialize Intuition
menu structures, which most of us are quite familiar with.

	The IFF '88 parser design is generic and performs no error checking
on the validity of the control structure it is passed.  It will be the
responsibility of the developer to ensure that a valid control structure
is passed to the parser.


			The Writer Mechanism
	              ------------------------

	In order to write a file an implementation first creates and
properly initializes a writer-structure, then calls the writer function
which parses the structure and writes the file.


		    ENTRIES in the Write Structure
		  ----------------------------------

	The basic element of the writer/reader structure will henceforth be
called an "entry".  An entry to the writer structure is simply the following
record:
	   StdProcPtr		= POINTER TO PROCEDURE(ADDRESS);

	   WrtAlgParamsPtr	= POINTER TO WrtAlgParams;
           WrtAlgParams		= RECORD
				   DataAddr	: ADDRESS;
				   ByteCount	: LONGCARD;
				  END;

	   WENTRY		= RECORD
				   ckID		: ARRAY[0..3] OF CHAR;
				   ckStatus     : LONGWORD;
				   PreCall,
				   WrtAlg,
				   PostCall	: StdProcPtr;
				   PreData,
				   WrtData,
				   PostData	: ADDRESS;
				   WLev		: WLevelPtr; 
						   {defined later in this doc.}
				  END;

	The fields have the following definitions:

	   ckID	   :  4 byte ID as defined in IFF '85.

	   ckStatus:  32 bits to be used for flags and such.  I envision
			three flags to be used for "bad", "dirty", and
			"modified" chunk identification.
			
           WrtAlg  :  The algorithm used to write the chunk contents as
	   		referenced by the "WrtData" field.  In the simplest
			case the WrtAlg will point at a standard WriteBytes
			routine.  This routine is passed one parameter on the
			stack.  In this way differences in compiler parameter
			passing conventions can be more easily resolved.
			
	   PreCall :  Normally NIL.  Used for special cases to execute a
			pre-write function, and is passed the value held
			in "PreData" as its parameter.

	   PostCall:  As for PreCall, but called after the call to WrtAlg.
	   
           WrtData :  Passed to the function pointed to by WrtAlg.  There is
	                no restriction on what this field is to be used for.
			However, as a general convention it will be used to
			hold the address of an initialized WrtAlgParams record.
			
	   PreData :  As WrtData, but used in conjunction with PreCall.
	   
	   PostData:  As WrtData, but used in conjunction with PostCall.

	   WLev :  A pointer to a lower WLEVEL structure.  If this pointer
                        is NIL then this entry contains data and the
			other fields of this entry are processed.  If it is
			not NIL the other fields in this entry are ignored,
			and the WLEVEL structure pointed at is parsed.  A
			variant record could also be used, but this is easier
			and thus less prone to cause undesired results.



			LEVELs in the Write Structure
		      ---------------------------------
		      
	Levels in the write structure represent nesting control
of the file writing mechanism.

	  WLevelPtr	=    POINTER TO WLevel;
	  WLevel	=	RECORD
                      	         Entry	:  WENTRY;
                                 Next	:  WLevelPtr;
                    	        END;

	Using levels in the write structure is quite simple.  A level is
composed of any number of WLevel nodes, linked together in a list, and
defines how the parser should organize chunks.  The following example
should provide an efficient explanation of the operational mechanism.


            Parsing an Example Initialized Write Structure
           ------------------------------------------------
	
	The parser is very simple.  The easiest way to describe its function
is through example so...

	First we need something to parse, so consider the following initialized
structure for writing a simple ILBM.  The parser is passed a WLevelPtr which
we will call root.  Uninitialized fields are not shown.  Record types are shown
in {} as in "{WLevel}" and are abstract (not part of the actual data). The
contents of a record type are indented one space.  Sorry for the lack of
graphics in this doc.

 root 
   \
    \
   {WLevel}
    {WEntry}
      ckID = "FORM";
      WLev --------> {WLevel}
     Next = NIL;      {WEntry}
                        ckID = "ILBM";
			WLev -----> {WLevel}
		       Next = NIL;   {WEntry}
		                       ckID = "BMHD";
				       WrtAlg = ADR(WriteBytes());
				       WrtData ---> {WrtAlgParams}
				      Next	      ADR(BitMapHdr);
					|	      TSIZE(BitMapHdr);
				        |
					V
				    {WLevel}
				     {WEntry}
				       ckID = "CMAP"
				       WrtAlg = ADR(WriteBytes());
				       WrtData ---> {WrtAlgParams}
				      Next            ADR(ColorTable);
					|	      nColors;
				        |
					V
				    {WLevel}
				     {WEntry}
				       ckID = "BODY";
				       WrtAlg = ADR(BodyWrtAlg());
				       WrtData ---> {WrtAlgParams}
				      Next	      ADR(BitMap);
                                        |
				        |
				        V
				       NIL
	
	Effectively each node in the level structure is a node in a simple
binary tree. One of the descendant pointers is contained in the WLEVEL
structure and is used to establish lists of entries at the same level.  The
other descendant pointer, WLev, is contained in the WENTRY structure.  It is
used to establish lower levels or specify that the chunk contains data (by
being NIL).
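
	In C terms, the walk over this tree is short.  The following is my
sketch only (BeginChunk() and EndChunk() are assumed helpers that write
the chunk header and later back-patch its size):

	typedef void (*StdProc)(void *);        /* cf. StdProcPtr above */

	struct WEntry {
	    char     ckID[4];
	    long     ckStatus;
	    StdProc  PreCall, WrtAlg, PostCall;
	    void    *PreData, *WrtData, *PostData;
	    struct WLevel *WLev;                /* NULL => entry holds data */
	};

	struct WLevel {
	    struct WEntry  Entry;
	    struct WLevel *Next;                /* next entry on this level */
	};

	extern long BeginChunk(char id[4], long status);   /* assumed helper */
	extern void EndChunk(long sizepos);                /* assumed helper */

	void WriteLevel(struct WLevel *lev)
	{
	    for ( ; lev != NULL; lev = lev->Next) {
	        struct WEntry *e = &lev->Entry;
	        long sizepos = BeginChunk(e->ckID, e->ckStatus);

	        if (e->WLev != NULL)
	            WriteLevel(e->WLev);        /* nested chunks below us    */
	        else {
	            if (e->PreCall)  e->PreCall(e->PreData);
	            e->WrtAlg(e->WrtData);      /* write the chunk body      */
	            if (e->PostCall) e->PostCall(e->PostData);
	        }
	        EndChunk(sizepos);              /* back-patch size, pad even */
	    }
	}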

	The reader is a bit more complicated, but follows the same general
principles.  The structure is more complex, allowing groupings of chunks.
Level pointers can be connected to higher levels, creating a recursive reader.

	What all this buys us is versatility.  Because it is possible to link
user routines into the writer or reader structures, it is not necessary to
update the library to incorporate a new low-level algorithm, such as a
compression algorithm.  Also, LISTs and CATs are unnecessary; simple
extension through Levels is sufficient to write any file.  It would probably
be desirable to replace the "FORM" keyword with something new, such as
"NIFF" or "IF88".

        Sorry this is not well organized, but I have already spent more of
the day on this than I had to spare.  There is undoubtedly room for
improvement; suggestions?

	If there is any interest I'll go into more detail.  Right now I have
to get back to X-Specs 3D stuff.

							Thanks,




UUCP: {cbosgd, hplabs!hp-sdd, sdcsvax, nosc}!crash!pnet01!haitex
ARPA: crash!pnet01!haitex@nosc.mil
INET: haitex@pnet01.CTS.COM
Opinions expressed are mine, and not necessarily those of my employer.

shf@well.UUCP (Stuart H. Ferguson) (09/21/88)

Wade Bickel describes what he sees as problems with the current IFF 
standard and advantages of going with a different structure.

|         The problem with the current IFF is that it is not generic.

The gist of Wade's argument here is that the types of chunks allowed
under IFF 85 are limited and should be made more general by making the
rules simpler.  With IFF 85, chunks are defined as a four-byte identifier
plus a longword byte count followed by that many bytes of data (plus an
optional pad byte if the length is odd).  This can be represented as

	ID { data }

where ID is a four-character identifier (conforming to certain rules, 
like no leading or embedded spaces, etc.) and the { data } construct 
gets replaced by "# data [0]", where "#" is (long)sizeof(data) and [0]
is the optional pad byte.  This is all well known and Wade doesn't want 
to change this significantly except to add a status word containing bit
flags and a checksum at the end of each chunk.

'Data' can be any block of bytes, but in particular for IFF 85, if the
ID of the chunk is "FORM," "LIST," "CAT ," or "PROP," then 'data' is
defined as another four-character identifer followed by a series of
*chunks*.  Thus the format is recursively defined.  An IFF file is a
single FORM, LIST or CAT chunk. 

For those into grammars:

	IFF File   ::=  FORM | LIST | CAT
	FORM       ::=  "FORM" { ID Chunk* }
	CAT        ::=  "CAT " { ID Chunk* }
	LIST       ::=  "LIST" { ID PROP* Chunk* }
	PROP       ::=  "PROP" { ID LocalChunk* }
	Chunk      ::=  FORM | LIST | CAT | LocalChunk
	LocalChunk ::=  ID { <chunkdata> }
	ID         ::=  <four-character identifier>
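
(The practical payoff of those four grouping IDs is that a completely
generic scanner can walk any IFF 85 file without understanding a single
formtype.  A quick sketch in C; ReadHeader() and Skip() are placeholder
helpers, not any real library's API:)

	#include <stdio.h>
	#include <string.h>

	extern void ReadHeader(FILE *fp, char id[4], long *size);  /* assumed */
	extern void Skip(FILE *fp, long n);                        /* assumed */

	static int IsGroup(char *id)
	{
	    return strncmp(id, "FORM", 4) == 0 || strncmp(id, "LIST", 4) == 0
	        || strncmp(id, "CAT ", 4) == 0 || strncmp(id, "PROP", 4) == 0;
	}

	void Scan(FILE *fp, long bytes)      /* walk 'bytes' worth of chunks */
	{
	    char id[4], type[4];
	    long size;

	    while (bytes > 0) {
	        ReadHeader(fp, id, &size);   /* 4-byte ID + longword count   */
	        if (IsGroup(id)) {
	            fread(type, 1, 4, fp);   /* formtype, e.g. 'ILBM'        */
	            Scan(fp, size - 4);      /* recurse into the group       */
	        } else
	            Skip(fp, size);          /* local chunk: data is opaque  */
	        if (size & 1)
	            Skip(fp, 1);             /* optional pad byte            */
	        bytes -= 8 + size + (size & 1);
	    }
	}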

Wade suggests that this structure is limiting because you can only
specify groups of chunks ("Chunk*") within a grouping chunk, namely one
of type FORM, LIST or CAT. He wants a simpler grammar, one which allows
nested chunks within any chunk:

	File   ::=  Chunk
	Chunk  ::=  ID { <chunkdata> | Chunk* }
	ID     ::=  <four-character identifier>

Wade gives the example of an ILBM.  An IFF 85 ILBM looks like:

	"FORM" {
		"ILBM"
		"BMHD" { bitmap header	}
		"CMAP" { color map	}
		"BODY" { bitplane data	}
	}

Wade's proposed format looks like this:

	"FORM" {
		"ILBM" {
			"BMHD" { bitmap header	}
			"CMAP" { color map	}
			"BODY" { bitplane data	}
		}
	}

(I don't know why he retained the "FORM" identifier.  It seems 
redundant.)

Since the grammar description makes this appear to be a simplification
of the IFF standard, one which must have been obvious to its designers,
the question arises, how come IFF isn't like this to begin with?  Why
did the developers of the IFF 85 standard do it the way they did rather
than this apparently simpler way? 

The answer is not trivial and trips at times into the vague and bizarre, 
but I find that if I try to rectify some of the difficulties that result
from using Wade's "new" design, I end up re-inventing IFF
85.  The driving consideration here is that we want to be able to
extract from a file whatever type of data we may be interested in.  For
example, consider the case of the ANIM format.  The first frame of an
ANIM is stored internally as an ILBM, like so:

	"FORM" {
		"ANIM"
		"FORM" {
			"ILBM"
			"BMHD" { ... }
			... rest of the ILBM ...
		}
		... rest of the ANIM ...
	}

Now suppose you have a paint program which doesn't have any 
understanding of the ANIM format -- that is, it does not recognize the 
formtype ANIM nor any of its internal chunks.  Such a program, if 
properly written, can still retrieve the first frame of the ANIM as an
ILBM by parsing the standard part of the IFF grammar.  To the paint 
program, the ANIM file looks like this:

	"FORM" {
		xxxx
		"FORM" {
			"ILBM"
			"BMHD" { ... }
			... etc. ...
		}
		... xxxx ...
	}

where "xxxx" represents parts of the file not recognized by the paint 
program parser.  In contrast, Wade's style of ANIM would look like this:

	"ANIM" {
		"ILBM" {
			"BMHD" { ... }
			... etc. ...
		}
		...
	}

And the ILBM-understanding paint program reader would see this:

	xxxx { xxxx }

In other words, a parser which didn't understand the ANIM identifier 
would not be able to look inside the ANIM chunk, because it cannot know 
whether the chunk contains more chunks or just data.  A grammar like this
is said to be "context-sensitive" and is undesirable for obvious
reasons.  A way to make the grammar "context-free" would be to add a bit
to the status word (now part of Wade's file format) to flag whether this 
chunk has sub-chunks or not.  That way, if a reader doesn't understand
the ANIM identifier, it can still look inside this chunk for other
chunks that it may understand. 
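
(Concretely, the reader's inner loop would then test the flag rather
than the ID.  A sketch, with CKF_GROUP, ReadHeader88() and Skip() all
invented names, and pad handling omitted for brevity:)

	#include <stdio.h>

	#define CKF_GROUP 0x0008L            /* invented grouping flag      */

	extern void ReadHeader88(FILE *fp, char id[4], long *size, long *status);
	extern void Skip(FILE *fp, long n);  /* assumed helper              */

	void Scan88(FILE *fp, long bytes)
	{
	    char id[4];
	    long size, status;

	    while (bytes > 0) {
	        ReadHeader88(fp, id, &size, &status);
	        if (status & CKF_GROUP)      /* chunk announces sub-chunks  */
	            Scan88(fp, size);        /* safe to descend even though */
	                                     /* the ID is unrecognized      */
	        else
	            Skip(fp, size);          /* opaque data: skip it        */
	        bytes -= 12 + size;          /* ID + size + status + data   */
	    }
	}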

But what we've just done is to distinguish between grouping and
non-grouping chunks, just like IFF 85 does in distinguishing between the 
FORM, LIST, CAT and PROP IDs and all other chunk IDs.  One can argue
that Wade's method would be more general, since now any chunk can be a
grouping chunk.  There is certainly a danger of someone setting the
grouping bit inconsistently with the nature of the chunk.  This danger
does not exist with IFF 85 where the identifier is either FORM, LIST,
CAT or PROP or else it's not a grouping chunk. 

So, having provided a mechanism for parsers to examine the internals of 
Wade's format files without understanding the constituent identifiers, 
the next problem is that of scope and context.  Our ILBM seeking paint 
program might delve deeply into the structure of some unknown file and 
locate an ILBM someplace deep inside:

	xxxx {
		xxxx { }
		xxxx { }
		xxxx {
			xxxx { }
			"ILBM" {

		... (rest of file unparsed) ...

The difficulty here is that the ILBM seeker has no way of determining 
anything about the context of the ILBM it has found.  It can't know, for
example, if some other part of the file is modifying this ILBM in any
way.  It also cannot be certain that the chunk "ILBM" isn't a specific 
internal chunk for another chunk.  The end result of this is that all
chunk identifiers must be treated the same wherever they may occur. 
While this might have some good side-effects, this is a generally
undesirable condition primarily because of the inevitable collisions
that can occur when many concurrent developers use a flat name-space,
especially one as limited as four-character IDs.

To alleviate this problem, let's add another bit to the status word that
indicates whether this chunk is a root chunk -- that is, if this chunk
can stand on its own independent of its context.  Chunks without this
bit set would be dependent on their context and cannot be considered
independently.  So if the internal ILBM chunk located above had this bit 
set, then it would be safe to read it as its own bitmap image.

If you've been astute, you'll see that I've just re-invented "formtypes" 
from the IFF 85 standard.  IFF provides a two-tier name-space 
that effectively eliminates the possibility of collision: formtypes and
local chunk types that depend on their formtype.  So, for example, the 
meaning of a CMAP chunk within a FORM with type ILBM is different from a
CMAP within a formtype DRAW, or any other formtype for that matter.  The
common name-space is that of the formtypes, and there are many fewer
formtypes than there are chunk types.  Also, the use of the formtype ID
within the FORM chunk gives the same result as the "root" bit in the
hypothetical Bickel file format.  When a parser sees a FORM, it knows
that this is a self-contained object being used as part of a larger
structure that it doesn't have to understand.  Like the ILBM form within 
the ANIM form.

This hypothetical extension of Wade's format is formally equivalent to
the IFF 85 standard -- that is, anything you can do with one can be done
with the other.  * Gasp! *  Does that mean that IFF *doesn't* need to be
changed?  Yup, that's exactly what that means.  As an example, consider
Wade's example of a B-tree structure being encoded into his proposed
format: 
 
	"FORM" {
		"23BT" {
			"NODE" {
				"NDAT"
				"NODE" { data and 3 node chunks }
				"NODE" { data and 3 node chunks }
				"NODE" { data and 3 node chunks }
			}
		}
	}

This could be encoded into IFF as:

	"FORM" {
		"23BT"
		"DATA" { data for this node }
		"FORM" {
			"23BT"
			"DATA" { more node data }
			"FORM" { another "23BT" FORM }
		}
		"FORM" { another "23BT" FORM }
		"FORM" { another "23BT" FORM }
	}

But the crux of Wade's proposal is this:

|         What I really want to do is create a purely Data driven mechanism, as
| opposed to the Code driven one in the current IFF.  Rather than having to 
| write code to handle each type of occurrence, a structure would be initialized
| at run time, and this would be passed to the Reader or Writer parser to be
| handled.  In this way it would never be necessary to update the Library(s).

He's primarily interested in the mechanism for reading/writing such
files, and writes about it at great length in his article.  If this
mechanism were useful for reading and writing "IFF 88" files, then it
would be equally applicable to existing IFF files just by changing the
file grammar slightly, as I did above.

|         Like its predecessor, IFF '88 is a recursive-descent
| parser design.  The primary difference between the old design and
| the new one is that while IFF '85 was code driven, IFF '88 is data driven.

I don't get this.  First of all, there is nothing inherently recursive-
descent about IFF.  In fact, the iff.library that Leo and I are
developing is a finite state machine design, rather than recursive-
descent.  (First attempts were recursive-descent because they are easier
to write, but this makes the client code messy ... anyway...)  Also,
Wade has used the phrase "data-driven" several times, but I don't see
what he's talking about. 

|                         The Writer Mechanism
|                       ------------------------

What Wade describes here is basically a tree-like data structure to
control writing an IFF file.  The tree would presumably be traversed by
the writer library code and user functions would be called to write the
actual bytes of data.  Wade refers to this as "data driven."

| Whereas IFF '85 readers/writers require re-compilation of the
| source to accommodate format updates, IFF '88 will not.

But I don't get it.  Sure, the data structure controls the chunk
nesting, but the actual business of writing bytes gets handled by user
code, so where's the extendability?  I still have to have the code to
write the chunks in my program which means re-compilation when something
changes.

|         In order to write a file an implementation first creates and
| properly initializes a writer-structure, then calls the writer function
| which parses the structure and writes the file.

Wade details an example of writing an ILBM using his mechanism.  The
client program builds a data structure representing the structure of the
file to be written which includes function pointers for writing each
chunk.  

|    {WLevel}
|     {WEntry}
|       ckID = "FORM";
|       WLev --------> {WLevel}
|      Next = NIL;      {WEntry}
|                         ckID = "ILBM";
|                         WLev -----> {WLevel}
|                        Next = NIL;   {WEntry}
|                                        ckID = "BMHD";
|                                        WrtAlg = ADR(WriteBytes());
|                                        WrtData ---> {WrtAlgParams}
|                                       Next            ADR(BitMapHdr);
|                                         |             TSIZE(BitMapHdr);
					... etc. ...

However, since the structure of the file and the code to write the
actual bytes are both provided by the client program, I fail to see how
creating this structure and passing it to a generic writer is any
different from just having the following piece of code in the client
program: 

	/* WriteILBM: bitmap, colormap */

	PushChunk (iff, ID_FORM, ID_ILBM);

		PushChunk (iff, ID_BMHD, 0);
		WriteBMHD (bitmap);
		PopChunk (iff);

		PushChunk (iff, ID_CMAP, 0);
		WriteCMAP (colormap);
		PopChunk (iff);

		PushChunk (iff, ID_BODY, 0);
		WriteBODY (bitmap);
		PopChunk (iff);

	PopChunk (iff);

(This is an actual example of the use of the iff.library.  RSN!)  It
seems that for either method I need to have the writing code in my
program, and I need to know the structure of the file I want.  If
anything, I would think that constructing a large tree data structure
would be more difficult than just having code to write the file
directly.  What's the advantage?

I'm genuinely interested in this *mechanism* for reading and writing
files since it should work equally well for real IFF files.  If there
are real advantages to this, Wade, I missed them.  Could you provide an
example of a file format changing and user programs not needing to be 
recompiled?


On a separate issue, Wade talks about "dirty" chunks.

|  [Another problem with IFF]
|   is that there are no dirty chunk provisions.  I feel
| that dirty chunk tracking would be a valuable option.  Dirty chunks would 
| occur when, after finding some recognized chunks, unrecognized chunks are
| encountered.  IFF '85 discards these chunks.  I propose that as a user option
| unrecognized chunks be retained when a program modifies a partially understood
| IFF '88 file.
 [ ... ]
| When unrecognized chunks are written they're marked as dirty,
| and any chunks which have been modified are also noted.

This issue is discussed somewhat in the EA IFF 85 specification on page
B-31 of the Exec RKM.  Their conclusion is that the data universe
encompassed by IFF is too large to allow for standardization of the
possible interactions of the various types of data involved.

While this is an interesting and valid idea, it really makes life 
miserable for programmers.  They have to retain chunks they don't need, 
don't understand and can't use, just so they can write them out again
trying to preserve the original IFF file as much as possible only to
fail much of the time.  It also means that all programs need to fully 
support standard chunks so that standard chunks will never be marked as 
dirty.  It also means that programs that use "non-standard" chunks need
to make some intelligent decisions about whether a chunk marked as 
dirty" is good within the context of a specific file.  It might be
possible, but it could also be a real headache.  I'm just not convinced 
that the advantages are great enough to want to provide such a
mechanism.

This facility can be provided for any new IFF formtypes, however, by 
equipping them with a "MAP" chunk (or some such, but it should be
consistent across FORMs) which contains a list of the chunks in the
file and their status.  It is not possible or even desirable to retro-
fit this capability into existing formats. 
-- 
		Stuart Ferguson		(shf@well.UUCP)
		Action by HAVOC		(shf@Solar.Stanford.EDU)

jojo@astroatc.UUCP (Jon Wesener) (09/22/88)

	I'd like to see the dependency on byte-sex disappear, you know,
Big Indian, Little Indian.  IFF is only good on machines with the same
byte sex as the 68000.  This has fouled me up on a number of occasions.
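
One way around it for readers on the other byte sex is to assemble the
big-endian longwords a byte at a time instead of overlaying a struct on
the raw bytes.  A quick sketch of mine:

	/* Portable on either byte sex: build the value explicitly. */
	unsigned long GetBE32(unsigned char *p)
	{
	    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16)
	         | ((unsigned long)p[2] <<  8) |  (unsigned long)p[3];
	}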

my 2 cents,
--j
-- 
... {seismo | harvard } ! {uwvax | cs.wisc.edu} ! astroatc!jojo
    "They weren't just any women...  They were women fresh from Hell, with
	wisps of devil-smoke curling behind their ears and unholy vapors hiding 
	in their tight leather dresses, looking for Mr. Right." 

phil@titan.rice.edu (William LeFebvre) (09/26/88)

In article <1199@astroatc.UUCP> jojo@astroatc.UUCP (Jon Wesener) writes:
>
>	I'd like to see the dependency on byte-sex disappear, you know,
>Big Indian, Little Indian.

That's ENDIAN, not indian.

			William LeFebvre
			Department of Computer Science
			Rice University
			<phil@Rice.edu>

haitex@pnet01.cts.com (Wade Bickel) (09/26/88)

	Thanks Stu, for the explanation of IFF '85.  I had not looked at the
sub specifier in that light, and now see that it is indeed a good idea.
I still think it would be nice to incorporate the individual BAD chunk
handling and Dirty/Modified chunk bits (please remember this was to be a
user/program option), but it is not sufficient to warrant re-doing IFF.  A MAP
chunk is of course one alternate solution, but seems to me to be clumsy
and inefficient (what happens when parallel tracks of audio data are being
stored?).
 
>But the crux of Wade's proposal is this:
>
>|         What I really want to do is create a purely Data driven mechanism, as
>| opposed to the Code driven one in the current IFF.  Rather than having to 
>| write code to handle each type of occurrence, a structure would be initialized
>| at run time, and this would be passed to the Reader or Writer parser to be
>| handled.  In this way it would never be necessary to update the Library(s).
>
>He's primarily interested in the mechanism for reading/writing such
>files, and writes about it to great length in his article.  If this
>mechanism were useful for reading and writing "IFF 88" files, then it
>would be equally applicable to existing IFF files just by changing the
>file grammar slightly, as I did above.
>
	Quite so.  I had originally intended to apply this to the IFF '85
standard.  Then, in my telephone discussion with you I had mistakenly come
to the conclusion that nesting control was limited in IFF '85, and thus
the first half of my proposal.

>Wade has used the phrase "data-driven" several times, but I don't see
>what he's talking about. 

	Simple.  In the current IFF system the knowledge to read/write IFF
files is contained in the IFF code.  Thus it is "code-driven".  To extend
the system's knowledge you must modify the code.  In the "data-driven" system
I propose, the knowledge is contained in data (which may include code).  Thus
the data, not the code, is modified to extend the system's knowledge.

>|                         The Writer Mechanism
>|                       ------------------------
>What Wade describes here is basically a tree-like data structure to
>control writing an IFF file.  The tree would presumably be traversed by
>the writer library code and user functions would be called to write the
>actual bytes of data.  Wade refers to this as "data driven."
>
>| Whereas IFF '85 readers/writers require re-compilation of the
>| source to accommodate format updates, IFF '88 will not.
>
>But I don't get it.  Sure, the data structure controls the chunk
>nesting, but the actual business of writing bytes gets handled by user
>code, so where's the extendability?  I still have to have the code to
>write the chunks in my program which means re-compilation when something
>changes.
	
	Not so.  Only if you use some format which has not yet become part
of the standard library.  Under normal circumstances you would call routines
found in the standard support libraries.  Thus, a developer of a new format
includes a library of private support routines to be used with the
system.  It would also be possible (though a bit kludgy) to include routines
that are part of your active code (ie: not in a library).  An AddFunc() type
routine to add these calls to a table and assign an index might be desirable.

>However, since the structure of the file and the code to write the
>actual bytes are both provided by the client program, I fail to see how
>creating this structure and passing it to a generic writer is any
>different from just having the following piece of code in the client
>program: 
>
>	/* WriteILBM: bitmap, colormap */
>
>	PushChunk (iff, ID_FORM, ID_ILBM);
>
>		PushChunk (iff, ID_BMHD, 0);
>		WriteBMHD (bitmap);
>		PopChunk (iff);
>
>		PushChunk (iff, ID_CMAP, 0);
>		WriteCMAP (colormap);
>		PopChunk (iff);
>
>		PushChunk (iff, ID_BODY, 0);
>		WriteBODY (bitmap);
>		PopChunk (iff);
>
>	PopChunk (iff);
>
>(This is an actual example of the use of the iff.library.  RSN!)  It
>seems that for either method I need to have the writing code in my
>program, and I need to know the structure of the file I want.  If
>anything, I would think that constructing a large tree data structure
>would be more difficult than just having code to write the file
>directly.  What's the advantage?
>
	Think about the control structure from the point of view of the
file reader.  It contains the sum of what the library and your code know
about IFF files.

        Also, how about the following minor change in the way
you handle file writing:

	PushChunk (iff, ID_FORM, ID_ILBM);

		PushChunk (iff, ID_BMHD, 0);
		WriteBMHD (bitmap);
		PopChunk (iff);

		PushChunk (iff, ID_CMAP, 0);
		WriteBytes (colortable,size);  (* colortable is a ptr *)
		PopChunk (iff);

		PushChunk (iff, ID_BODY, 0);
		WriteBODY (bitmap);
		PopChunk (iff);

	PopChunk (iff);
	
Of course, these are supported types.  Let's pretend they're not.  In this
case it would look something like this:


	PushChunk (iff, ID_FORM, ID_ILBM);

		PushChunk (iff, ID_BMHD, 0);
		bmhd := MakeBMHD(bitmap);
		WriteBytes (bmhd, sizeofBMHD); 
		PopChunk (iff);

		PushChunk (iff, ID_CMAP, 0);
		WriteBytes (colortable,sizeofCT);  (* colortable is a ptr *)
		PopChunk (iff);

		PushChunk (iff, ID_BODY, 0);
		WriteCustom (rtn, bitmap);     (* rtn is a ptr to code *)
		PopChunk (iff);

	PopChunk (iff);
	
	
The chunks are now all generic.  The library supports them even though they
may not exist at the time of library compilation.  "rtn" is the address of
a routine expecting one parameter on the stack, in this case a BitMap pointer.
In this instance, it would be a routine in the library, but it could be any
routine.

I have not decided on the best way to handle custom routines, and would like
suggestions.  One parameter on the stack, push everything but D0,
seems good to me.  In practice it would probably be best to reference such
routines from a jumptable.
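
To make the jumptable idea concrete, it might look something like the
following.  This is only my sketch; AddFunc() was a suggestion above, and
nothing about the calling convention is decided:

	typedef void (*WrtProc)(void *);     /* one parameter on the stack  */

	static WrtProc jumptable[64];        /* size chosen arbitrarily     */
	static int     nfuncs = 0;

	int AddFunc(WrtProc fn)              /* returns the routine's index */
	{
	    jumptable[nfuncs] = fn;
	    return nfuncs++;
	}

	/* ...and the writer later invokes:  jumptable[index](WrtData);  */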

>I'm genuinely interested in this *mechanism* for reading and writing
>files since it should work equally well for real IFF files.  If there
>are real advantages to this, Wade, I missed them.  Could you provide an
>example of a file format changing and user programs not needing to be 
>recompiled?

I had not pictured the file format changing.  Rather, I had been thinking
of how to handle new IFF data types.  With EA's code, I found it somewhat
frustrating to have to go in, modify parts of the code, and re-compile
it to add new chunks.  What I want to be able to do is inform the system
of a new data type (and the rules for reading it), without touching the
original source.

Why?  Well, for one thing, having myriads of modified IFF reader code
all based on the same originals, but containing different mods, is a mess.
More importantly, library management, should your IFF library be made available,
will be a mess if the library is constantly being updated to support new
data types.

Your example writer code looks superior to the writer structure method
I described.  The structure really comes into use for the reading of 
files, but since describing the writer was simpler than describing the
reader, and we have to have a format before we can discuss how to read
it, I chose to describe the writer.  Please take a look at the writer
system I described, and imagine using the same method for reading the
files.  Some method of creating "groups" of items on the same level
would be desirable, which complicates the format, but is irrelevant
as far as theory goes.

Basically, a system of nodes is created which describes readable files, and
how to read them.  These nodes reference *knowledge*, which can be in the
IFF library(s) or elsewhere.  In this way, programs using new IFF data
types can describe only the new knowledge to the system. 

For instance, let's suppose the library understands ILBMs but not ANIMs.
The program defines ANIM by first calling library routines to generate a 
structure which reads ILBMs, then adds nodes which describe ANIMs and
how to read them at the appropriate points.  Now the system understands
and can read ANIMs.
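
In rough outline it might read like this.  Every name below is invented
purely to show the flavor of the idea; no such calls exist in any library:

	/* Hypothetical sketch -- teaching the reader about ANIMs.        */
	extern struct RLevel *MakeILBMReader(void);   /* library knows ILBM */
	extern struct RLevel *MakeLevel(char *id);    /* start a new node   */
	extern void AddEntry(struct RLevel *lev, char *id,
	                     void (*rdalg)(void *), void *params);
	extern void LinkSubLevel(struct RLevel *lev, struct RLevel *sub);
	extern void ReadIFF(char *file, struct RLevel *root);

	void SetupANIM(void *anhdParams, void (*ReadBytes)(void *))
	{
	    struct RLevel *ilbm = MakeILBMReader();
	    struct RLevel *anim = MakeLevel("ANIM");

	    AddEntry(anim, "ANHD", ReadBytes, anhdParams); /* our new chunk */
	    LinkSubLevel(anim, ilbm);       /* each frame parses as an ILBM */
	    ReadIFF("picture.anim", anim);  /* system now understands ANIM  */
	}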

Periodically new types would be added to the library, as circumstances warrant.

Standard calls to write/read standard types would also be in the library,
to make things easy for the novice, and act as jumping-off points for the
designer of new IFF data types.

----------------------------------------

>On a separate issue, Wade talks about "dirty" chunks.

>While this is an interesting and valid idea, it really makes life 
>miserable for programmers.  They have to retain chunks they don't need, 
>don't understand and can't use, just so they can write them out again
>trying to preserve the original IFF file as much as possible only to
>fail much of the time.  It also means that all programs need to fully 
>support standard chunks so that standard chunks will never be marked as 
>dirty.  It also means that programs that use "non-standard" chunks need
>to make some intelligent decisions about whether a chunk marked as 
>dirty" is good within the context of a specific file.  It might be
>possible, but it could also be a real headache.  I'm just not convinced 
>that the advantages are great enough to want to provide such a
>mechanism.

	Simple, just turn it off.  It should be a switch in the library.
If the switch is on when the library reads a file, it tracks dirty chunks.
If a file containing dirty chunks is read, it looks at the reader control
structure (inited by the using program) and determines which chunks are
bad.  If the switch is off, unrecognized or dirty chunks are ignored
on reads.

>This facility can be provided for any new IFF formtypes, however, by 
>equipping them with a "MAP" chunk (or some such, but it should be
>consistent across FORMs) which contains a list of the chunks in the
>file and their status.  It is not possible or even desirable to retro-
>fit this capability into existing formats. 

True.  However, I think it was a mistake in the format not to leave a WORD for
this.  Not leaving a word for the future is shortsightedness.  Using a MAP
chunk will be messy, making dirty-chunk tracking harder than it would otherwise
be.

Also, what about data correction for chunks?  It won't be much longer
before musicians are using IFF to store CD-quality sound.  We can't be
throwing away a file just because one bit has gone bad in a chunk.
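
Detection, at least, is cheap.  The per-chunk checksum assumed in my
IFF '88 sketch could be as simple as this (my illustration only):

	/* Rotate-and-xor over the chunk data; detects single-bit errors. */
	unsigned long ChunkSum(unsigned char *data, long n)
	{
	    unsigned long sum = 0;

	    while (n-- > 0)
	        sum = ((sum << 1) | (sum >> 31)) ^ *data++;
	    return sum;
	}

True *correction* would need real redundancy in the file, something like
a Hamming or Reed-Solomon code over the chunk data.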


						Thanks,
						
							Wade.


UUCP: {cbosgd, hplabs!hp-sdd, sdcsvax, nosc}!crash!pnet01!haitex
ARPA: crash!pnet01!haitex@nosc.mil
INET: haitex@pnet01.CTS.COM
Opinions expressed are mine, and not necessarily those of my employer.