[comp.lang.c] Binary data files -- a compromise.

peter@ficc.uu.net (Peter da Silva) (05/03/89)

This is kind of getting peripheral to comp.lang.c, so observe followups.

A compromise between binary and ASCII data is a tagged binary data file,
where objects (simple or complex) are tagged with a type. The one such
format I'm familiar with is the Electronic Arts Interchange File Format
which has become a standard on the Commodore Amiga. Basically the file
format is a series of chunks, each of which is tagged with a 4-character
type and a 4-byte big-endian length.

	struct chunk {
		char ChunkID[4]; /* Type of chunk */
		unsigned long ChunkSize; /* Length of data in bytes, big-endian */
		whatever_type_it_is data[]; /* Not legal 'C' */
	};
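
Since the length is stored big-endian, a portable reader can't just fread()
the struct above and use ChunkSize as is. Here is a minimal sketch of decoding
one chunk header from a buffer of raw bytes. The names chunk_header, get_be32
and parse_chunk_header are made up for illustration, and it assumes a compiler
that provides <stdint.h>:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of a chunk header (illustrative names). */
struct chunk_header {
	char     id[5];   /* 4-character type, plus a NUL added for convenience */
	uint32_t size;    /* length of the data that follows, in bytes */
};

/* Assemble a 4-byte big-endian value one byte at a time, so the result
   does not depend on the byte order of the host. */
uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode the 8-byte header at buf.  Returns 0 on success, or -1 if
   fewer than 8 bytes are available. */
int parse_chunk_header(const unsigned char *buf, size_t len,
                       struct chunk_header *out)
{
	if (len < 8)
		return -1;
	memcpy(out->id, buf, 4);
	out->id[4] = '\0';
	out->size = get_be32(buf + 4);
	return 0;
}
```

The file itself has no NUL after the ID; the fifth byte exists only in the
in-memory copy so the ID can be handed to strcmp() or printf().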

The whole file is stored in a superchunk tagged with the ChunkID 'FORM',
'CAT ', or 'LIST' ('xxxx' is intended to indicate a non-null-terminated
string). This superchunk has the following structure.

	struct superchunk {
		char ID[4]; /* Type 'FORM', 'CAT ', or 'LIST' */
		unsigned long Size; /* in bytes */
		char Type[4]; /* Type of stuff */
		struct chunk data[]; /* DEFINITELY not legal 'C' */
	};
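
To show how a reader steps through such a file, here is a rough sketch that
walks the chunks of a single FORM already loaded into memory. It is not taken
from any Amiga library; walk_form, read_be32, chunk_fn and total_bytes are
hypothetical names. It also relies on two details of the IFF spec that I
haven't described above: the FORM's Size counts the Type field plus everything
after it, and an odd-length chunk is followed by one pad byte that is not
counted in its own size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: called once for each chunk in the FORM. */
typedef void (*chunk_fn)(const char id[4], const unsigned char *data,
                         uint32_t size, void *ctx);

/* Read a 4-byte big-endian length independent of host byte order. */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walk the chunks of one FORM held in buf.  Copies the 4-character FORM
   type into type_out and returns the number of chunks visited, or -1 if
   the data is truncated or malformed. */
int walk_form(const unsigned char *buf, size_t len, char type_out[4],
              chunk_fn fn, void *ctx)
{
	size_t pos, end;
	uint32_t form_size;
	int count = 0;

	if (len < 12 || memcmp(buf, "FORM", 4) != 0)
		return -1;
	form_size = read_be32(buf + 4);
	/* Size covers the Type field plus all contained chunks. */
	if (form_size < 4 || form_size > len - 8)
		return -1;
	memcpy(type_out, buf + 8, 4);

	pos = 12;
	end = 8 + (size_t)form_size;
	while (end - pos >= 8) {
		uint32_t size = read_be32(buf + pos + 4);
		if (size > end - pos - 8)
			return -1;          /* chunk runs past the FORM */
		if (fn)
			fn((const char *)(buf + pos), buf + pos + 8, size, ctx);
		count++;
		pos += 8 + (size_t)size;
		if ((size & 1) && pos < end)
			pos++;              /* skip the pad byte (IFF spec detail) */
	}
	return pos == end ? count : -1;
}

/* Example callback: add up the data bytes of all chunks seen. */
void total_bytes(const char id[4], const unsigned char *data,
                 uint32_t size, void *ctx)
{
	(void)id;
	(void)data;
	*(uint32_t *)ctx += size;
}
```

A program that only understands some chunk types can compare the ID in its
callback and ignore the rest, which is the point of tagging everything.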

Then things start getting complicated, because LISTs and CATs contain FORMs,
with LISTs being homogeneous (all FORMs must have the type of the LIST) and
CATs being heterogeneous. LISTs can also contain PROPs, which contain defaults
for the FORMs... but the basic idea is quite simple, and in practice LISTs
and CATs aren't often used.

On the Amiga all sorts of binary data use this format: images, samples,
instruments, songs, animations, and so on. Programs can share data fairly
easily, without the overhead of text, which would be prohibitive on a
floppy-only system.
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.