[comp.lang.c] Proposing a tool to make graph-structure I/O easy -- want to help?

eric@snark.UUCP (Eric S. Raymond) (09/23/88)

The recent discussion of byte-stream vs. record file access has reminded me of
an idea I've been kicking around for a couple of years and might actually be
ready to do for comp.sources.unix sometime soon (like, when the news B3.0 beta
test is done). This posting is to solicit email comments and assistance on it. 

I don't generally consider record-access libraries per se a win; they tend to
intruduce a lot of complexity for gains that are marginal at best. But I see 
a lot of uses for a sort of record-access-library generator, a non-procedural
declarative language like YACC or LEX that would grind out a pair of C
functions to read/write an arbitrarily complex C data structure *including*
(this is the tricky part) pointer links, and only the filled parts of arrays.

The paradigmatic immediate application I have in mind for this, BTW, is to
replace a lot of malloced graph structure I/O code I use for save/restore
of simulation states in a game I've built. So think in terms of an output side
that parses a collection of C struct and array declarations, and generates code
that'll pick up any graph or array of them and blat it out to disk in some
magic format that the input side can pick its way through to reassemble in
core.

Here's how it would work:

The specification language would have to declare, in each case, some particular
fixed-length object as the 'handle' of the graph-like things we're generating
I/O code for. You'd feed the generated output function the address of one of
these, and it would then traverse the DAG implied by the handle-object's
pointers in some canonical way, writing stuff as it goes and doing the magic
needed to represent link fields in a restorable form (see below).

1. In the degenerate case where the declared 'handle' type has no pointers,
tool should generate trivial functions using fread/fwrite.

2. In the slightly less degenerate case of an array with accessible length
field, it should read/write a format consisting of a length-containing header
block followed by the array contents.

3. Unions could be written out after a leading 'variant-tag' header specifying
which alternative to assume when the input side is trying to restore pointer
links.

4. Now, here's the tricky part. To represent graph pointers, write all the
graph nodes of each kind into an array, with the pointers to that kind replaced
by bits having the numeric value of array indices.

5. All other cases can be built up by recursion from 1-4.

Note that we can get other things from the parse process -- such as pretty-
printer for the graph type, and a pair of alloc/dealloc functions for it.

I've even got a pretty good idea where I can swipe working code for most of the
declaration parse phase -- cdecl. It has to be doing the same kind of recursive
walk-through of nested C types I'm contemplating, n'est ce pas?

Now, onto the design of the specification language:

Obviously, we need to be able to declare a given C type as the 'handle' of the
graph-type we want to do I/O on.

The specification language would also have some way of associating arrays with
scalar length fields, so the output code could know about actual lengths of
arrays, and the input code know where to find them.

Similarly, it must be able to declare associations of unions with 'type' fields
within or outside the union.

For convenience, you want to be able to declare the names of the generated I/O
function pair.

This may be all it needs. But I have remaining open questions you can structure
your *email* (that's a hint; I'll summarize to the net) reply around.

1. Is this really a new idea, or has someone already written it?.

2. If the answer to 1 is 'no', do you see holes in the idea that I don't?

3. It would be nice if the generated file formats were network-portable, but
   that's orthogonal to the issues discussed above. How does this proposal fit
   with things like Sun's XDR or ASN.1?

4. Are you willing to help me do it? Are you willing to do it while I look over
   your shoulder? (I'm not lazy, it's just that B3.0 eats up all my time...).

Again, what I'd like to do is write this (or help someone else write it) for
comp.sources.unix under something like a GNU copyleft. Suggestions for
features, code we can build on, or even just a cute name for the tool would
be welcomed.
-- 
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      UUCP: ...!{uunet,att,rutgers}!snark!eric = eric@snark.UUCP
      Post: 22 S. Warren Avenue, Malvern, PA 19355      Phone: (215)-296-5718