[comp.lang.c] I/O of complex data structures in C

jasonf@cetemp.Eng.Sun.COM (Jason Freund) (08/03/90)

-- somewhat hypothetical situation representing a real problem --

	Ok.  Suppose I am writing a game in ansi C that saves a bunch of 
different maze levels as separate files.  The player walks around, changes
some things, and then leaves.  When the game starts, I want to load level 1.
Everytime he changes levels, I load up the new level and save the old one.

	Basically, a maze is a complex data structure (a 2D array of and array
of pointers to blah, blah... (it's deep)).  So that means I want to use fread()
and fwrite() (right?)  My programming book says *very* little on those
commands, but what they do say leads me to believe that those are the commands
I want.  

	When you save data in a database, does the program just go:
"fwrite(pointer, sizeof, *pointer, items, stream)" which somehow magically
saves every piece of data (specified in the arguments) in such a way that it
will be able to read in every piece of data back into their correct cells in
the data structure?  That is what I want to do -- and I want to know if fread
and fwrite can do it.

	Could someone explain in some detail what the arguments mean? Or point
me to a source that could?

Thanks,

Jason Freund, Sun Microsystems,  jasonf@cetemp.Corp.sun.com  <== summer address
Deprtmnt of Computer Science, Univ California, Davis. freund@sakura.ucdavis.edu
Quantum Link: JasonF5,  Compu$erve: 72007,244, 690 Erie Dr, Sunnyvale, CA 94087
-------------------------------------------------------------------------------
STOLEN QUOTES -- Please give the authors credit if you know who they are!    
"To understand recursion, you need to understand recursion."
"Wow!  Virtual memory!  Now I'm gonna build me a REALLY big ram disk!"
"My other computer is a SUN3/50."  "E. Pluribus UNIX"   -- authors unkown 

chris@mimsy.umd.edu (Chris Torek) (08/04/90)

In article <140087@sun.Eng.Sun.COM> jasonf@cetemp.Eng.Sun.COM
(Jason Freund) writes:
>Followup-to: s

(There is no such newsgroup.  I put followups back in the groups in
which the original appeared.)

>Basically, a maze is a complex data structure (a 2D array of and array of
>pointers to blah, blah... (it's deep)).  So that means I want to use fread()
>and fwrite() (right?).

Maybe; indeed, even probably:

>When you save data in a database, does the program just go:
>"fwrite(pointer, sizeof, *pointer, items, stream)" which somehow magically
>saves every piece of data (specified in the arguments) in such a way that it
>will be able to read in every piece of data back into their correct cells in
>the data structure?

No.

The primary rule of magic is this: `There is no magic'.  Fread and fwrite
read and write binary data (given a binary stream, as opened via, e.g.,
fopen(name, "wb")) by writing `nitems' objects, each of whose size is
as given, from the location given by the pointer argument.  If each object
is composed of simple bytes (e.g., ASCII or EBCDIC or ISO Latin 1 text),
those bytes will be written directly to the stream.  If each object is
composed of something more complicated, the bytes making up that object
will be written directly to the stream, whether or not that is a sensible
thing to do.

In particular, if the bytes composing the object are in the form of a
pointer, the resulting pointer (when read back via fread) is not
guaranteed to be sensible.  It will have the same bit pattern that the
original pointer had, but that bit pattern may no longer be a valid
pointer value---even if the reading is done by the same run of the
same program (garbage collecting implementations of C are legal, if
rare).  If the reading is done by a different run, or---consider the
effect of fixing a bug in the game---a different but similar program,
the chances are great that the bit pattern will not be useful.

So what can you do?  There are many approaches.  You can assume (as the
Unix `rogue' game does) that different runs of the same program will
be able to make use of the old values, and instead of writing out just
what you need, write out all data.  This approach has its pitfalls,
as anyone who had a winning game of rogue and saved it for the 17th
time will know.  You can convert pointers into indicies (provided that
the pointers point into contiguous memory regions), and write only
the data you need.  You can assume that the values of pointers can be
used to uniquely identify every object, no matter what its type, and
write the data `as is' but convert the pointers when reading back.

We used this latter approach to save arbitrary Lisp data in files for
later recovery, although in this case the saving routine had to worry
about circular data structures and was therefore more complicated---
the output format was a sequence of <id,tag,bytes> triples.  The id was a
unique identifier---probably actually the original pointer value---and
the tag and bytes appeared if and only if the id had not yet appeared
in the save file.  Id 0 was nil.  In this case the tag said what kind
of Lisp object the bytes represented, and if the object had pointers,
e.g., a dotted pair, the <bytes> were themselves <id,tag,bytes>
sequences.  Thus, a list (a b a) was represented as, e.g.,

	id=1, tag=DTPR, bytes=(
	 car: id=2, tag=ATOM, bytes="a";
	 cdr: id=3, tag=DTPR, bytes=(
	  car: id=4, tag=ATOM, bytes="b";
	  cdr: id=5, tag=DTPR, bytes=(
	   car: id=2;
	   cdr: id=0
	  )
	 )
	)

(I simplified this example; the atoms were actually composed of
name, boundp, value-if-bound, property-list sequences.)

This sort of format is pretty much ideal for portability (particularly
if you record numeric values in printed form).  The major disadvantage
is that producing and reading such files tends to be slow.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris
	(New campus phone system, active sometime soon: +1 301 405 2750)