[comp.sys.amiga.tech] I/O of complex data structures in C

jasonf@cetemp.Eng.Sun.COM (Jason Freund) (08/03/90)

-- somewhat hypothetical situation representing a real problem --

	Ok.  Suppose I am writing a game in ANSI C that saves a bunch of 
different maze levels as separate files.  The player walks around, changes
some things, and then leaves.  When the game starts, I want to load level 1.
Every time he changes levels, I load up the new level and save the old one.

	Basically, a maze is a complex data structure (a 2D array of an array
of pointers to blah, blah... (it's deep)).  So that means I want to use fread()
and fwrite() (right?)  My programming book says *very* little on those
commands, but what it does say leads me to believe that those are the commands
I want.  

	When you save data in a database, does the program just go:
"fwrite(pointer, sizeof, *pointer, items, stream)" which somehow magically
saves every piece of data (specified in the arguments) in such a way that it
will be able to read in every piece of data back into their correct cells in
the data structure?  That is what I want to do -- and I want to know if fread
and fwrite can do it.

	Could someone explain in some detail what the arguments mean? Or point
me to a source that could?

Thanks,

Jason Freund, Sun Microsystems,  jasonf@cetemp.Corp.sun.com  <== summer address
Deprtmnt of Computer Science, Univ California, Davis. freund@sakura.ucdavis.edu
Quantum Link: JasonF5,  Compu$erve: 72007,244, 690 Erie Dr, Sunnyvale, CA 94087
-------------------------------------------------------------------------------
STOLEN QUOTES -- Please give the authors credit if you know who they are!    
"To understand recursion, you need to understand recursion."
"Wow!  Virtual memory!  Now I'm gonna build me a REALLY big ram disk!"
"My other computer is a SUN3/50."  "E. Pluribus UNIX"   -- authors unkown

jmeissen@oregon.oacis.org ( Staff OACIS) (08/04/90)

In article <140087@sun.Eng.Sun.COM> jasonf@cetemp.Eng.Sun.COM (Jason Freund) writes:
>	When you save data in a database, does the program just go:
>"fwrite(pointer, sizeof, *pointer, items, stream)" which somehow magically
>saves every piece of data (specified in the arguments) in such a way that it
>will be able to read in every piece of data back into their correct cells in
>the data structure?  That is what I want to do -- and I want to know if fread
>and fwrite can do it.
>
When you call fwrite, it treats the memory pointed to (in this case, your
structure) as a contiguous block of memory of the specified size. It makes
no attempt to interpret the elements of the structure. If the structure contains
only data, then this is not a problem. If, however, it contains pointers, then
you have a problem because the items pointed to will not be in the same location
when you read the data back in.

>	Could someone explain in some detail what the arguments mean? Or point
>me to a source that could?
>
pointer = the address of the data block to write out to the file
sizeof(*pointer) = the size of what the pointer points to, in other words the
                   size of the data block being written
items   = the number of blocks of that size contained in the buffer
stream  = the file handle for the open file to write to.
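
As a minimal sketch of those four arguments in use, here is a save/load
pair for a flat array of records.  The struct cell type, its fields and the
function names are made up for illustration; the only assumption that
matters is that the record holds plain data and no pointers.

```c
#include <stdio.h>

/* Hypothetical flat record: plain data only, no pointers. */
struct cell {
    int  wall_mask;
    int  item_id;
    char glyph;
};

/* Write `count` cells to `path`; returns 0 on success, -1 on failure. */
int save_cells(const char *path, const struct cell *cells, size_t count)
{
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL)
        return -1;
    /* pointer, size of one item, number of items, stream */
    written = fwrite(cells, sizeof(*cells), count, fp);
    if (fclose(fp) != 0 || written != count)
        return -1;
    return 0;
}

/* Read `count` cells back from `path`; returns 0 on success, -1 on failure. */
int load_cells(const char *path, struct cell *cells, size_t count)
{
    FILE *fp = fopen(path, "rb");
    size_t got;

    if (fp == NULL)
        return -1;
    got = fread(cells, sizeof(*cells), count, fp);
    fclose(fp);
    return got == count ? 0 : -1;
}
```

A file written this way is only expected to read back correctly with the
same compiler and machine, since the struct layout itself goes to disk.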

In other words, the routine writes (items * sizeof(*pointer)) bytes to file
"stream" from the buffer pointed to by "pointer".


-- 
John Meissen .............................. Oregon Advanced Computing Institute
jmeissen@oacis.org        (Internet) | "That's the remarkable thing about life;
..!sequent!oacis!jmeissen (UUCP)     |  things are never so bad that they can't
jmeissen                  (BIX)      |  get worse." - Calvin & Hobbes

eeh@btr.BTR.COM (Eduardo E. Horvath eeh@btr.com) (08/04/90)

[ Eat this you !#%&* line-eater ]

In article <140087@sun.Eng.Sun.COM> jasonf@cetemp.Eng.Sun.COM (Jason Freund) writes:
>-- somewhat hypothetical situation representing a real problem --

>	Basically, a maze is a complex data structure (a 2D array of and array
>of pointers to blah, blah... (it's deep)).  So that means I want to use fread()
>and fwrite() (right?)  My programming book says *very* little on those

  ********** NO DON'T DO THAT!!!!! ************

	fread() and fwrite() simply read and write the raw image of whatever
they point to.  fwrite(char *b, int bsize, int n, FILE *fp) works like this: it
will take whatever data is at location <b> and put <bsize * n> bytes of it into
file <fp>. 

	This would work fine if the program and its data were always loaded into
the same place.  On almost all computers (including the PC and Mac and especially
(sp?) on the Amiga) programs are relocated when they are loaded because different
parts of memory are being used for something else and the program must find other
memory.  You can never be certain that a program will load in the same place
twice.  The Amiga can have other programs that have grabbed previously free
pieces of RAM.  The PC can have TSRs, and the Mac OS does run-time relocations
of data.

	What does this mean to you?  Your pointers will most certainly be well
and truly trashed.  Your database will be garbagy gobbledygook.  What you must
do is traverse your database and save each node separately.  (fwrite() may work
for the individual nodes.)  You must do this in a way that the pointers are not
necessary for reconstructing the database.  Then when you read it back in, your
database must be carefully reconstructed by adding one node at a time.
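
A minimal sketch of that traverse-and-rebuild idea, assuming the simplest
possible case of a singly linked list.  The struct node type and both
function names are hypothetical; a real maze would need one such routine
per kind of node.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type; `next` is never written to the file. */
struct node {
    int value;
    struct node *next;
};

/* Write only the payload of each node, in list order. */
int save_list(FILE *fp, const struct node *head)
{
    const struct node *n;

    for (n = head; n != NULL; n = n->next)
        if (fwrite(&n->value, sizeof n->value, 1, fp) != 1)
            return -1;
    return 0;
}

/* Rebuild the list one node at a time; fresh pointers come from malloc. */
struct node *load_list(FILE *fp)
{
    struct node *head = NULL, **tail = &head;
    int value;

    while (fread(&value, sizeof value, 1, fp) == 1) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            break;              /* out of memory: return what was rebuilt */
        n->value = value;
        n->next = NULL;
        *tail = n;              /* append to the end of the new list */
        tail = &n->next;
    }
    return head;
}
```

The order of the records in the file stands in for the next pointers, so
no address ever reaches the disk.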



---------------------------------------------------------------------
Eduardo Horvath			eeh@btr.com
					..!{decwrl,mips,fernwood}!btr!eeh
	"Trust me, I know what I'm doing." 	- Sledge Hammer
---------------------------------------------------------------------

chris@mimsy.umd.edu (Chris Torek) (08/04/90)

In article <140087@sun.Eng.Sun.COM> jasonf@cetemp.Eng.Sun.COM
(Jason Freund) writes:
>Followup-to: s

(There is no such newsgroup.  I put followups back in the groups in
which the original appeared.)

>Basically, a maze is a complex data structure (a 2D array of and array of
>pointers to blah, blah... (it's deep)).  So that means I want to use fread()
>and fwrite() (right?).

Maybe; indeed, even probably:

>When you save data in a database, does the program just go:
>"fwrite(pointer, sizeof, *pointer, items, stream)" which somehow magically
>saves every piece of data (specified in the arguments) in such a way that it
>will be able to read in every piece of data back into their correct cells in
>the data structure?

No.

The primary rule of magic is this: `There is no magic'.  Fread and fwrite
read and write binary data (given a binary stream, as opened via, e.g.,
fopen(name, "wb")) by writing `nitems' objects, each of whose size is
as given, from the location given by the pointer argument.  If each object
is composed of simple bytes (e.g., ASCII or EBCDIC or ISO Latin 1 text),
those bytes will be written directly to the stream.  If each object is
composed of something more complicated, the bytes making up that object
will be written directly to the stream, whether or not that is a sensible
thing to do.

In particular, if the bytes composing the object are in the form of a
pointer, the resulting pointer (when read back via fread) is not
guaranteed to be sensible.  It will have the same bit pattern that the
original pointer had, but that bit pattern may no longer be a valid
pointer value---even if the reading is done by the same run of the
same program (garbage collecting implementations of C are legal, if
rare).  If the reading is done by a different run, or---consider the
effect of fixing a bug in the game---a different but similar program,
the chances are great that the bit pattern will not be useful.

So what can you do?  There are many approaches.  You can assume (as the
Unix `rogue' game does) that different runs of the same program will
be able to make use of the old values, and instead of writing out just
what you need, write out all data.  This approach has its pitfalls,
as anyone who had a winning game of rogue and saved it for the 17th
time will know.  You can convert pointers into indices (provided that
the pointers point into contiguous memory regions), and write only
the data you need.  You can assume that the values of pointers can be
used to uniquely identify every object, no matter what its type, and
write the data `as is' but convert the pointers when reading back.
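
The pointers-into-indices approach can be sketched roughly as follows,
assuming all rooms live in one contiguous array.  The struct room and
struct room_rec types and the function names are invented for this
example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical in-memory room: `exit` points at another room in the
   same array, or is NULL. */
struct room {
    int id;
    struct room *exit;
};

/* On-disk form: the pointer is replaced by an array index (-1 = NULL). */
struct room_rec {
    int  id;
    long exit_index;
};

int save_rooms(FILE *fp, const struct room *rooms, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        struct room_rec rec;
        rec.id = rooms[i].id;
        /* Pointer subtraction is only valid within one array. */
        rec.exit_index = rooms[i].exit ? (long)(rooms[i].exit - rooms) : -1L;
        if (fwrite(&rec, sizeof rec, 1, fp) != 1)
            return -1;
    }
    return 0;
}

int load_rooms(FILE *fp, struct room *rooms, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        struct room_rec rec;
        if (fread(&rec, sizeof rec, 1, fp) != 1)
            return -1;
        if (rec.exit_index >= (long)count)
            return -1;          /* corrupt file: index out of range */
        rooms[i].id = rec.id;
        /* Turn the index back into a pointer into the new array. */
        rooms[i].exit = rec.exit_index < 0 ? NULL : &rooms[rec.exit_index];
    }
    return 0;
}
```

The array being loaded into may sit at a completely different address
from the one that was saved; the indices still pick out the same rooms.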

We used this latter approach to save arbitrary Lisp data in files for
later recovery, although in this case the saving routine had to worry
about circular data structures and was therefore more complicated---
the output format was a sequence of <id,tag,bytes> triples.  The id was a
unique identifier---probably actually the original pointer value---and
the tag and bytes appeared if and only if the id had not yet appeared
in the save file.  Id 0 was nil.  In this case the tag said what kind
of Lisp object the bytes represented, and if the object had pointers,
e.g., a dotted pair, the <bytes> were themselves <id,tag,bytes>
sequences.  Thus, a list (a b a) was represented as, e.g.,

	id=1, tag=DTPR, bytes=(
	 car: id=2, tag=ATOM, bytes="a";
	 cdr: id=3, tag=DTPR, bytes=(
	  car: id=4, tag=ATOM, bytes="b";
	  cdr: id=5, tag=DTPR, bytes=(
	   car: id=2;
	   cdr: id=0
	  )
	 )
	)

(I simplified this example; the atoms were actually composed of
name, boundp, value-if-bound, property-list sequences.)

This sort of format is pretty much ideal for portability (particularly
if you record numeric values in printed form).  The major disadvantage
is that producing and reading such files tends to be slow.
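
For the curious, here is a rough sketch of the writing half of such a
format in C.  It is not the original Lisp saver: the struct obj layout,
the fixed-size seen table and the space-separated text output are all
assumptions made to keep the example short, and atom names are assumed
to contain no spaces.

```c
#include <stdio.h>

enum tag { ATOM, DTPR };

/* Hypothetical object: an atom has a name, a dotted pair has car/cdr. */
struct obj {
    enum tag tag;
    const char *name;           /* ATOM only */
    struct obj *car, *cdr;      /* DTPR only */
};

#define MAX_SEEN 64

/* Objects already written; an object's id is its table position + 1. */
struct saver {
    const struct obj *seen[MAX_SEEN];
    int nseen;
};

/* Return the id of `o`, setting *is_new on first sight; -1 if table full. */
static int id_of(struct saver *s, const struct obj *o, int *is_new)
{
    int i;

    for (i = 0; i < s->nseen; i++)
        if (s->seen[i] == o) {
            *is_new = 0;
            return i + 1;
        }
    if (s->nseen == MAX_SEEN)
        return -1;
    s->seen[s->nseen++] = o;
    *is_new = 1;
    return s->nseen;
}

/* Write the id; write tag and contents only the first time it appears. */
int save_obj(struct saver *s, FILE *fp, const struct obj *o)
{
    int id, is_new;

    if (o == NULL) {
        fprintf(fp, "0 ");      /* id 0 is nil */
        return 0;
    }
    id = id_of(s, o, &is_new);
    if (id < 0)
        return -1;
    fprintf(fp, "%d ", id);
    if (!is_new)
        return 0;               /* seen before: the id alone is enough */
    if (o->tag == ATOM) {
        fprintf(fp, "ATOM %s ", o->name);
        return 0;
    }
    fprintf(fp, "DTPR ");
    if (save_obj(s, fp, o->car) != 0)
        return -1;
    return save_obj(s, fp, o->cdr);
}
```

Starting from a saver with nseen set to 0, the list (a b a) comes out as
"1 DTPR 2 ATOM a 3 DTPR 4 ATOM b 5 DTPR 2 0 ".  Because an object is
entered in the table before its parts are written, a circular structure
stops at the repeated id instead of recursing forever.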
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris
	(New campus phone system, active sometime soon: +1 301 405 2750)

cmcmanis@stpeter.Eng.Sun.COM (Chuck McManis) (08/06/90)

In article <140087@sun.Eng.Sun.COM> (Jason Freund) writes:
>	When you save data in a database, does the program just go:
>"fwrite(pointer, sizeof, *pointer, items, stream)" which somehow magically
>saves every piece of data (specified in the arguments) in such a way that it
>will be able to read in every piece of data back into their correct cells in
>the data structure?  That is what I want to do -- and I want to know if fread
>and fwrite can do it.

No, they can't do that. You see, the context of what the pointers mean is
lost in the file. fread/fwrite work fine for "flat" data structures, i.e. ones
that don't have pointers in them. You need to marshal the data before
writing it to disk, and that means some sort of code that understands your
data structures. 

You can cheat a bit (and save yourself some coding time) by using XDR 
(as in Sun RPC/XDR) to marshal the data into a byte stream and write
that stream to disk. Unmarshalling is accomplished by the same functions,
so you would do something like:

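	/* Save the old level: xdr_foo_struct() encodes into the file. */
	myfile = fopen("level1.xdr", "wb");
	xdrstdio_create(&xdrs, myfile, XDR_ENCODE);
	xdr_foo_struct(&xdrs, &foo_struct);
	xdr_destroy(&xdrs);	/* may still touch the stream, so do it before fclose() */
	fclose(myfile);

	/* Load the new level: the same routine now decodes. */
	myfile = fopen("level2.xdr", "rb");
	xdrstdio_create(&xdrs, myfile, XDR_DECODE);
	xdr_foo_struct(&xdrs, &foo_struct);
	xdr_destroy(&xdrs);
	fclose(myfile);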

Source to the rpc/xdr package is available on rice.titan.edu for anonymous
ftp. Porting the XDR routines to the amiga is fairly trivial.

--
--Chuck McManis						    Sun Microsystems
uucp: {anywhere}!sun!cmcmanis   BIX: <none>   Internet: cmcmanis@Eng.Sun.COM
These opinions are my own and no one elses, but you knew that didn't you.
"I tell you this parrot is bleeding deceased!"

dillon@overload.Berkeley.CA.US (Matthew Dillon) (08/07/90)

>In article <25884@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>In article <140087@sun.Eng.Sun.COM> jasonf@cetemp.Eng.Sun.COM
>(Jason Freund) writes:
>>"fwrite(pointer, sizeof, *pointer, items, stream)" which somehow magically
>>saves every piece of data (specified in the arguments) in such a way that it
>>will be able to read in every piece of data back into their correct cells in
>>the data structure?
>
>No.
>
>The primary rule of magic is this: `There is no magic'.  Fread and fwrite
> ...

    fwrite(ptr, blockSize, numBlocks, fi);

    const void *ptr;		pointer into memory, fwrite does not know or
				care what the pointer is actually pointing
				to!  const means the memory is not modified
				by fwrite() (obvious).

    int blockSize;		fwrite writes blockSize * numBlocks bytes
    int numBlocks;		of data from the memory location given by
				'ptr' to the file, no interpretation of the
				data is done.

    FILE *fi;			da file.


    Why am I interjecting what has already been said?  Because nobody has
    yet explained the one point that confused the hell out of me when I
    first started learning C, and that is WHY are there two size arguments?!!

    fwrite() simply multiplies numBlocks * blockSize to determine how many
    bytes to write to the file.  The ONLY difference between these two
    arguments is that fwrite() returns the number of blocks successfully
    written, i.e. it returns  bytes_written_without_error / blockSize,
    which is a short count (possibly zero) on error.  I do not even think
    fwrite() tries to keep low level flushes (write()s) on block boundaries.
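
A small sketch of how the item size changes the return value.  It uses
fread, since a short read is easy to produce on purpose, and the helper
names are made up.  In ANSI C both size arguments and the return value
are size_t, so a failure shows up as a short count and never as a
negative number.

```c
#include <stdio.h>

/* Read up to `nitems` items of `size` bytes each from the start of `fp`
   and return fread's result: the number of complete items read. */
size_t items_read(FILE *fp, void *buf, size_t size, size_t nitems)
{
    rewind(fp);
    return fread(buf, size, nitems, fp);
}

/* Write `len` bytes; returns 0 only if every byte was accepted.
   With an item size of 1 the return value is a plain byte count. */
int write_all(FILE *fp, const void *buf, size_t len)
{
    return fwrite(buf, 1, len, fp) == len ? 0 : -1;
}
```

On a file holding 10 bytes, asking for 3 items of 4 bytes returns 2,
asking for 12 items of 1 byte returns 10, and asking for 1 item of 16
bytes returns 0.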

						    -Matt

--


    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

aduncan@rhea.trl.oz.au (Allan Duncan) (08/10/90)

From article <dillon.5326@overload.Berkeley.CA.US>, by dillon@overload.Berkeley.CA.US (Matthew Dillon):
>     fwrite(ptr, blockSize, numBlocks, fi);
 
>     const void *ptr;		pointer into memory, fwrite does not know or
> 				care what the pointer is actually pointing
> 				to!  const means the memory is not modified
> 				by fwrite() (obvious).
 
>     int blockSize;		fwrite writes blockSize * numBlocks bytes
>     int numBlocks;		of data from the memory location given by
> 				'ptr' to the file, no interpretation of the
> 				data is done.

Where did you find this?  It doesn't exist in K&R.  From V7 UNIX
Programmer's Manual :

fwrite( ptr, sizeof( *ptr ), nitems, stream )

fwrite appends at most nitems of data of the type *ptr beginning at ptr
to the named output stream.  It returns the number of items actually
written.

So you can see that the two are _not_ the same.  In detail, you either
write an item, or you don't - no half written items are permitted.
I looked at the source for the Manx call, and it uses two for loops to
call fputc(), not multiply block by items, then one loop.
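
For illustration only, a two-loop fwrite along those lines might look
like the sketch below.  This is not the Manx source; my_fwrite is a
hypothetical stand-in written from the description above.

```c
#include <stdio.h>

/* Hypothetical two-loop fwrite: the outer loop walks the items, the
   inner loop pushes each byte of the current item through fputc(). */
size_t my_fwrite(const void *ptr, size_t size, size_t nitems, FILE *fp)
{
    const unsigned char *p = ptr;
    size_t item, byte;

    for (item = 0; item < nitems; item++)
        for (byte = 0; byte < size; byte++)
            if (fputc(*p++, fp) == EOF)
                return item;    /* count only the items completed */
    return nitems;
}
```

The return value counts whole items, but if fputc() fails part way
through an item, the earlier bytes of that item have already been handed
to the stream.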

Allan Duncan	ACSnet	a.duncan@trl.oz
(03) 541 6708	ARPA	a.duncan%trl.oz.au@uunet.uu.net
		UUCP	{uunet,hplabs,ukc}!munnari!trl.oz.au!a.duncan
Telecom Research Labs, PO Box 249, Clayton, Victoria, 3168, Australia.

peter@sugar.hackercorp.com (Peter da Silva) (08/10/90)

In article <dillon.5326@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>     Why am I interjecting what has already been said?  Because nobody has
>     yet explained the one point that confused the hell out of me when I
>     first started learning C, and that is WHY are there two size arguments?!!

So it can return the number of objects written to be compatible with fread?

:->

What *I* want to know is why all the stdio routines put the FILE* at the end
instead of the beginning so they can be compatible with fprintf.

And explain the rationale for the asymmetry between gets() and fgets().

Sigh.
-- 
Peter da Silva.   `-_-'
<peter@sugar.hackercorp.com>.

mwm@raven.pa.dec.com (Mike (Real Amigas have keyboard garages) Meyer) (08/11/90)

In article <6323@sugar.hackercorp.com> peter@sugar.hackercorp.com (Peter da Silva) writes:
   What *I* want to know is why all the stdio routines put the FILE* at the end
   instead of the beginning so they can be compatible with fprintf.

   And explain the rationale for the asymmetry between gets() and fgets().

Because some of them come from the Portable IO library, and were added
to the Standard IO library afterwards for backwards compatibility.

BTW, use of gets() is considered a bug in some quarters (mostly those
that lost a weekend to the internet worm).
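
The trouble with gets() is that it has no idea how big the buffer is.
A minimal sketch of the usual fgets() replacement, with read_line as a
made-up helper name:

```c
#include <stdio.h>
#include <string.h>

/* Read one line into `buf`, storing at most size-1 characters, and strip
   the trailing newline if one was read.  Returns buf, or NULL at end of
   file or on error. */
char *read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A line longer than the buffer is simply returned in pieces over several
calls instead of running off the end of the array.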

	<mike
--
Es brillig war. Die schlichte Toven			Mike Meyer
Wirrten und wimmelten in Waben;				mwm@relay.pa.dec.com
Und aller-mumsige Burggoven				decwrl!mwm
Die mohmem Rath' ausgraben.

dillon@overload.Berkeley.CA.US (Matthew Dillon) (08/14/90)

>Where did you find this?  It doesn't exist in K&R.  From V7 UNIX
>Programmer's Manual :
>
>fwrite( ptr, sizeof( *ptr ), nitems, stream )
>
>fwrite appends at most nitems of data of the type *ptr beginning at ptr
>to the named output stream.  It returns the number of items actually
>written.
>
>So you can see that the two are _not_ the same.  In detail, you either
>write an item, or you don't - no half written items are permitted.

    I believe I said that the only real difference is in the return value.
    Realistically it is not possible to guarantee that only an integral
    number of blockSize-byte blocks will be written because not even low
    level I/O (write()) will guarantee that!

    So while you can say conceptually that fwrite() will not write half-blocks
    when an error occurs, in real life it actually might.

>I looked at the source for the Manx call, and it uses two for loops to
>call fputc(), not multiply block by items, then one loop.

    Manx's fwrite() is a piece of c**p then... extremely slow.  But even
    using two for loops does not guarantee that half-writes will not occur
    on error.

					-Matt


--


    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA