[comp.lang.lisp] The best way to save data

sbc@wucs1.wustl.edu (Steve B Cousins) (12/15/88)

Has anyone thought about the best way to save structures built
during an interactive lisp session so that they can be *efficiently*
loaded at a later date?  I'm wondering how writing the data as LISP
structures so that the resulting file can be loaded (using lisp load)
compares to just writing the data as a series of lines.  Is there
a better way that I haven't thought of?

I've been using the method of writing s-expressions to a file and using
load to bring them back, but it seems very slow to me.  Thanks for any
comments. [even moderate flames :-)]

Steve Cousins			sbc@wucs1.WUSTL.EDU
Washington University		"Procrastination is the root of all nighters"

barmar@think.COM (Barry Margolin) (12/16/88)

The easiest way to program it is by using WRITE and READ (not LOAD --
that expects the file to contain executable forms, not just random
data).  If portability is important, this is probably your best bet.
But the generality comes at a price: numeric data is not very compact,
and parsing is expensive.  Writing the data is usually pretty quick,
though (just make sure you specify :PRETTY NIL :CIRCLE NIL, or WRITE
will have to make multiple passes over the data).
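
A minimal sketch of the WRITE/READ approach described above, assuming an
ANSI Common Lisp implementation.  SAVE-DATA and RESTORE-DATA are names
made up for this illustration, and WITH-STANDARD-IO-SYNTAX is used only
so that the printer and reader settings agree on both sides.

```lisp
;; Sketch only: SAVE-DATA and RESTORE-DATA are illustrative names,
;; not part of Common Lisp.
(defun save-data (object pathname)
  "Write OBJECT to PATHNAME in a form that READ can parse."
  (with-open-file (out pathname :direction :output
                                :if-exists :supersede)
    (with-standard-io-syntax
      ;; :PRETTY NIL :CIRCLE NIL keeps the printer to a single pass.
      (write object :stream out :pretty nil :circle nil))
    (terpri out))
  pathname)

(defun restore-data (pathname)
  "Read back the single object written by SAVE-DATA."
  (with-open-file (in pathname :direction :input)
    (with-standard-io-syntax
      ;; The file holds data, not code, so #. evaluation is disabled.
      (let ((*read-eval* nil))
        (read in)))))
```

Something like (save-data *my-structure* "data.lisp") followed later by
(restore-data "data.lisp") would round-trip any object that has a
readable printed representation.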

If speed is more important than portability, though, you'd probably
want to write more specialized routines.  How easy this is will depend
on your data.  The expensive part of READ is that it must parse the
data in order to determine its type.  You can speed things up by
writing a single-byte type code followed by the data.  For
variable-length data you could include a length byte after the type
code.  You can then have type-specific input routines that read the
object as efficiently as you can devise.  Symbols could be written
just as their print names, without escape prefixes (since the length
code tells you exactly where the symbol name ends).  Small integers
could be encoded in a single byte, and longer integers could easily be
written with a length code followed by that many bytes of the actual
data, rather than encoding it in decimal as WRITE does.  You can even
handle data types that aren't ordinarily readable by READ.
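
The type-code scheme above might be sketched as follows.  This is only
an illustration under stated assumptions: the tag values and the names
WRITE-TAGGED and READ-TAGGED are invented here, integers are assumed
non-negative, and symbol names are assumed to be shorter than 256
characters with character codes below 256.  Package information is not
recorded; symbols are interned in the current package on input.

```lisp
;; Sketch only: tag values and function names are hypothetical.
(defconstant +tag-small-int+ 0)
(defconstant +tag-big-int+ 1)
(defconstant +tag-symbol+ 2)

(defun write-tagged (object out)
  "Write OBJECT to the (UNSIGNED-BYTE 8) stream OUT as tag + data."
  (etypecase object
    ((integer 0 255)                    ; small integer: tag, one byte
     (write-byte +tag-small-int+ out)
     (write-byte object out))
    ((integer 256)                      ; larger integer: tag, length, bytes
     (let ((nbytes (ceiling (integer-length object) 8)))
       (write-byte +tag-big-int+ out)
       (write-byte nbytes out)
       (dotimes (i nbytes)              ; least significant byte first
         (write-byte (ldb (byte 8 (* 8 i)) object) out))))
    (symbol                             ; symbol: tag, length, print name
     (let ((name (symbol-name object)))
       (write-byte +tag-symbol+ out)
       (write-byte (length name) out)
       (loop for ch across name
             do (write-byte (char-code ch) out))))))

(defun read-tagged (in)
  "Read one object written by WRITE-TAGGED from the byte stream IN."
  (let ((tag (read-byte in)))
    (cond ((= tag +tag-small-int+)
           (read-byte in))
          ((= tag +tag-big-int+)
           (let ((nbytes (read-byte in))
                 (value 0))
             (dotimes (i nbytes value)
               (setf (ldb (byte 8 (* 8 i)) value) (read-byte in)))))
          ((= tag +tag-symbol+)
           (let* ((len (read-byte in))
                  (name (make-string len)))
             (dotimes (i len)
               (setf (char name i) (code-char (read-byte in))))
             (intern name)))
          (t (error "Unknown type tag ~D" tag)))))
```

Both functions expect a stream opened with :ELEMENT-TYPE
'(UNSIGNED-BYTE 8).  The reader never has to parse anything: the tag
selects the routine and the length byte says where the object ends.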

This is quite a bit of work, but if you'll be writing and reading lots
of data it can be worthwhile.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

ruffwork@urania.CS.ORST.EDU (Ritchey Ruff) (12/16/88)

I'm in lisp and have built a large data structure, and want to
save it to a file to read in later.  Assume it's bound to
the symbol *foo*.  The problem is that the reader is slow, and unless
you want it even slower (by setting *PRINT-CIRCLE*) you
lose shared structure when you read it back in.  The next trick
gets around both.  This is what "foo.lisp" looks like -

	;;; foo.lisp assumes *foo* is bound to the data structure
	;;; you want to read back in later.  compile-file this
	;;; file to get a fasl file that will load quickly and
	;;; maintain shared structure that was in the original *foo*
	;;;
	(in-package 'foo)	; or whatever you want...
	;;;
	(setq *foo* '#.*foo*)

then from lisp you do 

	> (setq *foo* <some huge data structure>)
	(...)
	> (compile-file "foo")

Of course this is not human readable, and the output will
only read back into the lisp it was compiled from, but if
you are like me and need to dump 2 to 10 Meg structures to
files then read them back in this saves 2 to 5 minutes of
load time...
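
On the shared-structure point above, here is a small sketch of what the
plain printer and reader do with and without *PRINT-CIRCLE*.  ROUND-TRIP
is a helper invented for this illustration; it prints to a string rather
than a file to keep the example short.

```lisp
;; Sketch only: ROUND-TRIP is an illustrative helper, not a standard function.
(defun round-trip (object circle)
  "Print OBJECT to a string with *PRINT-CIRCLE* bound to CIRCLE, then read it back."
  (let ((*print-circle* circle))
    (values (read-from-string (prin1-to-string object)))))

(defparameter *shared* (list 'a 'b))
(defparameter *data* (list *shared* *shared*))  ; both elements are the same cons

(let ((plain (round-trip *data* nil))
      (circ  (round-trip *data* t)))
  (list (eq (first plain) (second plain))   ; sharing lost
        (eq (first circ) (second circ))))   ; sharing kept via #1= and #1#
;; => (NIL T)
```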

good luck,
--ritchey ruff

jeff@aipna.ed.ac.uk (Jeff Dalton) (12/19/88)

In article <7943@orstcs.CS.ORST.EDU> ruffwork@urania.CS.ORST.EDU (Ritchey Ruff) writes:
	    > (setq *foo* <some huge data structure>)
	    (...)
	    > (compile-file "foo")

    Of course this is not human readable, and the output will
    only read back into the lisp it was compiled from, but if
    you are like me and need to dump 2 to 10 Meg structures to
    files then read them back in this saves 2 to 5 minutes of
    load time...

The problem with this trick is that it's not guaranteed to work
in every Common Lisp.  You can do it, but some CLs (e.g., KCL)
use PRINT and READ to dump and load data in compiled files, so
you won't necessarily save anything over using PRINT and READ
yourself.  And even if CL changes on the way to standardization
to rule out using READ to load compiled data, there's still
no guarantee that the operation will be much faster than READ.

BTW, this suggests that the dump/load used for compiled data
should be available directly, without compiling a file.  But
it's quite common in Common Lisp for such operations (where
you know they're there and want to get at them) to be unavailable.

vaughan@cadillac.CAD.MCC.COM (Paul Vaughan) (12/20/88)

In article <414@aipna.ed.ac.uk>, jeff@aipna.ed.ac.uk (Jeff Dalton) writes:

> BTW, this suggests that the dump/load used for compiled data
> should be available directly, without compiling a file.  But
> it's quite common in Common Lisp for such operations (where
> you know they're there and want to get at them) to be unavailable.

It depends on whose "Common" Lisp you use.  In Symbolics Common Lisp,
there is si:dump-forms-to-file that does this rather satisfactorily
and certainly beats working out your own data encoding scheme.  Sorry
if this doesn't help.

An interesting point about this whole technique is that the code that
is dumped must reference a free variable in order to create a side
effect on the program that loads it.  It's not possible, for instance,
to just dump a data structure, or even a form for creating the data
structure, and then read it in as if it were an object.  The file must
be LOADED, not read, and the code in it must perform some sort of side
effect on the current environment in order to accomplish anything at
all.
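
One way to live with the side-effect requirement is to hide the free
variable behind a pair of functions.  The sketch below is only an
illustration: *LOADED-VALUE*, DUMP-AS-LOADABLE and LOAD-DUMPED are
hypothetical names, and the file written is plain source (which could
then be passed through COMPILE-FILE) rather than the output of any
vendor-specific dump function.

```lisp
;; Sketch only: all three names below are made up for illustration.
(defvar *loaded-value* nil
  "Free variable that a dumped file assigns to when it is loaded.")

(defun dump-as-loadable (object pathname)
  "Write a source file that, when loaded, stores OBJECT in *LOADED-VALUE*."
  (with-open-file (out pathname :direction :output :if-exists :supersede)
    (with-standard-io-syntax
      (print `(setq *loaded-value* ',object) out)))
  pathname)

(defun load-dumped (pathname)
  "Load PATHNAME and return what it stored, leaving the global value alone."
  ;; Rebinding the special variable confines the file's side effect
  ;; to this call, so the caller simply sees a returned object.
  (let ((*loaded-value* nil))
    (with-standard-io-syntax
      (load pathname))
    *loaded-value*))
```

The file still works purely by side effect, but the caller of
LOAD-DUMPED gets the object back as an ordinary return value.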
-- 

 Paul Vaughan, MCC CAD Program | ARPA: vaughan@mcc.com | Phone: [512] 338-3639
 Box 200195, Austin, TX 78720  | UUCP: ...!cs.utexas.edu!milano!cadillac!vaughan

mdb@cpsc.ucalgary.ca (Mark Brinsmead) (12/23/88)

In article <414@aipna.ed.ac.uk>, jeff@aipna.ed.ac.uk (Jeff Dalton) writes:

> BTW, this suggests that the dump/load used for compiled data
> should be available directly, without compiling a file.  But
> it's quite common in Common Lisp for such operations (where
> you know they're there and want to get at them) to be unavailable.

   In fact, these functions are available in Symbolics Common Lisp,
and I just happen to have used them in a function which saves the state
of a program, some of whose "state" variables happen to be hash tables.
Symbolics *does* document this as non-standard and likely to be non-portable,
but it was still very handy. Just for my 2 cents worth, it would be a real
bonus to be able to save (in binary format) *any* data structure.

                              Mark Brinsmead @ UofC