[comp.lang.lisp] big programs? data use?

wilson@uicbert.eecs.uic.edu (01/06/89)

I am trying to get a handle on overall memory usage by serious Lisp programs.

I used to think that serious programs on big systems (e.g., Symbolics, TI
Explorer) often had a large amount of data that was live at any given time.
Now I am not so sure.

I am moving toward the view that most programs never use more than a few
megabytes of data, and that very few ever have even one megabyte of data
that is live at any given time.  (That is, even if they allocate a dozen
or so megabytes, over 90% of it dies very fast and there's really not
all that much around.)

Does anybody want to disabuse me of this impression?  Have your programs
ever had a megabyte survive a garbage collection?  (Without scavenging
system code and data, that is.)

Has anybody out there ever had a program with ten megabytes of live data?
Twenty?  (Including big arrays for numerical programs, but maybe not
counting pixmaps for graphics.)

If anybody has any real data on this, or reasonable anecdotal
reports, please respond.  If your gc gives statistics after a scavenge,
I'd be interested in the kind of numbers you see, both typically and
occasionally.  (Things like the size of older generations and the
amount surviving, how often they have to be collected, etc.)
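
To give a concrete idea of the kind of measurement I mean, here is a
minimal sketch.  ROOM is standard Common Lisp; FULL-GC is just a
placeholder for whatever implementation-specific collector entry point
your system provides.

  ;; Run a program fragment, force a full collection, and print what
  ;; survives.  FULL-GC is hypothetical -- substitute your Lisp's own
  ;; full-collection function.
  (defun report-surviving-data (thunk)
    (funcall thunk)     ; run the code of interest
    (full-gc)           ; implementation-specific full collection
    (room t))           ; print detailed storage statistics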

I'm trying to figure out how much of the memory requirements and paging
costs in Lisp systems is due to the programs they run, and how much is
accounted for by the system itself.  I would like to get a better handle
on what accounts for most locality problems:  system vs. user, code vs.
data, etc.

Note:  I'm not implying systems shouldn't use considerable resources if
       it's necessary to provide the desired functionality.  And I know
       that increased size of the system (e.g. more builtin functions)
       can decrease the apparent size of user code.  But knowing
       where the problems are is always good, even if they're there
       for good reasons.


Paul R. Wilson                         
Human-Computer Interaction Laboratory
U. of Illin. at C. EECS Dept. (M/C 154)   wilson%uicbert@uxc.cso.uiuc.edu
Box 4348   Chicago,IL 60680 

leverich@randvax.UUCP (Brian Leverich) (01/10/89)

Knowledge-based simulation models in LISP (like simulation models in
general) can require megs of active memory.  That's because the models
contain hundreds or thousands of objects, each object has dozens of
attributes, and an average attribute may tie up anywhere from 2 to several
thousand cons cells.
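
A quick back-of-the-envelope with made-up but plausible numbers shows
how fast that adds up (cons size varies by implementation; eight bytes,
i.e. two 32-bit words, is assumed here):

  ;; 1000 objects x 24 attributes x 100 conses per attribute,
  ;; at 8 bytes per cons:
  (* 1000 24 100 8)   ; => 19200000, roughly 19 megabytes live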

For all you garbage collection gurus out there, you might have some fun
interacting with the KBSim community.  Turns out that attributes almost
never point into the center of structures, and attributes are generally
trashed as a whole.  Seems like you could use these characteristics to
make particularly efficient garbage collectors for KBSim applications.
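
To make the usage pattern concrete, here is a stripped-down sketch
(hypothetical accessors, not actual ROSS code):

  ;; Attributes live in a plist hanging off each object.  An update
  ;; replaces the attribute's whole value list in one shot, so the old
  ;; list becomes garbage as a unit -- nothing else points into it.
  (defstruct sim-object name attributes)

  (defun set-attribute (obj attr new-values)
    (setf (getf (sim-object-attributes obj) attr) new-values))

The point is that SET-ATTRIBUTE never splices into the middle of the
old value list; the whole thing is dropped at once.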

Cheers, -B
-- 
  "Simulate it in ROSS"
  Brian Leverich                       | U.S. Snail: 1700 Main St.
  ARPAnet:     leverich@rand-unix      |             Santa Monica, CA 90406
  UUCP/usenet: decvax!randvax!leverich | Ma Bell:    (213) 393-0411 X7769

wilson@uicbert.eecs.uic.edu (01/14/89)

I've been wondering about this issue, both for KB simulation and for
various other kinds of AI programs.

The responses to my original posting *seem* to be pretty consistent
with what I'd been thinking:  *most* people don't often have a lot
of live program data.  A few people have a tremendous amount, though.

There are problems with taking this unscientific poll seriously --
I have no idea how representative the responses are, and I don't
know which is the chicken and which the egg.  Would people use
enormous amounts of memory if it cost significantly less?  Maybe a
lot of people need a lot of memory, are all using FORTRAN now, and
will all switch to Common Lisp soon.  Who knows?

Actually, for my purposes, huge amounts of data are ok if access patterns
are reasonably uneven.  That is, if there's decent locality of reference
and particularly of *changes* to old data.  I'm working on virtual
copy mechanisms coordinated with garbage collection, and design
tradeoffs depend on whether modified locations in older generations
are typically pretty close to each other.  (This is also important
for garbage collectors that scan dirty pages of old memory to find
all of the pointers into new memory.)
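
For what it's worth, here is a sketch of the card-marking idea I am
describing.  All the names and sizes are made up for illustration; this
is not taken from any particular collector.

  ;; One bit per fixed-size "card" of oldspace.  Every store into an old
  ;; object marks the card containing it; at scavenge time only marked
  ;; cards are scanned for pointers into newspace.
  (defconstant +words-per-card+ 256)
  (defconstant +card-count+ 65536)        ; covers a 16M-word oldspace

  (defvar *card-table*
    (make-array +card-count+ :element-type 'bit :initial-element 0))

  (defun note-write (word-address)
    ;; Write-barrier hook: mark the card covering WORD-ADDRESS dirty.
    (setf (sbit *card-table* (floor word-address +words-per-card+)) 1))

  (defun dirty-cards ()
    ;; Cards that must be rescanned for old->new pointers.
    (loop for i below +card-count+
          when (= 1 (sbit *card-table* i)) collect i))

If the modified old locations cluster, only a few cards ever get marked
and rescanning is cheap; if they are scattered, nearly every card has to
be looked at.  That is exactly the tradeoff I'm trying to pin down.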

Can anybody who has a program that keeps a lot of live data around
tell me whether much of it is altered, and especially whether much
is altered very long after it's created?

I'm particularly interested in locality in AI systems like RETE matchers,
and in simulation languages.  I don't know either of them well enough
to guess.  My understanding is that in Prolog systems, most changed
fields are in or near the activation stack.

I'm pretty firmly convinced that most programs that have a huge amount
of data don't ever modify most of it, but instead just keep it sitting
around (and search through it) until it becomes garbage.  In the extreme,
this is trivially true because if a program creates objects fast
enough, it won't have any time to modify them.  Short of that extreme,
I'm trying to get a handle on typical distributions of changes among
data objects (and especially clumps of data objects allocated around
the same times).
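
One crude way to gather that distribution is sketched below.  It is a
hypothetical wrapper and assumes you can funnel the mutations of
interest through a single function.

  ;; Stamp each object with its creation time; at every later mutation,
  ;; log how old the object was when it got modified.
  (defvar *mutation-ages* '())

  (defstruct (cell (:constructor make-cell (value)))
    value
    (birth (get-internal-real-time)))

  (defun cell-set (cell new-value)
    (push (- (get-internal-real-time) (cell-birth cell)) *mutation-ages*)
    (setf (cell-value cell) new-value))

A histogram of *MUTATION-AGES* then shows whether changes mostly hit
young objects or old ones.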

Does anybody have any counterexamples?  That is, programs that generate
a lot of data that lives quite a while AND which go back through that
data making widely-distributed changes now and then?  Maybe an enormous
cellular automaton or something?  Do normal simulation programs often
have such nasty tendencies?  Any impressions are welcome.
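
For concreteness, the sort of program I would count as a counterexample
looks something like this (toy update rule, illustrative only):

  ;; A big, long-lived grid that gets destructive, widely-scattered
  ;; updates on every step -- nearly every page holding it is dirtied.
  (defparameter *grid*
    (make-array (* 1024 1024) :element-type '(unsigned-byte 8)
                              :initial-element 0))

  (defun step-grid ()
    (dotimes (i (length *grid*))
      (setf (aref *grid* i) (mod (1+ (aref *grid* i)) 256))))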

   -- Paul


Paul R. Wilson                         
Human-Computer Interaction Laboratory    lab ph.: (312) 413-0042
U. of Ill. at Chi. EECS Dept. (M/C 154)  wilson@uicbert.eecs.uic.edu
Box 4348   Chicago,IL 60680