wilson@uicbert.eecs.uic.edu (01/06/89)
I am trying to get a handle on overall memory usage by serious Lisp programs. I used to think that serious programs on big systems (e.g., Symbolics, TI Explorer) often had a large amount of data that was live at any given time. Now I am not so sure. I am moving toward the view that most programs never use more than a few megabytes of data, and that very few ever have even one megabyte of data that is live at any given time. (That is, even if they allocate a dozen or so megabytes, over 90% of it dies very fast and there's really not all that much around.)

Does anybody want to disabuse me of this impression? Have your programs ever had a megabyte survive a garbage collection? (Without scavenging system code and data, that is.) Has anybody out there ever had a program with ten megabytes of live data? Twenty? (Including big arrays for numerical programs, but maybe not counting pixmaps for graphics.)

If anybody has any real data on this, or reasonable anecdotal reports, please respond. If your gc gives statistics after a scavenge, I'd be interested in the kind of numbers you see, both typically and occasionally (things like the size of older generations and the amount surviving, about how often they have to be collected, etc.).

I'm trying to figure out how much of the memory requirements and paging costs in Lisp systems are due to the programs they run, and how much is accounted for by the system itself. I would like to get a better handle on what accounts for most locality problems: system vs. user, code vs. data, etc.

Note: I'm not implying systems shouldn't use considerable resources if that's necessary to provide the desired functionality. And I know that increased size of the system (e.g., more builtin functions) can decrease the apparent size of user code. But knowing where the problems are is always good, even if they're there for good reasons.

Paul R. Wilson
Human-Computer Interaction Laboratory
U. of Ill. at Chicago, EECS Dept. (M/C 154)
wilson%uicbert@uxc.cso.uiuc.edu
Box 4348, Chicago, IL 60680
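[Editor's note: the claim that over 90% of allocated data dies before its first scavenge is what later became known as the weak generational hypothesis. A minimal Python sketch of the arithmetic, with all parameters invented for illustration (a short exponential mean lifetime, a scavenge after each "megabyte" of allocation):

```python
import random

random.seed(0)

# Hypothetical model: object lifetimes are exponentially distributed with a
# short mean, measured in units of allocation.  We allocate steadily across
# one scavenge interval and count how many objects are still live at the end.
MEAN_LIFETIME = 0.05      # mean lifetime, in scavenge-interval units (assumed)
SCAVENGE_EVERY = 1.0      # one scavenge per interval of allocation (assumed)
N_OBJECTS = 100_000       # objects allocated during the interval (assumed)

birth_times = [i * SCAVENGE_EVERY / N_OBJECTS for i in range(N_OBJECTS)]
# random.expovariate takes the rate, i.e. 1/mean.
death_times = [b + random.expovariate(1 / MEAN_LIFETIME) for b in birth_times]

survivors = sum(1 for d in death_times if d > SCAVENGE_EVERY)
print(f"surviving first scavenge: {100 * survivors / N_OBJECTS:.1f}%")
```

With these made-up parameters only a few percent of allocation survives, matching the "over 90% dies very fast" intuition; real survival rates depend entirely on the program, which is exactly what the poll above asks about.]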
leverich@randvax.UUCP (Brian Leverich) (01/10/89)
Knowledge-based simulation models in LISP (like simulation models in general) can require megs of active memory. That's because the models contain hundreds or thousands of objects, each object has dozens of attributes, and an average attribute may tie up anywhere from 2 to several thousand cons cells.

For all you garbage collection gurus out there, you might have some fun interacting with the KBSim community. Turns out that attributes almost never point into the center of structures, and attributes are generally trashed as a whole. Seems like you could use these characteristics to make particularly efficient garbage collectors for KBSim applications.

Cheers, -B
--
"Simulate it in ROSS"
Brian Leverich                       | U.S. Snail: 1700 Main St.
ARPAnet: leverich@rand-unix          |            Santa Monica, CA 90406
UUCP/usenet: decvax!randvax!leverich | Ma Bell: (213) 393-0411 X7769
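[Editor's note: the pattern Leverich describes — attribute structures that are never pointed into from outside and die as a unit — is the classic case for arena (region) allocation: put all of an attribute's cells in one arena and reclaim the whole thing at once instead of tracing cell by cell. A hypothetical Python sketch, with all names invented:

```python
class Arena:
    """All cons cells for one attribute value live in one arena, so the
    whole attribute can be reclaimed in O(1) by dropping the arena."""
    def __init__(self):
        self.cells = []            # the arena's backing store

    def cons(self, car, cdr):
        cell = [car, cdr]
        self.cells.append(cell)
        return cell

def set_attribute(obj, name, items):
    # Build the attribute's list structure entirely inside a fresh arena.
    # No cell in the arena is ever referenced from outside the attribute,
    # matching the KBSim observation above.
    arena = Arena()
    lst = None
    for item in reversed(items):
        lst = arena.cons(item, lst)
    obj[name] = (arena, lst)       # overwriting this later frees the old arena whole

tank = {}
set_attribute(tank, "waypoints", [1, 2, 3])
arena, lst = tank["waypoints"]
print(len(arena.cells))            # 3 cells, all reclaimable together
```

This only works because of the two properties claimed: no pointers into the middle of an attribute, and whole-attribute death.]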
wilson@uicbert.eecs.uic.edu (01/14/89)
I've been wondering about this issue, both for KB simulation and for various other kinds of AI programs. The responses to my original posting *seem* to be pretty consistent with what I'd been thinking: *most* people don't often have a lot of live program data. A few people have a tremendous amount, though.

There are problems with taking this unscientific poll seriously -- I have no idea how representative the responses are, and I don't know which is the chicken and which the egg. Would people use enormous amounts of memory if it cost significantly less? Maybe a lot of people need a lot of memory and they're all using FORTRAN, but will all switch to Common Lisp soon. Who knows?

Actually, for my purposes, huge amounts of data are ok if access patterns are reasonably uneven -- that is, if there's decent locality of reference, and particularly of *changes* to old data. I'm working on virtual copy mechanisms coordinated with garbage collection, and design tradeoffs depend on whether modified locations in older generations are typically pretty close to each other. (This is also important for garbage collectors that scan dirty pages of old memory to find all of the pointers into new memory.)

Can anybody who has a program that keeps a lot of live data around tell me whether much of it is altered, and especially whether much is altered very long after it's created? I'm particularly interested in locality in AI systems like RETE matchers, and in simulation languages. I don't know either of them well enough to guess. My understanding is that in Prolog systems, most changed fields are in an activation stack or near it.

I'm pretty firmly convinced that most programs that have a huge amount of data don't ever modify most of it, but instead just keep it sitting around (and search through it) until it becomes garbage. In the extreme, this is trivially true, because if a program creates objects fast enough, it won't have any time to modify them. Short of that extreme, I'm trying to get a handle on typical distributions of changes among data objects (and especially among clumps of data objects allocated around the same time).

Does anybody have any counterexamples? That is, programs that generate a lot of data that lives quite a while AND which go back through that data making widely-distributed changes now and then? Maybe an enormous cellular automaton or something? Do normal simulation programs often have such nasty tendencies? Any impressions are welcome.

-- Paul

Paul R. Wilson
Human-Computer Interaction Laboratory    lab ph.: (312) 413-0042
U. of Ill. at Chi. EECS Dept. (M/C 154)
wilson@uicbert.eecs.uic.edu
Box 4348, Chicago, IL 60680
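[Editor's note: the "scan dirty pages of old memory" technique Wilson alludes to is card marking: a write barrier on every store into the old generation marks a fixed-size region (card) dirty, so at scavenge time the collector scans only dirty cards for old-to-new pointers. Its cost is exactly what Wilson asks about -- how clustered the writes to old data are. A minimal Python sketch, with card size and memory layout invented:

```python
CARD_SIZE = 16                       # slots per card (assumed)

old_gen = [None] * 128               # flat array standing in for old-gen memory
dirty = set()                        # card table: indices of mutated cards

def write_barrier(slot, value):
    """Every store into the old generation marks its card dirty."""
    old_gen[slot] = value
    dirty.add(slot // CARD_SIZE)

def scan_for_new_pointers(is_new):
    """At scavenge time, scan only the dirty cards for old->new pointers,
    then clear the card table for the next cycle."""
    found = []
    for card in sorted(dirty):
        for slot in range(card * CARD_SIZE, (card + 1) * CARD_SIZE):
            if is_new(old_gen[slot]):
                found.append(slot)
    dirty.clear()
    return found

# Two widely separated writes dirty 2 of the 8 cards; the other 6 are
# never scanned -- which is why *clustered* old-data writes are cheap
# and widely scattered ones (the cellular-automaton case) are not.
write_barrier(3, "new-obj")
write_barrier(100, "old-obj")
roots = scan_for_new_pointers(lambda v: v == "new-obj")
print(roots)                         # [3]
```

A program that makes widely-distributed changes to old data dirties many cards per scavenge and loses most of the benefit, which is the "nasty tendency" the post asks about.]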