[comp.databases] "swizzling"

wilson@carcoar.Stanford.EDU (Paul Wilson) (08/22/89)

If anybody's interested in how to swizzle persistent pointers into
transient ones, I've got a tech report that says how to do it
with virtual memory tricks on stock hardware.

My technique is very similar to the one used in Appel, Ellis, and
Li's (SIGPLAN '88) incremental garbage collector for stock
hardware.  The basic idea is that you translate the pointers in
a page when (and only when) the program tries to access that
page.  That entails creating a mapping for any pages referred
to by pointers within objects in the page you're translating.
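
To make that concrete, here's a rough C sketch of the fault-time
translation step.  Don't take the details seriously: I'm pretending
a page holds nothing but pointer fields (a real system would consult
per-object type information), and read_persistent_page() is a
made-up helper.  swizzle() is sketched a little further down.

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SIZE 4096

    typedef uint64_t ppnt_t;             /* 64-bit persistent pointer */

    extern void *swizzle(ppnt_t p);                     /* see below  */
    extern void read_persistent_page(uint64_t ppage, ppnt_t *buf);

    /* Translate one page at fault time: fetch its persistent image
       and rewrite every pointer field into normal hardware form.
       Pages those pointers refer to get (protected) mappings as a
       side effect of swizzle(). */
    void translate_page(void *vpage, uint64_t ppage)
    {
        ppnt_t buf[PAGE_SIZE / sizeof(ppnt_t)];
        void **out = (void **) vpage;
        size_t i;

        read_persistent_page(ppage, buf);
        for (i = 0; i < PAGE_SIZE / sizeof(ppnt_t); i++)
            out[i] = swizzle(buf[i]);
    }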

This system keeps any pages that contain persistent pointers
access-protected, so that the running program only sees
normal hardware-supported "transient" pointers.  As a side
benefit, the data formats can be different, so that for
example persistent pointers can specify a huge address space
(being translated from 64 bits to 32 at page-fault time).
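
For illustration, here's what that format translation might look
like, assuming (my assumption, purely for the sketch) a 64-bit
persistent pointer split into a page number and a 12-bit offset, and
glossing over the offset adjustments that shrinking 64-bit fields to
32 would really require:

    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t ppnt_t;

    #define P_PAGE(p) ((uint64_t)(p) >> 12)  /* persistent page no.  */
    #define P_OFF(p)  ((size_t)((p) & 0xfff))/* 12-bit page offset   */

    extern char *frame_for(uint64_t ppage);  /* sketched further down */

    /* 64 bits in, an ordinary hardware pointer out.  Because
       relocation is pagewise (see below), only the page part needs
       a table lookup; the offset carries over. */
    void *swizzle(ppnt_t p)
    {
        return frame_for(P_PAGE(p)) + P_OFF(p);
    }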

This scheme allows programs to execute at normal hardware
speeds unless and until they access an untranslated page --
no continual overhead in pointer checking or indirections.

It requires some nice OS hooks (user-level control over page
protections and page faults), but features like Mach's will do just fine.
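
For instance, on a system with POSIX-style primitives the hooks
amount to mprotect() plus a SIGSEGV handler; the sketch below uses
those as a stand-in for Mach's interface.  (ppage_of(), a reverse
map from transient pages back to persistent page numbers, is made up
for the sketch.)

    #include <signal.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096

    extern void translate_page(void *vpage, uint64_t ppage);
    extern uint64_t ppage_of(void *vpage);   /* reverse map (made up) */

    /* The first touch of a protected page lands here; we unprotect
       the page, translate its pointers, and return, so the faulting
       instruction is retried and succeeds -- at full hardware speed
       from then on. */
    static void on_fault(int sig, siginfo_t *si, void *ctx)
    {
        char *page = (char *)
            ((uintptr_t) si->si_addr & ~(uintptr_t)(PAGE_SIZE - 1));

        (void) sig; (void) ctx;
        if (mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE) != 0)
            abort();
        translate_page(page, ppage_of(page));
    }

    void install_fault_hook(void)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_fault;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
    }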

Efficiency is hard to estimate and depends very much on locality.
I'm working on some scavenging (gc) algorithms that should improve
efficiency, partly by keeping the number of off-page pointers low.

(Something rather like Moon's "approximately depth-first" algorithm
should greatly reduce the number of other pages each page holds
pointers into, and therefore the cost of creating translation
tables.)
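
For anyone who hasn't seen Moon's trick, here's a skeletal C
rendition of that style of scavenging loop.  The assumptions are
toyish (every tospace word is a scannable slot, and copy() is
idempotent because it checks forwarding pointers); the only point is
where scanning happens next.

    #include <stdint.h>

    typedef uintptr_t word;

    #define PAGE_WORDS 1024
    #define PAGE_BYTES (PAGE_WORDS * sizeof(word))
    #define page_of(p) \
        ((word *)((uintptr_t)(p) & ~(uintptr_t)(PAGE_BYTES - 1)))

    extern word *tospace, *scan, *free_p;   /* Cheney's state        */
    extern int   is_pointer(word w);
    extern word  copy(word w);   /* idempotent; checks forwarding    */

    /* Approximately depth-first scavenging.  Pure Cheney scanning is
       breadth-first and scatters parents and children across pages.
       Here a second pointer, `partial', scans the page the allocator
       is currently filling, so an object's children tend to be
       copied onto its own page.  Whatever `partial' skips, the
       global `scan' sweep covers later; some words get looked at
       twice, hence "approximately". */
    void adf_scavenge(void)
    {
        word *partial = page_of(free_p);

        while (scan < free_p) {
            if (partial < free_p) {            /* frontier page first */
                if (is_pointer(*partial))
                    *partial = copy(*partial); /* may advance free_p  */
                partial++;
                if (page_of(free_p) > page_of(partial))
                    partial = page_of(free_p); /* chase the frontier  */
            } else {                           /* fall back to Cheney */
                if (is_pointer(*scan))
                    *scan = copy(*scan);
                scan++;
            }
        }
    }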

One disanalogy between this scheme and Appel, Ellis, and Li's
gc is that the unit of relocation is the page, not the object.
That is, when you're doing pointer translations for a page
that the program has tried to reference, you create an address
translation (persistent -> transient) for each referenced
*page*, not each referenced object. 
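
So the translation table itself is pagewise.  A toy version of the
frame_for() used in the earlier sketches might be a hash table that
reserves an access-protected frame the first time a persistent page
is mentioned (error handling elided):

    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096
    #define NBUCKETS  1024

    /* One entry per persistent *page* seen so far; individual
       objects never get entries of their own. */
    struct frame_entry {
        uint64_t            ppage;   /* 64-bit persistent page no.   */
        char               *vframe;  /* its reserved transient page  */
        struct frame_entry *next;
    };

    static struct frame_entry *table[NBUCKETS];

    /* Look up -- or reserve -- the transient frame for a persistent
       page.  A fresh frame is mapped PROT_NONE, so the program's
       first touch of it faults and triggers translation in turn. */
    char *frame_for(uint64_t ppage)
    {
        struct frame_entry **b = &table[ppage % NBUCKETS];
        struct frame_entry *e;

        for (e = *b; e != NULL; e = e->next)
            if (e->ppage == ppage)
                return e->vframe;

        e = malloc(sizeof *e);
        e->ppage  = ppage;
        e->vframe = mmap(NULL, PAGE_SIZE, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        e->next   = *b;
        *b = e;
        return e->vframe;
    }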

(Actually, the pagewise relocation is Ralph Johnson's idea --
he independently came up with a scheme that is very similar
to -- and better than -- my own.  I'd been following the 
Appel, Ellis, and Li gc model too closely.)


We also want to look at some novel paging and prefetching algorithms
for such a system, in hopes that we can heuristically and adaptively
approximate some more knowledgeable caching strategies.  (E.g., the "hot set"
model when transiently traversing a large amount of data, switching
to an LRU-like or working-set-like model when locality gets normal again.)
We'd like to support a simple heap abstraction in a way that is
efficient for moderately large amounts of data.  (We assume that for
huge amounts of data you're going to punt and do specialized
DBMS memory management.)


Eventually, we'd like to combine this all with our Demonic Memory
system (SIGPLAN '89), which would allow us to reconstruct any
past state of the system.  By efficiently supporting a huge address
space on standard hardware, we hope to be able to uniquely address
every object that ever exists.

Anybody else out there working on any similar schemes?


Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 