[comp.os.research] compress/decompress paging? prepaging?

wilson@carcoar.stanford.edu (Paul Wilson) (06/13/89)

I'm looking for information on paging with compression.  Does anybody
do it?  (Has anybody threatened to, even?)

What I mean is compressing pages of main memory before they're swapped
out to disk, and uncompressing pages when they're faulted in.

There are two benefits to this, as I see it.  The obvious and not thrilling
benefit is that it saves you disk space (and/or interconnect bandwidth).

The more interesting possibility is to use a compressed-page cache as
a new layer of the memory hierarchy, intermediate in speed between
RAM and disk.  This is increasingly attractive as CPU speeds increase
faster than disk speeds.

(My particular scheme is for garbage collected languages, and uses a
little knowledge of the gc's copying algorithm to allow most pointers
to be encoded with zero (yes, zero) address bits.)

I'm also interested in references to prepaging algorithms.  My compression
scheme would allow you to win with a somewhat less effective prepaging
algorithm, because the space cost of a bad guess wouldn't be prohibitive.

So what are the classic references and the state of the art in prepaging?

[ I can see some scary problems here.  1) it has to be FAST.  2) now your ]
[ pages going to secondary store are not of fixed size.  --DL             ]


thanks prematurely,

   Paul


Paul R. Wilson                         
Human-Computer Interaction Laboratory     lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@carcoar.stanford.edu
Box 4348   Chicago,IL 60680 

pardo@june.cs.washington.edu (David Keppel) (06/15/89)

wilson@carcoar.stanford.edu (Paul Wilson) writes:
>[I'm looking for information on paging with compression.]
>[Goal: use compression for RAM caching of pages]

Our moderator writes:
>[It has to be FAST]
>[Pages going to 2ndary store aren't fixed-size]

Here's my guess:

* Pages are compressed in-core.  Unless you know a lot about the data
  you're compressing, a fast algorithm probably won't buy much more
  than 50%.  Probably separate compression algorithms are in order for
  the code space and the data space(s).

* The pages are stored in-core in adjacent memory.  (This is what you
  would assume: if page 1 is 100 bytes compressed and stored at
  location 2000, then page 2 is stored starting at location 2100.
  A sketch of an index for that kind of layout appears after this list.)

* You'll want to store the compressed data pages on full-page
  allocations.  When you want page _x_, you don't want to have to
  figure out which sectors it lives on, you just want to go get it.
  Indeed, if bandwidth is high, you might want to store them on the
  disk in *uncompressed* form.  If you need it, you're going to need
  it uncompressed (unless you are doing clever prefetching).

* You may have some problems on some architectures.  The VAX, for
  instance, stores 4 bytes for each page.  The bytes include some
  protection info, so there are fewer than 32 address bits per page.
  Normally those bits are used for either (a) the physical page
  frame number (22 bits) or (b) a disk block number (?? bits, < 32).
  If the pages are compressed, then they will live at odd
  (non-page-aligned) addresses and will generally require 30+ bits
  for each compressed page address.
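
To make the second guess concrete, here is a rough C sketch of a per-page
index over a contiguous compressed arena, so that a fault on page _x_
becomes one table lookup plus one decompress.  Nothing below is from the
postings themselves; the struct, field, and helper names are all invented
for illustration.

    #include <stddef.h>
    #include <stdint.h>

    /* Index over a contiguous in-core arena of compressed pages.
     * All names here are hypothetical. */
    struct cpage_entry {
        uint32_t offset;            /* byte offset into the arena          */
        uint16_t length;            /* compressed length; 0 = not cached   */
    };

    struct cpage_cache {
        uint8_t            *arena;      /* adjacent compressed storage     */
        struct cpage_entry *index;      /* one entry per virtual page      */
        size_t              page_size;  /* uncompressed page size          */
    };

    /* Assumed to exist elsewhere: a fast single-page decompressor. */
    extern void decompress(const uint8_t *src, size_t src_len,
                           uint8_t *dst, size_t dst_len);

    /* Fault path: expand page `vpn` into `frame` if it is in the
     * compressed cache; return 0 on a miss (caller goes to disk). */
    int cpage_fetch(struct cpage_cache *c, size_t vpn, uint8_t *frame)
    {
        struct cpage_entry *e = &c->index[vpn];
        if (e->length == 0)
            return 0;
        decompress(c->arena + e->offset, e->length, frame, c->page_size);
        return 1;
    }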

Hope this is useful.

	;-D on  ( A page from the history books )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

wilson@carcoar.stanford.edu (Paul Wilson) (07/23/89)

Several people have responded to my posting about compressed paging,
so here's a little more on the idea:

  Heap-based languages generally have a high degree of regularity in
their object layouts, with most things being collections of one-word
cells that can contain (tagged) immediate data or (tagged) pointers.
There have been some measurements (particularly for Lisp) that show
that there's really not that much information stored there.  Immediate
data values tend to be one of a few special constants (True, False, Nil)
or a limited subrange of another immediate type (most integers are
very small positive integers).

  Pointers are a little more interesting and tricky.  My scheme is to
exploit the regularity that results from the way copying garbage collectors
place objects in memory.  They typically use a breadth-first traversal,
so that objects' placement is determined by the order in which they are
reached when the live objects are traversed deterministically by that
algorithm.
But there are better algorithms, particularly Moon's "approximately
depth first" traversal and an algorithm of my own that I call "hierarchical
decomposition" scavenging.  These algorithms attempt to traverse local
areas of the graph of live objects, placing local chunks of the data
structures on a page.  For large subgraphs, the upper levels of the
subgraph are located on a page (in some deterministic order).  For
small subgraphs, the whole subgraph is grouped on a page. (It's not
really very complicated -- you've just gotta see the picture.  Basically
it's a very simple graftal kind of thing.)
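
For reference, here is the plain breadth-first copy that paragraph starts
from, reduced to a C skeleton.  This is the generic textbook traversal,
not Moon's algorithm and not hierarchical decomposition, and the types
and helpers are hypothetical; the point is just that an object's
to-space address is completely determined by the order in which the
traversal reaches it.

    #include <stddef.h>

    /* Forwarding info plus up to 8 pointer fields, for brevity. */
    typedef struct object {
        int            forwarded;   /* already copied?                  */
        struct object *forward;     /* new address, if so               */
        size_t         n_fields;
        struct object *fields[8];   /* pointer fields only              */
    } object;

    extern object *allocate_in_tospace(object *old);  /* bump-allocating copy */

    /* Copy one object (if it hasn't been copied yet) and return its
     * new address; newly copied objects are queued to be scanned. */
    static object *copy(object *o, object **queue, size_t *tail)
    {
        if (o == NULL)
            return NULL;
        if (!o->forwarded) {
            o->forward   = allocate_in_tospace(o);  /* next free address */
            o->forwarded = 1;
            queue[(*tail)++] = o->forward;
        }
        return o->forward;
    }

    /* Breadth-first scavenge: objects land in to-space in exactly the
     * order the traversal reaches them, so placement is deterministic. */
    void breadth_first_scavenge(object **roots, size_t n_roots,
                                object **queue /* assumed big enough */)
    {
        size_t head = 0, tail = 0;

        for (size_t i = 0; i < n_roots; i++)
            roots[i] = copy(roots[i], queue, &tail);

        while (head < tail) {
            object *o = queue[head++];
            for (size_t i = 0; i < o->n_fields; i++)
                o->fields[i] = copy(o->fields[i], queue, &tail);
        }
    }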

   (I expect this to work very well, at least for Lisp; it appears that it's
not hard to make over 90% of Lisp pointers point on-page.  Data for Smalltalk
aren't as clear, but I think it will work out fine.)

   These algorithms have better locality than a simple breadth-first
traversal (or, presumably, depth-first, but I haven't done the measurements
yet).  What's more interesting, though, is that the ordering of objects
on a page is deterministically dependent on the local topology of the
graph; the only exceptions occur when the algorithm fails to group
related objects on one page and pointers across pages result.  I think
these can be compressed pretty well, too, because a given page will
generally contain pointers to only a few other pages.

  Suppose you use an algorithm that orders objects within a page using a
breadth-first traversal, and suppose we have only one "starting" object
placed on a page.  If we record only the type information for each pointer
in the object, we can still figure out where its offspring are located --
its first child follows it immediately, its second child follows that,
and so on.  Then the offspring of its offspring follow in the usual
breadth-first way.  So usually we can leave out the actual addresses,
because they're implicit in the topology of the graph.
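
A rough C sketch of what the decompression side of that could look like
follows.  The tag values, the stream layout, and the helper routines are
all invented here (the posting doesn't specify them); the only idea taken
from it is that on-page pointers laid out by a deterministic breadth-first
traversal can be reconstructed by replaying the traversal, so their
addresses never have to be stored.

    #include <stddef.h>
    #include <stdint.h>

    enum field_tag {
        TAG_ON_PAGE_PTR,   /* address implicit in traversal order        */
        TAG_OFF_PAGE_PTR,  /* followed by an explicitly encoded address  */
        TAG_IMMEDIATE      /* followed by an encoded immediate value     */
    };

    /* Assumed helpers for reading the compressed stream. */
    extern enum field_tag read_tag(const uint8_t **stream);
    extern size_t         read_object_size(const uint8_t **stream); /* words */
    extern uintptr_t      read_off_page_pointer(const uint8_t **stream);
    extern uintptr_t      read_immediate(const uint8_t **stream);

    /* Rebuild a page of one-word cells from its compressed stream.
     * `scan` walks the cells already laid out; `alloc` is where the
     * next implicitly addressed object will be placed, exactly where
     * the original breadth-first traversal would have put it. */
    void decode_page(const uint8_t *stream, uintptr_t *page, size_t n_words)
    {
        size_t scan  = 0;
        size_t alloc = read_object_size(&stream);   /* the "starting" object */

        while (scan < alloc && alloc <= n_words) {
            switch (read_tag(&stream)) {
            case TAG_ON_PAGE_PTR:
                page[scan++] = (uintptr_t)&page[alloc];  /* no address stored */
                alloc += read_object_size(&stream);      /* child's size      */
                break;
            case TAG_OFF_PAGE_PTR:
                page[scan++] = read_off_page_pointer(&stream);
                break;
            case TAG_IMMEDIATE:
                page[scan++] = read_immediate(&stream);
                break;
            }
        }
    }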

   A highly optimized algorithm should be able to compress or decompress
a page in a single forward pass through the data, and do it pretty fast
(a few tens of instructions per word), without any specialized hardware.

   I *think* we'll be able to get a compression factor of about four,
judging from the data that are available, but we'll have to do our
own measurements after we've hacked on our garbage collector.

   For most non-pointer data, and for off-page pointers, I think a simple
adaptive algorithm should work quite well.  You keep a running record of
the most recently-seen patterns of high-order bits (a little LRU cache),
and encode things as one of those patterns plus a few low-order bits.
I think this will work well for integers and floats (assuming exponents
are in the high-order bits) and even for such things as class pointers.
(Typically, there should be high locality of type information, because
most data structures only hold a few kinds of things.)
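
One possible reading of that adaptive scheme, as a C sketch: a tiny LRU
cache of recently seen high-order bit patterns, with each word encoded as
a cache index plus its low-order bits on a hit, or escaped in full on a
miss.  The cache size, the bit split, and the output format below are
arbitrary choices, not anything from the posting.

    #include <stdint.h>
    #include <string.h>

    #define HIGH_SHIFT  11           /* keep the low 11 bits explicit    */
    #define CACHE_SLOTS  4           /* 2 bits of cache index            */

    struct hibits_cache {
        uint32_t slot[CACHE_SLOTS];  /* slot 0 = most recently seen      */
    };

    /* Assumed bit-level output routine. */
    extern void emit_bits(uint32_t value, int nbits);

    /* Encode one word as either
     *   1, index (2 bits), low bits (11 bits)   -- hit, 14 bits total
     *   0, full word (32 bits)                  -- miss, 33 bits        */
    void encode_word(struct hibits_cache *c, uint32_t word)
    {
        uint32_t high = word >> HIGH_SHIFT;
        int hit = -1;

        for (int i = 0; i < CACHE_SLOTS; i++)
            if (c->slot[i] == high) { hit = i; break; }

        if (hit >= 0) {
            emit_bits(1, 1);
            emit_bits((uint32_t)hit, 2);
            emit_bits(word & ((1u << HIGH_SHIFT) - 1), HIGH_SHIFT);
        } else {
            emit_bits(0, 1);
            emit_bits(word, 32);
            hit = CACHE_SLOTS - 1;   /* the oldest pattern gets evicted  */
        }

        /* LRU update: slide slots 0..hit-1 down, put this pattern first. */
        memmove(&c->slot[1], &c->slot[0], (size_t)hit * sizeof(uint32_t));
        c->slot[0] = high;
    }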

   The adaptive nature of this encoding is important, because it can
adapt quickly if it crosses from one subgraph to another -- sometimes
you get several subgraphs on a page because they're all smaller than
the page size.

   I think that a similar algorithm may even work for general memory
contents, but I have no data to back it up at this point, just a
few intuitive arguments.  (e.g., there should be considerable locality
in data structures just because most data structures are created by
one or a few phases of a program, and allocated from one or a few
chunks of memory.)

   This sort of compression should be especially valuable in the
new (and coming) machines with 64-bit words.  When you have more bits,
you usually don't really put much more information in them; you
just encode it more horizontally.  On such machines, it might
even be a win to adaptively compress traffic between cache memory
and main memory, if you have a two-level cache and the larger cache
has large blocks.  Then you could use a 32-bit interface between
the lower-level cache and the connection to main memory, as well
as reducing memory requirements.

   For anybody who's interested, I have a draft of a tech report
that describes this stuff, as well as some other funny ideas about
heap management in hierarchical memories.  I also talk about using
page faults to load heaps into memory incrementally, doing address
translation at paging time.  This lets you use a huge virtual address
space for disk-based stuff, but real (short) hardware pointers for
things that are actually in memory.  (This huge-address-space data
on disk could also benefit from compression.  It might actually take
up less space than the same data in RAM, just encoded differently.)
There's also a section on problems with cache effectiveness for garbage
collected systems (especially multiprocessors) and what to do about it.
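
The translate-at-fault-time idea can be sketched too, with the caveat
that everything below (the table, the helpers, the page size) is made up
and the real scheme is in the tech report: when a page is faulted in,
each long on-disk pointer on it is rewritten to a short hardware pointer
into a page of the normal address space, reserving address space for
targets that aren't resident yet.

    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t long_addr;       /* huge on-disk address             */

    /* Assumed helpers: a table mapping long page addresses to short
     * (in-memory) pages, and a way to reserve an address-space page
     * without actually loading it yet. */
    extern void *lookup_short_page(long_addr long_page);
    extern void *reserve_unloaded_page(long_addr long_page);

    /* Rewrite the long pointers on a freshly faulted-in page as short
     * hardware pointers.  Non-resident targets just get address space
     * reserved; touching one later faults and repeats the process.
     * Assumes 4K pages and a per-word pointer/immediate flag. */
    void swizzle_page(const long_addr *in, void **out, size_t n_words,
                      const uint8_t *is_pointer)
    {
        for (size_t i = 0; i < n_words; i++) {
            if (!is_pointer[i]) {
                out[i] = (void *)(uintptr_t)in[i];   /* immediate data   */
                continue;
            }
            long_addr target_page = in[i] & ~(long_addr)0xFFF;
            void *short_page = lookup_short_page(target_page);
            if (short_page == NULL)
                short_page = reserve_unloaded_page(target_page);
            out[i] = (uint8_t *)short_page + (in[i] & 0xFFF);
        }
    }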

   (If you ask for the report, don't be surprised if it takes a couple
of weeks to get because I'm out of town right now.  Anybody who'll be
at the SIGPLAN conference in Portland can talk to me there.)

   I would be interested in any comments on these things, or pointers
to relevant data.

   (Paul Hudson sent me mail saying that Acorn's UNIX box compresses
pages of executable code.  Does anybody know anything more on that?
Has anything been published?  I'd been thinking about something
like that for VLIW's (which have atrociously fat code), but haven't
really worked anything out.  It would seem that good encodings would
be very dependent on both the architecture and the compiler.  Any
thoughts about that?)


Thanks prematurely,

   Paul


Paul R. Wilson                         
Human-Computer Interaction Laboratory    lab ph.: (312) 996-6013
U. of Ill. at Chi. EECS Dept. (M/C 154) 
Box 4348   Chicago,IL 60680              wilson@carcoar.stanford.edu