wilson@carcoar.stanford.edu (Paul Wilson) (06/13/89)
I'm looking for information on paging with compression. Does anybody do it? (Has anybody threatened to, even?) What I mean is compressing pages of main memory before they're swapped out to disk, and uncompressing pages when they're faulted in.

There are two benefits to this, as I see it. The obvious and not thrilling benefit is that it saves you disk space (and/or interconnect bandwidth). The more interesting possibility is to use a compressed-page cache as a new layer of the memory hierarchy, intermediate in speed between RAM and disk. This is increasingly attractive as CPU speeds increase faster than disk speeds. (My particular scheme is for garbage collected languages, and uses a little knowledge of the gc's copying algorithm to allow most pointers to be encoded with zero (yes, zero) address bits.)

I'm also interested in references to prepaging algorithms. My compression scheme would allow you to win with a somewhat less effective prepaging algorithm, because the space cost of a bad guess wouldn't be prohibitive. So what are the classic references and the state of the art in prepaging?

[ I can see some scary problems here. 1) It has to be FAST. 2) Now your ]
[ pages going to secondary store are not of fixed size. --DL ]

thanks prematurely,
   Paul

Paul R. Wilson                      Human-Computer Interaction Laboratory
lab ph.: (312) 996-9216             U. of Illin. at C. EECS Dept. (M/C 154)
wilson@carcoar.stanford.edu         Box 4348  Chicago, IL 60680
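A minimal sketch of the pageout/pagein paths for such a compressed-page cache layer. None of this comes from Wilson's posting; compress_page(), decompress_page(), and the disk routines are hypothetical placeholders standing in for whatever codec and backing store a real VM system would use:

#include <stddef.h>
#include <string.h>

#define PAGE_SIZE   4096
#define CACHE_SLOTS 256

struct cslot {
    int    used;                    /* 0 = free slot                      */
    long   vpage;                   /* virtual page number                */
    size_t clen;                    /* compressed length in bytes         */
    unsigned char cdata[PAGE_SIZE]; /* worst case: an incompressible page */
};

static struct cslot cache[CACHE_SLOTS];

/* Placeholders: a real system would plug in its own codec and disk I/O. */
static size_t compress_page(const void *in, void *out)
{ memcpy(out, in, PAGE_SIZE); return PAGE_SIZE; }   /* identity "codec"  */
static void decompress_page(const void *in, size_t n, void *out)
{ memcpy(out, in, n); }
static void disk_write_page(long vpage, const void *buf) { (void)vpage; (void)buf; }
static int  disk_read_page(long vpage, void *buf) { (void)vpage; (void)buf; return -1; }

/* Pageout path: try to keep the evicted page in the compressed cache,
 * and fall back to disk only when no slot is free. */
void page_out(long vpage, const void *frame)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].used || cache[i].vpage == vpage) {
            cache[i].used  = 1;
            cache[i].vpage = vpage;
            cache[i].clen  = compress_page(frame, cache[i].cdata);
            return;
        }
    }
    disk_write_page(vpage, frame);        /* cache full: write through   */
}

/* Pagein path: a hit in the compressed cache avoids the disk entirely. */
int page_in(long vpage, void *frame)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && cache[i].vpage == vpage) {
            decompress_page(cache[i].cdata, cache[i].clen, frame);
            cache[i].used = 0;            /* give the slot back          */
            return 0;
        }
    }
    return disk_read_page(vpage, frame);  /* miss: go to disk            */
}

The interesting cases are the hits in page_in(): those are faults that cost a decompression instead of a disk read, which is the "new layer of the memory hierarchy" being proposed.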
pardo@june.cs.washington.edu (David Keppel) (06/15/89)
wilson@carcoar.stanford.edu (Paul Wilson) writes:
>[I'm looking for information on paging with compression.]
>[Goal: use compression for RAM caching of pages]

Our moderator writes:
>[It has to be FAST]
>[Pages going to 2ndary store aren't fixed-size]

Here's my guess:

* Pages are compressed in-core. Unless you know a lot about the data you're compressing, a fast algorithm probably won't buy much more than 50%. Separate compression algorithms are probably in order for the code space and the data space(s).

* The pages are stored in-core in adjacent memory. (This is what you would assume: if page 1 is 100 bytes compressed and stored at location 2000, then page 2 is stored starting at location 2100.)

* You'll want to store the compressed data pages on full-page allocations. When you want page _x_, you don't want to have to figure out which sectors it lives on; you just want to go get it. Indeed, if bandwidth is high, you might want to store them on the disk in *uncompressed* form. If you need it, you're going to need it uncompressed (unless you are doing clever prefetching).

* You may have some problems on some architectures. The VAX, for instance, stores 4 bytes for each page. Those bytes include some protection info, so there are fewer than 32 address bits per page. Normally those bits hold either (a) the physical page frame number (22 bits) or (b) a disk block number (?? bits, < 32). If the pages are compressed, then they will live at odd addresses and will generally require 30+ bits for each compressed page address.

Hope this is useful.

    ;-D on  ( A page from the history books )  Pardo
-- 
pardo@cs.washington.edu
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
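A sketch of the in-core bookkeeping the second and fourth points imply: compressed pages packed end to end in an arena, with a side table of byte offsets and lengths rather than frame numbers, which sidesteps the problem of squeezing an odd byte address into a VAX-style PTE field. The structure, names, and sizes are illustrative assumptions, not from the posting:

#include <stddef.h>
#include <string.h>

#define PAGE_SIZE   4096
#define NPAGES      1024                      /* virtual pages tracked        */
#define ARENA_SIZE  (NPAGES * PAGE_SIZE / 2)  /* hope for ~2:1 compression    */

struct centry {
    size_t offset;                        /* byte offset into the arena       */
    size_t clen;                          /* compressed length, 0 = absent    */
};

static unsigned char arena[ARENA_SIZE];   /* compressed pages, end to end     */
static size_t        arena_top;           /* next free byte                   */
static struct centry ctable[NPAGES];      /* side table, indexed by vpage     */

/* Store a page's compressed image; returns -1 if the arena is full.
 * (A real system would also have to compact or evict to reclaim holes,
 * since compressed pages are variable-sized.)  vpage is assumed < NPAGES. */
int cstore(long vpage, const unsigned char *cdata, size_t clen)
{
    if (arena_top + clen > ARENA_SIZE)
        return -1;
    memcpy(arena + arena_top, cdata, clen);
    ctable[vpage].offset = arena_top;     /* e.g. page 1 lands at 2000 ...    */
    ctable[vpage].clen   = clen;
    arena_top += clen;                    /* ... and page 2 starts at 2100    */
    return 0;
}

/* Look up a compressed page; returns a pointer into the arena, or NULL. */
const unsigned char *clookup(long vpage, size_t *clen)
{
    if (ctable[vpage].clen == 0)
        return NULL;
    *clen = ctable[vpage].clen;
    return arena + ctable[vpage].offset;
}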
wilson@carcoar.stanford.edu (Paul Wilson) (07/23/89)
Several people have responded to my posting about compressed paging, so here's a little more on the idea:

Heap-based languages generally have a high degree of regularity in their object layouts, with most things being collections of one-word cells that can contain (tagged) immediate data or (tagged) pointers. There have been some measurements (particularly for Lisp) that show that there's really not that much information stored there. Immediate data values tend to be one of a few special constants (True, False, Nil) or a limited subrange of another immediate type (most integers are very small positive integers).

Pointers are a little more interesting and tricky. My scheme is to exploit the regularity that results from the way copying garbage collectors place objects in memory. They typically use a breadth-first traversal, so that objects' placement is determined by the order they are reached when deterministically traversing live objects according to that algorithm. But there are better algorithms, particularly Moon's "approximately depth-first" traversal and an algorithm of my own that I call "hierarchical decomposition" scavenging. These algorithms attempt to traverse local areas of the graph of live objects, placing local chunks of the data structures on a page. For large subgraphs, the upper levels of the subgraph are located on a page (in some deterministic order). For small subgraphs, the whole subgraph is grouped on a page. (It's not really very complicated -- you've just gotta see the picture. Basically it's a very simple graftal kind of thing.)

(I expect this to work very well, at least for Lisp; it appears that it's not hard to make over 90% of Lisp pointers point on-page. Data for Smalltalk aren't as clear, but I think it will work out fine.)

These algorithms have better locality than a simple breadth-first traversal (or, presumably, depth-first, but I haven't done the measurements yet). What's more interesting, though, is that the ordering of objects on a page is deterministically dependent on the local topology of the graph; the only exceptions occur when the algorithm fails to group related objects on one page and pointers across pages result. I think these can be compressed pretty well, too, because a given page will generally contain pointers to only a few other pages.

Suppose you use an algorithm that orders objects within a page using a breadth-first traversal, and suppose we have only one "starting" object placed on a page. If we record only the type information for each pointer in the object, we can still figure out where its offspring are located -- its first child follows it immediately, its second child follows that, and so on. Then the offspring of its offspring follow in the usual breadth-first way. So usually we can leave out the actual addresses, because they're implicit in the topology of the graph.

A highly optimized algorithm should be able to compress or decompress a page in a single forward pass through the data, and do it pretty fast (a few tens of instructions per word), without any specialized hardware. I *think* we'll be able to get a compression factor of about four, judging from the data that are available, but we'll have to do our own measurements after we've hacked on our garbage collector.
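A minimal sketch of the implicit-pointer decoding just described, under a simplifying assumption: the on-page objects form a tree placed in breadth-first order, so the i-th on-page pointer met in a forward scan must point to object i+1 on the page and needs no address bits at all. The page format, tag values, and function names here are hypothetical, not Wilson's actual encoding:

#include <stdint.h>
#include <stddef.h>

enum tag { TAG_IMMEDIATE, TAG_ONPAGE_PTR, TAG_OFFPAGE_PTR };

#define MAX_OBJS 256

/* Assumed compressed page image:
 *   uint8_t nobjs;
 *   uint8_t size[nobjs];      -- cells per object, in placement order
 *   then one tag byte per cell, followed by a 4-byte payload for
 *   TAG_IMMEDIATE and TAG_OFFPAGE_PTR cells (TAG_ONPAGE_PTR has none). */
size_t decompress_page(const uint8_t *in, uint32_t *page, uint32_t page_base)
{
    uint32_t start[MAX_OBJS];            /* cell offset of each object       */
    unsigned nobjs  = *in++;
    unsigned ncells = 0;

    for (unsigned i = 0; i < nobjs; i++) {   /* prefix sums of object sizes  */
        start[i] = ncells;
        ncells  += in[i];
    }
    in += nobjs;

    unsigned next_child = 1;             /* object the next on-page pointer
                                            must refer to, breadth-first     */
    for (unsigned c = 0; c < ncells; c++) {
        switch (*in++) {
        case TAG_ONPAGE_PTR:             /* zero address bits in the input   */
            page[c] = page_base + 4 * start[next_child++];
            break;
        case TAG_OFFPAGE_PTR:            /* cross-page pointer: kept whole   */
        case TAG_IMMEDIATE:              /* immediate datum: kept whole      */
            page[c] = (uint32_t)in[0]       | (uint32_t)in[1] << 8
                    | (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
            in += 4;
            break;
        }
    }
    return ncells;                       /* cells written to the page        */
}

The compressor is the inverse forward pass: it emits one tag byte per cell and simply drops the address of any pointer whose target is the next object in placement order, keeping full payloads only for immediates and off-page pointers.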
For most non-pointer data, and for off-page pointers, I think a simple adaptive algorithm should work quite well. You keep a running record of the most recently seen patterns of high-order bits (a little LRU cache), and encode things as one of those patterns plus a few low-order bits. I think this will work well for integers and floats (assuming exponents are in the high-order bits) and even for such things as class pointers. (Typically, there should be high locality of type information, because most data structures only hold a few kinds of things.) The adaptive nature of this encoding is important, because it can adapt quickly if it crosses from one subgraph to another -- sometimes you get several subgraphs on a page because they're all smaller than the page size.

I think that a similar algorithm may even work for general memory contents, but I have no data to back it up at this point, just a few intuitive arguments. (E.g., there should be considerable locality in data structures just because most data structures are created by one or a few phases of a program, and allocated from one or a few chunks of memory.)

This sort of compression should be especially valuable in the new (and coming) machines with 64-bit words. When you have more bits, you usually don't really put much more information in them, you just encode them more horizontally. On such machines, it might even be a win to adaptively compress traffic between cache memory and main memory, if you have a two-level cache and the larger cache has large blocks. Then you could use a 32-bit interface between the lower-level cache and the connection to main memory, as well as reducing memory requirements.

For anybody who's interested, I have a draft of a tech report that describes this stuff, as well as some other funny ideas about heap management in hierarchical memories. I also talk about using page faults to load heaps into memory incrementally, doing address translation at paging time. This lets you use a huge virtual address space for disk-based stuff, but real (short) hardware pointers for things that are actually in memory. (This huge-address-space data on disk could also benefit from compression. It might actually take up less space than stuff in RAM, just encoded differently.) There's also a section on problems with cache effectiveness for garbage collected systems (especially multiprocessors) and what to do about it. (If you ask for the report, don't be surprised if it takes a couple of weeks to get, because I'm out of town right now. Anybody who'll be at the SIGPLAN conference in Portland can talk to me there.)

I would be interested in any comments on these things, or pointers to relevant data. (Paul Hudson sent me mail saying that Acorn's UNIX box compresses pages of executable code. Does anybody know anything more on that? Has anything been published? I'd been thinking about something like that for VLIWs (which have atrociously fat code), but haven't really worked anything out. It would seem that good encodings would be very dependent on both the architecture and the compiler. Any thoughts about that?)

Thanks prematurely,
   Paul

Paul R. Wilson                      Human-Computer Interaction Laboratory
lab ph.: (312) 996-6013             U. of Ill. at Chi. EECS Dept. (M/C 154)
wilson@carcoar.stanford.edu         Box 4348  Chicago, IL 60680
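A minimal sketch of the adaptive high-order-bits encoding described at the top of the last posting, assuming 32-bit words split into 16-bit halves and a four-entry LRU cache of recently seen high halves; the byte-granular output format and all names are illustrative assumptions, not Wilson's design (a real encoder would pack the codes much more tightly):

#include <stdint.h>
#include <stddef.h>

#define NSLOTS 4                         /* recent high-half patterns kept   */
#define ESCAPE 0xFF                      /* control byte marking a cache miss */

struct hicache { uint32_t hi[NSLOTS]; }; /* slot 0 = most recently used      */

/* Move a pattern to the front of the cache (LRU maintenance). */
static void touch(struct hicache *hc, int i, uint32_t hi)
{
    for (; i > 0; i--)
        hc->hi[i] = hc->hi[i - 1];
    hc->hi[0] = hi;
}

/* Encode one 32-bit word; returns the number of bytes written to out. */
size_t encode_word(struct hicache *hc, uint32_t w, uint8_t *out)
{
    uint32_t hi = w >> 16, lo = w & 0xFFFF;

    for (int i = 0; i < NSLOTS; i++) {
        if (hc->hi[i] == hi) {           /* hit: slot number + low 16 bits   */
            out[0] = (uint8_t)i;
            out[1] = (uint8_t)(lo >> 8);
            out[2] = (uint8_t)lo;
            touch(hc, i, hi);
            return 3;
        }
    }
    out[0] = ESCAPE;                     /* miss: emit the word in full      */
    out[1] = (uint8_t)(w >> 24);
    out[2] = (uint8_t)(w >> 16);
    out[3] = (uint8_t)(lo >> 8);
    out[4] = (uint8_t)lo;
    touch(hc, NSLOTS - 1, hi);           /* ...and remember its high half    */
    return 5;
}

The decoder keeps an identical cache and applies the same LRU update after every word, so the two sides stay synchronized without the high-half patterns themselves ever being transmitted; that is what lets the scheme adapt quickly when a page crosses from one subgraph into another.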