mark@acorn.co.uk (Mark Taunton) (01/22/91)
In article <3977@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>In article <118868@uunet.UU.NET> rbj@uunet.UU.NET (Root Boy Jim) writes:
>>I once suggested to Chris Torek that the kernel should execute
>>compressed programs. He groaned.
>
>This has been done by Acorn in their RISC iX port of 4.3. The compression
>is done in such a way that the file can still be demand paged.
>
>Perhaps someone from Acorn could give more details?

Yes. Here they are:

The machines on which RISC iX is currently implemented have "interesting"
memory management hardware. Each bank of 4Mbytes of physical memory is
handled by a MEMC (Memory Controller) chip in such a way that every
physical page is always mapped to some logical page (yes, it *is* that
way round). This is done entirely on-chip, using a CAM in which each
entry controls one physical page: if the entry's contents (a logical
page number) match the logical page address put out by the ARM CPU, then
the associated physical page (identified by the index of the entry in
the CAM table) is used for the data transfer, provided that the
associated protection bits allow it.

In the technology used in 1986, when MEMC was first made, the practical
limit on CAM entries was 128. Hence the page size for a 4Mb physical
memory chunk comes out at a whopping 32Kb. If you are prepared to use
more MEMCs and have each control less memory, the page size goes down
(to a minimum of 4Kb, for 512Kb/MEMC); of course this adds to system
cost for several reasons, so Acorn chose not to.

Now the consequences of a 32Kb page size are in many respects
unpleasant, but it is possible to turn some of them to advantage. In
particular, since the first RISC iX machine (the Acorn R140, launched in
early 1989 for approx. 3.5k pounds) had only 50Mb of disc and we wanted
to put as much as possible of 4.3 BSD onto it, we needed to do some
shoehorning.
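[Editor's note: a minimal sketch, not Acorn's hardware, simulating the inverted mapping described above. Each of the 128 CAM entries belongs to one physical page; the entry's *contents* are the logical page number mapped there, so translation is a content search rather than a table index. All names here are invented for illustration.]

```python
PAGE_SIZE = 32 * 1024          # 4 MB / 128 entries = 32 KB pages
NUM_ENTRIES = 128

class Memc:
    """Toy model of the MEMC's CAM-based address translation."""

    def __init__(self):
        # cam[phys_page] = logical page number currently mapped there
        self.cam = [None] * NUM_ENTRIES

    def map_page(self, phys_page, logical_page):
        self.cam[phys_page] = logical_page

    def translate(self, logical_addr):
        """Return the physical address, or None on a page fault."""
        logical_page, offset = divmod(logical_addr, PAGE_SIZE)
        for phys_page, entry in enumerate(self.cam):
            if entry == logical_page:          # CAM hit
                return phys_page * PAGE_SIZE + offset
        return None                            # no entry matches: fault

memc = Memc()
memc.map_page(5, 0x100)        # physical page 5 holds logical page 0x100
addr = memc.translate(0x100 * PAGE_SIZE + 42)
```

Note that the real CAM performs all 128 comparisons in parallel in hardware; the loop here is only a functional stand-in.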
The problem is exacerbated because the ARM, in common with most RISC
architectures, has fairly poor code density (though not as bad as some).
The solution was to use a code compression technique which is tuned to
the typical bit patterns of ARM instruction sequences and which achieves
a reduction of around 50%.

The saving in disc space comes from applying this compression technique
on a per-page basis to demand-paged executable files. Each page as
stored on disc would normally occupy 32Kb, or four disc blocks on an 8Kb
blocksize filesystem, but compression reduces this to around 16Kb on
average, taking up two or perhaps three 8Kb blocks. (With the newer R260
machine we moved to 4Kb blocks in the shipped filesystem, which further
improves the average disc space saving.) The unused blocks in a page
prototype simply become "holes". In conjunction with a simple shared
library scheme, the actual disc occupancy of executables becomes quite
tolerable.

The compression operation - we call it "squeezing" - is applied to a
normal demand-load format program as a separate operation (the linker
produces normal uncompressed output, but may immediately invoke the
squeeze program if desired). An unsqueeze program performs the inverse
operation if required (e.g. for debugging purposes).

Compressed executables comprise a header, the text, the data and an
optional symbol table. The header includes a magic number to identify
the compressed format and two tables of values for the decompression
algorithm, and is zero-padded to the next page boundary. Each page of
text or data is compressed "in place", i.e. the compressed data starts
at the same offset in the file that the uncompressed data did, and the
symbol table, if present, also starts at a page-size offset. The kernel
recognises compressed executables by the magic number at exec time, and
reads in the decompression tables in the header at this point, via the
buffer cache.
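[Editor's note: a back-of-envelope sketch, not Acorn's squeeze format, of the disc occupancy arithmetic above: each 32Kb page slot spans four 8Kb blocks, but after in-place compression only the leading blocks that actually hold data are allocated; the tail blocks become holes.]

```python
PAGE_SIZE = 32 * 1024
BLOCK_SIZE = 8 * 1024                   # 8 KB filesystem blocks

def blocks_used(compressed_page_sizes):
    """Blocks actually allocated; the rest of each page slot is a hole."""
    total = 0
    for size in compressed_page_sizes:
        # each page slot spans PAGE_SIZE // BLOCK_SIZE (= 4) blocks,
        # but only ceil(size / BLOCK_SIZE) of them hold data
        total += -(-size // BLOCK_SIZE)
    return total

# e.g. three pages squeezed to roughly 50% each
pages = [16 * 1024, 15 * 1024, 18 * 1024]
used = blocks_used(pages)                        # 2 + 2 + 3 = 7 blocks
full = len(pages) * (PAGE_SIZE // BLOCK_SIZE)    # 12 blocks uncompressed
```

With 4Kb blocks (as on the R260) the rounding waste per page shrinks, which is why the smaller blocksize improves the average saving.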
The tables are typically not very big, and are held on disc in a
compacted format which is expanded by the kernel. To save memory, since
most programs use much less than 32Kb of stack, the expanded tables are
kept in user space just before the actual process's user stack.

To demand load a page, the kernel reads the blocks containing the
compressed code or data directly into the new page frame (not through
the buffer cache), then applies the decompression algorithm to the data.
Since the data starts at the same file address regardless of
compression, there is no problem finding it. A checksum kept with the
compressed data helps protect against problems such as the possibility,
practically never seen, of a wayward program corrupting the
decompression tables in its own stack area. The page read operation
takes advantage of any adjacency of disc blocks (which with careful
tuning of the filesystem as shipped happens most of the time), and of
course holes take no time to load!

The decompression algorithm is extremely quick, and the net result is
that a page can be ready for use in less wall-clock time than it would
have taken to read in a whole 32Kb of uncompressed data directly. So the
technique saves both disc space and time, although of course if the page
size were smaller these benefits would probably be outweighed by other
considerations.

------------------------------------------------------------------------
Mark Taunton                              Email: mark@acorn.co.uk
RISC iX Development Group
Acorn Computers Ltd
Cambridge, England.
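[Editor's note: a sketch of the page-in path just described, with zlib standing in for Acorn's ARM-specific squeeze algorithm (which is not given here); the squeeze/page-in function names are invented for illustration.]

```python
import zlib

PAGE_SIZE = 32 * 1024

def squeeze_page(page):
    """Compress one page; keep a checksum alongside the compressed data."""
    data = zlib.compress(page)
    return zlib.crc32(data), data

def page_in(checksum, data):
    """Demand-load: verify the checksum, then decompress into the frame."""
    if zlib.crc32(data) != checksum:
        raise IOError("compressed page corrupted (bad checksum)")
    frame = zlib.decompress(data)
    assert len(frame) == PAGE_SIZE       # must expand to exactly one page
    return frame

page = bytes(PAGE_SIZE)                  # a page of zeroed "text"
chk, stored = squeeze_page(page)
frame = page_in(chk, stored)
```

The checksum check corresponds to the guard against a wayward program scribbling on the decompression tables in its own stack area.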
vic@grep.co.uk (Victor Gavin) (01/23/91)
In article <3977@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>In article <118868@uunet.UU.NET> rbj@uunet.UU.NET (Root Boy Jim) writes:
>>I once suggested to Chris Torek that the kernel should execute
>>compressed programs. He groaned.
>
>This has been done by Acorn in their RISC iX port of 4.3. The compression
>is done in such a way that the file can still be demand paged.

And perhaps they could also explain why it doesn't slow the system down,
coz if the processor has enough time to decompress something coming off
the disk, it has enough time to go and run another program.

Every Acorn person I ask this question of waffles and side-tracks.

	vic
bhoughto@pima.intel.com (Blair P. Houghton) (01/24/91)
What about _encrypted_ executables?

				--Blair
				  "Yes, I do have a real application in mind."
richard@aiai.ed.ac.uk (Richard Tobin) (01/25/91)
In article <1991Jan23.123808.22159@grep.co.uk> vic@grep.co.uk (Victor Gavin) writes:
>And perhaps they could also explain why it doesn't slow the system down, coz if
>the processor has enough time to decompress something coming off the disk, it
>has enough time to go and run another program.

It seems pretty clear that it's trading cpu time for disk space and disk
accesses. On a reasonably fast workstation with small slow disks, it
seems likely to be a winning tradeoff.

-- Richard
--
Richard Tobin,                JANET: R.Tobin@uk.ac.ed
AI Applications Institute,     ARPA: R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.          UUCP: ...!ukc!ed.ac.uk!R.Tobin
vic@grep.co.uk (Victor Gavin) (01/25/91)
In article <4001@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>It seems pretty clear that it's trading cpu time for disk space and
>disk accesses. On a reasonably fast workstation with small slow
>disks, it seems likely to be a winning tradeoff.

I would agree with you if Acorn actually had a range of machines with
this as a low entry machine. When you start comparing against SPARCs
(for instance) you get the distinct impression that Acorn aren't totally
committed yet. I think the reason they've made the R140/R260 the way
they have is because of their background in building small, personal
machines.

As an aside, who would like to see an ARM-based X terminal? I suggested
to Acorn a long time ago (and again quite recently) that they do this,
but they never seemed interested. I don't know enough about X Windows,
or I'd do it myself :-)

The way X terminal manufacturers have sprung up, there is definitely a
market (and that means $$$$), and it would also give Acorn a market
presence and help raise the Acorn name in commercial unix environments.

	vic
richard@aiai.ed.ac.uk (Richard Tobin) (01/25/91)
In article <2051@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
>What about _encrypted_ executables?

I have a vague recollection of someone (maybe Richard O'Keefe?)
suggesting a processor that permuted its opcodes differently for each
program...

-- Richard
--
Richard Tobin,                JANET: R.Tobin@uk.ac.ed
AI Applications Institute,     ARPA: R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.          UUCP: ...!ukc!ed.ac.uk!R.Tobin
hwt@bwdlh490.BNR.CA (Henry Troup) (01/25/91)
Of course, if you can implement compression in special hardware between the
disk and the CPU, you can do this with no impact on CPU performance.

Henry Troup - BNR owns but does not share my opinions   | The .signature is the
P.O. Box 3511, Stn. C, Ottawa, Ontario, Canada K1Y 4H7  | lowest form of humour
uunet!bnrgate!hwt%bwdlh490  HWT@BNR.CA  +1 613-765-2337 |
andrew@acwbust.incom.de (Andrew Wheadon) (01/26/91)
In article <1991Jan23.123808.22159@grep.co.uk> vic@grep.co.uk (Victor Gavin) writes:
>In article <3977@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>And perhaps they could also explain why it doesn't slow the system down, coz if
>the processor has enough time to decompress something coming off the disk, it
>has enough time to go and run another program.
>
> vic

Welllllll, perhaps they have a slow disk: the smaller the file, the less
time it takes to read from disk, and the more time there is to
decompress :-). (This is actually true on some PCs without HD caching.)
--
Andrew Wheadon | andrew@acwbust.incom.de | I support the troops in the Golf
cjc@ulysses.att.com (Chris Calabrese) (01/28/91)
In article <4001@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>In article <1991Jan23.123808.22159@grep.co.uk> vic@grep.co.uk (Victor Gavin) writes:
>>And perhaps they could also explain why it doesn't slow the system down, coz if
>>the processor has enough time to decompress something coming off the disk, it
>>has enough time to go and run another program.
>
>It seems pretty clear that it's trading cpu time for disk space and
>disk accesses. On a reasonably fast workstation with small slow
>disks, it seems likely to be a winning tradeoff.

Yes. One very big win for compressed executables is when the workstation
has 'disks' mounted over a very slow link (say 9600 baud). In fact, one
might even want a compressed file system type for mounting remote disks
from a home machine.

You can usually get something like 3:1 on executables and 10:1 on
English text, so a workstation with a 9.6kb link, with most of the
executables local and much of the documents, etc. remote, would behave
like it had a 96kb link. Remember, this is a direct pipe (not shared
like ethernet), so you could actually get reasonable performance.

Name:			Christopher J. Calabrese
Brain loaned to:	AT&T Bell Laboratories, Murray Hill, NJ
			att!ulysses!cjc		cjc@ulysses.att.com
Obligatory Quote:	``pher - gr. vb. to schlep.  phospher - to schlep light.
			philosopher - to schlep thoughts.''
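[Editor's note: a trivial check of the arithmetic above. If the data on the wire is compressed N:1, a fixed-rate line delivers N times as many useful bits per second for that data.]

```python
LINK_BPS = 9600                 # a 9600 baud serial line

def effective_bps(compression_ratio):
    """Apparent link speed when the payload is compressed N:1."""
    return LINK_BPS * compression_ratio

text_rate = effective_bps(10)   # English text, ~10:1  -> 96000
exe_rate = effective_bps(3)     # executables, ~3:1    -> 28800
```

So a 9.6kb line carrying mostly text behaves like a 96kb link, as claimed, while for executables the gain is closer to 3x.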
rheiger@chvax.uucp (Eiger Richard) (01/30/91)
In article <5453@bwdls58.UUCP> hwt@bwdlh490.BNR.CA (Henry Troup) writes:
>Of course, if you can implement compression in special hardware between the
>disk and the CPU, you can do this with no impact on CPU performance.
>
I thought of exactly the same thing. While we're at it, why not put
hardware compression onto the drives themselves? This could be
implemented using schemes that are used successfully in image processing
systems. One of the problems I can think of, however, is the need to be
able to access raw data directly on the disk. Comments? Experiences?

	Richard
martin@mwtech.UUCP (Martin Weitzel) (01/31/91)
In article <1991Jan30.123119.15448@chvax.uucp> rheiger@chvax.UUCP (Eiger Richard) writes:
:In article <5453@bwdls58.UUCP> hwt@bwdlh490.BNR.CA (Henry Troup) writes:
:>Of course, if you can implement compression in special hardware between the
:>disk and the CPU, you can do this with no impact on CPU performance.
:>
:I thought of the exact same thing. While we're at it, why not put
:hardware compressing onto the drives? This could be implemented using
:schemes that are used successfully in image processing systems. One of
:the problems I can think of however is the need to be able to access raw
:data directly on the disk. Comments? Experiences?

Another problem: you can never be sure how much free space you have on
disk. Worse, even if a file remains the same size (or slightly shrinks),
it may occupy more space than before, and the disk space may no longer
be sufficient. (Of course, for disks which contain more or less
"read-only" stuff this may be no big problem.)
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
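[Editor's note: a small demonstration of the point above, with zlib standing in for the hypothetical drive's compressor. A file that shrinks logically can still grow on a compressed disc, because the new contents compress less well.]

```python
import os
import zlib

original = b"A" * 1000          # 1000 highly compressible bytes
edited = os.urandom(900)        # 900 incompressible (random) bytes

smaller_logically = len(edited) < len(original)
grew_on_disc = len(zlib.compress(edited)) > len(zlib.compress(original))
```

Both flags come out true: the edit made the file 100 bytes shorter, yet its compressed image is far larger, so free space on such a disc cannot be predicted from file sizes alone.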
rbj@uunet.UU.NET (Root Boy Jim) (02/01/91)
>In article <1991Jan30.123119.15448@chvax.uucp> rheiger@chvax.UUCP (Eiger Richard) writes:
>:In article <5453@bwdls58.UUCP> hwt@bwdlh490.BNR.CA (Henry Troup) writes:
>:>Of course, if you can implement compression in special hardware between the
>:>disk and the CPU, you can do this with no impact on CPU performance.
>:>
>:I thought of the exact same thing. While we're at it, why not put
>:hardware compressing onto the drives?

The problem here is that disk drivers access data by block, optimizing
sector rotation and cylinder head motion. If you're gonna send it all to
the disk anyway, you don't save any transfer time. Compressing the data
changes the boundaries where the sectors fall. Many people think it a
bad idea to move layout policy out of the kernel; others would rather
let a smart caching disk do it.
--
Root Boy Jim Cottrell <rbj@uunet.uu.net>
Close the gap of the dark year in between