[comp.unix.internals] Compressed executables

mark@acorn.co.uk (Mark Taunton) (01/22/91)

In article <3977@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>In article <118868@uunet.UU.NET> rbj@uunet.UU.NET (Root Boy Jim) writes:
>>I once suggested to Chris Torek that the kernel should execute
>>compressed programs. He groaned.
>
>This has been done by Acorn in their RISC iX port of 4.3.  The compression
>is done in such a way that the file can still be demand paged.  
>
>Perhaps someone from Acorn could give more details?

Yes.  Here they are:

The machines on which RISC iX is currently implemented have
"interesting" memory management hardware.  Each bank of 4Mbytes of
physical memory is handled by a MEMC (Memory Controller) chip in such
a way that every physical page is always mapped to some logical page
(yes, it *is* that way round).  This is done entirely on-chip, using a
CAM in which each entry controls one physical page: if the entry's
contents (logical page number) matches the logical page address put
out by the ARM CPU, then the associated physical page (identified by
the index of the entry in the CAM table) is used for the data
transfer, provided that the associated protection bits allow.  In the
technology used in 1986 when MEMC was first made the practical limit
on CAM entries was 128.  Hence the page size for a 4Mb physical memory
chunk comes out at a whopping 32Kb.  If you are prepared to use more
MEMCs and have each control less memory, the page size goes down (to a
minimum of 4Kb, for 512Kb/MEMC); of course this adds to system cost
for several reasons so Acorn chose not to.
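
For anyone who has not met an inverted mapping like this before, here
is a minimal C sketch of what MEMC does on every access.  All the
names are invented, and of course the real chip matches all 128
entries simultaneously in hardware rather than looping:

	/*
	 * One CAM entry per *physical* page, holding the logical page
	 * it currently maps - the inverse of a conventional page table.
	 */
	#define NPHYS	128		/* CAM entries == physical pages */

	struct cam_entry {
		unsigned lpage;		/* logical page mapped here */
		unsigned prot;		/* protection bits */
		int	 valid;
	};

	struct cam_entry cam[NPHYS];

	/* Return the physical page index, or -1 to signal a fault. */
	int
	memc_lookup(unsigned lpage, unsigned access)
	{
		int i;

		for (i = 0; i < NPHYS; i++)
			if (cam[i].valid && cam[i].lpage == lpage
			    && (cam[i].prot & access))
				return i;	/* entry index IS the
						 * physical page number */
		return -1;			/* no match: page fault */
	}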

Now the consequences of a 32Kb page size are in many respects
unpleasant, but it is possible to turn some of these to advantage.  In
particular since the first RISC iX machine (the Acorn R140, launched
in early 1989 for approx. 3.5k pounds) had only 50Mb of disc and we
wanted to put as much as possible of 4.3 BSD onto it, we needed to do
some shoehorning.  The problem is exacerbated because ARM, in common
with most RISC architectures, has fairly poor code density (though not
as bad as some).  The solution was to use a code compression technique
which is tuned to the particular typical bit patterns of ARM
instruction sequences and which achieves a reduction of around 50%.
The saving in disc space comes by applying this compression technique
on a per-page basis to demand paged executable files.  Each page as
stored on disc would normally occupy 32Kb or four disc blocks on an
8Kb blocksize filesystem, but compression reduces this to around 16Kb
on average, taking up two or perhaps three 8Kb blocks.  (With the
newer R260 machine we moved to 4Kb blocks in the shipped filesystem,
which further improves the average disc space saving.)  The unused
blocks in a page prototype simply become "holes".  In conjunction with
a simple shared library scheme, the actual disc occupancy of
executables becomes quite tolerable.
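
To make the block arithmetic concrete, here is a throwaway C fragment
(not one of our tools, just the sums) for one squeezed page on an 8Kb
blocksize filesystem:

	#include <stdio.h>

	int
	main()
	{
		int page = 32 * 1024;	/* MEMC page size */
		int blk  = 8 * 1024;	/* filesystem block size */
		int sq   = page / 2;	/* ~50% squeeze, typical */
		int used = (sq + blk - 1) / blk;

		printf("%d of %d blocks used, %d left as holes\n",
		    used, page / blk, page / blk - used);
		return 0;
	}

This prints "2 of 4 blocks used, 2 left as holes"; a page that
squeezes slightly worse than 50% takes a third block.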

The compression operation - we call it "squeezing" - is applied to a
normal demand-load format program as a separate operation (the linker
produces normal uncompressed output, but may immediately invoke the
squeeze program if desired).  An unsqueeze program performs the
inverse operation if required (e.g. for debugging purposes).

Compressed executables comprise a header, the text, the data and an
optional symbol table.  The header includes a magic number to identify
the compressed format and two tables of values for the decompression
algorithm, and is zero padded to the next page boundary.  Each page of
text or data is compressed "in place", i.e. the compressed data starts
at the same offset in the file that the uncompressed data did, and the
symbol table, if present, also starts at a page-aligned offset.
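
In outline the on-disc shape is as follows.  The field names and
widths here are invented for illustration - only the general
arrangement matches the real format:

	/* Illustrative only: the shape of a squeezed executable. */
	struct sq_header {
		unsigned long	sq_magic;	/* squeezed-format magic */
		unsigned long	sq_textsize;	/* original text size */
		unsigned long	sq_datasize;	/* original data size */
		/*
		 * Two compacted decompression tables follow here; the
		 * kernel expands them at exec time.
		 */
	};
	/*
	 * The header is zero-padded to the next page boundary, then:
	 *	text	- each page squeezed in place at its usual offset
	 *	data	- likewise
	 *	symbols	- optional, starting at a page-aligned offset
	 */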

The kernel recognises compressed executables by the magic number at
exec time, and reads in the decompression tables in the header at this
point, via the buffer cache.  The tables are typically not very big,
and are held on disc in a compacted format which is expanded by the
kernel.  To save memory, since most programs use much less than 32Kb
of stack, the expanded tables are kept in user space just before the
actual process's user stack.  To demand load a page, the kernel reads
in the blocks containing the compressed code or data directly into the
new page frame (not through the buffer cache) then applies the
decompression algorithm on the data.  Since the data starts at the
same file address regardless of compression, there is no problem
finding it.  A checksum kept with the compressed data helps protect
against problems such as the possibility, practically never seen, of a
wayward program corrupting the decompression tables in its own stack
area.  The page read operation takes advantage of any adjacency of
disc blocks (which with careful tuning of the filesystem as shipped
happens most of the time) and of course holes take no time to load!
The decompression algorithm is extremely quick, and the net result is
that a page can be ready for use in less wall-clock time than it would
have taken to read in a whole 32Kb of uncompressed data directly.  So
the technique saves both disc space and time, although of course if
the page size were smaller these benefits would probably be outweighed
by other considerations.
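
In outline, demand-loading one squeezed page goes as below.  This is
a much-simplified user-space sketch with invented names and no real
error handling; the actual code of course lives inside the kernel:

	/*
	 * 'tables' are the decompression tables expanded at exec time
	 * and kept just below the user stack; 'frame' is the newly
	 * allocated 32Kb page frame.
	 */
	extern int  read_compressed_blocks(int fd, long off, char *frame);
	extern long stored_checksum(char *frame);
	extern long checksum(char *frame, int len);
	extern void unsqueeze(char *frame, short *tables);

	int
	load_squeezed_page(int fd, long off, char *frame, short *tables)
	{
		int n;

		/* Straight into the page frame, not via the buffer
		 * cache; holes cost no I/O at all. */
		if ((n = read_compressed_blocks(fd, off, frame)) < 0)
			return -1;

		/* Guards against e.g. the tables having been trashed
		 * by a wayward store into the stack area. */
		if (checksum(frame, n) != stored_checksum(frame))
			return -1;

		unsqueeze(frame, tables);	/* expand in place */
		return 0;
	}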

------------------------------------------------------------------------
Mark Taunton 					Email:	mark@acorn.co.uk
RISC iX Development Group			
Acorn Computers Ltd
Cambridge, England.

vic@grep.co.uk (Victor Gavin) (01/23/91)

In article <3977@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>In article <118868@uunet.UU.NET> rbj@uunet.UU.NET (Root Boy Jim) writes:
>>I once suggested to Chris Torek that the kernel should execute
>>compressed programs. He groaned.
>
>This has been done by Acorn in their RISC iX port of 4.3.  The compression
>is done in such a way that the file can still be demand paged.  

And perhaps they could also explain why it doesn't slow the system down, coz if
the processor has enough time to decompress something coming off the disk, it
has enough time to go and run another program.

Every Acorn person I ask this question of waffles and side-tracks.


		vic

bhoughto@pima.intel.com (Blair P. Houghton) (01/24/91)

What about _encrypted_ executables?

				--Blair
				  "Yes, I do have a real
				   application in mind."

richard@aiai.ed.ac.uk (Richard Tobin) (01/25/91)

In article <1991Jan23.123808.22159@grep.co.uk> vic@grep.co.uk (Victor Gavin) writes:
>And perhaps they could also explain why it doesn't slow the system down, coz if
>the processor has enough time to decompress something coming off the disk, it
>has enough time to go and run another program.

It seems pretty clear that it's trading cpu time for disk space and
disk accesses.  On a reasonably fast workstation with small slow
disks, it seems likely to be a winning tradeoff.
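
A rough model of the tradeoff: if an uncompressed page is S bytes,
the disc transfers B bytes/sec, the compressed copy is c*S bytes and
the decompressor produces D bytes/sec, then compression wins on
transfer time alone whenever

    c*S/B + S/D  <  S/B,    i.e.    D > B/(1-c)

so at 50% compression the decompressor need only produce output at
twice the disc's raw transfer rate, and every seek it avoids is pure
profit.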

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

vic@grep.co.uk (Victor Gavin) (01/25/91)

In article <4001@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>It seems pretty clear that it's trading cpu time for disk space and
>disk accesses.  On a reasonably fast workstation with small slow
>disks, it seems likely to be a winning tradeoff.

I would agree with you if Acorn actually had a range of machines with
this as the low-end entry machine.  When you start comparing against
SPARCs (for instance) you get the distinct impression that Acorn
aren't totally committed yet.  I think the reason they've made the
R140/R260 the way they have is their background in building small,
personal machines.

As an aside, who would like to see an ARM-based X terminal?  I
suggested to Acorn a long time ago (and again quite recently) that
they do this, but they never seemed interested.  I don't know enough
about X-windows, or I'd do it myself :-)

The way X terminal manufacturers have sprung up, there is definitely
a market (and that means $$$$), and it would also give Acorn a market
presence and help raise the Acorn name in commercial unix
environments.

			vic

richard@aiai.ed.ac.uk (Richard Tobin) (01/25/91)

In article <2051@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
>What about _encrypted_ executables?

I have a vague recollection of someone (maybe Richard O'Keefe?)
suggesting a processor that permuted its opcodes differently
for each program...

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

hwt@bwdlh490.BNR.CA (Henry Troup) (01/25/91)

Of course, if you can implement compression in special hardware between the 
disk and the CPU, you can do this with no impact on CPU performance.

Henry Troup - BNR owns but does not share my opinions | The .signature is the
P.O. Box 3511, Stn. C. Ottawa, Ontario, Canada K1Y 4H7| lowest form of humour
uunet!bnrgate!hwt%bwdlh490 HWT@BNR.CA +1 613-765-2337 | 

andrew@acwbust.incom.de (Andrew Wheadon) (01/26/91)

In article <1991Jan23.123808.22159@grep.co.uk> vic@grep.co.uk (Victor Gavin) writes:
>In article <3977@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>And perhaps they could also explain why it doesn't slow the system down, coz if
>the processor has enough time to decompress something coming off the disk, it
>has enough time to go and run another program.
>
>		vic
Welllllll, perhaps they have a slow disk: the smaller the file, the
less time it takes to read from disk, and the more time there is to
decompress :-).  (This is actually true on some PCs without HD
caching.)
-- 
Andrew Wheadon | andrew@acwbust.incom.de | I support the troops in the Gulf

cjc@ulysses.att.com (Chris Calabrese) (01/28/91)

In article <4001@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>In article <1991Jan23.123808.22159@grep.co.uk> vic@grep.co.uk (Victor Gavin) writes:
>>And perhaps they could also explain why it doesn't slow the system down, coz if
>>the processor has enough time to decompress something coming off the disk, it
>>has enough time to go and run another program.
>
>It seems pretty clear that it's trading cpu time for disk space and
>disk accesses.  On a reasonably fast workstation with small slow
>disks, it seems likely to be a winning tradeoff.

Yes.  One very big win for compressed executables is when the
workstation has 'disks' mounted over a very slow link (say 9600 baud).

In fact, one might even want a compressed file system type for
mounting remote disks from a home machine.  You can usually get
something like 3:1 on executables and 10:1 on English text, so a
workstation with a 9.6kb link with most of the executables local
and much of the documents, etc. remote would behave like it had a 96kb
link.  Remember, this is a direct pipe (not shared like ethernet), so
you could actually get reasonable performance.
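
Rough arithmetic: 9600 baud is about 960 bytes/sec, so a 100Kb
document takes around 107 seconds to pull across raw; at 10:1 it
arrives in about 11 seconds, which is where the apparent 96kb figure
comes from.  Executables at 3:1 would still load three times faster.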

Name:			Christopher J. Calabrese
Brain loaned to:	AT&T Bell Laboratories, Murray Hill, NJ
att!ulysses!cjc		cjc@ulysses.att.com
Obligatory Quote:	``pher - gr. vb. to schlep.  phospher - to schlep light.  philosopher - to schlep thoughts.''

rheiger@chvax.uucp (Eiger Richard) (01/30/91)

In article <5453@bwdls58.UUCP> hwt@bwdlh490.BNR.CA (Henry Troup) writes:
>Of course, if you can implement compression in special hardware between the 
>disk and the CPU, you can do this with no impact on CPU performance.
>
I thought of exactly the same thing.  While we're at it, why not put
hardware compression onto the drives themselves?  This could be
implemented using schemes that are used successfully in image
processing systems.  One of the problems I can think of, however, is
the need to be able to access raw data directly on the disk.
Comments?  Experiences?

	Richard

martin@mwtech.UUCP (Martin Weitzel) (01/31/91)

In article <1991Jan30.123119.15448@chvax.uucp> rheiger@chvax.UUCP (Eiger Richard) writes:
:In article <5453@bwdls58.UUCP> hwt@bwdlh490.BNR.CA (Henry Troup) writes:
:>Of course, if you can implement compression in special hardware between the 
:>disk and the CPU, you can do this with no impact on CPU performance.
:>
:I thought of the exact same thing. While we're at it, why not put
:hardware compressing onto the drives? This could be implemented using
:schemes that are used successfully in image processing systems. One of
:the problems I can think of however is the need to be able to access raw
:data directly on the disk. Comments? Experiences?

Some other problem: you can never be sure how much free space you
have on the disk.  Worse, even if a file stays the same size (or even
shrinks slightly), rewriting it with less compressible contents can
make it occupy more space than before, so that the available disk
space is suddenly no longer sufficient.  (Of course, for disks which
contain more or less "read-only" material this may be no big
problem.)
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

rbj@uunet.UU.NET (Root Boy Jim) (02/01/91)

>In article <1991Jan30.123119.15448@chvax.uucp> rheiger@chvax.UUCP (Eiger Richard) writes:
>:In article <5453@bwdls58.UUCP> hwt@bwdlh490.BNR.CA (Henry Troup) writes:
>:>Of course, if you can implement compression in special hardware between the 
>:>disk and the CPU, you can do this with no impact on CPU performance.
>:>
>:I thought of the exact same thing. While we're at it, why not put
>:hardware compressing onto the drives? 

The problem here is that disk drivers access data by block, optimizing
sector rotation and cylinder head motion. If you're gonna send it all to
the disk anyway, you don't save any transfer time. Compressing the
data changes the boundaries where the sectors fall. Many people think
it a bad idea to move layout policy out of the kernel; others
would rather let a smart caching disk do it.
-- 

	Root Boy Jim Cottrell <rbj@uunet.uu.net>
	Close the gap of the dark year in between