[comp.sys.apollo] Why is a cmpexe file so big?

hanche@imf.unit.no (Harald Hanche-Olsen) (10/05/90)

I just tried building a cmpexe file using the `xar' command.  The
result is interesting:

   % xar -cv pw pw.a88k pw.m68k
   Added element tagged m68k from 'pw.m68k'
   Added element tagged a88k from 'pw.a88k'
   % xar -tv pw
   type        offset      size  alignment  last-modified   tag
   coff             0      5311     524288  90/09/16.10:27  a88k
   coff        262144      5066     262144  90/09/16.10:40  m68k
   % ls -l pw*
   -rwxr-xr-x   1 hanche     270420 Oct  5 13:48 pw*
   -rwxr-xr-x   1 hanche       5311 Sep 16 10:27 pw.a88k*
   -rwxr-xr-x   1 hanche       5066 Sep 16 10:40 pw.m68k*

If my arithmetic is not way off, we are talking a 2500% overhead here!
All right, I will admit I have been unfair here -- a test with a
larger program reveals that the overhead seems to be fairly constant,
i.e. about 250K.  So for large programs the overhead isn't too bad
percentagewise, but it's obviously a killer if you try to build up a
large library of small cmpexe'd programs.  Does anyone know why it
has to be this way?  Or does it?  Ought I to submit an APR on it?
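
A quick check of that arithmetic, as a minimal Python sketch. The sizes are the ones from the ls -l listing above; the variable names are made up for illustration.

```python
# Sizes in bytes, copied from the `ls -l pw*` listing.
cmpexe_size = 270420
member_sizes = {"pw.a88k": 5311, "pw.m68k": 5066}

payload = sum(member_sizes.values())       # bytes of actual program
overhead = cmpexe_size - payload           # apparent padding, as seen by ls -l
overhead_pct = 100.0 * overhead / payload  # overhead relative to the payload

print(f"payload  = {payload} bytes")
print(f"overhead = {overhead} bytes ({overhead_pct:.0f}% of payload)")
```

The overhead comes out at 260043 bytes, i.e. roughly 254K, or a little over 2500% of the two members combined.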

- Harald Hanche-Olsen <hanche@imf.unit.no>
  Division of Mathematical Sciences
  The Norwegian Institute of Technology
  N-7034 Trondheim, NORWAY

dbfunk@ICAEN.UIOWA.EDU (David B Funk) (10/12/90)

In posting <HANCHE.90Oct5143121@hufsa.imf.unit.no> Harald Hanche-Olsen asks:

> I just tried building a cmpexe file using the `xar' command.  The
> result is interesting:
> 
>    % xar -cv pw pw.a88k pw.m68k
>    Added element tagged m68k from 'pw.m68k'
>    Added element tagged a88k from 'pw.a88k'
>    % xar -tv pw
>    type        offset      size  alignment  last-modified   tag
>    coff             0      5311     524288  90/09/16.10:27  a88k
>    coff        262144      5066     262144  90/09/16.10:40  m68k
>    % ls -l pw*
>    -rwxr-xr-x   1 hanche     270420 Oct  5 13:48 pw*
>    -rwxr-xr-x   1 hanche       5311 Sep 16 10:27 pw.a88k*
>    -rwxr-xr-x   1 hanche       5066 Sep 16 10:40 pw.m68k*
> 
> If my arithmetic is not way off, we are talking a 2500% overhead here!
> All right, I will admit I have been unfair here -- a test with a
> larger program reveals that the overhead seems to be fairly constant,
> i.e. about 250K.  So for large programs the overhead isn't too bad
> percentagewise, but it's obviously a killer if you try to build up a
> large library of small cmpexe'd programs.  Does anyone know why it
> has to be this way?  Or does it?  Ought I to submit an APR on it?

  Actually things aren't really as bad as they look; "ls" just isn't telling
you everything. A "cmpexe" file is sparse and may actually have lots of
"empty space" inside of it that doesn't use up disk blocks. If you look
at that "cmpexe" file via "/com/ld -a" you'll see the real answer:

$ /com/ld -a pw

sys   type      blocks  current
type  uid         used   length   attr rights       name

file  cmpexe        20    270420  P    prwx-        pw

1 entry listed, 20 blocks used.

So the amount of disk space actually used up is only 20480 bytes;
it is just that there are 249940 bytes (270420 - 20480) of "space" between
the block numbers allocated for the first 10k bytes and the
block numbers allocated to hold the last 10k bytes. This is done
so that the data will fall on memory segment aligned boundaries
which speeds up the run time loading process. Look at the
"alignment" and "offset" fields in the "xar -tv" output. Thus
these files don't actually "cost" you the disk space that "ls"
seems to tell you they do. Try this little experiment: move the
file into a new directory so that it is all by itself. Now do
an "ls -l" and look at the "total" value at the top of the
"ls" output. Note that it matches the "blocks used" output from
"/com/ld".
  For another example of sparse files, look at almost any "mbx"
type file. If you use the Apollo alarm server, look at its
message mail-box file in /tmp:

$ ls -l
total 6
-rwxrwxrw-+  1 dbfunk     104208 Oct 12 03:33 alarm_server.msg_mbx
drwxrwxrwx   1 root         1024 Jun  9 01:49 layers
-rw-rw-rwx+  1 root          144 Oct  9 13:59 llbdbase.dat
$ /com/ld -a

Directory "/sys/node_data/tmp":

sys   type      blocks  current
type  uid         used   length   attr rights       name

file  mbx            4    104208  P    prwx-        alarm_server.msg_mbx
dir   nil            1      1024  P    prwx-        layers
file  unstruct       1       144  P    prwx-        llbdbase.dat

3 entries, 6 blocks used.

It looks like that file has over 100k bytes allocated to it but it
uses up only 4 disk blocks. The contents of `node_data/systmp
contain other examples of these things.
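
The same effect can be reproduced on a present-day POSIX system. The following is only a sketch under stated assumptions: it uses the offsets and sizes from the "xar -tv pw" listing, fills the members with placeholder bytes, and assumes a filesystem that supports holes and reports st_blocks in 512-byte units (as Linux and the BSDs do). It does not reproduce the cmpexe format itself; the real file is 270420 bytes long, so it evidently holds more than the two members. The names write_sparse and report are hypothetical.

```python
import os
import tempfile

# (offset, size) pairs copied from the `xar -tv pw` listing.
MEMBERS = [(0, 5311), (262144, 5066)]

def write_sparse(path, members):
    """Write each member at its offset, seeking over the gap instead of writing zeros."""
    with open(path, "wb") as f:
        for offset, size in members:
            f.seek(offset)            # seeking past end-of-file leaves a hole
            f.write(b"\xAA" * size)   # placeholder payload, not real COFF data

def report(path):
    """Return (logical length, bytes allocated on disk) for path."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units on this platform.
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "pw_demo")
    write_sparse(path, MEMBERS)
    length, allocated = report(path)
    print(f"length = {length}, allocated = {allocated}")
```

On a filesystem with sparse-file support the allocated figure should be far smaller than the length, which is the same gap between "ls -l" and "/com/ld -a" described above.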

  However there is one way that you can lose "big time" on this stuff.
If you use any Unix-type program, such as "cp", to read files of this type,
the empty space may be allocated and filled in with real disk blocks.
For example:

$ xar -tv garp
type        offset      size  alignment  last-modified   tag
coff             0      3552     524288  90/08/16.01:59  a88k
coff         32768      3390      32768  90/08/16.01:52  m68k
$ /com/ld -a

Directory "/test":

sys   type      blocks  current
type  uid         used   length   attr rights       name

file  cmpexe        14     36948  P    prwx-        garp

1 entry, 14 blocks used.
$ /com/cpf garp gork
$ /com/ld -a

Directory "/test":

sys   type      blocks  current
type  uid         used   length   attr rights       name

file  cmpexe        14     36948  P    prwx-        garp
file  cmpexe        14     36948  P    prwx-        gork

2 entries, 28 blocks used.
$ /bin/cp gork guck
$ /com/ld -a

Directory "/test":

sys   type      blocks  current
type  uid         used   length   attr rights       name

file  cmpexe        14     36948  P    prwx-        garp
file  cmpexe        38     36948  P    prwx-        gork
file  unstruct      38     36948  P    prwx-        guck

3 entries, 90 blocks used.


Note that after the "/bin/cp" the blocks used by the file "gork"
changed from 14 to 38, even though it was the source of the
copy operation (if you use the "-o" flag on "cp" this problem is
avoided).
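
For comparison, here is a minimal sketch of a copy that keeps the holes, written for a present-day POSIX system. It is not a description of how "/com/cpf" or "cp -o" work internally; it only illustrates the general idea of seeking over all-zero blocks instead of writing them. The function name and the demo file names are hypothetical, and the 1024-byte block size is an assumption taken from the 20 blocks = 20480 bytes figure above. It also does not model the Apollo-specific effect of the source file growing when it is read.

```python
import os
import tempfile

BLOCK = 1024  # assumed block size for this sketch

def copy_preserving_holes(src, dst, block=BLOCK):
    """Copy src to dst, seeking over all-zero blocks rather than writing them."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk == bytes(len(chunk)):
                fout.seek(len(chunk), os.SEEK_CUR)  # skip ahead, leaving a hole
            else:
                fout.write(chunk)
        fout.truncate()  # set the final length in case the file ends in a hole

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "garp_demo")
    dst = os.path.join(d, "guck_demo")
    # Lay out two placeholder members like the `xar -tv garp` listing.
    with open(src, "wb") as f:
        f.write(b"\x01" * 3552)
        f.seek(32768)
        f.write(b"\x02" * 3390)
    copy_preserving_holes(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same = a.read() == b.read()
    print(same, os.stat(src).st_blocks, os.stat(dst).st_blocks)
```

The copy has identical contents and length, and on a filesystem that supports holes its block count should stay close to that of the source rather than growing to cover the gap.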

Dave Funk

hanche@imf.unit.no (Harald Hanche-Olsen) (10/12/90)

Thanks for your letter.  Below is my own follow-up, just posted a few
minutes ago...:

In article <HANCHE.90Oct5143121@hufsa.imf.unit.no> hanche@imf.unit.no (Harald Hanche-Olsen) writes:

   I just tried building a cmpexe file using the `xar' command.  The
   result is interesting:

      % xar -cv pw pw.a88k pw.m68k
      Added element tagged m68k from 'pw.m68k'
      Added element tagged a88k from 'pw.a88k'

and I proceeded to show the output of ls -l and rant and rave about
the horrible waste of space here.  Since then, several people have
written to me, pointing out that the file is actually sparse, so it
doesn't use as much disk space as ls -l suggests.  Well, I checked
it out a little more:

   % ls -ls pw*
    176 -rwxr-xr-x   1 hanche     270420 Oct 12 10:24 pw*
      8 -rwxr-xr-x   1 hanche       5311 Sep 16 10:27 pw.a88k*
      8 -rwxr-xr-x   1 hanche       5066 Sep 16 10:40 pw.m68k*

   % /com/ld -a pw*

   sys   type      blocks  current
   type  uid         used   length   attr rights       name

   file  cmpexe        44    270420  P    prwx-        pw
   file  coff           8      5311  P    prwx-        pw.a88k
   file  coff           8      5066  P    prwx-        pw.m68k

   3 entries listed, 60 blocks used.

Now that's interesting.  Which one is a liar -- /bin/ls, or /com/ld?
Even if /com/ld tells the truth, there is some wastage -- but not
quite as horrible as I thought.
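
One thing the numbers themselves can rule out is a simple difference in block units between the two programs. A small Python sketch, using only the counts from the two listings above (the variable names are made up for illustration):

```python
# Block counts copied from the listings: name -> (ls -s, /com/ld -a).
counts = {
    "pw":      (176, 44),
    "pw.a88k": (8, 8),
    "pw.m68k": (8, 8),
}

ratios = {name: ls_blocks / ld_blocks
          for name, (ls_blocks, ld_blocks) in counts.items()}
for name, ratio in ratios.items():
    print(f"{name:8s} ls/ld = {ratio:.1f}")

# A pure difference in block units would give the same ratio for every file.
consistent_units = len(set(ratios.values())) == 1
print("single unit factor explains it:", consistent_units)
```

The ratio is 4 for the cmpexe file and 1 for the two plain coff files, so the disagreement is specific to the sparse file and not a matter of the two programs counting in different units.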

By the way, I had neglected to tell you we are running SR10.3 (beta 3)
here.  Sorry about that.  Time to apr, perhaps.  If I only knew which
program to complain about...

- Harald Hanche-Olsen <hanche@imf.unit.no>
  Division of Mathematical Sciences
  The Norwegian Institute of Technology
  N-7034 Trondheim, NORWAY