hanche@imf.unit.no (Harald Hanche-Olsen) (10/05/90)
I just tried building a cmpexe file using the `xar' command. The
result is interesting:

% xar -cv pw pw.a88k pw.m68k
Added element tagged m68k from 'pw.m68k'
Added element tagged a88k from 'pw.a88k'
% xar -tv pw
type  offset  size  alignment  last-modified   tag
coff       0  5311     524288  90/09/16.10:27  a88k
coff  262144  5066     262144  90/09/16.10:40  m68k
% ls -l pw*
-rwxr-xr-x  1 hanche   270420 Oct  5 13:48 pw*
-rwxr-xr-x  1 hanche     5311 Sep 16 10:27 pw.a88k*
-rwxr-xr-x  1 hanche     5066 Sep 16 10:40 pw.m68k*

If my arithmetic is not way off, we are talking a 2500% overhead here!
All right, I will admit I have been unfair here -- a test with a
larger program reveals that the overhead seems to be fairly constant,
i.e. about 250K. So for large programs the overhead isn't too bad
percentagewise, but it's obviously a killer if you try to build up a
large library of small cmpexe'd programs. Does anyone know why it
has to be this way? Or does it? Ought I to submit an APR on it?

- Harald Hanche-Olsen <hanche@imf.unit.no>
Division of Mathematical Sciences
The Norwegian Institute of Technology
N-7034 Trondheim, NORWAY
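A short Python sketch of the arithmetic above, for readers who want to check it. It assumes, judging only from the "offset" and "alignment" columns, that xar places each member at the first multiple of that member's alignment past the end of the previous one. The function names are made up for illustration and are not part of xar.

```python
from typing import List, Tuple


def round_up(n: int, align: int) -> int:
    """Smallest multiple of align that is >= n."""
    return -(-n // align) * align


def layout(members: List[Tuple[str, int, int]]) -> List[Tuple[str, int]]:
    """Assumed placement rule: each (tag, size, alignment) member starts at the
    first multiple of its alignment past the end of the previous member."""
    offsets = []
    pos = 0
    for tag, size, align in members:
        start = round_up(pos, align)
        offsets.append((tag, start))
        pos = start + size
    return offsets


def overhead_percent(archive_len: int, member_sizes: List[int]) -> float:
    """Bytes beyond the members' own sizes, as a percentage of those sizes."""
    payload = sum(member_sizes)
    return 100.0 * (archive_len - payload) / payload


# Figures taken from the "xar -tv pw" and "ls -l pw*" listings above.
print(layout([("a88k", 5311, 524288), ("m68k", 5066, 262144)]))
print(f"{overhead_percent(270420, [5311, 5066]):.0f}%")
```

The computed offsets reproduce the "offset" column, and the overhead comes out at about 2506%, so the 2500% figure holds. Under this assumed rule the m68k member starts at 262144 however small the a88k member is, which would fit the roughly constant 250K overhead. The rule alone accounts for 262144 + 5066 = 267210 bytes; the listing does not say what the remaining 3210 bytes of the reported 270420 are.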
dbfunk@ICAEN.UIOWA.EDU (David B Funk) (10/12/90)
In posting <HANCHE.90Oct5143121@hufsa.imf.unit.no> Harald Hanche-Olsen asks:

> I just tried building a cmpexe file using the `xar' command. The
> result is interesting:
>
> % xar -cv pw pw.a88k pw.m68k
> Added element tagged m68k from 'pw.m68k'
> Added element tagged a88k from 'pw.a88k'
> % xar -tv pw
> type  offset  size  alignment  last-modified   tag
> coff       0  5311     524288  90/09/16.10:27  a88k
> coff  262144  5066     262144  90/09/16.10:40  m68k
> % ls -l pw*
> -rwxr-xr-x  1 hanche   270420 Oct  5 13:48 pw*
> -rwxr-xr-x  1 hanche     5311 Sep 16 10:27 pw.a88k*
> -rwxr-xr-x  1 hanche     5066 Sep 16 10:40 pw.m68k*
>
> If my arithmetic is not way off, we are talking a 2500% overhead here!
> All right, I will admit I have been unfair here -- a test with a
> larger program reveals that the overhead seems to be fairly constant,
> i.e. about 250K. So for large programs the overhead isn't too bad
> percentagewise, but it's obviously a killer if you try to build up a
> large library of small cmpexe'd programs. Does anyone know why it
> has to be this way? Or does it? Ought I to submit an APR on it?

Actually, things aren't really as bad as they look; "ls" just isn't
telling you everything. A "cmpexe" file is sparse and may have lots of
"empty space" inside it that doesn't use up disk blocks. If you look at
that "cmpexe" file via "/com/ld -a" you'll see the real answer:

$ /com/ld -a pw

      sys type  blocks  current
type  uid       used    length  attr  rights  name
file  cmpexe        20  270420  P     prwx-   pw

1 entry listed, 20 blocks used.

So the disk space actually used up is only 20 blocks, i.e. 20480 bytes;
it is just that there are some 250K bytes worth of "space" between the
block numbers allocated for the first 10k bytes and the block numbers
allocated to hold the last 10k bytes. This is done so that the data
will fall on memory-segment-aligned boundaries, which speeds up the
run-time loading process. Look at the "alignment" and "offset" fields
in the "xar -tv" output.
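The same distinction between a file's length and the disk blocks behind it can be seen on a present-day POSIX system. The Python sketch below is not Domain/OS specific, and its helper names are hypothetical: it writes some data, seeks 256K forward without writing, writes again, and then compares st_size (the length "ls -l" prints) with the space the filesystem reports as allocated. Whether the skipped range really becomes a hole depends on the filesystem, and st_blocks is not available on Windows.

```python
import os
import tempfile
from typing import Tuple


def make_sparse_file(path: str, payload: bytes, gap: int) -> None:
    """Write payload, seek gap bytes forward without writing, write payload again.
    On filesystems that support holes, the skipped range gets no disk blocks."""
    with open(path, "wb") as f:
        f.write(payload)
        f.seek(gap, os.SEEK_CUR)
        f.write(payload)


def length_and_allocated(path: str) -> Tuple[int, int]:
    """Return (logical length, bytes actually allocated on disk).
    st_size is the length "ls -l" prints; st_blocks is counted in 512-byte
    units on Linux and the BSDs, whatever the filesystem block size is."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sparse.bin")
    make_sparse_file(path, b"x" * (10 * 1024), 256 * 1024)
    size, allocated = length_and_allocated(path)
    print(f"length {size} bytes, allocated {allocated} bytes")
```

On a filesystem that supports holes, the allocated figure should come out far below the 282624-byte length, much like the "blocks used" versus "current length" columns of /com/ld.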
Thus these files don't actually "cost" you the disk space that "ls"
seems to tell you they do. Try this little experiment: move the file
into a new directory so that it is all by itself. Now do an "ls -l" and
look at the "total" value at the top of the "ls" output. Note that it
matches the "blocks used" output from "/com/ld".

For another example of sparse files, look at almost any "mbx" type file.
If you use the Apollo alarm server, look at its message mailbox file in
/tmp:

$ ls -l
total 6
-rwxrwxrw-+  1 dbfunk   104208 Oct 12 03:33 alarm_server.msg_mbx
drwxrwxrwx   1 root       1024 Jun  9 01:49 layers
-rw-rw-rwx+  1 root        144 Oct  9 13:59 llbdbase.dat
$ /com/ld -a

Directory "/sys/node_data/tmp":

      sys type  blocks  current
type  uid       used    length  attr  rights  name
file  mbx            4  104208  P     prwx-   alarm_server.msg_mbx
dir   nil            1    1024  P     prwx-   layers
file  unstruct       1     144  P     prwx-   llbdbase.dat

3 entries, 6 blocks used.

It looks like that file has over 100k bytes allocated to it, but it uses
only 4 disk blocks. The contents of `node_data/systmp contain other
examples of these things.

However, there is one way you can lose "big time" on this stuff. If you
use any Unix-type program, such as "cp", to read these types of files,
the empty space may be allocated and filled in with real disk blocks.
For example:

$ xar -tv garp
type  offset  size  alignment  last-modified   tag
coff       0  3552     524288  90/08/16.01:59  a88k
coff   32768  3390      32768  90/08/16.01:52  m68k
$ /com/ld -a

Directory "/test":

      sys type  blocks  current
type  uid       used    length  attr  rights  name
file  cmpexe        14   36948  P     prwx-   garp

1 entry, 14 blocks used.

$ /com/cpf garp gork
$ /com/ld -a

Directory "/test":

      sys type  blocks  current
type  uid       used    length  attr  rights  name
file  cmpexe        14   36948  P     prwx-   garp
file  cmpexe        14   36948  P     prwx-   gork

2 entries, 28 blocks used.
$ /bin/cp gork guck
$ /com/ld -a

Directory "/test":

      sys type  blocks  current
type  uid       used    length  attr  rights  name
file  cmpexe        14   36948  P     prwx-   garp
file  cmpexe        38   36948  P     prwx-   gork
file  unstruct      38   36948  P     prwx-   guck

3 entries, 90 blocks used.

Note that after the "/bin/cp" the blocks used by the file "gork" changed
from 14 to 38, even though it was the source of the copy operation (if
you use the "-o" flag on "cp", this problem is avoided).

Dave Funk
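For comparison, here is a minimal Python sketch of a hole-preserving copy on a POSIX system: all-zero chunks of the source are skipped with a seek rather than written, so they can remain holes in the destination. This zero-detection idea is one common approach, not a description of what Apollo's /com/cpf or "cp -o" actually do, and the 4096-byte chunk size is an assumption. It still reads the whole source, so it says nothing about why "gork" itself grew.

```python
import os
import tempfile

CHUNK = 4096  # assumed chunk size; a real tool would use the filesystem's block size


def sparse_copy(src: str, dst: str, chunk: int = CHUNK) -> None:
    """Copy src to dst, seeking over all-zero chunks instead of writing them,
    so those ranges can stay unallocated (holes) in dst."""
    zero = bytes(chunk)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            if buf == zero[:len(buf)]:
                fout.seek(len(buf), os.SEEK_CUR)  # skip: leave a hole
            else:
                fout.write(buf)
        # Set dst's length to the current offset; if the source ended in
        # zeros, this extends dst to the full length without writing them.
        fout.truncate()


# Demo: data, a 256K gap, more data, then a trailing gap up to 300000 bytes.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src.bin")
    dst = os.path.join(d, "dst.bin")
    with open(src, "wb") as f:
        f.write(b"a" * 3552)
        f.seek(256 * 1024, os.SEEK_CUR)
        f.write(b"m" * 3390)
        f.truncate(300000)
    sparse_copy(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same = a.read() == b.read()
    sizes = (os.path.getsize(src), os.path.getsize(dst))
    print(same, sizes)
```

The copy has the same contents and length as the source; on a filesystem that supports holes, the zero ranges in the destination need not be backed by disk blocks.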
hanche@imf.unit.no (Harald Hanche-Olsen) (10/12/90)
Thanks for your letter. Below is my own follow-up, just posted a few
minutes ago...:
In article <HANCHE.90Oct5143121@hufsa.imf.unit.no> hanche@imf.unit.no (Harald Hanche-Olsen) writes:
> I just tried building a cmpexe file using the `xar' command. The
> result is interesting:
> % xar -cv pw pw.a88k pw.m68k
> Added element tagged m68k from 'pw.m68k'
> Added element tagged a88k from 'pw.a88k'
and I proceeded to show the output of ls -l and rant and rave about
the horrible waste of space here. Since then, several people have
written to me, pointing out that the file is actually sparse, so it
doesn't use as much disk space as ls -l suggests. Well, I checked
it out a little more:
% ls -ls pw*
176 -rwxr-xr-x 1 hanche 270420 Oct 12 10:24 pw*
8 -rwxr-xr-x 1 hanche 5311 Sep 16 10:27 pw.a88k*
8 -rwxr-xr-x 1 hanche 5066 Sep 16 10:40 pw.m68k*
% /com/ld -a pw*
      sys type  blocks  current
type  uid       used    length  attr  rights  name
file  cmpexe        44  270420  P     prwx-   pw
file  coff           8    5311  P     prwx-   pw.a88k
file  coff           8    5066  P     prwx-   pw.m68k
3 entries listed, 60 blocks used.
Now that's interesting. Which one is a liar -- /bin/ls, or /com/ld?
Even if /com/ld tells the truth, there is some wastage -- but not
quite as horrible as I thought.
By the way, I had neglected to tell you we are running SR10.3 (beta 3)
here. Sorry about that. Time to apr, perhaps. If I only knew which
program to complain about...
- Harald Hanche-Olsen <hanche@imf.unit.no>
Division of Mathematical Sciences
The Norwegian Institute of Technology
N-7034 Trondheim, NORWAY