[net.lang.c] The cost of large uninitialized static/global arrays

petera@utcsri.UUCP (Smith) (05/21/86)

     If you are programming in C and care about the size of your
executable file and the time it takes to load off disk (say, for an
editor or other frequently used utility), consider a reasonably
large C program that declares a global or static array which is big
relative to the rest of the executable. For example, take a program
of around 80K with a 64K array that is declared but not initialized
until run time, something like:

     char *bigarray[16384];   /* 64K, assuming 4-byte pointers */

     Depending on the compiler/linker/loader, we may end up with an
executable occupying 80K+64K on disk, of which 64K is nothing but
zeros. Every time we execute the program it takes nearly twice as
long to load off disk! And what does the loader do during that extra
time? It copies zeros into memory! That is a complete waste of time
and space: if you execute the program frequently, nearly half of the
load time is wasted, not to mention nearly half of the space occupied
by each copy of the dormant executable file.

     While this may be obvious to many experienced assembly language
programmers, it may not be obvious to others. I learned the lesson
while programming the 8080 on my old H-8, where memory and disk
space were scarce, but I had completely forgotten it because of the
large amount of memory now available on most machines. It only hit
me again when I happened to look at a uuencoded copy of one of my
programs: the encoding was nearly all blanks, which is how uuencode
represents runs of zero bytes.

     Some compilers/linkers/loaders avoid this problem by not storing
large uninitialized static/global data in the executable file at
all; they record only its size and expand it to zeros at load time
(this is what the "bss" segment is for on Unix systems). Under
MS-DOS this appears not to be done. So the lesson is to allocate
your large uninitialized static/global space at run time, as in the
sketch below, and then you do not have to worry about how smart the
loader is. This is probably one major reason why some C programs
compile to a large executable file on one machine but to a
significantly smaller file on another.
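
     Here is a minimal sketch of the run-time approach (the array is
the one from the example above; calloc zeros the block, so the
program still sees the zero initial values a static array would have
had; note that with a 16-bit size_t a single 64K request may need a
compiler-specific allocator):

     #include <stdio.h>
     #include <stdlib.h>

     #define NPTRS 16384

     char **bigarray;   /* was: char *bigarray[16384]; */

     int main(void)
     {
         /* Allocate the 64K at run time; the executable no
            longer carries a block of zeros on disk. */
         bigarray = (char **) calloc(NPTRS, sizeof(char *));
         if (bigarray == NULL) {
             fprintf(stderr, "cannot allocate bigarray\n");
             return 1;
         }

         /* ... use bigarray exactly as before ... */

         free(bigarray);
         return 0;
     }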

     Peter Ashwood-Smith
     University Of Toronto.        

farren@hoptoad.UUCP (05/29/86)

In article <2810@utcsri.UUCP> petera@utcsri.UUCP (Smith) writes:
<<  A whole lot about executables in MS-DOS which have vast numbers of
    zeros embedded in the file >>

    If you're using Microsoft C, or the Microsoft Macro Assembler, you
might like to look at the EXEPACK utility, which addresses this very
problem by encoding long runs of identical bytes.  This can result in
a quite significant reduction in .EXE file size.  (There is also
supposed to be a switch in the latest versions of the linker which
will automatically produce packed executables, but I don't know what
it is.)
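
    The principle behind such a packer is ordinary run-length
encoding.  A toy encoder is sketched below; this only illustrates
the idea and is not EXEPACK's actual on-disk format:

    #include <stdio.h>

    /* Emit (count, byte) pairs: any run of up to 255 identical
       bytes, such as a long block of zeros, collapses to two
       bytes on disk. */
    void rle_encode(FILE *in, FILE *out)
    {
        int c, prev = EOF;
        int count = 0;

        while ((c = getc(in)) != EOF) {
            if (c == prev && count < 255) {
                count++;
            } else {
                if (prev != EOF) {
                    putc(count, out);
                    putc(prev, out);
                }
                prev = c;
                count = 1;
            }
        }
        if (prev != EOF) {        /* flush the final run */
            putc(count, out);
            putc(prev, out);
        }
    }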

Mike Farren
{hplabs, dual, lll-crg}!well!farren
hoptoad!farren

nather@ut-sally.UUCP (Ed Nather) (05/29/86)

In article <842@hoptoad.uucp>, farren@hoptoad.uucp (Mike Farren) writes:
> In article <2810@utcsri.UUCP> petera@utcsri.UUCP (Smith) writes:
> <<  A whole lot about executables in MS-DOS which have vast numbers of
>     zeros embedded in the file >>
> 
>     If you're using Microsoft C, or the Microsoft Macro Assembler, you
> might like to look at the EXEPACK utility [...]

If you're using Microsoft C you don't get executables full of nulls
anyway.  I have a program with a global array of 20000 characters
plus a lot of code, and the executable is only 10K in size.
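
An easy way to check what your own toolchain does (a hypothetical
test file; compare the executable's size with and without the
array):

    /* bsstest.c: if the executable grows by roughly 20000 bytes
       when this array is present, the zeros are stored on disk. */
    char bigbuf[20000];

    int main(void)
    {
        bigbuf[0] = 1;   /* touch the array so it is really used */
        return 0;
    }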

-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather
nather@astro.AS.UTEXAS.EDU