[comp.sys.amiga] Getting smaller binaries out of Lattice C

mwm@eris.UUCP (04/12/87)

I know there are others out there using Lattice. You'll appreciate the
hints to be found herein.

	<mike

"Oh, Say Can You C"
   by John Toebes

This issue, I detour from further improvements to PRINT to examine
a problem common to many programs and programmers--how to produce small
code modules in C. A number of programs out there for Amiga(tm) take up
extraordinary amounts of disk space (and memory) only because the
authors didn't know or failed to use some simple tricks to reduce the
size of finished code.  I'll show the techniques on small programs
(anything under 32K falls into this category); they're equally
useful on larger programs.

Fundamental: Believe it or not, the fastest way to reduce the size of
a program is to recompile and relink it!  Many people forget to delete
the debugging data (pulled in from the libraries and included in
compiled code by default).  It's handy during development but should
be stripped from every final version.  This can be done with the
utility STRIPA or by using the NODEBUG option in BLINK.  Debug code can
account for 25 percent or more of program size.

Merging:  Use the SMALLCODE and SMALLDATA options when you BLINK; they
force all otherwise separate code hunks of a single type to be merged
into a single large hunk.  By default, the linker puts all hunks of
the same name into a single hunk, but those hunks that are not named
(or whose name you cannot control) are put into separate hunks.  This
is nice for scatter loading of the hunks and uses small fragments of
memory well, but for code under 32K one may readily argue that large
hunks fragment memory less because there are fewer hunks.  By using
the SMALLCODE and SMALLDATA options, you tell BLINK to combine all
like hunks regardless of the name.  Although this doesn't reduce the
size of the executable code, it does reduce the amount of overhead for
the loader and the amount of data stored in the disk load file.

Compiler Options:  With the Lattice(R) C compiler, Release 3.10, you
can employ compiler options such as -b (base relative data) -r
(base relative subroutine calls) and -v (no stack checking).  For a
small program or utility, all of these options are generally
reasonable.  Stack checking is only important if programs make many
levels of subroutine calls and may overflow the stack.  In reality,
one can usually live without the extra handholding for a debugged and
running application.

The Effects:  The options above reduce the code WITHOUT making any
change to the source itself.  To illustrate the changes in size, we can
take the program hello0.c (shown below) and compile it with different
options:
  /* hello0.c */                     Default SmallC&D NoDB  -b-r  -v
  #include <stdio.h>   LISTed Size:  10232    9856    8572  8560  8260
  main()
  {
  printf("hello world\n");
  }
Using SMALLCODE and SMALLDATA options, we drop the code size about 400
bytes; NODEBUG eats up another 1.3K.  The -b and -r options in LC only
eat 12 bytes in this tiny program, but the -v option grabs another 300.
Overall, we drop the code by 2K (20%) simply by recompiling and
relinking.
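
The arithmetic behind those figures can be checked with a few lines of
C.  This is only a sketch for a present-day hosted compiler, not for
Lattice 3.10.  The names hello0_sizes, bytes_saved and
percent_saved_tenths are made up for illustration; the sizes are the
hello0.c row of the table above.

```c
#include <assert.h>

/* LISTed sizes of hello0.c from the table above, in the order
   Default, SmallC&D, NoDB, -b-r, -v. */
const long hello0_sizes[5] = { 10232L, 9856L, 8572L, 8560L, 8260L };

/* Bytes saved going from one build to the next. */
long bytes_saved(long before, long after)
{
    return before - after;
}

/* Saving as a share of the starting size, in tenths of a percent
   (truncated), so 192 means 19.2%.  Integer math only. */
long percent_saved_tenths(long before, long after)
{
    assert(before > 0);
    return (before - after) * 1000L / before;
}
```

For the whole sequence, bytes_saved(10232, 8260) gives 1972 and
percent_saved_tenths(10232, 8260) gives 192, a little over 19 percent,
which the text rounds to 2K and 20%.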

Changes in Code: We have a good start, but we can achieve even better
results by a few simple changes to the source.  In doing so, we must
remember two rules:
    1) Use only what you need.  General purpose functions should be
        avoided unless their capabilities are essential.
    2) Write code for the Amiga, not for a UNIX(R) machine!

Rule 1 is easy to forget.  How does one normally write a string with C?
With printf(), of course.  printf(), however, is capable of MUCH more
than output of simple strings--it can format, in complex forms, every
data type known to C.  Why use such a lengthy and powerful function
just to print a simple message?  In the program below, we substitute
puts() and immediately recover 2K of code.
  /* hello2.c */                     Default SmallC&D NoDB  -b-r  -v
  #include <stdio.h>   LISTed Size:   8086    7692    6596  6584  6284
  main()
  {
  puts("hello world");
  }
Our program now is only 61 percent of the size of the original.  Can
we make it smaller still?  We note that the program needn't provide for
command line arguments (argc and argv).  So we conveniently rename our
program _main instead of "main"; the standard _main.c program isn't
pulled in.  We save another 1.4K.

Rule 2 and UNIX.  We ruefully learn that the last program doesn't work.
The Amiga program _main.c is responsible for setting up the default
UNIX input and output file handles.  So our last move fails.  But it
shows us how much overhead exists if we set up for the UNIX environment.
Can we avoid that environment?

[NOTE: Do not rename main to _main if your program uses ARGV or ARGC,
for it will then receive only one argument parm: char *argp, which
points to a single, null-terminated command string.  That is useful for
programs which require only one argument (an echo command, perhaps?).]
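
As a rough sketch of what an echo-style program might do with that one
string, the helpers below step past the command name and measure the
text that follows.  Both names (skip_command_name, argument_length) are
hypothetical, and the layout of the string is an assumption: command
name first, then blanks, then the arguments, possibly ending in a
newline.  Check the Lattice documentation for what _main really
receives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given one command string such as
   "echo hello world\n", return a pointer to the text after the
   command name.  ASSUMPTION: the name comes first and is separated
   from the rest by blanks or tabs. */
const char *skip_command_name(const char *argp)
{
    assert(argp != NULL);

    /* Step over the command name itself. */
    while (*argp != '\0' && *argp != ' ' && *argp != '\t' && *argp != '\n')
        argp++;

    /* Step over the blanks that follow it. */
    while (*argp == ' ' || *argp == '\t')
        argp++;

    return argp;
}

/* Length of the argument text, not counting a trailing newline if
   one is present. */
size_t argument_length(const char *args)
{
    size_t n = 0;

    assert(args != NULL);
    while (args[n] != '\0' && args[n] != '\n')
        n++;
    return n;
}
```

The pointer and length are exactly the pair an AmigaDOS Write() call
wants, so no copy of the string is needed.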

Doing It With AmigaDOS: To get rid of the UNIX environment, we use
AmigaDOS for output.  This step drops our program all the way down to
2.2K (close to an 80 percent reduction) with the same results:
  /* hello4.c */                     Default SmallC&D NoDB  -b-r  -v
  #include <stdio.h>   LISTed Size:   4276    3968    3256  3240  2220
  _main()
  {
  Write(Output(), "hello world\n", 12);
  }
Sadly, we have to count the number of characters in the string--but we
can add a small routine to do this, as shown below in hello5.c.  If you
call the output routine more than once, the extra cost of the
subroutine is quickly recovered.

  /* hello5.c */                     Default SmallC&D NoDB  -b-r  -v
  #include <stdio.h>   LISTed Size:   4388    4080    3336  3312  2272
  _main()
  {
  myputs("hello world\n");
  }

  myputs(str)
  char *str;
  {
  Write(Output(), str, strlen(str));
  }
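
For readers without an Amiga at hand, the same idea can be tried on any
hosted C system.  In this sketch fwrite() stands in for
Write(Output(), ...); that substitution, and the name my_puts_to, are
assumptions made only so the example runs anywhere.  The point is the
same as in hello5.c: count the string once, then pass the whole buffer
to one low-level write call with no formatting machinery in between.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for myputs() on a hosted C system.  fwrite() plays the
   part that Write(Output(), ...) plays in hello5.c.  Returns the
   number of bytes written. */
size_t my_puts_to(FILE *out, const char *str)
{
    assert(out != NULL && str != NULL);

    /* Count once, then hand the whole buffer over in one call. */
    return fwrite(str, 1, strlen(str), out);
}
```

Calling my_puts_to(stdout, "hello world\n") writes the same 12
characters that hello4.c counted by hand.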

Have we reached the limit?  No, there is still one more trick to pull.
Using the MAP option of BLINK, we see a couple of routines have been
pulled from LC.LIB, including MEM1.O.  This is quite odd, for our
program allocates no memory at all.  A check shows that c.a (the
assembler code for c.o, the startup code from Lattice) pulls in a
function called MemCleanup upon the assumption that we have allocated
memory--and that we must return it to the system at end of program.

Because we may need MemCleanup often, we certainly shouldn't edit and
reassemble c.a.  Instead, we add a stub line (below) which creates a
dummy MemCleanup.  Our program drops out the code, reducing itself to
a mere 1088 bytes.
  MemCleanup(){}                     Default SmallC&D NoDB  -b-r  -v
                       LISTed Size:   2860    2572    2144  2128  1088

Do you always know when your program allocates memory?  No.  It may do
so indirectly even if you don't.  If you stub out the MemCleanup (as
above), can you fail to return some memory to the system when your
program ends?  Yes.  How do you tell when you need MemCleanup and when
you don't?  Simple enough:

Use the stub MemCleanup when you compile and link.  If the MAP file from
BLINK shows a reference to MEM1.O, you MUST remove the stub MemCleanup!
Your program has--directly or indirectly--allocated memory, and you
must return it to the system.
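
If you would rather not read the map by eye each time, the check can be
mechanized.  The routine below is a hypothetical sketch, not part of
BLINK or Lattice: it assumes only that the map is plain text and that
the module name appears in it literally, and it ignores case so that
"mem1.o" and "MEM1.O" both match.  Nothing here relies on the actual
layout of a BLINK map.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical check: does the text of a link map mention a module
   name such as "MEM1.O"?  Returns 1 if so, 0 if not.  The comparison
   ignores case. */
int map_mentions(const char *map_text, const char *module)
{
    size_t i, j;

    assert(map_text != NULL && module != NULL && module[0] != '\0');

    for (i = 0; map_text[i] != '\0'; i++) {
        for (j = 0; module[j] != '\0'; j++) {
            /* Stops at the end of map_text too, because its '\0'
               never equals a character of the module name. */
            if (toupper((unsigned char)map_text[i + j]) !=
                toupper((unsigned char)module[j]))
                break;
        }
        if (module[j] == '\0')
            return 1;   /* whole name matched at offset i */
    }
    return 0;
}
```

A result of 1 for "MEM1.O" means the stub MemCleanup has to come out.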

What Other Tricks can we use?

  a) Look at the MAP.  When a routine is pulled in from the library,
     see if you can understand why it is needed and look for ways
     around it.  A good example: if both MEMCPY and MOVMEM are used,
     one can easily be eliminated by recoding to call the other.

     Some functions you can't and shouldn't eliminate, such as STRCPY,
     STRLEN, and STRICMP; it is to your advantage to use these, for they
     are very specific purpose functions coded in assembler.

  b) Use register variables whenever possible.  If a variable is
     referenced frequently, it's a good candidate for a register
     variable.  However, if it is assigned often, code can grow
     somewhat.  If you aren't sure, try it both ways and see what happens to
     the code size.

  c) Use subroutines for common functions.  If you find yourself
     repeating three lines or more of code, they're good candidates for
     a subroutine, ESPECIALLY if you can make the parameters register
     variables.
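
Tips (b) and (c) can be combined in one small routine.  The sketch
below is illustrative only: copy_bounded is a made-up name, and it is
written in present-day C, not the style Lattice 3.10 expects.  It
replaces a copy loop that might otherwise be repeated at every call
site, and keeps its working pointers and counter in register variables.
A modern compiler is free to ignore the register hint, so measure the
code size both ways as suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* One shared routine in place of a copy loop repeated at each call
   site.  Copies at most size-1 characters from src and always
   terminates dst.  Returns the number of characters copied. */
size_t copy_bounded(char *dst, const char *src, size_t size)
{
    register char *d = dst;
    register const char *s = src;
    register size_t left;

    assert(dst != NULL && src != NULL && size > 0);

    for (left = size - 1; left > 0 && *s != '\0'; left--)
        *d++ = *s++;
    *d = '\0';

    return (size_t)(d - dst);
}
```

Each call site then shrinks to a single line such as
copy_bounded(buf, name, sizeof buf).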

Not a single one of these techniques demands substitution of assembly
language, although that alternative remains open.

How Far Can We Go?  Is 1K too small?  In general, you can cut a small
program to as little as 2 or 3K.  POPCLI, WBRUN, and MEMWATCH, all
coded using such methods, list at less than 4K.  A stripped-down version
of PRINT weighs in at 2.2K.  The size of BLINK is due in large part to
the methods above.
     
Summarized below are the commands by which each program was compiled and
linked:

Default versions:
  LC hello
  BLINK lib:c.o hello.o LIB lib:lc.lib lib:amiga.lib
[BLINK link above is called <standard link> below.  It was used in all
 linkings]

SMALLDATA and SMALLCODE:
  LC hello
  BLINK <standard link> SMALLCODE SMALLDATA

NODEBUG:
  LC HELLO
  BLINK <standard link> SMALLCODE SMALLDATA NODEBUG

OPTION -b and -r:
  LC HELLO -b -r
  BLINK <standard link> SMALLCODE SMALLDATA NODEBUG

OPTION -v:
  LC HELLO -b -r -v
  BLINK <standard link> SMALLCODE SMALLDATA NODEBUG

-----------------------------------------------------------------------
The above was excerpted with permission from Volume I, Number 6 of:

   The Amigan Apprentice & Journeyman.
     Published 6 times a year by The Amigans, a not-for-profit
     association of those who employ the Amiga computer.  Purpose of
     the association is the interchange of useful information.

     Membership in The Amigans is $24 (U.S.) per year in the United
     States and Canada; $34 (U.S.) elsewhere.  To join, Send membership
     fee to:
         The Amigans
         Box 411
         Hatteras, NC 27943

Copyright (c) 1986 by The Amigans.  All Rights Reserved.

Amiga is a trademark of Commodore-Amiga, Inc.
Lattice is a registered trademark of Lattice, Inc.
Unix is a registered trademark of AT&T
--
Here's a song about absolutely nothing.			Mike Meyer        
It's not about me, not about anyone else,		ucbvax!mwm        
Not about love, not about being young.			mwm@berkeley.edu  
Not about anything else, either.			mwm@ucbjade.BITNET