[comp.lang.c] EXE file size, C vs. Pascal

nuspljj@mentor.cc.purdue.edu (Joseph J. Nuspl Jr.) (11/10/90)

Over the past year, I have written several Unix-like commands -- cat, ls, ...
in Turbo Pascal 5.5.  I have recently rewritten them in Turbo C++ hoping
to improve speed and/or reduce file size.  The C compiled programs are
significantly larger.  Cat in Pascal is ~3k, Turbo C ~17k, DeSmet C ~10k.

I will continue my 'Ms-Dix' project and was wondering which platform, C or
Pascal, I should continue with.  I am trying to keep the size of the
executables small and have the speed of execution as fast as possible.

BTW - I don't have time to code everything in Assembler.

Comments?

Thanks.

      __    __________    _________
     /_/|  /_________/|  /________/|	Joseph J. Nuspl Jr.
     | ||  |  ____  | |  | _______|/	nuspljj@mentor.cc.purdue.edu
     | ||  | ||   | | |  | ||______
  __ | ||  | ||   | | |  | |/_____/|	Purdue University	
 /_/|| ||  | ||   | | |  | _______|/	Shreve Hall, Room 308
 | ||| ||  | ||___| | |  | ||______	West Lafayette, IN 47906
 | |_| ||  | |/___| | |  | |/_____/|
 |_____|/  |________|/   |________|/	(317) 495-5415

lacey@cpsin3.cps.msu.edu (Mark M Lacey) (11/10/90)

In article <16398@mentor.cc.purdue.edu> nuspljj@mentor.cc.purdue.edu (Joseph J. Nuspl Jr.) writes:
>
>Over the past year, I have written several Unix-like commands -- cat, ls, ...
>in Turbo Pascal 5.5.  I have recently rewritten them in Turbo C++ hoping
>to improve speed and/or reduce file size.  The C compiled programs are
>significantly larger.  Cat in Pascal is ~3k, Turbo C ~17k, DeSmet C ~10k.
>...
>...
>Comments?

Most likely, it is a combination of the library routines that you are
using, along with the code that is being generated by the compiler (are
you sure you don't have TC outputting code for the debugger?).  Another
piece of overhead is the "startup" routine (__main) which most MS-DOS
compilers link in with your code (I believe UNIX 'cc' does the same, but
I don't recall the name of the file it links in).

I have found that most C compilers have particularly bulky library
routines.  I have used Turbo C (and C++), Lattice C, and Zortech C, and
found that almost every time, Zortech produced a much smaller .EXE (like
1-3 K for a small program like cat).  Turbo C & Lattice C always seemed
to be at LEAST 8-10K in size.

--
Mark M. Lacey
(lacey@cpsin.cps.msu.edu)

steve@taumet.com (Stephen Clamage) (11/11/90)

nuspljj@mentor.cc.purdue.edu (Joseph J. Nuspl Jr.) writes:

>Over the past year, I have written several Unix-like commands -- cat, ls, ...
>in Turbo Pascal 5.5.  I have recently rewritten them in Turbo C++ hoping
>to improve speed and/or reduce file size.  The C compiled programs are
>significantly larger.  Cat in Pascal is ~3k, Turbo C ~17k, DeSmet C ~10k.

Once again, someone is confusing linked program size with efficiency.
If you are going to put this program in ROM, EXE file size is important.
Otherwise, it is likely to be irrelevant.

EXE files may contain more than program code and data.  They may hold
information which is never loaded into memory, such as debugging information.
Turbo C in particular includes information about exactly which source and
object files were used, and their dates and times, as well as the date and
time of the link.  None of this gets loaded at run time, but it does take up
space.

If you write cat in C in terms of standard I/O (<stdio.h>), you will
be including a lot of library code which has far more capability than
you need for the program.  Depending on how the I/O package is divided
into modules, you may include code which is never executed.  If you
care to sacrifice portability in favor of small EXE size, you can
write in terms of low-level open, close, read, and write calls.  These
are supported by most C systems, and you will not need to link in the
standard I/O package.  The program may be faster as well.
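
As a rough illustration of the low-level approach (a sketch, not anyone's
actual cat), the copy loop might look like the following.  It assumes a
POSIX-style <unistd.h>; MS-DOS compilers of the period typically declared
these calls in <io.h> instead.  The names copy_fd and cat_file are made up
for this example.

```c
#include <fcntl.h>   /* open, O_RDONLY */
#include <unistd.h>  /* read, write, close (assumed POSIX header) */

/* Copy everything from descriptor `in` to descriptor `out`.
   Returns 0 on success, -1 on a read or write error. */
int copy_fd(int in, int out)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {                 /* write() may accept fewer bytes */
            ssize_t w = write(out, p, (size_t)n);
            if (w <= 0)
                return -1;
            p += w;
            n -= w;
        }
    }
    return n < 0 ? -1 : 0;
}

/* Copy one named file to standard output (descriptor 1). */
int cat_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = copy_fd(fd, 1);
    close(fd);
    return rc;
}
```

A main() would call cat_file() for each argument, or copy_fd(0, 1) when
there are none.  Nothing here refers to a FILE, so the linker has no
reason to pull in the standard I/O package.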

The way to determine which programs are more efficient is to measure
their performance, not to look at the number of bytes in the EXE file.

You also may wish to consider how easy it is to get the functionality
you need in your program, and how robust your program is likely to be.

I do not claim that you will get better results in C than in Pascal,
but that you are using the wrong criterion for judgement.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

defaria@hpclapd.HP.COM (Andy DeFaria) (11/11/90)

>/ hpclapd:comp.lang.c / nuspljj@mentor.cc.purdue.edu (Joseph J. Nuspl Jr.) /  5:50 pm  Nov  9, 1990 /
>
>Over the past year, I have written several Unix-like commands -- cat, ls, ...
>in Turbo Pascal 5.5.  I have recently rewritten them in Turbo C++ hoping
>to improve speed and/or reduce file size.  The C compiled programs are
>significantly larger.  Cat in Pascal is ~3k, Turbo C ~17k, DeSmet C ~10k.

I'm in the process of learning C better (and learning C++).  I'm using TC++
and would be interested in your cat, ls, etc routines.

defaria@hpclapd.hp.com

jak@sactoh0.SAC.CA.US (Jay A. Konigsberg) (11/12/90)

In article <16398@mentor.cc.purdue.edu> nuspljj@mentor.cc.purdue.edu (Joseph J. Nuspl Jr.) writes:
>
>Over the past year, I have written several Unix-like commands -- cat, ls, ...
>in Turbo Pascal 5.5.  I have recently rewritten them in Turbo C++ hoping
>to improve speed and/or reduce file size.  The C compiled programs are
>significantly larger.  Cat in Pascal is ~3k, Turbo C ~17k, DeSmet C ~10k.
>

I have wondered the same thing. Mainly, why are C executables so large?

Given the minimum program (foo.c):

main()
{
}

The sizes are (on a 3B2/400 cc, but other machines/C compilers give
about the same).

file     bytes
-----    -----
foo.c       11
foo.o      256
foo       4815

As a program gets larger, the overhead stays about the same. This
implies that Unix/C creates a 4K+ header block. My only guess is that
it's for bss and stuff. Would anyone care to explain this and give
references? I have looked at a.out.h(4) and suspect the answer is
there, but am unable to ferret it out.

-- 
-------------------------------------------------------------
Jay @ SAC-UNIX, Sacramento, Ca.   UUCP=...pacbell!sactoh0!jak
If something is worth doing, it's worth doing correctly.

herrj@silver.ucs.indiana.edu (Jonathan R. Herr) (11/12/90)

Might the C files be larger due to the #include'd files? When you
include these, isn't the WHOLE file included?

Jonathan R. Herr            |  Don't wait to see me.  Send e-mail.  I don't
herrj@silver.ucs.indiana.edu|  always make it to campus but always get home.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  I'd clone myself if I could and get twice as much done in a single day.

bright@nazgul.UUCP (Walter Bright) (11/13/90)

In article <16398@mentor.cc.purdue.edu> nuspljj@mentor.cc.purdue.edu (Joseph J. Nuspl Jr.) writes:
/I am trying to keep the size of the
/executables small and have the speed of execution as fast as possible.
/BTW - I don't have time to code everything in Assembler.

It is possible to write very tiny executables in C. The trick is to avoid
pulling in unused overhead from the C standard library. The first thing to
do is generate a .MAP file and look at it to see what you are pulling in.
Then, see if large and complex functions can be replaced by small and simple
ones. For instance, make sure you are not pulling in floating point code
if you aren't using it. Also, try to replace things like printf("abc") with
puts("abc"), as printf is very big and pulls in a lot.
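
One caveat when making that substitution: puts appends a newline and
printf("abc") does not, so the exact equivalent of printf("abc") is
fputs("abc", stdout), while puts("abc") matches printf("abc\n").  A small
sketch (report_done is a hypothetical name used only for illustration):

```c
#include <stdio.h>

/* Emit constant messages without going through printf's format parser. */
void report_done(void)
{
    fputs("abc", stdout);   /* same output as printf("abc"): no newline */
    puts("finished");       /* same output as printf("finished\n")      */
}
```

Whether this saves space depends on what else in the program still
references printf; the .MAP file will show that.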

Purchase the library source for your compiler, and study it to see what
depends on what. I think you'll find it very worthwhile!

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (11/14/90)

(MS-DOS specific stuff follows.)

I found that executables resulting from Turbo C 1.0 compilations could
be made much smaller if I wrote my own customized C run-time
initialization code.  The standard run-time stuff under MS-DOS does zero
or more of the following things:

- Put some standard things in global variables.  E.g.
  MS-DOS version  and address of PSP (program segment prefix).

- Allocate BSS space (if not already done during loading).
  Initialize BSS data to zeroes.

- Allocate space for a stack, exit with error if insufficient
  space available.

- If the program uses near data, make the environment variables
  available in near memory by allocating memory and making a local copy
  of the environment.

- Initialize a block of memory at address 0 (or offset 0), for
  later checking of NULL pointer dereferencing.  (There may be no
  actual run-time initialization, but the memory is still allocated,
  and may take up space in the executable.)

- Collect arguments from the command line.  Test for MS-DOS
  version;  if 3.0 or greater, initialize argv[0] from the pathname
  following the environment data, else initialize argv[0] to point to a
  null character.  Allocate memory for argv[1] onwards.  Parse command
  line, interpreting blank, tab, double quotes, and backslash-
  quoted-double-quote intelligently.  Split command line into tokens,
  allocate memory, and initialize argv[1] onwards.

- Call main() with parameters.  Wait for return.  Upon return,
  check memory at address 0 for possible NULL pointer dereference
  and possibly print an error message.

- If stdio was linked in, call a routine to close all files.

- Exit back to MS-DOS, supplying the exit system call with the return
  code received from main().
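
To give a feel for how much code the command-line step alone involves,
here is a simplified sketch of the tokenizing described above.  It is not
Turbo C's startup code: split_args is a hypothetical helper, it splits the
string in place rather than allocating memory, and it ignores argv[0] and
the MS-DOS version check.

```c
/* Split `cmd` in place into at most `max` tokens; returns the token count.
   Blanks and tabs separate tokens, "..." groups characters (including
   blanks) into one token, and \" yields a literal double quote. */
int split_args(char *cmd, char **argv, int max)
{
    int argc = 0;
    char *src = cmd;

    for (;;) {
        char *dst;
        int in_quotes = 0;

        while (*src == ' ' || *src == '\t')   /* skip separators */
            src++;
        if (*src == '\0' || argc >= max)
            break;

        argv[argc++] = dst = src;             /* token is rewritten in place */
        while (*src != '\0') {
            if (src[0] == '\\' && src[1] == '"') {
                *dst++ = '"';                 /* \" becomes a literal quote */
                src += 2;
            } else if (*src == '"') {
                in_quotes = !in_quotes;       /* quote marks are dropped */
                src++;
            } else if (!in_quotes && (*src == ' ' || *src == '\t')) {
                src++;                        /* separator ends the token */
                break;
            } else {
                *dst++ = *src++;
            }
        }
        *dst = '\0';
    }
    return argc;
}
```

A program that never looks at its arguments needs none of this, which is
one reason a customized startup module can be so much smaller.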
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

hp@vmars.tuwien.ac.at (Peter Holzer) (11/20/90)

bright@nazgul.UUCP (Walter Bright) writes:
       ^^^^^^ Beware! The Nine are riding again!!!

>In article <16398@mentor.cc.purdue.edu> nuspljj@mentor.cc.purdue.edu (Joseph J. Nuspl Jr.) writes:
>/I am trying to keep the size of the
>/executables small and have the speed of execution as fast as possible.
>/BTW - I don't have time to code everything in Assembler.

>It is possible to write very tiny executables in C. The trick is to avoid
>pulling in unused overhead from the C standard library. The first thing to
>do is generate a .MAP file and look at it to see what you are pulling in.
>Then, see if large and complex functions can be replaced by small and simple
>ones. For instance, make sure you are not pulling in floating point code
>if you aren't using it. Also, try to replace things like printf("abc") with
>puts("abc"), as printf is very big and pulls in a lot.

Yes, stdio is a memory hog (not just the ?printf-family). fopen
pulls in malloc, for example, and several other functions one
would not expect at first glance. I remember that I wrote a
program that did all its I/O directly to the screen instead of
using stdio. The executable was about 40k large until I found a
lonely fprintf (stderr, ...) lurking somewhere. I removed it,
and voila, the size of the executable dropped to about 20k.
After running strip and exepack on it (to remove symbol tables
and the bss segment, which TC stupidly initializes to zeroes
instead of leaving this work to DOS at load time), the
EXE file shrank to only 12k.

So a little thought and the right tools can shrink programs by a
substantial amount.
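
A stray diagnostic like that one can often be written without stdio at
all.  A minimal sketch, assuming a POSIX-style write() (many DOS compilers
declare it in <io.h>); fd_puts is a hypothetical helper, not a library
function:

```c
#include <string.h>  /* strlen */
#include <unistd.h>  /* write (assumed POSIX header) */

/* Write a string straight to a file descriptor, bypassing stdio.
   Returns 0 if the whole string was written, -1 otherwise. */
int fd_puts(int fd, const char *msg)
{
    size_t len = strlen(msg);

    return write(fd, msg, len) == (ssize_t)len ? 0 : -1;
}
```

Calling fd_puts(2, "cat: cannot open file\n") then stands in for the
fprintf (stderr, ...) call, as long as the message needs no formatting.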

>Purchase the library source for your compiler, and study it to see what
>depends on what. I think you'll find it very worthwhile!

That's not necessary, although having the source code is handy
sometimes -- makes bug fixes or writing special versions of
library functions much easier (disassembling the library isn't
my idea of fun). OBJXREF gets you all the information you need
and I wrote a small program that converts its output into
something looking like cflow(1) output. If somebody wants it, I
can mail or post it.

HP
--
|    _  | Peter J. Holzer                       | Think of it   |
| |_|_) | Technical University Vienna           | as evolution  |
| | |   | Dept. for Real-Time Systems           | in action!    |
| __/   | hp@vmars.tuwien.ac.at                 |     Tony Rand |