[comp.sys.ibm.pc] Microsoft Vs. Borland; benchmarks!

evas@euraiv1.UUCP (Eelco van Asperen) (10/06/88)

[Here's a comparison of MSC and Turbo C as my contribution to the 
"Microsoft vs. Borland" discussion. I wrote this a couple of months 
ago and posted it, but that posting failed for administrative reasons.
Note that I don't have Turbo C v2.0 yet; if anybody wants the source 
of the benchmarks to run them under Turbo C v2.0, I'll be happy to send 
it. To make a fair comparison, you should run them on the same type
of machine; I also have access to an Olivetti M24 (a.k.a. AT&T 6300) and
an Olivetti M280.					-EvAs.]


Benchmarking Borland's Turbo C v1.5 and Microsoft's C v5.1 Compilers

INTRO

To bring some clarity to the continuing debate over the Microsoft and
Borland C compilers, I've benchmarked them using some of the
benchmarks from the article "Benchmarking C Compilers" which
appeared in Dr. Dobb's Journal (DDJ), August 1986. (Philip Freidin, one
of the authors, was kind enough to send them to me. Thanks, Phil!)

The compilers compared are Microsoft C v5.1 (MS-C) and Borland Turbo C 
v1.5 (TC). All tests were done on an AT clone running at 12 MHz with 
0 wait states under MS-DOS v3.3; to eliminate the speed of the hard 
disk from the results, I ran the programs from a RAM disk.

The programs were compiled with all optimizations enabled; for MS-C,
the flags are '-Ox -Gs' and for TC they are '-G -O -Z -r'.  The tests
were compiled and run for each memory model available on both
compilers; TC's Tiny model has not been included because MS-C has no
comparable model (at least the compiler does not generate special
code for it; I haven't checked whether programs compiled with the
Small model can be converted to .com files after linking). Each test
was repeated a number of times to increase accuracy; the loop count
for each test is given in the table.


THE BENCHMARKS

A brief description of the benchmarks used:

ARRAY tests the compiler's ability to efficiently access arrays using
conventional array operations. A 10x10x10 int-array is copied using
three nested for-loops.
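
For readers without the DDJ sources, the core of the test presumably
looks something like this (my own sketch, not the actual benchmark
code):

	int a[10][10][10], b[10][10][10];

	void array_copy()
	{
		int i, j, k;

		/* copy b into a element by element, with plain subscripts */
		for (i = 0; i < 10; i++)
			for (j = 0; j < 10; j++)
				for (k = 0; k < 10; k++)
					a[i][j][k] = b[i][j][k];
	}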

ATOX tests the atoi, atol and atof functions; it has 21 atoi calls,
16 atol calls, and 8 atof calls. Each call passes a string constant,
some of which have many leading blanks or zeros.
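
The flavour of the test, as I read the description (these particular
strings are made up):

	#include <stdlib.h>

	void atox_once()
	{
		int    i = atoi("    00042");	/* leading blanks + zeros */
		long   l = atol("  0001234567");
		double d = atof("   0003.14159");
	}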

CPYBLK copies a file of 10,000 bytes using fread and fwrite in
1024-byte blocks.

CPYCHR copies the same file but this time using fgetc and fputc; a
comparison of the times for CPYBLK and CPYCHR should tell you more
about the difference between block and character I/O.
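
The two copy loops presumably boil down to the following (my
reconstruction; buffer handling in the real tests may differ):

	#include <stdio.h>

	/* CPYBLK-style: block I/O, one library call per 1024 bytes */
	void copy_blocks(FILE *in, FILE *out)
	{
		char buf[1024];
		int n;

		while ((n = fread(buf, 1, sizeof buf, in)) > 0)
			fwrite(buf, 1, n, out);
	}

	/* CPYCHR-style: character I/O, one library call per byte */
	void copy_chars(FILE *in, FILE *out)
	{
		int c;

		while ((c = fgetc(in)) != EOF)
			fputc(c, out);
	}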

DISKIO does random seeks in a file of 240 KB and thus measures the
speed of fseek.

FIBTEST is the standard recursive Fibonacci number generator, called
here with an argument of 24. This mainly tests function entry and exit
code.
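
The usual form of this benchmark (my sketch; the DDJ version may
differ in detail) is the naive doubly recursive one, so fib(24) costs
about 150,000 calls per iteration:

	long fib(int n)
	{
		/* deliberately naive: almost all of the time goes into
		   call/return overhead rather than arithmetic */
		if (n < 2)
			return n;
		return fib(n - 1) + fib(n - 2);
	}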

FILLSCR writes 1,248 characters to the screen, consisting of sequences
of 78 a's followed by a carriage return. This measures the speed of
screen output in the absence of scrolling.  (The test is done just
after a CLS.)

The FUNCOVR programs test function-call overhead; they consist of
functions with zero, one, two and three arguments respectively, and
empty bodies.
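
Presumably these are no more than empty functions along these lines
(my sketch); timing a loop of calls to them isolates the entry/exit
sequences the compiler generates:

	void f0()                    { }
	void f1(int a)               { }
	void f2(int a, int b)        { }
	void f3(int a, int b, int c) { }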

The FUNCRET programs test the ability to return function values
efficiently; the functions return an int, a long and a double
(IFUNCRET, LFUNCRET and DFUNCRET in the table).

LOOPTST does a simple for-loop test.

MEMORY was created to test the speed of malloc/free; per loop, 500
blocks of 50 bytes are malloc'ed. Then every fifth one is free'd and
100 blocks of 35 bytes are malloc'ed, followed by a free of all
allocated blocks.
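
My reconstruction of the allocation pattern just described (block
counts and sizes as stated, everything else guessed):

	#include <stdlib.h>

	void memory_once()
	{
		char *p[500], *q[100];
		int i;

		for (i = 0; i < 500; i++)	/* 500 blocks of 50 bytes */
			p[i] = malloc(50);
		for (i = 0; i < 500; i += 5) {	/* free every fifth one   */
			free(p[i]);
			p[i] = NULL;
		}
		for (i = 0; i < 100; i++)	/* 100 blocks of 35 bytes */
			q[i] = malloc(35);
		for (i = 0; i < 500; i++)	/* then free everything   */
			if (p[i] != NULL)
				free(p[i]);
		for (i = 0; i < 100; i++)
			free(q[i]);
	}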

The MIN programs are used to determine the minimum size for a program;
	MINMAIN         no code; this measures startup + exit code
	MINPRTF         printf's in main 
	MINPUTS         uses puts rather than printf 
	MINFIO          calls to fopen, fgetc, fputc, fread, fwrite and 
			fclose

OPTIMIZE should test the compiler's ability to optimize code; as the
authors of the DDJ article note, this is one of the weakest benchmarks,
because even a relatively simple optimizer can reduce it to nothing.
With the arrival of more and more optimizing compilers, this will
become one of the hardest things to test.
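
To illustrate the point rather than reproduce the benchmark: a loop
like the following has no observable effect, so a compiler that does
dead-code elimination may legally delete it outright, and the measured
time drops to zero (as it does for MS-C in the table below).

	void optimizer_bait()
	{
		int i, x;

		for (i = 0; i < 10000; i++)
			x = i * 2;	/* result is never used */
	}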

POINTER is a pointer version of the ARRAY test; it uses six pointers
and three levels of indirection to copy the 10x10x10 array.
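
Six pointers and three levels of indirection suggests one pointer per
array per nesting level; a sketch of what that might look like (my
guess at the structure, not the DDJ code):

	int a[10][10][10], b[10][10][10];

	void pointer_copy()
	{
		int (*pa2)[10][10], (*pb2)[10][10];
		int (*pa1)[10], (*pb1)[10];
		int *pa0, *pb0;

		for (pa2 = a, pb2 = b; pa2 < a + 10; pa2++, pb2++)
			for (pa1 = *pa2, pb1 = *pb2; pa1 < *pa2 + 10; pa1++, pb1++)
				for (pa0 = *pa1, pb0 = *pb1; pa0 < *pa1 + 10; pa0++, pb0++)
					*pa0 = *pb0;
	}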

PRTF is meant to determine the speed of printf; the results should be
compared to the result of SCROLL. (They print the same line.)

RSIEVE and SIEVE are versions of the infamous sieve benchmark program;
RSIEVE uses register-variables whereas SIEVE does not.
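
These are presumably derived from the classic BYTE sieve; its core, for
readers who haven't seen it (RSIEVE would declare i, k, prime and count
with 'register'):

	#define SIZE 8190

	char flags[SIZE + 1];

	int sieve()
	{
		int i, k, prime, count;

		count = 0;
		for (i = 0; i <= SIZE; i++)
			flags[i] = 1;
		for (i = 0; i <= SIZE; i++) {
			if (flags[i]) {
				prime = i + i + 3;
				for (k = i + prime; k <= SIZE; k += prime)
					flags[k] = 0;
				count++;
			}
		}
		return count;	/* 1899 primes for SIZE == 8190 */
	}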

SCROLL is similar to FILLSCR but instead of the carriage-return, a
newline is printed.

STORAGE is used to determine the difference between the various storage
classes in C; the same four variables are declared automatic, register
and static in turn. To see whether the compiler will allocate more than
two registers for variables, the register test is also done with just
two of the four variables declared register.
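
The variants presumably differ only in the declarations, along these
lines (my sketch):

	void storage_variant()
	{
		/* AUTOTST:  int a, b, c, d;
		   STATTST:  static int a, b, c, d;
		   REGTEST:  register int a, b, c, d;
		   REG2TEST: register int a, b; int c, d;  */
		register int a, b;
		int c, d, i;

		for (i = 0; i < 1000; i++) {
			a = i;
			b = a + 1;
			c = b + a;
			d = c + b;
		}
	}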

STRINGS assesses the quality of the library routines strcat, strcpy,
strncpy, strlen, strcmp, and strncmp.

The SWITCH programs (SWITCH1-3 in the table) time switch statements.

TDOUBLE and TFLOAT test floating-point performance; in each loop, 40
additions, subtractions and multiplications and 20 divisions are done.
A compiler that conforms to the ANSI C standard (yeah, I know; this
should read 'conforms to a draft version of', etc.) should be faster in
the TFLOAT test than a compiler that follows K&R, because it doesn't
have to convert floats to doubles before each operation.
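
The difference that last sentence refers to, in code (illustration
only):

	float  fa, fb, fc;
	double da, db, dc;

	void promote_demo()
	{
		/* K&R: fa and fb are widened to double, the product is
		   computed in double precision, then narrowed back to
		   float.  A (draft-)ANSI compiler may do it all in single
		   precision, which is why TFLOAT can beat TDOUBLE. */
		fc = fa * fb;
		dc = da * db;	/* always done in double precision */
	}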

TINT and TLONG attempt to measure the performance of integer and long
operations, respectively.  For each loop, 1,500 adds, 1,600 subtracts,
200 multiplies and 200 divides are done.

TRIG times the speed of the trigonometric functions sin, cos and tan.
For each loop, these functions are called 12 times.

And now for the real stuff; here are the...

EXECUTION TIMES

Model:             Small       Compact      Medium        Large        Huge

Test     Loops   TC    MSC    TC    MSC    TC    MSC    TC    MSC   TC    MSC
--------------+------------+------------+------------+------------+----------
array     1500| 24.9   2.4 | 25.5   2.4 | 25.0   2.4 | 25.5   2.4 | 25.5  2.4
atox       100|  1.1   1.7 |  1.2   1.7 |  1.2   1.7 |  1.2   1.7 |  1.2  1.7
cpyblk      15|  7.8   2.3 |  8.8   3.0 |  7.9   2.3 |  9.1   3.1 |  9.2  3.2
cpychr      15|  9.5   6.4 | 10.3   6.9 |  9.8   6.6 | 11.0   7.3 | 11.4  7.3
diskio     350| 15.7  15.6 | 15.7  15.6 | 15.7  15.6 | 15.7  15.7 | 15.8 15.6
fibtest     18| 14.1  13.4 | 14.4  13.4 | 15.3  14.5 | 15.4  14.5 | 17.9 14.5
fillscr     12|  9.0   3.2 |  9.0   8.9 |  9.0   3.3 |  9.0   8.9 |  9.0  8.8
funcov0  10000| 16.3  15.1 | 17.3  15.1 | 22.2  16.8 | 22.9  16.8 | 34.3 16.8
funcov1  10000| 22.7  22.6 | 22.9  22.6 | 28.0  26.0 | 28.3  26.0 | 35.6 26.0
funcov2  10000| 24.9  24.3 | 23.8  24.3 | 29.6  27.2 | 28.7  27.2 | 37.1 27.2
funcov3  10000| 29.7  28.3 | 30.6  28.2 | 34.8  31.8 | 35.6  31.7 | 43.0 31.8
ifuncret  2500| 12.0  11.7 | 11.8  11.7 | 13.1  13.4 | 13.7  13.4 | 17.4 13.4
lfuncret  2500| 16.7  15.1 | 16.2  15.1 | 18.9  17.6 | 19.1  17.6 | 22.3 17.6
dfuncret   250| 37.8  28.2 | 37.6  29.7 | 37.8  28.4 | 38.1  29.9 | 38.2 29.9
looptst    500|  7.6   0.0 |  6.9   0.0 |  7.6   0.0 |  6.9   0.0 |  6.9  0.0
memory     500| 30.8  11.7 |196.5  14.5 | 31.4  12.3 |198.8  15.3 |206.3 17.5
optimize   100|  4.0   0.5 |  4.1   0.6 |  4.1   0.5 |  4.0   0.6 |  4.0  0.6
pointer   1500|  6.8   5.2 | 12.5   2.5 |  6.7   5.2 | 12.4   2.5 | 12.5 20.8
prtf        12| 12.6   7.0 | 12.6   7.1 | 12.6   7.0 | 12.6   7.1 | 12.6  7.1
rsieve     140| 13.9  11.8 | 13.7  11.8 | 14.0  11.8 | 13.6  11.8 | 13.6 11.8
scroll      12| 12.4   6.5 | 12.4  12.3 | 12.4   6.5 | 12.4  12.3 | 12.3 12.3
sieve      140| 14.0  12.7 | 13.6  12.7 | 14.0  12.7 | 13.7  12.7 | 13.7 12.7
storage:
 autotst   150| 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8  0.0
 stattst   150| 15.2   0.0 | 16.1   0.0 | 15.2   0.0 | 16.1   0.0 | 15.2  0.0
 regtest   150| 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8  0.0
 reg2test  150| 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8  0.0
strings   1000|  2.0   1.7 |  2.0   1.7 |  2.0   1.7 |  2.0   1.7 |  2.0  1.7
switch1   1000|  0.6   1.8 |  0.6   1.8 |  0.6   1.8 |  0.6   1.8 |  0.6  1.8
switch2   1000|  0.6   0.7 |  0.6   0.7 |  0.6   0.7 |  0.6   0.7 |  0.7  0.7
switch3   1000|  0.6   0.7 |  0.7   0.7 |  0.6   0.7 |  0.7   0.7 |  0.7  0.7
tdouble    500| 21.0  10.3 | 21.0  10.3 | 21.0  10.3 | 21.0  10.3 | 21.0 10.3
tfloat     500| 22.6  10.1 | 22.6  10.1 | 22.5  10.1 | 22.6  10.1 | 22.6 10.1
tint      1500|  5.7   2.0 |  5.7   2.0 |  5.7   2.0 |  5.7   2.0 |  5.7  2.0
tlong     1000| 34.0   2.7 | 34.3   2.7 | 34.1   2.7 | 34.3   2.7 | 34.3  2.7
trig       100|  6.4   0.0 |  6.4   0.0 |  6.4   0.0 |  6.4   0.0 |  6.4  0.0
--------------+------------+------------+------------+------------+----------
All times are in seconds.


CODE SIZE

           Small     Compact      Medium      Large       Huge
          TC  MS-C   TC   MS-C   TC   MS-C   TC   MS-C   TC   MS-C
-------- ---- ----  ---- -----  ---- -----  ---- -----  ---- -----
minfio   7560 9319  9700 12049  7806  9523 10474 12253 11978 12301
minmain  2402 4399  2942  4567  2472  4469  3012  4637  3382  4637
minprtf  6214 9081  7762 11315  6356  9263  7904 11497  9025 11497
minputs  4572 7233  6072  9691  4706  7373  6206  9847  7296  9847

(Programs were compiled with all optimization flags on.)


In addition to these tests, I ran the Dhrystone program (compiled
with the Small memory model):

	TC	2590  Dhrystones/second
	MS-C	3401  Dhrystones/second


The results clearly show that the Microsoft compiler produces superior
code when compared to Borland's. In a number of cases the MS-C code
outperformed TC's by a factor of ten, for example in the TLONG and ARRAY
tests. The good optimization in MS-C also poses some problems: since
benchmarks are artificial and try to measure the efficiency of a
certain type of operation, they are extremely prone to being optimized
away, i.e. reduced to no code at all.  This is shown best by the
LOOPTST, STORAGE and TRIG programs. We definitely need a new class of
benchmarks for future tests.

The only areas in which TC has the lead are switch statements (and only
marginally so) and the ATOX benchmark.  The results of the MEMORY test
are rather dramatic for TC: the allocation functions become very slow
when a large data model is used, while MS-C performs more or less the
same in all models.


COMPILATION SPEED

The price one usually pays for better optimization is longer
compile times; to check this, I timed the compilation and linking of
the test suite for the Small memory model. For TC, the Turbo Linker
TLINK was used; as this is a limited but fast linker, I reran the test
for TC with the standard linker, the one that was also used for MS-C,
MS LINK v5.01.04.  Before running each test, I ran a disk-compacting
utility to make sure that file fragmentation would not distort the
timings.  The DDJ review used a different method to measure
compilation speed; since I don't have the files they used for that
test, this will have to do.

Compile and Link Times (in seconds):

				Optimization
			Enabled		Disabled
			-------		--------
	TC with TLINK :	284.9		284.3
	TC with LINK  :	331.5		331.0
	MS-C with LINK:	681.7 		642.8


The compile time for the following program

	int alfa;

should give us some idea of the amount of overhead associated with
calling the compiler.

		Compile Load 
		------- ----
	TC:	 2.8	2.0
	MS-C:	 9.7	1.4

'Compile' is the total time required to compile this mini-program and
'Load' is the time needed to load the compiler.  (All times are given
in seconds.)


CONCLUSIONS

Based on the data presented here and my experience with both products,
Microsoft C wins the battle; it generates by far the better code. Turbo
C's one-pass compiler has shorter compile times and creates smaller 
executables, but the code it produces is inferior to MS-C's. 

Furthermore, when it comes to writing a reference manual for a language,
the boys (and girls) at Borland could learn something from the
Unix community: start each reference on a separate page!  In its
current form, the TC reference manual is a real pain to use. As they
use the same style in the Turbo Pascal 3.0 and 4.0 manuals, I guess
this is a Borland "feature" used to save paper, and thus money on the
cost of the manual. 

One of the things missing from both compilers (and from most PC
C compilers, for that matter) is profiling, i.e. the ability to get an
overview of where your program spends most of its time when executing.
As they can already do stack-overflow checking upon function entry,
this should not be hard to add.

Naturally, this test has not been as extensive as the one performed by
the DDJ editors; their annual C issue will certainly contain an updated
overview of the C compiler battlefield.

[Well, DDJ ain't what it used to be; their last C compiler test was
rather bleak when compared to the August '86 one. They left out the
extensive tables that made the '86 review stand out. See comp.misc
for the discussion on the death of DDJ....]


-- 
Eelco van Asperen.		
uucp:        evas@eurtrx / mcvax!eurtrx!evas	#include <inews/filler.h>
earn/bitnet: asperen@hroeur5			#include <stdjunk.h>
"We'ld like to know a little bit about you for our files" - Mrs.Robinson,	 Simon & Garfunkel

suitti@haddock.ima.isc.com (Steve Uitti) (10/10/88)

In article <788@euraiv1.UUCP>  evas@euraiv1.UUCP (Eelco van Asperen) writes:
> [Here's a comparison of MSC and TurboC as my contribution to the 
> "Microsoft vs. Borland" discussion.
> ...
> CONCLUSIONS
> Based on the data presented here and my experiences with both products,
> Microsoft C wins the battle; it generates by far the best code. Turbo
> C's one-pass compiler has shorter compile times and creates smaller 
> executables but the code produced is inferior to MS-C's. 

My data for sieve matches yours pretty well (I haven't played
with the other benchmarks).  My experience differs.  My
conclusions differ.  For nontrivial programs, MSC beats TC
(both with optimization).  However, for anything real
(largish), I've found that I can't use /Ox with MSC.  Something
breaks (who knows what).  Thus, I feel I have to compare MSC
with only /Gs (remove stack probes) against TC (whose optimizer
has never failed me).  When you do this, you find that TC wins
most of the time (but not all the time).

MSC (5.0, 5.1) would be a much better compiler if it (the
optimization part) worked.

I also test a version of "sieve" that has been hand-crafted
with all sorts of pointer stuff.  Theory has it that a good
compiler will produce really good code from this version.  What
happens is that compilers with big optimizers (MSC 5.1 included)
actually do worse with the optimizer than without (though still
better than the non-pointer version).
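
The "pointer stuff" in the inner loop is roughly of this flavour
(a sketch from memory, not my actual test code):

	#define SIZE 8190
	char flags[SIZE + 1];

	/* pointerized version of the sieve's marking loop */
	void clear_multiples(int i, int prime)
	{
		register char *p;

		for (p = flags + i + prime; p <= flags + SIZE; p += prime)
			*p = 0;
	}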

> Furthermore, when it comes to writing a reference manual for a
> language the boys (and girls) at Borland could learn something
> from the Unix-community; start each reference on a separate
> page !  In its current form, the TC reference manual is a real
> pain to use.  I guess this is a Borland "feature" used to save
> paper and thus money on the cost of the manual. 

It also saves mass.  It is such a pain to read the MSC manuals
while riding the train to work...

TC's manual has an index.  It rarely fails me (***everything***
eventually fails me; this makes me an awesome beta tester).
I find it easier to look something up in the TC reference
manual than to find the index (or the right book to use) for MSC.

The TC User's Guide is quite well written.  It told me in
minutes how to get going.  By contrast, I'm still bewildered by
the options required by MSC.  I have to look them up for nearly
every project.

Much of this is reflected by the fact that the Turbo
Environment is simply thousands of times more friendly.  It
gets you working sooner.  MSC is infinitely more frustrating.
You have to use the manuals all the time.  You have to wait for
the beast to compile.  The compile breaks (sometimes even
without the optimizer turned on!).  What did I do to deserve
this?  (I paid good money, and lots of it.)

> One of the things missing from both compilers (and from most PC
> C-compilers for that matter) is profiling.

Good point.  Think C for the Mac has had profiling for
a while now.  I actually use it now and again.

	Stephen.

evas@euraiv1.UUCP (Eelco van Asperen) (10/12/88)

In article <9088@haddock.ima.isc.com>, suitti@haddock.ima.isc.com (Steve Uitti) writes:
> However, for anything real
> (largish), I've found that I can't use /Ox with MSC.  Something
> breaks (who knows what).  

Oops; you're quite right on that one. -Ox implies -Oa, i.e. "no
aliasing", and that's dangerous; most code won't be compiled correctly.
Apparently most benchmarks can be compiled with -Ox and still run
correctly; do I hear somebody grumbling about benchmarks and their
value ;-) ?
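
For those who haven't run into it, a made-up minimal example (not from
the benchmarks) of the kind of code a no-aliasing assumption breaks:

	int twice_plus(int *a, int *b)
	{
		*a = *b + 1;
		/* under -Oa the compiler may assume *b is unchanged and
		   reuse the value it loaded above; if a == b (aliasing!)
		   that value is stale and the result is wrong */
		return *a + *b;
	}

	/* e.g. twice_plus(&x, &x) can give a different answer with -Oa */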


-- 
Eelco van Asperen.		
uucp:        evas@eurtrx / mcvax!eurtrx!evas	#include <inews/filler.h>
earn/bitnet: asperen@hroeur5			#include <stdjunk.h>
"We'ld like to know a little bit about you for our files" - Mrs.Robinson,	 Simon & Garfunkel

loafman@concave.uucp (Kenneth W. Loafman) (10/14/88)

In article <789@euraiv1.UUCP> evas@euraiv1.UUCP (Eelco van Asperen) writes:
>
>Oops; you're quite right on that one. -Ox implies -Oa, ie. "no-aliasing"
>and that's dangerous; most code won't be compiled correctly.
>Apparently, most benchmarks can be compiled with -Ox and run correctly;
>do I hear somebody grumbling about benchmarks and their value ;-) ?
>
Actually, /Oa is broken in ways that have absolutely nothing to do with
aliasing.  I have one case where MSC 5.1 trashed a counter variable in a
doubly nested loop.  It was declared as an automatic int.  The problem
went away when I turned off the /Oa option.  There was no possibility of
aliasing in that particular piece of code.

-----
Kenneth W. Loafman @ Convex Computer Corp, Dallas, Texas
UUCP:  {allegra,uiucdcs,ctvax}!convex!loafman
Disclaimer:  Well!  I never...
-----

jbvb@ftp.COM (James Van Bokkelen) (10/14/88)

Microsoft C 5.1 will put variables with the ANSI "volatile" characteristic
into registers and spin forever testing them (while the interrupt handler
sets the in-memory copy of the flag).  I have been bitten by this in a
number of cases.  So far, I haven't been bitten by /Oa, but I won't rule
it out as a possibility...
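
The pattern in question, boiled down (illustration only):

	/* flag set by an interrupt handler, polled by the main line */
	volatile int done = 0;

	void wait_for_completion()
	{
		while (!done)	/* if the compiler caches 'done' in a
				   register, this loop spins forever even
				   after the handler sets the flag in
				   memory */
			;
	}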

jbvb