evas@euraiv1.UUCP (Eelco van Asperen) (10/06/88)
[Here's a comparison of MSC and Turbo C as my contribution to the
"Microsoft vs. Borland" discussion. I wrote this a couple of months ago
and posted it, but that failed for administrative reasons. Note that I
don't have Turbo C v2.0 yet; if anybody wants the source of the
benchmarks to run them under Turbo C v2.0, I'll be happy to send them.
To make a fair comparison, you should run them on the same type of
machine; I also have access to an Olivetti M24 (aka AT&T 6300) and an
Olivetti M280.  -EvAs.]


Benchmarking Borland's Turbo C v1.5 and Microsoft's C v5.1 Compilers


INTRO

To bring some clarity to the continuing debate about the Microsoft and
Borland C compilers, I've benchmarked them using some of the benchmarks
from the article "Benchmarking C Compilers", which appeared in
Dr. Dobb's Journal (DDJ), August 1986. (Philip Freidin, one of the
authors, was kind enough to send them to me. Thanks, Phil!)

The compilers compared are Microsoft C v5.1 (MS-C) and Borland Turbo C
v1.5 (TC). All tests were done on an AT clone running at 12 MHz with 0
wait states under MS-DOS v3.3; to eliminate the speed of the hard disk
from the results, I ran the programs on a ramdisk. The programs were
compiled with all optimizations enabled; for MS-C the flags are
'-Ox -Gs' and for TC they are '-G -O -Z -r'.

The tests were compiled and run for each memory model available on both
compilers. TC's Tiny model has not been included because MS-C has no
comparable model (at least the compiler does not generate special code
for it; I haven't checked whether programs compiled with the Small
model can be converted to .com files after linking). Each test was
repeated a number of times to increase accuracy; the loop count for
each test is given in the table.

THE BENCHMARKS

A brief description of the benchmarks used:

ARRAY tests the compiler's ability to access arrays efficiently using
conventional array operations. A 10x10x10 int array is copied using
three nested for loops.

ATOX tests the atoi, atol and atof functions; it has 21 atoi calls,
16 atol calls, and 8 atof calls. Each call passes a string constant,
some of which have many leading blanks or zeros.

CPYBLK copies a file of 10,000 bytes using fread and fwrite in
1024-byte blocks. CPYCHR copies the same file, but this time using
fgetc and fputc; comparing the times for CPYBLK and CPYCHR should tell
you more about the difference between block and character I/O.

DISKIO does random seeks in a file of 240 KB and thus measures the
speed of fseek.

FIBTEST is the standard recursive Fibonacci number generator, called
with an argument of 24. This mainly tests function entry and exit code.

FILLSCR writes 1,248 characters to the screen, consisting of sequences
of 78 a's each followed by a carriage return. This measures the speed
of screen output in the absence of scrolling. (The test is done just
after a CLS.)

The FUNCOVR programs test function-call overhead; they consist of
procedures with zero, one, two and three arguments respectively, and
no body. The FUNCRET programs test the ability to return function
values efficiently; IFUNCRET, LFUNCRET and DFUNCRET return an int, a
long and a double, respectively.

LOOPTST does a simple for-loop test.

MEMORY was created to test the speed of malloc/free; per loop, 500
blocks of 50 bytes are malloc'ed, then every fifth one is free'd and
100 blocks of 35 bytes are malloc'ed, followed by a free of all
allocated blocks.
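[The benchmark sources aren't reproduced in this posting; from the
description above, one pass of the MEMORY loop looks roughly like the
following sketch (block counts and sizes are as stated, but the exact
"every fifth" free pattern is my assumption):

        #include <stdlib.h>

        void memory_pass(void)
        {
                char *big[500], *small[100];
                int i;

                for (i = 0; i < 500; i++)       /* 500 blocks of 50 bytes */
                        big[i] = malloc(50);
                for (i = 4; i < 500; i += 5) {  /* free every fifth block */
                        free(big[i]);
                        big[i] = NULL;
                }
                for (i = 0; i < 100; i++)       /* 100 blocks of 35 bytes */
                        small[i] = malloc(35);
                for (i = 0; i < 500; i++)       /* release the remainder  */
                        if (big[i])
                                free(big[i]);
                for (i = 0; i < 100; i++)
                        free(small[i]);
        }

This is the allocation pattern whose timings appear as "memory" in the
table below.]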
The MIN programs are used to determine the minimum size of a program:

    MINMAIN   no code; this measures startup + exit code
    MINPRTF   printf's in main
    MINPUTS   uses puts rather than printf
    MINFIO    calls to fopen, fgetc, fputc, fread, fwrite and fclose

OPTIMIZE should test the compiler's ability to optimize code; as the
authors of the DDJ article note, this is one of the weakest benchmarks
because even a relatively simple optimizer can reduce it to nothing.
With the arrival of more and more optimizing compilers, this will
become one of the hardest things to test.

POINTER is a pointer version of the ARRAY test; it uses 6 pointers and
three levels of indirection to copy the 10x10x10 array.

PRTF is meant to determine the speed of printf; the results should be
compared to the result of SCROLL. (They print the same line.)

RSIEVE and SIEVE are versions of the infamous sieve benchmark program;
RSIEVE uses register variables whereas SIEVE does not.

SCROLL is similar to FILLSCR, but a newline is printed instead of the
carriage return.

STORAGE is used to determine the difference between the various
storage classes in C; four variables are declared automatic, register
and static. To see whether the compiler will allocate more than two
registers for variables, the register test is also done with just two
of the four variables declared as register.

STRINGS assesses the quality of the library routines strcat, strcpy,
strncpy, strlen, strcmp, and strncmp.

TDOUBLE and TFLOAT test floating-point performance; in each loop, 40
adds, subtracts and multiplies and 20 divides are done. In the TFLOAT
test, a compiler that conforms to the ANSI C standard (yeah, I know;
this should read 'conforms to a draft version of' etc.) should be
faster than a compiler that conforms to K&R, because it doesn't have
to convert floats to doubles before each operation.

TINT and TLONG attempt to measure the performance of integer and long
operations, respectively. For each loop, 1,500 adds, 1,600 subtracts,
200 multiplies and 200 divides are done.

TRIG times the speed of the trigonometric functions sin, cos and tan.
For each loop, these functions are called 12 times.

And now for the real stuff; here are the...
EXECUTION TIMES

Model:            Small        Compact      Medium       Large       Huge
Test     Loops|   TC   MSC |   TC   MSC |   TC   MSC |   TC   MSC |   TC   MSC
--------------+------------+------------+------------+------------+-----------
array     1500| 24.9   2.4 | 25.5   2.4 | 25.0   2.4 | 25.5   2.4 | 25.5   2.4
atox       100|  1.1   1.7 |  1.2   1.7 |  1.2   1.7 |  1.2   1.7 |  1.2   1.7
cpyblk      15|  7.8   2.3 |  8.8   3.0 |  7.9   2.3 |  9.1   3.1 |  9.2   3.2
cpychr      15|  9.5   6.4 | 10.3   6.9 |  9.8   6.6 | 11.0   7.3 | 11.4   7.3
diskio     350| 15.7  15.6 | 15.7  15.6 | 15.7  15.6 | 15.7  15.7 | 15.8  15.6
fibtest     18| 14.1  13.4 | 14.4  13.4 | 15.3  14.5 | 15.4  14.5 | 17.9  14.5
fillscr     12|  9.0   3.2 |  9.0   8.9 |  9.0   3.3 |  9.0   8.9 |  9.0   8.8
funcov0  10000| 16.3  15.1 | 17.3  15.1 | 22.2  16.8 | 22.9  16.8 | 34.3  16.8
funcov1  10000| 22.7  22.6 | 22.9  22.6 | 28.0  26.0 | 28.3  26.0 | 35.6  26.0
funcov2  10000| 24.9  24.3 | 23.8  24.3 | 29.6  27.2 | 28.7  27.2 | 37.1  27.2
funcov3  10000| 29.7  28.3 | 30.6  28.2 | 34.8  31.8 | 35.6  31.7 | 43.0  31.8
ifuncret  2500| 12.0  11.7 | 11.8  11.7 | 13.1  13.4 | 13.7  13.4 | 17.4  13.4
lfuncret  2500| 16.7  15.1 | 16.2  15.1 | 18.9  17.6 | 19.1  17.6 | 22.3  17.6
dfuncret   250| 37.8  28.2 | 37.6  29.7 | 37.8  28.4 | 38.1  29.9 | 38.2  29.9
looptst    500|  7.6   0.0 |  6.9   0.0 |  7.6   0.0 |  6.9   0.0 |  6.9   0.0
memory     500| 30.8  11.7 |196.5  14.5 | 31.4  12.3 |198.8  15.3 |206.3  17.5
optimize   100|  4.0   0.5 |  4.1   0.6 |  4.1   0.5 |  4.0   0.6 |  4.0   0.6
pointer   1500|  6.8   5.2 | 12.5   2.5 |  6.7   5.2 | 12.4   2.5 | 12.5  20.8
prtf        12| 12.6   7.0 | 12.6   7.1 | 12.6   7.0 | 12.6   7.1 | 12.6   7.1
rsieve     140| 13.9  11.8 | 13.7  11.8 | 14.0  11.8 | 13.6  11.8 | 13.6  11.8
scroll      12| 12.4   6.5 | 12.4  12.3 | 12.4   6.5 | 12.4  12.3 | 12.3  12.3
sieve      140| 14.0  12.7 | 13.6  12.7 | 14.0  12.7 | 13.7  12.7 | 13.7  12.7
storage:
autotst    150| 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0
stattst    150| 15.2   0.0 | 16.1   0.0 | 15.2   0.0 | 16.1   0.0 | 15.2   0.0
regtest    150| 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0
reg2test   150| 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0
strings   1000|  2.0   1.7 |  2.0   1.7 |  2.0   1.7 |  2.0   1.7 |  2.0   1.7
switch1   1000|  0.6   1.8 |  0.6   1.8 |  0.6   1.8 |  0.6   1.8 |  0.6   1.8
switch2   1000|  0.6   0.7 |  0.6   0.7 |  0.6   0.7 |  0.6   0.7 |  0.7   0.7
switch3   1000|  0.6   0.7 |  0.7   0.7 |  0.6   0.7 |  0.7   0.7 |  0.7   0.7
tdouble    500| 21.0  10.3 | 21.0  10.3 | 21.0  10.3 | 21.0  10.3 | 21.0  10.3
tfloat     500| 22.6  10.1 | 22.6  10.1 | 22.5  10.1 | 22.6  10.1 | 22.6  10.1
tint      1500|  5.7   2.0 |  5.7   2.0 |  5.7   2.0 |  5.7   2.0 |  5.7   2.0
tlong     1000| 34.0   2.7 | 34.3   2.7 | 34.1   2.7 | 34.3   2.7 | 34.3   2.7
trig       100|  6.4   0.0 |  6.4   0.0 |  6.4   0.0 |  6.4   0.0 |  6.4   0.0
--------------+------------+------------+------------+------------+-----------

All times are in seconds.

CODE SIZE

              Small          Compact        Medium         Large          Huge
             TC   MS-C     TC   MS-C     TC   MS-C     TC   MS-C     TC   MS-C
--------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
minfio     7560   9319   9700  12049   7806   9523  10474  12253  11978  12301
minmain    2402   4399   2942   4567   2472   4469   3012   4637   3382   4637
minprtf    6214   9081   7762  11315   6356   9263   7904  11497   9025  11497
minputs    4572   7233   6072   9691   4706   7373   6206   9847   7296   9847

(Programs were compiled with all optimization flags on.)

In addition to these tests, I ran the dhrystone program (compiled with
the Small memory model):

        TC      2590 dhrystones/second
        MS-C    3401 dhrystones/second

The results clearly show that the Microsoft compiler produces superior
code when compared to Borland's. In a number of cases the MS-C code
outperformed TC's by a factor of ten or more, for example in the TLONG
and ARRAY tests.
The good optimization in MS-C also creates some problems; since
benchmarks are artificial and try to measure the efficiency of one
particular type of operation, they are extremely prone to being
optimized away, i.e. reduced to no code at all. This shows best in the
LOOPTST, STORAGE and TRIG results. We definitely need a new class of
benchmarks for future tests.

The only areas in which TC has the lead are switch statements (and
only marginally so) and the ATOX benchmark. The results of the MEMORY
test are rather dramatic for TC; its allocation functions get very
slow when a large data model is used, while MS-C performs more or less
the same for all models.

COMPILATION SPEED

The price one usually pays for better optimization is longer compile
times. To check this, I timed the compilation and linking of the test
suite for the Small memory model. For TC, the Turbo Linker TLINK was
used; as this is a limited yet fast linker, I reran the test for TC
with the standard linker, the one that was also used for MS-C, MS LINK
v5.01.04. Before each run I used a disk-compression utility to make
sure that file fragmentation would not distort the timings. The DDJ
review used a different method to measure compilation speed; since I
don't have the files they used for that test, this will have to do.

Compile and Link Times:

                  Optimization
                Enabled  Disabled
                -------  --------
TC with TLINK :  284.9    284.3
TC with LINK  :  331.5    331.0
MS-C with LINK:  681.7    642.8

The compile time for the following program

        int alfa;

should give us some idea of the overhead associated with calling the
compiler.

        Compile   Load
        -------   ----
TC:        2.8     2.0
MS-C:      9.7     1.4

'Compile' is the total time required to compile this mini-program and
'Load' is the time needed to load the compiler. (All times are given
in seconds.)

CONCLUSIONS

Based on the data presented here and my experience with both products,
Microsoft C wins the battle; it generates by far the better code.
Turbo C's one-pass compiler has shorter compile times and creates
smaller executables, but the code it produces is inferior to MS-C's.

Furthermore, when it comes to writing a reference manual for a
language, the boys (and girls) at Borland could learn something from
the Unix community: start each reference on a separate page! In its
current form, the TC reference manual is a real pain to use. As they
use the same style in the Turbo Pascal 3.0 and 4.0 manuals, I guess
this is a Borland "feature" used to save paper and thus money on the
cost of the manual.

One of the things missing from both compilers (and from most PC C
compilers, for that matter) is profiling, i.e. the ability to get an
overview of where your program spends most of its time when executing.
As they can already do stack-overflow checking upon function entry,
this should not be hard to add.

Naturally, this test has not been as extensive as the one performed by
the DDJ editors; their annual C issue will certainly contain an updated
overview of the C compiler battlefield. [Well, DDJ ain't what it used
to be; their last C compiler test was rather bleak compared to the
August '86 one. They left out the extensive tables that made the '86
review stand out. Refer to comp.misc for the discussion on the death
of DDJ....]
--
Eelco van Asperen.
uucp:        evas@eurtrx / mcvax!eurtrx!evas      #include <inews/filler.h>
earn/bitnet: asperen@hroeur5                      #include <stdjunk.h>
"We'ld like to know a little bit about you for our files"
                                     - Mrs. Robinson, Simon & Garfunkel
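[To illustrate the "optimized away" point above: LOOPTST-style code
computes values that are never used afterwards, so an optimizer may
legally delete the whole loop. A trivial example of the pattern, not
the actual LOOPTST source:

        void looptst(void)
        {
                int i, sum = 0;

                /* 'sum' is never read afterwards, so the entire
                   loop is dead code to a good optimizer */
                for (i = 0; i < 30000; i++)
                        sum += i;
        }

Storing the result into a volatile object, or printing it, defeats
this kind of elimination; the 0.0-second entries in the table show
what happens when a benchmark doesn't.]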
suitti@haddock.ima.isc.com (Steve Uitti) (10/10/88)
In article <788@euraiv1.UUCP> evas@euraiv1.UUCP (Eelco van Asperen) writes:
> [Here's a comparison of MSC and TurboC as my contribution to the
> "Microsoft vs. Borland" discussion.
> ...
> CONCLUSIONS
> Based on the data presented here and my experiences with both products,
> Microsoft C wins the battle; it generates by far the best code. Turbo
> C's one-pass compiler has shorter compile times and creates smaller
> executables but the code produced is inferior to MS-C's.

My data for sieve matches yours pretty well (I haven't played with the
other benchmarks). My experience differs. My conclusions differ.

For nontrivial programs, MSC beats TC (both with optimization).
However, for anything real (largish), I've found that I can't use /Ox
with MSC. Something breaks (who knows what). Thus, I feel I have to
compare MSC with only /Gs (remove stack probes) against TC (whose
optimizer has never failed me). When you do this, you find that TC
wins most of the time (but not all the time). MSC (5.0, 5.1) would be
a much better compiler if its optimization part worked.

I also test a version of "sieve" that has been hand-crafted with all
sorts of pointer stuff. Theory has it that a good compiler will
produce really good code from this version. What happens is that
compilers with big optimizers (MSC 5.1 included) actually do worse on
it with the optimizer than without (though still better than on the
non-pointer version).

> Furthermore, when it comes to writing a reference manual for a
> language the boys (and girls) at Borland could learn something
> from the Unix-community; start each reference on a separate
> page ! In its current form, the TC reference manual is a real
> pain to use. I guess this is a Borland "feature" used to save
> paper and thus money on the cost of the manual.

It also saves mass. It is such a pain to read the MSC manuals while
riding the train to work... TC's manual has an index. It rarely fails
me (***everything*** eventually fails me. This makes me an awesome
beta tester.) I find it easier to look something up in the TC
reference manual than to find the index (or figure out which book to
use) for MSC. The TC User's Guide is quite well written. It told me
in minutes how to get going. By contrast, I'm still bewildered by the
options required by MSC. I have to look them up for nearly every
project.

Much of this is reflected in the fact that the Turbo environment is
simply thousands of times more friendly. It gets you working sooner.
MSC is infinitely more frustrating. You have to use the manuals all
the time. You have to wait for the beast to compile. The compile
breaks (sometimes even without the optimizer turned on!). What did I
do to deserve this? (I paid good money, and lots of it.)

> One of the things missing from both compilers (and from most PC
> C-compilers for that matter) is profiling.

Good point. Think C for the Mac has had profiling for a while now. I
actually use it now & again.

Stephen.
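[The hand-crafted pointer sieve mentioned above isn't posted here. Its
style is roughly the classic sieve benchmark with the array indexing
rewritten as pointer arithmetic, something like the following sketch
(SIZE, flags and the loop bounds are assumptions based on the standard
sieve, not the poster's actual source):

        #define SIZE 8190
        static char flags[SIZE + 1];

        int sieve_ptr(void)
        {
                register char *p, *q;
                register int prime, count = 0;
                char *end = flags + SIZE;

                for (p = flags; p <= end; p++)
                        *p = 1;                 /* mark all as prime */
                for (p = flags; p <= end; p++) {
                        if (*p) {
                                prime = (int)(p - flags) * 2 + 3;
                                /* strike out multiples with pointer
                                   strides instead of subscripting */
                                for (q = p + prime; q <= end; q += prime)
                                        *q = 0;
                                count++;
                        }
                }
                return count;
        }

The point of hand-pointerizing is to do the strength reduction
yourself; a big optimizer may then fight the programmer's own register
assignments, which is one way it can come out slower with optimization
on than off.]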
evas@euraiv1.UUCP (Eelco van Asperen) (10/12/88)
In article <9088@haddock.ima.isc.com>, suitti@haddock.ima.isc.com
(Steve Uitti) writes:
> However, for anything real
> (largish), I've found that I can't use /Ox with MSC. Something
> breaks (who knows what).

Oops; you're quite right on that one. -Ox implies -Oa, i.e. "no
aliasing", and that's dangerous; most code won't be compiled correctly.
Apparently, most benchmarks can be compiled with -Ox and run correctly;
do I hear somebody grumbling about benchmarks and their value ;-) ?
--
Eelco van Asperen.
uucp:        evas@eurtrx / mcvax!eurtrx!evas      #include <inews/filler.h>
earn/bitnet: asperen@hroeur5                      #include <stdjunk.h>
"We'ld like to know a little bit about you for our files"
                                     - Mrs. Robinson, Simon & Garfunkel
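[For those wondering what "no aliasing" licenses the compiler to do:
it may assume that a store through a pointer never changes a
directly-named variable. A small example of code that such an
assumption miscompiles; this is illustrative only, not a reported
MSC case:

        int n;

        int bump(int *p)
        {
                n = 1;
                *p = 2;         /* if p == &n, this changes n...     */
                return n;       /* ...but a no-alias compiler may
                                   return the cached value 1 instead */
        }

Called as bump(&n), this must return 2; with a no-aliasing option in
effect the compiler is free to keep n in a register across the store
and return 1.]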
loafman@concave.uucp (Kenneth W. Loafman) (10/14/88)
In article <789@euraiv1.UUCP> evas@euraiv1.UUCP (Eelco van Asperen) writes:
>
> Oops; you're quite right on that one. -Ox implies -Oa, i.e. "no
> aliasing", and that's dangerous; most code won't be compiled correctly.
> Apparently, most benchmarks can be compiled with -Ox and run correctly;
> do I hear somebody grumbling about benchmarks and their value ;-) ?
>

Actually, /Oa is broken in ways that have absolutely nothing to do
with aliasing. I have one case where MSC 5.1 trashed a counter
variable in a doubly nested loop. It was declared as an automatic int.
The problem went away when I turned off the /Oa option. There was no
possibility of aliasing in that particular piece of code.

-----
Kenneth W. Loafman @ Convex Computer Corp, Dallas, Texas
UUCP:  {allegra,uiucdcs,ctvax}!convex!loafman
Disclaimer: Well!  I never...
jbvb@ftp.COM (James Van Bokkelen) (10/14/88)
Microsoft C 5.1 will put variables with the ANSI "volatile"
characteristic into registers and spin forever testing them (while the
interrupt handler sets the in-memory copy of the flag). I have been
bitten by this in a number of cases. So far, I haven't been bitten by
/Oa, but I won't rule it out as a possibility...

jbvb
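[The ANSI-intended fix for the spinning-flag problem described above
is the volatile qualifier itself, assuming the compiler honors it. A
minimal sketch, with handler installation details omitted:

        static volatile int done = 0;

        /* set from an interrupt handler installed elsewhere */
        void handler(void)
        {
                done = 1;
        }

        void wait_for_event(void)
        {
                /* 'volatile' forces a reload of 'done' from memory
                   on every iteration, instead of letting the
                   compiler cache it in a register */
                while (!done)
                        ;
        }

If the compiler puts the flag in a register anyway, as described
above, the loop never sees the handler's store.]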