allan@noao.UUCP (06/22/84)
Prompted by all the discussion on fortran vs. C and VMS vs. unix, I have run some comparison programs and timed the results. I ran a 100 x 100 matrix multiplication program in fortran (single precision and double precision) and in C, on VMS, 4.1 unix and 4.2 unix. The cpu run times in seconds are shown below. I will post the program listings separately for those who wish to pull them apart.

                 fortran(sp)   fortran(dp)     C

  vax1 no opt        69            112        102
  vax1 opt           52             98         99
  vax2 no opt        40             51         63
  vax2 opt           40             49         64
  vax3 no opt        32             43         48
  vax3 opt           22             35         31

Key:  vax1 = 4.2BSD Unix#8
      vax2 = Berkeley VAX/UNIX 4.1 + floating point accelerator
      vax3 = VMS 3.6 + floating point accelerator
      opt  = optimization turned on

As far as VMS vs. unix is concerned, VMS is faster than 4.1 unix by a factor of 1.5 on average. 4.2 unix is slower than 4.1 simply because that machine does not have a floating point accelerator.

Comparing the languages, single precision fortran is faster than C by a factor of 1.6 on average, and there is barely any difference between double precision fortran and C on average. The comparison that is usually made is fortran on VMS vs. C on unix; in this case fortran is faster by a factor of 3 using 4.1 unix.

You can argue any way that you like from these numbers. Single precision fortran is obviously fastest, but it is fairer to compare double precision fortran and C, since C does all floating point arithmetic in double precision. On the other hand, fortran will let you use single precision when you want to, whereas C will not, but, but, but ...

I am not prepared to defend fortran on the grounds of the 'niceness' of the language. For some applications, fortran stinks. However, one thing does seem clear:

	IF the bottom line is speed of execution
	THEN you should use fortran on VMS

All the above has been concerned with vaxes. If you really need the greatest speed, then you must go out and buy a Cray or a CDC 205.
If you want to pull all of this apart by arguing that you should not generalize from a single experiment, then I suggest that you run some experiments of your own.

	Peter Allan
	Kitt Peak National Observatory
	Tucson, Az

UUCP: {akgua,allegra,arizona,decvax,hao,ihnp4,lbl-csam,seismo}!noao!allan
ARPA: noao!allan@lbl-csam.arpa
allan@noao.UUCP (06/22/84)
Here are the matrix multiplication programs that I timed.

	Peter Allan
	Kitt Peak National Observatory
	Tucson, Az

UUCP: {akgua,allegra,arizona,decvax,hao,ihnp4,lbl-csam,seismo}!noao!allan
ARPA: noao!allan@lbl-csam.arpa

-------------------------------------------------------------------------

/* Multiply two matrices */

#define N 100

main()
{
	float a[N][N], b[N][N], c[N][N], sum;
	int i, j, k;

	/* Initialise a and b */
	for (i = 0; i < N; i++)
		for (j = 0; j < N; j++) {
			a[i][j] = i + j;
			b[i][j] = i - j;
		}

	/* Multiply a by b to give c */
	for (i = 0; i < N; i++)
		for (j = 0; j < N; j++) {
			sum = 0.0;
			for (k = 0; k < N; k++)
				sum += a[i][k] * b[k][j];
			c[i][j] = sum;
		}
}

-------------------------------------------------------------------------

      program testf
c
c Multiply two matrices.
c
      parameter ( N = 100 )
c
      real a(N,N), b(N,N), c(N,N)
c
c Initialise a and b.
c
      do 1 i = 1, N
         do 2 j = 1, N
            a(i,j) = i + j
            b(i,j) = i - j
    2    continue
    1 continue
c
c Multiply a by b to give c.
c
      do 3 i = 1, N
         do 4 j = 1, N
            sum = 0.
            do 5 k = 1, N
               sum = sum + a(i,k)*b(k,j)
    5       continue
            c(i,j) = sum
    4    continue
    3 continue
c
      end
gwyn@brl-vgr.ARPA (Doug Gwyn ) (06/27/84)
I do not understand these (Fortran & C on UNIX & VMS) benchmark results. I took the posted C source code and got execution time not much worse than the reported VMS Fortran (single-precision) time. More interestingly, the C source code was written the way a Fortran programmer would write it; doing the obvious things that an experienced C programmer would have done in the first place doubled the speed of execution, placing the UNIX C version well ahead of even the reported VMS Fortran times.

Using BRL UNIX System V emulation on 4.2BSD, VAX-11/780 with FPA:

	C code as posted	38.1 sec user time, 1.4 sec system time
	proper C code		18.9 sec user time, 0.9 sec system time

I think the moral is: "Do not believe benchmarks unless you control them."

A secondary moral is that professionally-written C code can be as fast as the machine will allow (give or take a few percent improvement possible by tweaking assembly language), so the language performance argument made in favor of Fortran is bogus.

I would love to see a Fortran programmer take some of my favorite C code using linked data structures and make it work at all in Fortran, let alone work well. (I used to do this sort of thing before C became available, and it is really hard to do right in portable Fortran.)
allan@noao.UUCP (06/27/84)
A little extra information on the timing of my matrix multiplication program.

Following a suggestion (I'm afraid I forget from whom), I changed the declaration of the loop counters to register int's. This speeded things up from 64 seconds to 53 seconds; a fair gain.

I was a bit surprised and very pleased to see how much improvement the declaration

	register float *pa, *pb, *pc;

and the use of pointer incrementing make to the timing. Of course, like most things, it is obvious with hindsight: much of the time of the original program is spent calculating the addresses of the array elements. I cannot think of a way of forcing fortran to do this incrementing in the 'obviously sensible' way, although, as you point out, a good optimiser should do it for you.

I concede. C is faster than fortran when you write your programs correctly.

	Peter Allan
	Kitt Peak National Observatory
	Tucson, Az

UUCP: {akgua,allegra,arizona,decvax,hao,ihnp4,lbl-csam,seismo}!noao!allan
ARPA: noao!allan@lbl-csam.arpa
allan@noao.UUCP (06/27/84)
I agree about doing your own benchmarks. I only got started in this because of a request for timing information over the news net.

	Peter Allan
	Kitt Peak National Observatory
	Tucson, Az
mwm@ea.UUCP (06/30/84)
#R:noao:-36000:ea:9900004:000:617
ea!mwm    Jun 30 13:55:00 1984

/***** ea:net.physics / noao!allan /  5:17 am  Jun 28, 1984 */
	I concede. C is faster than fortran when you write your programs
	correctly.

	Peter Allan
UUCP: {akgua,allegra,arizona,decvax,hao,ihnp4,lbl-csam,seismo}!noao!allan
/* ---------- */

Peter - I feel that you ought to point out that this is true *only if you are using double precision.* If you don't need double precision, then the single precision f77 code is faster than your hand-optimized C.

This also leaves open the question of what C does to a carefully tuned (tuned for accuracy, that is) algorithm if/when it rearranges your expressions.

	<mike
west@sdcsla.UUCP (07/03/84)
To make things a little fairer, I rewrote the C matrix multiplication program to take advantage of C's pointers. Of course, this is something a good ``optimizer'' would do for you, but it isn't that hard to do by hand.

Peter Allan's original program:

/* Multiply two matrices */

#define N 100

main()
{
	float a[N][N], b[N][N], c[N][N], sum;
	int i, j, k;

	/* Initialise a and b */
	for (i = 0; i < N; i++)
		for (j = 0; j < N; j++) {
			a[i][j] = i + j;
			b[i][j] = i - j;
		}

	/* Multiply a by b to give c */
	for (i = 0; i < N; i++)
		for (j = 0; j < N; j++) {
			sum = 0.0;
			for (k = 0; k < N; k++)
				sum += a[i][k] * b[k][j];
			c[i][j] = sum;
		}
}

My quick and easy improvements:

/* Multiply two matrices */

#define N 100

main()
{
	float a[N][N], b[N][N], c[N][N];
	register float sum;
	register float *pa, *pb, *pc;
	register int i, j, k;

	/* Initialize a and b */
	for (i = 0; i < N; i++) {
		pa = a[i];	/* actually could be done just once */
		pb = b[i];
		for (j = 0; j < N; j++, pa++, pb++) {
			*pa = i + j;
			*pb = i - j;
		}
	}

	/* Multiply a by b to give c */
	for (i = 0; i < N; i++) {
		pc = c[i];
		for (j = 0; j < N; j++) {
			pa = a[i];	/* step along row i of a */
			pb = &b[0][j];	/* step down column j of b */
			sum = 0.0;
			for (k = 0; k < N; k++, pa++, pb += N)
				sum += (*pa) * (*pb);
			*pc++ = sum;
		}
	}
}

---------------

So the differences are simply judicious use of "register" declarations and using pointers and pointer incrementation to step through an array. The initializations of "pa" and "pb" could even be moved out of the outer loops, but for very little gain.

Timings taken on a fairly uncrowded VAX 750 (no floating point extras), with ``optimized'' compilation:

time mm1	-- original C program, "optimized".
	67.8u 11.6s 2:17 57% 1+202k 0+0io 3pf+0w
	67.3u  6.2s 2:31 48% 1+201k 2+1io 1pf+0w

time mm1a	-- improved C program, "optimized".
	21.9u  2.5s 0:40 60% 1+198k 0+0io 2pf+0w
	21.9u  2.2s 0:40 60% 1+200k 0+0io 2pf+0w

time mm2	-- original Fortran program, "optimized".
	28.1u  3.5s 0:51 61% 11+243k 1+1io 11pf+0w
	28.6u  3.0s 1:09 45% 11+244k 0+0io  5pf+0w

Note that now C at least beats f77 on its (C's) home turf.
-- Larry West, UC San Diego, Institute for Cognitive Science -- decvax!ittvax!dcdwest!sdcsvax!sdcsla!west -- ucbvax!sdcsvax!sdcsla!west -- west@NPRDC
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/11/84)
C explicitly does NOT rearrange most arithmetic operations (the exceptions are the operators the language assumes to be commutative and associative, such as + and *).