[net.physics] VMS vs. Unix

allan@noao.UUCP (06/22/84)

Prompted by all the discussion on fortran vs. C and VMS vs. unix, I have run
some comparison programs and timed the results. I ran a 100 x 100 matrix
multiplication program in fortran (single precision and double precision)
and in C, on VMS and on 4.1 and 4.2 unix. The cpu run times in seconds are shown below.
I will post the program listings separately for those who wish to pull them 
apart.


                fortran(sp)   fortran(dp)      C
vax1 no opt          69          112          102
vax1    opt          52           98           99

vax2 no opt          40           51           63
vax2    opt          40           49           64

vax3 no opt          32           43           48
vax3    opt          22           35           31


Key:
vax1 = 4.2BSD Unix #8 (no floating point accelerator)
vax2 = Berkeley VAX/UNIX 4.1 + floating point accelerator
vax3 = VMS 3.6 + floating point accelerator

opt = optimization turned on


As far as VMS vs. unix is concerned, VMS is faster than 4.1 unix by a factor
of 1.5 on average. 4.2 unix is slower than 4.1 simply because that machine
does not have a floating point accelerator. Comparing the languages, single
precision fortran is faster than C by a factor of 1.6 on average, and there
is barely any difference between double precision fortran and C on the average.

The single comparison that is usually made is fortran on VMS vs. C on unix.
In this case fortran is faster by a factor of 3 using 4.1 unix.

You can argue any way you like from these numbers. Single precision fortran
is obviously fastest, but it is fairer to compare double precision fortran and
C since C does all floating point operations in double precision, but fortran
will let you use single precision when you want to whereas C will not,
but,
    but,
        but,
            ..........
                       .......
                              ....
                                  ..
                                    .
                                    .                                     
                                    .

I am not prepared to defend fortran on the grounds of the 'niceness' of 
the language. For some applications, fortran stinks.
However, one thing does seem clear.

IF    the bottom line is speed of execution
THEN  you should use fortran on VMS

All the above has been concerned with vaxes. If you really need the greatest
speed, then you must go out and buy a Cray or a CDC 205.
If you want to pull all of this apart by arguing that you should not generalize
from a single experiment, then I suggest that you run some experiments of your
own.



Peter Allan
Kitt Peak National Observatory
Tucson, Az
UUCP:	{akgua,allegra,arizona,decvax,hao,ihnp4,lbl-csam,seismo}!noao!allan
ARPA:	noao!allan@lbl-csam.arpa

allan@noao.UUCP (06/22/84)

Here are the matrix multiplication programs that I timed.

Peter Allan
Kitt Peak National Observatory
Tucson, Az
UUCP:	{akgua,allegra,arizona,decvax,hao,ihnp4,lbl-csam,seismo}!noao!allan
ARPA:	noao!allan@lbl-csam.arpa
 
-------------------------------------------------------------------------

/*  Multiply two matrices  */

#define N 100

main()
{
   float a[N][N],b[N][N],c[N][N],sum;
   int i,j,k;

   /*  Initialise a and b  */

   for (i=0 ; i<N ; i++)
      for (j=0 ; j<N ; j++)
      {
         a[i][j] = i+j;
         b[i][j] = i-j;
      }

/*  Multiply a by b to give c  */

   for (i=0 ; i<N ; i++)
      for (j=0 ; j<N ; j++)
      {
         sum = 0.0;
         for (k=0 ; k<N ; k++)
            sum += a[i][k]*b[k][j];
         c[i][j] = sum;
      }
}

-------------------------------------------------------------------------

      program testf
c
c      Multiply two matrices.
c
      parameter ( N = 100 )
c
      real a(N,N),b(N,N),c(N,N)
c
c      Initialise a and b.
c
      do 1 i=1,N
        do 2 j=1,N
          a(i,j)= i+j
          b(i,j)= i-j
    2   continue
    1 continue
c
c      Multiply a by b to give c.
c
      do 3 i=1,N
        do 4 j=1,N
          sum= 0.
          do 5 k=1,N
            sum= sum + a(i,k)*b(k,j)
    5     continue
          c(i,j)= sum
    4   continue
    3 continue
c
      end

gwyn@brl-vgr.ARPA (Doug Gwyn ) (06/27/84)

I do not understand these (Fortran & C on UNIX & VMS) benchmark results.
I took the posted C source code and got execution time not much worse
than the reported VMS Fortran (single-precision) time.  More interestingly,
the C source code was written the way a Fortran programmer would write it;
doing the obvious things that an experienced C programmer would have done
in the first place doubled the speed of execution, placing the UNIX C
version well ahead of even the reported VMS Fortran times:

	Using BRL UNIX System V emulation on 4.2BSD, VAX-11/780
	with FPA:

	C code as posted	38.1 sec user time, 1.4 sec system time

	proper C code		18.9 sec user time, 0.9 sec system time

I think the moral is:  "Do not believe benchmarks unless you control them."
A secondary moral is that professionally-written C code can be as fast as
the machine will allow (give or take a few percent improvement possible by
tweaking assembly language), so that the language performance argument made
in favor of Fortran is bogus.  I would love to see a Fortran programmer
take some of my favorite C code using linked data structures and make it
work at all using Fortran, let alone work well.  (I used to do this sort
of thing before C became available, and it is really hard to do right in
portable Fortran.)
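
[A minimal sketch, not code from the thread, of the kind of linked data
structure meant here.  It is natural in C; Fortran 66/77 has no pointer or
record types, so the usual workaround is parallel arrays with integer
"link" indices standing in for pointers.]

	#include <stdlib.h>

	/* One node of a singly linked list. */
	struct node {
	    double       value;		/* payload */
	    struct node *next;		/* next node, or NULL at the end */
	};

	/* Push a value onto the front of a list; returns the new head. */
	struct node *push(struct node *head, double value)
	{
	    struct node *n = (struct node *) malloc(sizeof(struct node));

	    if (n == NULL)
	        return head;		/* out of memory: list unchanged */
	    n->value = value;
	    n->next  = head;
	    return n;
	}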

allan@noao.UUCP (06/27/84)

A little extra information on the timing of my matrix multiplication program.
Following a suggestion (I'm afraid I forget from whom), I changed the
declarations of the loop counters to register int. This sped things up
from 64 seconds to 53 seconds, a fair gain. I was a bit surprised and
very pleased to see how much improvement the declaration

register float *pa, *pb, *pc;
 
and the use of pointer incrementing does for the timing. Of course, like most
things, it is obvious with hindsight. Much of the time of the original 
program is spent in calculating the addresses of the array elements.
I cannot think of a way of forcing fortran to do this incrementing in the
'obviously sensible' way, although as you point out, a good optimiser should
do it for you.
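
[A sketch of the pointer version of the inner product described above; my
reconstruction rather than the poster's actual code (a complete program
along these lines follows in Larry West's article below).  pa walks along
row i of a, pb walks down column j of b in steps of N floats, so the
a[i][k]*b[k][j] address arithmetic is not redone on every iteration.]

	#define N 100

	/* Inner-product loop with register counters and pointer stepping. */
	void matmul(float a[N][N], float b[N][N], float c[N][N])
	{
	    register float *pa, *pb, sum;
	    register int i, j, k;

	    for (i = 0; i < N; i++)
	        for (j = 0; j < N; j++)
	        {
	            pa = a[i];			/* start of row i of a  */
	            pb = &b[0][j];		/* top of column j of b */
	            sum = 0.0;
	            for (k = 0; k < N; k++, pa++, pb += N)
	                sum += *pa * *pb;
	            c[i][j] = sum;
	        }
	}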

I concede. C is faster than fortran when you write your programs correctly.


Peter Allan
Kitt Peak National Observatory
Tucson, Az
UUCP:	{akgua,allegra,arizona,decvax,hao,ihnp4,lbl-csam,seismo}!noao!allan
ARPA:	noao!allan@lbl-csam.arpa

allan@noao.UUCP (06/27/84)

I agree about doing your own benchmarks. I only got started in this because
of a request for timing information over the news net.


Peter Allan
Kitt Peak National Observatory
Tucson, Az

mwm@ea.UUCP (06/30/84)


/***** ea:net.physics / noao!allan /  5:17 am  Jun 28, 1984 */
I concede. C is faster than fortran when you write your programs correctly.

Peter Allan
UUCP:	{akgua,allegra,arizona,decvax,hao,ihnp4,lbl-csam,seismo}!noao!allan
/* ---------- */

Peter -

I feel that you ought to point out that this is true *only if you are using
double precision.* If you don't need double precision, then the single
precision f77 code is faster than your hand-optimized C.

This also leaves open the question of what C does to a carefully tuned
(tuned for accuracy, that is) algorithm if/when it rearranges your
expressions.
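
[A concrete example, hypothetical and not from the thread, of the sort of
accuracy-tuned code being worried about here: Kahan's compensated
summation.  Its whole effect lives in the grouping of the parenthesized
subtractions; a compiler that regroups them by real-number algebra throws
the correction away.]

	/* Compensated (Kahan) summation of n doubles. */
	double kahan_sum(double *x, int n)
	{
	    double s = 0.0;	/* running sum */
	    double c = 0.0;	/* running compensation for lost low-order bits */
	    double y, t;
	    int i;

	    for (i = 0; i < n; i++) {
	        y = x[i] - c;		/* apply the correction to the next addend */
	        t = s + y;		/* big + small: low bits of y are lost here */
	        c = (t - s) - y;	/* recover exactly what was lost; regrouped
					   as t - (s + y) this is always zero */
	        s = t;
	    }
	    return s;
	}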

	<mike

west@sdcsla.UUCP (07/03/84)

To make things a little fairer, I rewrote the C matrix multiplication
program to take advantage of C's pointers.   Of course, this is something a
good ``optimizer'' would do for you, but it isn't that hard to do by hand.

Peter Allan's original program:

		/*  Multiply two matrices  */

		#define N 100

		main()
		{
		   float a[N][N],b[N][N],c[N][N],sum;
		   int i,j,k;

		   /*  Initialise a and b  */

		   for (i=0 ; i<N ; i++)
		      for (j=0 ; j<N ; j++)
		      {
			 a[i][j] = i+j;
			 b[i][j] = i-j;
		      }

		/*  Multiply a by b to give c  */

		   for (i=0 ; i<N ; i++)
		      for (j=0 ; j<N ; j++)
		      {
			 sum = 0.0;
			 for (k=0 ; k<N ; k++)
			    sum += a[i][k]*b[k][j];
			 c[i][j] = sum;
		      }
		}


My quick and easy improvements:

	/*  Multiply two matrices  */

	#define N 100

	main()
	{
	   float a[N][N],b[N][N],c[N][N];
	   register float sum;
	   register float *pa, *pb, *pc;
	   register int i,j,k;

	   /*  Initialize a and b  */

	   for (i=0 ; i<N ; i++)
	      {
	      pa = a[i];		/* actually could be done just once */
	      pb = b[i];
	      for (j=0 ; j<N ; j++, pa++, pb++)
		  {
		  *pa = i+j;
		  *pb = i-j;
		  }
	      }

	/*  Multiply a by b to give c  */

	   for (i=0 ; i<N ; i++)
	      {
	      pc = c[i];
	      for (j=0 ; j<N ; j++)
		 {
		 pa = a[i];		/* walk along row i of a */
		 pb = &b[0][j];		/* walk down column j of b */
		 sum = 0.0;
		 for (k=0 ; k<N ; k++, pa++, pb+=N)
		    sum += (*pa)*(*pb);
		 *pc++ = sum;
		 }
	      }
	}

---------------

So the differences are simply judicious use of "register" declarations
and using pointers and pointer incrementation to step through an array.
The initializations of "pa" and "pb" could even be moved out of the
outer loops, but for very little gain.
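
[For completeness, a sketch of the hoisting just mentioned, using the same
declarations as the program above: since a and b are contiguous N*N blocks,
each pointer can be set once before both loops and simply keep marching.]

	   pa = &a[0][0];		/* set once, outside both loops */
	   pb = &b[0][0];
	   for (i = 0; i < N; i++)
	      for (j = 0; j < N; j++)
	      {
	         *pa++ = i + j;
	         *pb++ = i - j;
	      }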

Timings taken on a fairly uncrowded VAX 750 (no floating point extras),
for ``optimized'' compilation.  The fields are the usual csh time output:
user CPU, system CPU, elapsed time, per cent of CPU, shared+unshared
memory in K, block input+output, and page faults+swaps.

time mm1	-- original C program, "optimized".
67.8u 11.6s 2:17 57% 1+202k 0+0io 3pf+0w
67.3u 6.2s 2:31 48% 1+201k 2+1io 1pf+0w

time mm1a	-- improved C program, "optimized".
21.9u 2.5s 0:40 60% 1+198k 0+0io 2pf+0w
21.9u 2.2s 0:40 60% 1+200k 0+0io 2pf+0w

time mm2	-- original Fortran program, "optimized".
28.1u 3.5s 0:51 61% 11+243k 1+1io 11pf+0w
28.6u 3.0s 1:09 45% 11+244k 0+0io 5pf+0w

Note that now C at least beats f77 on its (C's) home turf.

		-- Larry West, UC San Diego, Institute for Cognitive Science
		-- decvax!ittvax!dcdwest!sdcsvax!sdcsla!west
		-- ucbvax!sdcsvax!sdcsla!west
		-- west@NPRDC

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/11/84)

C explicitly does NOT rearrange most arithmetic operations
(the exceptions are the operators that are both commutative and associative,
such as + and *, which the compiler may regroup even across parentheses).
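
[In practice, a sketch of what that exception allows and of the documented
workaround: under the original K&R rules the compiler may regroup operands
of +, *, &, ^ and | even across parentheses, and the way to force a
particular grouping is an explicit temporary.]

	/* The parentheses below do not pin the grouping of the additions;
	 * a K&R compiler is free to evaluate a + (b + c) instead. */
	double loose(double a, double b, double c)
	{
	    return (a + b) + c;
	}

	/* An explicit temporary forces a + b to be computed first. */
	double pinned(double a, double b, double c)
	{
	    double t;

	    t = a + b;
	    return t + c;
	}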