[comp.sys.hp] README: Hazard in 68040 upgrades for HP9000/400t!

gritz@seas.gwu.edu (Larry Gritz) (06/05/91)

We have a cluster of HP9000/400 series workstations, including the following:
	3 HP9000/433s
	2 HP9000/425t
	5 HP9000/400t diskless nodes

We have put in procurements for upgrades of all these workstations to 68040
processors (400t -> 425t).  So far, two of these upgrades have arrived and
were installed.
   At first, we were very happy with the two- to three-fold increase in speed.
Then, one by one, we started to come across programs that ran slower on the
68040s.  Not a little slower, but TEN TIMES SLOWER OR MORE!
   The most dramatic was a program which read in five vectors of about 50000
doubles, computed some statistics, then wrote a similar number of doubles to
ASCII files.  The calculations took practically no time, but the disk write
operation took a half hour!  (It took about 3 minutes before the upgrade.)
The program would just sit there, with very intermittent short writes to the
disk.
   At first, we thought it was a problem with writing the files, but it turns
out that many functions that were done in floating point hardware before the
upgrade are NOT IMPLEMENTED IN HARDWARE ON THE 040, and are EMULATED IN
SOFTWARE.  Among these functions are all trig functions and some routines
that are needed by fprintf.  (These were our problems.)
   We called HP, who were not helpful at all.  They gave us a runaround, and
were generally not interested.  The support people said, "we deal with bugs,
not performance problems."  But if we spend a couple thousand bucks on an
upgrade, and afterwards certain programs run ten times slower, that's a
big problem!  HP told us "calculations with doubles aren't a common
operation."  ON AN ENGINEERING WORKSTATION???  Who are they kidding?
   The director of our department wants to give his upgrade back and get the
030 back in his machine because his programs ran faster before the upgrade.
We are considering cancelling the rest of our 68040 upgrade orders.  I urge
all of you to think very carefully before buying the 68040 upgrades.
   Comments?  Similar problems?  Solutions?  Feel free to contact me by email
or phone, see my signature.


Here's a short example program with benchmark results:

-------------------------- 8< cut here  >8 ----------------------------------

#include <stdio.h>
#include <stdlib.h>
/* 
	This sample program demonstrates problems with the 68040
	cpu.  Here are benchmarks for the same program, run on 
	(1) A 68030 diskless HP9000/400t, 8 Mb RAM, served by an
	HP9000/835 disk server,
	(2) A 68040 HP9000/425t, with 16 Mb RAM, internal 200 Mb and
	external 330 Mb disks.

	TIME:		(1) 68030		(2) 68040

	real		1m 16.42 s		57m 11.20 s
	user		1m  3.96 s		 0m 28.46 s
	system       	0m  2.46 s		45m 41.26 s
*/
#define NVALS 50000
int main (void)
{
  FILE *gp, *xp, *pp;
  double *indvar, *depvar, *yest, *resid;
  int i;

  gp = fopen ("first.out", "w");
  xp = fopen ("second.out", "w");
  pp = fopen ("third.out", "w");
  if (gp == NULL || xp == NULL || pp == NULL) {
      perror ("fopen");
      return 1;
  }

  indvar  = (double *) malloc (NVALS * sizeof (double));
  depvar  = (double *) malloc (NVALS * sizeof (double));
  yest    = (double *) malloc (NVALS * sizeof (double));
  resid   = (double *) malloc (NVALS * sizeof (double));
  if (indvar == NULL || depvar == NULL || yest == NULL || resid == NULL) {
      fprintf (stderr, "out of memory\n");
      return 1;
  }

  /* malloc does not zero memory, so assign instead of incrementing. */
  for (i = 0; i < NVALS; i++) {
      resid[i] = 1.0;
      yest[i] = 1.0;
      depvar[i] = 1.0;
      indvar[i] = 1.0;
   }
   printf("Ready to output stuff\n");
   /*  The program reaches this stage instantly; all the time-wasting
   takes place below: */
   /*  printf-style functions take a double with %f; the l length
   modifier is unnecessary. */
  for (i = 0; i < NVALS; i++) {
      fprintf (gp, "%19.12f   %19.12f   %19.12f   %19.12f\n",
	       indvar[i], depvar[i], yest[i], resid[i]);
      fprintf (xp, "%19.12f   %19.12f\n", indvar[i], resid[i]);
      fprintf (pp, "%19.12f   %19.12f\n", indvar[i], yest[i]);
   }

  fclose (gp);
  fclose (xp);
  fclose (pp);
  free (indvar);
  free (depvar);
  free (yest);
  free (resid);
  return 0;
}

-------------------------- 8< cut here  >8 ----------------------------------
--

Larry Gritz                                   lg@galileo.usno.navy.mil
US Naval Observatory                          phone: 202-653-1034
Washington, DC 20392-5100                     also: gritz@seas.gwu.edu

vandys@sequent.com (Andrew Valencia) (06/06/91)

gritz@seas.gwu.edu (Larry Gritz) writes:
>   At first, we were very happy with the two- to three-fold increase in speed.
>Then, one by one, we started to come across programs that ran slower on the
>68040s.  Not a little slower, but TEN TIMES SLOWER OR MORE!

I seem to remember that they moved the floating point functions onto
the '040 chip, but that they couldn't fit all of them.  Moto arranged for
some emulation code, appropriate for a trap handler, to be written and
made available to '040 system vendors.  I very much suspect that you're
being hit by this.

One thought is to see if you can get that code and link it in from user
mode.  Then you only have to pay for emulating the instruction--not the
whole trap shmear.  With the speed of the '040, this may not even perform
badly at all!  If HP's giving you the runaround, you may be talking to the
wrong folks.  With its divisional structure, HP is a lot more like a bunch
of closely tied companies than you might expect.  See if you can get through
to a tech support type at the Fort Collins, CO systems division--they do
a lot of 300/400 stuff out there, they also do a lot of X windows.  Something
tells me they might have a trick or two left to show you.

								Andy

Disclaimer: these are only my opinions

hardy@golem.ps.uci.edu (Meinhard E. Mayer (Hardy)) (06/09/91)

In article <3270@sparko.gwu.edu> gritz@seas.gwu.edu (Larry Gritz) writes:
[... stuff deleted]
>>   /* 
>>           This sample program demonstrates problems with the 68040
>>           cpu.  Here are benchmarks for the same program, run on 
>>           (1) A 68030 diskless HP9000/400t, 8 Mb RAM, served by an
>>           HP9000/835 disk server,
>>           (2) A 68040 HP9000/425t, with 16 Mb RAM, internal 200 Mb and
>>           external 330 Mb disks.
>>
>>           TIME:		(1) 68030		(2) 68040
>>
>>           real		1m 16.42 s		57m 11.20 s
>>           user		1m  3.96 s		 0m 28.46 s
>>           system       	0m  2.46 s		45m 41.26 s
>>   */
>>
Out of curiosity, I ran the program on a 25 MHz NeXT with 20 MB of
memory.
Here is the timing (in NeXT format ... use your own translations):

27.772u 60.903s 1:36.83 91.5% 0+0k 33+1137io 0pf+0w

Somewhat slower than the 68030, but a far cry from the 68040 result above.
Here are the sizes of the code and output on the NeXT, for comparison.

-rw-rw-r--  1 hardy    wheel    2100000 Jun  9 09:17 third.out
-rw-rw-r--  1 hardy    wheel    2100000 Jun  9 09:17 second.out
-rw-rw-r--  1 hardy    wheel    4300000 Jun  9 09:17 first.out
drwxrwxr-x  5 hardy    wheel       1024 Jun  9 09:16 ./
-rwxrwxr-x  1 hardy    wheel      16384 Jun  9 09:16 float*

Would the experts please comment?



 

Greetings,
Hardy 
			  -------****-------
Meinhard E. Mayer (Hardy);  Department of Physics, University of California
Irvine CA 92717; (714) 856 5543; hardy@golem.ps.uci.edu or MMAYER@UCI.BITNET

chen@digital.sps.mot.com (Jinfu Chen) (06/11/91)

>   /* 
>           This sample program demonstrates problems with the 68040
>           cpu.  Here are benchmarks for the same program, run on 
>           (1) A 68030 diskless HP9000/400t, 8 Mb RAM, served by an
>           HP9000/835 disk server,
>           (2) A 68040 HP9000/425t, with 16 Mb RAM, internal 200 Mb and
>           external 330 Mb disks.
>
>           TIME:		(1) 68030		(2) 68040
>
>           real		1m 16.42 s		57m 11.20 s
>           user		1m  3.96 s		 0m 28.46 s
>           system     	0m  2.46 s		45m 41.26 s
>   */

For comparison, here are timings under Domain/OS on a 400t and a 425t (both
have 2x210MB disks and 16MB RAM):

            TIME:     400t (50MHz 030)     425t (25MHz 040)

            real         181.8                 110.3
            user         147.5                  74.8
            system        13.7                   8.4

I deleted all the *.out files before running other tests.
-- 
Jinfu Chen      (602)898-5338 
Motorola, Inc.  SPS,  Mesa, AZ
 ...uunet!motsps!digital!chen
chen@digital.sps.mot.com
----------