gritz@seas.gwu.edu (Larry Gritz) (06/05/91)
We have a cluster of HP9000/400 series workstations, including the
following:

    3 HP9000/433s
    2 HP9000/425t
    5 HP9000/400t diskless nodes

We have put in procurements for upgrades of all these workstations to
68040 processors (400t -> 425t). So far, two of these upgrades have
arrived and were installed.

At first, we were very happy with the two- to three-fold increase in
speed. Then, one by one, we started to come across programs that ran
slower on the 68040s. Not a little slower, but TEN TIMES SLOWER OR MORE!

The most dramatic was a program which read in five vectors of about
50000 doubles, computed some statistics, then wrote a similar number of
doubles to ASCII files. The calculations took practically no time, but
the disk write operation took a half hour! (It took about 3 minutes
before the upgrade.) The program would just sit there, with very
intermittent short writes to the disk.

At first, we thought it was a problem with writing the files, but it
turns out that many functions that were done in floating point hardware
before the upgrade are NOT DONE IN HARDWARE ON THE 040, and are EMULATED
IN SOFTWARE. Among these functions are all trig functions and some
routines that are needed by fprintf. (These were our problems.)

We called HP, who were not helpful at all. They gave us a runaround, and
were generally not interested. The support people said, "we deal with
bugs, not performance problems." But if we spend a couple thousand bucks
on an upgrade, and afterwards certain programs run ten times slower,
that's a big problem! HP told us "calculations with doubles aren't a
common operation." ON AN ENGINEERING WORKSTATION??? Who are they
kidding?

The director of our department wants to give his upgrade back and get
the 030 back in his machine because his programs ran faster before the
upgrade. We are considering cancelling the rest of our 68040 upgrade
orders.

I urge all of you to think very carefully before buying the 68040
upgrades.

Comments? Similar problems? Solutions?
Feel free to contact me by email or phone; see my signature.

Here's a short example program with benchmark results:

-------------------------- 8< cut here >8 ----------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/*
   This sample program demonstrates problems with the 68040
   cpu.  Here are benchmarks for the same program, run on
        (1) A 68030 diskless HP9000/400t, 8 Mb RAM, served by an
            HP9000/835 disk server,
        (2) A 68040 HP9000/425t, with 16 Mb RAM, internal 200 Mb and
            external 330 Mb disks.

   TIME:      (1) 68030        (2) 68040

   real       1m 16.42 s      57m 11.20 s
   user       1m  3.96 s       0m 28.46 s
   system     0m  2.46 s      45m 41.26 s
*/

#define NVALS 50000

int main (void)
{
    FILE *gp, *xp, *pp;
    double *indvar, *depvar, *coef, *yest, *resid, *coefsig;
    double r, rsq, see;
    char error;
    int i;

    gp = fopen ("first.out", "w");
    xp = fopen ("second.out", "w");
    pp = fopen ("third.out", "w");

    indvar = (double *) malloc (NVALS * sizeof (double));
    depvar = (double *) malloc (NVALS * sizeof (double));
    yest = (double *) malloc (NVALS * sizeof (double));
    resid = (double *) malloc (NVALS * sizeof (double));

    /* Note: the malloc'd arrays are uninitialized, so += operates on
       indeterminate values. */
    for (i = 0; i < NVALS; i++) {
        resid[i] += 1.0;
        yest[i] += 1.0;
        depvar[i] += 1.0;
        indvar[i] += 1.0;
    }

    printf ("Ready to output stuff\n");

    /* The program reaches this stage instantly; all the time-wasting
       takes place below: */

    for (i = 0; i < NVALS; i++) {
        fprintf (gp, "%19.12lf %19.12lf %19.12lf %19.12lf\n",
                 indvar[i], depvar[i], yest[i], resid[i]);
        fprintf (xp, "%19.12lf %19.12lf\n", indvar[i], resid[i]);
        fprintf (pp, "%19.12lf %19.12f\n", indvar[i], yest[i]);
    }

    return 0;
}
-------------------------- 8< cut here >8 ----------------------------------

--
Larry Gritz                     lg@galileo.usno.navy.mil
US Naval Observatory            phone: 202-653-1034
Washington, DC 20392-5100       also: gritz@seas.gwu.edu
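
The timings quoted in the post above are easier to compare once they
are all in seconds. The following is a minimal sketch of that
arithmetic; the helper names to_seconds, slowdown and print_ratios are
hypothetical and chosen only for illustration. On the posted figures it
suggests that elapsed time grew roughly 45-fold and system time more
than a thousand-fold, while user time fell to less than half.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a "minutes, seconds" reading, as printed by time, to seconds. */
double to_seconds(int minutes, double seconds)
{
    return 60.0 * minutes + seconds;
}

/* Ratio of the 68040 time to the 68030 time.
   Values above 1 mean the 68040 run was slower. */
double slowdown(double before_s, double after_s)
{
    assert(before_s > 0.0);
    return after_s / before_s;
}

/* Print the three ratios for the figures quoted in the post. */
void print_ratios(void)
{
    printf("real   %8.1fx\n",
           slowdown(to_seconds(1, 16.42), to_seconds(57, 11.20)));
    printf("user   %8.1fx\n",
           slowdown(to_seconds(1, 3.96), to_seconds(0, 28.46)));
    printf("system %8.1fx\n",
           slowdown(to_seconds(0, 2.46), to_seconds(45, 41.26)));
}
```

Calling print_ratios() from a main function prints one ratio per line.
That nearly all of the extra time shows up as system time rather than
user time would be consistent with work being done inside the kernel.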
vandys@sequent.com (Andrew Valencia) (06/06/91)
gritz@seas.gwu.edu (Larry Gritz) writes:

>   At first, we were very happy with the two- to three-fold increase in speed.
>Then, one by one, we started to come across programs that ran slower on the
>68040's.  Not a little slower, but TEN TIMES SLOWER OR MORE!

I seem to remember that they moved the floating point functions onto
the '040 chip, but that they couldn't fit all of them. Moto arranged for
some emulation code, appropriate for a trap handler, to be written and
made available to '040 system vendors. I very much suspect that you're
being hit by this.

One thought is to see if you can get that code and link it in from user
mode. Then you only have to pay for emulating the instruction--not the
whole trap shmear. With the speed of the '040, this may not even perform
badly at all!

If HP's giving you the runaround, you may be talking to the wrong folks.
With its divisional structure, HP is a lot more like a bunch of closely
tied companies than you might expect. See if you can get through to a
tech support type at the Fort Collins, CO systems division--they do a
lot of 300/400 stuff out there, and they also do a lot of X windows.
Something tells me they might have a trick or two left to show you.

                                        Andy

Disclaimer: these are only my opinions
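
One way to check the suspicion above, that the time goes into
floating-point formatting rather than into the disk, would be to format
the same values into a memory buffer with no file I/O and time only
that. The following is a rough sketch under stated assumptions: the
function name time_format_only is hypothetical, snprintf requires a C99
library, and whether clock() counts time spent in the kernel on behalf
of the process depends on the system, so that should be checked on the
target machine.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format n doubles into a scratch buffer, with no
   file I/O at all, and return the CPU seconds that took.
   *chars_out receives the total number of characters produced. */
double time_format_only(const double *vals, size_t n, long *chars_out)
{
    char buf[512];              /* ample room for any double with %19.12f */
    long total = 0;
    size_t i;
    clock_t start, stop;

    assert(vals != NULL && chars_out != NULL);

    start = clock();
    for (i = 0; i < n; i++)
        total += snprintf(buf, sizeof buf, "%19.12f", vals[i]);
    stop = clock();

    *chars_out = total;
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

If this loop alone is slow for 50000 values on the upgraded machine,
the disk is unlikely to be the cause; if it is fast, the file writes
deserve a closer look.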
hardy@golem.ps.uci.edu (Meinhard E. Mayer (Hardy)) (06/09/91)
In article <3270@sparko.gwu.edu> gritz@seas.gwu.edu (Larry Gritz) writes:

[... stuff deleted]

>> /*
>>    This sample program demonstrates problems with the 68040
>>    cpu.  Here are benchmarks for the same program, run on
>>         (1) A 68030 diskless HP9000/400t, 8 Mb RAM, served by an
>>             HP9000/835 disk server,
>>         (2) A 68040 HP9000/425t, with 16 Mb RAM, internal 200 Mb and
>>             external 330 Mb disks.
>>
>>    TIME:      (1) 68030        (2) 68040
>>
>>    real       1m 16.42 s      57m 11.20 s
>>    user       1m  3.96 s       0m 28.46 s
>>    system     0m  2.46 s      45m 41.26 s
>> */
>>

Out of curiosity, I ran the program on a 25 MHz NeXT with 20 MB of
memory. Here is the timing (in NeXT format ... use your own
translations):

27.772u 60.903s 1:36.83 91.5% 0+0k 33+1137io 0pf+0w

Somewhat slower than the 68030, but a far cry from the 68040 result
above.

Here are the sizes of the code and output on the NeXT, for comparison.

-rw-rw-r--  1 hardy    wheel     2100000 Jun  9 09:17 third.out
-rw-rw-r--  1 hardy    wheel     2100000 Jun  9 09:17 second.out
-rw-rw-r--  1 hardy    wheel     4300000 Jun  9 09:17 first.out
drwxrwxr-x  5 hardy    wheel        1024 Jun  9 09:16 ./
-rwxrwxr-x  1 hardy    wheel       16384 Jun  9 09:16 float*

Would the experts please comment?

Greetings,
Hardy
          -------****-------
Meinhard E. Mayer (Hardy); Department of Physics, University of California
Irvine CA 92717; (714) 856 5543; hardy@golem.ps.uci.edu or MMAYER@UCI.BITNET
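
For readers who want the translation: the NeXT timing line above looks
like the csh time layout, that is, user seconds with a "u" suffix,
system seconds with an "s" suffix, then elapsed time as minutes:seconds
and the CPU percentage. That reading is an assumption based on the
suffixes. A minimal sketch that pulls out the first three fields, with
the hypothetical function name parse_csh_time, might look like this:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse the leading "<user>u <sys>s <min>:<sec>"
   fields of a csh-style time line.  Returns 1 on success, 0 otherwise.
   Only the m:ss elapsed form is handled, not h:mm:ss. */
int parse_csh_time(const char *line, double *user_s, double *sys_s,
                   double *elapsed_s)
{
    int minutes;
    double seconds;

    assert(line != NULL);

    if (sscanf(line, "%lfu %lfs %d:%lf",
               user_s, sys_s, &minutes, &seconds) != 4)
        return 0;

    *elapsed_s = 60.0 * minutes + seconds;
    return 1;
}
```

Read this way, the NeXT run took 96.83 seconds of elapsed time, against
76.42 seconds on the 68030 HP and 3431.20 seconds on the 68040 HP.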
chen@digital.sps.mot.com (Jinfu Chen) (06/11/91)
> /*
>    This sample program demonstrates problems with the 68040
>    cpu.  Here are benchmarks for the same program, run on
>         (1) A 68030 diskless HP9000/400t, 8 Mb RAM, served by an
>             HP9000/835 disk server,
>         (2) A 68040 HP9000/425t, with 16 Mb RAM, internal 200 Mb and
>             external 330 Mb disks.
>
>    TIME:      (1) 68030        (2) 68040
>
>    real       1m 16.42 s      57m 11.20 s
>    user       1m  3.96 s       0m 28.46 s
>    system     0m  2.46 s      45m 41.26 s
> */

For your comparison on Domain/OS running on 400t and 425t (both have
2x210MB disks and 16MB RAM):

   TIME:      400t (50MHz 030)    425t (25MHz 040)

   real            181.8               110.3
   user            147.5                74.8
   system           13.7                 8.4

I deleted all the *.out files before running other tests.

--
Jinfu Chen                      (602)898-5338
Motorola, Inc.  SPS, Mesa, AZ
...uunet!motsps!digital!chen
chen@digital.sps.mot.com
----------