[comp.sys.hp] HP825 math 15x SLOWER than 500

rclark@bgphp1.UUCP (Roger N. Clark) (03/26/88)

I have benchmarked the HP9000 series 825 using number crunching
programs and find:

        The 825 is 5 to 7 times SLOWER than a single cpu 500!!!!!

        In a multitasking environment the 825 can be at least

                     15 TIMES SLOWER
                     ^^^^^^^^^^^^^^^
        than a 3 cpu 500!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

The details:

In February a note was posted to comp.sys.hp that the HP9000 series
500 was being discontinued.  That caused quite a flurry of
responses, including several that said the new HP9000 series 825 is
much faster.  I have heard several stories about how 3 9000 s500's
were replaced with one 825 and everyone was happy.  HP is saying the
825 is very fast.

Well, on February 19, I posted a rather strong note about the 500
being discontinued.  The 500 is no longer being supported in that
there will be no more software releases (that is especially
disturbing considering that HP-UX 5.21 apparently has many
problems).  I need features that are not on the 500 (network file
system or at least TCP/IP, domain-based mail).  HP has said I need
to upgrade to an 825 (or higher).

Before changing machines, I benchmark the new one with programs similar to
what my group does.  Here at the USGS, we do analysis of spectra of
rocks and minerals and apply the results to imaging data (remote
sensing).  I am on 3 NASA planetary spacecraft teams and the methods
will be applied to gigabytes of data in the 1990's.
The analyses include some very sophisticated (and number
crunching intensive) modeling programs.  The programs are not huge
(less than 2 MBytes on a 6.5 MByte system) and we do not have a
paging problem.

Below are the results of a simple "weird box filter" program.  This
program shows a typical response in our shop.  It does both array
indexing and computation on elements in the arrays.  The compiled
program is only about 350KBytes in size and it does not page to
disk.  

             A Multitasking, CPU intensive Benchmark

                          Real Time
-----------------------------------------------------------------------
                                  Number of Tasks
System                 1     2     3     4     5      7     10    12
-----------------------------------------------------------------------
HP9000/500 3 CPUs     5.9   6.0   6.3   8.4  10.5   14.7   21.5   27.8
HP9000/825 HPUX1.2   29.1  58.1  87.2 116.3 145.6  205.0  291.5  350.1
-----------------------------------------------------------------------
                           CPU Time
-----------------------------------------------------------------------
                                  Number of Tasks
System                 1     2     3     4     5      7     10     12
-----------------------------------------------------------------------
HP9000/500 3 CPUs     5.8  11.8  18.4  24.4  30.7   43.0   62.2   81.4
HP9000/825 HPUX1.2   29.0  57.9  87.0 116.0 144.7  202.7  288.5  346.5
-----------------------------------------------------------------------
NOTES:
HP9000/500: 6.5 MBytes main memory, 3 floating point CPUs, 65MByte
            system, 55MByte /tmp disk, 132MByte user disk, 571MByte
            data disk (Used by virtual memory), HP-UX 5.21.

HP9000 series 825 (HP Precision Architecture, RISC machine)  16 MBytes of
            main memory, single 404MByte disk drive.  HP Demo, 3/23/88.
            HP-UX 1.2 (Also tried it on HP-UX 2.0 pre-release with slightly
            worse results).
-----------------------------------------------------------------------
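
The slowdown factors implied by the Real Time rows above can be
recomputed with a few lines of awk.  This is only a sketch: the
function name slowdown_table is made up for illustration, and the
numbers are copied from the table as posted.

```shell
#!/bin/sh
# Ratio of 825 real time to 3-CPU 500 real time, per task count.
# slowdown_table is a hypothetical helper, not part of the benchmark kit.
slowdown_table() {
    awk 'BEGIN {
        n = split("1 2 3 4 5 7 10 12", tasks, " ")
        split("5.9 6.0 6.3 8.4 10.5 14.7 21.5 27.8", s500, " ")
        split("29.1 58.1 87.2 116.3 145.6 205.0 291.5 350.1", s825, " ")
        # One line per column: task count, then 825 time / 500 time.
        for (i = 1; i <= n; i++)
            printf "%s %.1f\n", tasks[i], s825[i] / s500[i]
    }'
}

slowdown_table
```

With these inputs the ratio is roughly 5 for a single task and just
under 14 from three tasks up, where all three CPUs of the 500 are busy.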

I have several other benchmarks.  On number crunching programs that
do not have array indexing (just do +, -, *, /, logs, sin, cos,
sqrt, powers) the results came out (normalized to s500):

                                 single cpu
                  program    825   500
---------------------------------------------------------------
                   in C      7.6    1   (825 7.6  times SLOWER)
 single precision Fortran    3.23   1   (825 3.23 times SLOWER)
 double precision Fortran    6.7    1   (825 6.7  times SLOWER)


WHAT DOES ALL THIS MEAN?  HP advertises the 825 as a 0.5 megaflop
machine.  My results show it as about a 0.03 megaflop machine.  The
benchmarks were done several times with different machine
configurations at the Neeley sales region (Hal Shearer, hpuecoa!hals).
HP has been very helpful but has not been able to figure out why
these results are so bad.
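
A rough megaflop figure can be estimated from the box filter timing
alone.  The sketch below rests on assumptions that are not in the
original note: it counts 36 floating point operations per array
element exactly as the statement in speedtest.f is written (no
hoisting of the constant divisor by the compiler), and it ignores
array initialization and process startup.  The helper name mflops is
hypothetical.

```shell
#!/bin/sh
# Estimated MFLOPS for one run of the box filter, given elapsed seconds.
# mflops is a hypothetical helper; the operation count is an assumption.
mflops() {
    awk -v secs="$1" 'BEGIN {
        limit = 200
        # As written: 18 multiplies/divides and 8 adds in the numerator,
        # 8 adds/subtracts and 1 multiply in the divisor, 1 final divide.
        flops_per_point = 36
        # The loops run i and j from 2 to limit-1.
        points = (limit - 2) * (limit - 2)
        printf "%.2f\n", flops_per_point * points / secs / 1000000
    }'
}

mflops 29.1   # single-task real time on the 825, from the table
mflops 5.9    # single-task real time on the 500, from the table
```

Under these assumptions the estimate is of the same order as the 0.03
figure quoted above, not an independent measurement of it.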

HP has a new 835 that is substantially faster.  This benchmark has
been run at Fort Collins but I haven't gotten the results yet.  I
have heard that they are faster than a 3 cpu 500 however.

A LESSON EVERYONE SHOULD KNOW: BENCHMARK YOUR APPLICATION BEFORE YOU
BUY A MACHINE.                           ^^^^^^^^^^^^^^^^

Is the 825 really that bad?  Could there be a problem with the 825
I tested?  The sieve benchmark came out 12 times faster than a
single cpu 500 and all my I/O benchmarks came out very fast.  I
think the 825 has a real problem with number crunching.

I then looked at alternatives to the 825.  I tried a 350 but I
currently have about 8 to 10 users on every day.  We have 29 RS232
ports, 6 HP-IB cards (4 disks, 2 plotters, 1 9-track tape, 2 cartridge
tapes), 2 printers, 3 modems and 3 spectrometers connected to the
500.  (The benchmarks were also done on the 500 WHILE a program was locked
in memory gathering data from a spectrometer real time!).  The 350
does not have enough slots to put all this stuff in it.

CONCLUSIONS:
The HP9000 series 500 is a DAMN GOOD machine.  HP doesn't seem to
know how good it is!  I guess because they failed to market it, those
who bought it now have to suffer.

HOW GOOD IS IT?
As I write this note, we have been up 129 days.  We have never had an
operating system crash!  In 4 years, we have only gone down for adding new
boards, occasional disk image backups, or power failures.  We have
been up for as long as 6 months!  We have 8 to 10 users on every
day, and gather data from 3 different spectrometers while users are
doing compute modeling and interactive analysis with graphics (on
HP2623A and HP2393A terminals).  The machine is currently the
central node in our Branch uucp network and a nationwide uucp
network of spectroscopy groups.  During power failures, we have
never lost data except once:  our air conditioner caught fire and I
pulled the plug!  (we only lost one small text file, and we had many
active users on at the time).  The process ID rolls over (32000 or
so processes) every day or two.  We have had only two hardware
problems: the main power supply went out shortly after installation
and an 8-channel mux went bad about a year ago.  


I HAVE NEVER SEEN SUCH A SOLID MACHINE!

Contrast the above to our VAXes and PEs: they have to reboot every
few days to a couple of weeks or so, and have hardware problems about
every month (of course they are getting old and are older
technology).

************************************************************************
*                                                                      *
*  BRING BACK THE 500 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  *
*                                                                      *
************************************************************************

Below is the "weird box filter" benchmark.  Try it yourself.  I
would be interested in what you find.

Roger N. Clark
Research Scientist
U.S. Geological Survey, MS 964
Box 25046 Federal Center
Denver, CO 80225-0046
(303) 236-1332
 FTS  776-1332
{known-world}!hplabs!hpfcla!hpfcse!hpuecoa!bgphp1!rclark



#---------------------------------- cut here ----------------------------------
# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by rclark at bgphp1 on Fri Mar 25 08:07:26 1988
#
# This archive contains:
#	makefile	speedtest.f	multi.sh	timeit		
#
# Error checking via wc(1) will be performed.
# Error checking via sum(1) will be performed.

echo x - makefile
cat >makefile <<'@EOF'
CFLAGS= 
FFLAGS=
LFLAGS=
RFLAGS= -6% -C
GET= get
GFLAGS=

a.out:	speedtest.f
		f77 $(FFLAGS) speedtest.f
@EOF
set -- `sum <makefile`; if test $1 -ne 7217
then
	echo ERROR: makefile checksum is $1 should be 7217
fi
if test "`wc -lwc <makefile`" != '      9     14    105'
then
	echo ERROR: wc results of makefile are `wc -lwc <makefile` should be       9     14    105
fi

chmod 644 makefile

echo x - speedtest.f
cat >speedtest.f <<'@EOF'
C  array addressing and number crunching

        implicit integer*4 (i-n)

        common array1(200,200), array2(200,200), z(9)

        limit = 200
        ktimes = 1

C initialize arrays
	do 10 j= 1, 9
		z(j) = float(j)+2.0
10	continue

        x = 1.0
        do 30 j = 1, limit
                do 20 i = 1, limit
                        x = x + 1.0
                        array1(i,j) = x
20              continue
30      continue

        do 200 k = 1, ktimes

C main computation loop: Weird Box Filter
                do 100 j = 2, limit-1

                        do 50 i = 2, limit-1

                                array2(i,j) = 
     1					( array1(i-1,j-1)*2.0*z(1)
     1                                   +array1(i  ,j-1)*2.0/z(2)
     1                                   +array1(i+1,j-1)*2.0*z(3)
     1                                   +array1(i-1,j  )*2.0/z(4)
     1                                   +array1(i  ,j  )*2.0*z(5)
     1                                   +array1(i+1,j  )*2.0/z(6)
     1                                   +array1(i-1,j+1)*2.0*z(7)
     1                                   +array1(i  ,j+1)*2.0/z(8)
     1                                   +array1(i+1,j+1)*2.0*z(9))
     1					/(9.0*(z(1)-z(2)+z(3)-
     1					  z(4)+z(5)-z(6)+z(7)-
     1					  z(8)+z(9)))


50                      continue
100             continue
C main computation loop complete

200     continue

        stop
        end
@EOF
set -- `sum <speedtest.f`; if test $1 -ne 11286
then
	echo ERROR: speedtest.f checksum is $1 should be 11286
fi
if test "`wc -lwc <speedtest.f`" != '     52    130   1447'
then
	echo ERROR: wc results of speedtest.f are `wc -lwc <speedtest.f` should be      52    130   1447
fi

chmod 644 speedtest.f

echo x - multi.sh
cat >multi.sh <<'@EOF'

for i
do
	a.out &
done
wait
@EOF
set -- `sum <multi.sh`; if test $1 -ne 2160
then
	echo ERROR: multi.sh checksum is $1 should be 2160
fi
if test "`wc -lwc <multi.sh`" != '      6      7     29'
then
	echo ERROR: wc results of multi.sh are `wc -lwc <multi.sh` should be       6      7     29
fi

chmod 744 multi.sh

echo x - timeit
cat >timeit <<'@EOF'
set -x

echo "********** weird box filter *********"

/bin/time /bin/sh multi.sh 1
/bin/time /bin/sh multi.sh 1
/bin/time /bin/sh multi.sh 1

/bin/time /bin/sh multi.sh 1 2
/bin/time /bin/sh multi.sh 1 2
/bin/time /bin/sh multi.sh 1 2

/bin/time /bin/sh multi.sh 1 2 3
/bin/time /bin/sh multi.sh 1 2 3
/bin/time /bin/sh multi.sh 1 2 3

/bin/time /bin/sh multi.sh 1 2 3 4
/bin/time /bin/sh multi.sh 1 2 3 4
/bin/time /bin/sh multi.sh 1 2 3 4

/bin/time /bin/sh multi.sh 1 2 3 4 5
/bin/time /bin/sh multi.sh 1 2 3 4 5
/bin/time /bin/sh multi.sh 1 2 3 4 5

/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7

/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7 8 9 10
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7 8 9 10
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7 8 9 10

echo "************ DONE weird box filter benchmark ************"
@EOF
set -- `sum <timeit`; if test $1 -ne 200
then
	echo ERROR: timeit checksum is $1 should be 200
fi
if test "`wc -lwc <timeit`" != '     33    175    888'
then
	echo ERROR: wc results of timeit are `wc -lwc <timeit` should be      33    175    888
fi

chmod 755 timeit

exit 0
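
The timeit script above stops at 10 tasks, while the results table
also has a 12-task column.  A loop can generate the same pattern of
runs for every column.  This is a sketch only: task_args and
print_runs are made-up names, the script prints the commands instead
of running them, and it uses POSIX $(...) and $((...)) syntax that an
older Bourne shell may not accept.

```shell
#!/bin/sh
# Dry-run generator: prints the timing commands instead of running them.
# task_args and print_runs are hypothetical helpers, not part of the archive.

# Print "1 2 ... N" so that multi.sh receives N arguments (one per task).
task_args() {
    n=$1
    i=1
    args=""
    while [ "$i" -le "$n" ]; do
        args="$args $i"
        i=$((i + 1))
    done
    echo $args
}

# Three repetitions per task count, matching the pattern used in timeit.
print_runs() {
    for n in 1 2 3 4 5 7 10 12; do
        for rep in 1 2 3; do
            echo "/bin/time /bin/sh multi.sh $(task_args "$n")"
        done
    done
}

print_runs
```

Piping the output into sh in the directory holding a.out and multi.sh
would run the timings.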

diamant@hpfclp.HP.COM (John Diamant) (03/29/88)

> I have benchmarked the HP9000 series 825 using number crunching
> programs and find:
> 
>         In a multitasking environment the 825 can be at least
> 
>                      15 TIMES SLOWER
>                      ^^^^^^^^^^^^^^^
>         than a 3 cpu 500!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

These are not the numbers I get.  I'm not sure where all the discrepancy
came from, but I ran your own, unmodified program on our 825.  Our machine
has more memory, and a possibly later version of the 2.0 prerelease.

However, I have to point out that your benchmark was run unoptimized, which
is not a good idea on a RISC-based machine.  As you can see from the numbers
below, the optimization (HP9000/825 HP-UX2.0opt) makes quite a difference.
Compile with "-O" to get optimization.  It will make much more of a difference
on a RISC machine than a CISC machine, so running unoptimized on both
machines is not a fair comparison (it's much less important on the 500).

I wasn't sure whether you counted both user and sys time in CPU time.  I
did in my numbers.  System time varied between .3 and 3 seconds.
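
Adding user and sys time can be scripted.  The sketch below assumes
that the time command writes lines of the form "real N", "user N" and
"sys N" in seconds, which may not match every system; cpu_seconds is a
made-up helper name and the sample input is invented for illustration,
not measured output.

```shell
#!/bin/sh
# Sum the user and sys lines of time(1)-style output read from stdin.
# cpu_seconds is a hypothetical helper; the input format is an assumption.
cpu_seconds() {
    awk '$1 == "user" || $1 == "sys" { total += $2 }
         END { printf "%.1f\n", total }'
}

# Invented sample input, for illustration only.
printf 'real 3.0\nuser 2.4\nsys 0.3\n' | cpu_seconds
```

Since /bin/time reports on standard error, a real run would look like
"/bin/time /bin/sh multi.sh 1 2>&1 | cpu_seconds".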

> The details:
> 
>              A Multitasking, CPU intensive Benchmark
> 
>                           Real Time
> -----------------------------------------------------------------------
>                                   Number of Tasks
> System                 1     2     3     4     5      7     10    12
> -----------------------------------------------------------------------
> HP9000/500 3 CPUs     5.9   6.0   6.3   8.4  10.5   14.7   21.5   27.8
> HP9000/825 HPUX1.2   29.1  58.1  87.2 116.3 145.6  205.0  291.5  350.1
HP9000/825 HP-UX2.0pre  3.0   5.3   7.8  10.4  13.2   18.3   27.4   31.7
HP9000/825 HP-UX2.0opt  2.4   4.3   6.5   9.3   9.9   13.9   18.9   23.0
> -----------------------------------------------------------------------
>                            CPU Time
> -----------------------------------------------------------------------
>                                   Number of Tasks
> System                 1     2     3     4     5      7     10     12
> -----------------------------------------------------------------------
> HP9000/500 3 CPUs     5.8  11.8  18.4  24.4  30.7   43.0   62.2   81.4
> HP9000/825 HPUX1.2   29.0  57.9  87.0 116.0 144.7  202.7  288.5  346.5
HP9000/825 HP-UX2.0pre  2.7   5.2   7.7  10.3  12.9   18.2   26.2   31.1
HP9000/825 HP-UX2.0opt  2.0   3.9   5.8   7.8   9.5   13.3   18.7   19.6
> -----------------------------------------------------------------------
> NOTES:
> HP9000/500: 6.5 MBytes main memory, 3 floating point CPUs, 65MByte
>             system, 55MByte /tmp disk, 132MByte user disk, 571MByte
>             data disk (Used by virtual memory), HP-UX 5.21.
> 
> HP9000 series 825 (HP Precision Architecture, RISC machine)  16 MBytes of
>             main memory, single 404MByte disk drive.  HP Demo, 3/23/88.
>             HP-UX 1.2 (Also tried it on HP-UX 2.0 pre-release with slightly
>             worse results).

HP9000 series 825: 32 MBytes of main memory, single 404Mb disk drive.
	      HP-UX 2.00 prerelease (probably more recent than yours).  The
	      opt entries were run through the optimizer; the other ones
	      weren't.
> -----------------------------------------------------------------------
> 
> WHAT DOES ALL THIS MEAN?  HP advertises the 825 as a 0.5 megaflop
> machine.  My results show it as about a 0.03 megaflop machine.  The
> benchmarks were done several times with different machine
> configurations at the Neeley sales region (Hal Shearer, hpuecoa!hals).
> HP has been very helpful but has not been able to figure out why
> these results are so bad.

My numbers are coming out over 10 times better than yours, so the .5 megaflop
seems about right.  I don't know why you were getting so much worse numbers,
but I doubt the extra 16 MBytes was the difference, since your program
was so small (unless it dynamically allocated a whole bunch of memory).
Floating point hardware in the 825 is essentially the same as in the
500, so the multi-CPU 500 could be somewhat better.  Other series 800 machines
have faster floating point hardware.  In operations other than floating
point, the 825 is faster than the 500 (even multi-CPU).
> 
> HP has a new 835 that is substantially faster.  This benchmark has
> been run at Fort Collins but I haven't gotten the results yet.  I
> have heard that they are faster than a 3 cpu 500 however.

The 835 uses faster floating point hardware, so this would be no surprise.
> 
> A LESSON EVERYONE SHOULD KNOW: BENCHMARK YOUR APPLICATION BEFORE YOU
> BUY A MACHINE.                           ^^^^^^^^^^^^^^^^

This is a good lesson in any case, though I think the machine you
were testing on may have been misconfigured.
> 
> Is the 825 really that bad?  Could there be a problem with the 825
> I tested?  The sieve benchmark came out 12 times faster than a
> single cpu 500 and all my I/O benchmarks came out very fast.  I
> think the 825 has a real problem with number crunching.

This is not consistent with other benchmarks, so I suspect it was something
with the particular 825 you tested on.


John Diamant
SDE				UUCP:  {hplabs,hpfcla}!hpfclp!diamant
Hewlett-Packard Co.		ARPA Internet: diamant%hpfclp@hplabs.HP.COM
Fort Collins, CO

shankar@hpclscu.HP.COM (Shankar Unni) (03/30/88)

> 
>         In a multitasking environment the 825 can be at least
> 
>                      15 TIMES SLOWER
>                      ^^^^^^^^^^^^^^^
>         than a 3 cpu 500!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> 
> NOTES:
> HP9000/500: 6.5 MBytes main memory, 3 floating point CPUs, 65MByte
>             system, 55MByte /tmp disk, 132MByte user disk, 571MByte
>             data disk (Used by virtual memory), HP-UX 5.21.
> 
> HP9000 series 825 (HP Precision Architecture, RISC machine)  16 MBytes of
>             main memory, single 404MByte disk drive.  HP Demo, 3/23/88.
>             HP-UX 1.2 (Also tried it on HP-UX 2.0 pre-release with slightly
>             worse results).
> 
Errr...,

Before posting such stuff, it might have been nice to consult with the
HP support org.

1. On your 500, your swap area and your user data area are on different
   disks, thus leading to less contention on the disk. On your 825 system,
   you used only one disk.

2. Did you compile your benchmark with optimization (-O)? From the makefile
   you attached, obviously not!

3. The 571 meg. disk on your 550 (a 7937, n'est ce pas?) is faster than the
   404 meg disk (a 7935) on the 825.

The first item is especially damaging, since your multitasking system is
obviously very swap-intensive.

Also, each system has one or more things that it does well. In the case of
the 825, what it does very well indeed is cpu-intensive stuff. The
cpu-to-memory bandwidth is good, too, but not on the same scale as the raw cpu
speed. The compilers on the s800, therefore, rely heavily on the optimizer to
take advantage of multiple registers and reduce physical memory usage as much
as possible. Therefore, to get the best performance out of applications on
the 825, THEY NEED TO BE OPTIMIZED!!!

> 
> Is the 825 really that bad?  Could there be a problem with the 825
> I tested?  The sieve benchmark came out 12 times faster than a
                                          ^^^^^^^^^^^^^^^
> single cpu 500 and all my I/O benchmarks came out very fast.  I
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> think the 825 has a real problem with number crunching.
> 

See, there are no problems with integer arithmetic and I/O (which is memory
mapped). The problems you are having are with floating-point performance.
(The famous MFLOPS number). This is being addressed: the 825 is rated at 0.7
MFLOPS (LINPACK?), and there is now a newer model (the 835) out there with a
much faster floating-point card (> 2 times).

> I HAVE NEVER SEEN SUCH A SOLID MACHINE!
> 
> ************************************************************************
> *                                                                      *
> *  BRING BACK THE 500 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  *
> *                                                                      *
> ************************************************************************
> 

Thanx for the compliment. The 825 is pretty solid, too. Our s800 HP-UX (on
a different CPU) has not had a crash in over 2 years. The only problems we
ever had were a few mysterious swap errors which were traced down to an old,
faulty disk with a bad head.

The 500 had the unfortunate problem of being squeezed from both above (by
the new s800's) and below (by the s300's - 680x0 based boxes). This
positioning problem coupled with the (relatively) soft order situation was
what led to its obsolescence. The continuing effort to maintain and
enhance HP-UX on several radically different architectures was getting a
little difficult, and the resource crunch was much too severe to justify
such an effort.

Maybe if there's enough demand, there would be talk of reviving it, though I
feel that there are other alternatives. It won't be long at all before we
have an alternative machine that has the price of a 550 with much better
performance. In general, reducing the number of disparate machines has the
(overwhelming) benefit of offering customers a much wider range of
performance in the *same* architecture. The hard part that a company faces
in such a situation is choosing which set of customers to hurt: the old loyal
customers who are very happy with what they have, or the prospective new
customers who are looking for a wide range of models to choose from and a 
relatively easy migration path.

Besides  - the s500 HP-UX was sort of an unusual item - its guts are different
from the two major flavors out there (sysV, BSD) while trying valiantly to
present the same interface. Maintaining it was no simple matter of keeping the
guts up to date across different models (which is what is done for the s300 and
the s800 models), but completely re-implementing each new feature. The file
system would have been most deeply affected by this - it was radically
different from anything in sysV or BSD. The s300/s800 version of HP-UX has an
essentially sysV file system with BSD extensions, and keeping track of industry
standard features like NFS was a simple matter of taking Sun's code and making
relatively minor modifications to fit it into HP-UX. For the s500, it would
have been an implement-from-scratch affair.

Lots of thought went into the decision to scrap the s500. So bear with us..

--scu

daryl@hpcllcm.HP.COM (Daryl Odnert) (03/30/88)

I would like to second Shankar's plea to make use of the -O (optimization)
option before making your final timings on the system.

Please try this and let us know what happens.

Thanks,
Daryl Odnert
Code Generation/Optimization Project
HP Computer Language Lab
{outside world}!hplabs!hpcllcm!daryl

jeffh@weycord.WEYCO.COM (03/30/88)

Ya know, I've run a few tests with the 825 and didn't see 
much performance improvement. I'm a s300 user- a couple of 
users and a lot of I/O and math. So what is "RISK" after 
the marketing hoopla settles? I haven't seen any "real" 
performance increase.

The s800 reminds me of the 9817, whatever that thing was
with a big black and white monitor, and the s500. Seemed
like a good idea until it got lost in the background noise.

At least s300 and s500 HP-UX were close to the same. It 
might be a good idea to wait a few years to see if the s800 
follows the 9817's path...


Jeff Harrell
hpubvwa!weycord!jeffh
 

mash@mips.COM (John Mashey) (03/30/88)

In article <830004@bgphp1.UUCP> rclark@bgphp1.UUCP (Roger N. Clark) writes:

>I have benchmarked the HP9000 series 825 using number crunching
>programs and find:

>        The 825 is 5 to 7 times SLOWER than a single cpu 500!!!!!
>             A Multitasking, CPU intensive Benchmark
>
>                          Real Time
>-----------------------------------------------------------------------
>                                  Number of Tasks
>System                 1     2     3     4     5      7     10    12
>-----------------------------------------------------------------------
>HP9000/500 3 CPUs     5.9   6.0   6.3   8.4  10.5   14.7   21.5   27.8
>HP9000/825 HPUX1.2   29.1  58.1  87.2 116.3 145.6  205.0  291.5  350.1
MIPS M/1000            0.7   1.0   1.3   1.8   2.4    3.1    4.6
HP9000/825 GUESS       2     3     4     6     7      9     14
....

The 825's FPU must be broken or not there.  As one calibration,
the FORTRAN SP Linpack MFLOPS for these are:

.62	HP9000 Series 825S
.098	HP9000 Series 500

As another, from other FP benchmarks we've seen, we'd guess an 825S to have
about 30% of the performance of one of our MIPS M/1000s, whose numbers were
added above.  As can be seen, the 825 appears consistently about a factor
of 20 slower than you'd expect.  Trying this on one of our boxes with no FPU
slows it down by about 40X [kernel emulation], so the 825 may be doing 
such emulation also.
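
The "factor of 20" can be checked against the rows added to the table
above by dividing the posted 825 times by the GUESS row.  This is a
sketch only; guess_ratio_table is a made-up name, and only the seven
columns that have a GUESS value are used.

```shell
#!/bin/sh
# Posted 825 real time divided by the guessed 825 time, per task count.
# guess_ratio_table is a hypothetical helper, not from the original post.
guess_ratio_table() {
    awk 'BEGIN {
        n = split("1 2 3 4 5 7 10", tasks, " ")
        split("29.1 58.1 87.2 116.3 145.6 205.0 291.5", posted, " ")
        split("2 3 4 6 7 9 14", guess, " ")
        # One line per column: task count, then posted / guessed time.
        for (i = 1; i <= n; i++)
            printf "%s %.1f\n", tasks[i], posted[i] / guess[i]
    }'
}

guess_ratio_table
```

The ratios run from roughly 15 to 23 and cluster near 20.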

I'd really be surprised if it were anything other than that.  I think
HP is pretty conservative and realistic on benchmarking: see, for example,
the "HP 9000 Series 800 Performance Brief", 5/87, a fine document,
well-written, with a broad coverage of useful benchmarks.
(This is presumably gettable from local HP offices (?))
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

decot@hpisod2.HP.COM (Dave Decot) (03/31/88)

Shankar Unni writes:
> ...  The s300/s800 version of HP-UX has an essentially sysV file system
> with BSD extensions, and keeping track of industry standard features
> like NFS was a simple matter of taking Sun's code and making relatively
> minor modifications to fit it into HP-UX.  For the s500, it would have
> been an implement-from-scratch affair.

A slight correction to forestall more misconceptions...

Both the Series 300 and Series 800 use a BSD-based (McKusick-style) file
system (indeed, most of the kernel is BSD with bug fixes), but the
interface above it is compatible with System V (although many BSD features
are also present).  Clearly, if we had used the System V file system
code, NFS and other networking products such as ARPA/BSD services would
have been much harder to port.

Dave Decot
hpda!decot

jarmo@tut.FI (Jarmo Sorvari) (03/31/88)

In article <830004@bgphp1.UUCP> rclark@bgphp1.UUCP (Roger N. Clark) writes:
>I have benchmarked the HP9000 series 825 using number crunching
>programs and find:
>        The 825 is 5 to 7 times SLOWER than a single cpu 500!!!!!
>        In a multitasking environment the 825 can be at least
>                     15 TIMES SLOWER
>                     ^^^^^^^^^^^^^^^
>        than a 3 cpu 500!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


>Below are the results of a simple "weird box filter" program.  This
>program shows a typical response in our shop.  It does both array
>indexing and computation on elements in the arrays.  The compiled
>program is only about 350KBytes in size and it does not page to
>disk.  
>
>             A Multitasking, CPU intensive Benchmark
>
>                          Real Time
>-----------------------------------------------------------------------
>                                  Number of Tasks
>System                 1     2     3     4     5      7     10    12
>-----------------------------------------------------------------------
>HP9000/500 3 CPUs     5.9   6.0   6.3   8.4  10.5   14.7   21.5   27.8
>HP9000/825 HPUX1.2   29.1  58.1  87.2 116.3 145.6  205.0  291.5  350.1
>-----------------------------------------------------------------------
>                           CPU Time
>-----------------------------------------------------------------------
>                                  Number of Tasks
>System                 1     2     3     4     5      7     10     12
>-----------------------------------------------------------------------
>HP9000/500 3 CPUs     5.8  11.8  18.4  24.4  30.7   43.0   62.2   81.4
>HP9000/825 HPUX1.2   29.0  57.9  87.0 116.0 144.7  202.7  288.5  346.5
>-----------------------------------------------------------------------

I tried your benchmark also, and got results that look the way they
should, at least to my mind.  The 9000/840 is the first RISC
model they produced (for HP-UX), implemented in TTL technology, has
4.5 MIPS, while the 825 is an NMOS machine and 7 MIPS (if my memory
serves me right).

             A Multitasking, CPU intensive Benchmark

                          Real Time
-----------------------------------------------------------------------
                                  Number of Tasks
System                 1     2     3     4     5      7     10    12
-----------------------------------------------------------------------
HP9000/500 3 CPUs     5.9   6.0   6.3   8.4  10.5   14.7   21.5   27.8
HP9000/825 HPUX1.2   29.1  58.1  87.2 116.3 145.6  205.0  291.5  350.1
HP9000/840 HPUX1.2-   3.0   7.8   9.9  11.6  14.3   20.2   28.9   37.4
HP9000/840 HPUX1.2+   2.1   3.9   5.8   7.8  10.9   16.2   26.0   32.3
"-": no optimization for the FORTRAN compilation, "+": full optimization
-----------------------------------------------------------------------
                           CPU Time
-----------------------------------------------------------------------
                                  Number of Tasks
System                 1     2     3     4     5      7     10     12
-----------------------------------------------------------------------
HP9000/500 3 CPUs     5.8  11.8  18.4  24.4  30.7   43.0   62.2   81.4
HP9000/825 HPUX1.2   29.0  57.9  87.0 116.0 144.7  202.7  288.5  346.5
HP9000/840 HPUX1.2-   2.8   5.8   8.6  11.3  14.5   20.7   29.0   35.3
HP9000/840 HPUX1.2+   2.0   3.9   5.9   8.0   9.8   13.6   19.6   24.6
"-": no optimization for the FORTRAN compilation, "+": full optimization
-----------------------------------------------------------------------
NOTES:
HP9000/500: 6.5 MBytes main memory, 3 floating point CPUs, 65MByte
            system, 55MByte /tmp disk, 132MByte user disk, 571MByte
            data disk (Used by virtual memory), HP-UX 5.21.

HP9000 series 825 (HP Precision Architecture, RISC machine)  16 MBytes of
            main memory, single 404MByte disk drive.  HP Demo, 3/23/88.
            HP-UX 1.2 (Also tried it on HP-UX 2.0 pre-release with slightly
            worse results).

HP9000 series 840 (HP Precision Architecture, RISC machine, TTL
            technology).  8 Mb of main memory, single 570 Mb disk
            drive.  HP-UX 1.2.  Tests run with a very small load, five
            users logged in (using a Bridge terminal server, and
            ARPA/Berkeley running in the 840).  HP gives the machine
            the nominal performance index 4.5 MIPS, as opposed to the
            7 MIPS for the 825.

-----------------------------------------------------------------------

>Is the 825 really that bad?  Could there be a problem with the 825
>I tested?

I suspect there was.

-- 
-----------------------------------------------------------------------------
! Jarmo Sorvari                         Control Engineering Laboratory      !
! ...!mcvax!tut.fi!jarmo                Tampere University of Technology    !
--------------------------------------- BOX 527, 33101 Tampere, Finland -----

rclark@bgphp1.UUCP (Roger N. Clark) (04/01/88)

WELL, my posting has certainly generated a lot of response!!

      THERE WAS A PROBLEM WITH THE HP9000/825 TESTED!!!

        HP 825 math IS NOT 15x slower than a 500!
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

HP 825 math is about 1.2x FASTER than a 500 (for the box filter benchmark).
                     ^^^^^^^^^^^^

The 825 math is barely faster than a 500 in a multitasking
environment.   In my opinion, the 825 does not represent a large
enough increase to justify trading in the 500 (which in my case
would require about $40,000).  I am concerned (and so are my users)
that any replacement machine be as solid as the 500 (e.g. as of this
writing we have been up 136 days and have never had a crash).

Can anyone tell me their experience with the HP9000 series 825 or
300 (or for that matter other machines in this price/speed class)?
Are there other machines that are as solid as the 500?

The problem on the 825 turned out to be that the floating point chip
was not working (shouldn't there have been a check of all parts of
the system at boot time and the problem reported?)  Would a 500
report such a problem?  Would a 300?

When the problem first became apparent, Hal Shearer (HP) worked very hard to
try and find out why.  We tried different configurations.  We did the test
on an 840 with much better results (I didn't keep them because I did
not run my entire set and we were trying to figure out why the 825
was so slow).  We also tried the benchmarks on a pre-release
version of HP-UX 2.0 on the 825.  After several weeks of not finding any
reason for the slow results, I decided to post a note to the net.  I
am sorry that the note may have made the 825 and HP look
unjustifiably bad (but it was their machine).  Hal is now trying to
find out why the 825 didn't report something was not working
correctly.

The numbers below show much better results.  I have also included
numbers for an 840 (thanks to: Jarmo Sorvari, Finland) and an 835
(thanks to: Bob Montgomery <hpfcse!hpfcmr!bobm>)

It looks like the 835 is a fast machine!




             A Multitasking, CPU intensive Benchmark
                         (03/31/88)

                          Real Time
-----------------------------------------------------------------------
                                  Number of Tasks
System                 1     2     3     4     5      7     10     12
-----------------------------------------------------------------------
HP9000/835 HP-UX:2.0  0.5   1.0   1.5   2.0   2.4    3.4    4.9    5.9
HP9000/825 HP-UX:1.2  1.9   3.8   5.7   7.6   9.5   13.3   19.1   22.8
HP9000/500 3 CPUs     5.9   6.0   6.3   8.4  10.5   14.7   21.5   27.8
HP9000/840 HP-UX:1.2  2.1   3.9   5.8   7.8  10.9   16.2   26.0   32.3
-----------------------------------------------------------------------
                           CPU Time (user + sys)
-----------------------------------------------------------------------
                                  Number of Tasks
System                 1     2     3     4     5      7     10     12
-----------------------------------------------------------------------
HP9000/835 HP-UX:2.0  0.4   1.0   1.4   1.9   2.4    3.4    4.9    5.9
HP9000/825 HP-UX:1.2  1.9   3.8   5.6   7.5   9.4   13.2   18.9   22.7
HP9000/500 3 CPUs     5.8  11.8  18.4  24.4  30.7   43.0   62.2   81.4
HP9000/840 HP-UX:1.2  2.0   3.9   5.9   8.0   9.8   13.6   19.6   24.6
-----------------------------------------------------------------------

NOTES:

HP9000 series 835 (HP Precision Architecture, RISC machine)
            HP-UX 2.0.  HP Fort Collins machine, conducted by 
            Bob Montgomery: hpuecoa!hpfcse!hpfcmr!bobm Mar 31, 1988
            (the twelve task values were extrapolated from 10 tasks)

HP9000 series 825 (HP Precision Architecture, RISC machine)  16 MBytes of
            main memory, single 404MByte disk drive.  HP Demo, 3/31/88.
            HP-UX 1.2.

HP9000/500: 6.5 MBytes main memory, 3 floating point CPUs, 65MByte
            system, 55MByte /tmp disk, 132MByte user disk, 571MByte
            data disk (Used by virtual memory), HP-UX 5.21.

HP9000 series 840 (HP Precision Architecture, RISC machine, TTL
            technology).  8 Mb of main memory, single 570 Mb disk
            drive.  HP-UX 1.2.  Tests run with a very small load, five
            users logged in (using a Bridge terminal server, and
            ARPA/Berkeley running in the 840).  HP gives the machine
            the nominal performance index 4.5 MIPS, as opposed to the
            7 MIPS for the 825.  FROM:
            Jarmo Sorvari           Control Engineering Laboratory
            !mcvax!tut.fi!jarmo     Tampere University of Technology
                                    BOX 527, 33101 Tampere, Finland
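
The "about 1.2x" and "about 3x" figures quoted in this thread can be
re-derived from the Real Time table above.  What follows is only a small
sketch in Python (not a language used anywhere in the original tests); the
names TASKS, REAL_TIME and speedup are made up for illustration, and the
numbers are copied from the 825 and 500 rows without restating their units.

```python
# Real (wall-clock) times copied from the 03/31/88 "Real Time" table.
# The units are whatever the table uses; only the ratios matter here.
TASKS = [1, 2, 3, 4, 5, 7, 10, 12]
REAL_TIME = {
    "HP9000/825": [1.9, 3.8, 5.7, 7.6, 9.5, 13.3, 19.1, 22.8],
    "HP9000/500 3 CPUs": [5.9, 6.0, 6.3, 8.4, 10.5, 14.7, 21.5, 27.8],
}


def speedup(baseline, candidate):
    """Return baseline_time / candidate_time for each task count."""
    return [b / c for b, c in zip(baseline, candidate)]


# Ratio > 1 means the 825 finished sooner than the 3-CPU 500.
ratios = speedup(REAL_TIME["HP9000/500 3 CPUs"], REAL_TIME["HP9000/825"])

if __name__ == "__main__":
    for n, r in zip(TASKS, ratios):
        print(f"{n:2d} tasks: 500 time / 825 time = {r:.2f}")
```

With one task the ratio is a little over 3, which matches the single-cpu
comparison; from three tasks up, where all three 500 cpus are busy, it
stays between roughly 1.1 and 1.2.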



CONCLUSIONS:
The 500 is still a good machine.  BRING BACK THE 500!!!!
Or: give the 500 owners a deal they can't refuse on 835's!

Thanks to Hal and Bob at HP.  Everyone has been very courteous.  HP
has some very good products.  It is unfortunate that I ran into what
is most likely a very unusual problem that resulted in a lot of
confusion.

Roger N. Clark
U.S. Geological Survey, MS 964
Box 25046 Federal Center
Denver, CO 80225-0046
{known-world}!hplabs!hpfcla!hpfcse!hpuecoa!bgphp1!rclark
(Any opinions expressed here are mine and not necessarily those of
the USGS)

rclark@bgphp1.UUCP (Roger N. Clark) (04/01/88)

I should qualify the results of the 500 versus 825 speeds.  If you
do not have a fully configured 500 (3 floating point cpus, 6+
megabytes of memory) and run multitasking, then the 825 might be a
real benefit.  After all it is about 3x faster than a single cpu
500.  It just happens that in my case I have about 10 people sharing
the cpus.  How many scientists/engineers can afford $50k+ machines
dedicated to one person, one task?

daryl@hpcllcm.HP.COM (Daryl Odnert) (04/04/88)

Word has been circulating here at HP that Roger Clark's S825 had
a bad floating-point coprocessor in it.  One of the features of the
system, however, is that if no floating-point coprocessor is present
in the hardware, the floating-point instructions are emulated in
software.  This accounts for the poor performance of this benchmark.
Apparently, the system believed that no coprocessor was available.

Can you verify this for those of us who are following this on notes, Roger?

Thanks,
Daryl Odnert
HP Computer Language Lab
hplabs!hpcllcm!daryl

daryl@hpcllcm.HP.COM (Daryl Odnert) (04/04/88)

> HP 825 math is about 1.2x FASTER than a 500 (for the box filter benchmark).

Roger... did you optimize the application this time (using the -O option)
before doing the timings?  Your posting on 3/31 did not say whether or not
optimization was selected.

Daryl Odnert
HP Computer Language Lab
hplabs!hpcllcm!daryl

campbelr@hpsel1.HP.COM (Bob Campbell) (04/05/88)

> . . . . . . . . . . . . . . .  I am concerned (and so are my users)
> that any replacement machine be as solid as the 500 (e.g. as of this
> writing we have been up 136 days and have never had a crash).
> 
> Can anyone tell me their experience with the HP9000 series 825 or
> 300 (or for that matter other machines in this price/speed class)?
> Are there other machines that are as solid as the 500?
> 
> Roger N. Clark
> U.S. Geological Survey, MS 964
> Box 25046 Federal Center
> Denver, CO 80225-0046
> {known-world}!hplabs!hpfcla!hpfcse!hpuecoa!bgphp1!rclark
> (Any opinions expressed here are mine and not necessarily those of
> the USGS)
> ----------

We recently celebrated the fact that 800 series computers had been shipping
for one year with no failures.  The testers of HP have little desire to rest
on past accomplishments and would always like to hear of problem areas.

I believe that in the area of powerfail recovery, the 800 series may
be the most reliable system yet.  Of course I am biased and the 300 series
folks might have a thing to say :-)  Hopefully the responses to the problem
left you with a feeling that we do try to work for our reputation.

Bob Campbell                Some times I wish that I could stop you from 
campbelr@hpda.hp.com        talking, when I hear the silly things you say.
Hewlett Packard                                    - Elvis Costello
HP-UX System Interface & Recovery Testing

rclark@bgphp1.UUCP (Roger N. Clark) (04/05/88)

> Word has been circulating here at HP that Roger Clark's S825 had
> a bad floating-point coprocessor in it. ...
> 
> Can you verify this for those of us who are following this on notes, Roger?
> 
> Thanks,
> Daryl Odnert
> HP Computer Language Lab
> hplabs!hpcllcm!daryl

That is correct (except that it was HP's 825!).

I think my last posting should have cleared up the
confusion.  Again, sorry for the problems.  But for HP to answer:
shouldn't the 825 have gone through some sort of check at boot time and
told us if something was wrong?  The 825 math seems to be about 3x
faster than a single cpu 500 for problems involving normal math (+ -
/ *) on arrays.

Roger N. Clark

rclark@bgphp1.UUCP (Roger N. Clark) (04/06/88)

> Roger... did you optimize the application this time (using the -O option)
> before doing the timings?  Your posting on 3/31 did not say whether or not
> optimization was selected.

I keep no flags turned on as default in the make file because every
machine is different.  I take every advantage of the particular
machine, so if it has optimization, I use it.   If it has a floating
point accelerator (e.g. Sun 3s, HP 9000/350s) I use it too.
Roger

rclark@bgphp1.UUCP (Roger N. Clark) (04/06/88)

> We recently celebrated the fact that 800 series computers had been shipping
> for one year with no failures.
> 
> Bob Campbell                Some times I wish that I could stop you from 
> campbelr@hpda.hp.com        talking, when I hear the silly things you say.
> Hewlett Packard                                    - Elvis Costello
> HP-UX System Interface & Recovery Testing

That is incredible!  But do I take it the HP Neeley Sales Region
825 that I did my benchmark on with the bad floating point is the
FIRST failure?  Or does that machine not count (because it wasn't
shipped to a customer)?  In any event, it is very impressive.  Could
HP give some indication of how many 825 years that is (at least to
an order of magnitude or so)?  What is the MTBF for a typical cpu +
memory + I/O boards?

Roger N. Clark
bgphp1!rclark

wunder@hpcea.CE.HP.COM (Walter Underwood) (04/09/88)

   But for HP to answer: shouldn't the 825 have gone through some sort of
   check at boot time and told us if something was wrong? 

   Roger N. Clark

That question has already come up internally.  The system already
does the check -- note that it automatically decided to emulate
the FP stuff when it noticed that the FP unit was broken.

Obviously, the system should log an error.

wunder
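
The behaviour discussed here (check the FP unit, fall back to software
emulation, and say so) could be sketched roughly as below.  This is a
hypothetical illustration in Python, not HP-UX code: select_fp_backend,
the probe_coprocessor callable and the "fp_selftest" logger name are all
invented for the example.

```python
import logging

logger = logging.getLogger("fp_selftest")  # hypothetical logger name


def select_fp_backend(probe_coprocessor):
    """Pick a floating-point backend and report any fallback.

    probe_coprocessor is a hypothetical callable that returns True when
    the hardware FP unit is present and passes its self-test.
    """
    try:
        healthy = bool(probe_coprocessor())
    except Exception as exc:
        # A probe that blows up is treated the same as a failed unit.
        logger.error("FP probe raised %r; using software emulation", exc)
        return "emulation"
    if healthy:
        return "hardware"
    # The silent version of this branch is what caused the confusion:
    # emulation still gives correct answers, just much more slowly.
    logger.error("FP unit missing or failed self-test; using software emulation")
    return "emulation"
```

The point of the sketch is only that the fallback and the error report
live in the same branch, so emulation can never be selected quietly.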

irf@kuling.UUCP (Bo Thide) (04/20/88)

In article <8870004@hpsel1.HP.COM> campbelr@hpsel1.HP.COM (Bob Campbell) writes:
>> . . . . . . . . . . . . . . .  I am concerned (and so are my users)
>> that any replacement machine be as solid as the 500 (e.g. as of this
>> writing we have been up 136 days and have never had a crash).
>> 
>> Can anyone tell me their experience with the HP9000 series 825 or
>> 300 (or for that matter other machines in this price/speed class)?
>> Are there other machines that are as solid as the 500?
>> 
>
>I believe that in the area of powerfail recovery, the 800 series may
>be the most reliable system yet.  Of course I am biased and the 300 series
>folks might have a thing to say :-)  Hopefully the responses to the problem

I have had my 350 for 5 months now and, wow, am I pleased with it.
Not a single problem so far.  The machine booted up in late November and
the HP-UX hasn't crashed once. (I also have the Pascal Workstation and
HP BASIC operating systems installed on the same disc as HP-UX and Pascal
has crashed twice, probably since I have been playing around with special
ADC and FFT hardware which I'm trying to connect to the DIO bus).

Already from the start I found the 350 to be a VERY FAST machine and after
installing the HP FPA it really flies.  Below are some simple 350
benchmarks and comparisons with the 540.

-Bo

------------------------------------------------------------------------------
    Benchmark results for HP9000/350 (8 MByte DRAM) running HP-UX 5.5.

Below are printouts from standard FORTRAN77 programs with straight ANSI code
(compiled with 'f77 -O') according to "HP 9000 Computers Series 200 and 500
Performance Guide" (HP 5953-9405 11/83).  The program contains 5 consecutive
DO-loops.  The first loop, used for estimating loop overhead, only assigns
a constant to a dummy variable.  The other loops do the same assignments plus
additions, subtractions, multiplications, and divisions, respectively.  All
loops are run 1 000 000 times and are timed individually by using the
internal clock and are corrected for the loop overhead.

The programs were run in a 16 user HP-UX Unix environment but with real-time
priority ('rtprio') = 0.  No assembler code or any other tricks were used.
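
For readers who want to see the shape of the measurement, here is a rough
sketch of the same overhead-corrected scheme.  It is written in Python,
not the FORTRAN77 of the Performance Guide program, and the names
run_benchmark, _time_loop and _assign_only are made up for illustration.
Interpreter overhead dominates in Python, so the absolute numbers are not
comparable with the ones below; only the method is the same.

```python
import operator
import time


def _assign_only(p, q):
    """Baseline body: hand back an operand, no arithmetic."""
    return p


OPS = {
    "adds": operator.add,
    "subtracts": operator.sub,
    "multiplies": operator.mul,
    "divides": operator.truediv,
}


def _time_loop(fn, n, a, b):
    """Time n iterations of x = fn(a, b) and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        x = fn(a, b)
    return time.perf_counter() - start


def run_benchmark(n=1_000_000, a=1.5, b=2.5):
    """Return the loop overhead and the overhead-corrected time per operation."""
    overhead = _time_loop(_assign_only, n, a, b)
    results = {"overhead": overhead}
    for name, fn in OPS.items():
        # Each loop is timed on its own, then corrected for loop overhead.
        results[name] = _time_loop(fn, n, a, b) - overhead
    return results


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name:10s} {seconds:8.3f} seconds")
```

Because the correction is a difference of two noisy timings, a corrected
value can come out near zero or even slightly negative on a fast machine.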

--------------------------------------------------------------------bt-880216-

Without floating point accelerator:

   Loop overhead is        .48 seconds
   Time for 1000000 REAL*4 adds is       3.95 seconds
   Time for 1000000 REAL*4 subtracts is       3.95 seconds
   Time for 1000000 REAL*4 multiplys is       4.35 seconds
   Time for 1000000 REAL*4 divides is       4.72 seconds

   Loop overhead is        .50 seconds
   Time for 1000000 REAL*8 adds is       3.92 seconds
   Time for 1000000 REAL*8 subtracts is       3.93 seconds
   Time for 1000000 REAL*8 multiplys is       4.70 seconds
   Time for 1000000 REAL*8 divides is       6.25 seconds


With floating point accelerator:

   Loop overhead is        .47 seconds
   Time for 1000000 REAL*4 adds is        .82 seconds
   Time for 1000000 REAL*4 subtracts is        .80 seconds
   Time for 1000000 REAL*4 multiplys is        .82 seconds
   Time for 1000000 REAL*4 divides is       1.97 seconds

   Loop overhead is        .47 seconds
   Time for 1000000 REAL*8 adds is        .83 seconds
   Time for 1000000 REAL*8 subtracts is        .82 seconds
   Time for 1000000 REAL*8 multiplys is        .87 seconds
   Time for 1000000 REAL*8 divides is       3.33 seconds
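
The gain from the FPA can be read off the two listings above by dividing
the times pairwise.  A small sketch of that arithmetic, in Python, with
the figures copied from the 350 output (WITHOUT_FPA, WITH_FPA and
fpa_speedup are names invented for this example):

```python
# Overhead-corrected times in seconds, copied from the 350 listings.
WITHOUT_FPA = {
    "REAL*4": {"adds": 3.95, "subtracts": 3.95, "multiplys": 4.35, "divides": 4.72},
    "REAL*8": {"adds": 3.92, "subtracts": 3.93, "multiplys": 4.70, "divides": 6.25},
}
WITH_FPA = {
    "REAL*4": {"adds": 0.82, "subtracts": 0.80, "multiplys": 0.82, "divides": 1.97},
    "REAL*8": {"adds": 0.83, "subtracts": 0.82, "multiplys": 0.87, "divides": 3.33},
}


def fpa_speedup(without=WITHOUT_FPA, with_fpa=WITH_FPA):
    """Return time_without / time_with for every precision and operation."""
    return {
        prec: {op: without[prec][op] / with_fpa[prec][op] for op in ops}
        for prec, ops in without.items()
    }


if __name__ == "__main__":
    for prec, ops in fpa_speedup().items():
        for op, ratio in ops.items():
            print(f"{prec} {op:10s} {ratio:5.2f}x")
```

On these figures the FPA makes adds, subtracts and multiplies roughly
five times faster, while divides gain only about a factor of two.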

------------------------------------------------------------------------

For comparison, here is the same test run on an HP9000/540 with
a FOCUS II CPU (including FPA):

   Loop overhead is       4.52 seconds
   Time for 1000000 REAL*4 adds is       3.15 seconds
   Time for 1000000 REAL*4 subtracts is       2.70 seconds
   Time for 1000000 REAL*4 multiplys is       3.30 seconds
   Time for 1000000 REAL*4 divides is       4.50 seconds

   Loop overhead is       4.52 seconds
   Time for 1000000 REAL*8 adds is       3.90 seconds
   Time for 1000000 REAL*8 subtracts is       3.60 seconds
   Time for 1000000 REAL*8 multiplys is       4.18 seconds
   Time for 1000000 REAL*8 divides is       5.23 seconds

------------------------------------------------------------------------
-- 
>>> Bo Thide', Swedish Institute of Space Physics, S-755 90 Uppsala, Sweden <<<  Phone (+46) 18-300020.  Telex: 76036 (IRFUPP S).  UUCP: ..enea!kuling!irfu!bt