[comp.arch] For a good time, read...

bcase@Apple.COM (Brian Case) (04/06/88)

There is a wonderful article in EE Times this week.  Starting on page 49
and continuing on page 54, the article, entitled "CISC beats RISC in test,"
summarizes the results of a battery of tests performed by Neal Nelson &
Associates.  They compared comparably-configured (say it three times fast)
workstations.  The SUN-3 was a 25 MHz CPU with 16 Meg of memory.  The other
computers were the two models of the IBM RT (slug city), the Intergraph
32C (slightly less sluggish), the MIPS M-500, the SUN-4, and HP's 9000 and
825.  The results seem to show that as the number of running processes
goes up, the advantage of RISC drops.  The crossover point was often 12
processes (the UNIX kernels of the RISC machines must have had a clause
"if (procs >= 12) becomeCISC ();"  :-) :-).  On at least one test, the
SUN-3 ran 18 times faster than the Intergraph! I recommend this article
for some amusing reading!

Of course, I suspect the SUN-3 kernel is highly tuned and the others are
not as much so; also, what disk interface do these machines use?  SCSI,
ESDI, SMD?  And, note that the SUN is running at 25 MHz, while the MIPS
and IBM systems are running 8 and 10 MHz (6 MHz for the old model!).
Also note that the Intergraph is running at 30 MHz!

Whoever uses the highest-speed disk interface will likely win.  So, this
is probably less of a processor comparison than a system comparison.  Read
the article, then make comments.

csg@pyramid.pyramid.com (Carl S. Gutekunst) (04/06/88)

In article <7841@apple.Apple.Com> bcase@Apple.COM (Brian Case) writes:
>There is a wonderful article in EE Times this week.  Starting on page 49
>and continuing on page 54, the article, entitled "CISC beats RISC in test,"
>summarizes the results of a battery of tests performed by Neal Nelson &
>Associates.

I have not seen the article; I am familiar with the Neal Nelson benchmark and
would not take seriously any conclusions drawn from it.

The code is written to be "unoptimizable." What this means is that it is
completely non-structured -- FORTRAN with semicolons, vast runs of intertangled
"if () goto" statements and labels, no locality to speak of. Totally contrary
to normal C programming methodology, and surely a nightmare for the
instruction cache.

Far worse, it uses *NO* local variables. Everything is declared static. This
kills machines that depend on storing frequently-used variables in registers
(and do not have register allocation in the loader).

In both cases -- no locality, no automatic variables -- the benchmark is using
constructs that RISC architects have determined (correctly) are not typically
used in "real" code. No wonder the RISC machines come out looking bad!

(Disclaimer: These are my personal observations, which have no relationship to
the opinions of Pyramid Technology, and are based almost entirely on hearsay.)

<csg>

walter@garth.UUCP (Walter Bays) (04/07/88)

In article <7841@apple.Apple.Com> bcase@Apple.COM (Brian Case) writes:
> There is a wonderful article in EE Times this week.  Starting on page 49
> and continuing on page 54, the article, entitled "CISC beats RISC in test,"
> summarizes the results of a battery of tests performed by Neal Nelson &
> Associates.

Although there are a number of weaknesses in the Neal Nelson benchmarks,
they make a significant contribution in assessing multi-user
performance.  Even single-user workstations will usually have multiple
processes running.  Simple benchmarks like Whetstone and Dhrystone are
not by themselves good predictors of application performance in such an
environment.

If you're selecting a system, your benchmark should accurately
represent your real workload:  single user, multi-user, embedded
system, floating point, integer, array references, etc.  ONE SIZE DOES
NOT FIT ALL.  Different CPU's are better for different applications.

The Neal Nelson benchmarks were developed when CPU's were much slower,
and timing accuracy has degraded accordingly.  On fast CPU's, times
reported for one copy of the program are generally around 1 second,
plus or minus 1 second.  Neal Nelson points this out, and recommends
that results between 15 and 20 copies be compared for greater
accuracy.  These times are still often in the 10-60 second range, so
the accuracy is less than it should be.  However at 15-20 concurrently
active users - 30-200 logged users - memory, paging, and disk effects
dominate over CPU speed.

> They compared comparably-configured (say it three times fast)
                ^^^^^^^^^^ ^^^^^^^^^^
> workstations.  The SUN-3 was a 25 MHz CPU with 16 Meg of memory.  The other
> computers were the two models of the IBM RT (slug city), the Intergraph
> 32C (slightly less sluggish), the MIPS M-500, the SUN-4, and HP's 9000 and
                     ^^^^^^^^ [see published results below]
> 825.

They are _not_ comparably configured workstations.  6 MB is _not_
comparable to 32 MB when running multi-user applications.  Though
the article didn't give details of machine configurations, the
Intergraph appears to be an old model with 6 MB of memory and old
system software.  The article does not state which Sun models
were tested, but appears to be based on a Neal Nelson report comparing
a Sun 3/260 with 16 MB of memory and a 20 ms disk against a Sun 4/280
with 32 MB of memory and an 18 ms disk.  If the results were from the
less expensive Sun 4/110 which has no cache, we would expect the 4/280
to run faster.  MIPS has two models above the M-500.

Both in Intergraph workstations and in Clipper PC-AT add-in cards, we
generally use 4-6 MB for single-user machines, and 8-16 MB for several
users.  Most of Intergraph's current models come with 16-80 MB.

> The results seem to show that as the number of running processes
> goes up, the advantage of RISC drops.  The crossover point was often 12
> processes (the UNIX kernels of the RISC machines must have had a clause
> "if (procs >= 12) becomeCISC ();"  :-) :-).

The only results published in EE Times (4/4/88) were for an unspecified
benchmark, but it's probably "Test 1", a "normal" mix of calculations
and I/O:

# of simultaneous
copies
              IBM  Intergraph  MIPS           IBM     HP-9000   HP
      Sun-3   RT-25    32C    M-500   Sun-4  RT-115    /840     825
 1       2      12       2       3       2       4       4       2
 3       6      37       5       6       6      12      10       7
 5       6      65      10      10       9      20      18      11
 7      12      87      12      14      13      27      24      15
 9      15     113      17      18      17      36      31      19
11      19     135      21      22      21      44      37      24
13      23     163      24      26      25      53      44      28
15      26     192      30      29      30      62      50      32

Averaging these results for each machine gives:

              IBM  Intergraph  MIPS           IBM       HP      HP
copies Sun-3   RT-25    32C    M-500   Sun-4  RT-115    9000    825
 8      13.6   100.5    15.1    16      15.4    32.2    27.2    17.2

On this benchmark, Intergraph is the fastest of the RISC machines.
That hardly supports the characterization of it as "sluggish".  Does
this mean that Intergraph is faster than all the other RISC machines on
every workload?  Of course not!  Does it mean that the Sun-3 is faster
than all RISC machines on every workload?  Of course not!!

> On at least one test, the SUN-3 ran 18 times faster than the Intergraph!

Most likely, with many copies of a large program running, the 6 MB
Intergraph was paging itself to death, while the 16 MB Sun-3 ran in
memory.  The 24 MB HP-9000 and 32 MB Sun-4 were probably quite happy,
too.

> I recommend this article for some amusing reading!
>
> Of course, I suspect the SUN-3 kernel is highly tuned and the others are
> not as much so; also, what disk interface do these machines use?  SCSI,
> ESDI, SMD?  And, note that the SUN is running at 25 MHz, while the MIPS
> and IBM systems are running 8 and 10 MHz (6 MHz for the old model!).
> Also note that the Intergraph is running at 30 MHz!

The old 32C's, though fast for 1985, used some fairly slow disks, slow
compilers, slow I/O co-processor, and untuned kernels compared to
current models.  Also, current Intergraph Clipper C100 models run at 33
MHz.  The new Clipper C300 (3Q88) runs at 50 MHz, and has some other
internal speed-ups.

> Whoever uses the highest-speed disk interface will likely win.  So, this
> is probably less of a processor comparison than a system comparison.

Right.  It's a system comparison, and the systems are not comparably
configured.  Neal Nelson and EE Times make a valid point, that the
performance advantage of RISC generally lessens with increased user
load.  This point deserves more discussion in this forum.  There are
two main architectural reasons for this effect, both of which were
specifically addressed in the design of the Clipper.

1) A load/store architecture depends on moving heavily used variables
to registers (via optimizing compilers) or to cache memory.  Context
switches tend to flush the cache and require saving registers.  The
Clipper uses a 2-way set associative cache instead of a direct mapped
cache.  A separate register set is provided for supervisor mode so,
although you have to save registers once per context switch, you don't
have to save them twice.

2) A sliding register window provides fast subroutine calls if the
depth is not greater than the number of hardware levels and does not
change too often (perfect for Dhrystones).  However, using register
windows, context switches require an excessive number of register
saves.  (We have seen the Sun-4 run 12 times the speed of a 780 on
Dhrystones, yet slower than a 780 on context switching.)  The Clipper
uses a conventional register architecture.

> Read the article, then make comments.

You certainly manage to instigate some lively discussions on the net.  I'm
sure this one will be no exception.
-- 
------------------------------------------------------------------------------
Any similarities between my opinions and those of the
person who signs my paychecks is purely coincidental.
E-Mail route: ...!pyramid!garth!walter
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303
Phone: (415) 852-2384
------------------------------------------------------------------------------

alan@pdn.UUCP (Alan Lovejoy) (04/07/88)

In article <7841@apple.Apple.Com> bcase@Apple.COM (Brian Case) writes:
>There is a wonderful article in EE Times this week.  Starting on page 49
>and continuing on page 54, the article, entitled "CISC beats RISC in test,"
>summarizes the results of a battery of tests performed by Neal Nelson &
>Associates.  They compared comparably-configured (say it three times fast)
>workstations.  The SUN-3 was a 25 MHz CPU with 16 Meg of memory.  The other
>computers were the two models of the IBM RT (slug city), the Intergraph
>32C (slightly less sluggish), the MIPS M-500, the SUN-4, and HP's 9000 and
>825.  

The other point that Nelson & Associates made was that the Sun-3 was
faster than most or all of the RISCs on some of their integer math
benchmarks (regardless of the number of running processes).  They pointed 
out that the apparent reason for this was that THEIR integer benchmarks
force lots of register-memory data shuffling (instead of keeping most
data in registers for extended periods).  They contend that this is the
more common case in business programming (I have no opinion on that).

What do y'all think of the proposition that business programs do more
memory-register data shuffling?  Could that cause a shift in the
"balance of power" between CISCs and RISCs?  Why or why not?  And if
not, how do you explain these results? Are their benchmarks simply 
flawed? (Careful, these people have a good reputation for knowing what
they're doing/talking-about).   

If Nelson & Associates have published the source code for their
benchmarks, perhaps someone could post them to the net for our
edification?

-- Alan "Inquiring minds want to know" Lovejoy
-- alan@pdn 

proctor@ingr.UUCP (John Proctor) (04/08/88)

The previously quoted EE Times article is interesting from several points of
view:

	1) No comparison is made regarding file systems, that is, whether
	   each is the standard AT&T file system, the Berkeley FFS, or some
	   other.  This has a very real bearing on the performance issue.

	2) No indication is given on architectural issues such as who is doing
	   the I/O operation. For example in the SUN case either the 68020 or
	   the SPARC chip handles all the I/O. In the Intergraph case an
	   80186 or 80386 does the I/O depending on the model.

	3) As for raw speed, there were figures in the report which indicate
	   the Intergraph machine to be up to 111 percent faster in
	   'computationally intensive' applications.  So much for picking
	   individual numbers!

All in all, the article had very little to do with performance issues, as it
failed to pinpoint architectural features and their performance impact,
whether positive or negative!  It was also guilty of very selective quoting,
thus placing doubt over the validity of the whole article.

Let the flames begin!


John D. Proctor		|  Usenet: {ihnp4,uunet}!ingr!jdp!proctor
Intergraph Corp.	| ARPAnet: uu.net.uunet@ingr!jdp!proctor
			| US Post: 1 Madison Industrial Park
Usual Disclaimers Apply	|	   Huntsville, AL 35807-4201

A foolish consistency is the hobgoblin of little minds, adored by little 
statesmen and philosophers and divines.

		"Self Reliance" by Ralph Waldo Emerson

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/08/88)

In article <592@garth.UUCP> walter@garth.UUCP (Walter Bays) writes:
[ on the topic of memory vs. users ]

I think that this is a good point, independent of your comments on
benchmarking. As the number of users (on most systems) becomes larger,
the memory per user becomes less.

This is caused by text sharing. When only a few users are using the
machine, it is not likely that they will be running the same program at
the same time. However, as the number of users increases, there are
certain programs which are in use by a lot of users. These would be the
shell(s) popular at the site; vi, emacs, or another editor; perhaps the C
compiler; perhaps nroff; etc.  This is one of the few saving graces of a
high user load.

This applies to BSD and SysV machines with users doing at least somewhat
the same thing.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

brucek@hpsrla.HP.COM (Bruce Kleinman) (04/08/88)

From the EE Times article in question ...
+-------
| The tests compared the Sun-3 to .... Hewlett-Packard's 9000 and 825 ....
+-------
The HP 9000 is a SERIES, not a computer.  The 9000 series encompasses a wide
variety of 68020-based boxes (models 330, 350, etc.) and Precision
Architecture boxes (models 825, 850, etc.).  Does anyone know which machine
'HP 9000' refers to?

And my favorite quote from the same ...
+------
| "But we still haven't seen any areas of study that say RISC has been 
| implemented and shown a marked improvement in real world applications."
+------
Remember that there are those who believe that we still haven't seen any
areas of study that say that smoking is bad for your health. :-)

I can see it now, on the top of every 80386 and 68020 ...
        The Surgeon General has determined that CISC architectures can
        be hazardous to your health.

dave@micropen (David F. Carlson) (04/08/88)

In article <2737@pdn.UUCP>, alan@pdn.UUCP (Alan Lovejoy) writes:
> In article <7841@apple.Apple.Com> bcase@Apple.COM (Brian Case) writes:
> >There is a wonderful article in EE Times this week.  Starting on page 49
> >and continuing on page 54, the article, entitled "CISC beats RISC in test,"

> not, how do you explain these results? Are their benchmarks simply 
> flawed? (Careful, these people have a good reputation for knowing what
> they're doing/talking-about).   
> 
> -- alan@pdn 

Consider this past winter's benchmarks of 16-bit OS/2 and 32-bit XENIX,
which showed that (*surprise*) the 32-bit code runs faster by a factor
almost identical to the empirical speed difference between 16-bit UNIX and
32-bit UNIX on the 80386.  The fact that Nelson published these benchmarks
and only afterward acknowledged that it was apples and oranges doesn't boost
my confidence in their "reputation for knowing what [they're] talking about."



-- 
David F. Carlson, Micropen, Inc.
...!{ames|harvard|rutgers|topaz|...}!rochester!ur-valhalla!micropen!dave

"The faster I go, the behinder I get." --Lewis Carroll

bcase@Apple.COM (Brian Case) (04/09/88)

In article <2006@ingr.UUCP> proctor@ingr.UUCP (John Proctor) writes:
>The previously quoted EE Times article is interesting from several points of
>view:
>
>All in all, the article had very little to do with performance issues as it
>failed to pinpoint architectural features and their performance impact both
>positive or negative! It was also guilty of very selective quoting thus 
>placing doubt over the validity of the whole article.

Right; I want to say that I didn't mean to pick on the Clipper; it is
obvious to me that the 18-times-slower number is really far-fetched.  Also,
I didn't mean for anyone to think that I thought this article has any
validity.  I just found it amusing and thought others would too.