[comp.arch] Snake

cgy@cs.brown.edu (Curtis Yarvin) (03/22/91)

In today's New York Times, there is an article about the new HP Snake line.
The story places the low-end Snake (720?) at 57 MIPS, 55 Specmarks for
$12,000.

This will obviously cramp the digestion of competing workstation makers.

Does anyone know how these numbers were achieved?  Are they misleading in
any way?

Curtis

"I tried living in the real world
 Instead of a shell
 But I was bored before I even began." - The Smiths

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (03/22/91)

In article <69465@brunix.UUCP> cgy@cs.brown.edu (Curtis Yarvin) writes:
| In today's New York Times, there is an article about the new HP Snake line.
| The story places the low-end Snake (720?) at 57 MIPS, 55 Specmarks for
| $12,000.

  Assuming that this is what it sounds like, the next question is
software. Does it run UNIX, and have X, and have {name it} application
software? 

  The workstation market can be divided into people who have source for
everything they run and are buying raw MIPS, and people who run
applications like Maxima, Interleaf, troff, etc, who are not in the
market for hardware which doesn't support their application.

  Depending on the software support this machine may not currently be a
player in the second market. This has happened to IBM somewhat, although
they have the money to pay someone to port an application.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

spot@CS.CMU.EDU (Scott Draves) (03/22/91)

In article <3284@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

   In article <69465@brunix.UUCP> cgy@cs.brown.edu (Curtis Yarvin) writes:
   | In today's New York Times, there is an article about the new HP Snake line.
   | The story places the low-end Snake (720?) at 57 MIPS, 55 Specmarks for
   | $12,000.

     Assuming that this is what it sounds like, the next question is
   software. Does it run UNIX, and have X, and have {name it} application
   software? 

I don't know, but it will probably run HP-UX like all its
predecessors.  Now, whether or not you call that unix is another
question... :) But seriously, HP-UX is a rather dusty, but reliable
version of SysV.  I'd live with it to get that much CPU.

My questions are:

When will they be available in volume, ie, when does it become real?
Until then, it's just a marketdroid scheme to hurt the competition and
grab publicity.

and

What technology/process are they using?  One chip?  Clock?
--

			christianity is stupid
Scott Draves		communism is good
spot@cs.cmu.edu		give up

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (03/23/91)

In article <3284@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>In article <69465@brunix.UUCP> cgy@cs.brown.edu (Curtis Yarvin) writes:
>| In today's New York Times, there is an article about the new HP Snake line.
>| The story places the low-end Snake (720?) at 57 MIPS, 55 Specmarks for
>| $12,000.
>
>  Assuming that this is what it sounds like, the next question is
>software. Does it run UNIX, and have X, and have {name it} application
>software? 

HP-UX <gag> initially. Berkeley 4.4 and OSF/1 to be available when released.
There is X. Not good X, but X. 

Since it's a member of the HP-PA RISC club, there is (already) a large base of
software developed for the machine. 

>  The workstation market can be divided into people who have source for
>everything they run and are buying raw MIPS, and people who run
>applications like Maxima, Interleaf, troff, etc, who are not in the
>market for hardware which doesn't support their application.
>
>  Depending on the software support this machine may not currently be a
>player in the second market. This has happened to IBM somewhat, although
>they have the money to pay someone to port an application.

Already a developed base of software, due to the earlier members of the HP-PA
club, so I understand. But, even if there are software porting problems
(doubtful, HP tends to take pains on this), 57 MIPS for $12K will get a lot of
people working on porting real quick. The floating point on the mid-level 
"snake" is supposed to be obscenely high. I have a friend at NWSC who is going
to purchase one.
..........................

The real question is: What will Digital do to save their bacon, while HP and
IBM explore performance, and Sun keeps chugging along in a commodity market?

     Reform may be dying in the Soviet Union, but we have the right to 
                introduce it to the DECUS Board of Directors. 

  -- >                  SYSMGR@CADLAB.ENG.UMD.EDU                        < --

jlol@REMUS.EE.BYU.EDU (Jay Lawlor) (03/23/91)

>>>>> On 22 Mar 91 13:58:08 GMT, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) said:

> In article <69465@brunix.UUCP> cgy@cs.brown.edu (Curtis Yarvin) writes:
> | In today's New York Times, there is an article about the new HP Snake line.
> | The story places the low-end Snake (720?) at 57 MIPS, 55 Specmarks for
> | $12,000.

Bill>   Assuming that this is what it sounds like, the next question is
Bill> software. Does it run UNIX, and have X, and have {name it} application
Bill> software? 

Bill>   The workstation market can be divided into people who have source for
Bill> everything they run and are buying raw MIPS, and people who run
Bill> applications like Maxima, Interleaf, troff, etc, who are not in the
Bill> market for hardware which doesn't support their application.

Bill>   Depending on the software support this machine may not currently be a
Bill> player in the second market. This has happened to IBM somewhat, although
Bill> they have the money to pay someone to port an application.

Well...  The one I tested (720, the low end desktop model) was running
HPUX, just like the 9000/800 series.  It ran binaries from our 835
without recompilation, although floating point seemed faster after
recompiling.  Times for the code I ran (floating point intensive) were
about the same as our RS/6000 540.  Very fast.

The X windows performance (using Motif window manager) was the best
I've seen on any machine, although X isn't exactly efficient for lots
of things.

linley@hpcuhe.cup.hp.com (Linley Gwennap) (03/26/91)

(Curtis Yarvin) asks:
> In today's New York Times, there is an article about the new HP Snake line.
> The story places the low-end Snake (720?) at 57 MIPS, 55 Specmarks for
> $12,000.
>
> This will obviously cramp the digestion of competing workstation makers.
>
> Does anyone know how these numbers were achieved?  Are they misleading in
> any way?
----------

Yes, they are misleading.  The performance on real applications (not
toy benchmarks) is actually significantly *higher* due to the much larger
caches (128KB I/256KB D) than competing systems.

The CPU in the Model 720 is a traditional RISC implementation (no "super"
stuff) that runs at 50 MHz using a fairly standard 1.0 micron CMOS process.
I could go into great detail, but basically the high performance is due
to eliminating most pipeline interlocks and keeping the chip simple
enough to allow the high clock frequency.  The compilers have also been
closely tuned to the hardware.  The PA-RISC instruction set is rich
enough to offer some instruction-level parallelism (e.g. COMPARE-AND-
BRANCH, ADD-AND-BRANCH instructions) so that superscalar complexities
are not needed.

Yes, these systems run UNIX (HP-UX) and most common applications, over
2000 total.  All Series 700 systems are also source-code compatible
with our popular Motorola-based workstations.  By the way, if you need
more performance, a 66 MHz Model 730 is available for $20,000.

						--Linley Gwennap
						  Hewlett-Packard

cs450a03@uc780.umd.edu (03/26/91)

Linley Gwennap writes:
  [ enough details on the Snake (+HP sig) to make it look like he knows
    what he's talking about ]

How's I/O on this thing?  Would use as a fileserver be a shameful 
waste?

Raul Rockwell

burdick@hpspdra.HP.COM (Matt Burdick) (03/27/91)

> There is X. Not good X, but X.

What's wrong with it?  It's normal X11R4.

						-matt
-- 
Matt Burdick                |   Hewlett-Packard
burdick@hpspd.spd.hp.com    |   Intelligent Networks Operation

lewine@cheshirecat.rtp.dg.com (Donald Lewine) (03/27/91)

In article <32580004@hpcuhe.cup.hp.com>, linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
|> 
|> Yes, these systems run UNIX (HP-UX) and most common applications, over
|> 2000 total.  All Series 700 systems are also source-code compatible
|> with our popular Motorola-based workstations.  By the way, if you need
|> more performance, a 66 MHz Model 730 is available for $20,000.
|> 
|> 						--Linley Gwennap
|> 						  Hewlett-Packard
|> 
Are the new PA machines binary compatible with the old ones?  Reading
the press, it sounds like there are new instructions in the new 
machines so that while the old programs will run, the full performance
will not be achieved unless you recompile.  Is this true?

--------------------------------------------------------------------
Donald A. Lewine                (508) 870-9008 Voice
Data General Corporation        (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580  U.S.A.

uucp: uunet!dg!lewine   Internet: lewine@cheshirecat.webo.dg.com

linley@hpcuhe.cup.hp.com (Linley Gwennap) (03/27/91)

Due to popular demand, here is an article comparing the new Snakes CPU to
IBM's "America" chip (used in the RS/6000 series).  I have deleted the
section on America.  I would be happy to post more info if this is useful.

						--Linley Gwennap
						  Hewlett-Packard

HP SNAKES CPU

HP's high-performance chip set consists of the "Snakes" CPU chip and a
floating point coprocessor ("FPC") jointly developed with Texas
Instruments[1].  These are the first chips to implement the PA-RISC 1.1
architecture.  They use a traditional RISC approach to achieve
industry-leading performance of 72 SPECmarks with a 66 MHz clock.

PA-RISC 1.1, an extension to the original PA-RISC architecture, includes
several new instructions, many of which accelerate graphics operations[2].
A multiply-and-add instruction (as in IBM's POWER) is included.  In
addition, the page size was doubled to 4 KB to reduce the TLB miss rate,
and eight "shadow" registers were added to provide quick context switching
for the TLB miss handler.

The CPU contains all integer instruction processing, cache control and
memory management functions.  All cache memory is included in external
SRAMs connected directly to the CPU.  Snakes has a 64-bit path to the
D-cache, just like the R4000.  Both the I- and D-caches can be accessed
simultaneously, resulting in a total cache bandwidth of 792 MB per second
(peak).  The FPC implements all floating point instructions.  It receives
instructions and data from the caches at the same time as the CPU, and
duplicates parts of the CPU's instruction pipeline, eliminating the
penalties often incurred by separate CPU and FPC chips.  Snakes is
designed to work with a variety of memory and I/O interfaces.
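
A quick check of the 792 MB/s figure, as a Python sketch.  It assumes one
transfer per clock on each cache path and a 32-bit I-cache path (the width
reported elsewhere in this thread, not in the article itself); the function
name is made up for illustration.

```python
def peak_cache_bandwidth_mb_s(clock_mhz, dcache_bits=64, icache_bits=32):
    """Peak cache bandwidth in MB/s (1 MB = 10**6 bytes).

    Assumes both caches deliver one full-width transfer every clock cycle.
    """
    bytes_per_cycle = (dcache_bits + icache_bits) // 8
    return clock_mhz * bytes_per_cycle


print(peak_cache_bandwidth_mb_s(66))  # 12 bytes per cycle at 66 MHz
```

Under the same assumptions, a 50 MHz part would peak at 600 MB/s.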

The CPU uses a five-stage pipeline to reduce cycle time.  The penalties in
this pipeline have been minimized.  For example, conditional branches are
executed with no delay if their outcome is predicted correctly, and with
only a single cycle penalty otherwise.  The branch prediction algorithm,
more advanced than America's, predicts forward branches to be untaken and
backward branches taken, thus optimizing for loops.  The load penalty is a
maximum of one cycle and the store penalty a maximum of two; these
penalties can usually be avoided by the compiler.  All other integer
instructions (except a few rare system control functions) are always
executed in a single cycle.  This uncomplicated design is reflected by a
simple, efficient compiler.
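
The static prediction rule described above can be sketched in a few lines
of Python.  This is only an illustration of "backward taken, forward
untaken" with the stated zero or one cycle cost; it does not model delay
slots, nullification, or anything else about the real pipeline.  The
function names and addresses are hypothetical, and treating a branch to its
own address as backward is an assumption.

```python
def predict_taken(branch_pc, target_pc):
    """Static rule: predict taken only if the target is not ahead of the branch."""
    return target_pc <= branch_pc


def branch_penalty_cycles(branch_pc, target_pc, actually_taken):
    """0 cycles when the prediction is right, 1 cycle otherwise (per the article)."""
    return 0 if predict_taken(branch_pc, target_pc) == actually_taken else 1


# A loop-closing backward branch executed 10 times: taken 9 times, then falls out.
outcomes = [True] * 9 + [False]
total = sum(branch_penalty_cycles(0x120, 0x100, taken) for taken in outcomes)
print(total)  # only the final loop exit is mispredicted
```

Under this rule a forward branch that is usually taken pays the one-cycle
penalty each time, which is why the scheme favors loops.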

Although Snakes is not superscalar, PA-RISC instructions such as ADD AND
BRANCH, MOVE AND BRANCH and COMPARE AND BRANCH allow an amount of
parallelism similar to America's for integer-only applications; in fact,
the ratio of Integer SPECmarks to MHz for Snakes (65/66) actually exceeds
America's (35/42).
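
Worked out, the two ratios quoted above come to roughly 0.98 and 0.83
integer SPECmarks per MHz.  A small Python sketch of the arithmetic, using
only the numbers given in the article (the function name is illustrative):

```python
def specmarks_per_mhz(specmarks, clock_mhz):
    """Benchmark score normalized by clock frequency."""
    return specmarks / clock_mhz


snakes = specmarks_per_mhz(65, 66)
america = specmarks_per_mhz(35, 42)
print(f"Snakes  {snakes:.2f} integer SPECmarks per MHz")
print(f"America {america:.2f} integer SPECmarks per MHz")
```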

FPC is a full 64-bit implementation.  It contains two parallel execution
units: the ALU (addition, conversion) and the MPY unit (multiply, divide,
square root).  Each unit can start a new operation on every other cycle,
so FPC can accept one floating point instruction per cycle provided that
ALU and MPY instructions are alternated.
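
The issue rule in the paragraph above can be modeled with a toy Python
sketch.  It assumes strict in-order issue of at most one instruction per
cycle and ignores operation latency and data dependencies, so it shows only
the "every other cycle per unit" constraint; the function and unit names
are illustrative, not taken from HP documentation.

```python
def issue_cycles(ops, unit_interval=2):
    """Return the cycle on which each op issues, in program order.

    ops is a sequence of "ALU" or "MPY".  Each unit can start a new
    operation only every `unit_interval` cycles.
    """
    next_free = {"ALU": 0, "MPY": 0}
    cycle = 0
    issued = []
    for op in ops:
        cycle = max(cycle, next_free[op])   # stall until the unit can start
        issued.append(cycle)
        next_free[op] = cycle + unit_interval
        cycle += 1                          # at most one issue per cycle
    return issued


print(issue_cycles(["ALU", "MPY"] * 3))  # alternating stream
print(issue_cycles(["ALU"] * 3))         # back-to-back ops on one unit
```

In this model the alternating stream issues on consecutive cycles, while a
run of operations on the same unit issues only every second cycle.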

The external caches are direct mapped and are protected by parity, making
them slightly less robust than America's ECC cache.  Cache coherency flags
are included to facilitate multiprocessor operation.  A write-back
protocol is used to reduce writes to main memory.  Although Snakes does
not implement America's complex "critical word first" algorithm on cache
misses, it will begin processing as soon as the critical word is obtained,
reducing the miss penalty by as much as seven cycles.  Snakes supports a
wide variety of off-the-shelf SRAMs and can be configured with anywhere
from 8 KB to 3 MB of external cache.  At its maximum operating frequency
of 66 MHz, it requires 12 ns SRAMs.

The I- and D-TLBs are fully associative and contain 96 entries each.  In
addition, each TLB implements four variable size "block" entries capable
of mapping up to 16 MB each, which can be used for large portions of the
operating system and/or graphics frame buffers.  The memory system
supports 48 bits (256 terabytes) of virtual address space and 32 bits
(4 gigabytes) of real address space.  (This is a subset of the full 64-bit
virtual space allowed by PA-RISC.)  Two addressing modes support 1 GB or
4 GB data segments, significantly larger than America's segments.
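
The sizes in this paragraph follow from powers of two.  A short Python
sketch of the arithmetic, assuming binary units (1 KB = 2**10 bytes) and
the 4 KB page size given earlier in the article; "reach" here just means
entries times mapped size, and the constant names are illustrative:

```python
KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

PAGE_BYTES = 4 * KB        # PA-RISC 1.1 page size, per the article
TLB_ENTRIES = 96           # per TLB (I and D each)
BLOCK_ENTRIES = 4          # variable-size "block" entries per TLB
BLOCK_MAX_BYTES = 16 * MB  # largest mapping per block entry

page_reach_kb = TLB_ENTRIES * PAGE_BYTES // KB          # ordinary entries
block_reach_mb = BLOCK_ENTRIES * BLOCK_MAX_BYTES // MB  # block entries, at most
virtual_tb = 2**48 // TB
real_gb = 2**32 // GB

print(page_reach_kb, "KB mapped by the 96 page entries")
print(block_reach_mb, "MB mapped by the four block entries (maximum)")
print(virtual_tb, "TB virtual,", real_gb, "GB real")
```

The gap between 384 KB and 64 MB is the reason block entries are useful
for large, rarely remapped regions such as a frame buffer.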

A separate bus provides access to memory, I/O and, if desired, graphics.
This bus is a synchronous, dedicated interface with a peak transfer rate
of 264 MB per second, about one-half the speed of America's memory system.
The bus bandwidth is limited by its width of 32 bits, but a wider bus
would have required a larger, more expensive package.  Snakes's cache miss
penalty, measured in cycles, is much higher than America's, due to the
shorter clock cycle time.  Snakes compensates for these penalties by
allowing for large external caches to reduce the miss rate; the
performance numbers for Snakes assume a 128 KB instruction cache and
256 KB data cache.
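
Two pieces of arithmetic sit behind this paragraph, sketched here in
Python.  The bandwidth check assumes the 32-bit bus moves one word per
66 MHz clock, which the article does not state explicitly.  The second
part uses a purely hypothetical 200 ns memory latency to show why the same
delay costs more cycles at a shorter cycle time; the 42 MHz figure for
America is taken from the 35/42 ratio quoted earlier.

```python
import math


def bus_bandwidth_mb_s(clock_mhz, width_bits=32):
    """Peak bus bandwidth in MB/s, assuming one transfer per clock."""
    return clock_mhz * width_bits // 8


def latency_in_cycles(latency_ns, clock_mhz):
    """Whole clock cycles needed to cover a fixed latency in nanoseconds."""
    return math.ceil(latency_ns * clock_mhz / 1000)


print(bus_bandwidth_mb_s(66))      # 4 bytes per cycle at 66 MHz

HYPOTHETICAL_LATENCY_NS = 200      # illustrative only, not a measured value
print(latency_in_cycles(HYPOTHETICAL_LATENCY_NS, 66))
print(latency_in_cycles(HYPOTHETICAL_LATENCY_NS, 42))
```

With these made-up numbers the faster clock waits 14 cycles against 9 for
the slower one, even though the elapsed time is identical.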

The CPU is fabricated in HP's CMOS-26 process (a 1.0 micron, three metal
layer process) and packaged in a 408-pin PGA.  FPC is fabricated in TI's
0.8 micron CMOS process and placed in a 207-pin PGA.  These PGAs were
custom-designed to allow high frequency operation with wide CMOS buses.
The CPU contains about 577,000 transistors, while FPC uses 640,000.  For
lower-cost systems, the chip set is designed to run at frequencies below
66 MHz, allowing lower-speed SRAMs to be used.  FPC can also be eliminated
to further reduce costs.

REFERENCES AND NOTES

[1] "CMOS PA-RISC Processor for a New Family of Workstations" by
M. Forsyth, S. Mangelsdorf, E. DeLano, C. Gleason and J. Yetter, COMPCON
Spring 91 Digest of Technical Papers, February 1991.

[2] "Architecture and Compiler Enhancements for PA-RISC Workstations" by
D. Odnert, R. Hansen, M. Dadoo and M. Laventhal, COMPCON Spring 91 Digest
of Technical Papers, February 1991.

darrylo@hpnmdla.hp.com (Darryl Okahata) (03/27/91)

In comp.arch, linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:

> (Curtis Yarvin) asks:
> > Does anyone know how these numbers were achieved?  Are they misleading in
> > any way?
>
> Yes, they are misleading.  The performance on real applications (not
> toy benchmarks) is actually significantly *higher* due to the much larger
> caches (128KB I/256KB D) than competing systems.

     I'd like to point out that the D-cache is 64 bits wide, to improve
floating-point performance.  The I-cache is only 32 bits wide, and comes
in either 128K or 256K configurations.

     -- Darryl Okahata
	UUCP: {hplabs!, hpcea!, hpfcla!} hpnmd!darrylo
	Internet: darrylo%hpnmd@relay.hp.com

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion or policy of Hewlett-Packard or of the
little green men that have been following him all day.

preston@ariel.rice.edu (Preston Briggs) (03/28/91)

linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:

[good info about the new HP chips]

>eight "shadow" registers were added to provide quick context switching  for
>the TLB miss handler.

I'm not sure I understand.  Could you expand slightly?

>conditional branches are
>executed with no delay if their outcome is predicted  correctly,  and  with
>only  a  single  cycle penalty otherwise.  The branch prediction algorithm,
>more advanced than America's, predicts forward branches to be  untaken  and
>backward  branches  taken, thus optimizing for loops.

The RS/6000 can rearrange loops so that there are no branch
delays (often with no branch cost at all).  That's hard to beat.
What happens with a fall-through and a forward branch?

>in fact, the ratio of Integer  SPECmarks  to  MHz  for  
>Snakes (65/66) actually exceeds America's (35/42).

Could you post results for individual SPEC programs (both int and float)?

>The external caches are direct mapped and are protected by  parity,  making
>them  slightly less robust than America's ECC cache. 

I would have liked some set-associativity too.  (I'm very greedy)

>will  begin  processing  as soon as the critical word is obtained, reducing
>the miss penalty by as much  as  seven  cycles.

What're the best and worst-case D-cache miss times (say, without writeback)?
Line length?  Will a cache-miss freeze the CPU or just lock the target
register?

>The I- and D-TLBs are fully associative

Hooray!

Thanks for the information.  Thanks also for the references.

Preston Briggs

jonathan@cs.pitt.edu (Jonathan Eunice) (03/28/91)

In article <69465@brunix.UUCP> cgy@cs.brown.edu (Curtis Yarvin) writes:

   In today's New York Times, there is an article about the new HP Snake line.
   The story places the low-end Snake (720?) at 57 MIPS, 55 Specmarks for
   $12,000.

   This will obviously cramp the digestion of competing workstation makers.

Yep. 

HP's low end is apparently performance-competitive with the high end of
the IBM RS/6000 line (a little better integer perf, not as good floating
point, dramatically better X perf), and much better than even the
highest-end workstations from Sun, DEC etc.  Look for serious
repricing, heavy discounting, and a lot of worrying from
competitors.

   Does anyone know how these numbers were achieved?  Are they misleading in
   any way?

Not really; just good engineering, running real fast.  

HP-PA is a pretty good RISC design, and they've tweaked it with some
handy cache-usage optimizations and better floating point in version
1.1.  The main win seems to be high speed CMOS fabrication --
50-someodd MHz on the low-end, just over 65 MHz on the high end.
Another win is large caches, which should be very handy indeed for X,
GNU, and other poor-locality-of-reference software, not to mention
large data sets.  

These machines are more-or-less workstations, with workstation-sized I/O
capacity.  So, how they will compare to SMP machines and "real" minicomputers
on throughput-oriented jobs is still in question.  But, their CPU 
performance looks excellent.

samf@perform.dell.com (Sam Fuller) (03/29/91)

In article <7410003@hpnmdla.hp.com>, darrylo@hpnmdla.hp.com (Darryl Okahata) writes:
  
|>      I'd like to point out that the D-cache is 64 bits wide, to improve
|> floating-point performance.  The I-cache is only 32 bits wide, and comes
|> in either 128K or 256K configurations.
|>

What does that mean? I assume a cache line is wider than 4 or 8 bytes.  Is this
the width of the processor to cache bus for I and D respectively?

Sam Fuller
Dell Computer
Advanced Systems 
samf@perform.dell.com
 

jpk@ingres.com (Jon Krueger) (03/29/91)

How many have you shipped?

Who has replicated your SPECmarks?

-- Jon
--

Jon Krueger, jpk@ingres.com 

jonathan@cs.pitt.edu (Jonathan Eunice) (03/30/91)

spot@CS.CMU.EDU (Scott Draves) writes:

   I don't know, but it will probably run HP-UX like all its
   predecessors.  Now, whether or not you call that unix is another
   question... :) But seriously, HP-UX is a rather dusty, but reliable
   version of SysV.  I'd live with it to get that much CPU.

Yep, HP-UX and toward the end of the year, OSF/1.  HP-UX is a much-enhanced
BSD kernel with a System V environment above.  Rather spartan in BSD 
system calls and utilities, apparently.

   When will they be available in volume, ie, when does it become real?
   Until then, it's just a marketdroid scheme to hurt the competition and
   grab publicity.

My, aren't we cynical?  Not believing the vendors are we?  What'll
it be next, the government?  ;-)

According to HP, over 1,000 systems have apparently shipped, and
availability is either now or near-term (no later than May, I believe)
for all the goods.  Not bad.