[comp.arch] Snakebytes

irf@kuling.UUCP (Bo Thide') (03/27/91)

Now that the Snakes (HP9000/700 series HP-PA 1.1 RISC workstations) are let
loose, the official HP info has become available.  Some of this info follows.

There are three models: the desktop (114mm*508mm*470mm) 720 (Cobra) and
730 (King Cobra), and the deskside (610mm*220mm*595mm) 750 (Coral). They
come initially with HP-UX 8.01 to be upgraded to HP-UX 8.05 in June. Later
OSF/1 will be available.

Clock: 50 MHz (720) or 66 MHz (730, 750)

Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data (750).

Interfaces: SCSI-II, EISA, LAN, RS-232 (to 460.8 kbaud), HP-HIL, Centronics.
            HP-IB optional (via EISA!).

Monitors: 72 Hz, 19" 1280x1024 8-bit grayscale (GRX) or 8+8 color planes (CRX).

Software: X11R4, OSF/Motif1.2 (not 1.1!), VUE, NCS, NFS, 4.3BSD TCP/IP, ARPA.

Languages: C, C++, Pascal, FORTRAN, ANSI C, Assembler.  FORTRAN compiler
	   with "+800" option for series 800 compatibility. Series 800
	   binaries run on 700 series.


Performance (with HP-UX 8.05) and comparison with other workstations:
-----------------------------------------------------------------------------
                            SPEC        Khorner-       Linp2P  x11-  Dhry-
                        mark int  fp    stones   MIPS  MFLOPS  perf  stone2.0
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX    72.2 51.0 91.0  143974   76    22      10460  114680
HP9000/720 G/CRX        55.5 39.0 70.2  119213   57    17       8244   87000
IBM 6000/550            54.3 34.5 73.5   n/a     56    23       n/a    n/a
IBM 6000/320            24.6 16.3 32.4   54661   29.5   8.5     1520   45250
DECstation 5000/200PXGT 18.5 19.0 18.5   26456   24.2   3.7     3256   38760
DECstation 3100         11.3 11.8 10.9   15285   14.9   1.6     1702   23470
Sun SPARCstation 2GX    21.0 20.2 21.5   27142   28.5   4.2     n/a    35590
Sun SPARCstation IPC    11.8 12.4 11.4   13329   15.7   1.7     n/a    22830
-----------------------------------------------------------------------------
Linp2P = Linpack Double precision, 100*100 FORTRAN BLAS, rolled.
x11perf = geometric mean of the x11perf1.2 component tests (excluding 1
	  and 500 pixel tests).


Selected x11perf Tests:
-----------------------------------------------------------------------------
			         10 pixel  10*10   TR      create & map
			Dots     lines     rects   text    subwins (50 kids)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX    1630000  911000    278000  273000  6000
HP9000/720 G/CRX        1260000  874000    272000  245000  4500
DECstation 5000/200PXGT  370000  455000    256000   90900  1750
Sun SPARCstation 2GX     101100  147000     83500   49000  1050
-----------------------------------------------------------------------------


Graphics Performance:
-----------------------------------------------------------------------------
                          2D floating       3D floating pt
		    	pt vectors/s      vectors/s (peak)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX      1120000           1150000
HP9000/720 G/CRX          1120000           1150000
DECstation 5000/200PXGT    300000            300000
Sun SPARCstation 2GX       450000            240000
-----------------------------------------------------------------------------


Sequential Disk Access Rates:
-----------------------------------------------------------------------------
                                       Read (kB/s)       Write (kB/s)
-----------------------------------------------------------------------------
HP9000/700, 1*210MByte disk            1120              1140
HP9000/700, 1*420MByte disk            1520              1510
HP9000/700, 2*210MByte disk            2070              1800
HP9000/700, 2*420MByte disk            2460              2140
Sun SPARCstation 2, 207MByte disk       744               794
-----------------------------------------------------------------------------


ANSYS SP-3 results (smaller = better):
-----------------------------------------------------------------------------
                            CPU seconds
-----------------------------------------------------------------------------
Cray 2                       27
HP9000/730,750 G/CRX         49
DEC VAX9000                  65
HP9000/720 G/CRX             66
IBM 6000/540                 68
DECstation 5000             145
IBM 6000/320                107
Sun SPARCstation 1+         311
Sun SPARCstation 2          225
-----------------------------------------------------------------------------
HP numbers were measured with series 800 compiler code. No series 700 
specific optimizations used.

irf@kuling.UUCP (Bo Thide') (03/27/91)

Now that the Snakes (HP9000/700 series HP-PA 1.1 RISC workstations) are let
loose, the official HP info has become available.  Some of this info follows.

There are three models: the desktop (114mm*508mm*470mm) 720 (Cobra) and
730 (King Cobra), and the deskside (610mm*220mm*595mm) 750 (Coral). They
come initially with HP-UX 8.01 to be upgraded to HP-UX 8.05 in June. Later
OSF/1 will be available.

Clock: 50 MHz (720) or 66 MHz (730, 750)

Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data (750).

Interfaces: SCSI-II, EISA, LAN, RS-232 (to 460.8 kbaud), HP-HIL, Centronics.
            HP-IB optional (via EISA!).

Monitors: 72 Hz, 19" 1280x1024 8-bit grayscale (GRX) or 8+8 color planes (CRX).

Software: X11R4, OSF/Motif1.2 (not 1.1!), VUE, NCS, NFS, 4.3BSD TCP/IP, ARPA.

Languages: C, C++, Pascal, FORTRAN, ANSI C, Assembler.  FORTRAN compiler
	   with "+800" option for series 800 compatibility. Series 800
	   binaries run on series 700 machines.


Performance (with HP-UX 8.05) and comparison with other workstations:
-----------------------------------------------------------------------------
                            SPEC        Khorner-       Linp2P  x11-  Dhry-
                        mark int  fp    stones   MIPS  MFLOPS  perf  stone2.0
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX    72.2 51.0 91.0  143974   76    22.9    10460  114680
HP9000/720 G/CRX        55.5 39.0 70.2  119213   57    17.2     8244   87000
IBM 6000/550            54.3 34.5 73.5   n/a     56    23       n/a    n/a
IBM 6000/320            24.6 16.3 32.4   54661   29.5   8.5     1520   45250
Sun SPARCstation 2GX    21.0 20.2 21.5   27142   28.5   4.2     n/a    35590
DECstation 5000/200PXGT 18.5 19.0 18.5   26456   24.2   3.7     3256   38760
DECstation 3100         11.3 11.8 10.9   15285   14.9   1.6     1702   23470
Sun SPARCstation IPC    11.8 12.4 11.4   13329   15.7   1.7     n/a    22830
-----------------------------------------------------------------------------
Linp2P = Linpack Double precision, 100*100 FORTRAN BLAS, rolled.
x11perf = geometric mean of the x11perf1.2 component tests (excluding 1
	  and 500 pixel tests).


Selected x11perf Tests:
-----------------------------------------------------------------------------
			         10 pixel  10*10   TR      create & map
			Dots     lines     rects   text    subwins (50 kids)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX    1630000  911000    278000  273000  6000
HP9000/720 G/CRX        1260000  874000    272000  245000  4500
DECstation 5000/200PXGT  370000  455000    256000   90900  1750
Sun SPARCstation 2GX     101100  147000     83500   49000  1050
-----------------------------------------------------------------------------


Graphics Performance:
-----------------------------------------------------------------------------
                          2D floating       3D floating pt
		    	  pt vectors/s      vectors/s (peak)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX      1120000           1150000
HP9000/720 G/CRX          1120000           1150000
DECstation 5000/200PXGT    300000            300000
Sun SPARCstation 2GX       450000            240000
-----------------------------------------------------------------------------


Sequential Disk Access Rates:
-----------------------------------------------------------------------------
                                       Read (kB/s)       Write (kB/s)
-----------------------------------------------------------------------------
HP9000/700, 1*210MByte disk            1120              1140
HP9000/700, 1*420MByte disk            1520              1510
HP9000/700, 2*210MByte disk            2070              1800
HP9000/700, 2*420MByte disk            2460              2140
Sun SPARCstation 2, 207MByte disk       744               794
-----------------------------------------------------------------------------


ANSYS SP-3 results (smaller = better):
-----------------------------------------------------------------------------
                            CPU seconds
-----------------------------------------------------------------------------
Cray 2                       27
HP9000/730,750 G/CRX         49
DEC VAX9000                  65
HP9000/720 G/CRX             66
IBM 6000/540                 68
DECstation 5000             145
IBM 6000/320                107
Sun SPARCstation 1+         311
Sun SPARCstation 2          225
-----------------------------------------------------------------------------
HP numbers were measured with series 800 compiler code. No series 700 
specific optimizations used.

nazgul@alphalpha.com (Kee Hinckley) (03/27/91)

In article <1998@kuling.UUCP> irf@kuling.DoCS.UU.SE (Bo Thide') writes:
>Software: X11R4, OSF/Motif1.2 (not 1.1!), VUE, NCS, NFS, 4.3BSD TCP/IP, ARPA.
		  ^^^^^^^^^^^^
I don't believe this.  1.2 uses the R5 Intrinsics, and while HP is a
consortium member and the contractor doing the 1.2 work I can't believe
that any of that stuff is stable enough to use.  It's not even in beta
yet from OSF.  If they are releasing it then it's sure to change before
the official release.  (And we won't even talk about bugs.)

-- 
Alfalfa Software, Inc.          |       Poste:  The EMail for Unix
nazgul@alfalfa.com              |       Send Anything... Anywhere
617/646-7703 (voice/fax)        |       info@alfalfa.com

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.

iyengar@gradient.cis.upenn.edu (Anand Iyengar) (03/28/91)

In article <1998@kuling.UUCP> irf@kuling.DoCS.UU.SE (Bo Thide') writes:
>Now that the Snakes (HP9000/700 series HP-PA 1.1 RISC workstations) are let
>...
>Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data.
	Are these external caches (sound too big to be on chip)?  How much
(if any) delay does a cache access cost?  

					Anand.  
--
"The nearer your destination, the more you're slip-sliding away..."
iyengar@grad1.cis.upenn.edu
--- Lbh guvax znlor vg'yy ybbx orggre ebg-guvegrrarg? ---
Disclaimer:  It's a forgery.  

daryl@hpcupt3.cup.hp.com (Daryl Odnert) (03/28/91)

> Does anyone know how these numbers were achieved?
>
> Curtis

Yes, I know how they were achieved.

   1)  Fast hardware.  The processor is running at 50MHz or 66MHz,
       depending on which model you consider.

   2)  Improved processor architecture.  The processor in the Snakes
       workstation is based on the PA-RISC 1.1 architecture, which is
       a compatible upgrade from the original PA-RISC 1.0 architecture.
       Among the significant changes are an expanded floating-point
       coprocessor register file that now has 32 64-bit registers,
       also addressable as 64 32-bit registers.  There are also new
       multiply-and-add and multiply-and-subtract floating-point
       instructions, and an integer multiply instruction.

   3)  Enhanced compilers.  Several new optimizations have been implemented
       in HP's PA-RISC compilers, and more are on the way which will continue
       to improve benchmark and application performance.

For more information, see the 3 papers from HP published in the Spring '91
COMPCON Digest of Papers, pages 202-218.

Regards,
Daryl Odnert       daryl@hpcllla.cup.hp.com
Hewlett-Packard
California Language Lab
Cupertino, California

daryl@hpcupt3.cup.hp.com (Daryl Odnert) (03/28/91)

> Are the new PA machines binary compatible with the old ones?  Reading
> the press, it sounds like there are new instructions in the new 
> machines so that while the old programs will run, the full performance
> will not be achieved unless you recompile.  Is this true?
>
> Donald A. Lewine

Yes, this is true.  The new PA-RISC machines, the HP9000 Series 700, are
binary compatible in the sense that PA-RISC 1.0 HP-UX binaries will run
without modification on the PA-RISC 1.1 processor running HP-UX.
To take advantage of the new instructions and registers in PA-RISC 1.1
you do need to recompile.  Code compiled for PA-RISC 1.1 will not run
on PA-RISC 1.0 based systems.

Daryl Odnert            daryl@hpcllla.cup.hp.com
Hewlett-Packard
California Language Lab
Cupertino, California

linley@hpcuhe.cup.hp.com (Linley Gwennap) (03/29/91)

How's I/O on this thing?  Would use as a fileserver be a shameful 
waste?

Raul Rockwell
----------
HP has worked hard to optimize system performance on the Series 700
instead of focusing on a few small benchmarks.  The June '91 release
of HP-UX will support a high-performance EISA SCSI adapter providing
a 10MB/second (burst) transfer rate to the disk.  When combined with
HP's differential disk storage system, this is among the highest
performance SCSI subsystems available.  For example, based on numbers
from the SPARCstation2 Performance Brief, the Series 700 provides 50%
more throughput on sequential disk accesses.

The Series 700 provides superior LAN performance as well.  In fact, our
tests show NFS, Berkeley Stream Socket, and FTP transfers at up to
1000 KB/second, nearly the theoretical maximum of 1200 KB/second for
IEEE802.3 LAN transfers.  Of course, these results will vary depending
on the CPU load at the server, but the Series 700 has plenty of CPU
power to solve that problem!
						--Linley Gwennap
						  Hewlett-Packard
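
The two figures quoted in the post above can be checked with a little
arithmetic.  The sketch below is an illustration, not HP's method.  It takes
the single-disk sequential read rates from the disk table earlier in the
thread, and it assumes that "IEEE802.3 LAN" means 10 Mbit/s Ethernet carrying
maximum-size frames (1500 payload bytes plus 38 bytes of preamble, header,
checksum and inter-frame gap).  The function and constant names are made up
for the example.

```python
# Disk: sequential read rates in kB/s, copied from the table in this thread.
HP_700_READ_KBS = 1120       # HP9000/700, 1*210MByte disk
SUN_SS2_READ_KBS = 744       # Sun SPARCstation 2, 207MByte disk


def percent_more(a, b):
    """Return how much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


# LAN: assumed 10 Mbit/s Ethernet; the post itself only says "IEEE802.3".
LINK_BITS_PER_S = 10_000_000
PAYLOAD_BYTES = 1500                  # largest payload per frame
OVERHEAD_BYTES = 8 + 14 + 4 + 12      # preamble, header, FCS, inter-frame gap


def max_payload_kb_per_s(link_bits_per_s=LINK_BITS_PER_S):
    """Upper bound on payload throughput in kB/s (1 kB = 1000 bytes here)."""
    raw_bytes_per_s = link_bits_per_s / 8
    efficiency = PAYLOAD_BYTES / (PAYLOAD_BYTES + OVERHEAD_BYTES)
    return raw_bytes_per_s * efficiency / 1000


if __name__ == "__main__":
    gain = percent_more(HP_700_READ_KBS, SUN_SS2_READ_KBS)
    print(f"disk read advantage: {gain:.1f}%")
    print(f"max LAN payload rate: {max_payload_kb_per_s():.0f} kB/s")
```

Under these assumptions the read advantage comes out at roughly 50% and the
LAN ceiling a little above 1200 kB/s, before any TCP/IP or NFS header
overhead is subtracted.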

linley@hpcuhe.cup.hp.com (Linley Gwennap) (03/29/91)

(Donald Lewine) asks:
Are the new PA machines binary compatible with the old ones?  Reading
the press, it sounds like there are new instructions in the new 
machines so that while the old programs will run, the full performance
will not be achieved unless you recompile.  Is this true?
----------

The Series 700 systems are fully binary compatible with our Series 800
RISC product family of workstations and multi-user systems.  It is true
that older programs will not take advantage of the new PA-RISC 1.1
instructions, which primarily affect floating point and graphic
applications.  Dhrystones, for example, are unaffected; SPECmarks are
improved by 20%-30% by recompiling.

Of course, since the Series 700 uses the same HP-UX operating system
as the Series 800, recompiling is simple and fast (as long as you have
source).  If you cannot recompile, your application will still run
faster than it would on any other workstation available today.

						--Linley Gwennap
						  Hewlett-Packard

linley@hpcuhe.cup.hp.com (Linley Gwennap) (03/29/91)

(Anand Iyengar) asks
>Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data.
	Are these external caches (sound too big to be on chip)?  How much
(if any) delay does a cache access cost?  
----------
The caches on the Series 700 are all implemented external to the chip
using standard commercially available SRAMs.  There is no delay for a
cache access; so long as the cache is hit, instructions execute one
per clock cycle.
						--Linley Gwennap
						  Hewlett-Packard

wcs) (04/01/91)

In article <32580009@hpcuhe.cup.hp.com> linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
] > Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data.
] > Are these external caches (sound too big to be on chip)?  How much
] > (if any) delay does a cache access cost?  
] The caches on the Series 700 are all implemented external to the chip
] using standard commercially available SRAMs.  There is no delay for a
] cache access; so long as the cache is hit, instructions execute one
] per clock cycle.

Are you saying LOAD and STORE instructions take 1 cycle? !??!
I thought the 700 took 2-4 cycles, like most machines.
Can you at least overlap loads and stores if you use separate
registers, for applications like bcopy()?
-- 
# Bill Stewart 908-949-0705 erebus.att.com!wcs AT&T Bell Labs 4M-312 Holmdel NJ
(Little Girl:) When I grow up, I want to be a nurse      } From this week's UFT
(Little Boy:)  When I grow up, I want to be an engineer  } radio commercial
.... guess the Political Correctness Police don't run NYC's teachers' union yet

linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/02/91)

> Are you saying LOAD and STORE instructions take 1 cycle? !??!
> I thought the 700 took 2-4 cycles, like most machines.
> Can you at least overlap loads and stores if you use separate
> registers, for applications like bcopy()?
> -- 
> # Bill Stewart

In "real life", loads take 2 cycles and stores take 3 cycles on the
Series 700 processors.  However, the CPU will appear to execute loads
and stores in a single cycle as long as there is no interlock or
dependency on the target register.  The compiler will usually schedule
instructions to avoid such interlocks.  In your example, loads and
stores can be overlapped so long as separate registers are used.

To address another question (I'm trying to restrict myself to one posting
here), HP is committed to delivering OSF/1 on the Series 700 by the
end of the year (1991).  OSF/1 is not available for immediate delivery.
Thus, no problems with excessive bugs and/or changes.

						--Linley Gwennap
						  Hewlett-Packard Co.

linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/02/91)

I'd like to take a moment to respond to comments that the Series 700
has achieved its high performance due primarily to high (66 MHz) clock
frequencies resulting from advanced CMOS processes.  While HP's IC
processes are as good as anyone's, the Series 700 CPU is implemented
in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
throughout the industry.  It is nearly identical to IBM's 1.0 micron
process used in their 20-30 MHz RS/6000s, and not as dense as the
0.8 micron process used in IBM's 41 MHz Model 550.

TUTORIAL MODE ON:

The cycle time of a CPU is determined by the longest amount of time
required to complete a single pipeline stage.  This in turn is driven
by the number of gate delays, the length of the gate delay, the number
of chip crossings, and the length of time to cross between chips.  Of
these factors, only the length of the gate delay is determined directly
by the IC process.

The number of gate delays is determined by the number of pipeline
stages and the complexity of the design (and of course the skill of
the designer).  The R4000 uses a longer (8-stage) pipeline to reduce
the number of gates needed for a single stage and thus reduce the
cycle time.  The Series 700 focused on reducing the complexity of
the design to achieve a fast cycle time.  The RS/6000 is not focused
on cycle time at all, resulting in a complex design and a relatively
slow clock frequency.

The RS/6000 also suffers from the chip crossing problem.  The complex
superscalar design requires 8 chips to implement instead of 3 on the
Series 700, forcing signals to move from chip to chip in a single
clock cycle.  This additional overhead further slows the RS/6000 clock.
To improve its clock frequency, IBM must either (a) use a denser
IC process to cram more circuitry onto a smaller number of chips;
(b) use a multi-chip module to reduce chip crossing delays; or (c)
simplify the America design by eliminating some of the superscalar
complexities.

Of course, cycle time in itself does not determine CPU performance,
but that is a different discussion.

TUTORIAL MODE COMPLETE

In conclusion, the Series 700 uses a simple, efficient processor
design, coupled with state-of-the-art IC processes, to achieve high
clock frequencies and high performance.  HP's 1.0 micron IC process
has as much room to evolve as other vendors' processes, and more
headroom than IBM's 41 MHz RS/6000.  The Series 700 family will
continue to improve in performance as IC processes improve.

						--Linley Gwennap
						  Hewlett-Packard
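
The tutorial's argument can be put into a small numeric sketch.  The clock
periods follow directly from the frequencies quoted in the thread.  The stage
model (gate delays plus chip crossings, with the slowest stage setting the
cycle) mirrors the text, but every gate count and delay in the example
pipelines is an invented placeholder, not a measurement of any real CPU.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def stage_delay_ns(gates, gate_delay_ns, crossings=0, crossing_delay_ns=0.0):
    """Delay of one pipeline stage: gate delays plus chip-crossing delays."""
    return gates * gate_delay_ns + crossings * crossing_delay_ns


def cycle_time_ns(stages, gate_delay_ns, crossing_delay_ns):
    """Cycle time is set by the slowest stage; stages are (gates, crossings)."""
    return max(
        stage_delay_ns(gates, gate_delay_ns, crossings, crossing_delay_ns)
        for gates, crossings in stages
    )


if __name__ == "__main__":
    # Frequencies as quoted in the thread.
    for name, mhz in [("Model 720", 50), ("Model 730/750", 66),
                      ("RS/6000 Model 550", 41)]:
        print(f"{name}: {clock_period_ns(mhz):.1f} ns per cycle")

    # Hypothetical pipelines: same logic depth, one with a chip crossing.
    same_chip = [(28, 0), (30, 0), (26, 0)]
    with_crossing = [(28, 0), (30, 1), (26, 0)]
    print(cycle_time_ns(same_chip, 0.5, 4.0))
    print(cycle_time_ns(with_crossing, 0.5, 4.0))
```

In this toy model only `gate_delay_ns` depends on the IC process; the stage
depths and the number of chip crossings are design choices, which is the
point the tutorial makes.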

linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/02/91)

By request, complete SPEC numbers on the Series 700:

             gcc espres     li eqntot  spice  doduc   nasa matrix  fpppp tomcat
Model 720   35.2   42.5   38.1   40.6   46.9   48.6   58.0  210.0   81.4   52.9
Model 730   46.5   55.2   50.3   52.6   60.9   64.0   73.7  273.3  107.0   67.4
Model 750   46.5   55.2   50.3   52.6   60.9   64.0   73.7  273.3  107.0   67.4

These are current results obtained from compilers to be released in June '91.
They may change slightly before submission to SPEC.
							--Linley Gwennap
							  Hewlett-Packard
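
The per-benchmark numbers above tie back to the SPECmark, int and fp columns
in the performance table at the top of the thread.  A SPECmark is the
geometric mean of the ten SPEC ratios.  The split into four integer and six
floating-point programs used below is the standard SPEC release 1.0 grouping
and is not stated in the post; the dictionary keys use the full benchmark
names rather than the abbreviated column headings.  Treat this as a sketch
for checking the arithmetic.

```python
import math

# SPEC ratios copied from the post above (the 730 and 750 rows are identical).
SPEC_RATIOS = {
    "Model 720": {
        "gcc": 35.2, "espresso": 42.5, "li": 38.1, "eqntott": 40.6,
        "spice2g6": 46.9, "doduc": 48.6, "nasa7": 58.0,
        "matrix300": 210.0, "fpppp": 81.4, "tomcatv": 52.9,
    },
    "Model 730/750": {
        "gcc": 46.5, "espresso": 55.2, "li": 50.3, "eqntott": 52.6,
        "spice2g6": 60.9, "doduc": 64.0, "nasa7": 73.7,
        "matrix300": 273.3, "fpppp": 107.0, "tomcatv": 67.4,
    },
}

# Assumed grouping: the four integer programs of SPEC release 1.0.
INTEGER = ("gcc", "espresso", "li", "eqntott")


def geomean(values):
    """Geometric mean, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(ratios):
    """Return (SPECmark, integer mean, floating-point mean)."""
    ints = [v for name, v in ratios.items() if name in INTEGER]
    fps = [v for name, v in ratios.items() if name not in INTEGER]
    return geomean(ratios.values()), geomean(ints), geomean(fps)


if __name__ == "__main__":
    for model, ratios in SPEC_RATIOS.items():
        mark, int_mean, fp_mean = summarize(ratios)
        print(f"{model}: SPECmark {mark:.1f}  int {int_mean:.1f}  fp {fp_mean:.1f}")
```

The results should land within about 0.1 of the table's 55.5/39.0/70.2 for
the 720 and 72.2/51.0/91.0 for the 730 and 750.  Note how much the single
matrix300 ratio lifts the floating-point mean.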

cag@hpfcso.FC.HP.COM (Craig Gleason) (04/02/91)

>(Anand Iyengar) asks
>>Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data.
>	Are these external caches (sound too big to be on chip)?  How much
>(if any) delay does a cache access cost?  
>----------
>The caches on the Series 700 are all implemented external to the chip
>using standard commercially available SRAMs.  There is no delay for a
>cache access; so long as the cache is hit, instructions execute one
>per clock cycle.
>						--Linley Gwennap
>						  Hewlett-Packard

This isn't strictly true.  All loads and I-fetches take one cycle.  There
is a load-use penalty of one cycle when the instruction after the load
uses the load data as an operand.

Stores take three cycles on the data cache, but if they are followed by
non-data cache accesses, there will be no penalty.  Therefore the store
penalty can be zero, one or two states, and the compilers can schedule
accordingly. 

The processor was optimized for loads/I-fetches since the biggest performance
lever is processor frequency.  Slowing down to allow a one or two cycle store
would not have been a good performance tradeoff.

Craig Gleason
Hewlett-Packard
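
The penalties described above can be turned into a toy cycle counter.  This
is one reading of the post, not HP's documented pipeline.  It assumes a
loaded value is usable two cycles after the load issues (giving the one-cycle
load-use penalty), that a store keeps the data cache busy for three cycles,
and that any later load or store simply waits until the cache is free.  The
`Instr` type, the constants and the function names are invented for the
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    kind: str           # "load", "store" or "alu"
    dest: str = ""      # register written ("" for stores)
    srcs: tuple = ()    # registers read


LOAD_LATENCY = 2        # assumed: loaded value usable 2 cycles after issue
STORE_DCACHE_BUSY = 3   # assumed: a store occupies the data cache 3 cycles
LOAD_DCACHE_BUSY = 1    # assumed: a load occupies the data cache 1 cycle


def count_cycles(program):
    """Issue instructions in order, one per cycle, stalling on interlocks."""
    next_issue = 0      # earliest cycle the next instruction could issue
    dcache_free = 0     # first cycle the data cache accepts a new access
    ready = {}          # register -> first cycle its value can be read
    for ins in program:
        issue = next_issue
        for reg in ins.srcs:                     # load-use interlock
            issue = max(issue, ready.get(reg, 0))
        if ins.kind in ("load", "store"):        # data cache busy after store
            issue = max(issue, dcache_free)
            busy = STORE_DCACHE_BUSY if ins.kind == "store" else LOAD_DCACHE_BUSY
            dcache_free = issue + busy
        if ins.dest:
            ready[ins.dest] = issue + (LOAD_LATENCY if ins.kind == "load" else 1)
        next_issue = issue + 1
    return next_issue


def penalty(program):
    """Stall cycles beyond the ideal one instruction per cycle."""
    return count_cycles(program) - len(program)


if __name__ == "__main__":
    load_use = [Instr("load", "r1"), Instr("alu", "r2", ("r3", "r1"))]
    store_store = [Instr("store", srcs=("r1",)), Instr("store", srcs=("r2",))]
    print(penalty(load_use), penalty(store_store))
```

In this model a dependent instruction right after a load costs one extra
cycle, and a store followed immediately by another data cache access costs
two, dropping to one or zero as independent non-memory instructions are
scheduled in between.  Whether back-to-back stores really behave this way on
the hardware is not confirmed anywhere in the thread.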

jbs@WATSON.IBM.COM (04/02/91)

         Linley Gwennap writes:
          If you cannot recompile, your application will still run
faster than it would on any other workstation available today.

         Isn't this a bit of an exaggeration?  Suppose your application
resembles tomcatv?
                          James B. Shearer

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (04/02/91)

In article <32580010@hpcuhe.cup.hp.com>
	linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:

>To address another question (I'm trying to restrict myself to one posting
>here), HP is committed to delivering OSF/1 on the Series 700 by the
>end of the year (1991).  OSF/1 is not available for immediate delivery.
>Thus, no problems with excessive bugs and/or changes.

I'm rather interested in the availability of 4.4BSD or 4.3BSD-reno.
Does anyone know anything about that?

HP-UX or any other SysV based OS is too painful to administrate.

						Masataka Ohta

gary@chpc.utexas.edu (Gary Smith) (04/02/91)

Have any of the computational chemistry software systems, such as
GAUSSIAN-90, been ported to the new HP machines?

-- 
Gary Smith <gary@chpc.utexas.edu>
Systems Group, Center for High Performance Computing
The University of Texas System
Commons Building 1.151C, Balcones Research Center
10100 Burnet Road
Austin, TX 78758
(512) 471-2411

peter@ficc.ferranti.com (Peter da Silva) (04/03/91)

In article <31@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
> HP-UX or any other SysV based OS is too painful to administrate.

What have you been smoking? I want some.

BSD system administration hasn't changed significantly since V7. You still
have to add drivers by editing makefiles, one way or the other. Compared to
the System V sysadm stuff, and the idmk* programs, it's like stone knives
and bear skins.

And where Berkeley *has* innovated in system administration it's not done
such a good job. I'm no big fan of MMDF, but next to sendmail.cf it's a
masterpiece of clarity. We have some SPARCstations here and I'm dreading
hooking them into the mail network.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

mslater@cup.portal.com (Michael Z Slater) (04/03/91)

Linley Gwennap from HP writes:

>While HP's IC
>processes are as good as anyone's, the Series 700 CPU is implemented
>in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
>throughout the industry.

I think this understates the value of HP's process.  IBM may have 3-layer
metal, but I am not aware of any commercially available microprocessor 
fabricated in a 3-layer metal process.  I'm not a process expert by a long
shot, but from what I understand, there is a speed advantage here.

Also, from casual discussions with others who have had chips fabbed at various
foundries, including HP, HP's process has a reputation for being one of the
fastest around.

Michael Slater, Microprocessor Report   mslater@cup.portal.com

jdr@sloth.mlb.semi.harris.com (Jim Ray) (04/03/91)

In article <40812@cup.portal.com> mslater@cup.portal.com (Michael Z Slater) writes:
>Linley Gwennap from HP writes:
>
>>While HP's IC
>>processes are as good as anyone's, the Series 700 CPU is implemented
>>in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
>>throughout the industry.
>
>I think this understates the value of HP's process.  IBM may have 3-layer
>metal, but I am not aware of any commercially available microprocessor 
>fabricated in a 3-layer metal process.  I'm not a process expert by a long
>shot, but from what I understand, there is a speed advantage here.
>
>Also, from casual discussions with others who have had chips fabbed at various
>foundries, including HP, HP's process has a reputation for being one of the
>fastest around.
>
>Michael Slater, Microprocessor Report   mslater@cup.portal.com

I could have sworn that at least one of the chips (one of the 3 used
in the "snakes") is fabbed by Hitachi.  Has anyone else heard anything
about this?

-- 
Jim Ray                                Harris Semiconductor
Internet:  jdr@semi.harris.com         PO Box 883   MS 62B-022
Phone:     (407) 729-5059              Melbourne, FL  32901

glew@pdx007.intel.com (Andy Glew) (04/03/91)

>Stores take three cycles on the data cache, but if they are followed by
>non-data cache accesses, there will be no penalty.  Therefore the store
>penalty can be zero, one or two states, and the compilers can schedule
>accordingly. 

What are the penalties for back-to-back loads? With/without dependencies?
Back-to-back stores? With/without dependencies?

I.e. what are the penalties for the following code fragments:


(1) Load followed by independent non-memory code (assuming all registers immediately ready):

    	load r1 := M[imm]
    	add  r2 := r3 + r4  
    	add  r2 := r3 + r4
    	add  r2 := r3 + r4

(2) Load followed by dependent non-memory code (assuming all registers immediately ready):

    	load r1 := M[imm]
    	add  r2 := r3 + r1  
    	add  r2 := r3 + r1
    	add  r2 := r3 + r1

(3) Load followed by independent load:

    	load r1 := M[imm]
    	load r2 := M[imm2]

(4) Load followed by dependent load:

    	load r1 := M[imm]
    	load r2 := M[r1]

(5) Store followed by independent non-memory code:

    	store M[imm] := r1
    	add  r2 := r3 + r4  
    	add  r2 := r3 + r4
    	add  r2 := r3 + r4

(6) Store followed by store

    	store M[imm] := r1
    	store M[imm2] := r2

(7) Store followed by independent load:

    	store M[imm] := r1
    	load  r2 := M[imm2]

(8) Store followed by dependent load:

    	store M[imm] := r1
    	load r2 := M[imm]

(9) Load followed by independent store:

    	load r1 := M[imm]
    	store M[imm2] := r2

(10) Load followed by dependent store:

    	load r1  := M[imm]
    	store M[r1] := r2

Have I missed any cases?

--

Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, 
Hillsboro, Oregon 97124-6497

This is a private posting; it does not indicate opinions or positions
of Intel Corp.

mlord@bwdls58.bnr.ca (Mark Lord) (04/04/91)

In article <> cag@hpfcso.FC.HP.COM (Craig Gleason) writes:
<...                   All loads and I-fetches take one cycle.  There
<is a load-use penalty of one cycle when the instruction after the load
<uses the load data as an operand.
<
<Stores take three cycles on the data cache, but if they are followed by
<non-data cache accesses, there will be no penalty.  Therefore the store
<penalty can be zero, one or two states, and the compilers can schedule
<accordingly. 

Thanks for this very useful info.  A question:  what happens with successive
stores?  We seem to have trouble avoiding a lot of these in applications 
around here (BNR), even after the optimizer has its go at things.  Mostly these
occur early in procedures which need to save lots of parameters before invoking
another procedure as the first activity in the current procedure (am I confused
or what!).  As such, the optimizer has trouble spreading them out, and quite
often they end up clumped together:  store,store,store,store. 

What sort of penalty does this incur cycle-wise (and why) ?
-- 
MLORD@BNR.CA  Ottawa, Ontario *** Personal views only ***

maj@hpfcso.FC.HP.COM (Mike Jassowski) (04/05/91)

> I could have sworn that at least one of the chips (one of the 3 used
> in the "snakes") is fabbed by Hitachi.  Has anyone else heard anything
> about this?
----------
The floating point coprocessor was developed jointly by HP and TI and
is fabbed at TI.

--Mike Jassowski

clc5q@madras.cs.Virginia.EDU (Clark L. Coleman) (04/06/91)

In article <32580012@hpcuhe.cup.hp.com> linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
>I'd like to take a moment to respond to comments that the Series 700
>has achieved its high performance due primarily to high (66 MHz) clock
>frequencies resulting from advanced CMOS processes.  While HP's IC
>processes are as good as anyone's, the Series 700 CPU is implemented
>in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
>throughout the industry.  It is nearly identical to IBM's 1.0 micron
>process used in their 20-30 MHz RS/6000s, and not as dense as the
>0.8 micron process used in IBM's 41 MHz Model 550.

Maybe you could clarify this for me. I read that the main CPU was 1.0 micron
geometry and the Texas Instruments floating point coprocessor was 0.8 micron
from a TI process. It sounds like the floating point unit has a process
at least as good as any competitors' processes, and the main CPU and cache
chips are about average for this market segment. Is this right?

-----------------------------------------------------------------------------
"The use of COBOL cripples the mind; its teaching should, therefore, be 
regarded as a criminal offence." E.W.Dijkstra, 18th June 1975.
|||  clc5q@virginia.edu (Clark L. Coleman)

danw@tbird9.prime.com (Dan Westerberg) (04/06/91)

In article <1991Apr5.183233.26573@murdoch.acc.Virginia.EDU>, clc5q@madras.cs.Virginia.EDU (Clark L. Coleman) writes:
|> In article <32580012@hpcuhe.cup.hp.com> linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
|> >I'd like to take a moment to respond to comments that the Series 700
|> >has achieved its high performance due primarily to high (66 MHz) clock
|> >frequencies resulting from advanced CMOS processes.  While HP's IC
|> >processes are as good as anyone's, the Series 700 CPU is implemented
|> >in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
|> >throughout the industry.  It is nearly identical to IBM's 1.0 micron
|> >process used in their 20-30 MHz RS/6000s, and not as dense as the
|> >0.8 micron process used in IBM's 41 MHz Model 550.
|> 
|> Maybe you could clarify this for me. I read that the main CPU was 1.0 micron
|> geometry and the Texas Instruments floating point coprocessor was 0.8 micron
|> from a TI process. It sounds like the floating point unit has a process
|> at least as good as any competitors' processes, and the main CPU and cache
|> chips are about average for this market segment. Is this right?

Please correct me if I'm wrong, since I design gate arrays, but are 3-metal-layer
processes *average*?  I was under the impression that 3-metal-layer processes are
on the cutting edge of technology, tough to design and fab, at least wrt gate arrays.
Is this radically different in full-custom design?



dan

-- 
===============================================================================
|                                                                             |
| Daniel I. Westerberg                 email:  danw@tbird9.prime.com (or)     |
| Prime Computer Inc.                          danw@s49.prime.com             |
| MS 10-9                                                                     |
| 500 Old Connecticut Path             phone:  508-620-2800  x3644            |
| Framingham,  MA  01701                 fax:  508-879-9098                   |
|                                                                             |
===============================================================================

linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/09/91)

(Michael Z Slater) notes:

> IBM may have 3-layer
> metal, but I am not aware of any commercially available microprocessor 
> fabricated in a 3-layer metal process.  I'm not a process expert by a long
> shot, but from what I understand, there is a speed advantage here.

The third metal layer is indeed an advantage for high speed microprocessors.
It can allow lower skew on internal clock distribution networks, more area
devoted to power/ground busing to reduce noise, and often shorter signal
interconnects which minimize RC delays.  It is not, however, as significant
as shrinking the overall device size below one micron.  While HP's IC
fabrication processes are very good, they are not out on the "bleeding
edge", so there's plenty of room for future improvement.  (For bleeding
edge, try TI's 0.8 micron BiCMOS process!)
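
To put a rough number on the RC point above: for a uniform wire, the Elmore
delay grows with the square of its length, so halving a long route cuts its
delay to about a quarter.  The sketch below only illustrates that scaling; the
per-millimeter resistance and capacitance defaults are placeholder assumptions,
not figures for HP's or anyone else's process.

```python
# Elmore delay of a uniform distributed RC wire: t = 0.5 * R_total * C_total,
# which works out to 0.5 * r * c * L**2 for per-unit-length r and c.
# The default r and c values are illustrative placeholders, not process data.

def elmore_delay_ps(length_mm, r_ohm_per_mm=50.0, c_ff_per_mm=200.0):
    """Return the Elmore delay in picoseconds for a wire of the given length."""
    r_total = r_ohm_per_mm * length_mm            # total resistance, ohms
    c_total = c_ff_per_mm * length_mm * 1e-15     # total capacitance, farads
    return 0.5 * r_total * c_total * 1e12         # seconds converted to ps


if __name__ == "__main__":
    for length in (2.0, 4.0, 8.0):
        print(f"{length:4.1f} mm wire: {elmore_delay_ps(length):7.1f} ps")
```

Doubling the length quadruples the delay in this model, which is why a routing
layer that shortens long interconnects is worth having.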

To clarify another inquiry, of the three large chips (CPU, FPU, MC) in
the Series 700, two are fabbed by HP.  The FPU is fabbed by TI in a 0.8
micron, two metal-layer CMOS process.  While Hitachi is not involved in
the Series 700, they have licensed the PA-RISC architecture and are
expected to produce PA-RISC chips in the future.

---------------------------------------------------------------------------
DISCLAIMER:  The views expressed here do not		Linley Gwennap
represent the views of the Hewlett-Packard		PA-RISC Marketing
Company.  Caveat emptor.				Hewlett-Packard

linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/12/91)

(Dan Westerberg) asks:
> Please correct me if I'm wrong since I design gate arrays, but are
> 3-metal-layer processes *average*?

Let me clarify.  HP's IC processes are among the industry leaders for
large-scale CPU designs (CPUs tend to push the boundaries).  I would
not call our IC processes "average".  The 3-metal-layer is one area
where we are probably ahead of most of the industry.  In the area of
device size, at 1 micron we are with the industry leaders but not
really pushing the limits.  In short, there are plenty of technological
advances left to apply to the Series 700 CPU chip.

The FPC chip is fabricated by Texas Instruments in a different process.
This chip uses a 0.8 micron, 2-metal-layer TI process.  This denser
process helps keep the cost of the FPC chip down.
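
As a back-of-the-envelope view of why a denser process lowers cost: under an
ideal linear shrink, the area of a given circuit scales with the square of the
feature size.  The sketch below assumes exactly that, ignores yield, metal-layer
count and design-rule differences between vendors, and says nothing about the
actual FPC die size.

```python
# Ideal-shrink arithmetic: area scales with (feature size)**2.
# This is a simplifying assumption; real processes differ in many other ways.

def relative_area(old_feature_um, new_feature_um):
    """Area of the same circuit in the new process, relative to the old one."""
    return (new_feature_um / old_feature_um) ** 2


def relative_die_count(old_feature_um, new_feature_um):
    """Ideal ratio of dice per wafer, new process versus old (no yield model)."""
    return 1.0 / relative_area(old_feature_um, new_feature_um)


if __name__ == "__main__":
    print(f"relative area, 1.0 -> 0.8 micron: {relative_area(1.0, 0.8):.2f}")
    print(f"relative dice per wafer (ideal):  {relative_die_count(1.0, 0.8):.2f}")
```

Under these assumptions the same logic takes about 0.64 of the area at 0.8
micron, so more dice fit on each wafer.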

---------------------------------------------------------------------------
DISCLAIMER:  The views expressed here do not		Linley Gwennap
represent the views of the Hewlett-Packard		PA-RISC Marketing
Company.  Caveat emptor.				Hewlett-Packard

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (04/13/91)

In article <32580016@hpcuhe.cup.hp.com>, linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
>  While HP's IC
>fabrication processes are very good, they are not out on the "bleeding
>edge", so there's plenty of room for future improvement.  (For bleeding
>edge, try TI's 0.8 micron BiCMOS process!)

Can you (without lotta headaches) put HP-PA on Gallium Arsenide <sp>?

>To clarify another inquiry, of the three large chips (CPU, FPU, MC) in
>the Series 700, two are fabbed by HP.  The FPU is fabbed by TI in a 0.8
>micron, two metal-layer CMOS process. 

So the TI floating-point goodie is the most expensive chip sitting in your box,
due to the fabrication process, among other things? (TI: Good, but pricy) Could
you offer a Real Cheap Snake (snakette?) with a goal of 20MIPS/5K given the
current technology? 

> While Hitachi is not involved in the Series 700, they have licensed the 
>PA-RISC architecture and are
>expected to produce PA-RISC chips in the future.

Will they be offering complementary low-end machines to the Snake? Have any
other companies considered licensing PA-RISC? 

     Signature envy: quality of some people to put 24+ lines in their .sigs
  -- >                  SYSMGR@CADLAB.ENG.UMD.EDU                        < --

mike@UC780.UMD.EDU (Mike Santangelo) (04/17/91)

In article <00947104.8B2D8080@KING.ENG.UMD.EDU>, sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:
>In article <32580016@hpcuhe.cup.hp.com>, linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
>>  While HP's IC
>>fabrication processes are very good, they are not out on the "bleeding
>>edge", so there's plenty of room for future improvement.  (For bleeding
>>edge, try TI's 0.8 micron BiCMOS process!)
>
>Can you (without lotta headaches) put HP-PA on Gallium Arsenide <sp>?
>
>>To clarify another inquiry, of the three large chips (CPU, FPU, MC) in
>>the Series 700, two are fabbed by HP.  The FPU is fabbed by TI in a 0.8
>>micron, two metal-layer CMOS process. 
>
>So the TI floating-point goodie is the most expensive chip sitting in your box,
>due to the fabrication process, among other things? (TI: Good, but pricy) Could
>you offer a Real Cheap Snake (snakette?) with a goal of 20MIPS/5K given the
>current technology? 
>


Rumor has it that HP will indeed introduce a "snakette" and a 90 MHz
version of the current systems in the fall.


>> While Hitachi is not involved in the Series 700, they have licensed the 
>>PA-RISC architecture and are
>>expected to produce PA-RISC chips in the future.
>
>Will they be offering complementary low-end machines to the Snake? Have any
>other companies considered licensing PA-RISC? 
>

I don't think Hitachi is in any way involved with the Snakes, or plans
to be.  As I recall, they will introduce their own line sometime in the
not-so-distant future.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Michael F. Santangelo                 + Inet: mike@uc780.umd.edu
VMS / UNIX Systems                    +       mike@socrates.umd.edu
Academic Computing UMUC               + Bnet: MIKE@UC780
(The University of Maryland,          +       MIKE@UMUC (not visited often)
 University College)                  +<Your clever net-phrase here>

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (04/18/91)

>Rumor has it that HP will indeed introduce a "snakette" and a 90 MHz
>version of the current systems in the fall.
>
Thanks, I can read the rumors column like anyone else ;-). I was hoping to get
a comment or two from an HP employee <carefully waving six-pack of beer around>

I'm also interested in the cost of the TI floating-point chip when compared to
the other HP-PA stuff, and if the floating-point chip could be made a socketed
option...

>>Will they be offering complementary low-end machines to the Snake? Have any
>>other companies considered licensing PA-RISC? 
>
>I don't think Hitachi is in any way involved with the Snakes, or plans
>to be.  As I recall, they will introduce their own line sometime in the
>not-so-distant future.

I am referring to marketing strategy. Sun has very successfully (to this date)
worked a "Middle of the Road" approach, letting Solbourne and others work on
either faster/stronger/multiprocessor options and affordable/cheaper/commodity
machines (such as the CompuAdd box). 

Hitachi and a Korean manufacturer (name escapes me) have licensed HP-PA. If
they don't have something which is binary-compatible with the HP-PA lines, I'd
be really surprised. Kinda waste of time and money....


     Signature envy: quality of some people to put 24+ lines in their .sigs
  -- >                  SYSMGR@CADLAB.ENG.UMD.EDU                        < --

mike@vlsivie.tuwien.ac.at (Michael K. Gschwind) (04/19/91)

In article <00947104.8B2D8080@KING.ENG.UMD.EDU> sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:
>So the TI floating-point goodie is the most expensive chip sitting in your box,
>due to the fabrication process, among other things? (TI: Good, but pricy) Could
>you offer a Real Cheap Snake (snakette?) with a goal of 20MIPS/5K given the
>current technology? 
>
>> While Hitachi is not involved in the Series 700, they have licensed the 
>>PA-RISC architecture and are
>>expected to produce PA-RISC chips in the future.
>
>Will they be offering complementary low-end machines to the Snake? Have any
>other companies considered licensing PA-RISC? 

Samsung is supposed to build low-end Snakes. According to what I hear 
from HP, they will cut back on floating point performance to achieve
this goal. 

The HP people I talked with only mentioned Hitachi and Samsung as
licensees. 

					mike

Michael K. Gschwind, Dept. of VLSI-Design, Vienna University of Technology
mike@vlsivie.tuwien.ac.at	1-2-3-4 kick the lawsuits out the door 
mike@vlsivie.uucp		5-6-7-8 innovate don't litigate         
e182202@awituw01.bitnet		9-A-B-C interfaces should be free
Voice: (++43).1.58801 8144	D-E-F-O look and feel has got to go!
Fax:   (++43).1.569697

frank@grep.co.uk (Frank Wales) (04/20/91)

In article <00947448.7A748120@KING.ENG.UMD.EDU> sysmgr@KING.ENG.UMD.EDU
 (Doug Mohney) writes:
>Hitachi and a Korean manufacturer (name escapes me) have licensed HP-PA. 
Samsung.  It seems Mitsubishi have also adopted PA for future workstations.
--
Frank Wales, Grep Limited,             [frank@grep.co.uk<->uunet!grep!frank]
Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303

martelli@cadlab.sublink.ORG (Alex Martelli) (04/23/91)

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:
	...
:I am referring to marketing strategy. Sun has very successfully (to this date)
:worked a "Middle of the Road" approach, letting Solbourne and others work on
:either faster/stronger/multiprocessor options and affordable/cheaper/commodity
:machines (such as the CompuAdd box). 

I have not yet seen ONE Sun clone that is cheaper than a trueblue Sparcstation
SLC from Sun itself (possibly with 3rd party 8->16 meg RAM expansion, and
with a 3rd party external SCSI-connected box with disc/tape/whatever) - so
much for 'affordable/cheaper/commodity' machines!
-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 53, Bologna, Italia
Email: (work:) martelli@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).

ajayshah@alhena.usc.edu (Ajay Shah) (04/27/91)

In article <772@cadlab.sublink.ORG> martelli@cadlab.sublink.ORG (Alex Martelli) writes:
>I have not yet seen ONE Sun clone that is cheaper than a trueblue Sparcstation
>SLC from Sun itself (possibly with 3rd party 8->16 meg RAM expansion, and
>with a 3rd party external SCSI-connected box with disc/tape/whatever) - so
>much for 'affordable/cheaper/commodity' machines!

The SLC is truly a cheap machine.  But RAM <= 16Meg, no SBus
slots, 12.5 mips without expandability is not a solution for
everyone.  But it's really priced low -- just add up the cost of
the spare parts going into it (17" mono, 8Meg, etc.) and you have
its University price of $2700.

The latest Unix World has an ad for what feels like a SS-II clone
(called SparcClone 2) for $8k. Now that makes sense -- Sun is
making a lot of money on the SS-II at $15k (try adding up spare
parts prices again).  It should be possible for a clonemaker to
beat 'em on price here handily.
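
For a rough sense of the gap, here is the price-per-MIPS arithmetic as a small
sketch.  The prices and the SLC's 12.5 MIPS are the figures quoted above; the
28.5 MIPS for the SS-II is taken from the table at the top of this thread
(SPARCstation 2GX), and treating the clone as equally fast is purely an
assumption.

```python
# Price per MIPS for the machines discussed above.
# The clone's MIPS rating is assumed equal to the SS-II's; it is not a
# measured or advertised figure.

def dollars_per_mips(price_usd, mips):
    """Return list price divided by rated MIPS."""
    return price_usd / mips


SYSTEMS = {
    "SLC (University price)": (2700, 12.5),
    "SS-II from Sun": (15000, 28.5),
    "SS-II clone (assumed same MIPS)": (8000, 28.5),
}

if __name__ == "__main__":
    for name, (price, mips) in SYSTEMS.items():
        print(f"{name:34s} ${dollars_per_mips(price, mips):7.0f} per MIPS")
```

On these numbers the clone narrows most of the gap to the SLC, while the SS-II
from Sun costs more than twice as much per MIPS as the SLC.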

There are a host of SS-1 clones at prices like $5k these days, a
market which Sun has essentially exited.

BTW, I just saw a SPARC clone built by Xerox at a USC Sun computer
fair.  That sounds rather important to me -- isn't a clone by
Xerox big news??

-- 
_______________________________________________________________________________
Ajay Shah, (213)734-3930, ajayshah@usc.edu
                             The more things change, the more they stay insane.
_______________________________________________________________________________

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (04/29/91)

In article <772@cadlab.sublink.ORG>, martelli@cadlab.sublink.ORG (Alex Martelli) writes:
>sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:
>	...
>:I am referring to marketing strategy. Sun has very successfully (to this date)
>:worked a "Middle of the Road" approach, letting Solbourne and others work on
>:either faster/stronger/multiprocessor options and affordable/cheaper/commodity
>:machines (such as the CompuAdd box). 
>
>I have not yet seen ONE Sun clone that is cheaper than a trueblue Sparcstation
>SLC from Sun itself (possibly with 3rd party 8->16 meg RAM expansion, and
>with a 3rd party external SCSI-connected box with disc/tape/whatever) - so
>much for 'affordable/cheaper/commodity' machines!

Most companies have knocked off the SS-I, with 3 S-bus slots and monitor (see
CompuAdd, other no-namers). 

Look at what you get with the SLC vs a SparcStation I clone. Both list at
around $5,000, but the Sparclone is a better value (expandable, not fudged
in terms of RAM expansion). 

Using "true blue" and "Sun" in the same sentence is a contradiction in terms. 
Sun doesn't make IBM equipment :-)

     Signature envy: quality of some people to put 24+ lines in their .sigs
  -- >                  SYSMGR@CADLAB.ENG.UMD.EDU                        < --