irf@kuling.UUCP (Bo Thide') (03/27/91)
Now that the Snakes (HP9000/700 series HP-PA 1.1 RISC workstations) are let
loose, the official HP info has become available. Some of this info follows.

There are three models, the desktop (114mm*508mm*470mm) 720 (Cobra) and
730 (King Cobra) and the deskside (610mm*220mm*595mm) 750 (Coral). They come
initially with HP-UX 8.01 to be upgraded to HP-UX 8.05 in June. Later OSF/1
will be available.

Clock: 50 MHz (720) or 66 MHz (730, 750)
Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data (750).
Interfaces: SCSI-II, EISA, LAN, RS-232 (to 460.8 kbaud), HP-HIL, Centronics.
            HP-IB optional (via EISA!).
Monitors: 72 Hz, 19" 1280x1024 8-bit grayscale (GRX) or 8+8 color planes (CRX).
Software: X11R4, OSF/Motif1.2 (not 1.1!), VUE, NCS, NFS, 4.3BSD TCP/IP, ARPA.
Languages: C, C++, Pascal, FORTRAN, ANSI C, Assembler. FORTRAN compiler with
           "+800" option for series 800 compatibility. Series 800 binaries
           run on series 700 machines.

Performance (with HP-UX 8.05) and comparison with other workstations:
-----------------------------------------------------------------------------
                         SPEC            Khorner-       Linp2P  x11-    Dhry-
                         mark   int    fp  stones  MIPS MFLOPS  perf stone2.0
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX     72.2  51.0  91.0  143974    76   22.9 10460   114680
HP9000/720 G/CRX         55.5  39.0  70.2  119213    57   17.2  8244    87000
IBM 6000/550             54.3  34.5  73.5     n/a    56     23   n/a      n/a
IBM 6000/320             24.6  16.3  32.4   54661  29.5    8.5  1520    45250
Sun SPARCstation 2GX     21.0  20.2  21.5   27142  28.5    4.2   n/a    35590
DECstation 5000/200PXGT  18.5  19.0  18.5   26456  24.2    3.7  3256    38760
DECstation 3100          11.3  11.8  10.9   15285  14.9    1.6  1702    23470
Sun SPARCstation IPC     11.8  12.4  11.4   13329  15.7    1.7   n/a    22830
-----------------------------------------------------------------------------
Linp2P = Linpack Double precision, 100*100 FORTRAN BLAS, rolled.
x11perf = geometric mean of the x11perf1.2 component tests (excluding 1 and
500 pixel tests).

Selected x11perf Tests:
-----------------------------------------------------------------------------
                                  10 pixel   10*10      TR       create & map
                            Dots     lines   rects    text  subwins (50 kids)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX     1630000    911000  278000  273000               6000
HP9000/720 G/CRX         1260000    874000  272000  245000               4500
DECstation 5000/200PXGT   370000    455000  256000   90900               1750
Sun SPARCstation 2GX      101100    147000   83500   49000               1050
-----------------------------------------------------------------------------

Graphics Performance:
-----------------------------------------------------------------------------
                           2D floating pt    3D floating pt
                                vectors/s         vectors/s
                                                     (peak)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX              1120000           1150000
HP9000/720 G/CRX                  1120000           1150000
DECstation 5000/200PXGT            300000            300000
Sun SPARCstation 2GX               450000            240000
-----------------------------------------------------------------------------

Sequential Disk Access Rates:
-----------------------------------------------------------------------------
                                     Read (kB/s)   Write (kB/s)
-----------------------------------------------------------------------------
HP9000/700, 1*210MByte disk                 1120           1140
HP9000/700, 1*420MByte disk                 1520           1510
HP9000/700, 2*210MByte disk                 2070           1800
HP9000/700, 2*420MByte disk                 2460           2140
Sun SPARCstation 2, 207MByte disk            744            794
-----------------------------------------------------------------------------

ANSYS SP-3 results (smaller = better):
-----------------------------------------------------------------------------
                         CPU seconds
-----------------------------------------------------------------------------
Cray 2                            27
HP9000/730,750 G/CRX              49
DEC VAX9000                       65
HP9000/720 G/CRX                  66
IBM 6000/540                      68
DECstation 5000                  145
IBM 6000/320                     107
Sun SPARCstation 1+              311
Sun SPARCstation 2               225
-----------------------------------------------------------------------------
HP numbers were measured with series 800 compiler code.
No series 700 specific optimizations used.
nazgul@alphalpha.com (Kee Hinckley) (03/27/91)
In article <1998@kuling.UUCP> irf@kuling.DoCS.UU.SE (Bo Thide') writes:
>Software: X11R4, OSF/Motif1.2 (not 1.1!), VUE, NCS, NFS, 4.3BSD TCP/IP, ARPA.
                  ^^^^^^^^^^^^
I don't believe this. 1.2 uses the R5 Intrinsics, and while HP is a
consortium member and the contractor doing the 1.2 work I can't believe
that any of that stuff is stable enough to use. It's not even in beta
yet from OSF. If they are releasing it then it's sure to change before
the official release. (And we won't even talk about bugs.)
--
Alfalfa Software, Inc.    | Poste: The EMail for Unix
nazgul@alfalfa.com        | Send Anything... Anywhere
617/646-7703 (voice/fax)  | info@alfalfa.com
I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
iyengar@gradient.cis.upenn.edu (Anand Iyengar) (03/28/91)
In article <1998@kuling.UUCP> irf@kuling.DoCS.UU.SE (Bo Thide') writes:
>Now that the Snakes (HP9000/700 series HP-PA 1.1 RISC workstations) are let
>...
>Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data.
Are these external caches (sound too big to be on chip)? How much
(if any) delay does a cache access cost?
Anand.
--
"The nearer your destination, the more you're slip-sliding away..."
iyengar@grad1.cis.upenn.edu
--- Lbh guvax znlor vg'yy ybbx orggre ebg-guvegrrarg? ---
Disclaimer: It's a forgery.
daryl@hpcupt3.cup.hp.com (Daryl Odnert) (03/28/91)
> Does anyone know how these numbers were achieved?
>
> Curtis
Yes, I know how they were achieved.
1) Fast hardware. The processor is running at 50MHz or 66MHz, depending
   on which model you consider.
2) Improved processor architecture. The processor in the Snakes
   workstation is based on the PA-RISC 1.1 architecture, which is a
   compatible upgrade from the original PA-RISC 1.0 architecture. Among
   the significant changes are an expanded floating-point coprocessor
   register file that now has 32 64-bit registers, also addressable as
   64 32-bit registers. There are also new multiply-and-add and
   multiply-and-subtract floating-point instructions, and an integer
   multiply instruction.
3) Enhanced compilers. Several new optimizations have been implemented
   in HP's PA-RISC compilers, and more are on the way which will continue
   to improve benchmark and application performance.
For more information, see the 3 papers from HP published in the
Spring '91 COMPCON Digest of Papers, pages 202-218.
Regards,
Daryl Odnert    daryl@hpcllla.cup.hp.com
Hewlett-Packard California Language Lab
Cupertino, California
daryl@hpcupt3.cup.hp.com (Daryl Odnert) (03/28/91)
> Are the new PA machines binary compatible with the old ones? Reading
> the press, it sounds like there are new instructions in the new
> machines so that while the old programs will run, the full performance
> will not be achieved unless you recompile. Is this true?
>
> Donald A. Lewine
Yes, this is true. The new PA-RISC machines, the HP9000 Series 700, are
binary compatible in the sense that PA-RISC 1.0 HP-UX binaries will run
without modification on the PA-RISC 1.1 processor running HP-UX. To take
advantage of the new instructions and registers in PA-RISC 1.1 you do need
to recompile. Code compiled for PA-RISC 1.1 will not run on PA-RISC 1.0
based systems.
Daryl Odnert    daryl@hpcllla.cup.hp.com
Hewlett-Packard California Language Lab
Cupertino, California
linley@hpcuhe.cup.hp.com (Linley Gwennap) (03/29/91)
How's I/O on this thing? Would use as a fileserver be a shameful waste?
Raul Rockwell
----------
HP has worked hard to optimize system performance on the Series 700
instead of focusing on a few small benchmarks. The June '91 release of
HP-UX will support a high-performance EISA SCSI adapter providing a
10MB/second (burst) transfer rate to the disk. When combined with HP's
differential disk storage system, this is among the highest performance
SCSI subsystems available. For example, based on numbers from the
SPARCstation2 Performance Brief, the Series 700 provides 50% more
throughput on sequential disk accesses.
The Series 700 provides superior LAN performance as well. In fact, our
tests show NFS, Berkeley Stream Socket, and FTP transfers at up to
1000 KB/second, nearly the theoretical maximum of 1200 KB/second for
IEEE802.3 LAN transfers. Of course, these results will vary depending on
the CPU load at the server, but the Series 700 has plenty of CPU power
to solve that problem!
--Linley Gwennap
Hewlett-Packard
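A quick sanity check of the two throughput claims above, as a sketch only. The disk figures are the single-disk sequential rates from the HP announcement earlier in this thread (HP9000/700 with one 210MByte disk versus the SPARCstation 2 with a 207MByte disk). The 10 Mbit/s raw signalling rate for IEEE 802.3 is an assumption not stated in the post, and the helper name `percent_gain` is made up for illustration.

```python
def percent_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0

# Sequential disk rates in kB/s, taken from the announcement's table.
HP_READ, HP_WRITE = 1120, 1140      # HP9000/700, 1*210MByte disk
SUN_READ, SUN_WRITE = 744, 794      # Sun SPARCstation 2, 207MByte disk

read_gain = percent_gain(HP_READ, SUN_READ)
write_gain = percent_gain(HP_WRITE, SUN_WRITE)

# LAN: the quoted measurement versus the quoted ceiling.
MEASURED_KB_S = 1000
QUOTED_MAX_KB_S = 1200
lan_fraction = MEASURED_KB_S / QUOTED_MAX_KB_S

# Assumption: 10 Mbit/s raw IEEE 802.3 rate, before framing or
# protocol overhead is subtracted.
RAW_KB_S = 10_000_000 / 8 / 1000

print(f"read gain  {read_gain:.1f}%")
print(f"write gain {write_gain:.1f}%")
print(f"LAN: {lan_fraction:.0%} of quoted max, raw rate {RAW_KB_S:.0f} kB/s")
```

The read ratio comes out at roughly 50%, consistent with the claim; the write ratio is somewhat lower.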
linley@hpcuhe.cup.hp.com (Linley Gwennap) (03/29/91)
(Donald Lewine) asks:
Are the new PA machines binary compatible with the old ones? Reading
the press, it sounds like there are new instructions in the new
machines so that while the old programs will run, the full performance
will not be achieved unless you recompile. Is this true?
----------
The Series 700 systems are fully binary compatible with our Series 800
RISC product family of workstations and multi-user systems. It is true
that older programs will not take advantage of the new PA-RISC 1.1
instructions, which primarily affect floating point and graphic
applications. Dhrystones, for example, are unaffected; SPECmarks are
improved by 20%-30% by recompiling. Of course, since the Series 700 uses
the same HP-UX operating system as the Series 800, recompiling is simple
and fast (as long as you have source). If you cannot recompile, your
application will still run faster than it would on any other workstation
available today.
--Linley Gwennap
Hewlett-Packard
linley@hpcuhe.cup.hp.com (Linley Gwennap) (03/29/91)
(Anand Iyengar) asks
>Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data.
Are these external caches (sound too big to be on chip)? How much
(if any) delay does a cache access cost?
----------
The caches on the Series 700 are all implemented external to the chip
using standard commercially available SRAMs. There is no delay for a
cache access; so long as the cache is hit, instructions execute one
per clock cycle.
--Linley Gwennap
Hewlett-Packard
wcs) (04/01/91)
In article <32580009@hpcuhe.cup.hp.com> linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
] > Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data.
] > Are these external caches (sound too big to be on chip)? How much
] > (if any) delay does a cache access cost?
] The caches on the Series 700 are all implemented external to the chip
] using standard commercially available SRAMs. There is no delay for a
] cache access; so long as the cache is hit, instructions execute one
] per clock cycle.
Are you saying LOAD and STORE instructions take 1 cycle? !??!
I thought the 700 took 2-4 cycles, like most machines.
Can you at least overlap loads and stores if you use separate
registers, for applications like bcopy()?
--
# Bill Stewart 908-949-0705 erebus.att.com!wcs AT&T Bell Labs 4M-312 Holmdel NJ
(Little Girl:) When I grow up, I want to be a nurse } From this week's UFT
(Little Boy:) When I grow up, I want to be an engineer } radio commercial
.... guess the Political Correctness Police don't run NYC's teachers' union yet
linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/02/91)
> Are you saying LOAD and STORE instructions take 1 cycle? !??!
> I thought the 700 took 2-4 cycles, like most machines.
> Can you at least overlap loads and stores if you use separate
> registers, for applications like bcopy()?
> --
> # Bill Stewart
In "real life", loads take 2 cycles and stores take 3 cycles on the
Series 700 processors. However, the CPU will appear to execute loads
and stores in a single cycle as long as there is no interlock or
dependency on the target register. The compiler will usually schedule
instructions to avoid such interlocks. In your example, loads and stores
can be overlapped so long as separate registers are used.
To address another question (I'm trying to restrict myself to one posting
here), HP is committed to delivering OSF/1 on the Series 700 by the
end of the year (1991). OSF/1 is not available for immediate delivery.
Thus, no problems with excessive bugs and/or changes.
--Linley Gwennap
Hewlett-Packard Co.
linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/02/91)
I'd like to take a moment to respond to comments that the Series 700
has achieved its high performance due primarily to high (66 MHz) clock
frequencies resulting from advanced CMOS processes. While HP's IC
processes are as good as anyone's, the Series 700 CPU is implemented
in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
throughout the industry. It is nearly identical to IBM's 1.0 micron
process used in their 20-30 MHz RS/6000s, and not as dense as the
0.8 micron process used in IBM's 41 MHz Model 550.

TUTORIAL MODE ON:

The cycle time of a CPU is determined by the longest amount of time
required to complete a single pipeline stage. This in turn is driven by
the number of gate delays, the length of the gate delay, the number of
chip crossings, and the length of time to cross between chips. Of these
factors, only the length of the gate delay is determined directly by the
IC process.

The number of gate delays is determined by the number of pipeline stages
and the complexity of the design (and of course the skill of the
designer). The R4000 uses a longer (8-stage) pipeline to reduce the
number of gates needed for a single stage and thus reduce the cycle time.
The Series 700 focused on reducing the complexity of the design to
achieve a fast cycle time. The RS/6000 is not focused on cycle time at
all, resulting in a complex design and a relatively slow clock frequency.

The RS/6000 also suffers from the chip crossing problem. The complex
superscalar design requires 8 chips to implement instead of 3 on the
Series 700, forcing signals to move from chip to chip in a single clock
cycle. This additional overhead further slows the RS/6000 clock.

To improve its clock frequency, IBM must either (a) use a denser IC
process to cram more circuitry onto a smaller number of chips; (b) use a
multi-chip module to reduce chip crossing delays; or (c) simplify the
America design by eliminating some of the superscalar complexities.

Of course, cycle time in itself does not determine CPU performance, but
that is a different discussion.

TUTORIAL MODE COMPLETE

In conclusion, the Series 700 uses a simple, efficient processor design,
coupled with state-of-the-art IC processes, to achieve high clock
frequencies and high performance. HP's 1.0 micron IC process has as much
room to evolve as other vendors' processes, and more headroom than IBM's
41 MHz RS/6000. The Series 700 family will continue to improve in
performance as IC processes improve.
--Linley Gwennap
Hewlett-Packard
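The tutorial's rule (cycle time is set by the slowest pipeline stage, and each stage costs gate delays plus chip crossings) can be sketched as a toy model. Only the clock frequencies below come from this thread; the per-stage gate counts, the gate delay and the chip-crossing delay are hypothetical values chosen to illustrate the effect, not measurements of any real processor, and the function names are made up.

```python
def cycle_time_ns(stages, gate_delay_ns, crossing_ns):
    """Cycle time is the delay of the slowest pipeline stage.

    `stages` is a list of (gate_delays, chip_crossings) pairs,
    one pair per pipeline stage.
    """
    return max(g * gate_delay_ns + c * crossing_ns for g, c in stages)


def clock_mhz(cycle_ns):
    """Convert a cycle time in nanoseconds to a frequency in MHz."""
    return 1000.0 / cycle_ns


# Clock periods implied by frequencies quoted in this thread.
for mhz in (50, 66, 41):
    print(f"{mhz} MHz -> {1000.0 / mhz:.1f} ns per cycle")

# HYPOTHETICAL numbers, for illustration only.
GATE_NS, CROSSING_NS = 1.0, 5.0
single_chip = [(12, 0), (14, 0), (13, 0)]
multi_chip = [(12, 0), (14, 1), (13, 0)]   # same logic, one stage crosses chips

for name, stages in (("single chip", single_chip), ("multi chip", multi_chip)):
    t = cycle_time_ns(stages, GATE_NS, CROSSING_NS)
    print(f"{name}: {t:.1f} ns -> {clock_mhz(t):.1f} MHz")
```

In this model a single chip crossing in the critical stage lengthens the whole cycle, which is the point made about multi-chip designs; a faster process only shrinks the `gate_delay_ns` term.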
linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/02/91)
By request, complete SPEC numbers on the Series 700:
             gcc  expres    li  eqntot  spice  doduc  nasa  matrix  fpppp  tomcat
Model 720   35.2    42.5  38.1    40.6   46.9   48.6  58.0   210.0   81.4    52.9
Model 730   46.5    55.2  50.3    52.6   60.9   64.0  73.7   273.3  107.0    67.4
Model 750   46.5    55.2  50.3    52.6   60.9   64.0  73.7   273.3  107.0    67.4
These are current results obtained from compilers to be released in
June '91. They may change slightly before submission to SPEC.
--Linley Gwennap
Hewlett-Packard
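As a cross-check, the SPECmark summary figures quoted in the announcement at the top of this thread (55.5 / 39.0 / 70.2 for the 720 and 72.2 / 51.0 / 91.0 for the 730 and 750) can be recomputed as geometric means of the per-benchmark ratios above. This is a sketch under one assumption: that the first four columns (gcc, expres, li, eqntot) are the integer programs and the remaining six are the floating-point programs. The function names are made up for illustration.

```python
import math


def geomean(values):
    """Geometric mean, computed via logs to avoid large products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Per-benchmark ratios, in the column order of the post:
# gcc, expres, li, eqntot, spice, doduc, nasa, matrix, fpppp, tomcat
RATIOS = {
    "Model 720": [35.2, 42.5, 38.1, 40.6, 46.9, 48.6, 58.0, 210.0, 81.4, 52.9],
    "Model 730": [46.5, 55.2, 50.3, 52.6, 60.9, 64.0, 73.7, 273.3, 107.0, 67.4],
}

# Assumption: the first four columns are the integer benchmarks.
INT_COUNT = 4


def summarize(ratios):
    """Return (overall, integer, floating-point) geometric means."""
    return (
        geomean(ratios),
        geomean(ratios[:INT_COUNT]),
        geomean(ratios[INT_COUNT:]),
    )


for model, ratios in RATIOS.items():
    mark, spec_int, spec_fp = summarize(ratios)
    print(f"{model}: mark {mark:.1f}  int {spec_int:.1f}  fp {spec_fp:.1f}")
```

The recomputed values agree with the announced figures to within rounding, which also shows how much the single matrix result pulls up the floating-point mean.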
cag@hpfcso.FC.HP.COM (Craig Gleason) (04/02/91)
>(Anand Iyengar) asks
>>Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data.
> Are these external caches (sound too big to be on chip)? How much
>(if any) delay does a cache access cost?
>----------
>The caches on the Series 700 are all implemented external to the chip
>using standard commercially available SRAMs. There is no delay for a
>cache access; so long as the cache is hit, instructions execute one
>per clock cycle.
> --Linley Gwennap
> Hewlett-Packard
This isn't strictly true. All loads and I-fetches take one cycle. There
is a load-use penalty of one cycle when the instruction after the load
uses the load data as an operand.
Stores take three cycles on the data cache, but if they are followed by
non-data cache accesses, there will be no penalty. Therefore the store
penalty can be zero, one or two states, and the compilers can schedule
accordingly.
The processor was optimized for loads/I-fetches since the biggest
performance lever is processor frequency. Slowing down to allow a one or
two cycle store would not have been a good performance tradeoff.
Craig Gleason
Hewlett-Packard
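The load-use and store penalties described above can be sketched as a toy cycle counter. This is one reading of the post, not a description of the real pipeline: it assumes every instruction issues in one cycle, a load followed immediately by a consumer of its result costs one stall cycle, and a store keeps the data cache busy for two further cycles so that any load or store arriving in that window waits. The `Instr` type and `count_cycles` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    kind: str                       # "load", "store" or "alu"
    dest: Optional[str] = None      # register written, if any
    srcs: Tuple[str, ...] = ()      # registers read


# Assumption: a store occupies the data cache for 3 cycles in total,
# i.e. 2 cycles beyond the one in which it issues.
STORE_EXTRA_BUSY = 2


def count_cycles(program: Sequence[Instr]) -> int:
    """Total cycles for a straight-line program under the toy model."""
    cycle = 0          # earliest cycle in which the next instruction can issue
    dcache_free = 0    # first cycle in which the data cache is free again
    prev = None
    for ins in program:
        issue = cycle
        # Load-use interlock: the instruction right after a load reads its result.
        if prev is not None and prev.kind == "load" and prev.dest in ins.srcs:
            issue += 1
        # Memory instructions wait until an earlier store has left the cache.
        if ins.kind in ("load", "store"):
            issue = max(issue, dcache_free)
        if ins.kind == "store":
            dcache_free = issue + 1 + STORE_EXTRA_BUSY
        cycle = issue + 1
        prev = ins
    return cycle


load = Instr("load", dest="r1")
use = Instr("alu", dest="r2", srcs=("r3", "r1"))
other = Instr("alu", dest="r2", srcs=("r3", "r4"))
store = Instr("store", srcs=("r5",))

print(count_cycles([load, other]))                 # no dependency
print(count_cycles([load, use]))                   # load-use interlock
print(count_cycles([store, other, other, store]))  # stores spaced apart
print(count_cycles([store, store, store, store]))  # stores clumped together
```

Under these assumptions, spacing stores with two non-memory instructions hides the penalty entirely, while a clump of back-to-back stores pays two extra cycles for each store after the first.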
jbs@WATSON.IBM.COM (04/02/91)
Linley Gwennap writes:
If you cannot recompile, your application will still run
faster than it would on any other workstation available today.
Isn't this a bit of an exaggeration? Suppose your application
resembles tomcatv?
James B. Shearer
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (04/02/91)
In article <32580010@hpcuhe.cup.hp.com> linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
>To address another question (I'm trying to restrict myself to one posting
>here), HP is committed to delivering OSF/1 on the Series 700 by the
>end of the year (1991). OSF/1 is not available for immediate delivery.
>Thus, no problems with excessive bugs and/or changes.
I'm rather interested in the availability of 4.4BSD or 4.3BSD-reno.
Does anyone know anything about that?
HP-UX or any other SysV based OS is too painful to administrate.
Masataka Ohta
gary@chpc.utexas.edu (Gary Smith) (04/02/91)
Have any of the computational chemistry software systems, such as
GAUSSIAN-90, been ported to the new HP machines?
--
Gary Smith <gary@chpc.utexas.edu>
Systems Group, Center for High Performance Computing
The University of Texas System
Commons Building 1.151C, Balcones Research Center
10100 Burnet Road
Austin, TX 78758
(512) 471-2411
peter@ficc.ferranti.com (Peter da Silva) (04/03/91)
In article <31@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
> HP-UX or any other SysV based OS is too painful to administrate.
What have you been smoking? I want some.
BSD system administration hasn't changed significantly since V7. You still
have to add drivers by editing makefiles, one way or the other. Compared to
the System V sysadm stuff, and the idmk* programs, it's like stone knives
and bear skins.
And where Berkeley *has* innovated in system administration it's not done
such a good job. I'm no big fan of MMDF, but next to sendmail.cf it's a
masterpiece of clarity. We have some SPARCstations here and I'm dreading
hooking them into the mail network.
--
Peter da Silva.                       `-_-'
peter@ferranti.com +1 713 274 5180.    'U`
"Have you hugged your wolf today?"
mslater@cup.portal.com (Michael Z Slater) (04/03/91)
Linley Gwennap from HP writes:
>While HP's IC
>processes are as good as anyone's, the Series 700 CPU is implemented
>in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
>throughout the industry.
I think this understates the value of HP's process. IBM may have 3-layer
metal, but I am not aware of any commercially available microprocessor
fabricated in a 3-layer metal process. I'm not a process expert by a long
shot, but from what I understand, there is a speed advantage here.
Also, from casual discussions with others who have had chips fabbed at various
foundries, including HP, HP's process has a reputation for being one of the
fastest around.
Michael Slater, Microprocessor Report mslater@cup.portal.com
jdr@sloth.mlb.semi.harris.com (Jim Ray) (04/03/91)
In article <40812@cup.portal.com> mslater@cup.portal.com (Michael Z Slater) writes:
>Linley Gwennap from HP writes:
>
>>While HP's IC
>>processes are as good as anyone's, the Series 700 CPU is implemented
>>in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
>>throughout the industry.
>
>I think this understates the value of HP's process. IBM may have 3-layer
>metal, but I am not aware of any commercially available microprocessor
>fabricated in a 3-layer metal process. I'm not a process expert by a long
>shot, but from what I understand, there is a speed advantage here.
>
>Also, from casual discussions with others who have had chips fabbed at various
>foundries, including HP, HP's process has a reputation for being one of the
>fastest around.
>
>Michael Slater, Microprocessor Report mslater@cup.portal.com
I could have sworn that at least one of the chips (one of the 3 used
in the "snakes") is fabbed by Hitachi. Anyone else hear anything
of this?
--
Jim Ray                                Harris Semiconductor
Internet: jdr@semi.harris.com          PO Box 883 MS 62B-022
Phone: (407) 729-5059                  Melbourne, FL 32901
glew@pdx007.intel.com (Andy Glew) (04/03/91)
>Stores take three cycles on the data cache, but if they are followed by
>non-data cache accesses, there will be no penalty. Therefore the store
>penalty can be zero, one or two states, and the compilers can schedule
>accordingly.
What are the penalties for back-to-back loads? With/without dependencies?
Back-to-back stores? With/without dependencies?
I.e. what are the penalties for the following code fragments:
(1) Load followed by independent non-memory code
    (assuming all registers immediately ready):
        load r1 := M[imm]
        add r2 := r3 + r4
        add r2 := r3 + r4
        add r2 := r3 + r4
(2) Load followed by dependent non-memory code
    (assuming all registers immediately ready):
        load r1 := M[imm]
        add r2 := r3 + r1
        add r2 := r3 + r1
        add r2 := r3 + r1
(3) Load followed by independent load:
        load r1 := M[imm]
        load r2 := M[imm2]
(4) Load followed by dependent load:
        load r1 := M[imm]
        load r2 := M[r1]
(5) Store followed by independent non-memory code:
        store M[imm] := r1
        add r2 := r3 + r4
        add r2 := r3 + r4
        add r2 := r3 + r4
(6) Store followed by store:
        store M[imm] := r1
        store M[imm2] := r2
(7) Store followed by independent load:
        store M[imm] := r1
        load r2 := M[imm2]
(8) Store followed by dependent load:
        store M[imm] := r1
        load r2 := M[imm]
(9) Load followed by independent store:
        load r1 := M[imm]
        store M[imm2] := r2
(10) Load followed by dependent store:
        load r1 := M[imm]
        store M[r1] := r2
Have I missed any cases?
--
Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway,
Hillsboro, Oregon 97124-6497
This is a private posting; it does not indicate opinions or positions
of Intel Corp.
mlord@bwdls58.bnr.ca (Mark Lord) (04/04/91)
In article <> cag@hpfcso.FC.HP.COM (Craig Gleason) writes:
<... All loads and I-fetches take one cycle. There
<is a load-use penalty of one cycle when the instruction after the load
<uses the load data as an operand.
<
<Stores take three cycles on the data cache, but if they are followed by
<non-data cache accesses, there will be no penalty. Therefore the store
<penalty can be zero, one or two states, and the compilers can schedule
<accordingly.
Thanks for this very useful info. A question: what happens with successive
stores? We seem to have trouble avoiding a lot of these in applications
around here (BNR), even after the optimizer has its go at things. Mostly these
occur early in procedures which need to save lots of parameters before invoking
another procedure as the first activity in the current procedure (am I confused
or what!). As such, the optimizer has trouble spreading them out, and quite
often they end up clumped together: store,store,store,store.
What sort of penalty does this incur cycle-wise (and why) ?
--
MLORD@BNR.CA Ottawa, Ontario *** Personal views only ***
maj@hpfcso.FC.HP.COM (Mike Jassowski) (04/05/91)
> I could have sworn that at least one of the chips (one of the 3 used
> in the "snakes") is fabbed by Hitachi. Anyone else hear anything
> of this?
----------
The floating point coprocessor was developed jointly by HP and TI and is
fabbed at TI.
--Mike Jassowski
clc5q@madras.cs.Virginia.EDU (Clark L. Coleman) (04/06/91)
In article <32580012@hpcuhe.cup.hp.com> linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
>I'd like to take a moment to respond to comments that the Series 700
>has achieved its high performance due primarily to high (66 MHz) clock
>frequencies resulting from advanced CMOS processes. While HP's IC
>processes are as good as anyone's, the Series 700 CPU is implemented
>in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
>throughout the industry. It is nearly identical to IBM's 1.0 micron
>process used in their 20-30 MHz RS/6000s, and not as dense as the
>0.8 micron process used in IBM's 41 MHz Model 550.
Maybe you could clarify this for me. I read that the main CPU was 1.0 micron
geometry and the Texas Instruments floating point coprocessor was 0.8 micron
from a TI process. It sounds like the floating point unit has a process
at least as good as any competitors' processes, and the main CPU and cache
chips are about average for this market segment. Is this right?
-----------------------------------------------------------------------------
"The use of COBOL cripples the mind; its teaching should, therefore, be
regarded as a criminal offence." E.W.Dijkstra, 18th June 1975.
||| clc5q@virginia.edu (Clark L. Coleman)
danw@tbird9.prime.com (Dan Westerberg) (04/06/91)
In article <1991Apr5.183233.26573@murdoch.acc.Virginia.EDU>, clc5q@madras.cs.Virginia.EDU (Clark L. Coleman) writes:
|> In article <32580012@hpcuhe.cup.hp.com> linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
|> >I'd like to take a moment to respond to comments that the Series 700
|> >has achieved its high performance due primarily to high (66 MHz) clock
|> >frequencies resulting from advanced CMOS processes. While HP's IC
|> >processes are as good as anyone's, the Series 700 CPU is implemented
|> >in a 1.0 micron, 3-metal-layer CMOS process which is pretty standard
|> >throughout the industry. It is nearly identical to IBM's 1.0 micron
|> >process used in their 20-30 MHz RS/6000s, and not as dense as the
|> >0.8 micron process used in IBM's 41 MHz Model 550.
|>
|> Maybe you could clarify this for me. I read that the main CPU was 1.0 micron
|> geometry and the Texas Instruments floating point coprocessor was 0.8 micron
|> from a TI process. It sounds like the floating point unit has a process
|> at least as good as any competitors' processes, and the main CPU and cache
|> chips are about average for this market segment. Is this right?
Please correct me if I'm wrong since I design gate arrays, but are
3-metal-layer processes *average*? I was under the impression that
3-metal-layer processes are on the cutting edge of technology, tough to
design and fab, at least wrt gate arrays. Is this radically different in
full-custom design?
|>
|> -----------------------------------------------------------------------------
|> "The use of COBOL cripples the mind; its teaching should, therefore, be
|> regarded as a criminal offence." E.W.Dijkstra, 18th June 1975.
|> ||| clc5q@virginia.edu (Clark L. Coleman)
dan
--
===============================================================================
|                                                                             |
|  Daniel I. Westerberg             email: danw@tbird9.prime.com (or)         |
|  Prime Computer Inc.                     danw@s49.prime.com                 |
|  MS 10-9                                                                    |
|  500 Old Connecticut Path         phone: 508-620-2800 x3644                 |
|  Framingham, MA 10701             fax:   508-879-9098                       |
|                                                                             |
===============================================================================
linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/09/91)
(Michael Z Slater) notes:
> IBM may have 3-layer
> metal, but I am not aware of any commercially available microprocessor
> fabricated in a 3-layer metal process. I'm not a process expert by a long
> shot, but from what I understand, there is a speed advantage here.
The third metal layer is indeed an advantage for high speed
microprocessors. It can allow lower skew on internal clock distribution
networks, more area devoted to power/ground busing to reduce noise, and
often shorter signal interconnects which minimize RC delays. It is not,
however, as significant as shrinking the overall device size below one
micron. While HP's IC fabrication processes are very good, they are not
out on the "bleeding edge", so there's plenty of room for future
improvement. (For bleeding edge, try TI's 0.8 micron BiCMOS process!)
To clarify another inquiry, of the three large chips (CPU, FPU, MC) in
the Series 700, two are fabbed by HP. The FPU is fabbed by TI in a 0.8
micron, two metal-layer CMOS process. While Hitachi is not involved in
the Series 700, they have licensed the PA-RISC architecture and are
expected to produce PA-RISC chips in the future.
---------------------------------------------------------------------------
DISCLAIMER: The views expressed here do not        Linley Gwennap
represent the views of the Hewlett-Packard         PA-RISC Marketing
Company. Caveat emptor.                            Hewlett-Packard
linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/12/91)
(Dan Westerberg) asks:
> Please correct me if I'm wrong since I design gate arrays, but are
> 3-metal-layer processes *average*?

Let me clarify. HP's IC processes are among the industry leaders for
large-scale CPU designs (CPUs tend to push the boundaries). I would
not call our IC processes "average". The 3-metal-layer is one area
where we are probably ahead of most of the industry. In the area of
device size, at 1 micron we are with the industry leaders but not
really pushing the limits. In short, there are plenty of technological
advances left to apply to the Series 700 CPU chip.

The FPC chip is fabricated by Texas Instruments in a different process.
This chip uses a 0.8 micron, 2-metal-layer TI process. This denser
process helps keep the cost of the FPC chip down.
---------------------------------------------------------------------------
DISCLAIMER: The views expressed here do not       Linley Gwennap
represent the views of the Hewlett-Packard        PA-RISC Marketing
Company. Caveat emptor.                           Hewlett-Packard
sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (04/13/91)
In article <32580016@hpcuhe.cup.hp.com>, linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
> While HP's IC
>fabrication processes are very good, they are not out on the "bleeding
>edge", so there's plenty of room for future improvement. (For bleeding
>edge, try TI's 0.8 micron BiCMOS process!)

Can you (without lotta headaches) put HP-PA on Gallium Arsenide <sp>?

>To clarify another inquiry, of the three large chips (CPU, FPU, MC) in
>the Series 700, two are fabbed by HP. The FPU is fabbed by TI in a 0.8
>micron, two metal-layer CMOS process.

So the TI floating-point goodie is the most expensive chip sitting in your box,
due to the fabrication process, among other things? (TI: Good, but pricy) Could
you offer a Real Cheap Snake (snakette?) with a goal of 20MIPS/5K given the
current technology?

> While Hitachi is not involved in the Series 700, they have licensed the
>PA-RISC architecture and are
>expected to produce PA-RISC chips in the future.

Will they be offering complementary low-end machines to the Snake? Have any
other companies considered licensing PA-RISC?

    Signature envy: quality of some people to put 24+ lines in their .sigs
 -- >                  SYSMGR@CADLAB.ENG.UMD.EDU                        < --
mike@UC780.UMD.EDU (Mike Santangelo) (04/17/91)
In article <00947104.8B2D8080@KING.ENG.UMD.EDU>, sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:
>In article <32580016@hpcuhe.cup.hp.com>, linley@hpcuhe.cup.hp.com (Linley Gwennap) writes:
>> While HP's IC
>>fabrication processes are very good, they are not out on the "bleeding
>>edge", so there's plenty of room for future improvement. (For bleeding
>>edge, try TI's 0.8 micron BiCMOS process!)
>
>Can you (without lotta headaches) put HP-PA on Gallium Arsenide <sp>?
>
>>To clarify another inquiry, of the three large chips (CPU, FPU, MC) in
>>the Series 700, two are fabbed by HP. The FPU is fabbed by TI in a 0.8
>>micron, two metal-layer CMOS process.
>
>So the TI floating-point goodie is the most expensive chip sitting in your box,
>due to the fabrication process, among other things? (TI: Good, but pricy) Could
>you offer a Real Cheap Snake (snakette?) with a goal of 20MIPS/5K given the
>current technology?
>

Rumor has it that HP will indeed introduce a "snakette" and a 90 MHz
version of the current systems in the Fall.

>> While Hitachi is not involved in the Series 700, they have licensed the
>>PA-RISC architecture and are
>>expected to produce PA-RISC chips in the future.
>
>Will they be offering complementary low-end machines to the Snake? Have any
>other companies considered licensing PA-RISC?
>

I don't think Hitachi is in any way involved with the snakes, or plans
to be. They will introduce their own line, as I recall, sometime in the
not so distant future.

>
>    Signature envy: quality of some people to put 24+ lines in their .sigs
> -- >                  SYSMGR@CADLAB.ENG.UMD.EDU                        < --

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Michael F. Santangelo           + Inet: mike@uc780.umd.edu
VMS / UNIX Systems              +       mike@socrates.umd.edu
Academic Computing UMUC         + Bnet: MIKE@UC780
(The University of Maryland,    +       MIKE@UMUC (not visited often)
 University College)            +<Your clever net-phrase here>
sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (04/18/91)
>Rumor has it that HP will indeed introduce a "snakette" and a 90 MHz
>version of the current systems in the Fall.
>

Thanks, I can read the rumors column like anyone else ;-). I was hoping to get
a comment or two from an HP employee <carefully waving six-pack of beer around>

I'm also interested in the cost of the TI floating-point chip when compared to
the other HP-PA stuff, and if the floating-point chip could be made a socketed
option...

>>Will they be offering complementary low-end machines to the Snake? Have any
>>other companies considered licensing PA-RISC?
>
>I don't think Hitachi is in any way involved with the snakes and doesn't plan
>to be. They will introduce their own line as I recall sometime in the not
>so distant future.

I am referring to marketing strategy. Sun has very successfully (to this date)
worked a "Middle of the Road" approach, letting Solbourne and others work on
either faster/stronger/multiprocessor options and affordable/cheaper/commodity
machines (such as the CompuAdd box).

Hitachi and a Korean manufacturer (name escapes me) have licensed HP-PA. If
they don't have something which is binary-compatible with the HP-PA lines, I'd
be really surprised. Kinda waste of time and money....

    Signature envy: quality of some people to put 24+ lines in their .sigs
 -- >                  SYSMGR@CADLAB.ENG.UMD.EDU                        < --
mike@vlsivie.tuwien.ac.at (Michael K. Gschwind) (04/19/91)
In article <00947104.8B2D8080@KING.ENG.UMD.EDU> sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:
>So the TI floating-point goodie is the most expensive chip sitting in your box,
>due to the fabrication process, among other things? (TI: Good, but pricy) Could
>you offer a Real Cheap Snake (snakette?) with a goal of 20MIPS/5K given the
>current technology?
>
>> While Hitachi is not involved in the Series 700, they have licensed the
>>PA-RISC architecture and are
>>expected to produce PA-RISC chips in the future.
>
>Will they be offering complementary low-end machines to the Snake? Have any
>other companies considered licensing PA-RISC?

Samsung is supposed to build low-end Snakes. According to what I hear
from HP, they will cut back on floating point performance to achieve
this goal.

The HP people I talked with only mentioned Hitachi and Samsung as
licensees.

                                        mike

Michael K. Gschwind, Dept. of VLSI-Design, Vienna University of Technology
mike@vlsivie.tuwien.ac.at       1-2-3-4 kick the lawsuits out the door
mike@vlsivie.uucp               5-6-7-8 innovate don't litigate
e182202@awituw01.bitnet         9-A-B-C interfaces should be free
Voice: (++43).1.58801 8144      D-E-F-O look and feel has got to go!
Fax:   (++43).1.569697
frank@grep.co.uk (Frank Wales) (04/20/91)
In article <00947448.7A748120@KING.ENG.UMD.EDU> sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:
>Hitachi and a Korean manufacturer (name escapes me) have licensed HP-PA.

Samsung. It seems Mitsubishi have also adopted PA for future workstations.
--
Frank Wales, Grep Limited,             [frank@grep.co.uk<->uunet!grep!frank]
Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303
martelli@cadlab.sublink.ORG (Alex Martelli) (04/23/91)
sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:
...
:I am referring to marketing strategy. Sun has very successfully (to this date)
:worked a "Middle of the Road" approach, letting Solbourne and others work on
:either faster/stronger/multiprocessor options and affordable/cheaper/commodity
:machines (such as the CompuAdd box).
I have not yet seen ONE Sun clone that is cheaper than a trueblue Sparcstation
SLC from Sun itself (possibly with 3rd party 8->16 meg RAM expansion, and
with a 3rd party external SCSI-connected box with disc/tape/whatever) - so
much for 'affordable/cheaper/commodity' machines!
--
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 53, Bologna, Italia
Email: (work:) martelli@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434;
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).
ajayshah@alhena.usc.edu (Ajay Shah) (04/27/91)
In article <772@cadlab.sublink.ORG> martelli@cadlab.sublink.ORG (Alex Martelli) writes:
>I have not yet seen ONE Sun clone that is cheaper than a trueblue Sparcstation
>SLC from Sun itself (possibly with 3rd party 8->16 meg RAM expansion, and
>with a 3rd party external SCSI-connected box with disc/tape/whatever) - so
>much for 'affordable/cheaper/commodity' machines!

The SLC is truly a cheap machine. But RAM <= 16Meg, no SBus slots,
12.5 mips without expandability is not a solution for everyone. But
it's really priced low -- just add up the cost of the spare parts
going into it (17" mono, 8Meg, etc.) and you have its University
price of $2700.

The latest Unix World has an ad for what feels like a SS-II clone
(called SparcClone 2) for $8k. Now that makes sense -- Sun is
making a lot of money on the SS-II at $15k (try adding up spare parts
prices again). It should be possible for a clonemaker to beat 'em
on price here handily.

There are a host of SS-1 clones at prices like $5k these days, a
market which Sun has essentially exited.

BTW, I just saw a SPARC clone built by Xerox at a USC Sun computer
fair. That sounds rather important to me -- isn't a clone by Xerox
big news??

--
_______________________________________________________________________________
Ajay Shah, (213)734-3930, ajayshah@usc.edu
                             The more things change, the more they stay insane.
_______________________________________________________________________________
sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (04/29/91)
In article <772@cadlab.sublink.ORG>, martelli@cadlab.sublink.ORG (Alex Martelli) writes:
>sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:
> ...
>:I am referring to marketing strategy. Sun has very successfully (to this date)
>:worked a "Middle of the Road" approach, letting Solbourne and others work on
>:either faster/stronger/multiprocessor options and affordable/cheaper/commodity
>:machines (such as the CompuAdd box).
>
>I have not yet seen ONE Sun clone that is cheaper than a trueblue Sparcstation
>SLC from Sun itself (possibly with 3rd party 8->16 meg RAM expansion, and
>with a 3rd party external SCSI-connected box with disc/tape/whatever) - so
>much for 'affordable/cheaper/commodity' machines!

Most companies have knocked off the SS-I, with 3 S-bus slots and monitor (see
CompuAdd, other no-namers).

Look at what you get with the SLC vs a SparcStation I clone. Both list at
around $5,000, but the Sparclone is a better value (expandable, not fudged in
terms of RAM expansion).

Using "true blue" and "Sun" in the same sentence is a contradiction in terms.
Sun doesn't make IBM equipment :-)

    Signature envy: quality of some people to put 24+ lines in their .sigs
 -- >                  SYSMGR@CADLAB.ENG.UMD.EDU                        < --