mcg@omepd (Steven McGeady) (04/07/88)
The following is a ***PRESS RELEASE*** distributed by Intel today (4/6). If anyone thinks that the repetition of press releases in this forum is inappropriate, please stop reading here and take the matter up with me via e-mail. On the other hand, I feel this is a service to the net. I have offered the release with a minimum of editing to remove the more content-free parts that might most offend the net, and have added some details present in other distributed materials.

When I have time (probably in about a week), I will post a more detailed discussion of the 80960 architecture. In the meantime, for detailed information, please contact your local Intel sales office or the phone number listed below.

S. McGeady
Intel Corporation
mcg@omepd.intel.com
mcg@iwarp.intel.com
tektronix!ogcvax!omepd!mcg
intelca!omepd!mcg

-----------------------------------------------------------------------------

INTEL ANNOUNCES FIRST EMBEDDED CONTROL PRODUCTS AND TOOLS BASED ON NEW 80960 ARCHITECTURE

Chandler, AZ, April 6, 1988 - Intel Corp. today announced a new 32-bit microprocessor architecture that integrates RISC design techniques, and is optimized for high-performance embedded control applications.

The 32-bit core architecture, the 80960, has parallelism and modular features to enable future processors to have very high performance levels, beyond those scaled to typical speed increases. The modularity of the core architecture also provides the basis for Intel to develop market-specific processors. Applications for these processors include image-processing, protocol handling and motor control. The 80960 processors' performance starts at 7.5 VAX MIPS on a single 32-bit processor.

"The 80960 architecture has been created specifically to address the product development requirements of the embedded control marketplace well into the 1990's," said Dave House, Intel Microprocessor Components Group senior vice president. "We have incorporated features that assure continuing growth in performance, coupled with features that make the 80960 family cost effective and easy-to-use."

--

The 80960KB and the 80960KA are the first two available processors based on the core architecture. These embedded processors, based on more than 350,000 transistors, incorporate specific attributes to meet the high-performance needs of the system control segment of the embedded control marketplace. Immediate applications for these two embedded processors are numerics processing, robotics and high-speed wide-area telecommunications.

"The 80960 embedded processors provide significant price-performance advantages over most other single-chip, 32-bit embedded solutions," said Alan Steinberg, product line marketing manager. "For example, the 80960KB is the only processor available which integrates an on-chip floating-point unit - at four MegaWhetstones - with a 20MHz clock. That is more than twice the performance at one-half the cost of other available processors."

--

The highly-integrated 80960KB has a number of functions on-chip that are characteristic of multiple-chip solutions. On-chip functions include 32 32-bit registers, the FPU [with four additional 80-bit registers], a 512-byte instruction cache, a stack frame cache, and a 32-bit multiplexed burst bus. [Interrupt controller - 256 programmable vectors] [IEEE-754 compliant FPU, with single, double, and extended (80-bit) precision operations.] [Burst bus can load four words at a time.]

... Every design decision was made toward optimizing overall system costs. "The market for embedded control applications usually has strict cost points for end systems. Providing the option to use lower-cost DRAMs is a good example of how we help designers contain overall costs without sacrificing performance." [80960KA is 80960KB without FPU].

--

Intel is providing development tools ... for the new embedded processors today. The Starter Kit ...
contains the EVA-960KB software evaluation vehicle [a plug-in board for the PC-AT with a 20MHz processor and 1Mb of SRAM] and the ASM-960 assembler [also the linker, librarian, namelister, etc, based on familiar UN*X tools]. [This starter kit] is priced at $6000 ...

[A second] Starter Kit is tuned to embedded control application benchmarking as well as large, sophisticated code development for the 80960KB. [This kit] also consists of the EVA-960 and ASM-960, plus the iC960 C language compiler with ANSI extensions [prototypes, const, volatile, etc]. iC960 also includes a retargetable STDIO library, full 32-, 64-, and 80-bit IEEE-compatible floating-point library, and in-line assembly language [inserts that fit with quality compiler register allocation]. [This starter kit is priced at] $6800.

In addition to the development tools provided by Intel, a broad range of products supporting the 80960 architecture are being offered by independent software and hardware vendors [including] Bauer Electronics [Postscript clone], GenRad, Advanced Computer Techniques [compilers], JMI, Logic Automation, Mentor Graphics [CAD design support], Ready Systems [Real-time kernel], and Tartan Labs [Ada compiler].

--

The 80960KA and the 80960KB are both available in 20MHz CHMOS* III configurations. Both embedded processors operate at a sustained 7.5 MIPS and 15K Dhrystones rates. The 80960KB is priced at $390 in 100-piece quantities, and is packaged in a 132-lead pin grid array. The 80960KA, available in the fourth quarter of 1988, will be $174 in 100-piece quantities. [The KB is available in quantity now.] Intel plans to offer 25MHz versions of the two embedded processors in early 1989.

For more information, call a local Intel sales office or 1-800-548-4725, or write Intel Corp., Literature Dept. #W427, 3065 Bowers Ave, Santa Clara, CA 95051.
[Other interesting information: the 80960 silicon has been used extensively inside Intel since early 1986, and has run (literally) millions of lines of code in a variety of applications. The chip is *very* well debugged.

The reference to "parallelism and modular features" in the first paragraph is a reference to other materials which allude to (near) future implementations which will be able to execute three instructions in the same clock cycle. The 80960KA and KB currently can overlap two instructions in certain cases.

The KA and KB implement "scoreboarding" of registers and condition codes to allow multiple instruction execution. This scoreboarding allows the 80960 architecture to hide the details of its instruction pipeline, allowing complete binary software compatibility with future implementations with different pipeline restrictions.

The 80960 is a three-address load/store architecture with 32 general registers: 16 standard ("global") registers and 16 ("local") registers that are provided fresh for a routine invoked by the "call" instruction. Implementations cache multiple sets of these local registers on chip, flushing previous sets to memory. The KA and KB store four sets on chip, for a total of 80 on-chip general registers.

More Later....]

-----------------------------------------------------------------------------
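The register totals quoted in the note above can be checked with a few lines of arithmetic. This is only a sketch: the figures (16 globals, 16 locals per set, four cached sets on the KA and KB) come from the post, while the constant and function names are illustrative and not Intel terminology.

```python
# Register counts as described in the post; all names here are illustrative.
GLOBAL_REGS = 16          # "global" registers, shared across calls
LOCAL_REGS_PER_SET = 16   # "local" registers, provided fresh on each call
CACHED_LOCAL_SETS = 4     # local register sets held on chip by the KA and KB


def visible_registers() -> int:
    """General registers a routine can name at any one time."""
    return GLOBAL_REGS + LOCAL_REGS_PER_SET


def on_chip_registers(cached_sets: int = CACHED_LOCAL_SETS) -> int:
    """General registers physically held on chip, counting cached local sets."""
    return GLOBAL_REGS + cached_sets * LOCAL_REGS_PER_SET


print(visible_registers())   # 32
print(on_chip_registers())   # 80
```

An implementation that cached a different number of local sets would change only the second figure; the 32 registers visible to a routine stay the same.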
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/08/88)
Questions on the 80960:
  1) why now?
  2) why didn't they release this instead of the 80386?
  3) why is it for "embedded applications" (as opposed to general use)?
  4) what about memory management?

I suspect that the answers to 2,3,4 are related...
--
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
rminnich@udel.EDU (Ron Minnich) (04/09/88)
In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes: >Questions on the 80960: > 2) why didn't they release this instead of the 80386? > 3) why is it for "embedded applications" (as opposed to general use)? Conjecture: with all the 'high level' architectures Intel has released it is politically impossible (inside the company) for them to embrace a general-purpose RISC. So ya shunt the RISC into embedded microcontroller applications. Course, whether all those high level architectures have been worth much is another story ... Just Guessing. -- ron (rminnich@udel.edu)
jimv@radix (Jim Valerio) (04/10/88)
In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) asks:
> 2) why didn't they release [the 80960] instead of the 80386?

Instead? The 386 was a guaranteed business success; it would have been crazy not to capitalize on the marketplace. Perhaps a better question is why weren't both architectures sold back then? As I understand it, there were lots of reasons, including but certainly not limited to concern about whether there was sufficient Fab capacity for both processors, and what message two "competing" processor architectures would give to customers. You should also remember that around the time this was happening, Intel was reporting losses for the first time since it had started being profitable (1972?), and tight times aren't usually the best times for unnecessary risks.

> 4) what about memory management?

Also announced was the 80960MC. "The 80960MC is a military qualified version of the KB with memory management and Ada tasking support."

> 3) why is it for "embedded applications" (as opposed to general use)?

The simple answer is that the organization that wanted a new processor architecture was the embedded controller organization, and not the microprocessor organization, which seems to be firmly committed to future 86 family products.

Personally, I see a marketing tightrope being walked here. You will note that the memory management version is only being announced with military spec's, presumably also at military prices. I expect that Motorola is walking a similar line with the 68K and 88K product lines.

> 1) why now?

Sorry, I won't touch this. :-)
--
Jim Valerio {verdix,omepd}!radix!jimv, "radix!jimv"@omepd.intel.com
mcg@omepd (Steven McGeady) (04/12/88)
In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>Questions on the 80960:
> 1) why now?

Although the processor silicon has been available and working since December 1985, it was felt that it was proper that we give ourselves more time to develop quality software tools for the processors, to more thoroughly validate the correctness of the silicon, and (frankly) to let the 386 more thoroughly saturate the marketplace.

Rather than the strategy of some other vendors, we felt it important that we release real silicon with real tools, so people could start working with the chips *today*, not a year from now. With the 80960, there was no installed base that needed to be apprised of developments, so there were no pre-announcements.

> 2) why didn't they release this instead of the 80386?

Well, this doesn't require much thought. The world is chock-full of MS-DOS applications, and there is a clear market for the 80386 processor. In fact, it can be said without fear of contradiction that it has been the *most* popular reprogrammable (vs. embedded) single-chip microprocessor in history. One doesn't shoot one's milk cow when one acquires a horse.

Don't for a moment believe that there won't be a 486, 586, and so forth. To paraphrase the old quip about Fortran: "I don't know what processors I'll be using in the year 2000, but one of them will have an '86' in the part number."

> 3) why is it for "embedded applications" (as opposed to general use)?

The 80960 starter kit does not come with a pair of handcuffs that prevents you from building a reprogrammable product with the processor. However, since Intel already has a processor that is performing admirably in the reprogrammable marketplace, and because the 80960's architecture is well-tuned to embedded applications, and because the embedded market is growing as fast or faster than the reprogrammable marketplace, it was felt that this was the most profitable area for an initial thrust.

> 4) what about memory management?

The press release failed to mention that a third member of the family, the 80960MC, has also been released. The 80960MC implements the 80960 architecture, and includes the same floating-point unit as the 80960KB, and also includes an on-chip memory management unit which supports a standard virtual memory management system. Key features of this memory management system are: 4k pages, one- and two-level indirect page tables, page dirty bits, protection bits, cachable bit for off-chip data caches, etc.

This processor, the 80960MC, is available in a mil-spec package, and is targeted at military and high-reliability embedded applications that require hardware protection of concurrent processes.

>I suspect that the answers to 2,3,4 are related...

Not really.

S. McGeady
Intel Corp.
chris@mimsy.UUCP (Chris Torek) (04/12/88)
I took a (very) quick peek (~30 min) through an 80960 architecture manual that showed up in our department today. It looks nice!

There are 16 global registers, but one of them (g15 as I recall) is the frame pointer, so you really get 15. The KB stores four sets of the 16 local registers, but you can only talk directly to the current 16, and three of these are tied up (r0 = prev FP, r1 = prev IP?, r2 = ? forgot), so you really have 13. The other three sets of local registers cache the last three stack frames; you can reach into an outer frame's registers by executing a `flushreg' instruction to push them back out and then diddling with the frame, but then you might as well use memory. (Still need flushreg sometimes.)

There are no goofy special registers beyond the usual PSL-type-thing. IO space access is a bit muddy to me (but I skipped the section on it). Standard User/Supervisor separation. 256 interrupt vectors, but 8 are useless (ipl 0 vectors interrupt when you are below ipl 0, i.e., never) and hence suppressed, and a bunch of ipl 31 vectors are `reserved', so you really have about 240 vectors.

There is hardware `scoreboarding' (interlocking) on the registers, so you can ignore the pipelining, although naturally it goes faster if you reorder.

Address space is 32 bits, but branch space is smaller. (There is an `anywhere' branch but most are 24 bit offsets.) All instructions are 32 bits so this really can cover 2^26 space (I forget whether it does, but would seem silly not to). Instruction data types are byte, short (word=16 bit), long (32 bit), `tripleword' (80 bit), and `quadword' (128 bit), with signed and unsigned (`ordinal') variants for everything <=32 bits. Signed store will trap if you try, e.g.,

	ldsb addr,r3       # fetch signed byte & extend to long -128..127
	stsb addr,r3       # (r3,addr?) store it back, no trap
	addo r3,$256,r3    # add ordinal: now it is in 128..383
	stsb addr,r3       # trap

As for faults, some are `indeterminate' and leave inconsistent and hence not restartable trails, but sequencing and restartability can be forced on a case basis (there is a `wait for pending results' instruction) or overall (set the No Ind. Fault flag in the PSL). The usual set of faults turns up, although integer divide by zero is separate from F.P. divide by zero (perhaps because FP is architecturally optional).

FP is IEEE of course, with `plain' 32 bit real, 64 bit double, and 80 bit `extended' precisions; there are instructions galore for (e.g.) exp, sin, cos, tan.

Best of all :-) the assembler syntax in the examples in the manual is Vax Unix style. .word, .align, .space directives. No more silly ALL CAPS STUFF! Hooray! :-)

[there, perhaps this will persuade mcg to elaborate :-) ]
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
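The signed-store sequence in the post above can be mimicked with a toy model. This is a sketch under stated assumptions, not a description of the real instruction semantics: it assumes a signed byte store faults when the value falls outside -128..127, and (following a later reply in this thread) that the fault only happens when overflow faulting is enabled. The names `load_signed_byte`, `store_signed_byte`, and `OverflowFault` are invented for illustration.

```python
# Toy model of the load/add/store sequence; names and fault rule are assumptions.


class OverflowFault(Exception):
    """Raised when a value does not fit the destination and faults are on."""


def load_signed_byte(raw: int) -> int:
    """Sign-extend an 8-bit memory value (0..255) to a full-width integer."""
    raw &= 0xFF
    return raw - 256 if raw >= 128 else raw


def store_signed_byte(value: int, faults_enabled: bool = True) -> int:
    """Return the byte written to memory, faulting if the value does not fit."""
    if faults_enabled and not -128 <= value <= 127:
        raise OverflowFault(f"{value} does not fit in a signed byte")
    return value & 0xFF


r3 = load_signed_byte(0x80)   # smallest signed byte, -128
store_signed_byte(r3)         # fits: no fault
r3 += 256                     # any loaded byte now lies in 128..383
try:
    store_signed_byte(r3)
except OverflowFault as fault:
    print("fault:", fault)
```

With `faults_enabled=False` the same store silently truncates to the low eight bits, which is the behaviour a C program would normally see.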
kenr@microsoft.UUCP (Kenneth Reneris) (04/12/88)
A recent Intel press article forwarded by Steve McGeady states: (intelca!omepd!mcg)

> INTEL ANNOUNCES FIRST EMBEDDED CONTROL PRODUCTS AND TOOLS BASED ON NEW
> 80960 ARCHITECTURE
...
> "The 80960 embedded processors provide significant price-performance
> advantages over most other single-chip, 32-bit embedded solutions," said
> Alan Steinberg, product line marketing manager. "For example, the 80960KB
> is the only processor available which integrates an on-chip floating-point
         ^^^^
> unit - at four MegaWhetstones - with a 20MHz clock. That is more than
> twice the performance at one-half the cost of other available
> processors."
...

Inmos Corp's transputer T800-20 is also a single chip CPU with an integrated FPU. As a 20MHz RISC processor it also runs 4 million whetstones a second. (Reference material: Electronics, Nov 27, 196. p. 57. & Electronics, Aug 20, 1987).

I'm not sure what "7.5 VAX MIPS" is, but a single T800-20 breaks the stop watch at 15 MIPS. In addition, the multitasking is all handled by the hardware, in the microcode (along with message passing). It has four link lines which operate at 20Mbit/sec in each direction.

Last I knew Inmos was working on releasing a 30MHz model of the T800. The transputer seems to meet all of Alan Steinberg's requirements. I'm sure he is just unaware of certain vital facts about his competition.

Kenneth Reneris
{uw-beaver,decvax,sun,attunix,uunet}!microsof!kenr

DISCLAIMER: My opinions are my own, not those of my employer.
mdr@reed.UUCP (Mike Rutenberg) (04/12/88)
In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>Questions on the 80960:
> 2) why didn't they release this instead of the 80386?
> 3) why is it for "embedded applications" (as opposed to general use)?

Controllers for "embedded applications" are a huge and growing market that seems to have been largely ignored by other RISC chip manufacturers (where the main orientation seems to be toward workstations). If there is to be a shakeout in the computer RISC market, why should Intel even get involved in that? They have experience, customers (millions of those little i8051s are out there) and infinite growth potential in the *fast* controller field.

Even better, they can easily put together custom parts based on standard building blocks for specific applications. This is good if I want real-time dashboard display updates in my Cadillac, but don't really need the FPU.

I also suspect that this is not being presented as a workstation chip because that would confuse and somewhat scare the popular world (among them investors and big IBM PC customers) who really need to feel the 80x86 is Intel's architecture for future computers. If the RISC chip happens to get designed into computers, that is fine, but I doubt they will push for it immediately.

Remember that there are lots of IBM 801s acting as channel controllers for IBM 3090s. RISCs can be used for "embedded applications."

Mike
--
Mike Rutenberg for fast, robust food and software (503)771-5516
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/13/88)
In article <8755@reed.UUCP> mdr@reed.UUCP (Mike Rutenberg) writes:
>In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>>Questions on the 80960:
>> 2) why didn't they release this instead of the 80386?
>> 3) why is it for "embedded applications" (as opposed to general use)?
>
>Controllers for "embedded applications" are a huge and growing market
>that seems to have been largely ignored by other RISC chip manufacturers
>(where the main orientation seems to be toward workstations). If there

I phrased that one badly... the real question was "why is this an embedded CPU rather than a general purpose unit," and the answer seems to be marketing rather than technical. Like the RPM40 this would make a nice workstation chip, perhaps in many ways better than the RPM40.

It's too bad that the initial thrust is in that direction, but I would be surprised if someone doesn't build a testbed workstation inhouse just to see what the costs really are.

A UNIX port is getting easier to do all the time, since there are more good people around. If only a PCC style compiler were needed I suspect that it could be done in a minimal way (kernel + C) in a year. Not that ordering wouldn't make it faster, but the scoreboard seems to make it practical to run a less than optimal code generator.
--
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
david@sun.uucp (David DiGiacomo) (04/13/88)
>INTEL ANNOUNCES FIRST EMBEDDED CONTROL PRODUCTS AND TOOLS BASED ON NEW > 80960 ARCHITECTURE ... > The 80960KA and the 80960KB are both available in 20MHz CHMOS* III >configurations. Both embedded processors operate at a sustained 7.5 MIPS >and 15K Dhrystones rates. Why is the integer performance so low? Do most instructions take 2 cycles?
chris@mimsy.UUCP (Chris Torek) (04/13/88)
In article <10382@steinmetz.ge.com> davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes: >A UNIX port is getting easier to do all the time, since there are more >good people around. If only a PCC style compiler were needed I suspect >that it could be done in a minimal way (kernel + C) in a year. I am not sure quite what you mean here, but a half-decent 4.3BSD should not take even a year. Just code up the machine dependent part of the kernel---mostly locore.s and drivers---patch up a PCC back end (stealing liberally from the Tahoe back end, since the Tahoe architecture is closer to the 80960 than is the Vax), write a weak `optimiser' that does trivial reordering, and compile it and go. 4.3BSD's portability has become noticeably better since 4.3BSD was ported to the CCI Power 6/32; the same source tree compiles on okeeffe (a CCI Power 6/32 aka Harris HCX-7) and vangogh (a Vax 8650) with literally no changes. (The machine dependent pieces are put in subdirectories; make cleverly predefines ${MACHINE} as either `vax' or `tahoe', so one writes, e.g., `cd pcc.${MACHINE}; make'.) I think it would be neat if someone (mt Xinu might be a good candidate; Berkeley folks spend too much time breaking, er, augmenting the kernel in other ways to be able to do this) ported 4.3-tahoe to every architecture in sight, just to create a truly portable base system. But this is drifting rather far afield of the original subject (and newsgroup!). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
johnl@ima.ISC.COM (John R. Levine) (04/13/88)
In article <3363@omepd> mcg@iwarpo3.UUCP (Steve McGeady) writes: >> 4) what about memory management? >... The 80960MC implements the 80960 ... and also includes an on-chip memory >management unit which supports a standard virtual memory management system. >... This processor is available in a mil-spec package, and is targeted at >military and high-reliability embedded applications that require hardware >protection of concurrent processes. I would be fascinated to hear about high reliability embedded applications that use virtual memory. Seems to me you'd need a pretty artful designer to come up with a system that satisfies the sort of real-time constraints generally present in embedded systems while handling page faults. Or perhaps you could have a system with Unix, vi, and troff and X windows burnt in so that fighter pilots can type up their reports on the way home from a mission, using only an eye-tracking mouse equivalent built into the helmet. And then send it home via uucp. The possibilities are limitless. -- John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869 { ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.something Rome fell, Babylon fell, Scarsdale will have its turn. -G. B. Shaw
baum@apple.UUCP (Allen J. Baum) (04/13/88)
>In article <49265@sun.uucp> david@sun.uucp (David DiGiacomo) asks:
>> The 80960KA and the 80960KB are both available in 20MHz CHMOS* III
>>configurations. Both embedded processors operate at a sustained 7.5 MIPS
>>and 15K Dhrystones rates.
>
>Why is the integer performance so low? Do most instructions take 2 cycles?

Actually, yes. Despite some fairly clever scoreboarding, many simple instructions take two cycles. This appears to happen because they have a single port register file. For example: A+B->C, D+E->F. The second addition will take 2 cycles. But: A+B->C, C+E->F. The second addition will take 1 cycle. This is because they forward the ALU result to the second addition, which saves them a cycle. Ironic, since forwarding usually makes instructions run just as fast as they would if there were no data dependencies; here, data dependencies make it run faster!

NOTE: This is one PARTICULAR implementation. It is NOT an architectural mis-feature. There are no architectural reasons why future versions shouldn't run much faster (with the same clock rate).
--
{decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385
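The two cases in the post above can be captured in a toy cost model. This is emphatically not a model of the real KA/KB pipeline: the rule (an add costs one cycle when one of its sources is the result of the immediately preceding add, two cycles otherwise) is inferred only from the post, and the cost of the first add in a sequence is an assumption. The function name `add_cycles` is made up for this sketch.

```python
# Toy cycle model inferred from the post; not the actual 80960 pipeline.
from typing import List, Optional, Tuple

Add = Tuple[str, str, str]  # (source 1, source 2, destination)


def add_cycles(program: List[Add]) -> List[int]:
    """Cycle cost of each add: 1 if a source is forwarded, otherwise 2.

    Assumption: the first add has nothing to forward from, so it costs 2.
    """
    costs = []
    previous_dest: Optional[str] = None
    for src1, src2, dest in program:
        forwarded = previous_dest is not None and previous_dest in (src1, src2)
        costs.append(1 if forwarded else 2)
        previous_dest = dest
    return costs


independent = [("A", "B", "C"), ("D", "E", "F")]   # A+B->C, D+E->F
dependent = [("A", "B", "C"), ("C", "E", "F")]     # A+B->C, C+E->F

print(add_cycles(independent))  # [2, 2]
print(add_cycles(dependent))    # [2, 1]
```

Under this rule a chain of dependent adds is cheaper than the same number of independent ones, which is the irony the post points out.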
mcg@omepd (Steven McGeady) (04/14/88)
I hate to present the 80960 architecture in such a peek-a-boo manner, but I have been far too busy to come up with a long diatribe (being in the engineering department rather than the marketing department). It's much easier to motivate myself to answer specific questions, so ....

In article <11026@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>I took a (very) quick peek (~30 min) through an 80960 architecture
>manual that showed up in our department today. It looks nice!

Why, thank you. This is what we hoped for.

>There are no goofy special registers beyond the usual PSL-type-thing.

Namely arithmetic controls (rounding mode, fault on overflow, etc), and process controls (trace pending, supervisor mode), and trace controls (trace fault on {instruction, call, branch, return, pre-return}).

>IO space access is a bit muddy to me

The 80960 has no special I/O - It is entirely memory mapped. I/O registers (or whatever) can occur anywhere in the address space. The upper 16Mb of the 4Gb address space is typically reserved for processor-specific functions and I/O.

>Address space is 32 bits, but branch space is smaller. (There is an
>`anywhere' branch but most are 24 bit offsets.) All instructions are
>32 bits so this really can cover 2^26 space (I forget whether it does,
>but would seem silly not to).

There are actually only 22 bits in the encoding for displacements, so, with the restriction of word-aligned instructions, the overall range is 2^24. There are branch and call-extended instructions which take absolute addresses.

> Instruction data types are byte,
>short (word=16 bit), long (32 bit),
> `tripleword' (80 bit), and `quadword'
>(128 bit), with signed and unsigned

Triples are actually 96 bits, as you would suspect. They are, however, used to move 80-significant-bit extended-precision floating-point numbers around.

> Signed store will trap if you try, e.g.,
>
>	ldsb addr,r3       # fetch signed byte & extend to long -128..127
>	stsb addr,r3       # (r3,addr?) store it back, no trap
>	addo r3,$256,r3    # add ordinal: now it is in 128..383
>	stsb addr,r3       # trap
>

Only if you have the integer overflow mask bit in the processor controls set. C programs normally clear this bit so that integer operations do not cause faults. However, we did take some care to specify the C compiler so that things would work more-or-less the way you expect them to if you had overflow faulting turned on. The "more-or-less" part means that we didn't avoid optimizations that would hide potential faults (such as constant folding in variable expressions).

>As for faults, some are `indeterminate' and leave inconsistent and
>hence not restartable trails, but sequencing and restartability can be
>forced on a case basis (there is a `wait for pending results'
>instruction) or overall (set the No Ind. Fault flag in the PSL). The
>usual set of faults turns up, although integer divide by zero is
>separate from F.P. divide by zero (perhaps because FP is
>architecturally optional).

In the current implementations (KA, KB, MC), all faults are 'precise', because, while the instruction stream is pipelined, potentially imprecise faults from previous instructions are known before any irreversible actions are taken on in-progress instructions. In future implementations which execute multiple instructions per clock in parallel functional units, the fault record will contain enough information to restart most imprecise faults.

The FP is indeed optional, as the KA implementation does not include it.

>
>FP is IEEE of course, with `plain' 32 bit real, 64 bit double, and 80
>bit `extended' precisions; there are instructions galore for (e.g.)
>exp, sin, cos, tan.
>Best of all :-) the assembler syntax in the examples in the manual
>is Vax Unix style. .word, .align, .space directives. No more silly
>ALL CAPS STUFF! Hooray!
:-)
>
>[there, perhaps this will persuade mcg to elaborate :-) ]

The tools (sans compiler) were based on the UNIX System V.3 toolset, so they should look pretty familiar to all of you. They support flexnames, portable ar format, and COFF. Interesting additions we have made include link-time leaf-procedure optimizations (turning 'call' instructions into branch-and-link instructions to preserve the stack frame cache), and 'system-call' optimizations (changing 'call' instructions into 'syscall' instructions). The toolkit includes as, ld, ar, nm, dump (COFF dumper), dis (disassembler), M4 (for those who feel they need a macro assembler), size, strip, and a ROM formatter.

The compiler supports October 1987 draft ANSI (pre-noalias), including const, volatile, function prototypes, and the new C preprocessor. It has slightly modified (but still, in our opinion, legal dpANS) floating-point widening rules to more closely model IEEE computational models, and has a 'long double' extended precision floating-point type. It also supports pre-register allocation inline assembly language support similar to the AT&T 3B2 model, which allows use of normal local and global variable names as arguments to in-lined asm functions without a need to know what registers they will be in.

The compiler comes with a full (V.3) Stdio, carefully hacked to be runnable on bare hardware with underlying support only for open, close, creat, read, write, lseek, and ioctl. Libraries that support these functions for standard UARTs (for terminal I/O only) are provided. All the remaining obvious libc functions (str*, malloc, etc) are supported.

Despite their System V origins, all of the tools currently run on PC-AT MS-DOS machines, and will soon be available on VAX/UNIX (Ultrix & 4.3BSD), and on VAX/VMS.
In case you wonder what my involvement is, I was a member of the processor architecture group headed by Glen Myers (author of 'Advances in Computer Architecture' and other bestsellers), and later a manager of the Software Tools group. The processor was architected over a period of several years by many people who deserve much credit: among them Konrad Lai, Jim Valerio, Fred Pollack, and Dave Budde, and implemented very ably by Mike Imel, Glenn Hinton, Randy Steck, and many others. This is merely a representative, and not a comprehensive, list of the people who made the 80960 possible. Also, the 80960KB Programmer's Reference Manual is available now, and is Order Number 270567-001, and the Hardware Designer's Reference Manual is Order Number 270564-001. Both are available by calling (800) 548-4725, or by writing Intel Literature, P.O. Box 58130, Santa Clara, CA 95052-8130, or by calling your local Intel sales office. From literature sales, the manuals are $21 and $18, respectively, and I am told they are immediately available. S. McGeady Intel Corp.
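The displacement and tripleword figures given in the reply above are simple arithmetic, sketched here. The 22-bit field, the word alignment, and the 96-bit and 80-bit widths come from the post; the constant and function names are illustrative, and treating the displacement as signed (so the reach splits into backward and forward halves) is an assumption that the post does not state.

```python
# Arithmetic from the reply; names are illustrative, signedness is assumed.
DISPLACEMENT_FIELD_BITS = 22   # bits in the encoding for displacements
INSTRUCTION_BYTES = 4          # every instruction is one 32-bit aligned word
WORD_BITS = 32
EXTENDED_REAL_BITS = 80        # significant bits of an extended-precision real


def branch_span_bytes(field_bits: int = DISPLACEMENT_FIELD_BITS,
                      alignment: int = INSTRUCTION_BYTES) -> int:
    """Total byte span covered by a word-aligned displacement field."""
    return (1 << field_bits) * alignment


span = branch_span_bytes()
print(span == 2 ** 24)                 # True: the "overall range is 2^24"
print(span // 2)                       # reach each way, if the field is signed

TRIPLE_WORD_BITS = 3 * WORD_BITS       # 96 bits, not 80
print(TRIPLE_WORD_BITS - EXTENDED_REAL_BITS)   # bits left over in a tripleword
```

The two low address bits are always zero for an aligned 32-bit instruction, which is why a 22-bit field spans 2^24 bytes rather than 2^22.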
marc@ima.ISC.COM (Marc Evans) (04/14/88)
I remember a few years ago that Intel announced a processor called (I think) the 432... Now that I have read about this processor (80960) in some of the industry rags, as well as on the net, it seems to me that the 80960 is just a repackaged, supercharged version of the 432. Can anybody comment on this?

As an aside... How 'bout the prices on the 80960MC $2490 (ref Electronic Products pg 16), and the bus exchange unit M82965 @ $1750... YOW!
tim@amdcad.AMD.COM (Tim Olson) (04/14/88)
In article <949@ima.ISC.COM> johnl@ima.UUCP (John R. Levine) writes: | I would be fascinated to hear about high reliability embedded applications | that use virtual memory. Seems to me you'd need a pretty artful designer to | come up with a system that satisfies the sort of real-time constraints | generally present in embedded systems while handling page faults. Memory management does not imply demand-paged virtual memory. It can be used simply for protection checking (to catch errant programs before they do real damage). -- Tim Olson Advanced Micro Devices (tim@amdcad.amd.com)
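The distinction drawn above, protection checking without demand paging, can be sketched as follows. The 4k page size is the one mentioned earlier in the thread; everything else (the dictionary used as a page table, the permission strings, `ProtectionFault`, and `check_access`) is hypothetical and does not reflect the 80960MC's actual table formats.

```python
# Protection-only memory management sketch; table layout and names are made up.
from typing import Dict

PAGE_SHIFT = 12                 # 4k pages, as mentioned earlier in the thread
PAGE_SIZE = 1 << PAGE_SHIFT


class ProtectionFault(Exception):
    """Raised when an access is not permitted for the page it touches."""


def check_access(page_perms: Dict[int, str], address: int, access: str) -> int:
    """Allow or reject an access; never translate or page anything in.

    page_perms maps a page number to a string of allowed accesses,
    e.g. "r" or "rw". Pages missing from the table allow nothing.
    """
    page = address >> PAGE_SHIFT
    if access not in page_perms.get(page, ""):
        raise ProtectionFault(f"{access!r} access to page {page:#x} denied")
    return address  # identity mapping: every permitted page is always resident


perms = {0x10: "r", 0x20: "rw"}       # a read-only page and a read-write page
check_access(perms, 0x20004, "w")     # permitted
try:
    check_access(perms, 0x10004, "w") # errant write to a read-only page
except ProtectionFault as fault:
    print("caught:", fault)
```

Because every permitted page is resident, the check has a fixed cost and no page-fault path, which is what keeps it compatible with real-time constraints.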
dik@cwi.nl (Dik T. Winter) (04/14/88)
In article <7543@apple.UUCP> baum@apple.UUCP (Allen Baum) writes: > Actually, yes. Despite some fairly clever scoreboarding, many simple > instructions take two cycles. This appears to happen because they have a single > port register file. For example: A+B->C, D+E->F. The second addition will > take 2 cycles. But: A+B->C, C+E->F. The second addition will take 1 cycle. > This is because they forward the ALU result to the second addition, which > saves them a cycle. Ironic, since forwarding usually makes instructions run > just as fast as they would if there were no data dependencies; here, data > dependencies make it run faster!
In vector machines this is a well known feature, called short-stop. For Cray-1 and Cray XMP this is true for operations on vector registers. For Cyber 205 and ETA 10 this is true for operations on scalar registers. It requires careful scheduling of your instructions. E.g. on the Cray a short stop occurs some 7 cycles after instruction start; if you miss it you have to wait till the previous instruction terminates. This makes it possible that programs tuned for the Cray-1 run slower on the XMP. Similar things hold for the 205. Here in fact, if I remember correctly, the instruction that uses the result of a previous instruction must be issued in a very small time frame after the previous instruction to benefit from the short stop. It should not be issued too early. E.g. A+B->C;C+D->E might run slower than A+B->C;NOP;NOP;C+D->E (You could of course issue other instructions than NOP: A+B->C;P+Q->R;V+W->Z;C+D->E; the instructions are pipelined.) A compiler writer's nightmare, I believe. -- dik t. winter, cwi, amsterdam, nederland INTERNET : dik@cwi.nl BITNET/EARN: dik@mcvax
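The two 80960 examples quoted from Allen Baum above can be reproduced with a toy timing model. This is only a sketch under one assumption that is not confirmed in the thread: the register file can deliver one operand per cycle, and an operand that is a literal or the previous instruction's destination needs no register-file read. The function name and the tuple encoding of instructions are made up for illustration.

```python
def issue_cycles(program, base_cycles=1):
    """Return per-instruction cycle counts under a one-read-port model.

    program: list of (sources, dest) tuples. A source is a register name
    (str) or an integer literal; dest is a register name.
    """
    cycles = []
    prev_dest = None
    for sources, dest in program:
        # Operands that really have to come out of the register file:
        # literals and the forwarded previous result do not count.
        reads = [s for s in sources if isinstance(s, str) and s != prev_dest]
        # One read fits in the base cycle; each further read costs one more.
        cycles.append(base_cycles + max(0, len(reads) - 1))
        prev_dest = dest
    return cycles


independent = [(("A", "B"), "C"), (("D", "E"), "F")]  # A+B->C, D+E->F
dependent = [(("A", "B"), "C"), (("C", "E"), "F")]    # A+B->C, C+E->F

print(issue_cycles(independent))  # [2, 2]
print(issue_cycles(dependent))    # [2, 1]
```

Under this model the dependent pair is faster than the independent pair, which is the irony pointed out in the quoted text. The model says nothing about the Cray or Cyber 205 short-stop timing windows.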
pardo@june.cs.washington.edu (David Keppel) (04/14/88)
In article <953@ima.ISC.COM> marc@ima.UUCP (Marc Evans) writes: >I remember a few years ago that Intel announced a processor called (I think) >the 432...Now that I have read about this processor (80960) in some of the >industry rags, as well as on the net, it seems to me that the 80960 is just >a repackaged, supercharged version of the 432. Can anybody comment on this? Yes. Unless I'm missing some big stuff about the 80960, they are not at all the same. The 432 supported objects and capabilities in hardware. Thus, the hardware recognized and protected the specific data type of "access descriptor" and encapsulated /refinement/ (essentially a suid on a capability). The 80960 is just your generic modern microprocessor. For more information on the 432, a description of capabilities, and a comparison of various related architectures, see "Capability-Based Computer Systems" by Henry M. Levy, (C) 1984 Digital Equipment Corp., printed by Digital Press. (Gosh, I *knew* we could somehow force the VAX into this discussion!) ;-D on ( But then again, I could be ) Pardo
bwong@sundc.UUCP (Brian Wong) (04/14/88)
In article <953@ima.ISC.COM>, marc@ima.ISC.COM (Marc Evans) writes: > I remember a few years ago that Intel announced a processor called (I think) > the 432...[ stuff deleted ] it seems to me that the 80960 is just > a repackaged, supercharged version of the 432. Can anybody comment on this? > pg 16), and the bus exchange unit M82965 @ $1750... YOW! Yargh! The Intel 432 was a (very) CISC machine, heavily microcoded, and intended to be essentially an "Ada machine." Given what we've read here about the 80960, it's a RISC machine. I doubt that the two chips have much of anything in common!
phil@amdcad.AMD.COM (Phil Ngai) (04/14/88)
In article <953@ima.ISC.COM> marc@ima.UUCP (Marc Evans) writes: >As a side... How 'bout the prices on the 80960MC $2490 (ref Electronic Products >pg 16), and the bus exchange unit M82965 @ $1750... YOW! That's not so bad. It works out to be about a dollar a pound when you consider all the paperwork you have to do to sell a military product. :-) Did you ever wonder why an AGM-54 Phoenix long range air-to-air missile costs half a million dollars? -- America is finally exporting affordable cars to Japan: the Honda Accord. I speak for myself, not the company. Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or phil@amd.com
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/14/88)
In article <953@ima.ISC.COM> marc@ima.UUCP (Marc Evans) writes: | I remember a few years ago that Intel announced a processor called (I think) | the 432...Now that I have read about this processor (80960) in some of the | industry rags, as well as on the net, it seems to me that the 80960 is just | a repackaged, supercharged version of the 432. Can anybody comment on this?

                    432       80960
  addressing        bit       byte
  instruction set   CISC      RISC
  model             object    register

They don't even have the same pin count or process.

| As a side... How 'bout the prices on the 80960MC $2490 (ref Electronic Products | pg 16), and the bus exchange unit M82965 @ $1750... YOW!

I *think* Intel is shooting themselves in the foot on that one. While they can make a large profit per chip (and how!), I believe that there is a market for alternatives to the SPARC chipset in the workstation market. I think that there are people who associate *86 with PC, no matter what the speed. When I mentioned getting a Sun Roadrunner, the comment was made "but, for that price you could buy a workstation." Intel may rethink that price...I think someone is doing a mil spec SPARC, and there is a tiny bit of thought given to price, even with tax money. Plus you could develop software on the same CPU as the target. -- bill davidsen (wedu@ge-crd.arpa) {uunet | philabs | seismo}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
mcg@omepd (Steven McGeady) (04/15/88)
In article <49265@sun.uucp> david@sun.uucp (David DiGiacomo) writes: > ... >> The 80960KA and the 80960KB are both available in 20MHz CHMOS* III >>configurations. Both embedded processors operate at a sustained 7.5 MIPS >>and 15K Dhrystones rates. > >Why is the integer performance so low? Do most instructions take 2 cycles? The short answer is yes, many instructions take two cycles in the current implementation. For the long answer, read on. Well, first, while 7.5 MIPS might seem slow for a $2000/chip workstation CPU, the price/performance of the 80960KA and KB is very good compared to its competition (whoever that may be) in the embedded marketplace. Claims of "the fastest microprocessor ever!" are: a) often false; and b) seldom true for very long. The 80960KA was the fastest microprocessor around when we hit silicon in 12/85, but we knew very well that fast silicon without quality tools and support wasn't very useful. I won't get dragged into the "my MIPS number is longer than your MIPS number" game that goes on here all too often. Second, as I hope to demonstrate soon, the 7.5 MIPS number is actually relatively conservative, and depends on the mix that you run. In other words, don't feel that you have to apply an automatic derating to that number because of past (mis-) deeds of unrelated marketeers. Our number is based on the integer Stanford benchmarks, grep, diff, compress, and other UNIX programs jury-rigged to run in an embedded environment, and various customer benchmarks. I'm trying to gather some up-to-date benchmark info to post, but it's taking some time to get it together in a form the net's performance mavens won't shoot holes in. The *technical* answer to your question is that the register file *in the current implementations* is not sufficiently multi-ported, and that "RISC" instructions (typically 1 cycle) suffer an additional cycle of latency if a value they need is not either a literal or the destination register of the previous instruction.
If the register file can be "bypassed", normal instructions execute in 1 cycle; otherwise they run in 2. Certain other instructions (bit extract, bit modify, check bit, compare-and-increment/decrement) take 2 cycles with bypass, 3 without.

For completeness:

Move instructions take 1 cycle per word.

Integer multiply takes 9-21 cycles (depending on the number of significant bits), typically 18. Integer divide takes twice as long. The processor uses an early-out Booth multiplier.

Branch instructions take 0 (yes, zero) to 2 cycles. In the former case, branches can often be overlapped with previous instructions.

Loads and stores are pipelined (3 deep), and loads take 4 to 5 cycles, stores 2 to 3 cycles. Other (unrelated) instructions can be executed in the delay slot after the load. Thus, 3 loads can be executed in 7 cycles (due to the pipelining) and up to 3 additional instructions can be executed in the delay slots (safely, because of register scoreboarding).

Call instructions take 9 cycles when a register set in the cache is available. Flushing a set of local registers takes an additional 24 cycles, depending on memory speed. Return takes 7 cycles, with the same caveat. The processor only flushes or reloads the register cache when necessary.

The "call" and "return" instructions, contrary to normal RISC practice, do most of what is required to perform a subroutine linkage. The 80960 C entry prologue/epilogue is:

    _foo:                       # foo takes four integer args, has int [100] auto array
            ldconst 400,r15
            addo    sp,r15,sp   # allocate auto space on stack
            movq    g0,r4       # save parameter registers (move quad)
            ...
            mov     ???,g0      # return value
            ret

"ldconst" is a pseudo-op which expands to the most efficient way of loading a constant value. The stack adjustment is only done if there are local variables that do not fit in registers. The saving of the parameter registers is only done if the procedure is not a leaf procedure.

Floating-point instructions take anywhere from 10 cycles (add-real) to 441 cycles (cosine).
Most floating-point instructions are interruptible and resumable. The next generation of 80960, now under development, will remove the bypass miss limitation, as well as exploit more opportunities for fine-grained parallelism in the architecture. More I cannot say. S. McGeady Intel Corp.
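The "3 loads in 7 cycles" figure quoted above follows from simple pipeline arithmetic. The sketch below is only a model, assuming a 5-cycle load latency (the upper end of the quoted 4 to 5 cycles), one load issued per cycle, and at most `depth` loads outstanding; the function name and parameters are hypothetical and are not taken from Intel documentation.

```python
def pipelined_load_cycles(n_loads, latency=5, depth=3):
    """Cycles until the last of n_loads back-to-back loads completes.

    Assumes one load may issue per cycle, and that a new load can only
    issue once fewer than `depth` earlier loads are still outstanding.
    """
    issue_times = []
    for i in range(n_loads):
        earliest = issue_times[-1] + 1 if issue_times else 0
        if i >= depth:
            # A pipeline slot frees up when the load `depth` back completes.
            earliest = max(earliest, issue_times[i - depth] + latency)
        issue_times.append(earliest)
    return issue_times[-1] + latency if issue_times else 0


print(pipelined_load_cycles(3))       # 7, versus 3 * 5 = 15 if not overlapped
print(pipelined_load_cycles(3) / 20)  # 0.35 microseconds at a 20 MHz clock
```

With the same assumptions a fourth load has to wait for the first to complete, so the model gives 10 cycles for four loads; that figure comes from the model only and is not stated in the post.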
bcase@Apple.COM (Brian Case) (04/15/88)
In article <3368@omepd> mcg@iwarpo3.UUCP (Steve McGeady) writes: > >In case you wonder what my involvement is, I was a member of the processor >architecture group headed by Glen Myers (author of 'Advances in Computer >Architecture' and other bestsellers), and later a manager of the Software >Tools group. > >S. McGeady Steve, is this the same Glen Myers who said: "Ones eyebrows should rise whenever a future architecture is developed with a register-oriented instruction set?" [Comp. Arch. News, Aug 1977, pp. 7-10] Perhaps he was quoted out of context; he actually meant the eyebrows should rise in delight? :-) :-) Just kidding, I know we all say things we regret (I certainly have!). I just found it interesting that he should be involved at all with such a register-intensive architecture!
randys@mipon2.intel.com (Randy Steck) (04/15/88)
In article <953@ima.ISC.COM> marc@ima.UUCP (Marc Evans) writes: >I remember a few years ago that Intel announced a processor called (I think) >the 432...Now that I have read about this processor (80960) in some of the >industry rags, as well as on the net, it seems to me that the 80960 is just >a repackaged, supercharged version of the 432. Can anybody comment on this? The iAPX432 was about as far away from the architecture of the 80960 as you can possibly get. The 432 is typically referenced when talking about CISC architectures (at the extreme CISC end of the spectrum). And there were a lot of mistakes made that really killed the performance. Some of these were no registers (all operations were memory-based), bit-level encoded instructions (Huffman encoding anyone?), two-chip implementation with only a narrow microinstruction bus between them, and extremely long call/return/branch times since everything was done in microcode. In other words, the 432 was a dog on performance and therefore a failure, even though some of the architectural features may have been interesting/useful. The 80960 architecture is in no way related, and in fact is very close to the other end of the spectrum (closer to the RISC end). Many RISC ideas have been incorporated into the processor to allow an implementation to achieve high performance (CPU *AND* system performance). As Glen Myers (the architect of the 960) said in a videotape at the announcement of the family, we feel that it is a balanced architecture with no undue emphasis being placed on any particular area to the detriment of others. The 960's ideas on fine-grained parallelism and complete hardware interlocking could go a long way in future implementations. Randy Steck Intel Corp. Hillsboro, Oregon These comments are my own. Intel would certainly disavow any agreement with them. What?!? You don't believe me? Why not call them up and ask?
mcg@omepd (Steven McGeady) (04/16/88)
In article <8266@apple.Apple.Com> bcase@apple.UUCP (Brian Case) writes: >In article <3368@omepd> mcg@iwarpo3.UUCP (Steve McGeady) writes: >> >>architecture group headed by Glen Myers (author of 'Advances in Computer > >Steve, is this the same Glen Myers who said: "Ones eyebrows should rise >whenever a future architecture is developed with a register-oriented >instruction set?" [Comp. Arch. News, Aug 1977, pp. 7-10] Perhaps he >was quoted out of context; Well, I was sort of hoping that someone would rise to the bait on this one - I'm glad it was you, Brian. I wish I had a videotape of Glen's 5-year "roast" here at Intel two years ago - one of our group members culled through a number of his old books and found a pile of gems like this one. Some of the people up here that were involved in the 432, as well as Glen (who had no direct involvement in it) were "born-again" in the fire-baptism of the 432, and decided that the time had come to implement a *fast* microprocessor. The mythology has it that the CISC-ish object-oriented folks who wouldn't accept the new order were banished into software groups and obscure research projects :-). This is, of course, nothing but mythology ... [I was not at Intel when the 432 was being built.] It is even more amusing that the current program manager for the project, one Bill Pohlman, was the program director of the original 8086 development. He starts customer presentations by saying that he's atoning for his sins by pushing the 80960. For the record, Glen Myers is no longer at Intel - he now is a principal at Radix Microsystems in Beaverton, OR. Among other endeavours, he is preparing a book on the 80960 architecture and its development. S. "Flat is where it's at" McGeady Intel Corp.
guy@gorodish.Sun.COM (Guy Harris) (04/16/88)
> >I remember a few years ago that Intel announced a processor called (I think) > >the 432...Now that I have read about this processor (80960) in some of the > >industry rags, as well as on the net, it seems to me that the 80960 is just > >a repackaged, supercharged version of the 432. Can anybody comment on this? Could the reason that the industry rags were under this delusion be that some of the people involved in the 432 were also involved in the '960 (Myers and Konrad Lai come to mind), and therefore (in typical ignorant industry-rag reporter fashion) assumed that one was derived from the other?
mcg@omepd (Steven McGeady) (04/18/88)
In article <49681@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes: >> >I remember a few years ago that Intel announced a processor called (I think) >> >the 432...Now that I have read about this processor (80960) in some of the >> >industry rags, as well as on the net, it seems to me that the 80960 is just >> >a repackaged, supercharged version of the 432. Can anybody comment on this? > >Could the reason that the industry rags were under this delusion be that some >of the people involved in the 432 were also involved in the '960 (Myers and >Konrad Lai come to mind), and therefore (in typical ignorant industry-rag >reporter fashion) assumed that one was derived from the other? The industry rags did in fact get this wrong, but it probably had more to do with *where* the development was happening. Intel's Oregon Microcomputer Engineering group was responsible for the 432. That effort taught a lot of people a lot of important things at many levels (from architecture to circuit design) that were applied to the 80960. But, apart from a geographic and personnel lineage, it is difficult to trace the published 80960 architecture back to the 432. Persons who see similarities between the 80960KA, KB, and MC and the 432 are ignorant of the architectures of one or the other. Incidentally, Glen Myers wrote about the 432 when he worked for IBM. The 432 effort was all but over by the time he joined Intel. S. McGeady Intel Corp.
andy@pcsbst.UUCP (Andre Wolper) (04/18/88)
>For the record, Glen Myers is no longer at Intel - he now is a principal
>at Radix Microsystems in Beaverton, OR. Among other endeavours, he is
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What does this company do, and who were its founders? How many people are involved? (I'm asking out of personal interest only, please respond via E-mail, it'll be appreciated. Thanks!)

**********************************
Andre Wolper ...usual disclaimers
...unido!pcsbst!andy
**********************************
sid@linus.UUCP (Sid Stuart) (04/18/88)
> The highly-integrated 80960KB has a number of functions on-chip that
>are characteristic of multiple-chip solutions. On-chip functions include
>32 32-bit registers, the FPU [with four additional 80-bit registers],
>a 512-byte instruction cache, a stack frame cache, and a 32-bit multiplexed
>burst bus.
                                 ^^^^^^^^^^^^^^^^^?
I have a copy of the 80960 Programmer's Reference Manual. I can find no reference in it to a "stack frame cache". Can someone point out where this is mentioned and what size this mythical cache is? Are the four sets of local registers supposed to be the stack frame cache?

sid@linus.arpa

BTW I would like to thank Mr. McGeady for his timely posting of the Intel press release.