[comp.arch] Press Release: Intel announces 80960 architecture

mcg@omepd (Steven McGeady) (04/07/88)

The following is a ***PRESS RELEASE*** distributed by Intel today (4/6).
If anyone thinks that the repetition of press releases in this forum is
inappropriate, please stop reading here and take the matter up with me via
e-mail.  On the other hand, I feel this is a service to the net.  I have
offered the release with a minimum of editing to remove the more content-free
parts that might most offend the net, and have added some details present
in other distributed materials.

When I have time (probably in about a week), I will post a more detailed
discussion of the 80960 architecture.  In the meantime, for detailed
information, please contact your local Intel sales office or the phone
number listed below.

S. McGeady
Intel Corporation

mcg@omepd.intel.com		mcg@iwarp.intel.com
tektronix!ogcvax!omepd!mcg	intelca!omepd!mcg

-----------------------------------------------------------------------------

INTEL ANNOUNCES FIRST EMBEDDED CONTROL PRODUCTS AND TOOLS BASED ON NEW
			80960 ARCHITECTURE


Chandler, AZ, April 6, 1988 -

Intel Corp. today announced a new 32-bit microprocessor architecture that
integrates RISC design techniques, and is optimized for
high-performance embedded control applications.

   The 32-bit core architecture, the 80960, has parallelism and modular
features to enable future processors to have very high performance
levels, beyond those scaled to typical speed increases.  The modularity
of the core architecture also provides the basis for Intel to develop
market-specific processors.  Applications for these processors include
image-processing, protocol handling and motor control.  The 80960
processors' performance starts at 7.5 VAX MIPS on a single 32-bit
processor.

   "The 80960 architecture has been created specifically to address the
product development requirements of the embedded control marketplace
well into the 1990's," said Dave House, Intel Microprocessor Components
Group senior vice president.

   "We have incorporated features that assure continuing growth in
performance, coupled with features that make the 80960 family cost
effective and easy-to-use."

--

   The 80960KB and the 80960KA are the first two available processors
based on the core architecture.  These embedded processors, based on more
than 350,000 transistors, incorporate specific attributes to meet the
high-performance needs of the system control segment of the embedded
control marketplace.  Immediate applications for these two embedded
processors are numerics processing, robotics and high-speed
wide-area telecommunications.

   "The 80960 embedded processors provide significant price-performance
advantages over most other single-chip, 32-bit embedded solutions," said
Alan Steinberg, product line marketing manager.  "For example, the 80960KB
is the only processor available which integrates an on-chip floating-point
unit - at four MegaWhetstones - with a 20MHz clock.  That is more than twice
the performance at one-half the cost of other available processors."

--

   The highly-integrated 80960KB has a number of functions on-chip that
are characteristic of multiple-chip solutions.  On-chip functions include
32 32-bit registers, the FPU [with four additional 80-bit registers],
a 512-byte instruction cache, a stack frame cache, and a 32-bit multiplexed
burst bus.

	[Interrupt controller - 256 programmable vectors]

	[IEEE-754 compliant FPU, with single, double, and extended
	 (80-bit) precision operations.]

	[Burst bus can load four words at a time.]

  ... Every design decision was made toward optimizing overall system costs.
"The market for embedded control applications usually has strict cost points
for end systems.  Providing the option to use lower-cost DRAMs is a good
example of how we help designers contain overall costs without sacrificing
performance."

	[80960KA is 80960KB without FPU].

--

   Intel is providing development tools ... for the new embedded processors
today.  The Starter Kit ... contains the EVA-960KB software evaluation
vehicle [a plug-in board for the PC-AT with a 20MHz processor and 1Mb of
SRAM] and the ASM-960 assembler [also the linker, librarian, namelister,
etc, based on familiar UN*X tools].  [This starter kit] is priced at
$6000 ...

   [A second] Starter Kit is tuned to embedded control application
benchmarking as well as large, sophisticated code development for the
80960KB.  [This kit] also consists of the EVA-960 and ASM-960, plus the
iC960 C language compiler with ANSI extensions [prototypes, const, volatile,
etc].  iC960 also includes a retargetable STDIO library, full 32-, 64-,
and 80-bit IEEE-compatible floating-point library, and in-line assembly
language [inserts that fit with quality compiler register allocation].
[This starter kit is priced at] $6800.

   In addition to the development tools provided by Intel, a broad range
of products supporting the 80960 architecture are being offered by
independent software and hardware vendors [including] Bauer Electronics
[Postscript clone], GenRad, Advanced Computer Techniques [compilers],
JMI, Logic Automation, Mentor Graphics [CAD design support], Ready Systems
[Real-time kernel], and Tartan Labs [Ada compiler].

--

   The 80960KA and the 80960KB are both available in 20MHz CHMOS* III
configurations.  Both embedded processors operate at a sustained 7.5 MIPS
and 15K Dhrystones rates.  The 80960KB is priced at $390 in 100-piece
quantities, and is packaged in a 132-lead pin grid array.  The 80960KA,
available in the fourth quarter of 1988, will be $174 in 100-piece quantities.
[The KB is available in quantity now.]  Intel plans to offer 25MHz versions
of the two embedded processors in early 1989.

   For more information, call a local Intel sales office or 1-800-548-4725,
or write Intel Corp., Literature Dept. #W427, 3065 Bowers Ave, Santa Clara,
CA 95051.


[Other interesting information: the 80960 silicon has been used extensively
 inside Intel since early 1986, and has run (literally) millions of lines of
 code in a variety of applications.  The chip is *very* well debugged.

 The reference to "parallelism and modular features" in the first paragraph
 is a reference to other materials which allude to (near) future 
 implementations which will be able to execute three instructions in the
 same clock cycle.  The 80960KA and KB currently can overlap two instructions
 in certain cases.  The KA and KB implement "scoreboarding" of registers and
 condition codes to allow multiple instruction execution.

 This scoreboarding allows the 80960 architecture to hide the details of its
 instruction pipeline, allowing complete binary software compatibility with
 future implementations with different pipeline restrictions.

 The 80960 is a three-address load/store architecture with 32 general
 registers: 16 standard ("global") registers, and 16 ("local") registers
 that are provided fresh for a routine invoked by the "call" instruction.
 Implementations cache multiple sets of these local registers on chip, flushing
 previous sets to memory.  The KA and KB store four sets on chip, for a total
 of 80 on-chip general registers.

 More Later....]
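The local-register mechanism described above can be pictured with a small counting model. Each "call" gets a fresh set of 16 local registers, the KA and KB hold four sets on chip, and older sets are flushed to memory. This C sketch is an illustration under assumptions, not Intel's logic. The assumed policy flushes the oldest set when a fifth frame is needed and reloads one set on return. The names are hypothetical.

```c
/* Hypothetical counting model of the 80960KA/KB local-register cache.
 * Register contents are not modelled, only where each frame lives. */
#define NGLOBAL 16                           /* g0..g15 */
#define NLOCAL  16                           /* r0..r15, fresh per call */
#define NSETS    4                           /* local sets held on chip */
#define ONCHIP_REGS (NGLOBAL + NSETS * NLOCAL)

typedef struct {
    int depth;    /* active frames, including the outermost one */
    int resident; /* newest frames whose local registers are on chip */
    int spills;   /* sets flushed out to the memory stack */
    int fills;    /* sets reloaded from the memory stack */
} frame_cache;

void fc_init(frame_cache *fc)
{
    fc->depth = 1;
    fc->resident = 1;      /* the initial frame starts on chip */
    fc->spills = 0;
    fc->fills = 0;
}

/* "call": the callee gets a fresh set of local registers. */
void fc_call(frame_cache *fc)
{
    fc->depth++;
    if (fc->resident == NSETS)
        fc->spills++;      /* all sets busy: flush the oldest to memory */
    else
        fc->resident++;
}

/* "ret": drop the callee's set and resume the caller's. */
void fc_ret(frame_cache *fc)
{
    fc->depth--;
    fc->resident--;
    if (fc->resident == 0 && fc->depth > 0) {
        fc->fills++;       /* caller's set was flushed earlier: reload it */
        fc->resident = 1;
    }
}
```

Under this model, four nested calls from the outermost frame need five sets. Exactly one set is therefore flushed on the way down and reloaded on the way back up. ONCHIP_REGS evaluates to the 80 on-chip registers quoted above.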

-----------------------------------------------------------------------------

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/08/88)

Questions on the 80960:
  1) why now?
  2) why didn't they release this instead of the 80386?
  3) why is it for "embedded applications" (as opposed to general use)?
  4) what about memory management?

I suspect that the answers to 2,3,4 are related...
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

rminnich@udel.EDU (Ron Minnich) (04/09/88)

In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>Questions on the 80960:
>  2) why didn't they release this instead of the 80386?
>  3) why is it for "embedded applications" (as opposed to general use)?
Conjecture: with all the 'high level' architectures Intel has released
it is politically impossible (inside the company) for them to 
embrace a general-purpose RISC. So ya shunt the RISC into embedded
microcontroller applications. 
   Course, whether all those high level architectures have been worth
much is another story ...
   Just Guessing.
-- 
ron (rminnich@udel.edu)

jimv@radix (Jim Valerio) (04/10/88)

In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) asks:
>  2) why didn't they release [the 80960] instead of the 80386?

Instead?  The 386 was a guaranteed business success; it would have been
crazy not to capitalize on the marketplace.

Perhaps a better question is why weren't both architectures sold back then?
As I understand it, there were lots of reasons, including but certainly not
limited to concern about whether there was sufficient Fab capacity for both
processors, and what message two "competing" processor architectures would
give to customers.

You should also remember that around the time this was happening, Intel was
reporting losses for the first time since it had started being profitable
(1972?), and tight times aren't usually the best times for unnecessary risks.

>  4) what about memory management?

Also announced was the 80960MC.  "The 80960MC is a military qualified version
of the KB with memory management and Ada tasking support."

>  3) why is it for "embedded applications" (as opposed to general use)?

The simple answer is that the organization that wanted a new processor
architecture was the embedded controller organization, and not the
microprocessor organization, which seems to be firmly committed to future
86 family products.

Personally, I see a marketing tightrope being walked here.  You will note that
the memory management version is only being announced with military specs,
presumably also at military prices.  I expect that Motorola is walking a
similar line with the 68K and 88K product lines.

>  1) why now?

Sorry, I won't touch this.  :-)

--
Jim Valerio	{verdix,omepd}!radix!jimv, "radix!jimv"@omepd.intel.com

mcg@omepd (Steven McGeady) (04/12/88)

In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>Questions on the 80960:
>  1) why now?

Although the processor silicon has been available and working since
December 1985, it was felt that it was proper that we give ourselves more time
to develop quality software tools for the processors, to more thoroughly
validate the correctness of the silicon, and (frankly) to let the 386 more
thoroughly saturate the marketplace.

Rather than the strategy of some other vendors, we felt it important that
we release real silicon with real tools, so people could start working
with the chips *today*, not a year from now.  With the 80960, there was no
installed base that needed to be apprised of developments, so there were
no pre-announcements.

>  2) why didn't they release this instead of the 80386?

Well, this doesn't require much thought.  The world is chock-full of MS-DOS
applications, and there is a clear market for the 80386 processor.  In fact,
it can be said without fear of contradiction that it has been the *most*
popular reprogrammable (vs. embedded) single-chip microprocessor in history.
One doesn't shoot one's milk cow when one acquires a horse.

Don't for a moment believe that there won't be a 486, 586, and so forth.
To paraphrase the old quip about Fortran: "I don't know what processors I'll
be using in the year 2000, but one of them will have an '86' in the part
number."

>  3) why is it for "embedded applications" (as opposed to general use)?

The 80960 starter kit does not come with a pair of handcuffs that prevents
you from building a reprogrammable product with the processor.  However,
since Intel already has a processor that is performing admirably in the
reprogrammable marketplace, and because the 80960's architecture is well-tuned
to embedded applications, and because the embedded market is growing as fast
or faster than the reprogrammable marketplace, it was felt that this was
the most profitable area for an initial thrust.

>  4) what about memory management?

The press release failed to mention that a third member of the family, the
80960MC, has also been released.  The 80960MC implements the 80960
architecture, and includes the same floating-point unit as the 80960KB,
and also includes an on-chip memory management unit which supports a
standard virtual memory management system.  Key features of this memory
management system are: 4k pages, one- and two-level indirect page tables,
page dirty bits, protection bits, cacheable bit for off-chip data caches,
etc.  This processor, the 80960MC, is available in a mil-spec package,
and is targeted at military and high-reliability embedded applications that
require hardware protection of concurrent processes.
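A two-level page-table walk with 4 KB pages, dirty bits and protection bits, as listed above, can be sketched generically. The following are assumptions chosen for illustration: the 10/10/12 split of the virtual address, the bit positions of the valid, writable and dirty flags, and every name in the code. The layout resembles the 80386's and is not the 80960MC's documented format. The MC also supports one-level tables, which this sketch omits.

```c
#include <stdint.h>

#define PTE_VALID  0x1u          /* hypothetical flag bits */
#define PTE_WRITE  0x2u
#define PTE_DIRTY  0x4u
#define PTE_FRAME  0xFFFFF000u   /* page-frame address bits (4 KB pages) */

/* Toy MMU state: dir[i] holds flags for the second-level table tables[i]. */
typedef struct {
    uint32_t *dir;               /* 1024-entry page directory */
    uint32_t **tables;           /* second-level tables, 1024 entries each */
} mmu;

/* Translate va. Returns 0 and stores the physical address in *pa, or -1
 * on a page fault (entry not valid) or protection fault (write to a
 * read-only page). A successful write marks the page dirty. */
int translate(mmu *m, uint32_t va, int is_write, uint32_t *pa)
{
    uint32_t di  = va >> 22;             /* top 10 bits: directory index */
    uint32_t ti  = (va >> 12) & 0x3FFu;  /* next 10 bits: table index */
    uint32_t off = va & 0xFFFu;          /* low 12 bits: page offset */

    if (!(m->dir[di] & PTE_VALID))
        return -1;
    uint32_t *pte = &m->tables[di][ti];
    if (!(*pte & PTE_VALID))
        return -1;
    if (is_write && !(*pte & PTE_WRITE))
        return -1;                       /* protection violation */
    if (is_write)
        *pte |= PTE_DIRTY;               /* record that the page changed */
    *pa = (*pte & PTE_FRAME) | off;
    return 0;
}
```

Protection checking and dirty tracking work the same way whether or not the system ever pages to disk. This matters for the real-time question raised later in the thread.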

>I suspect that the answers to 2,3,4 are related...

Not really.

S. McGeady
Intel Corp.

chris@mimsy.UUCP (Chris Torek) (04/12/88)

I took a (very) quick peek (~30 min) through an 80960 architecture
manual that showed up in our department today.  It looks nice!  There
are 16 global registers, but one of them (g15 as I recall) is the frame
pointer, so you really get 15.  The KB stores four sets of the 16 local
registers, but you can only talk directly to the current 16, and three
of these are tied up (r0 = prev FP, r1 = prev IP?, r2 = ? forgot), so
you really have 13.  The other three sets of local registers cache the
last three stack frames; you can reach into an outer frame's registers
by executing a `flushreg' instruction to push them back out and then
diddling with the frame, but then you might as well use memory.  (Still
need flushreg sometimes.)

There are no goofy special registers beyond the usual PSL-type-thing.
IO space access is a bit muddy to me (but I skipped the section on
it).  Standard User/Supervisor separation.  256 interrupt vectors, but
8 are useless (ipl 0 vectors interrupt when you are below ipl 0, i.e.,
never) and hence suppressed, and a bunch of ipl 31 vectors are
`reserved', so you really have about 240 vectors.
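Torek's count follows if a vector's priority is its number divided by 8, spreading 256 vectors over 32 levels (ipl 0..31). That is consistent with his 8 unusable ipl-0 vectors. The C check below works under that assumption. The number of reserved ipl-31 vectors is a parameter because the post does not give it.

```c
/* Assumed mapping: 8 vectors per priority level, 32 levels (0..31). */
int vector_priority(int vector)
{
    return vector >> 3;
}

/* Vectors that can ever be delivered: priority-0 vectors never interrupt,
 * and reserved_ipl31 of the priority-31 vectors are unavailable. */
int usable_vectors(int reserved_ipl31)
{
    int n = 0;
    for (int v = 0; v < 256; v++)
        if (vector_priority(v) > 0)
            n++;
    return n - reserved_ipl31;
}
```

With no reserved vectors this gives 248. Each reserved ipl-31 vector brings the total closer to Torek's "about 240".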

There is hardware `scoreboarding' (interlocking) on the registers, so
you can ignore the pipelining, although naturally it goes faster
if you reorder.
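Register scoreboarding can be modelled with a per-register "ready at" cycle. An instruction issues once its source registers are valid. Code is therefore correct in any order and merely slower when a result is used right after it is produced. This is a minimal C sketch. The single-issue model, the latencies and all names are hypothetical, not taken from 80960 documentation, and write-after-write hazards are not modelled.

```c
#define NREGS 32

/* Hypothetical instruction: dst = f(src1, src2), result after `latency`. */
typedef struct {
    int dst, src1, src2;  /* register numbers, 0..NREGS-1 */
    int latency;          /* assumed cycles until dst is valid */
} insn;

/* Single-issue pipeline with a scoreboard: returns the cycle in which
 * the last of the n instructions issues (0 if n == 0). */
int schedule(const insn *code, int n)
{
    int ready_at[NREGS] = {0};   /* cycle at which each register is valid */
    int cycle = 0, issue = 0;

    for (int i = 0; i < n; i++) {
        issue = cycle;
        /* Interlock: wait until both source registers are valid. */
        if (ready_at[code[i].src1] > issue) issue = ready_at[code[i].src1];
        if (ready_at[code[i].src2] > issue) issue = ready_at[code[i].src2];
        /* Mark the destination busy until its result arrives. */
        ready_at[code[i].dst] = issue + code[i].latency;
        cycle = issue + 1;       /* next instruction may issue a cycle later */
    }
    return issue;
}
```

Suppose a 3-cycle load into r4 is followed by an add that reads r4. The add waits until cycle 3. An independent add issues in cycle 1, which is why reordering helps even though it is never required for correctness.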

Address space is 32 bits, but branch space is smaller.  (There is an
`anywhere' branch but most are 24 bit offsets.)  All instructions are
32 bits so this really can cover 2^26 space (I forget whether it does,
but would seem silly not to).  Instruction data types are byte,
short (word=16 bit), long (32 bit), `tripleword' (80 bit), and `quadword'
(128 bit), with signed and unsigned (`ordinal') variants for everything
<=32 bits.  Signed store will trap if you try, e.g.,

	ldsb	addr,r3		# fetch signed byte & extend to long -128..127
	stsb	addr,r3		# (r3,addr?) store it back, no trap
	addo	r3,$256,r3	# add ordinal: now it is in 128..383
	stsb	addr,r3		# trap

As for faults, some are `indeterminate' and leave inconsistent and
hence not restartable trails, but sequencing and restartability can be
forced on a case basis (there is a `wait for pending results'
instruction) or overall (set the No Ind. Fault flag in the PSL).  The
usual set of faults turns up, although integer divide by zero is
separate from F.P. divide by zero (perhaps because FP is
architecturally optional).

FP is IEEE of course, with `plain' 32 bit real, 64 bit double, and 80
bit `extended' precisions; there are instructions galore for (e.g.)
exp, sin, cos, tan.

Best of all :-) the assembler syntax in the examples in the manual
is Vax Unix style.  .word, .align, .space directives.  No more silly
ALL CAPS STUFF!  Hooray!  :-)

[there, perhaps this will persuade mcg to elaborate :-) ]
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

kenr@microsoft.UUCP (Kenneth Reneris) (04/12/88)

A recent Intel press article forwarded by Steve McGeady states:
(intelca!omepd!mcg)

> INTEL ANNOUNCES FIRST EMBEDDED CONTROL PRODUCTS AND TOOLS BASED ON NEW
>                       80960 ARCHITECTURE
  ...
> "The 80960 embedded processors provide significant price-performance
> advantages over most other single-chip, 32-bit embedded solutions," said
> Alan Steinberg, product line marketing manager. "For example, the 80960KB
> is the only processor available which integrates an on-chip floating-point
	 ^^^^
> unit - at four MegaWhetstones - with a 20MHz clock. That is more than
> twice the performance at one-half the cost of other available
> processors."
  ...

Inmos Corp's transputer T800-20 is also a single chip CPU with an integrated
FPU. As a 20MHz RISC processor it also runs 4 million whetstones a second.
(Reference material: Electronics, Nov 27, 196. p. 57. & Electronics, Aug 20,
1987). I'm not sure what "7.5 VAX MIPS" is, but a single T800-20 breaks the
stop watch at 15 MIPS. In addition, the multitasking is all handled by the
hardware, in the microcode (along with message passing). It has four link
lines which operate at 20Mbit/sec in each direction. Last I knew Inmos was
working on releasing a 30MHz model of the T800.

The transputer seems to meet all of Alan Steinberg's requirements. I'm sure
he is just unaware of certain vital facts about his competition.

Kenneth Reneris
{uw-beaver,decvax,sun,attunix,uunet}!microsof!kenr
DISCLAIMER: My opinions are my own, not those of my employer.

mdr@reed.UUCP (Mike Rutenberg) (04/12/88)

In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>Questions on the 80960:
>  2) why didn't they release this instead of the 80386?
>  3) why is it for "embedded applications" (as opposed to general use)?

Controllers for "embedded applications" are a huge and growing market
that seems to have been largely ignored by other RISC chip manufacturers
(where the main orientation seems to be toward workstations).  If there
is to be a shakeout in the computer RISC market, why should Intel
even get involved in that?  They have experience, customers (millions
of those little i8051s are out there) and infinite growth potential in
the *fast* controller field.  Even better, they can easily put together
custom parts based on standard building blocks for specific
applications.  This is good if I want real-time dashboard display
updates in my Cadillac, but don't really need the FPU.

I also suspect that this is not being presented as a workstation chip
because that would confuse and somewhat scare the popular world (among
them investors and big IBM PC customers) who really need to feel the
80x86 is Intel's architecture for future computers.  If the RISC chip
happens to get designed into computers, that is fine, but I doubt they
will push for it immediately.

Remember that there are lots of IBM 801s acting as channel controllers
for IBM 3090s.  RISCs can be used for "embedded applications."

Mike
-- 
Mike Rutenberg    for fast, robust food and software      (503)771-5516

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/13/88)

In article <8755@reed.UUCP> mdr@reed.UUCP (Mike Rutenberg) writes:
>In article <10320@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>>Questions on the 80960:
>>  2) why didn't they release this instead of the 80386?
>>  3) why is it for "embedded applications" (as opposed to general use)?
>
>Controllers for "embedded applications" are a huge and growing market
>that seems to have been largely ignored by other RISC chip manufacturers
>(where the main orientation seems to be toward workstations).  If there

I phrased that one badly... the real question was "why is this an
embedded CPU rather than a general purpose unit," and the answer seems
to be marketing rather than technical. Like the RPM40 this would make a
nice workstation chip, perhaps in many ways better than the RPM40. It's
too bad that the initial thrust is in that direction, but I would be
surprised if someone doesn't build a testbed workstation in-house just to
see what the costs really are.

A UNIX port is getting easier to do all the time, since there are more
good people around. If only a PCC style compiler were needed I suspect
that it could be done in a minimal way (kernel + C) in a year. Not that
reordering wouldn't make it faster, but the scoreboard seems to make it
practical to run a less-than-optimal code generator.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

david@sun.uucp (David DiGiacomo) (04/13/88)

>INTEL ANNOUNCES FIRST EMBEDDED CONTROL PRODUCTS AND TOOLS BASED ON NEW
>			80960 ARCHITECTURE
 ...
>   The 80960KA and the 80960KB are both available in 20MHz CHMOS* III
>configurations.  Both embedded processors operate at a sustained 7.5 MIPS
>and 15K Dhrystones rates.

Why is the integer performance so low?  Do most instructions take 2 cycles?

chris@mimsy.UUCP (Chris Torek) (04/13/88)

In article <10382@steinmetz.ge.com> davidsen@steinmetz.ge.com
(William E. Davidsen Jr) writes:
>A UNIX port is getting easier to do all the time, since there are more
>good people around. If only a PCC style compiler were needed I suspect
>that it could be done in a minimal way (kernel + C) in a year.

I am not sure quite what you mean here, but a half-decent 4.3BSD should
not take even a year.  Just code up the machine dependent part of the
kernel---mostly locore.s and drivers---patch up a PCC back end
(stealing liberally from the Tahoe back end, since the Tahoe
architecture is closer to the 80960 than is the Vax), write a weak
`optimiser' that does trivial reordering, and compile it and go.
4.3BSD's portability has become noticeably better since 4.3BSD was
ported to the CCI Power 6/32; the same source tree compiles on okeeffe
(a CCI Power 6/32 aka Harris HCX-7) and vangogh (a Vax 8650) with
literally no changes.  (The machine dependent pieces are put in
subdirectories; make cleverly predefines ${MACHINE} as either `vax' or
`tahoe', so one writes, e.g., `cd pcc.${MACHINE}; make'.)

I think it would be neat if someone (mt Xinu might be a good candidate;
Berkeley folks spend too much time breaking, er, augmenting the kernel
in other ways to be able to do this) ported 4.3-tahoe to every architecture
in sight, just to create a truly portable base system.  But this is
drifting rather far afield of the original subject (and newsgroup!).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

johnl@ima.ISC.COM (John R. Levine) (04/13/88)

In article <3363@omepd> mcg@iwarpo3.UUCP (Steve McGeady) writes:
>>  4) what about memory management?
>... The 80960MC implements the 80960 ... and also includes an on-chip memory
>management unit which supports a standard virtual memory management system.
>... This processor is available in a mil-spec package, and is targeted at
>military and high-reliability embedded applications that require hardware
>protection of concurrent processes.

I would be fascinated to hear about high reliability embedded applications
that use virtual memory. Seems to me you'd need a pretty artful designer to
come up with a system that satisfies the sort of real-time constraints
generally present in embedded systems while handling page faults.

Or perhaps you could have a system with Unix, vi, and troff and X windows
burnt in so that fighter pilots can type up their reports on the way home from
a mission, using only an eye-tracking mouse equivalent built into the helmet.
And then send it home via uucp.  The possibilities are limitless.
-- 
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.something
Rome fell, Babylon fell, Scarsdale will have its turn.  -G. B. Shaw

baum@apple.UUCP (Allen J. Baum) (04/13/88)

--------
[]
>In article <49265@sun.uucp> david@sun.uucp (David DiGiacomo) asks:
>>   The 80960KA and the 80960KB are both available in 20MHz CHMOS* III
>>configurations.  Both embedded processors operate at a sustained 7.5 MIPS
>>and 15K Dhrystones rates.
>
>Why is the integer performance so low?  Do most instructions take 2 cycles?

Actually, yes. Despite some fairly clever scoreboarding, many simple
instructions take two cycles. This appears to happen because they have a single
port register file. For example: A+B->C, D+E->F. The second addition will
take 2 cycles. But: A+B->C, C+E->F. The second addition will take 1 cycle.
This is because they forward the ALU result to the second addition, which
saves them a cycle. Ironic, since forwarding usually makes instructions run
just as fast as they would if there were no data dependencies; here, data
dependencies make it run faster! 

NOTE: This is one PARTICULAR implementation. It is NOT an architectural
mis-feature. There are no architectural reasons why future versions shouldn't
run much faster (with the same clock rate).
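Baum's explanation can be turned into a toy cost model. With one register-file read port, each source operand read from the file costs a cycle. An operand forwarded from the immediately preceding instruction's result costs nothing. This C sketch reproduces only the effect he describes and is not a cycle-accurate model of the 80960KB. The structure and names are hypothetical.

```c
/* Toy model: single-read-port register file plus ALU result forwarding.
 * Registers are named by letters, as in the example above. */
typedef struct { char dst, src1, src2; } add_insn;   /* dst = src1 + src2 */

/* Total cycles: each operand fetched through the single read port costs a
 * cycle, an operand equal to the previous instruction's result is
 * forwarded for free, and every instruction takes at least one cycle. */
int cycles(const add_insn *code, int n)
{
    int total = 0;
    char prev_dst = 0;                 /* no previous result yet */
    for (int i = 0; i < n; i++) {
        int reads = 0;
        if (code[i].src1 != prev_dst) reads++;
        if (code[i].src2 != prev_dst) reads++;
        total += reads > 0 ? reads : 1;
        prev_dst = code[i].dst;
    }
    return total;
}
```

Under this model, A+B->C, D+E->F costs 2 + 2 = 4 cycles, while A+B->C, C+E->F costs 2 + 1 = 3. The dependent sequence is the faster one, which is the irony Baum points out.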

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

mcg@omepd (Steven McGeady) (04/14/88)

I hate to present the 80960 architecture in such a peek-a-boo manner, but
I have been far too busy to come up with a long diatribe (being in the
engineering department rather than the marketing department).  It's
much easier to motivate myself to answer specific questions, so ....

In article <11026@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>I took a (very) quick peek (~30 min) through an 80960 architecture
>manual that showed up in our department today.  It looks nice!

Why, thank you.  This is what we hoped for.

>There are no goofy special registers beyond the usual PSL-type-thing.

Namely arithmetic controls (rounding mode, fault on overflow, etc), and
process controls (trace pending, supervisor mode), and trace controls
(trace fault on {instruction, call, branch, return, pre-return}).

>IO space access is a bit muddy to me

The 80960 has no special I/O - It is entirely memory mapped.  I/O registers
(or whatever) can occur anywhere in the address space.  The upper 16Mb of
the 4Gb address space is typically reserved for processor-specific functions
and I/O.
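With no special I/O instructions, a device register is simply a memory address. C code reaches it through a volatile pointer so the compiler neither caches nor removes the access. In this C sketch only the "upper 16 MB of 4 GB" boundary comes from the text above. The device address and function names are hypothetical.

```c
#include <stdint.h>

/* Upper 16 MB of the 4 GB space: 0xFF000000 through 0xFFFFFFFF. */
#define IO_REGION_BASE 0xFF000000u

/* Hypothetical UART data register placed inside that region. */
#define UART_DATA_ADDR 0xFF001000u

int in_io_region(uint32_t addr)
{
    return addr >= IO_REGION_BASE;
}

/* On the target this compiles to an ordinary store instruction; volatile
 * stops the compiler from optimizing it away. Only meaningful on hardware
 * that actually decodes the address. */
void mmio_write8(uint32_t addr, uint8_t value)
{
    *(volatile uint8_t *)(uintptr_t)addr = value;
}
```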

>Address space is 32 bits, but branch space is smaller.  (There is an
>`anywhere' branch but most are 24 bit offsets.)  All instructions are
>32 bits so this really can cover 2^26 space (I forget whether it does,
>but would seem silly not to).

There are actually only 22 bits in the encoding for displacements, so,
with the restriction of word-aligned instructions, the overall range is 2^24.
There are branch and call-extended instructions which take absolute addresses.
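The arithmetic works like this. A 22-bit signed displacement counted in words, scaled by 4 because instructions are word aligned, gives a 24-bit signed byte offset, or roughly plus or minus 8 MB around the branch. This C sketch of the decode assumes the 22-bit field has already been extracted from the instruction word. The function name is hypothetical.

```c
#include <stdint.h>

/* Sign-extend a 22-bit word displacement and convert it to bytes. */
int32_t branch_offset(uint32_t field22)
{
    int32_t words = (int32_t)(field22 & 0x3FFFFFu);
    if (words & 0x200000)      /* sign bit of the 22-bit field */
        words -= 0x400000;     /* subtract 2^22 to make it negative */
    return words * 4;          /* 4 bytes per word-aligned instruction */
}
```

The reachable offsets run from -2^23 to 2^23 - 4 bytes, a total span of 2^24 bytes.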

>Instruction data types are byte, short (word=16 bit), long (32 bit),
>`tripleword' (80 bit), and `quadword' (128 bit), with signed and unsigned

Triples are actually 96 bits, as you would suspect.  They are, however,
used to move 80-significant-bit extended-precision floating-point numbers
around.

> Signed store will trap if you try, e.g.,
>
>	ldsb	addr,r3		# fetch signed byte & extend to long -128..127
>	stsb	addr,r3		# (r3,addr?) store it back, no trap
>	addo	r3,$256,r3	# add ordinal: now it is in 128..383
>	stsb	addr,r3		# trap
>

Only if you have the integer overflow mask bit in the processor controls
set.  C programs normally clear this bit so that integer operations do not
cause faults.  However, we did take some care to specify the C compiler
so that things would work more-or-less the way you expect them to if you
had overflow faulting turned on.  The "more-or-less" part means that we didn't
avoid optimizations that would hide potential faults (such as constant
folding in variable expressions).
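The store behaviour in Torek's example, as qualified here, amounts to a range check that faults only when overflow faulting is enabled. This C sketch rests on several assumptions for illustration. The flag is modelled as a plain faults_enabled argument rather than the actual control bit. The return codes are invented, and the low byte is assumed to be stored when faulting is off.

```c
#include <stdint.h>

enum { STORE_OK = 0, STORE_FAULT = -1 };

/* Store v as a signed byte at *dst. If v is outside -128..127 and
 * overflow faulting is enabled, report a fault and leave memory
 * unchanged; otherwise store the low 8 bits. */
int store_signed_byte(int8_t *dst, int32_t v, int faults_enabled)
{
    if (faults_enabled && (v < INT8_MIN || v > INT8_MAX))
        return STORE_FAULT;
    *dst = (int8_t)(uint8_t)(v & 0xFF);  /* two's-complement truncation */
    return STORE_OK;
}
```

With faulting enabled, the value 128 produced by the addo in the example faults. With it disabled, as in normal C programs, 383 is stored as its low byte, 127.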

>As for faults, some are `indeterminate' and leave inconsistent and
>hence not restartable trails, but sequencing and restartability can be
>forced on a case basis (there is a `wait for pending results'
>instruction) or overall (set the No Ind. Fault flag in the PSL).  The
>usual set of faults turns up, although integer divide by zero is
>separate from F.P. divide by zero (perhaps because FP is
>architecturally optional).

In the current implementations (KA, KB, MC), all faults are 'precise',
because, while the instruction stream is pipelined, potentially imprecise
faults from previous instructions are known before any irreversible actions
are taken on in-progress instructions.

In future implementations which execute multiple instructions per clock
in parallel functional units, the fault record will contain enough information
to restart most imprecise faults.

The FP is indeed optional, as the KA implementation does not include it.

>
>FP is IEEE of course, with `plain' 32 bit real, 64 bit double, and 80
>bit `extended' precisions; there are instructions galore for (e.g.)
>exp, sin, cos, tan.

>Best of all :-) the assembler syntax in the examples in the manual
>is Vax Unix style.  .word, .align, .space directives.  No more silly
>ALL CAPS STUFF!  Hooray!  :-)
>
>[there, perhaps this will persuade mcg to elaborate :-) ]

The tools (sans compiler) were based on the UNIX System V.3 toolset, so they
should look pretty familiar to all of you.  They support flexnames,
portable ar format, and COFF.  Interesting additions we have made include
link-time leaf-procedure optimizations (turning 'call' instructions into
branch-and-link instructions to preserve the stack frame cache), and
'system-call' optimizations (changing 'call' instructions into 'syscall'
instructions).  The toolkit includes as, ld, ar, nm, dump (COFF dumper),
dis (disassembler), M4 (for those who feel they need a macro assembler),
size, strip, and a ROM formatter.
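The link-time leaf-procedure optimization can be pictured as a pass over call sites. A call to a routine known to be a leaf is rewritten as a branch-and-link, so no new local-register set is allocated. This toy C sketch makes several simplifications. It works on a hypothetical textual instruction list with a given table of leaf routines and rewrites "call sym" to "bal sym" with the same target. The real linker works on COFF relocations, and a routine entered this way needs a matching entry and return sequence, which this sketch ignores.

```c
#include <stdio.h>
#include <string.h>

/* Rewrite "call sym" to "bal sym" when sym is a known leaf routine.
 * Returns the number of call sites rewritten. */
int rewrite_leaf_calls(char insns[][32], int n,
                       const char *const leaves[], int nleaves)
{
    int changed = 0;
    for (int i = 0; i < n; i++) {
        if (strncmp(insns[i], "call ", 5) != 0)
            continue;
        const char *target = insns[i] + 5;
        for (int j = 0; j < nleaves; j++) {
            if (strcmp(target, leaves[j]) == 0) {
                char buf[32];
                /* Build in a separate buffer: target points into insns[i]. */
                snprintf(buf, sizeof buf, "bal %s", target);
                strcpy(insns[i], buf);
                changed++;
                break;
            }
        }
    }
    return changed;
}
```

The syscall rewrite mentioned above would be the same kind of pass with a different target table and replacement mnemonic.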

The compiler supports October 1987 draft ANSI (pre-noalias), including
const, volatile, function prototypes, and the new C preprocessor.  It has
slightly modified (but still, in our opinion, legal dpANS) floating-point
widening rules to more closely model IEEE computational models, and has
a 'long double' extended precision floating-point type.  It also
supports pre-register allocation inline assembly language support similar
to the AT&T 3B2 model, which allows use of normal local and global variable
names as arguments to in-lined asm functions without a need to know what
registers they will be in.  The compiler comes with a full (V.3) Stdio,
carefully hacked to be runnable on bare hardware with underlying support only
for open, close, creat, read, write, lseek, and ioctl.  Libraries that support
these functions for standard UARTs (for terminal I/O only) are provided.
All the remaining obvious libc functions (str*, malloc, etc) are supported.

Despite their System V origins, all of the tools currently run on PC-AT MS-DOS
machines, and will soon be available on VAX/UNIX (Ultrix & 4.3BSD), and on
VAX/VMS.

In case you wonder what my involvement is, I was a member of the processor
architecture group headed by Glen Myers (author of 'Advances in Computer
Architecture' and other bestsellers), and later a manager of the Software
Tools group.  The processor was architected over a period of several years
by many people who deserve much credit: among them Konrad Lai, Jim Valerio,
Fred Pollack, and Dave Budde, and implemented very ably by Mike Imel,
Glenn Hinton, Randy Steck, and many others.  This is merely a representative,
and not a comprehensive, list of the people who made the 80960 possible.

Also, the 80960KB Programmer's Reference Manual is available now, and is
Order Number 270567-001, and the Hardware Designer's Reference Manual is
Order Number 270564-001.  Both are available by calling (800) 548-4725, or
by writing Intel Literature, P.O. Box 58130, Santa Clara, CA 95052-8130,
or by calling your local Intel sales office.  From literature sales, the
manuals are $21 and $18, respectively, and I am told they are immediately
available.

S. McGeady
Intel Corp.

marc@ima.ISC.COM (Marc Evans) (04/14/88)

I remember a few years ago that Intel announced a processor called (I think)
the 432...Now that I have read about this processor (80960) in some of the
industry rags, as well as on the net, it seems to me that the 80960 is just
a repackaged, supercharged version of the 432. Can anybody comment on this?

As a side... How 'bout the prices on the 80960MC $2490 (ref Electronic Products
pg 16), and the bus exchange unit M82965 @ $1750... YOW!

tim@amdcad.AMD.COM (Tim Olson) (04/14/88)

In article <949@ima.ISC.COM> johnl@ima.UUCP (John R. Levine) writes:
| I would be fascinated to hear about high reliability embedded applications
| that use virtual memory. Seems to me you'd need a pretty artful designer to
| come up with a system that satisfies the sort of real-time constraints
| generally present in embedded systems while handling page faults.

Memory management does not imply demand-paged virtual memory.  It can be
used simply for protection checking (to catch errant programs before
they do real damage).
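To make Tim's point concrete, here is a minimal sketch of protection-only memory management, assuming a small table of base/size regions with read, write, and execute rights. There is no paging and no address translation, so nothing can page-fault; a reference either passes or traps. The names (`struct region`, `access_ok`, `ACC_R`/`ACC_W`/`ACC_X`) are hypothetical and do not describe any particular MMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical protection-only memory management: no paging and no
   address translation, just per-region access rights checked on every
   reference.  Region layout and right bits are illustrative only. */
enum { ACC_R = 1u, ACC_W = 2u, ACC_X = 4u };

struct region {
    uint32_t base;     /* first byte of the region */
    uint32_t size;     /* length in bytes */
    unsigned rights;   /* OR of ACC_R, ACC_W, ACC_X */
};

/* Returns true if every right in `need` is granted for `addr`.
   An address outside all regions is treated as a protection fault. */
static bool access_ok(const struct region *tab, int n, uint32_t addr,
                      unsigned need)
{
    int i;

    for (i = 0; i < n; i++)
        if (addr >= tab[i].base && addr - tab[i].base < tab[i].size)
            return (tab[i].rights & need) == need;
    return false;
}
```

Because the check is a bounded scan of a fixed table, its worst-case cost is known in advance, which is the property a real-time embedded design cares about.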

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)

dik@cwi.nl (Dik T. Winter) (04/14/88)

In article <7543@apple.UUCP> baum@apple.UUCP (Allen Baum) writes:
 > Actually, yes. Despite some fairly clever scoreboarding, many simple
 > instructions take two cycles. This appears to happen because they have a single
 > port register file. For example: A+B->C, D+E->F. The second addition will
 > take 2 cycles. But: A+B->C, C+E->F. The second addition will take 1 cycle.
 > This is because they forward the ALU result to the second addition, which
 > saves them a cycle. Ironic, since forwarding usually makes instructions run
 > just as fast as they would if there were no data dependencies; here, data
 > dependencies make it run faster! 
 > 
In vector machines this is a well-known feature, called short-stop.
For Cray-1 and Cray XMP this is true for operations on vector registers.
For Cyber 205 and ETA 10 this is true for operations on scalar registers.
It requires careful scheduling of your instructions.

E.g. on the Cray a short stop occurs some 7 cycles after instruction start;
if you miss it you have to wait till the previous instruction terminates.
This means that programs tuned for the Cray-1 can run slower
on the XMP.

Similar things hold for the 205.  Here in fact, if I remember correctly,
the instruction that uses the result of a previous instruction must be
issued in a very small time frame after the previous instruction to benefit
from the short stop.  It should not be issued too early.  E.g.
	A+B->C;C+D->E
might run slower than
	A+B->C;NOP;NOP;C+D->E
(You could of course issue instructions other than NOP:
	A+B->C;P+Q->R;V+W->Z;C+D->E
the instructions are pipelined.)
A compiler writer's nightmare, I believe.
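A toy model of the window Dik describes may help. The timings below (SS_OPEN, SS_WIDTH, SS_DONE) are made up, chosen only so that his example comes out as he says; they are not Cray-1, X-MP, or Cyber 205 numbers. Treating a too-early issue as a wait for write-back is an assumption based on his remark that the dependent instruction "should not be issued too early."

```c
/* Toy model of a short-stop (forwarding) window with made-up timings.
   A producer issues at cycle 0.  Its result can be picked up early only
   by a consumer issuing in [SS_OPEN, SS_OPEN + SS_WIDTH).  A consumer
   that misses the window, early or late, waits until the producer's
   result is written back at cycle SS_DONE (assumed behavior). */
enum { SS_OPEN = 3, SS_WIDTH = 1, SS_DONE = 8 };

/* Cycle at which a dependent instruction issued at `issue` can start. */
static int consumer_start(int issue)
{
    if (issue >= SS_OPEN && issue < SS_OPEN + SS_WIDTH)
        return issue;                        /* caught the short stop */
    return issue > SS_DONE ? issue : SS_DONE; /* missed it: wait for write-back */
}
```

With A+B->C at cycle 0, issuing C+D->E directly after it (cycle 1) starts it at cycle 8, while two intervening NOPs (issue at cycle 3) catch the short stop and start it at cycle 3.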
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

pardo@june.cs.washington.edu (David Keppel) (04/14/88)

In article <953@ima.ISC.COM> marc@ima.UUCP (Marc Evans) writes:
>I remember a few years ago that Intel announced a processor called (I think)
>the 432...Now that I have read about this processor (80960) in some of the
>industry rags, as well as on the net, it seems to me that the 80960 is just
>a repackaged, supercharged version of the 432. Can anybody comment on this?

Yes.

Unless I'm missing some big stuff about the 80960, they are not at all
the same.

The 432 supported objects and capabilities in hardware.  Thus, the hardware
recognized and protected the specific data type of "access descriptor" and
encapsulated /refinement/ (essentially a suid on a capability).

The 80960 is just your generic modern microprocessor.

For more information on the 432, a description of capabilities, and a
comparison of various related architectures, see "Capability-Based
Computer Systems" by Henry M. Levy, (C) 1984 Digital Equipment Corp.,
printed by Digital Press.  (Gosh, I *knew* we could somehow force the
VAX into this discussion!)

	;-D on  ( But then again, I could be )  Pardo

bwong@sundc.UUCP (Brian Wong) (04/14/88)

In article <953@ima.ISC.COM>, marc@ima.ISC.COM (Marc Evans) writes:
> I remember a few years ago that Intel announced a processor called (I think)
> the 432...[ stuff deleted ] it seems to me that the 80960 is just
> a repackaged, supercharged version of the 432. Can anybody comment on this?
> pg 16), and the bus exchange unit M82965 @ $1750... YOW!

Yargh!  The Intel 432 was a (very) CISC machine, heavily microcoded, and 
intended to be essentially an "ada machine."  Given what we've read here
about the 80960, it's a RISC machine.  I doubt that the two chips have
much of anything in common!

phil@amdcad.AMD.COM (Phil Ngai) (04/14/88)

In article <953@ima.ISC.COM> marc@ima.UUCP (Marc Evans) writes:
>As a side... How 'bout the prices on the 80960MC $2490 (ref Electronic Products
>pg 16), and the bus exchange unit M82965 @ $1750... YOW!

That's not so bad. It works out to be about a dollar a pound when you
consider all the paperwork you have to do to sell a military product.
:-)

Did you ever wonder why an AIM-54 Phoenix long-range air-to-air
missile costs half a million dollars?
-- 
America is finally exporting affordable cars to Japan: the Honda Accord.

I speak for myself, not the company.
Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or phil@amd.com

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/14/88)

In article <953@ima.ISC.COM> marc@ima.UUCP (Marc Evans) writes:
| I remember a few years ago that Intel announced a processor called (I think)
| the 432...Now that I have read about this processor (80960) in some of the
| industry rags, as well as on the net, it seems to me that the 80960 is just
| a repackaged, supercharged version of the 432. Can anybody comment on this?

			432	80960
addressing		bit	byte
instruction set		CISC	RISC
model			object	register

They don't even have the same pin count or process.

| As a side... How 'bout the prices on the 80960MC $2490 (ref Electronic Products
| pg 16), and the bus exchange unit M82965 @ $1750... YOW!

I *think* Intel is shooting themselves in the foot on that one. While
they can make a large profit per chip (and how!), I believe that there
is a market for alternatives to the SPARC chipset in the workstation
market. I think that there are people who associate *86 with PC, no
matter what the speed. When I mentioned getting a Sun Roadrunner, the
comment was made "but, for that price you could buy a workstation."

Intel may rethink that price...  I think someone is doing a mil-spec
SPARC, and even with tax money there is a tiny bit of thought given to
price.  Plus you could develop software on the same CPU as the target.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

mcg@omepd (Steven McGeady) (04/15/88)

In article <49265@sun.uucp> david@sun.uucp (David DiGiacomo) writes:
> ...
>>   The 80960KA and the 80960KB are both available in 20MHz CHMOS* III
>>configurations.  Both embedded processors operate at a sustained 7.5 MIPS
>>and 15K Dhrystones rates.
>
>Why is the integer performance so low?  Do most instructions take 2 cycles?

The short answer is yes, many instructions take two cycles in the current
implementation.  For the long answer, read on.

Well, first, while 7.5 MIPS might seem slow for a $2000/chip workstation
CPU, the price/performance of the 80960KA and KB is very good compared to its
competition (whoever that may be) in the embedded marketplace.   Claims
of "the fastest microprocessor ever!" are: a) often false; and b) seldom
true for very long.  The 80960KA was the fastest microprocessor around
when we hit silicon in 12/85, but we knew very well that fast silicon
without quality tools and support wasn't very useful.  I won't get dragged
into the "my MIPS number is longer than your MIPS number" game that goes
on here all too often.

Second, as I hope to demonstrate soon, the 7.5 MIPS number is actually
relatively conservative, and depends on the mix that you run.  In other words,
don't feel that you have to apply an automatic derating to that number
because of past (mis-) deeds of unrelated marketeers.

Our number is based on the integer Stanford benchmarks, grep, diff, compress,
and other UNIX programs jerry-rigged to run in an embedded environment, and
various customer benchmarks.  I'm trying to gather some up-to-date benchmark
info to post, but it's taking some time to get it together in a form the net's
performance mavens won't shoot holes in.

The *technical* answer to your question is that the register file
*in the current implementations* is not sufficiently multi-ported, so
"RISC" instructions (typically 1 cycle) suffer an additional cycle of latency
unless one of their source operands is a literal or the destination register
of the previous instruction.  If the register file can be "bypassed" this way,
normal instructions execute in 1 cycle; otherwise they take 2.
Certain other instructions (bit extract, bit modify, check bit,
compare-and-increment/decrement) take 2 cycles with bypass and 3 without.
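The bypass rule above can be turned into a simple cycle counter. This sketch models only two-source register operations under that rule (1 cycle if a source is a literal or the previous instruction's destination, otherwise 2). Loads, branches, the slower bit instructions, and scoreboard stalls are ignored, and the names (`struct insn`, `LIT`, `NO_REG`, `simple_op_cycles`) are hypothetical.

```c
#define LIT    (-1)   /* operand is a literal, not a register */
#define NO_REG (-2)   /* no previous destination to bypass from */

struct insn {
    int src1, src2;   /* source register numbers, or LIT */
    int dst;          /* destination register number */
};

/* Cycles for a straight-line run of n two-source register operations,
   using only the stated rule.  Assumption: one source can come from a
   literal or the bypass path, the other from a single register-file
   read; needing two register-file reads costs the extra cycle. */
static int simple_op_cycles(const struct insn *code, int n)
{
    int i, cycles = 0, prev_dst = NO_REG;

    for (i = 0; i < n; i++) {
        const struct insn *in = &code[i];
        int bypass = in->src1 == LIT || in->src2 == LIT ||
                     in->src1 == prev_dst || in->src2 == prev_dst;

        cycles += bypass ? 1 : 2;
        prev_dst = in->dst;
    }
    return cycles;
}
```

For Allen Baum's example, A+B->C followed by D+E->F counts 2 + 2 cycles, while A+B->C followed by C+E->F counts 2 + 1, because C is bypassed from the first addition.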

For completeness:
  Move instructions take 1 cycle per word.

  Integer multiply takes 9-21 cycles (depending on the number of significant
  bits), typically 18.  Integer divide takes twice as long.  The processor
  uses an early-out Booth multiplier.

  Branch instructions take 0 (yes, zero) to 2 cycles.  In the former case,
  branches can often be overlapped with previous instructions.

  Loads and stores are pipelined (3 deep), and loads take 4 to 5 cycles, stores
  2 to 3 cycles.  Other (unrelated) instructions can be executed in the delay
  slot after the load.  Thus, 3 loads can be executed in 7 cycles (due to
  the pipelining) and up to 3 additional instructions can be executed in
  the delay slots (safely, because of register scoreboarding).

  Call instructions take 9 cycles when a register set in the cache is
  available.  Flushing a set of local registers takes an additional 24 cycles,
  depending on memory speed.  Return takes 7 cycles, with the same caveat.
  The processor only flushes or reloads the register cache when necessary.
  The "call" and "return" instructions, contrary to normal RISC practice,
  do most of what is required to perform a subroutine linkage.  The 80960
  C entry prologue/epilogue is:

	_foo:	# foo takes four integer args, has int [100] auto array
		ldconst	400,r15
		addo	sp,r15,sp	# allocate auto space on stack
		movq	g0,r4		# save parameter registers (move quad)
		...
		mov	???,g0		# return value
		ret

  "ldconst" is a pseudo-op which expands to the most efficient way of loading
  a constant value.  The stack adjustment is only done if there are local
  variables that do not fit in registers.  The saving of the parameter
  registers is only done if the procedure is not a leaf procedure.

  Floating-point instructions take anywhere from 10 cycles (add-real) to 441
  cycles (cosine).  Most floating-point instructions are interruptible and
  resumable.
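The "3 loads in 7 cycles" figure in the list above follows from the 5-cycle end of the load latency if the three loads issue on consecutive cycles. A sketch of that arithmetic, assuming one issue per cycle, at most three outstanding loads, a slot that frees exactly when its load completes, and no bus contention (`loads_done`, `DEPTH`, and `MAX_LOADS` are hypothetical names):

```c
#define DEPTH     3    /* loads that may be outstanding at once */
#define MAX_LOADS 64   /* size limit for this sketch only */

/* Cycle at which the last of n back-to-back loads completes, with the
   first load issuing at cycle 0 and each load taking `lat` cycles.
   Returns -1 if n is out of range. */
static int loads_done(int n, int lat)
{
    int done[MAX_LOADS];
    int k, issue = 0;

    if (n <= 0 || n > MAX_LOADS)
        return -1;
    for (k = 0; k < n; k++) {
        if (k > 0)
            issue = issue + 1;                /* one issue per cycle */
        if (k >= DEPTH && done[k - DEPTH] > issue)
            issue = done[k - DEPTH];          /* wait for a free slot */
        done[k] = issue + lat;
    }
    return done[n - 1];
}
```

loads_done(3, 5) gives 7; a fourth back-to-back load must wait for the first to finish at cycle 5, so loads_done(4, 5) gives 10.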

The next generation of 80960, now under development, will remove the bypass
miss limitation, as well as exploit more opportunities for fine-grained
parallelism in the architecture.  More I cannot say.

S. McGeady
Intel Corp.

bcase@Apple.COM (Brian Case) (04/15/88)

In article <3368@omepd> mcg@iwarpo3.UUCP (Steve McGeady) writes:
>
>In case you wonder what my involvement is, I was a member of the processor
>architecture group headed by Glen Myers (author of 'Advances in Computer
>Architecture' and other bestsellers), and later a manager of the Software
>Tools group.
>
>S. McGeady

Steve, is this the same Glen Myers who said:  "One's eyebrows should rise
whenever a future architecture is developed with a register-oriented
instruction set?"  [Comp. Arch. News, Aug 1977, pp. 7-10]  Perhaps he
was quoted out of context; he actually meant the eyebrows should rise
in delight?  :-) :-)  Just kidding, I know we all say things we regret
(I certainly have!).  I just found it interesting that he should be
involved at all with such a register-intensive architecture!

randys@mipon2.intel.com (Randy Steck) (04/15/88)

In article <953@ima.ISC.COM> marc@ima.UUCP (Marc Evans) writes:
>I remember a few years ago that Intel announced a processor called (I think)
>the 432...Now that I have read about this processor (80960) in some of the
>industry rags, as well as on the net, it seems to me that the 80960 is just
>a repackaged, supercharged version of the 432. Can anybody comment on this?

The iAPX432 was about as far away from the architecture of the 80960 as you
can possibly get.  The 432 is typically referenced when talking about CISC
architectures (at the extreme CISC end of the spectrum).  And there were
a lot of mistakes made that really killed the performance.  Some of these
were no registers (all operations were memory-based), bit-level encoded
instructions (Huffman encoding anyone?), two-chip implementation with only
a narrow microinstruction bus between them, and extremely long
call/return/branch times since everything was done in microcode.

In other words, the 432 was a dog on performance and therefore a failure,
even though some of the architectural features may have been
interesting/useful.

The 80960 architecture is in no way related, and in fact is very close to
the other end of the spectrum (closer to the RISC end).  Many RISC ideas
have been incorporated into the processor to allow an implementation to
achieve high performance (CPU *AND* system performance).  As Glen Myers
(the architect of the 960) said in a videotape at the announcement of the
family, we feel that it is a balanced architecture with no undue emphasis
being placed on any particular area to the detriment of others.

The 960's ideas on fine-grained parallelism and complete hardware
interlocking could go a long way in future implementations.

Randy Steck
Intel Corp.
Hillsboro, Oregon

These comments are my own.  Intel would certainly disavow any agreement
with them.  What?!?  You don't believe me?  Why not call them up and ask?

mcg@omepd (Steven McGeady) (04/16/88)

In article <8266@apple.Apple.Com> bcase@apple.UUCP (Brian Case) writes:
>In article <3368@omepd> mcg@iwarpo3.UUCP (Steve McGeady) writes:
>>
>>architecture group headed by Glen Myers (author of 'Advances in Computer
>
>Steve, is this the same Glen Myers who said:  "One's eyebrows should rise
>whenever a future architecture is developed with a register-oriented
>instruction set?"  [Comp. Arch. News, Aug 1977, pp. 7-10]  Perhaps he
>was quoted out of context;

Well, I was sort of hoping that someone would rise to the bait on this
one - I'm glad it was you, Brian.

I wish I had a videotape of Glen's 5-year "roast" here at Intel two years
ago - one of our group members culled through a number of his old books and
found a pile of gems like this one.  Some of the people up here that were
involved in the 432, as well as Glen (who had no direct involvement in it)
were "born-again" in the fire-baptism of the 432, and decided that the time
had come to implement a *fast* microprocessor.  The mythology has it that
the CISC-ish object-oriented folks who wouldn't accept the new order were
banished into software groups and obscure research projects :-).  This is, of
course, nothing but mythology ...  [I was not at Intel when the 432 was
being built.]  It is even more amusing that the current program manager for
the project, one Bill Pohlman, was the program director of the original
8086 development.  He starts customer presentations by saying that he's
atoning for his sins by pushing the 80960.

For the record, Glen Myers is no longer at Intel - he now is a principal
at Radix Microsystems in Beaverton, OR.  Among other endeavours, he is
preparing a book on the 80960 architecture and its development.

S. "Flat is where it's at" McGeady
Intel Corp.

guy@gorodish.Sun.COM (Guy Harris) (04/16/88)

> >I remember a few years ago that Intel announced a processor called (I think)
> >the 432...Now that I have read about this processor (80960) in some of the
> >industry rags, as well as on the net, it seems to me that the 80960 is just
> >a repackaged, supercharged version of the 432. Can anybody comment on this?

Could the reason that the industry rags were under this delusion be that some
of the people involved in the 432 were also involved in the '960 (Myers and
Konrad Lai come to mind), and therefore (in typical ignorant industry-rag
reporter fashion) assumed that one was derived from the other?

mcg@omepd (Steven McGeady) (04/18/88)

In article <49681@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>> >I remember a few years ago that Intel announced a processor called (I think)
>> >the 432...Now that I have read about this processor (80960) in some of the
>> >industry rags, as well as on the net, it seems to me that the 80960 is just
>> >a repackaged, supercharged version of the 432. Can anybody comment on this?
>
>Could the reason that the industry rags were under this delusion be that some
>of the people involved in the 432 were also involved in the '960 (Myers and
>Konrad Lai come to mind), and therefore (in typical ignorant industry-rag
>reporter fashion) assumed that one was derived from the other?

The industry rags did in fact get this wrong, but it probably had more to do
with *where* the development was happening.  Intel's Oregon Microcomputer
Engineering group was responsible for the 432.  That effort taught a lot
of people a lot of important things at many levels (from architecture to
circuit design) that were applied to the 80960.  But, apart from a
geographic and personnel lineage, it is difficult to trace the published
80960 architecture back to the 432.  Persons who see similarities between
the 80960KA, KB, and MC and the 432 are ignorant of the architectures of
one or the other.

Incidentally, Glen Myers wrote about the 432 when he worked for IBM.  The
432 effort was all but over by the time he joined Intel.

S. McGeady
Intel Corp.

andy@pcsbst.UUCP (Andre Wolper) (04/18/88)

>For the record, Glen Myers is no longer at Intel - he now is a principal
>at Radix Microsystems in Beaverton, OR.  Among other endeavours, he is
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What does this company do, and who were its founders? How many people are involved?
(I'm asking out of personal interest only, please respond via E-mail, it'll
 be appreciated. Thanks!)

**********************************
Andre Wolper  ...usual disclaimers

...unido!pcsbst!andy
**********************************

sid@linus.UUCP (Sid Stuart) (04/18/88)

>   The highly-integrated 80960KB has a number of functions on-chip that
>are characteristic of multiple-chip solutions.  On-chip functions include
>32 32-bit registers, the FPU [with four additional 80-bit registers],
>a 512-byte instruction cache, a stack frame cache, and a 32-bit multiplexed
>burst bus.			^^^^^^^^^^^^^^^^^?

	I have a copy of the 80960 Programmer's Reference Manual. I
can find no reference in it to a "stack frame cache". Can someone point
out where this is mentioned and what size this mythical cache is? Are the
four sets of local registers supposed to be the stack frame cache? 
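Sid's question isn't answered in this part of the thread. Assuming, as he guesses, that the "stack frame cache" means the four local register sets, the call/return timings McGeady gave earlier (call 9 cycles, plus 24 to flush a set; return 7) fit a toy model like this one. NSETS = 4 follows Sid's description, the fill cost on return is an assumed placeholder, and all names are hypothetical.

```c
#define NSETS        4    /* local register sets (per Sid's reading) */
#define CALL_CYCLES  9    /* from McGeady's post */
#define RET_CYCLES   7    /* from McGeady's post */
#define SPILL_CYCLES 24   /* flush one set, from McGeady's post */
#define FILL_CYCLES  24   /* assumption: reload cost not given */

/* Toy model: NSETS sets hold the most recent frames.  A call with no
   free set spills the oldest resident frame to memory; a return to a
   frame that is no longer resident fills it back. */
struct frame_cache {
    int resident;              /* frames currently held in register sets */
    long spills, fills, cycles;
};

static void fc_init(struct frame_cache *fc)
{
    fc->resident = 1;          /* the running procedure's own frame */
    fc->spills = fc->fills = fc->cycles = 0;
}

static void fc_call(struct frame_cache *fc)
{
    fc->cycles += CALL_CYCLES;
    if (fc->resident == NSETS) {   /* no free set: spill the oldest */
        fc->spills++;
        fc->cycles += SPILL_CYCLES;
    } else {
        fc->resident++;
    }
}

static void fc_ret(struct frame_cache *fc)
{
    fc->cycles += RET_CYCLES;
    if (--fc->resident == 0) {     /* caller's frame had been spilled */
        fc->fills++;
        fc->cycles += FILL_CYCLES;
        fc->resident = 1;
    }
}
```

Ten nested calls followed by ten returns spill and refill 7 frames each, while a call/return pair repeated at constant depth never touches memory, consistent with the statement that the register cache is flushed or reloaded only when necessary.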

sid@linus.arpa

BTW

I would like to thank Mr. McGeady for his timely posting of the
Intel press release.