clif@intelca.intel.com (Ken Shoemaker) (03/04/89)
The following information is taken from the i860(TM) 64-Bit Microprocessor
data sheet, order number 240296-001. I hope that this posting does not
generate a meta-discussion about the appropriateness of the posting. I
believe that it contains more technical information than a typical
comp.arch posting.

We, Intel, will try to answer questions regarding the architecture.
However, due to work pressures and the need for approval prior to posting
non-technical information, there will probably be a delay.

i860 64-bit Microprocessor Highlights:

Parallel Architecture: 3 instructions per clock
  - one integer or control instruction
  - up to two floating-point instructions

High Performance Design
  - 33.3/40 MHz Clock Rate
  - 80 MFLOPS Peak Single Precision
  - 60 MFLOPS Peak Double Precision
  - 64-bit External Data Bus
  - 64-bit Internal Instruction Cache Bus
  - 128-bit Internal Data Cache Bus

Measured Performance with Current Compilers
  - 24 Megawhetstones (40 MHz)
  - 83K Dhrystones (40 MHz)

Highly Integrated
  - 32/64-bit Pipelined Floating-Point Adder and Multiplier
  - 32-bit Integer and Control Unit
  - 64-Bit 3-D Graphics Unit
  - Paging Unit with TLB
  - 4K Byte Instruction Cache
  - 8K Byte Data Cache

The core execution unit controls overall operation of the i860(TM) CPU.
The core unit executes load, store, integer, bit, and control-transfer
operations, and fetches instructions for the floating-point unit as well.
A set of 32 32-bit general-purpose registers is provided for the
manipulation of integer data. Load and store instructions move 8-, 16-,
and 32-bit data to and from these registers. Its full set of integer,
logical, and control-transfer instructions gives the core unit the ability
to execute complete systems software and applications programs. A trap
mechanism provides rapid response to exceptions and external interrupts.
Debugging is supported by the ability to trap on data or instruction
reference.

The floating-point hardware is connected to a separate set of
floating-point registers, which can be accessed as 16 64-bit registers,
or 32 32-bit registers. Special load and store instructions can also
access these same registers as 8 128-bit registers. All floating-point
instructions use these registers as their source and destination operands.

The floating-point control unit controls both the floating-point adder and
the floating-point multiplier, issuing instructions, handling all source
and result exceptions, and updating status bits in the floating-point
status register. The adder and multiplier can operate in parallel,
producing up to two results per clock. The floating-point data types,
floating-point instructions, and exception handling all support the IEEE
Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985).

The floating-point adder performs addition, subtraction, comparison, and
conversions on 64- and 32-bit floating-point values. An adder instruction
executes in three to four clocks; however, in pipelined mode, a new result
is generated every clock.

The floating-point multiplier performs floating-point and integer multiply
and floating-point reciprocal operations on 64- and 32-bit floating-point
values. A multiplier instruction executes in three to four clocks; however,
in pipelined mode, a new result can be generated every clock for
single precision and every other clock for double precision.

The graphics unit has special integer logic that supports
three-dimensional drawing in a graphics frame buffer, with color intensity
shading and hidden surface elimination via the Z-buffer algorithm. The
graphics unit recognizes the pixel as an 8-, 16-, or 32-bit data type. It
can compute individual red, blue, and green color intensity values within
a pixel; but it does so with parallel operations that take advantage of
the 64-bit internal word size and 64-bit external bus.

The graphics features of the i860 microprocessor assume that the surface
of a solid object is drawn with polygon patches whose shapes approximate
the original object. The color intensities of the vertices of the polygon
and their distances from the viewer are known, but the distances and
intensities of the other points must be calculated by interpolation. The
graphics instructions of the i860 CPU directly aid such interpolation.

The paging unit implements protected, paged, virtual memory via a
64-entry, four-way set-associative memory called the TLB (Translation
Lookaside Buffer). The paging unit uses the TLB to perform the translation
of logical address to physical address, and to check for access
violations. The access protection scheme employs two levels of privilege:
user and supervisor. {Editor's note: the i860 CPU's paging mechanism is
the same as that of the 386 CPU.}

The instruction cache is a two-way set-associative memory of four Kbytes,
with 32-byte blocks. It transfers up to 64 bits per clock (266 Mbyte/sec
at 33.3 MHz).

The data cache is a two-way set-associative memory of eight Kbytes, with
32-byte blocks. It transfers up to 128 bits per clock (533 Mbyte/sec at
33.3 MHz). The i860 CPU normally uses writeback caching, i.e. memory
writes update the cache (if applicable) without necessarily updating
memory immediately; however, caching can be inhibited by software where
necessary.

The bus and cache control unit performs data and instruction accesses for
the core unit. It receives cycle requests and specifications from the core
unit, performs the data-cache or instruction-cache miss processing,
controls TLB translation, and provides the interface to the external bus.
Its pipelined structure supports up to three outstanding bus cycles.

Clif Purkiser
Intel Corp, Santa Clara Microcomputer Division
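
[Editorial sketch, not part of the original posting.] The peak figures quoted above can be cross-checked with simple arithmetic. The short Python sketch below assumes that the peak MFLOPS numbers are just clock rate times results per clock (one adder result plus one multiplier result per clock in single precision, and a multiplier result only every other clock in double precision), and that the number of sets in a set-associative memory is capacity divided by (ways x block size). The function names are hypothetical; only the numeric inputs come from the posting.

```python
# Cross-check of the data-sheet figures quoted in the posting.
# Function names are illustrative; only the inputs are from the source.

def peak_mflops(clock_mhz, adder_results_per_clock, multiplier_results_per_clock):
    """Peak MFLOPS = clock rate (MHz) * floating-point results per clock."""
    return clock_mhz * (adder_results_per_clock + multiplier_results_per_clock)

def bandwidth_mbytes_per_sec(bits_per_clock, clock_mhz):
    """Transfer rate in Mbyte/sec for a bus moving bits_per_clock each clock."""
    return bits_per_clock / 8 * clock_mhz

def num_sets(capacity, ways, unit_size):
    """Sets in a set-associative memory: capacity / (ways * unit size)."""
    return capacity // (ways * unit_size)

# Floating-point peaks at 40 MHz.
single_peak = peak_mflops(40, 1, 1)      # adder + multiplier every clock
double_peak = peak_mflops(40, 1, 0.5)    # multiplier every other clock

# Cache transfer rates at 33.3 MHz.
icache_bw = bandwidth_mbytes_per_sec(64, 33.3)    # roughly 266 Mbyte/sec
dcache_bw = bandwidth_mbytes_per_sec(128, 33.3)   # roughly 533 Mbyte/sec

# Geometry: two-way caches with 32-byte blocks; 64-entry four-way TLB.
icache_sets = num_sets(4 * 1024, 2, 32)
dcache_sets = num_sets(8 * 1024, 2, 32)
tlb_sets = num_sets(64, 4, 1)            # unit size 1 = one entry

if __name__ == "__main__":
    print("peak MFLOPS (single, double):", single_peak, double_peak)
    print("cache bandwidth (I, D) Mbyte/sec:", round(icache_bw), round(dcache_bw))
    print("sets (I-cache, D-cache, TLB):", icache_sets, dcache_sets, tlb_sets)
```

Under these assumptions the results agree with the quoted 80 and 60 MFLOPS and with the 266 and 533 Mbyte/sec transfer rates; the set counts are derived values that the posting does not state.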
mark@mips.COM (Mark G. Johnson) (03/05/89)
Thanks to Clif Purkiser for an informative posting!
<208@intelca.intel.com> did raise a question, though:

>Highly Integrated
> - 32/64-bit Pipelined Floating-Point Adder and Multiplier
> - 32-bit Integer and Control Unit
> - 64-Bit 3-D Graphics Unit
> - Paging Unit with TLB
> - 4K Byte Instruction Cache
> - 8K Byte Data Cache

Perhaps the list above is simply incomplete; by an omission it leads to
speculations like:

  1. Is there a Floating-Point Divider in hardware?
  2. Are there Floating-Point Divide instructions (IEEE 32b & 64b)
     in the 80860 architecture?
  3. How many clocks does it take to do an IEEE 32b divide? 64b?

Thanks.
--
 -- Mark Johnson
    MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
    ...!decwrl!mips!mark   (408) 991-0208
mash@mips.COM (John Mashey) (03/05/89)
In article <208@intelca.intel.com> clif@intelca.intel.com (Ken Shoemaker) writes:
>The following information is taken from the i860 TM 64-Bit Microprocessor
>data sheet order number 240296-001. I hope that this posting does not
>generate a meta-discussion about appropriateness of the posting....

Appropriate posting; thanx; it's much better than seeing random rumors and
misinformation, and there's plenty of technical content. There are a few
questions though: I suspect this was just an oversight, as somebody MUST
know the answers, but 2 of the numbers need clarification, or they are
almost meaningless:

>Measured Performance with Current Compilers

I assume this was measured on real hardware, so are you allowed to say
what the memory system looks like? i.e., read latency and write retirement
rates, for example? (Of course, for these particular benchmarks it
probably doesn't matter too much, since their cache miss rates are
negligible :-)

> - 24 Megawhetstones (40 MHz)

1) Was this single precision or double precision?
2) Whichever it was, what was the other one?

> - 83K Dhrystones (40 MHz)

1) Which version: 1.1 or 2.1? I assume this wasn't 1.0, whose numbers are
   15% better than 1.1.
2) What level of optimization? Any inlining? Any unusual options?
   (Like, for example: the manual shows normal use of a frame pointer,
   which costs 4 cycles/call, but could be suppressed if you know things
   like alloca won't be used. Since a typical 32-bit RISC would use 30-40
   cycles/call, suppressing the fp-manipulation gains about 10%.)
--
-john mashey  DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:  {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:   408-991-0253 or 408-720-1700, x253
USPS:  MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
seanf@sco.COM (Sean Fagan) (03/06/89)
In article <14616@obiwan.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>Perhaps the list above is simply incomplete; by an omission it leads to
>speculations like:
>  1. Is there a Floating-Point Divider in hardware?

No.

>  2. Are there Floating-Point Divide instructions (IEEE 32b & 64b)
>     in the 80860 architecture?

No.

>  3. How many clocks does it take to do an IEEE 32b divide? 64b?

Depends. I think it might be somewhere around 30-40 (40-50?), but I'm not
sure. It doesn't have divide in hardware; what it has is reciprocal
approximations (1.0/x), so you do that (plus a little bit to get rid of
the errors), then multiply. Kinda like a Cray, right? 8-)
--
Sean Eric Fagan | "What the caterpillar calls the end of the world,
seanf@sco.UUCP  |  the master calls a butterfly." -- Richard Bach
(408) 458-1422  | Any opinions expressed are my own, not my employers'.
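
[Editorial sketch, not part of the original posting.] The "reciprocal approximation, a little cleanup, then multiply" scheme described above is usually realized with Newton-Raphson refinement, and the Python sketch below illustrates that general technique. It is not the i860's actual instruction sequence: `approx_reciprocal` is a hypothetical stand-in for a hardware reciprocal instruction, the 8-bit seed precision is an assumption, and the quotient it produces is close to, but not guaranteed to be, the correctly rounded IEEE result.

```python
import math

def approx_reciprocal(x, bits=8):
    """Hypothetical stand-in for a hardware reciprocal approximation:
    1/x with its mantissa truncated to `bits` bits (seed precision assumed)."""
    mantissa, exponent = math.frexp(1.0 / x)   # 1/x == mantissa * 2**exponent
    scale = 1 << bits
    return math.ldexp(math.trunc(mantissa * scale) / scale, exponent)

def divide(a, b, iterations=3):
    """Compute a / b as a * (1/b), refining the reciprocal by Newton-Raphson.

    Each step r <- r * (2 - b * r) roughly doubles the number of correct
    bits, so an 8-bit seed needs about three steps for double precision.
    """
    r = approx_reciprocal(b)
    for _ in range(iterations):
        r = r * (2.0 - b * r)
    return a * r

if __name__ == "__main__":
    for a, b in [(1.0, 3.0), (355.0, 113.0), (-7.5, 0.3)]:
        print(a, "/", b, "~", divide(a, b), "vs", a / b)
```

Each refinement step costs two multiplies and one subtract, which is why the total divide latency depends on the seed precision and on the target precision: fewer iterations suffice for a 32-bit result than for a 64-bit one.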