pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/13/90)
I would like to draw attention to the latest issue of BYTE, November 1990, which I received today.

On page 19: "Minimalist architecture promises speed, chips that can mimic others"
-------------------------------------------------------------------
It is about a very fast superscalar MISC that is designed to emulate other architectures at high speed, like Sinclair's.

On page 24: "TI's new printer technology does it with mirrors"
--------------------------------------------------------------
It is about microengineered mirrors on the surface of a chip, used to deflect lasers. This is interesting for things like laser printers and optical disks. Note however that the technology is old -- several years ago I read an article on it in the IBM R&D Journal, and I suspect it is already used in some IBM product.

On page 28: "AMD accelerates RISC line with FPU"
------------------------------------------------
The AMD 29050 has an embedded FPU claimed to have a peak speed of 80 MFLOPS, with frequencies from 20 MHz to 40 MHz (two flops per cycle?).

I would also like to draw attention to one article in the "state of the art" section: on page 283 there is "Crystal clear storage", which, if delivered, could radically change the tradeoffs in the design of computers, operating systems and databases.

Also, in the "feature" section, see page 342; the most interesting material in the short term may be on pages 348-350, for GaAs and JJ technologies. Apparently there is a TI 150 MHz 6-stage pipeline GaAs RISC, and in Japan ETL has a prototype of a 1 GHz 4-chip RISC...

--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
richard@cayman.amd.com (Richard Relph) (11/14/90)
In article <PCG.90Nov12181003@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>On page 28: "AMD accelerates RISC line with FPU"
>------------------------------------------------
>The AMD 29050 has an embedded FPU claimed to have a peak speed of 80
>MFLOPS, with frequencies from 20 MHz to 40 MHz (two flops per cycle?).

Yes, that's right, two flops per cycle.

In addition to the "simple" floating point operations defined for the 29K family (and implemented in the Am29000/Am29005 via software), which are of the form register = register op register, we added 4 new "two flop" instructions: FMAC, DMAC, FMSM, and DMSM. These instructions take advantage of 4 new floating point accumulators as an implied 4th operand for operations of the form X = A * B + C.

Using FMAC as an example, X is one of the 4 accumulators, A is a general purpose register, B is either a GP register or the constant 1.0, and C is either the accumulator or the constant 0.0. The signs of B and C are fully programmable as well. The FMAC instruction defines all of the operands (and the destination) and issues in 1 cycle. Since the adder is fully pipelined and the multiplier is fully pipelined for single precision multiplies, one can issue a new FMAC every cycle. This is particularly useful for the matrix multiplication that commonly occurs in graphics and other applications. A 4x4 by 1x4 matrix multiplication completes in just 22 cycles (including waiting for the last FMACs to complete). Also, the accumulators may be either single or double precision without affecting performance of the FMAC instruction.

For the FMSM instruction, X, B, and C are all GP registers and A is always accumulator 0. Again, FMSM can be issued every cycle, resulting in 2 flops per cycle.
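The X = A * B + C form described above can be modelled in plain C to make the operand choices concrete. This is only a sketch under stated assumptions: `fpu_model`, `fmac_model`, the selector enums and `mat4_vec4` are hypothetical names, not AMD's instruction encoding, and it assumes that the "accumulator" choice for C means the same accumulator that receives the result. It models the arithmetic only, not pipelining or cycle counts.

```c
#include <assert.h>

#define NUM_ACC 4

/* Hypothetical software model of the four floating-point accumulators. */
typedef struct { double acc[NUM_ACC]; } fpu_model;

/* Illustrative operand selectors (assumed names, not the real encoding). */
enum b_sel { B_REG, B_ONE };   /* B is a GP register or the constant 1.0 */
enum c_sel { C_ACC, C_ZERO };  /* C is the accumulator or the constant 0.0 */

/* Model of an FMAC-style operation: acc[x] = a * (+/-B) + (+/-C). */
void fmac_model(fpu_model *f, int x, float a, float b_reg,
                enum b_sel bs, int neg_b, enum c_sel cs, int neg_c)
{
    assert(x >= 0 && x < NUM_ACC);
    double b = (bs == B_ONE) ? 1.0 : (double)b_reg;
    double c = (cs == C_ZERO) ? 0.0 : f->acc[x];
    if (neg_b) b = -b;
    if (neg_c) c = -c;
    f->acc[x] = (double)a * b + c;
}

/* 4x4 matrix times 4-element vector, one accumulator per output element.
 * Returns the number of FMAC-style operations issued. */
int mat4_vec4(const float m[4][4], const float v[4], float out[4])
{
    fpu_model f = {{0}};
    int ops = 0;
    for (int col = 0; col < 4; col++) {
        for (int row = 0; row < 4; row++) {
            /* The first term of each row starts from the constant 0.0;
             * later terms add to the running accumulator. */
            fmac_model(&f, row, m[row][col], v[col],
                       B_REG, 0, (col == 0) ? C_ZERO : C_ACC, 0);
            ops++;
        }
    }
    for (int row = 0; row < 4; row++)
        out[row] = (float)f.acc[row];
    return ops;
}
```

In this model the product needs 16 multiply-accumulate operations. Interleaving the four rows, as the loop order does here, means consecutive operations target different accumulators, which is one plausible way to keep a pipelined unit busy; the 22-cycle figure quoted above also covers waiting for the last FMACs to complete.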
ruehl@ethz.UUCP (Roland Ruehl) (11/14/90)
In article <1990Nov13.160952.13856@mozart.amd.com>, richard@cayman.amd.com (Richard Relph) writes:
> In article <PCG.90Nov12181003@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
> >On page 28: "AMD accelerates RISC line with FPU"
> >------------------------------------------------
> >The AMD 29050 has an embedded FPU claimed to have a peak speed of 80
> >MFLOPS, with frequencies from 20 MHz to 40 MHz (two flops per cycle?).
> Yes, that's right, two flops per cycle. .........

Even more interesting: when can one get a solid 29050 C compiler exploiting all these goodies?

--
Roland Ruehl                       uucp:  uunet!mcsun!ethz!ruehl
Tel: (01) 256 5146 (Switzerland)   eunet: ruehl@iis.ethz.ch
    +411 256 5146 (International)  Integrated Systems Laboratory
ETH-Zentrum 8092 Zurich
richard@cayman.amd.com (Richard Relph) (11/16/90)
In article <6619@ethz.UUCP> ruehl@ethz.UUCP (Roland Ruehl) writes:
>In article <1990Nov13.160952.13856@mozart.amd.com>, richard@cayman.amd.com (Richard Relph) writes:
>> In article <PCG.90Nov12181003@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>> >On page 28: "AMD accelerates RISC line with FPU"
>> >------------------------------------------------
>> >The AMD 29050 has an embedded FPU claimed to have a peak speed of 80
>> >MFLOPS, with frequencies from 20 MHz to 40 MHz (two flops per cycle?).
>> Yes, that's right, two flops per cycle. .........
>
> Even more interesting: when can one get a solid 29050 C compiler
> exploiting all these goodies?

Both the MetaWare compiler and a GCC compiler have the capability to generate the instructions that execute 2 flops. Both compilers are "in testing" and are expected to be generally available this year.

Here's some sample code produced by one of the compilers:

    float t1 (float *res1, float *v1, float *v2, float scale, float offset)
    {
      int i;
      float accum = 0.0;
      float accum2 = 0.0;

      for (i = 0; i < 100; i++)
        {
          accum += v1[i] * v2[i];
          accum2 += v1[i] * (- v2[i]);
        }
      *res1 = accum2 * scale + offset;
      return accum;
    }

    _t1:
            sll     gr119,lr5,0
            const   gr116,0
            mtacc   gr116,1,3
            mtacc   gr116,1,0
            const   gr116,396
            add     gr118,lr4,gr116
    L5:
            load    0,0,gr116,lr3
            load    0,0,gr117,lr4
            fmac    0,3,gr116,gr117
            fmac    1,0,gr116,gr117
            add     lr4,lr4,4
            cple    gr116,lr4,gr118
            jmpt    gr116,L5
            add     lr3,lr3,4
            fmsm    gr116,gr119,lr6
            mfacc   gr96,1,3
            jmpi    lr0
            store   0,0,gr116,lr2
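One way to read the listing above is to rewrite the C function with the accumulators made explicit. The sketch below is an interpretation, not compiler output: `t1_model` and the `acc` array are hypothetical, and the mapping (accumulator 3 holding `accum`, accumulator 0 holding `accum2`, the second `fmac` negating one operand) is inferred from the `mtacc`, `fmac`, `fmsm` and `mfacc` operands rather than taken from AMD documentation.

```c
/* Hypothetical C model of the generated code; acc[] stands in for the
 * four floating-point accumulators. The mapping is an assumption. */
float t1_model(float *res1, const float *v1, const float *v2,
               float scale, float offset)
{
    double acc[4] = {0.0, 0.0, 0.0, 0.0}; /* the two mtacc's clear acc 3 and acc 0 */

    for (int i = 0; i < 100; i++) {
        /* first fmac in the loop: accum kept in accumulator 3 */
        acc[3] = (double)v1[i] * (double)v2[i] + acc[3];
        /* second fmac in the loop: accum2 kept in accumulator 0, product negated */
        acc[0] = (double)v1[i] * -(double)v2[i] + acc[0];
    }

    /* fmsm: the multiplicand is always accumulator 0, the result goes to a GP register */
    *res1 = (float)(acc[0] * (double)scale + (double)offset);

    /* mfacc: read accumulator 3 back as the return value */
    return (float)acc[3];
}
```

Read this way, each loop iteration issues two FMACs on the same pair of loaded values, and the final scale-and-offset step is a single FMSM because `accum2` was placed in accumulator 0.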