[comp.arch] "Rumours" from BYTE - November 1990

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/13/90)

I would like to draw attention to the latest issue of BYTE, November
1990, which I have just received today.

On page 19: "Minimalist architecture promises speed, chips that can
mimic others"
-------------------------------------------------------------------
It is about a very fast superscalar MISC that is designed to emulate
other architectures at high speed, like Sinclair's.

On page 24: "TI's new printer technology does it with mirrors"
--------------------------------------------------------------
It is about microengineered mirrors on the surface of a chip, used to
deflect lasers. This is interesting for things like laser printers and
optical disks. Note however that the technology is old -- several years
ago I read an article on it in the IBM RD Journal, and I suspect it is
already used in some IBM product.

On page 28: "AMD accelerates RISC line with FPU"
------------------------------------------------
The AMD 29050 has an embedded FPU claimed to have a peak speed of 80
MFLOPS, with frequencies from 20MHz to 40MHz (two flops per cycle?).

I would also like to draw attention to one article in the "state of the
art" section: on page 283 there is "Crystal clear storage", which, if
delivered, could radically change the tradeoffs in the designs of
computers and operating systems and databases. Also, in the "feature"
section, see page 342; the most interesting material in the short term
may be on pages 348-350, on GaAs and JJ technologies. Apparently there
is a TI 150MHz 6-stage pipeline GaAs RISC, and in Japan ETL has a
prototype of a 1GHz 4-chip RISC...
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

richard@cayman.amd.com (Richard Relph) (11/14/90)

In article <PCG.90Nov12181003@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>On page 28: "AMD accelerates RISC line with FPU"
>------------------------------------------------
>The AMD 29050 has an embedded FPU claimed to have a peak speed of 80
>MFLOPS, with frequencies from 20MHz to 40MHz (two flops per cycle?).
Yes, that's right, two flops per cycle. In addition to the "simple"
floating point operations defined for the 29K family (and implemented
in the Am29000/Am29005 via software), which have the form register =
register op register, we added 4 new "two flop" instructions - FMAC, DMAC,
FMSM, and DMSM. These instructions take advantage of 4 new floating
point accumulators as an implied 4th operand for operations of the
form X = A * B + C. Using FMAC as an example, X is one of the 4
accumulators, A is a general purpose register, B is either a GP
register or a constant 1.0 and C is either the accumulator or the
constant 0.0. The signs of B and C are fully programmable as well.
The FMAC instruction defines all of the operands (and the destination)
and issues in 1 cycle. Since the adder is fully pipelined and the
multiplier is fully pipelined for single precision multiplies, one
can issue a new FMAC every cycle. This is particularly useful for
the matrix multiplication that commonly occurs in graphics and other
applications. A 4x4 by 1x4 matrix multiplication occurs in just
22 cycles (including waiting for the last FMACs to complete). Also,
the accumulators may be either single or double precision without
affecting performance of the FMAC instruction.
For the FMSM instruction, X, B, and C are all GP registers and A
is always accumulator 0. Again, FMSM can be issued every cycle,
resulting in 2 flops per cycle.
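
As a rough illustration of the X = A * B + C form described above, here
is a minimal C sketch that models the multiply-accumulate step in
software and uses it for a 1x4 by 4x4 product. Everything in it is an
assumption made for illustration: the names fmac_model, acc_file,
vec4_times_mat4 and peak_mflops are hypothetical, "the accumulator" for
operand C is taken to be the destination accumulator, the programmable
signs of B and C are left out, and pipeline latency is not modelled (so
it counts 16 issued operations rather than reproducing the 22-cycle
figure).

```c
#include <assert.h>

#define NUM_ACC 4   /* the post describes 4 floating point accumulators */

/* Hypothetical software model of the accumulator file (not AMD code). */
static double acc_file[NUM_ACC];

/* One multiply-accumulate step: acc[x] = a * b + c, where c is either
 * the current value of acc[x] (accumulate != 0) or the constant 0.0.
 * That is one multiply plus one add, i.e. two flops per operation. */
static void fmac_model(int x, float a, float b, int accumulate)
{
    assert(x >= 0 && x < NUM_ACC);
    double c = accumulate ? acc_file[x] : 0.0;
    acc_file[x] = (double)a * (double)b + c;
}

/* Row vector (1x4) times matrix (4x4), one accumulator per output
 * element. Returns the number of multiply-accumulate steps issued. */
static int vec4_times_mat4(const float v[4], const float m[4][4],
                           float out[4])
{
    int issued = 0;
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            /* The first term of each sum starts from 0.0 and the
             * remaining terms add to the accumulator. */
            fmac_model(j, v[i], m[i][j], i != 0);
            issued++;
        }
    }
    for (int j = 0; j < 4; j++)
        out[j] = (float)acc_file[j];
    return issued;
}

/* Peak rate in MFLOPS for a given clock (MHz) and flops per cycle. */
static int peak_mflops(int mhz, int flops_per_cycle)
{
    return mhz * flops_per_cycle;
}
```

With these assumptions, vec4_times_mat4 issues 16 multiply-accumulate
steps (32 flops), and peak_mflops(40, 2) gives the 80 MFLOPS quoted from
BYTE.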

ruehl@ethz.UUCP (Roland Ruehl) (11/14/90)

In article <1990Nov13.160952.13856@mozart.amd.com>, richard@cayman.amd.com (Richard Relph) writes:
> In article <PCG.90Nov12181003@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
> >On page 28: "AMD accelerates RISC line with FPU"
> >------------------------------------------------
> >The AMD 29050 has an embedded FPU claimed to have a peak speed of 80
> >MFLOPS, with frequencies from 20MHz to 40MHz (two flops per cycle?).
> Yes, that's right, two flops per cycle. .........

  Even more interesting: when can one get a solid 29050 C compiler
  exploiting all these goodies ?

--

Roland Ruehl                            uucp:  uunet!mcsun!ethz!ruehl
Tel: (01) 256 5146 (Switzerland)        eunet: ruehl@iis.ethz.ch
     +411 256 5146 (International)

Integrated Systems Laboratory
ETH-Zentrum
8092 Zurich

richard@cayman.amd.com (Richard Relph) (11/16/90)

In article <6619@ethz.UUCP> ruehl@ethz.UUCP (Roland Ruehl) writes:
>In article <1990Nov13.160952.13856@mozart.amd.com>, richard@cayman.amd.com (Richard Relph) writes:
>> In article <PCG.90Nov12181003@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>> >On page 28: "AMD accelerates RISC line with FPU"
>> >------------------------------------------------
>> >The AMD 29050 has an embedded FPU claimed to have a peak speed of 80
>> >MFLOPS, with frequencies from 20MHz to 40MHz (two flops per cycle?).
>> Yes, that's right, two flops per cycle. .........
>
>  Even more interesting: when can one get a solid 29050 C compiler
>  exploiting all these goodies ?
Both the MetaWare compiler and a GCC compiler have the capability to
generate the instructions that execute 2 flops. Both compilers are
"in testing" and are expected to be generally available this year.
Here's some sample C source, followed by the code produced for it by one
of the compilers:

float t1 (float *res1, float *v1, float *v2, float scale, float offset)
{
  int i;
  float accum = 0.0;
  float accum2 = 0.0;

  for (i = 0; i < 100; i++) {
      accum += v1[i] * v2[i];
      accum2 += v1[i] * (- v2[i]);
      }
  *res1 = accum2 * scale + offset;
  return accum;
}

_t1:	sll gr119,lr5,0
	const gr116,0
	mtacc gr116,1,3
	mtacc gr116,1,0
	const gr116,396
	add gr118,lr4,gr116
L5:	load 0,0,gr116,lr3
	load 0,0,gr117,lr4
	fmac 0,3,gr116,gr117
	fmac 1,0,gr116,gr117
	add lr4,lr4,4
	cple gr116,lr4,gr118
	jmpt gr116,L5
	add lr3,lr3,4
	fmsm gr116,gr119,lr6
	mfacc gr96,1,3
	jmpi lr0
	store 0,0,gr116,lr2
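
For anyone who wants to check what the C routine computes independently
of the generated code, here is a small, hedged harness. The function t1
is repeated from the post so the sketch compiles on its own; the helper
run_t1_example and its input values (100 elements of 1.0 and 2.0, scale
0.5, offset 1.0) are made up for illustration and were chosen so that
every intermediate result is exactly representable in a float.

```c
/* t1 as given in the post, repeated so this sketch is self-contained. */
float t1 (float *res1, float *v1, float *v2, float scale, float offset)
{
  int i;
  float accum = 0.0;
  float accum2 = 0.0;

  for (i = 0; i < 100; i++) {
      accum += v1[i] * v2[i];         /* dot product of v1 and v2 */
      accum2 += v1[i] * (- v2[i]);    /* negated dot product */
      }
  *res1 = accum2 * scale + offset;    /* one multiply, then one add */
  return accum;
}

/* Hypothetical helper: fills two 100-element vectors with constants,
 * calls t1, stores the scaled result in *res1 and returns the dot
 * product. */
static float run_t1_example (float *res1)
{
  float v1[100], v2[100];
  int i;

  for (i = 0; i < 100; i++) {
      v1[i] = 1.0f;
      v2[i] = 2.0f;
      }
  return t1 (res1, v1, v2, 0.5f, 1.0f);
}
```

With these inputs the returned dot product is 100 * (1.0 * 2.0) = 200.0,
and *res1 is (-200.0) * 0.5 + 1.0 = -99.0.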