[comp.arch] Signetics VLIW

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (03/02/90)

Philips/Signetics has now officially revealed its VLIW chip.
So far, I've heard:

	2 integer ALUs
	32 bit path to memory
	6 ops/clock
		- 2 integer
		- 1 branch
		- 1 "constant generator" ??
		- 1 memory operation
		- ?that leaves one op unaccounted for?

Pretty scanty. Surely, someone out there has details?
-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (03/07/90)

I recently asked for information about Philips/Signetics' new VLIW
chip.  Things went well, so here it is:

The chip is intended as a prototype for a 32-bit ASIC family.  The
CREATE compiler accepts Pascal-ish programs:  they are considering
various other language frontends.  The compiler reads a file
describing the specific chip.

The prototype chip does 50 MHz and has a 200-bit instruction word,
fetched over 100 pins by cycling them at 100 MHz - two 100-bit halves
per 20 ns instruction cycle.  One obvious ASIC variation is to move
the program to an on-chip memory.

The compiler can supposedly handle any number of functional units:
you just tell it what each unit's pipeline delay is.  I assume that
there are some lower bounds - at least one ALU, and so on.  The
prototype chip has 6 units: two ALUs, a branch unit, a memory
interface unit, a register unit, and a constant generator.  The
memory interface unit has address/data wires to the outside.  There
is nothing keeping designers from adding custom units, or multiple
memory interfaces.  (However, I expect the first designer who tries
this will trigger some compiler hacking.)
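
To make that concrete, here's a rough C sketch of what I imagine the
per-chip description boils down to.  The unit names, counts and
latencies below are my guesses, not anything Philips has published:

	/* Hypothetical machine description table for the CREATE
	 * compiler.  Names, counts and latencies are invented for
	 * illustration; the real file format is not public.
	 */
	struct funit {
		const char *name;	/* kind of functional unit */
		int count;		/* instances on this chip */
		int latency;		/* pipeline delay, in clocks */
	};

	static const struct funit proto_chip[] = {
		{ "alu",      2, 1 },
		{ "branch",   1, 1 },
		{ "memory",   1, 3 },	/* multi-cycle load: a guess */
		{ "register", 1, 1 },
		{ "constgen", 1, 1 },
	};

Adding a custom unit would then just be one more line in the table,
plus the hardware to go with it.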

What holds everything together is the "multiport memory", which takes
up a big fraction of the chip.  Each unit has one or two 32-bit paths
from the multiport memory, and one 32-bit path back to it.  The
prototype has something like 13 ports.  Now, they cheated.  You would
think from the word "multiported" that every result is written to an
address, and every fetch is from an address.  Close; they economized
by having a "funnel file" attached to each read port.  This is just a
two-port (1R 1W) memory.  When you write to multiport memory, a mongo
mux takes the data to the funnel files that you specify, and to the
addresses within them that you specify.  When a unit's read port
reads from "multiport memory", what actually happens is that an
address is applied to that unit's own funnel file.
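
Here is a toy C model of how I picture the write fanout and the
per-unit reads.  The sizes and names are mine - I have no idea how
deep the real funnel files are:

	#include <stdint.h>

	#define NUNITS	6	/* read ports modeled, one per unit */
	#define FFSIZE	8	/* funnel file depth - pure guess */

	/* one small 1R/1W "funnel file" in front of each read port */
	static uint32_t funnel[NUNITS][FFSIZE];

	/* A write to "multiport memory": the big mux copies the result
	 * into every funnel file that will later need it, at whatever
	 * address the instruction word names for that file.
	 */
	void mp_write(uint32_t value, int ndest,
		      const int dest_unit[], const int dest_addr[])
	{
		int i;

		for (i = 0; i < ndest; i++)
			funnel[dest_unit[i]][dest_addr[i]] = value;
	}

	/* A unit's read port only ever indexes its own funnel file. */
	uint32_t mp_read(int unit, int addr)
	{
		return funnel[unit][addr];
	}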

The funnel files seem like a reasonable trade between density and
generality.  They can all be different sizes, and they all easily
have forwarding (done by referencing a special address).  The
compiler does have to be able to remove contention at compile time.
Also, note that this scheme deals in values, not variables.  (Data has
to go to each funnel file that will need it, and multiple copies cost
space rather than time.)
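
The forwarding trick, as I understand it, would look something like
this on top of the toy model above.  Which slot is reserved, and how
the forwarded value arrives, are my inventions:

	/* One reserved address per funnel file means "the value being
	 * written this cycle", so a consumer can pick a result straight
	 * off the write path instead of waiting for it to land.
	 */
	#define FWD_ADDR	(FFSIZE - 1)	/* reserved slot: a guess */

	uint32_t mp_read_fwd(int unit, int addr, uint32_t write_bus_val)
	{
		if (addr == FWD_ADDR)
			return write_bus_val;	/* bypass the funnel file */
		return funnel[unit][addr];
	}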

Each unit also has a 1-bit read port to multiport memory.  These are
used to fetch boolean "guards" that disable writeback or interrupts.
I'm not quite clear on how they compute guards, but it sounds like a
good idea.
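
My reading of the guard mechanism, in the same toy-model terms (the
if-conversion use is my interpretation, not something they said):

	/* Each operation also fetches a 1-bit guard.  A false guard
	 * simply suppresses the writeback, so - I presume - both arms
	 * of a branch can be scheduled in parallel and only the taken
	 * arm's results ever reach the funnel files.
	 */
	void guarded_writeback(int guard, uint32_t result, int ndest,
			       const int dest_unit[], const int dest_addr[])
	{
		if (guard)
			mp_write(result, ndest, dest_unit, dest_addr);
	}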

They are claiming 75 K Dhrystones, although that's fuzzy because they
transliterated the 2.0 benchmark to their own language.  They claim
50 to 100 VAX MIPS on suitable integer programs.

They give the impression that they can whip up instances in fairly
short order.  If Philips backs this strongly enough for the tools to
mature, it could become a very interesting ASIC option.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science