[comp.parallel] info on data-flow machine required

bvle@mullauna.cs.mu.OZ.AU (Binh Van LE) (07/30/90)

I am interested in getting some info on data-flow machines, namely:

	- references
	- current developments
	- personal experiences and opinions

If there is enough interest, I will post a summary.

Thanks,

Binh.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
bvle@cs.mu.OZ.AU
Computer Science Department,
Melbourne University,
Australia.

sakai@etl.go.jp (Shuichi Sakai) (07/31/90)

In article <9904@hubcap.clemson.edu>, bvle@mullauna.cs.mu.OZ.AU (Binh Van LE) writes:
> I am interested in getting some info on data-flow machine, namely:
> 
> 	- references,
> 	- current development
> 	- personal experiences and opinions
> 

I am Dr. Sakai of the Electrotechnical Laboratory (ETL), Japan.

Our research section has already built two dataflow machines, SIGMA-1
and EM-4. The former consists of 128 processing elements (PEs), 128
structure elements (SEs), and a two-layered multistage network; it
achieved a measured 170 MFLOPS in the spring of 1987. The latter is a new
machine with 80 PEs. It achieves 996 MIPS on the summation of 65,536
numbers, and it computes the first 4,000 digits of pi in 0.369 sec, about
100 times faster than a Sparc 330.

My colleague will report on SIGMA-1 in this newsgroup. As a designer of
EM-4, let me briefly report on it here.

1. Features of the EM-4 prototype (see ISCA89, IFIP89, ICS90,
InfoJapan90, etc.; Dr. Hiraki at IBM Watson will report on EM-4 at
ICPP90.)

(1) Strongly Connected Arc Model 
	A naive implementation of dataflow is not practical. Execution
locality must be extracted so that a cache or a register file can be
exploited. EM-4 adopts the strongly connected arc model, which forms
critical sections in a dataflow graph: groups of nodes that, once started,
execute without interruption. The execution order within such a section
is determined statically, and the section is mapped onto a register-based
(RISC) architecture.
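As a rough illustration of why this helps, the toy sketch below runs the same small graph, (a + b) * (a - b), two ways. This is not EM-4's actual mechanism, and the graph, node names, and functions are hypothetical. The token-driven version needs a completed operand match at every two-input node. The "strongly connected" version synchronizes once at entry and then executes its nodes in a fixed, compile-time order on a register dictionary.

```python
from collections import defaultdict

# Toy dataflow graph for (a + b) * (a - b), listed in a valid static order.
# Each node is (name, op, (left_input, right_input)).
GRAPH = [
    ("s", "add", ("a", "b")),
    ("d", "sub", ("a", "b")),
    ("p", "mul", ("s", "d")),
]
OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}


def run_token_driven(inputs):
    """Naive dataflow: every two-input node waits in a matching store
    until both operand tokens arrive. Returns (result, matches)."""
    consumers = defaultdict(list)           # source -> [(node, port)]
    for name, _, (left, right) in GRAPH:
        consumers[left].append((name, 0))
        consumers[right].append((name, 1))
    op_of = {name: op for name, op, _ in GRAPH}
    waiting = {}                            # node -> [operand0, operand1]
    matches = 0                             # firings that needed a completed match
    tokens = list(inputs.items())
    values = {}
    while tokens:
        src, value = tokens.pop()
        values[src] = value
        for node, port in consumers[src]:
            slot = waiting.setdefault(node, [None, None])
            slot[port] = value
            if None not in slot:            # both operands present: fire
                matches += 1
                del waiting[node]
                tokens.append((node, OPS[op_of[node]](*slot)))
    return values["p"], matches


def run_strongly_connected(inputs):
    """The same graph treated as one strongly connected section: a single
    synchronization at entry, then a fixed order on 'registers'."""
    regs = dict(inputs)                     # entry: a and b are both present
    for name, op, (left, right) in GRAPH:   # static order, no matching
        regs[name] = OPS[op](regs[left], regs[right])
    return regs["p"], 1                     # one entry synchronization
```

For inputs a = 5, b = 3, both versions compute 16, but the token-driven one performs three matches against one entry synchronization for the static section.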

(2) Pipeline Integration
	Two kinds of pipelines are integrated in EM-4: a packet-based cyclic
pipeline and a register-based advanced-control pipeline. The former
bypasses the matching stage when the order of execution has been
determined in advance. Note: the cyclic pipeline cannot stand by itself.
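The bypass rule can be sketched as a simple routing decision. The stage names ("fetch", "match", "execute") and the Packet fields are hypothetical placeholders, not EM-4's real pipeline stages. The sketch only shows that packets whose order was fixed in advance skip the matching stage, so fewer packets occupy it.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: int            # destination instruction address (hypothetical field)
    value: int
    order_fixed: bool    # True if the execution order was fixed at compile time


def stages_for(packet):
    """Stages a packet visits in this toy cyclic pipeline."""
    stages = ["fetch"]
    if not packet.order_fixed:
        stages.append("match")   # dynamic synchronization is still required
    stages.append("execute")
    return stages


def matching_stage_uses(packets):
    """Count how many packets actually occupy the matching stage."""
    return sum("match" in stages_for(p) for p in packets)
```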

(3) Multiple RISC Scheme with a Single Chip Processor EMC-R
	We developed a single-chip CMOS processor, EMC-R. It contains
45,788 gates and has 299 pins. The chip is fabricated by LSI Logic and
has been fully functional since November 1989. It delivers 12.5 MIPS.

EMC-R is a "multiple RISC" in the following sense:
	- small instruction set
	- few instruction formats
	- few addressing modes
	- no microprograms
	- register file architecture and RISC pipeline
	- 1 clock execution of each instruction
	(the above are shared with conventional RISCs)

	- few packet formats
	- few packet types
	- light synchronization
	- small and effective interconnection network
	- single chip with synchronization and communication facilities,
	  both of which operate independently of, and in parallel with,
	  the execution unit
	(the above are specific to a multiprocessor RISC)

(4) Direct Matching Scheme
	Matching is performed in one clock without any associative
mechanism. In addition, no dynamic synchronization is needed inside a
strongly connected section.
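One common way to match without associative search is to give each two-input instruction a fixed slot in its activation frame, so the slot address is computed (frame base + instruction offset) rather than searched for. The sketch below assumes that scheme; the class, the presence-flag layout, and the tree-summation workload are illustrative and hypothetical, and it is not claimed to be EM-4's exact hardware or benchmark code.

```python
class DirectMatchingStore:
    """Operand memory addressed directly by frame base + instruction offset.

    Each two-input instruction owns one slot in its activation frame, so the
    partner of an arriving token is found with a single indexed access and
    no associative search.
    """

    def __init__(self, size):
        self.present = [False] * size     # presence flag per slot
        self.waiting = [None] * size      # (port, value) of the first arrival

    def arrive(self, frame_base, offset, port, value):
        """Return None for the first operand, else (left, right)."""
        addr = frame_base + offset        # computed address, not a search
        if not self.present[addr]:
            self.present[addr] = True
            self.waiting[addr] = (port, value)
            return None
        self.present[addr] = False        # free the slot for reuse
        other_port, other_value = self.waiting[addr]
        self.waiting[addr] = None
        return (other_value, value) if other_port == 0 else (value, other_value)


def tree_sum(numbers, frame_base=0):
    """Sum a non-empty list with a binary tree of two-input add nodes.

    Heap numbering: internal nodes are 1..n-1, leaves are n..2n-1, and
    node k sends its result to node k // 2 on port k % 2.
    """
    n = len(numbers)
    if n == 0:
        raise ValueError("need at least one number")
    store = DirectMatchingStore(frame_base + n)
    pending = [(n + i, x) for i, x in enumerate(numbers)]
    while pending:
        node, value = pending.pop()
        if node == 1:
            return value                  # the root's token is the total
        parent = node // 2
        pair = store.arrive(frame_base, parent, node % 2, value)
        if pair is not None:              # second operand arrived: fire add
            pending.append((parent, pair[0] + pair[1]))
```

For example, `tree_sum(list(range(65536)))` sums 65,536 numbers using one matching slot per internal add node.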

(5) Versatile Interconnection Network with Extra Facilities
	I will report on this in another paper. It has deadlock prevention
and automatic load balancing facilities.

(6) Maintenance Architecture
	An auxiliary system, separate from the computation system, is
dedicated to maintaining the whole machine. It can dynamically monitor
system activity and support hardware/software debugging, performance
measurement, the study of scheduling strategies, etc.
 
2. Implementation

Size: 60 cm x 92 cm x 140 cm
Performance: max. 1 GIPS
Network performance: max. 14.63 GB/s
Power: 2.6 kW
Boards:
	- 16 PE group boards, each with 5 PEs
	- 2 mother boards that implement the global interconnection
	- Interface Switch (a packet interface between the host and EM-4)
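The board and chip figures fit together as follows, assuming each PE is one EMC-R chip at its quoted 12.5 MIPS, and (for the clock line only) that the "1 clock execution of each instruction" rule means one instruction per cycle. The summation benchmark then runs at 99.6% of this peak.

```latex
\begin{aligned}
N_{\mathrm{PE}} &= 16\ \text{boards} \times 5\ \text{PEs/board} = 80\ \text{PEs} \\
P_{\mathrm{peak}} &= 80 \times 12.5\ \text{MIPS} = 1000\ \text{MIPS} = 1\ \text{GIPS} \\
f_{\mathrm{clock}} &\approx \frac{12.5\ \text{MIPS}}{1\ \text{instruction/clock}} = 12.5\ \text{MHz} \\
\frac{996\ \text{MIPS}}{1000\ \text{MIPS}} &= 99.6\%
\end{aligned}
```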

3. Software
DFC, DFC-II: languages compatible with C
Another language is now being designed.


4. Status
Hardware with macro assembler: Fully operational since April 1990
Compiler of DFC-II: to be completed this year
Performance reports: to appear in conference papers

If you have further questions, please email one of the addresses below.
In fact, since Dr. Kahaner reported on our machine in this newsgroup, we
have received many sharp questions and valuable comments. We are sorry
that we have not yet replied to all of them, but we surely will.

						sakai@etl.go.jp
						kodama@etl.go.jp
-- 
ETL,Computer Science Division,Computer Architecture Section, Shuichi Sakai
sakai@etl.go.jp   tel. 0298-58-5876  fax. 0298-58-5882  telex 362570 AISTJ