[comp.parallel] 1 TFLOP Tally

eugene@pioneer.arpa (Eugene N. Miya) (06/28/88)

[ Eugene had run a "vote" on comp.arch as to when one teraflop
  would be reached.  Here are the replies.
	Steve
]

I have tallied the votes for when we would have 1 TFLOPS.
I suspect a lot of people would not agree on how fast existing machines
are (that might be an interesting tally, but I will let someone else do
that).  Anyway, I threatened taking blood to get conservative estimates
of 1 TFLOP, yet I got a lot of votes for 1995.  So, I guess it's not
worth it to hunt you guys down.

     1995
     1995
     1995 1998                     2010
1994 1995 1998 2000 2002 2003 2008 2010 2024-(never)

I may have clobbered some entries; sorry if I dropped you below.
I learned a bit more about surveys: you guys still can't give simple
scalar answers (see below).

Sorry this took so long; Usenix, new machines, etc. have me loaded down.
I hope to get to Arvin Park's IOstone next.

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups.  If enough, I'll summarize."

APPENDIX: (chronological except HAM people)

From: ian@esl.ESL.COM (Ian Kaplan)
We submitted a proposal under DARPA's Teraop grand challenge, so I can't
resist commenting (of course, this does not mean that I know what I am
talking about (but you know that)):

 YEAR: 1993 for Connection Machine like SIMD systems

 YEAR: 2003 (15 years hence) for "general purpose" system

  I know that Thinking Machines has gotten money under the Teraop
initiative.  They, or someone like them, should be able to come out
with a Teraop machine in the near future (next five years).  While
this machine will be commercially available, it will not be very
general purpose.  A general purpose machine must be programmable in an
algorithmic fashion.  Although the programmer may have to think about
parallelism, it should be at a high level (e.g., divide and conquer
algorithms).  The challenges in building a machine like this are
probably greater on the software end than the hardware end.  We are
only starting to understand parallel algorithmic languages like SISAL,
Lucid and ID.  I say 15 years because it will take a long time for the
software to develop.  Unfortunately, people's thinking changes much
more slowly than hardware technology, and it is the way that people
think that holds back parallel software.
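
[ For illustration, the "high level" parallelism Ian is asking for
  looks something like the divide-and-conquer sketch below, written in
  present-day Python; the language and the process-pool machinery are
  this summary's assumptions, not anything from his proposal.

    from concurrent.futures import ProcessPoolExecutor

    def chunk_sum(chunk):
        # Conquer: each piece is summed independently, in parallel.
        return sum(chunk)

    def par_sum(xs, nworkers=4):
        # Divide: cut the list into nworkers pieces.
        step = (len(xs) + nworkers - 1) // nworkers
        chunks = [xs[i:i + step] for i in range(0, len(xs), step)]
        with ProcessPoolExecutor(max_workers=nworkers) as pool:
            partials = list(pool.map(chunk_sum, chunks))
        # Combine: reduce the partial results serially.
        return sum(partials)

    if __name__ == "__main__":
        print(par_sum(list(range(1_000_000))))   # 499999500000

  The programmer states only the split/compute/combine structure; how
  the pieces land on processors is the runtime's problem.
]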

1994
		jim sadler

YEAR:
		1995

Unfortunately, 1 TFLOP doesn't necessarily mean that existing code can be
transformed to achieve this rate nor does it mean that all kinds of float
operations will run that fast or with the precision we might now expect; I
think 1 TFLOP of SUSTAINED computation will happen only for very large
versions of the classic "embarrassingly parallel" applications.

In other words, the TFLOP measurement will probably become as meaningless as
MIPS are now.

					-hankd

YEAR: 1995

I calculate it to be 1993, actually, but then I figured you'd be able to
buy a UNIX box for $2000 by 1984. I didn't figure on UNIX getting bigger
and on IBM freezing microcomputer O/S development. Let's add 2 years. 1995
---
-- Peter da Silva, Ferranti International Controls Corporation.

uicsrd.csrd.uiuc.edu%uxc.cso.uiuc.edu@uxc.cso.uiuc.edu (Steve Turner)
YEAR:
	1995

Jest a guess...  About 7 years for 3 orders of mag may be too soon,
but I'd rather err on the early side for a change.

YEAR:
		1998
	-- Tim Olson
YEAR:
		1998
Hi, Eugene!

But... My above answer assumes you include as legal some sort of
multi-processor (I don't care, data-flow, lattice, hyper-torus,
whatever), since I don't believe we'll reach TFLOP performance in
a single CPU any time soon... unless you count *deep* pipelining
in the FPU, and I think that's as hard to use in an application
program as parallel multi's are. (Besides, really deep pipelining
begins to approach "dataflow", so you might as well look at dataflow.)

Even that presentation at Ames that Alan Huang gave a while back
on the optical computers they're working on at Bell Labs didn't
promise super-duper single-CPU speeds. With 5ns around the loop,
you only get 200M micro-cycles/second. You have to stack wavefronts,
which gets you multiple parallel (slightly out of sync) machines,
*not* a faster single CPU.
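
[ Rob's arithmetic, spelled out: a 5 ns loop gives 200M micro-cycles
  per second, so even at one FLOP per cycle (that rate, and the TFLOPS
  target, are this summary's assumptions) a single stream needs
  thousands of stacked wavefronts.  A quick check in Python:

    loop_ns = 5
    cycles_per_sec = 1e9 / loop_ns            # 200e6 micro-cycles/s
    target_flops = 1e12
    print(target_flops / cycles_per_sec)      # 5000.0 parallel streams
]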

No, I think it will take a combination of two technologies,
neither of which will happen without the other, which means
that it will take one big project to make it happen (maybe SDI?)
rather than some incremental development by independents:

1. Truly massive partially-shared-memory machines -- something like a
   hyper-Butterfly or a hypercube with a bus for interconnect instead
   of a serial bus (Intel-style hypercubes are bandwidth-limited) --
   to get enough performance.

2. A step jump in compilers, which will optimize/loop_unfold/assign
   applications to that machine.

The notion is to take Dijkstra's view of a (terminating) program as
a single function which transforms the input state of the machine to the
output state (assume that the input and output data files fit in main
memory), said function being specified in programmatic (imperative)
form instead of functional form, and to transform it into (more or less)
pure functional form (loop unrolling, converting loops to recursion,
using lazy evaluation, etc.). THEN the functional form is mapped onto the
processor array...
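
[ A toy version of the transformation Rob describes, in Python (an
  illustrative choice, not his): the same computation written
  imperatively, then with the loop converted to recursion so it is a
  pure function a compiler could map onto a processor array.

    def total_imperative(xs):
        acc = 0
        for x in xs:               # state mutated step by step
            acc += x
        return acc

    def total_functional(xs, acc=0):
        if not xs:                 # loop becomes recursion; no mutation
            return acc
        return total_functional(xs[1:], acc + xs[0])

    assert total_imperative([1, 2, 3]) == total_functional([1, 2, 3]) == 6
]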

Anyway, I give it a decade.

Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun,attmail}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403

YEAR: 2002 ... SHIP DATE of a non-special-order commercial configuration
		with sustained teraflops performance.
Reasoning:
If the next NCUBE puts out 1MFLOPS * 4096 CPUs = 4 GFLOPS
If BBN uses the Motorola Hypercard, 256*4 88000's are approx 4GFLOPS

Hence 1990 will see 4GFLOPS as "commercial". (Maybe 1989.)

Apply the old rule, factor-of-4-every-3-years (Bill Joy notwithstanding).

Factor of 256 (1 TFLOPS/4 GFLOPS, approx. 256) = 4 factors-of-4 = 12 years

hence, 1990+12 = 2002.
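
[ Don's trend-line method as a calculation, in Python (an illustrative
  choice; the figures are his):

    import math

    base_year, base_gflops, target_gflops = 1990, 4.0, 1000.0
    steps = math.ceil(math.log(target_gflops / base_gflops, 4))  # 4
    print(base_year + 3 * steps)                                 # 2002
]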

ALTERNATIVE THEORY: (technology, not trend lines):

Assume a gallium chip with a 500 MHz clock getting a sustained 128 MFLOPS.
A machine with 8K of these (say, 1K clusters of 8) sounds viable.
To do that we need 100k-400k transistors on a gallium chip.
We currently get about 1k, so, we need a factor of 100-400 in density.
This should be double-every-year (since it rides silicon's coattails).
So, 7..9 years, +1988 = 1995..1997.
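
[ The same, for the technology-based method: density doubling yearly
  from today's ~1k transistors until the needed factor of 100-400.

    import math

    for factor in (100, 400):
        doublings = math.ceil(math.log2(factor))
        print(factor, "->", 1988 + doublings)   # 100 -> 1995, 400 -> 1997
]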

I stand by both estimation methods, but I give more confidence to the
less aggressive one. Hence, "1995..2002, likelier the latter".
Note that TF-1 clones don't count: anything that won't fit in standard
computer rooms doesn't fit my definition of "product".
--
Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science

From: amdcad!uunet!mcvax!nikhefk!henkp@ames (Henk Peek)
YEAR:
		2008

Date: Sun, 5 Jun 88 13:12:32 PDT
From: amdcad!henry@Sun.COM (Henry McGilton--Software Products)

YEAR:	2010

Hello Again,
	I can't resist these Delphic-poll-type questions.

I think we'll hit the teraflop before year 2000.  I must admit to
whistling completely in the dark because I don't have enough data
to do the trend curves.

Completely off the top of my head though, we hit

	  10  Mflops around  1966 (the 6600)
	 100  Mflops around  1975 (the Cyber 203/5)
	1000  Mflops around  1986 (the big Crays)

Which suggests to me that we're getting a factor of ten every ten years
or so.  However, I'd predict that new technologies would steepen the
development curve.
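
[ Taking Henry's three points at face value, the factor-of-ten-per-
  decade trend from 1 GFLOPS in 1986 needs a further factor of 1000,
  i.e. three more decades, landing around 2016; hence his point that
  the curve must steepen to get there before 2000.  In Python (an
  illustrative choice):

    import math

    points = {1966: 10, 1975: 100, 1986: 1000}    # MFLOPS
    target = 1e6                                  # 1 TFLOPS in MFLOPS
    decades = math.log10(target / points[1986])   # 3.0
    print(1986 + 10 * decades)                    # 2016.0
]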

Didn't Winograd come up with a limit based on quantum mechanical
limitations?  I've forgotten the article.

According to Bill Joy, we see 2 ** (year - 1984) MIPS each year.  By
that equation we'll see 2**16 MIPS (about 65 GIPS) by year 2000, so 1
Teraflop can't be far behind that.
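
[ Joy's rule solved for the million-MIPS (1 TIPS) mark, in Python (an
  illustrative choice; the rule is as Henry states it):

    import math

    print(2 ** (2000 - 1984))                  # 65536 MIPS, ~65 GIPS
    print(1984 + math.ceil(math.log2(1e6)))    # 1 TIPS around 2004
]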

Have fun with the poll.

	......... Henry

Tom Lasinski
	2010
GAM
	1998-2000
David Bailey
	1995
Marty Fouts
	2001-2024 (Never)