eugene@pioneer.arpa (Eugene N. Miya) (06/28/88)
[ Eugene had run a "vote" on comp.arch as to when one teraflop would be
reached. Here are the replies. Steve ]

I have tallied the votes for when we would have 1 TFLOPS. I suspect a
lot of people would not agree on how fast existing machines are (that
might be an interesting tally, but I will let someone else do that).
Anyway, I threatened to take blood to get conservative estimates of 1
TFLOP, yet I got a lot of votes for 1995. So, I guess it's not worth it
to hunt you guys down.

	1995  1995  1995  1998  2010
	1994  1995  1998  2000  2002
	2003  2008  2010  2024-(never)

I may have clobbered some entries; sorry if I dropped you below. I
learned a bit more about surveys: you guys still can't give simple
scalar answers (see below). Sorry this took so long; Usenix, new
machines, etc. have me loaded down. I hope to get to Arvin Park's
IOstone next.

Another gross generalization from

  --eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups. If enough, I'll summarize."

APPENDIX: (chronological, except HAM people)

From: ian@esl.ESL.COM (Ian Kaplan)

We submitted a proposal under DARPA's Teraop grand challenge, so I
can't resist commenting (of course, this does not mean that I know
what I am talking about (but you know that)):

YEAR: 1993 for Connection Machine-like SIMD systems
YEAR: 2003 (15 years hence) for "general purpose" systems

I know that Thinking Machines has gotten money under the Teraop
initiative. They, or someone like them, should be able to come out
with a Teraop machine in the near future (the next five years). While
this machine will be commercially available, it will not be very
general purpose. A general purpose machine must be programmable in an
algorithmic fashion. Although the programmer may have to think about
parallelism, it should be at a high level (e.g., divide-and-conquer
algorithms). The challenges in building a machine like this are
probably greater on the software end than the hardware end. We are
only starting to understand parallel algorithmic languages like SISAL,
Lucid and Id. I say 15 years because it will take a long time for the
software to develop. Unfortunately, people's thinking changes much
more slowly than hardware technology, and it is the way that people
think that holds back parallel software.

YEAR: 1994	jim sadler

YEAR: 1995
Unfortunately, 1 TFLOP doesn't necessarily mean that existing code can
be transformed to achieve this rate, nor does it mean that all kinds
of float operations will run that fast or with the precision we might
now expect; I think 1 TFLOP of SUSTAINED computation will happen only
for very large versions of the classic "embarrassingly parallel"
applications. In other words, the TFLOP measurement will probably
become as meaningless as MIPS are now.
	-hankd

YEAR: 1995
I calculate it to be 1993, actually, but then I figured you'd be able
to buy a UNIX box for $2000 by 1984. I didn't figure on UNIX getting
bigger and on IBM freezing microcomputer O/S development. Let's add 2
years. 1995.
	-- Peter da Silva, Ferranti International Controls Corporation.

From: uicsrd.csrd.uiuc.edu%uxc.cso.uiuc.edu@uxc.cso.uiuc.edu (Steve Turner)
YEAR: 1995
Jest a guess... About 7 years for 3 orders of mag may be too soon, but
I'd rather err on the early side for a change.

YEAR: 1998	-- Tim Olson
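[ Peter's and Steve Turner's guesses above are both doubling-time
arithmetic. A minimal Python sketch of that arithmetic; the doubling
periods and the 1-GFLOPS-in-1988 baseline are illustrative assumptions,
not survey data. ]

    import math

    def years_to_multiply(factor, doubling_years):
        """Years needed to grow by `factor` if performance doubles
        every `doubling_years` years."""
        return doubling_years * math.log2(factor)

    def doubling_time_needed(factor, years):
        """Doubling period implied by `factor` growth in `years` years."""
        return years / math.log2(factor)

    # "3 orders of mag in about 7 years" implies performance doubling
    # roughly every 8.4 months:
    print(doubling_time_needed(1000, 7))        # ~0.70 years

    # At a more conservative doubling every 1.5 years, a 1000x jump
    # (assume ~1 GFLOPS in 1988 -> 1 TFLOPS) takes about 15 years:
    print(1988 + years_to_multiply(1000, 1.5))  # ~2002.9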
YEAR: 1998
Hi, Eugene! But...

My above answer assumes you include as legal some sort of
multi-processor (I don't care: data-flow, lattice, hyper-torus,
whatever), since I don't believe we'll reach TFLOP performance in a
single CPU any time soon... unless you count *deep* pipelining in the
FPU, and I think that's as hard to use in an application program as
parallel multi's are. (Besides, really deep pipelining begins to
approach "dataflow", so you might as well look at dataflow.)

Even that presentation at Ames that Alan Huang gave a while back, on
the optical computers they're working on at Bell Labs, didn't promise
super-duper single-CPU speeds. With 5 ns around the loop, you only get
200M micro-cycles/second. You have to stack wavefronts, which gets you
multiple parallel (slightly out of sync) machines, *not* a faster
single CPU.

No, I think it will take a combination of two technologies, neither of
which will happen without the other, which means that it will take one
big project to make it happen (maybe SDI?) rather than some
incremental development by independents:

1. Truly massive partially-shared-memory machines -- something like a
   hyper-Butterfly, or a hypercube with a bus for interconnect instead
   of a serial bus (Intel-style hypercubes are bandwidth-limited) --
   to get enough performance.

2. A step jump in compilers, which will optimize/loop_unfold/assign
   applications to that machine. The notion is to take Dijkstra's
   notion of a (terminating) program as a single function which
   transforms the input state of the machine to the output state
   (assume that the input and output data files fit in main memory),
   said function being specified in programmatic (imperative) form
   instead of functional form, and to transform it into (more or less)
   pure functional form (loop unrolling, converting loops to
   recursion, using lazy evaluation, etc.). THEN the functional form
   is mapped onto the processor array...

Anyway, I give it a decade.

Rob Warnock
Systems Architecture Consultant
UUCP:	  {amdcad,fortune,sun,attmail}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415) 572-2607
USPS:	  627 26th Ave, San Mateo, CA 94403
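[ Rob's second ingredient -- a compiler that turns an imperative
program into pure functional form before mapping it onto the processor
array -- in miniature. A hypothetical Python sketch of just the
loop-to-map/reduce step; this is not anyone's actual compiler output. ]

    from functools import reduce

    def dot_imperative(xs, ys):
        # Programmatic (imperative) form: a loop mutating an accumulator.
        acc = 0.0
        for i in range(len(xs)):
            acc = acc + xs[i] * ys[i]
        return acc

    def dot_functional(xs, ys):
        # Functional form: the loop body becomes a map (independent
        # operations, hence spreadable across a processor array),
        # followed by a reduction (parallelizable as a tree).
        products = map(lambda pair: pair[0] * pair[1], zip(xs, ys))
        return reduce(lambda a, b: a + b, products, 0.0)

    # Both forms compute the same dot product:
    assert dot_imperative([1, 2, 3], [4, 5, 6]) == dot_functional([1, 2, 3], [4, 5, 6])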
YEAR: 2002 ... SHIP DATE of a non-special-order commercial
configuration with sustained teraflops performance.

Reasoning: If the next NCUBE puts out 1 MFLOPS * 4096 CPUs = 4 GFLOPS,
and if BBN uses the Motorola Hypercard, 256*4 88000's are approx
4 GFLOPS. Hence 1990 will see 4 GFLOPS as "commercial". (Maybe 1989.)
Apply the old rule, factor-of-4-every-3-years (Bill Joy
notwithstanding). A factor of 256 (1 TFLOPS / 4 GFLOPS = 256) = 4
factors-of-4 = 12 years hence: 1990 + 12 = 2002.

ALTERNATIVE THEORY (technology, not trend lines): Assume a gallium
chip with a 500 MHz clock getting a sustained 128 MFLOPS. A machine
with 8K of these (say, 1K clusters of 8) sounds viable. To do that we
need 100k-400k transistors on a gallium chip. We currently get about
1k, so we need a factor of 100-400 in density. This should double
every year (since it rides silicon's coattails). So, 7..9 years,
+1988 = 1995..1997.

I stand by both estimation methods, but I give more confidence to the
less aggressive one. Hence, "1995..2002, likelier the latter". Note
that TF-1 clones don't count: anything that won't fit in standard
computer rooms doesn't fit my definition of "product".

-- Don	lindsay@k.gp.cs.cmu.edu	  CMU Computer Science

From: amdcad!uunet!mcvax!nikhefk!henkp@ames (Henk Peek)
YEAR: 2008

Date: Sun, 5 Jun 88 13:12:32 PDT
From: amdcad!henry@Sun.COM (Henry McGilton--Software Products)
YEAR: 2010

Hello Again,

I can't resist these Delphic Poll type of questions. I think we'll
hit the teraflop before year 2000. I must admit to whistling
completely in the dark, because I don't have enough data to do the
trend curves. Completely off the top of my head, though, we hit:

	  10 Mflops around 1966  (the 6600)
	 100 Mflops around 1975  (the Cyber 203/5)
	1000 Mflops around 1986  (the big Crays)

which suggests to me we're getting factors of ten every ten years or
so. However, I'd predict that new technologies would steepen the
development curve. Didn't Winograd come up with a limit based on
quantum mechanical limitations? I've forgotten the article.

According to Bill Joy, we see 2 ** (year - 1984) MIPS each year. By
that equation we'll see 2**16 MIPS (about 65 GIPS) by year 2000, so 1
teraflop can't be far behind that.

Have fun with the poll.

	......... Henry

Tom Lasinski	2010
GAM		1998-2000
David Bailey	1995
Marty Fouts	2001-2024 (never)
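[ Henry's two rules of thumb, run out to their 1-TFLOPS dates in a
short Python sketch; his numbers, our arithmetic. Note that the two
rules disagree by about a decade, which neatly brackets most of the
votes above. ]

    import math

    # Henry's trend: 1000 MFLOPS around 1986, a factor of ten every
    # ten years or so => 10**6 MFLOPS (1 TFLOPS) around 2016.
    tflops_by_decade_rule = 1986 + 10 * math.log10((10**6) / 1000)
    print(tflops_by_decade_rule)    # 2016.0

    # Bill Joy's rule: MIPS = 2 ** (year - 1984). One TIPS is 10**6
    # MIPS, reached when year - 1984 = log2(10**6), i.e. ~19.9 years.
    tips_by_joys_rule = 1984 + math.log2(10**6)
    print(tips_by_joys_rule)        # ~2003.9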