zed@mdbs.uucp (Bill Smith) (10/03/90)
When were (or will be) machines with the following performance available? The main parameter I am interested in is the cost, at initial commercial release, of a machine with a given number of available (as opposed to theoretical) floating point operations per second.

                             Date:     Cost:
Mega FLOP -- 10^6  (MFLOP)
Giga FLOP -- 10^9  (GFLOP)
Tera FLOP -- 10^12 (TFLOP)
Peta FLOP -- 10^15 (PFLOP)
Exa  FLOP -- 10^18 (EFLOP)

When in this history will a programming model be available that is appropriate for solving problems in any part of this entire 10^12 speed range, and that will port programs to the next higher level without serious loss of efficiency?

William W. Smith
sawmill!mdbs!zed
[not necessarily my employer's opinions]
gcwilliams@watdragon.waterloo.edu (Graeme Williams) (10/27/90)
In article <1990Oct2.190020.15214@mdbs.uucp> zed@mdbs.uucp (Bill Smith) writes:
>When were (or will be) machines with the following performance available?
>
>                             Date:     Cost:
>Mega FLOP -- 10^6  (MFLOP)
>Giga FLOP -- 10^9  (GFLOP)
>Tera FLOP -- 10^12 (TFLOP)
>Peta FLOP -- 10^15 (PFLOP)
>Exa  FLOP -- 10^18 (EFLOP)

The top two of these are probably unattainable in the foreseeable future - except perhaps with very large numbers of processors in parallel. For a *SINGLE* processor there is a fundamental restriction on the number of operations that can be performed. Namely:

An instruction cannot be executed in a time shorter than the time it takes for a light beam to traverse the processing device.

Thus if a single processor were to be capable of executing 10^18 instructions per second, the physical size of the processor would have to be smaller than (3*10^8)/10^18 metres, i.e. 0.3 nanometres. Unfortunately 0.3 nanometres is the same order of size as a single molecule - so either the processor is *EXTREMELY SIMPLE* :-), or it's not made of molecules! In short, 10^18 operations/sec will probably be forever impossible for a single processor; indeed 10^15 probably is too.

Anyone for engineering a CPU out of a tiny piece of neutron star ???

Graeme Williams
gcwilliams@watdragon.waterloo.edu
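[Archive note: the size bound above can be tabulated for the whole FLOP ladder. A minimal back-of-the-envelope sketch in Python; the idealized one-operation-per-light-crossing model and the function name are purely illustrative, not drawn from any real design.]

```python
# Light-speed bound on processor size: if one operation must complete
# in 1/rate seconds, a signal at c must cross the device in that time,
# so the device diameter is bounded by c / rate.
C = 3.0e8  # speed of light, metres per second

def max_device_size(flops):
    """Upper bound on device diameter (metres) for one op per light crossing."""
    return C / flops

for name, rate in [("MFLOP", 1e6), ("GFLOP", 1e9), ("TFLOP", 1e12),
                   ("PFLOP", 1e15), ("EFLOP", 1e18)]:
    print(f"{name}: at most {max_device_size(rate):.1e} m across")
```

For the EFLOP row this gives 3.0e-10 m, i.e. the 0.3 nanometres quoted above.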
dhoyt@vx.acs.umn.edu (10/27/90)
In article <1990Oct26.191032.9099@watdragon.waterloo.edu>, gcwilliams@watdragon.waterloo.edu (Graeme Williams) writes...
>An instruction cannot be executed in a time shorter than the time
>it takes for a light beam to traverse the processing device.

This may or may not be true. It is certainly possible for an 'electron' to pass from one side of a gap to the other side with no elapsed time. (Actually the electron is on both sides of the gap until it is measured--meow.) I've not seen anything to make me think that zero-momentum computing devices are impossible. Of course I'm not offering any venture capital either.

david | dhoyt@vx.acs.umn.edu | dhoyt@umnacvx.bitnet
peter@ficc.ferranti.com (Peter da Silva) (10/28/90)
In article <1990Oct26.191032.9099@watdragon.waterloo.edu> gcwilliams@watdragon.waterloo.edu (Graeme Williams) writes: > Anyone for engineering a CPU out of a tiny piece of neutron star ??? Where is Bob Forward when we need him? -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com
henry@zoo.toronto.edu (Henry Spencer) (10/28/90)
In article <2581@ux.acs.umn.edu> dhoyt@vx.acs.umn.edu writes:
>>An instruction cannot be executed in a time shorter than the time
>>it takes for a light beam to traverse the processing device.
>
> This may or may not be true. It is certainly possible for an 'electron'
>to pass from one side of a gap to the other side, with no elapsed time.

Um, references please. My recollection is that the transition is *not* instantaneous. Contrary to popular belief, it doesn't even jump the gap without occupying the space between: the notion that it is forbidden to occupy said space is an attempt to apply classical dynamics in an area of severely non-classical behavior.

If you are talking about the collapse of wave functions, that is a totally different story. One can argue about whether that phenomenon is "real", but it is inarguable that you *cannot* use it to transmit information, which is what we're talking about. The electron doesn't "pass from one side to the other side"; it starts out on both sides, and then decides which side it's on.

If you believe in (a) cause and effect, and (b) special relativity -- both of which are rather solidly rooted in experimental evidence -- then faster-than-light transmission of information is fundamentally impossible. (From a suitable viewpoint, the transmission time would appear to be negative, with cause (transmission) coming after effect (reception).)
--
The type syntax for C is essentially   | Henry Spencer at U of Toronto Zoology
unparsable.          --Rob Pike        | henry@zoo.toronto.edu   utzoo!henry
gcwilliams@watdragon.waterloo.edu (Graeme Williams) (10/28/90)
In article <2581@ux.acs.umn.edu> dhoyt@vx.acs.umn.edu writes:
> This may or may not be true. It is certainly possible for an 'electron'
>to pass from one side of a gap to the other side, with no elapsed time.
>(Actually the electron is on both sides of the gap, until it is measured--meow.)

True, the electron has an uncertainty in its position courtesy of quantum physics; unfortunately Mother Nature prohibits one from making use of this fact to transmit information faster than light - so my original statement stands. (This is closely related to the EPR paradox and Bell's inequality.)

>I've not seen anything to make me think that zero momentum computing devices
>are impossible. Of course I'm not offering any venture capital either.

It occurs to me that perhaps one could construct the heart of a processor *purely* out of interacting photons - it could be very tiny and everything would be done at light speed. Elegant. Still, "more" is never enough, and "much more" only suffices for a wee while; given a few months with such a device I'd soon want a faster one :-).

Graeme Williams
gcwilliams@watdragon.waterloo.edu
dhoyt@vx.acs.umn.edu (10/29/90)
In article <1990Oct27.235949.6451@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes...
>Um, references please.

I'll try to dig some up. I seem to remember an article in Scientific American (or another broad-scope journal) in the past year or two about research showing two particles in 'communication' even when they were outside each other's light cone. I believe they were using the exclusion principle. If anyone can refer me to this, it would save me some time in the local library.

>If you believe in (a) cause and effect, and (b) special relativity --

Special relativity does not produce the same answers as quantum dynamics. As correct as (general) relativity seems on the macro scale, it fails at the micro scale. What often trips people up is thinking of electrons (or photons) as particles, when in fact they are not. They are waves. (Chemists and physicists tend to disagree with each other here.) The achiral nature of ammonia chemistry is a simple proof of this. (If electrons were truly classical (or relativistic) one would expect ammonia to be pyramid shaped, and thus show handed or chiral geometry, which would show up as stereospecificity. Ammonia chemistry does not show stereospecificity, thus is achiral, thus not pyramid shaped, but rather planar.)

In the ammonia ion the 'reactive' electron is a wave form that exists in a cloud both above and below the plane of the ion at the same time. The electron will never exist anywhere between the two clouds. As the wave form exists in both clouds, other molecules are free to react with either side of the ion, which will fix the electron wave form to one side of the ion. If the 'electron' actually moved, there would have to be a conservation of momentum, which would mean the new molecule would have to move away from the reaction site, even if the sum of the momentum of the two ions were zero. Which would require the total energy in an adiabatic (closed) system to increase. Which would break the second law of thermodynamics. Which would make things very hot indeed.

Hopefully more later.

david | dhoyt@vx.acs.umn.edu | dhoyt@umnacvx.bitnet
rbw00@ccc.amdahl.com ( 213 Richard Wilmot) (10/29/90)
gcwilliams@watdragon.waterloo.edu (Graeme Williams) said in article <1990Oct26.191032.9099@watdragon.waterloo.edu>:
> Namely :
> An instruction cannot be executed in a time shorter than the time
> it takes for a light beam to traverse the processing device.
> Thus if a single processor were to be capable of executing 10^18
> instructions per second - the physical size of the processor would
> have to be smaller than (3*10^8)/10^18 metres i.e. 0.3 nanometres.
> Unfortunately 0.3 nanometres is the same order of size as a single
> molecule - so either the processor is *EXTREMELY SIMPLE* :-) ,or it's
> not made of molecules! In short 10^18 operations/sec will probably
> be forever impossible for a single processor, indeed 10^15 probably
> is too.

While it is true that an instruction cannot be EXECUTED in less time than light can traverse the computing device, nFLOPS are measures of THROUGHPUT. We can improve throughput by some amount via pipelining of the execution, so that each section of a computing device performs only part of the work for the instruction. After an initial delay to fill the pipeline we could produce a result every 1/mth second, where this time between results < time for light to traverse the whole CPU. Some people call this parallel computation, and it is a form of parallelism.

Nearly all processors produced today are pipelined. Vector units are usually (always?) implemented with pipelining, but they could include true multiple execution units (e.g. adding 4 vector elements to 4 others at the same time) to further increase throughput. I am usually more interested in bandwidth/throughput than in the latency of any individual instruction in a stream.
--
Dick Wilmot | I declaim that Amdahl might disclaim any of my claims. (408) 746-6108
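[Archive note: the latency-versus-throughput distinction above can be put in numbers. A minimal Python sketch of an idealized stall-free pipeline; the six-stage depth and function name are illustrative assumptions, not a description of any particular machine.]

```python
# Latency vs. throughput in an ideal n-stage pipeline with no stalls:
# one new instruction enters per cycle, so the first result appears
# after n cycles (the fill delay), then one result completes per cycle.
def pipeline_timing(n_stages, n_instructions):
    """Return (cycles until first result, total cycles for all results)."""
    first = n_stages                          # pipeline fill delay
    total = n_stages + (n_instructions - 1)   # one result per cycle after fill
    return first, total

first, total = pipeline_timing(6, 1000)
print(f"first result after {first} cycles; "
      f"1000 results after {total} cycles "
      f"(~{1000 / total:.3f} results per cycle)")
```

With 1000 instructions the fill delay is amortized to near irrelevance: throughput approaches one result per cycle even though each individual instruction still takes six cycles end to end.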
brooks@maddog.llnl.gov (Eugene Brooks) (10/29/90)
In article <2588@ux.acs.umn.edu> dhoyt@vx.acs.umn.edu writes:
> I'll try to dig some up. I seem to remember an article in Scientific
>American (or another broad-scope journal) in the past year or two about
>research showing two particles in 'communication' even when they were outside
>each other's light cone. I believe they were using the exclusion principle. If
>anyone can refer me to this, it would save me some time in the local library.

Two particles in "correlation," not "communication." Just because two coins come up "even" - that is, one is heads if the other is tails - does not mean that you can use them to send information. The electron, or any other event which could carry information, cannot travel faster than light even through quantum effects. Quantum Field Theory is fully consistent with special relativity.

brooks@maddog.llnl.gov, brooks@maddog.uucp
brooks@maddog.llnl.gov (Eugene Brooks) (10/29/90)
In article <2588@ux.acs.umn.edu> dhoyt@vx.acs.umn.edu writes:
>Special relativity does not produce the same answers as quantum dynamics. As
>correct as (general) relativity seems on the macro scale, it fails at the micro
>scale. What often trips people up is thinking of electrons (or photons) as
>particles, when in fact they are not. They are waves. (Chemists and physicists
>tend to disagree with each other here.)

Electrons and photons are neither (classical) particles nor (classical) waves. They possess features of both, however. If you want to know what an electron is, don't ask a chemist!

brooks@maddog.llnl.gov, brooks@maddog.uucp
djh@hawkeye.osc.edu (David Heisterberg) (10/29/90)
In article <2588@ux.acs.umn.edu> dhoyt@vx.acs.umn.edu writes:
> The achiral nature of ammonia
>chemistry is a simple proof of this. (If electrons were truly classical (or
>relativistic) one would expect ammonia to be pyramid shaped, and thus show
>handed or chiral geometry, which would show up as stereospecificity. Ammonia
>chemistry does not show stereospecificity, thus is achiral, thus not pyramid
>shaped, but rather planar.)

This doesn't really have anything to do with computer architecture, but your ammonia analysis is wholly incorrect. The ground state of ammonia is pyramidal. It has C3v symmetry, so of course there's no optical activity. It also has a fairly low inversion barrier, as do some umbrellas - the macroscopic analog. Further, relativity certainly need not be considered explicitly for ammonia.

Well, one could say it is because of high performance computer architectures that one is able to do detailed quantum calculations to determine ammonia's geometric and electronic structure.
--
David J. Heisterberg           djh@osc.edu         And you all know
The Ohio Supercomputer Center  djh@ohstpy.bitnet   security Is mortals'
Columbus, Ohio  43212          ohstpy::djh         chiefest enemy.
dhoyt@vx.acs.umn.edu (10/29/90)
In article <2588@ux.acs.umn.edu>, dhoyt@vx.acs.umn.edu writes...
> If the 'electron' actually moved, there would have to be

I don't know where my head was on this argument; the momentum could be handled by adding to the vibrational state. In any case, this really doesn't have anything to do with comp.arch. Can we move this to private mail, Henry?

david | dhoyt@vx.acs.umn.edu
omtzigt-theo@cs.yale.edu (Erwinus Theodorus Leonardus Omtzigt (Theo)) (10/30/90)
>Anyone for engineering a CPU out of a tiny piece of neutron star ???
>
>Graeme Williams
>gcwilliams@watdragon.waterloo.edu

I tried, but lost my glasses due to the enormous gravity!

Theo
--
InterNet: omtzigt-theo@cs.yale.edu   mail: Becton Center, 15 Prospect Str
Phone: (203) 432 - 4261                    Yale University, Dept. of EE
herrickd@iccgcc.decnet.ab.com (10/30/90)
In article <2590@ux.acs.umn.edu>, dhoyt@vx.acs.umn.edu writes: > In article <2588@ux.acs.umn.edu>, dhoyt@vx.acs.umn.edu writes... >> If the 'electron' actually moved, there would have to be > I don't know where my head was on this argument, the momentum could be > handled by adding to the vibrational state. In any case, this really doesn't > have anything to do with comp.arch. Can we move this to private mail, Henry? > > david | dhoyt@vx.acs.umn.edu I'm just a spectator, but I was enjoying it. dan herrick herrickd@astro.pc.ab.com
gcwilliams@watdragon.waterloo.edu (Graeme Williams) (10/30/90)
In article <1dbs02b=035v01@JUTS.ccc.amdahl.com> rbw00@JUTS.ccc.amdahl.com ( 213 Richard Wilmot) writes:
>While it is true that an instruction cannot be EXECUTED in less time
>than light can traverse the computing device, nFLOPS are measures of
>THROUGHPUT.

I don't follow this at all - for a *given* processor, is not THROUGHPUT determined by how fast the processor executes??

>We can improve throughput by some amount via pipelining of
>the execution so that each section of a computing device performs only
>part of the work for the instruction. After an initial delay to fill the
>pipeline we could produce a result every 1/mth second where this time
>between results < time for light to traverse the whole CPU.

Sure, you can do some preliminary work on an instruction using pipelining techniques - but the bottom line is that the EXECUTION of an instruction changes the state of the processor (e.g. a register), and that particular part of the processor has a finite physical size and hence a maximum rate at which it can be changed. No amount of pipelining can change this fact.

Of course one can do parallel execution (if your algorithm permits), but then you're no longer using a *SINGLE* processor/execution unit, are you?

Graeme Williams
gcwilliams@watdragon.waterloo.edu
rbw00@ccc.amdahl.com ( 213 Richard Wilmot) (11/08/90)
In article <1990Oct29.205812.1641@watdragon.waterloo.edu> gcwilliams@watdragon.waterloo.edu (Graeme Williams) said:
> I don't follow this at all - for a *given* processor is not THROUGHPUT
> determined by how fast the processor executes ??

Throughput is determined by how fast the processor executes, but not by the time for an instruction to traverse all the pipeline stages. At any time there are usually several instructions in the process of executing in different pipeline stages. It's like an auto assembly line: it might take two days for what will be a car to move from start to finish (execution time for 1 car), but hundreds of cars are finished each hour (that's throughput). Instead of autos we assemble instruction results.

> Sure you can do some preliminary work on an instruction using
> pipelining techniques - but the bottom line is that the EXECUTION of
> an instruction changes the state of the processor (e.g. a register)
> and that particular part of the processor involved has a finite
> physical size and hence a maximum rate at which it can be changed.
> No amount of pipelining can change this fact.
>
> Of course one can do parallel execution (if your algorithm permits)
> but then you're no longer using a *SINGLE* processor/execution unit
> are you?

I'm not always too sure what is meant by a *single* processor. In some pipelined machines the *state* of the machine does not have a simple intuitive definition. *A*register* may have several different values in several different stages of the pipeline. In fact, one of the more interesting aspects of pipelining is that more than one iteration of a loop might be 'in execution' in different pipeline stages at the same time. So the value of register GR3 being loaded in iteration 2 of a loop might be different from the one that was loaded in iteration 1 if the loop is small enough for more than one iteration to fit into the pipeline.

Things become a bit tricky when you try to provide 'precise' interrupts/traps. If GR3 is used as a divisor and is 0 in one iteration of a loop, then we will have to present back to the software an image that is consistent with a single machine state, despite the fact that while the machine was running it never had a single state. Pipelines are today more common than not (e.g. 80486, most SPARC, MIPS, S/390, etc.).

Things are even more complex when designers begin using multiple instruction issue strategies, as did the IBM RS/6000 designers, where, as often as possible, two instructions are issued in each clock cycle (I understand that, in that implementation, one must be an integer instruction and the other floating point in order to exploit the double issue). I doubt that two is the upper limit on this technique. So, while the two instructions that began at address 44000 are executing, what is the program counter (PC)? The PC was usually thought of as part of the machine state AND IT WAS SINGLE VALUED.

Back to the speed of light as a gating factor to computing throughput.
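[Archive note: the dual-issue constraint described above - at most one integer and one floating-point instruction issued per cycle - can be sketched in a few lines of Python. The function name and the two-slot model are illustrative assumptions, not a description of the actual RS/6000 issue logic.]

```python
# In-order dual issue: each cycle has one integer slot and one
# floating-point slot. Instructions issue in program order; a cycle
# ends as soon as the next instruction's slot is already taken.
def issue_cycles(stream):
    """stream: list of 'int'/'fp' opcodes. Return cycles needed to issue all."""
    cycles = 0
    i = 0
    while i < len(stream):
        cycles += 1
        slots = {"int": 1, "fp": 1}  # fresh slots each cycle
        while i < len(stream) and slots[stream[i]] > 0:
            slots[stream[i]] -= 1
            i += 1
    return cycles

print(issue_cycles(["int", "fp", "int", "fp"]))  # alternating pairs: 2 cycles
print(issue_cycles(["int", "int", "int"]))       # nothing to pair with: 3 cycles
```

A stream that alternates integer and floating-point work issues at two instructions per cycle; a stream of all-integer work gets no benefit, which is why the pairing restriction matters to compiler schedulers.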
If we can complete an instruction on each cycle and there are 6 stages in our pipeline, then no stage can complete its work faster than light can traverse that stage. If this IS a restriction, then the throughput will be as if the whole computer were squeezed into the size of one stage, 1/6 of its actual path. We will get a new result EVERY clock, and 1 clock will also be the signal traversal time through one stage.

This STAGE business is only for the designers' convenience. It is somewhat artificial and may not be a limiting factor. Consider a systolic array that we might use for sorting. The array is composed of a number of comparators, each of which will put the lower of its 2 input values out on its left output and the other out on its right output:

     I1  I2          I3  I4
      \  /            \  /
       C                C
       |\              /|
       | \            / |
       |  \          /  |
       |   \        /   |
       |    \      /    |
       |     \    /     |
       |      \  /      |
       |       \/       |
       |       /\       |
       |      /  \      |
       |     /    \     |
       |    /      \    |
       |   /        \   |
       |  /          \  |
       | /            \ |
       |/              \|
       C                C
      / \              / \
     O1  O2          O3  O4

We may be able to clock the values being sorted between levels of comparators so that at each comparator, at each clock period, the two inputs become outputs. This can easily be extended from the 2 levels shown above to N levels. We will then have N results each cycle, with another N results 'in the mill' right behind. One might call each level of comparators a 'stage', and we are clearly operating the comparators in parallel.

But maybe the 'stages' above do not have to be a limiting factor. Instead of a single master clock for all comparators, we could include some latching so that a comparator which has finished its prior pair can accept the next pair as soon as they are ready, as signaled by latches set by the upstream comparator. Such self-clocking is reputed to be more difficult than deterministic clocking. It would, however, be quite applicable to comparators which can often finish early (earlier than the worst case, which must be allowed for in a lock-step clock). The sorting array was only meant as an easy-to-visualize example.
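[Archive note: the clocked comparator array can be simulated directly. The sketch below is a Python illustration in the spirit of the figure; as an assumption it extends the two levels shown to the three comparator levels of Batcher's 4-input sorting network, so the outputs emerge fully sorted. All names are mine. A new input vector enters every cycle while earlier vectors are still in later levels, so after the pipe fills, one sorted vector emerges per cycle regardless of the three-cycle latency.]

```python
# Three comparator levels forming a 4-input sorting network
# (each comparator sends the smaller value left, the larger right).
LEVELS = [[(0, 1), (2, 3)],   # level 1: compare adjacent pairs
          [(0, 2), (1, 3)],   # level 2: compare across the halves
          [(1, 2)]]           # level 3: settle the middle pair

def step(vec, level):
    """Apply one level of comparators to a vector."""
    vec = list(vec)
    for a, b in level:
        if vec[a] > vec[b]:
            vec[a], vec[b] = vec[b], vec[a]
    return vec

def systolic_sort(inputs):
    """Clock a stream of 4-element vectors through the network, one
    level per cycle, pipelined: several vectors are in flight at once.
    Returns the sorted vectors in arrival order."""
    in_flight = []            # list of (vector, index of next level)
    out = []
    feed = list(inputs)
    while feed or in_flight:  # one loop iteration = one clock cycle
        advanced = []
        for vec, lvl in in_flight:
            vec = step(vec, LEVELS[lvl])
            if lvl + 1 == len(LEVELS):
                out.append(vec)            # vector leaves the array
            else:
                advanced.append((vec, lvl + 1))
        if feed:
            advanced.append((feed.pop(0), 0))  # new vector enters
        in_flight = advanced
    return out

print(systolic_sort([[3, 1, 4, 2], [9, 7, 5, 8]]))
```

The two-level network in the figure alone pins down only the overall minimum and maximum (the middle pair can come out swapped, e.g. for input 3, 4, 1, 2), which is why the sketch adds the third level.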
In some far-future designs we might be able to succeed at more complex, less regular designs which could retain some of the wave-of-solution advantages of the systolic arrays but solve less regular, more general problems as found in the usual computer instructions. In such a machine the TIME TO EXECUTE THE FIRST INSTRUCTION MIGHT BE RELATIVELY LONG, but thereafter we might get the results of subsequent instructions in a cascade with very little separation, or perhaps in parallel, and maybe even out of order where it doesn't matter.

So I challenge the posting of the speed of light as a THROUGHPUT limit, even though it does limit the execution of a single instruction from start to finish. More passengers instead of faster vehicles.
--
Dick Wilmot | I declaim that Amdahl might disclaim any of my claims. (408) 746-6108