kevin@cornell.UUCP (Kevin Karplus) (03/22/85)
Occam is not a completely idiosyncratic language.  It is an implementation
of CSP (Communicating Sequential Processes), which has been used by
theorists for years as a simple, concise notation for discussing concurrent
programs.  Occam is clearly not a programming language to end all
programming languages, but it provides a simple way to talk about
communication between processes (something NOT provided by C, PASCAL,
FORTRAN, PL/I, ...).  I've not programmed in Occam, but have been
considering it seriously.  Occam is available cheap to universities (call
INMOS for details), and runs on VAXes as well as Transputers.

The transputer chip looks ideal for systolic array designs.  It lacks
enough channels for a hypercube, but as other people have pointed out, it
isn't clear that a small processor can handle lots of channels.  The
transputer does not look very good as part of a "pseudo-node" of a
hypercube (each chip has one edge of the cube, and shares a bus with the
other processors at the same node), since it does not have a shared-bus
communication port.

I'm a little dubious about the value of hypercubes, as most big programs
have a 5% to 20% purely serial component.  Even doing the parallel part in
zero time gives only a 5 to 20 times speedup.  Any machine that hopes for
bigger speedups must have at least one extremely fast serial processor.
(Note: this only applies to general-purpose machines.  Obviously, certain
problems can have the serial part reduced to a tiny fraction.)

I like the idea of including a disk drive at every node of a cube.  It is
incredibly expensive, but many disk drives, with a processor for each, is
the only way to get fast databases.  I'm not convinced a cube is the right
architecture, though.  There are networks with fewer links that are almost
as fast, and much cheaper to build.

				Kevin Karplus
				[Standard disclaimers]
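P.S.  For those who haven't seen occam, the central idea it borrows from
CSP is the channel: one process sends on a named channel (c ! 42), another
receives from it (c ? x), and the two meet at that point.  Here is a rough
sketch of the same send/receive idea in C, using a Unix pipe between two
processes.  This is NOT occam, and a pipe buffers rather than synchronizes
the way an occam channel does; it is only meant to suggest what the
! and ? primitives do.

/* Rough sketch of channel-style communication using a Unix pipe.
   Not occam; just an illustration of send ("c ! 42") and receive
   ("c ? x") between two processes.  Assumes a Unix system. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int chan[2];            /* chan[0] is the read end, chan[1] the write end */
    int msg, x;

    if (pipe(chan) == -1)
        return 1;

    if (fork() == 0) {      /* child process: the sender */
        msg = 42;
        write(chan[1], (char *) &msg, sizeof msg);   /* roughly  c ! 42 */
        _exit(0);
    }

    /* parent process: the receiver */
    read(chan[0], (char *) &x, sizeof x);            /* roughly  c ? x  */
    printf("received %d\n", x);
    wait((int *) 0);
    return 0;
}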
gottlieb@cmcl2.UUCP (Allan Gottlieb) (03/27/85)
In article <440@cornell.UUCP> kevin@gvax.UUCP (Kevin Karplus) writes:
>I'm a little dubious about the value of hypercubes, as most big
>programs have a 5% to 20% purely serial component.
>(Note: this only applies to general-purpose
>machines.  Obviously, certain problems can have the serial part reduced
>to a tiny fraction.)

Have you any data to support this claim?  At NYU, our Ultracomputer
project has (very extensive) experience with a wide range of important
scientific applications, and we have NEVER found the serial code to be k%
for fixed k.  Instead, each of these problems is invariably a class of
problems parameterized by some "size" variables (often the number of mesh
points), and the serial portion approaches 0 as the size increases.  Thus,
for large enough problems the potential for parallelism can be made
arbitrarily large.

This raises the question of "how much is enough".  That is, how big must a
problem be for 1000 processors to be used effectively?  We have numerous
simulation results on this question.  The NASA (GISS) "weather code"
(i.e. three-dimensional atmospheric simulation), when executed using
meshes appropriate for an Amdahl V7 or V8, can get high efficiency (above
70%) with a few hundred processors but not thousands.  However, when
meshes separated by about 1 degree of arc (more desirable from a numerical
analysis point of view) are used, thousands of processors can be
efficiently employed.  Thus for this problem class, thousands (but not
millions) of processors would be useful.

I should note that we parallelized this program without excessive effort,
using techniques that Kuck (Illinois) and Kennedy (Rice) and their
colleagues believe can be done automatically.  Perhaps by using more
sophisticated parallelization techniques or by employing a new algorithm,
more processors could be used.  I do not believe that just refining the
mesh enough to utilize a million processors is justified from a numerical
analysis point of view -- but here I am on shaky ground.

Caltech has also reported on many scientific problems (using their real
hardware), and again the serial portion drops with problem size.
--
Allan Gottlieb
GOTTLIEB@NYU
{floyd,ihnp4}!cmcl2!gottlieb   <---the character before the 2 is an el
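P.S.  To make the arithmetic of the disagreement concrete, here is a toy C
program.  The 5% and 20% serial fractions come from the posting quoted
above; the s = 50/n scaling is an invented example of a serial fraction
that shrinks with problem size, not a measurement from any of our codes.
With a fixed serial fraction the speedup on 1000 processors is stuck near
1/s; with the shrinking fraction it climbs toward 1000 as the problem
grows.

/* Toy comparison of a fixed serial fraction vs. one that shrinks
   with problem size.  All parameters below are invented for
   illustration only. */
#include <stdio.h>

/* s = serial fraction, p = number of processors (Amdahl bound) */
double speedup(double s, double p)
{
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void)
{
    double p = 1000.0;
    double n, s;

    /* Fixed serial fraction (the 5%-20% view): speedup is capped near
       1/s no matter how many processors are added. */
    printf("fixed s=0.05, p=1000:  speedup = %6.1f\n", speedup(0.05, p));
    printf("fixed s=0.20, p=1000:  speedup = %6.1f\n", speedup(0.20, p));

    /* Serial fraction shrinking with problem size n (here s = 50/n):
       the same 1000 processors become effective once the problem is
       big enough. */
    for (n = 1000.0; n <= 1000000.0; n *= 10.0) {
        s = 50.0 / n;
        printf("n = %8.0f  s = %7.5f  p = 1000:  speedup = %6.1f\n",
               n, s, speedup(s, p));
    }
    return 0;
}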
henry@utzoo.UUCP (Henry Spencer) (04/02/85)
> >I'm a little dubious about the value of hypercubes, as most big
> >programs have a 5% to 20% purely serial component.
> >(Note: this only applies to general-purpose
> >machines.  Obviously, certain problems can have the serial part reduced
> >to a tiny fraction.)
>
> Have you any data to support this claim?

There is one very large piece of supporting data: CDC's incredibly
expensive failure, the Star-100 supercomputer, which ran vectors very
quickly and scalars very slowly.  They had hoped for automatic vectorizers
that would be good enough to make the thing work, and had also hoped that
most problems would vectorize simply and almost completely.  No such luck.
One of the reasons the Cray-1 was such a success, by comparison, was that
the Cray does scalar operations at blazing speed too.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
gottlieb@cmcl2.UUCP (Allan Gottlieb) (04/04/85)
>> >I'm a little dubious about the value of hypercubes, as most big
>> >programs have a 5% to 20% purely serial component.
>> >(Note: this only applies to general-purpose
>> >machines.  Obviously, certain problems can have the serial part reduced
>> >to a tiny fraction.)
>>
>> Have you any data to support this claim?
>
>There is one very large piece of supporting data: CDC's incredibly
>expensive failure, the Star-100 supercomputer, which ran vectors very
>quickly and scalars very slowly.

Vectorizable does not equal parallelizable.  We have good evidence that
the percentage of inherently serial code goes down with problem size for a
number of important applications.  There still remains the question of
whether sufficiently high cache hit ratios will occur to keep the
processor-memory network from being saturated.  I/O is also a serious
question, and there are others.  But I repeat, our simulation data seem to
show clearly that there is not this absolute limit on speedup caused by a
fixed percentage of inherently serial code (or at least any such limit is
VERY high; thousand-fold speedup is definitely possible).
--
Allan Gottlieb
GOTTLIEB@NYU
{floyd,ihnp4}!cmcl2!gottlieb   <---the character before the 2 is an el
brooks@lll-crg.ARPA (Eugene D. Brooks III) (04/04/85)
> > >I'm a little dubious about the value of hypercubes, as most big
> > >programs have a 5% to 20% purely serial component.
> > >(Note: this only applies to general-purpose
> > >machines.  Obviously, certain problems can have the serial part reduced
> > >to a tiny fraction.)
> >
> > Have you any data to support this claim?
>
> There is one very large piece of supporting data: CDC's incredibly
> expensive failure, the Star-100 supercomputer, which ran vectors very
> quickly and scalars very slowly.  They had hoped for automatic vectorizers
> that would be good enough to make the thing work, and had also hoped that
> most problems would vectorize simply and almost completely.  No such luck.
> One of the reasons the Cray-1 was such a success, by comparison, was that
> the Cray does scalar operations at blazing speed too.
> --
> Henry Spencer @ U of Toronto Zoology
> {allegra,ihnp4,linus,decvax}!utzoo!henry

One should be careful about claiming that the failure of a machine with a
high penalty for scalar processing relative to vector processing transfers
to MIMD machines in general.  Vectorization is a form of parallelism, but
a rather restricted and highly regular form.  An application may have
parallelism that can't be exploited with vectorization.  The most
attractive parallel processor architectures allow the exploitation of both
forms.  The idea is to have many CPUs sharing memory and to give each CPU
vector instructions.
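To make the distinction concrete, here is a toy C fragment (the loops and
numbers are invented purely for illustration).  The first loop is the
regular, element-by-element kind a vector unit handles well; the second
has independent iterations of data-dependent length, which does not map
onto vector instructions but which separate CPUs sharing memory could
divide among themselves.

/* Toy contrast between vector parallelism and irregular parallelism.
   The data and loop bodies are made up for illustration. */
#include <stdio.h>

#define N 8

/* (a) Exactly what a vector unit is built for: the same operation
   applied element by element. */
void vector_style(double a[], double b[], double c[])
{
    int i;
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* (b) Each iteration does a different, data-dependent amount of work
   (halve the value until it drops below 1).  This does not vectorize,
   yet the iterations are independent, so on a MIMD machine each CPU
   could take its own share of the i values. */
void irregular_style(double a[], int steps[])
{
    int i;
    for (i = 0; i < N; i++) {
        double x = a[i];
        int count = 0;
        while (x > 1.0) {       /* data-dependent loop length */
            x = x / 2.0;
            count++;
        }
        steps[i] = count;
    }
}

int main(void)
{
    double a[N] = {1, 2, 4, 8, 16, 1000, 3, 7};
    double b[N] = {1, 1, 1, 1, 1, 1, 1, 1};
    double c[N];
    int steps[N];
    int i;

    vector_style(a, b, c);
    irregular_style(a, steps);
    for (i = 0; i < N; i++)
        printf("c[%d] = %5.0f   steps[%d] = %d\n", i, c[i], i, steps[i]);
    return 0;
}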