gnu@hoptoad.uucp (John Gilmore) (12/02/87)
> Another trend... ...is that towards individual
> (single-user) computers. The future of multi-tasking on such machines is
> very much in question...
> --
> Frank Adams    ihnp4!philabs!pwa-b!mmintl!franka
> Ashton-Tate    52 Oakland Ave North    E. Hartford, CT 06108

Does anybody seriously believe this?  I don't.

All the little single chip micros in microwave ovens are running realtime code; depending on internal structure, this could be single threaded polling, one task with lots of interrupts, or many tasks.

99% of the rest of the computers will be "general purpose" and all of them will be running multitasking.  Whether they call it "TSR and steal interrupts" or they call it "fork()" or "multifinder", it will be multitasking.

Mr. Adams, who works at a major MSDOS software house, may be confused by the fact that on MSDOS you have to write your own scheduler and task switching to get multitasking.  That hasn't stopped everyone from doing it, given the obvious benefits of multitasking; it just makes it 30 times as hard.
--
{pyramid,ptsfa,amdahl,sun,ihnp4}!hoptoad!gnu    gnu@toad.com
"Watch me change my world..." -- Liquid Theatre
dave@sdeggo.UUCP (David L. Smith) (12/03/87)
In article <3445@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
franka@mmintl writes:
> > Another trend... ...is that towards individual
> > (single-user) computers. The future of multi-tasking on such machines is
> > very much in question...
> 99% of the rest of the computers will be "general purpose" and all of
> them will be running multitasking.

I agree with John; however, the ideal towards which we are striving is multi-tasking operating systems and single-tasking processors.  Admit it, don't you hate having your processor multi-task?
--
David L. Smith
{sdcsvax!man,ihnp4!jack!man,hp-sdd!crash,pyramid}!sdeggo!dave
man!sdeggo!dave@sdcsvax.ucsd.edu
The net.goddesses made me do it!
eugene@pioneer.arpa (Eugene Miya N.) (12/03/87)
In article <147@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
>Admit it, don't you hate having your processor multi-task?

You should have typed a ;-).  Sometimes, I just love forking 24 processes into the background (doing serious work) on one of our Sequents.  Sometimes I complain when the load average gets over 0.15 ;-).  I sometimes have to do the same thing on our VAX (only with fewer processes).  I have come to the conclusion that these boxes are the way to go instead of too-loosely coupled workstations (you should have a good graphical front-end, however...).

From the Rock of Ages Home for Retired Hackers:
--eugene miya, NASA Ames Research Center, eugene@ames-aurora.ARPA
"You trust the `reply' command with all those different mailers out there?"
"Send mail, avoid follow-ups.  If enough, I'll summarize."
{uunet,hplabs,hao,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene
daveh@cbmvax.UUCP (12/03/87)
in article <147@sdeggo.UUCP>, dave@sdeggo.UUCP (David L. Smith) says:
> In article <3445@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> franka@mmintl writes:
> > > Another trend... ...is that towards individual
> > > (single-user) computers. The future of multi-tasking on such machines is
> > > very much in question...
> > 99% of the rest of the computers will be "general purpose" and all of
> > them will be running multitasking.
>
> I agree with John, however the ideal towards which we are striving is multi-
> tasking operating systems and single-tasking processors.  Admit it, don't
> you hate having your processor multi-task?

I'd rather have my processor multi-task than sit idle most of the time, which is pretty much what it's doing right now, with just a terminal program, a static CLI, some dormant background tasks, and a performance monitor running.  That's the choice, multitasking vs. not multitasking, given the reality of a single-processor system.

I would object to allowing others to steal time from my CPU, even if I'm not doing anything with it at the moment.  And I certainly wouldn't mind at all if I could drop in an additional CPU and have the tasks distributed in a reasonable way over the two.  But it'll be some time before I have a processor available for each individual task, so basically, I'm really happy that I can let my processor multitask.

> David L. Smith
--
Dave Haynie     Commodore-Amiga     Usenet: {ihnp4|uunet|rutgers}!cbmvax!daveh
"The B2000 Guy"     PLINK: D-DAVE H     BIX: hazy
"I can't relax, 'cause I'm a Boinger!"
dan@rose3.Rosemount.COM (Dan Messinger) (12/04/87)
In article <3445@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>> Another trend... ...is that towards individual
>> (single-user) computers. The future of multi-tasking on such machines is
>> very much in question...
>Does anybody seriously believe this?  I don't.

I don't either.  Single-user workstations are definitely becoming more popular.  But single tasking?  Never.  Who wants to tie up their computer just to get a listing?  And I frequently will find other things to do while my programs are compiling (Rogue, if nothing else :-).  And then there is network activity, or uucp.  Multitasking is a necessity!  So it should be part of the operating system instead of kludged into the applications (e.g., the print spooler in MS-DOS).

Dan Messinger
peter@sugar.UUCP (Peter da Silva) (12/05/87)
In article <147@sdeggo.UUCP>, dave@sdeggo.UUCP (David L. Smith) writes:
> I agree with John, however the ideal towards which we are striving is multi-
> tasking operating systems and single-tasking processors.  Admit it, don't
> you hate having your processor multi-task?

Hello, no.  Until you have something incredible like 64000 processors in your box, you're going to find yourself running out of the suckers.  If you have a Connection Machine on your desk, great!  If you've got a transputer, you lose.  What happens when you want to start up that 5th or 9th program and you've only got 4 or 8 CPUs?
--
-- Peter da Silva  `-_-'  ...!hoptoad!academ!uhnix1!sugar!peter
-- Disclaimer: These aren't mere opinions... these are *values*.
dave@sdeggo.UUCP (David L. Smith) (12/07/87)
In article <1227@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
>In article <147@sdeggo.UUCP>, dave@sdeggo.UUCP (David L. Smith) writes:
>>I agree with John, however the ideal towards which we are striving is multi-
>>tasking operating systems and single-tasking processors.  Admit it, don't
>>you hate having your processor multi-task?
>
>Hello, no.  Until you have something incredible like 64000 processors in your
>box, you're going to find yourself running out of the suckers.  If you have
>a connection machine on your desk, great!  If you've got a transputer, you
>lose.  What happens when you want to start up that 5th or 9th program and
>you've only got 4 or 8 CPUs?

Then one of your processors starts to multi-task.  Of course you never have enough processors to handle everything, but it would be nice if you did.  It's an _ideal_.

The question was "Don't you hate having your processor multi-task?" and the answers have been "No, because it has to."  If it didn't have to, would you want it to?  Timesharing is a necessary evil, and we use it because we have finite resources.  The trend has been to increase these resources, and at the point where they become effectively infinite (if you have, say, 256 processors in a workstation with a _very_ fast I/O system) the number of times time-sharing would come into play would be very few.

The same arguments apply to virtual memory.  Do you like swapping, paging and the rest of those fun things?  I don't, but I have to live with them because I can't afford 4 gigabytes of RAM.  (Or 4 gig of disk, for that matter.)  Just because it's necessary doesn't mean you have to like it.
--
David L. Smith
{sdcsvax!man,ihnp4!jack!man,hp-sdd!crash,pyramid}!sdeggo!dave
man!sdeggo!dave@sdcsvax.ucsd.edu
I don't need no steenkin' quote!
mmengel@cuuxb.UUCP (12/07/87)
In article <151@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
$In article <1227@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
$ >In article <147@sdeggo.UUCP>, dave@sdeggo.UUCP (David L. Smith) writes:
$ >>I agree with John, however the ideal towards which we are striving is multi-
$ >>tasking operating systems and single-tasking processors.  Admit it, don't
$ >>you hate having your processor multi-task?
$ >
$ >Hello, no.  Until you have something incredible like 64000 processors in your
$ >box, you're going to find yourself running out of the suckers.  If you have
$ >a connection machine on your desk, great!  If you've got a transputer, you
$ >lose.  What happens when you want to start up that 5th or 9th program and
$ >you've only got 4 or 8 CPUs?
$
$Then one of your processors starts to multi-task.  Of course you never have
$enough processors to handle everything, but it would be nice if you did.
$It's an _ideal_.
$
$The question was "Don't you hate having your processor multi-task" and the
$answers have been "No, because it has to."  If it didn't have to would you
$want it to?  Timesharing is a necessary evil, and we use it because we have
$finite resources.

We also use time sharing because even on existing time-sharing systems there is often a noticeable percentage of *idle time*, when every single process on the system is waiting for I/O to complete.  This is even true on *IX systems with disk cache hit rates of 80 and 90%.

The truth is, on systems like Sun workstations and the lot, if you take some process accounting statistics while you are running your C compile in background and rogue in foreground, with emacs running the whole show, you still have easily 20% idle time on the CPU, even with little or no paging activity.

Before you decide that your system isn't as fast as you'd like it to be because it's multi-tasking, try looking at some system activity statistics for your system.
--
Marc Mengel
attmail!mmengel
...!{moss|lll-crg|mtune|ihnp4}!cuuxb!mmengel
throopw@xyzzy.UUCP (12/07/87)
> peter@sugar.UUCP (Peter da Silva)
> Until you have something incredible like 64000 processors in your
> box, you're going to find yourself running out of the suckers.

Note that the argument "we don't need to multiplex a single processor because we can simply throw more processors at the problem" is the same argument as "we don't need to have demand paging because we can simply buy more and more memory".  I think that both are incorrect arguments, for exactly the reasons that Peter points out:

> What happens when you want to start up that 5th or 9th program and you've
> only got 4 or 8 CPUs?

Or, what happens when you want to keep N processes handy at a convenient point in mid-execution, when they total M pages of address space and you've only got (M-1) pages of physical memory?

I think virtual processors are just as important as virtual memory.  And I think virtual memory is *very* important.  The fact that many PCs throw these things away is, I think, a Bad Thing.
--
"Solve THIS!"  "'Mighty Mouse is a wimp.'  But what does it mean?"
"It means you've made me MAD!"     --- Mighty Mouse to alien brain.
--
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw
greg@xios.XIOS.UUCP (Greg Franks) (12/08/87)
#> Another trend... ...is that towards individual
#> (single-user) computers. The future of multi-tasking on such machines is
#> very much in question...
#Does anybody seriously believe this?  I don't.

(nor I....)

Perhaps the real trend is to single tasks on single computers, but *lots* of computers in one box.  Now if Mr. Tramiel can get his Atari-as-Transputer going, maybe we'll actually see this concept.
--
Greg Franks    XIOS Systems Corporation, 1600 Carling Avenue,
(613) 725-5411    Ottawa, Ontario, Canada, K1Z 8R8
utzoo!dciem!nrcaer!xios!greg
"There's so much to sea in Nova Scotia"
throopw@xyzzy.UUCP (Wayne A. Throop) (12/10/87)
> mmengel@cuuxb.ATT.COM (Marc W. Mengel)
>$ dave@sdeggo.UUCP (David L. Smith)
>$ [...] you never have
>$enough processors to handle everything, but it would be nice if you did.
>$It's an _ideal_.

True enough.

>$The question was "Don't you hate having your processor multi-task" and the
>$answers have been "No, because it has to."  If it didn't have to would you
>$want it to?

That wasn't the question I heard, though the difference is subtle.  The question I heard, implied strongly by the title and by the discussion of neglecting the capability for multitasking in PC OS software, is: "Don't you hate having your processor *capable* of multitasking?" or, as I'd further paraphrase it, "Don't you find the notion of 'virtual processors' useless, or at least wasteful?"

And my answer is "No, because my resources are finite, and I'd rather have my capabilities degrade at the edges rather than chop off.  It's no more useless or wasteful than virtual memory."

>$Timesharing is a necessary evil, and we use it because we have
>$finite resources.

Agreed, but the evil is necessary even on "single user" workstations.

> Before you decide that your system isn't as fast as you'd like it
> to be because it's multi-tasking, try looking at some system activity
> statistics for your system.

Amen.
--
Sometimes I think the only universal in the computing field is the
fetch-execute cycle.                --- Alan J. Perlis
--
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw
throopw@xyzzy.UUCP (Wayne A. Throop) (12/10/87)
Oops, I just realized that I simply repeated the Q&A Dave complained of:

> throopw@xyzzy.UUCP (Wayne A. Throop)
>>$ dave@sdeggo.UUCP (David L. Smith)
>>$The question was "Don't you hate having your processor multi-task" and the
>>$answers have been "No, because it has to."  If it didn't have to would you
>>$want it to?

Sorry about that.  But to answer the question directly: clearly not.  "If your program didn't have to do I/O, would you want it to?"  "If your OS didn't have to consume N Mb of disk space, would you want it to?"

Anyhow, I view the question phrased this way as a little unrealistic.  Let me rephrase it again: "Other things being equal (same #processors, same #bucks of hardware), would you rather do without virtual memory and virtual processors, or would you rather pay the extra bucks of hardware/software to support them for your personal workstation?"

For a toy computer for non-essentials, I'd answer that I'd want it cheaper.  For a computer for getting real work done, for real, essential-to-daily-business computing, I'd answer that I'd want the extra capability.  And the reason is that a toy/non-essential computer can afford to have its capability chop off under load, but something essential to getting work done must degrade gracefully.
--
Sometimes I think the only universal in the computing field is the
fetch-execute cycle.                --- Alan J. Perlis
--
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw
jcz@sas.UUCP (John Carl Zeigler) (12/10/87)
It might be informative to remember why multi-tasking was invented in the first place.

Back in the 60's, Mr. Warbucks wandered down to the Data Processing Department to see what kind of trouble that particular budgetary sink-hole was getting into.  He wandered into the Computer Room and spied a Computer Programmer with a couple of cases of cards, turning dials and flipping switches.

"Hi, what are you doing?" he asked.

"Oh, hello Mr. Warbucks.  I am testing our new Social Security Accounting and Collating System.  Here, watch this."  With that he flipped one of the switches and a machine began spewing cards into a box.  After a few hundred cards had dropped, one popped out into a slot.

"See, here is your card," explained the Programmer.

"That's nice, but what does that light mean?" he said, pointing to a warm red light on the front panel.

"That's the WAIT light.  It goes on whenever the CPU is idle."

"But it NEVER went off!"

"Right!  I have made this program so fast that the light doesn't have time to stop glowing enough for you to see it."

"You mean for most of the time the machine wasn't doing anything!"

"Well, that is one way to look at it."

"Why couldn't it have been doing something else????  I am not going to pay a hundred times your salary a year just to have the damn thing WAITing!!!!  Get that fixed!"

Seriously, I do not mind having my computer doing something else while I am thinking, or while the disk drive is thinking, either.  Most applications are NOT CPU bound.  Most are I/O bound.  The overhead of task switching will always be paid for by the increased utility of the machine.  (Not forgetting Little's Law, L = λW.)  Until we get a better architectural paradigm than the current 'von Neumann bottle-neck', that is.  Any discussion on that??
--
--jcz
John Carl Zeigler
SAS Institute Inc.
Cary, NC  27511            (919) 467-8000
...!mcnc!rti!sas!jcz
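Zeigler's parenthetical invokes Little's Law, L = λW: the average number of jobs resident in a system equals their arrival rate times the average time each one spends in the system.  A minimal sketch of the arithmetic (the function name and numbers are invented for illustration):

```python
def jobs_in_system(arrival_rate, mean_time_in_system):
    """Little's Law: L = lambda * W."""
    return arrival_rate * mean_time_in_system

# If jobs arrive at 2 per second and each spends 0.5 s in the system
# (computing or WAITing on I/O), the machine holds 1 job on average --
# and sits idle whenever that one job is blocked on a device.
print(jobs_in_system(2.0, 0.5))  # -> 1.0
```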
ram%shukra@Sun.COM (Renu Raman, Sun Microsystems) (12/10/87)
Digressing away from the tasking issue a bit - how long will uni-processor machines keep parallel processors at bay?  Looks like saturation of HW technology is nowhere near.
---------------------
Renukanthan Raman                ARPA: ram@sun.com
Sun Microsystems                 UUCP: {ucbvax,seismo,hplabs}!sun!ram
M/S 5-40, 2500 Garcia Avenue, Mt. View, CA 94043
fouts@orville.nas.nasa.gov (Marty Fouts) (12/11/87)
In article <36083@sun.uucp> ram@sun.UUCP (Renu Raman, Sun Microsystems) writes:
>
> Digressing away from the tasking issue a bit - how long will
> uni-processor machines keep parallel processors at bay?  Looks
> like saturation of HW techology is nowhere near.
>

Well, as Gene Amdahl would say, it's not that simple.  Parallel processing, at least in the sense of throwing multiple processors at a single problem, is currently difficult to use.  Software is hard to write, easy to get wrong and nearly impossible to debug.  As long as this stays true, parallel processor machines will have a burden beyond price/performance to overcome before they are accepted.

As for parallel processors for traditional multiprocessing, the extra burden that multiple compute engines add to I/O facilities like disk drives is such that they don't make much sense except in very high-end machines.  I still remember the PDP 11/782 (a dual-processor 11/780 used as a master/slave system), which ran I/O-bound workloads slower than an 11/780 because it introduced processor control overhead without upscaling I/O performance.

Saturation of usable HW technology is close.
eugene@pioneer.arpa (Eugene Miya N.) (12/11/87)
In article <18@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes:
>In article <36083@sun.uucp> ram@sun.UUCP (Renu Raman, Sun Microsystems) writes:
>>how long will uni-processor machines keep parallel processors at bay?
>
>Well, as Gene Amdahl would say, it's not that simple.  Parallel
>processing, at least in the sense of throwing multiple processors at a
>single problem is currently difficult to use. . . .
>Saturation of usable HW technology is close.

Merry Christmas, Renu:

Oh, the "how small can you make a Cray?" question, or Dan Ingalls' "Cray on a wrist" idea?  Well:

1) We have some parallel processors now.  The most effective ones are in loosely coupled, lightly stressed situations.  Just look at IBMs, Univacs, etc.; we've had them for a long time (but with distinct processes (jobs) for processors).  I think the Sequent/Flex/et al. way is the way to currently build machines, not just workstations (not enough power), but you have to make it cheaper.  Marty notes all the problems of the other types.

2) Since Marty didn't answer the question, I would hazard a generic guess (I don't know anything) of about 10 more years (far enough into the future?).  There is a Rand Corp. report by Sutherland, Mead, and Everhart on the limitations of silicon (there are other similar reports) which chronicles about 9 generations of silicon.  They were "3rd gen" back in 1976 when they wrote it, and we are sort of "5-6 gen" by their comments.  After all, how fast can you push the 68K ;-)?

I would just hate to have some life-saving application which needs just that little bit more power that silicon cannot provide.  If you believe in the infinite power of computing, then you might be able to tell me when we will have a sequential 100-Teraflops machine.

All comments grossly over-generalized to answer this general question.
From the Rock of Ages Home for Retired Hackers:
--eugene miya, NASA Ames Research Center, eugene@ames-aurora.ARPA
"You trust the `reply' command with all those different mailers out there?"
"Send mail, avoid follow-ups.  If enough, I'll summarize."
{uunet,hplabs,hao,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene
fay@encore.UUCP (Peter Fay) (12/12/87)
In article <18@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes:
>In article <36083@sun.uucp> ram@sun.UUCP (Renu Raman, Sun Microsystems) writes:
>>
>> Digressing away from the tasking issue a bit - how long will
>> uni-processor machines keep parallel processors at bay?  Looks
>> like saturation of HW techology is nowhere near.
>>
>
>Well, as Gene Amdahl would say, it's not that simple.  Parallel
>processing, at least in the sense of throwing multiple processors at a
>single problem is currently difficult to use.

In regard to general-purpose multiprocessors:

Every fork() (or thread_create() in Mach) in every program can get scheduled on a different cpu (that includes every shell per user, daemon, background task, ...).  Also, don't forget all those kernel processes (or tasks, threads) running on different cpus (pagers, signal handlers, ...).  How difficult is it when the O.S. does it transparently?

And then there are more sophisticated mechanisms ("micro-tasking", gang scheduling, vectorizing fortran compilers) available to any user who wants more capability.

>...Software is hard to
>write, easy to get wrong and nearly impossible to debug...

Writing software which exploits the FULL parallelism of a machine MAY be hard to do in CERTAIN cases.  Otherwise the (user) software is pretty much identical.  Debugging is a whole other soapbox, but my experience is that debugging coding errors is not much more difficult than on uniprocessors.  What is hard (or "impossible" with current tools) is detecting race conditions and bottlenecks - i.e. CONCEPTUAL errors.  This is one of the many time lags in parallel software tool development, not an inherent defect in the architecture.  Race conditions are not a common occurrence for users to debug.

> ...As long as
>this stays true, parallel processor machines will have a burden beyond
>price/performance to overcome before they are accepted.

Since this is not true, the only thing to overcome is common prejudice.
Price/performance advantage (for small multiprocessor minis and up) is huge.

>As far as parallel processors for traditional multiprocessing, the
>extra burden that multiple compute engines add to i/o facilities like
>disk drives is such that they don't make much sense except for in very
>high end machines...

I/O can be a bottleneck for PC's, minis and Crays.  The solution is to parallelize I/O, which is what multiprocessors do.  (General-purpose multis are NOT "very high-end" -- they run from $70K - $750K.)

> ...I still remember the PDP 11/782 (dual processor
>11/780 used as a master/slave system) which ran i/o bound workloads
>slower than an 11/780 because of the introduction of processor control
>overhead without upscaling i/o performance.

That was a long time ago in multiprocessor history.  All I/O was done by one CPU (master/slave) - sequential, not parallel.  It was NOT a symmetric multi like those today.

>Saturation of usable HW technology is close.

What does this cliche mean?

By the way, I don't fault people for not understanding about multis (e.g., from past exposure or school).  It takes some time for common misconceptions to catch up to current reality.
--
peter fay
fay@multimax.arpa
{allegra|compass|decvax|ihnp4|linus|necis|pur-ee|talcott}!encore!fay
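Fay's claim that every fork() can land on a different CPU without the program saying so can be sketched on any Unix with nothing but fork() and a pipe.  The function name and the split into subranges below are invented for illustration; the point is that nothing in the code names a processor, so the kernel is free to place each child wherever it likes:

```python
import os

def parallel_sum_of_squares(ranges):
    """Fork one child per subrange.  On a multi, the scheduler may run
    each child on its own CPU; on a uniprocessor, the identical code
    simply multi-tasks.  The program never mentions processors."""
    read_ends = []
    for lo, hi in ranges:
        r, w = os.pipe()
        if os.fork() == 0:                    # child: compute and exit
            os.write(w, str(sum(i * i for i in range(lo, hi))).encode())
            os._exit(0)
        os.close(w)                           # parent keeps only the read end
        read_ends.append(r)
    total = 0
    for r in read_ends:                       # collect partial sums
        total += int(os.read(r, 64))
        os.close(r)
    for _ in ranges:                          # reap the children
        os.wait()
    return total

print(parallel_sum_of_squares([(0, 500), (500, 1000)]))  # -> 332833500
```

(os.fork() exists only on Unix systems, which is where this whole debate lives anyway.)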
lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) (12/15/87)
Multitasking gives flexibility, which is considered virtuous.  Interestingly, there are cases when you get a performance win by simulating a multiprocessor on a uniprocessor.

For example, suppose you are searching a big tree of choices while solving a Travelling Salesman type problem.  This has the nice property that there is variance in the individual compute times.  So, if you have a (simulated) multiprocessor, some of the computations finish earlier than their peers.  This gives the overall search new data (about lower bounds) which may allow you to kill some of the computations that haven't finished yet.  This can save work, not only directly, but also by preventing the killed searches from generating offspring.
--
Don    lindsay@k.gp.cs.cmu.edu    CMU Computer Science
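Lindsay's scheme can be sketched with coroutines standing in for the simulated processors.  The toy tour search and the distance matrix below are invented for illustration: each task explores one subtree, any task that completes a tour tightens a shared bound, and the other tasks test that bound at every node and die off early, offspring and all:

```python
def interleaved_tour_search(dist):
    """Round-robin several subtree searches on one processor.  An early
    finisher's bound kills slower peers before they expand further."""
    n = len(dist)
    best = [float("inf")]                      # shared lower-bound data

    def search(path, cost):
        yield                                  # one time slice per tree node
        if cost >= best[0]:                    # pruned by a peer's result:
            return                             # no offspring generated
        if len(path) == n:
            best[0] = min(best[0], cost + dist[path[-1]][path[0]])
            return
        for city in range(1, n):
            if city not in path:
                yield from search(path + [city], cost + dist[path[-1]][city])

    # one simulated processor per first leg of the tour
    tasks = [search([0, c], dist[0][c]) for c in range(1, n)]
    while tasks:                               # round-robin scheduler
        tasks = [t for t in tasks if next(t, "done") != "done"]
    return best[0]

print(interleaved_tour_search(
    [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]))  # -> 18
```

Pruning is safe because leg costs are non-negative: a partial tour whose cost already meets the bound cannot finish below it.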
fouts@orville.nas.nasa.gov (Marty Fouts) (12/17/87)
In article <2341@encore.UUCP> fay@encore.UUCP (Peter Fay) writes:
>In regard to general-purpose multiprocessors:
>
>Every fork() (or thread_create() in Mach) in every program can get
>scheduled on a different cpu (that includes every shell per user,
>daemon, background task, ...  Also, don't forget all those kernel
>processes (or tasks, threads) running on different cpus (pagers,
>signal handlers, ...).  How difficult is it when the O.S. does it
>transparently?

Mr. Fay is using "transparently" in a way with which I am unfamiliar.  It is true that Mach provides primitives which allow the programmer to introduce multitasking into a program, but these are in no sense transparent.  Task creation and deletion, synchronization, and task scheduling all require explicit code in the program which is to take advantage of the tasking.  Even the ancient Unix fork() has to be explicitly coded for.

>And then there is more sophisticated mechanisms ("micro-tasking", gang
>scheduling, vectorizing fortran compilers) available to any user who
>wants more capability.

A problem with these more sophisticated mechanisms is that they can lead to parallel execution in which the wall clock time goes up as a function of the number of processors, rather than down.  Another problem is that no compiler can perfectly optimize.  The better the programmer understands what the optimizer can optimize, the easier it is to write optimizable code.  With vector code this is fairly straightforward to do; with parallel code it is much more difficult.

>Writing software which exploits the FULL parallelism of a machine MAY
>be hard to do in CERTAIN cases.

It has been my experience in five years of writing code to exploit distributed and concurrent parallelism that it is hard in a large number of cases.  There are well-behaved algorithms, such as those involved in PDE solution, for which parallelism is trivial.
There are also algorithms requiring high communication cost, much synchronization, and unpredictable work per task which are nearly intractable.  There appears to be an entire class of algorithms for which no parallel solution is more efficient than a sequential solution on the same processor.

>Debugging is a whole other soapbox, but my
>experience is that debugging coding errors is not much more difficult
>than uniprocessors.  What is hard (or "impossible" with current tools)
>is detecting race conditions and bottlenecks - i.e. CONCEPTUAL errors.
>This is one of the many time lags in parallel software tool development,
>not an inherent defect in architecture.  Race conditions are not a
>common occurence for users to debug.

I will not quibble over whether the defect is inherent, only agree that it is a most difficult problem, and one which decades of parallel processing research has not solved adequately.

>Price/performance advantage (for small multiprocessor minis and up) is
>huge.

Again, this is only true for well-behaved algorithms.  Throwing a parallel machine at an average workload is like throwing a vector machine at it, only more so: the difference between the vendor's 'not to be exceeded' speed for the machine and delivered throughput for the workload can be two orders of magnitude, dramatically reducing the real price/performance.

>I/O can be a bottleneck for PC's, minis and Crays.  The solution is to
>parallelize I/O, which is what multiprocessors do.  (General-purpose
>multis are NOT "very high-end" -- they run from $70K - $750K).

Oddly enough, the Cray 2 does not solve the I/O bottleneck problem by providing parallel I/O processors.  It is one of the best I/O-balanced multiprocessor machines, and accomplishes this with a single I/O (foreground) processor performing work for all four compute (background) processors.  The factor isn't how the scaling is achieved, but that it frequently isn't achieved.

>> ...I still remember the PDP 11/782 ...
>
>That was a long time ago in multiprocessor history.  All I/O was done
>by one CPU (master/slave) - sequential, not parallel.  It was NOT a
>symmetric multi like those today.

Actually, the 11/782 wasn't a long time ago in multiprocessor history.  The earliest multiprocessor machines were the earliest machines.  It was John von Neumann's greatest gift to get us out of the completely parallel processor business and into the uniprocessor business.  But even recognizable dyadic multiprocessors started appearing a decade before the 11/782.

The point of the analogy, which was poorly drawn, is that many vendors are still falling into the trap of scaling up one function of a machine while not scaling up the rest of the system to match, leading to very unbalanced systems.  If your favorite PC which has a faster IP/FLOP rating than a 780 can't support 30 users because it has poor I/O performance, adding N Transputers isn't going to allow it to support 30 users.  (By the way, not all current multiprocessors are symmetrical.  They actually fall into a fairly wide range of classes with respect to how I/O load is distributed among the processors.)

>
>>Saturation of usable HW technology is close.
>
>What does this cliche mean?
>

It means I was in a hurry to finish, so didn't draw this point out.  Sorry about that.  It appears that the rate at which performance improvement through new implementation technology occurs is slowing dramatically.  Top-end clock rates are down to a few nanoseconds, and it doesn't appear likely that subnanosecond rates are going to arrive before the end of the century.  This means the bottom end is within two orders of magnitude of the top and rapidly getting closer.  To me this means that radical performance improvements over the spectrum of machines aren't as likely as a narrowing of the spectrum, thus "saturation of usable technology."
When you couple this with the fact that after twenty years of various kinds of software/hardware research, parallel processing is still limited to the technology described by Dijkstra in '63, and still has the same problems in software development, it is probably fair to say that little progress is being made in software either.

>By the way, I don't fault people for not understanding
>about multis (e.g., from past exposure or school).  It takes some
>time for common misconceptions to catch up to current reality.
>

Since I don't see a smiley face on this, I will assume that it was intended to be as obnoxious and condescending as it sounds.  I won't argue over who misunderstands what, but I will point out to Mr. Fay that I daily program a range of multiprocessors in both distributed and parallel applications, and that my opinions are the result of direct experience with all (MIMD, SIMD, multiflow, massive or sparse) of the currently available kinds of multiprocessors, as well as many of the dinosaurs.

As an example of a class of algorithms which is difficult to vectorize or parallelize, let me pull out the ancient prime finder algorithm:

      IPRIME(1) = 1
      IPRIME(2) = 2
      IPRIME(3) = 3
      NPRIME = 3
      DO 50 N = 5, MAXN, 2
         DO 10 I = 3, NPRIME
            IQ = N / IPRIME(I)
            IR = N - (IPRIME(I) * IQ)
            IF (IR .EQ. 0) GO TO 40
            IF (IQ .LT. IPRIME(I)) GO TO 20
   10    CONTINUE
   20    NPRIME = NPRIME + 1
         IPRIME(NPRIME) = N
         IF (NPRIME .GE. MAXP) GO TO 60
   40    CONTINUE
   50 CONTINUE
   60 CONTINUE

Although there are different algorithms for finding primes, I use this one to illustrate a class of problems which comes up frequently in my work: there exists some set from which must be drawn a subset.  There exists a rule for ordering the set and another rule for determining if an element is a member of the subset.  The determination rule requires that the subset drawn from all elements ordered before X in the initial set be known in order to determine if X is a member of the subset.
The determination test requires a comparison with some of the elements of the known subset to decide if X should be added to the subset.  Usually, although not always, there is an ordering of the subset (not necessarily the same as the ordering of the set) such that by comparing X with members of the subset in order, it is frequently unnecessary to compare X with all of the members.

None of the vectorizing compilers that I have access to will attempt to vectorize this algorithm.  The Alliant automatic parallelizer will not attempt to parallelize it.  Most of my attempts to handcraft a vector or parallel variant which remains true to the previous paragraph have added sufficient extra work to the algorithm that it runs more slowly as the number of processors increases.  I would be indebted to Mr. Fay, or anyone else, who could provide me with a parallel or vectorizable version of the algorithm which maintains it as an example of the kind of problem I have described above.
fay@encore.UUCP (Peter Fay) (12/22/87)
To begin with, my statement apparently was found offensive.  In article <25@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes:

#>By the way, I don't fault people for not understanding
#>about multis (e.g., from past exposure or school).  It takes some
#>time for common misconceptions to catch up to current reality.
#
#Since I don't see a smiley face on this, I will assume that it was
#intended to be as obnoxious and condescending as it sounds. [...]

No insult was intended.  My point is that, as shown by your own example (about the multiprocessor VAX 11/782 being slower than a uniprocessor 780), older multiprocessor technology has left a bad impression of multis in general, much of which has since been corrected by symmetric shared-memory multiprocessors.

Mr. Fouts' original statement regarding parallel processors:

#[...] Parallel
#processing, at least in the sense of throwing multiple processors at a
#single problem is currently difficult to use.  Software is hard to
#write, easy to get wrong and nearly impossible to debug.  As long as
#this stays true, parallel processor machines will have a burden beyond
#price/performance to overcome before they are accepted.

My response was the following:

#In regard to general-purpose multiprocessors:
NOTE! ->     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#Every fork() (or thread_create() in Mach) in every program can get
#scheduled on a different cpu (that includes every shell per user,
#daemon, background task, ...).  Also, don't forget all those kernel
#processes (or tasks, threads) running on different cpus (pagers,
#signal handlers, ...).  How difficult is it when the O.S. does it
#transparently?
#
#And then there are more sophisticated mechanisms ("micro-tasking", gang
#scheduling, vectorizing fortran compilers) available to any user who
#wants more capability.

He responded to me:

#Mr. Fay is using transparently in a way with which I am unfamiliar.
#It is true that Mach provides primitives which allow the programmer
#to introduce multitasking into the program; but these are in no sense
#transparent.  Task creation and deletion, synchronization, and task
#scheduling all require explicit code in the program which is to take
#advantage of the tasking.  Even the ancient Unix fork() has to be
#explicitly coded for.

Yes, fork() and thread_create() must be explicitly coded.  One way of looking at this, though, is that, for example, the Unix utilities already have about 850 fork()'s ALREADY coded in them.  Compile them and new fork()'s will generally be run in parallel by the O.S.  This usually is not optimal code, but it IS parallel and ISN'T difficult.

Our modified "make" utility uses simple forks, shared memory and a dependency graph, and often produces near-linear speedup.  This parallelism enabled our machine to have the fastest Ada Validation Suite run (just over three hours).  No, this isn't a marketing pitch, just a justification of my remarks.  (I still don't know why anyone would want Ada, anyway :-)

I just compiled our 4.2 Unix kernel from scratch (152,000 lines in 13 minutes):

	3461.2u 642.1s 13:26 508% 0+2k 1842+9439io 59pf+0w

Using a 6-cpu machine, I got a speedup of 508% (obviously, I had to share some cpus with the O.S. and daemons).

#
#A problem with [micro-tasking, gang scheduling, vectorizing] is that they can
#lead to parallel execution in which the wall clock time goes up as a
#function of the number of processors, rather than down. . .

True.  It "can" also go down.  As we all know, it depends on the algorithm and how it is coded.  If there is a high amount of data dependency, many synchronization points, and little computation between them, adding more processors may slow it down (on a supercomputer or a multi).  And if that's the only algorithm you have, then a multi is not for you.
If, however, you do lots of compiles, use lots of utilities, write operating system code (like me), use databases, AI, sorts, TP, editing, concurrent Ada & Pascal, parallel OPS5, etc. (as I said, _general_purpose_), then a multi is the only way to go.

#Another
#problem is that no compiler can perfectly optimize.  The better the
#programmer understands what the optimizer can optimize, the easier it
#is to write optimizable code.  With vector code, this is fairly
#straightforward to do, with parallel code, it is much more difficult.

I agree.

#>Writing software which exploits the FULL parallelism of a machine MAY
#>be hard to do in CERTAIN cases.
#
#It has been my experience in five years of writing code to exploit
#distributed and concurrent parallelism that it is hard in a large
#number of cases. [...]  There are
#also algorithms requiring high communication cost, much
#synchronization and unpredictable work per task which are nearly
#intractable.  There appear to be an entire class of algorithms for
#which no parallel solution is more efficient than a sequential
#solution on the same processor.

I too can come up with many examples of algorithms that are hard or impossible to parallelize.  Some architectures can exploit these better than others.

#[...] it is probably fair to say
#that little progress is being made in software either.

As I said originally: "This is one of the many time lags in parallel software tool development, not an inherent defect in architecture."

>As an example of a class of algorithms which is difficult to vectorize
>or parallelize, let me pull out the ancient prime finder algorithm:

See my next followup - this is already too long.
--
			peter fay
			fay@multimax.arpa
{allegra|compass|decvax|ihnp4|linus|necis|pur-ee|talcott}!encore!fay
daveh@cbmvax.UUCP (Dave Haynie) (12/22/87)
in article <25@amelia.nas.nasa.gov>, fouts@orville.nas.nasa.gov (Marty Fouts) says:

> In article <2341@encore.UUCP> fay@encore.UUCP (Peter Fay) writes:
>>In regard to general-purpose multiprocessors:
>>Every fork() (or thread_create() in Mach) in every program can get
>>scheduled on a different cpu (that includes every shell per user,
>>daemon, background task, ...).  Also, don't forget all those kernel
>>processes (or tasks, threads) running on different cpus (pagers,
>>signal handlers, ...).  How difficult is it when the O.S. does it
>>transparently?

> Mr. Fay is using transparently in a way with which I am unfamiliar.  It
> is true that Mach provides primitives which allow the programmer to
> introduce multitasking into the program; but these are in no sense
> transparent.  Task creation and deletion, synchronization, and task
> scheduling all require explicit code in the program which is to take
> advantage of the tasking.  Even the ancient Unix fork() has to be
> explicitly coded for.

I believe he's referring to the processor the code is executing on, not the specific request for threading, as being transparent.  Right now if you fork() or thread_create(), you start an asynchronous task of some kind that's time-shared on the same physical CPU.  In the proposed scenario, the OS will break up the tasks as evenly as possible among the existing CPUs in the system.  This distribution is completely transparent to the programmer; it doesn't look any different than if it were running on a single machine, other than the large increase in execution speed you may get.

This of course bypasses any of the arguments you get into when you introduce more sophisticated ways of using the multiple processors, like vectorizing or parallelizing compilers.  Tasks are already atomic units and trivial to break up between CPUs, and the synchronization mechanisms are already put in place by the programmer.
Of course, if you were writing code specifically for a machine like this, with 5 or 10 CPUs, you'd consider multiple tasks a speed-enhancing device in many cases where on a single-CPU machine they'd likely slow down your program.  A good argument for lightweight tasks like those you get in Mach.
--
Dave Haynie     Commodore-Amiga    Usenet: {ihnp4|uunet|rutgers}!cbmvax!daveh
     "The B2000 Guy"      PLINK : D-DAVE H      BIX : hazy
	"I can't relax, 'cause I'm a Boinger!"
fouts@orville.nas.nasa.gov (Marty Fouts) (12/23/87)
In article <2989@cbmvax.UUCP> daveh@cbmvax.UUCP (Dave Haynie) writes:
>I believe he's referring to the processor the code is executing on, not the
>specific request for threading, as being transparent.  Right now if you
>fork() or thread_create(), you start an asynchronous task of some kind that's
>time shared on the same physical CPU.  In the proposed scenario, the OS will
>break up the tasks as evenly as possible among the existing CPUs in the
>system.  This distribution is completely transparent to the programmer; it
>doesn't look any different than if it were running on a single machine, other
>than the large increase in execution speed you may get.

The last part of this isn't precisely true.  I have frequently found that timing problems in multitasked codes, especially synchronization-primitive bugs, will not show up in single-processor execution because of "accidental" synchronization that results from sharing the single processor.  Even pipelines can give different behaviour on a multiprocessor than on a uniprocessor, although the producer/consumer synchronization usually prevents this.  There just isn't any such thing as transparent when dealing with cooperatively multitasked codes.
muller@alliant.Alliant.COM (Jim Muller) (12/23/87)
In <25@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes, with some references to article <2341@encore.UUCP> fay@encore.UUCP (Peter Fay):

> >...And then there is more sophisticated mechanisms ("micro-tasking", gang
> >scheduling, vectorizing fortran compilers) available to any user who...
>A problem with these more sophisticated mechanisms is that they can
>lead to parallel execution in which the wall clock time goes up as a
>function of the number of processors, rather than down...

With vectorization this is typically due to excessive masking.  In heavily IF'ed code, the processor(s) can end up doing much more work than necessary.  The vector overhead generally is insignificant by comparison.  True parallel processing, however, does not suffer this problem, since any one processor never has to execute the "not-to-be-executed" portions.  For this reason, true parallel processing can reduce runtimes even on code for which vector processing is a liability.

[ Example of code for which vectorization may be a liability:

      DO 100 I = 1, 100
         IF (NS.GT.M) THEN
            .
            .
         ELSE IF (N.GT.MX) THEN
            IF (II.EQ.IX) THEN
               .
               .
            ENDIF
         ENDIF
  100 CONTINUE
]

>Another problem is that no compiler can perfectly optimize.  The better the
>programmer understands what the optimizer can optimize, the easier it
>is to write optimizable code.  With vector code, this is fairly
>straightforward to do, with parallel code, it is much more difficult.
>It has been my experience in five years of writing code to exploit
>distributed and concurrent parallelism that it is hard in a large
>number of cases.

It is a truism for any machine, vector- or multi- or not, that a good program will run better than a poor one, and that a programmer who is ignorant of the compiler's capabilities can accidentally defeat its best efforts.  The same holds for hardware issues as well as software.  As for parallelism being harder to exploit than vectorization, the reverse is more likely to hold.
Parallel execution can be applied to practically any vectorizable structure, while the reverse is decidedly not true.  Many structures can be run as parallel processes, with runtime speedup, even if vectorization is either impossible or a liability.  Anyway, it should be the compiler's job to analyze code for parallel or vector opportunities, and compilers capable of doing that well do exist.

>There are well behaved algorithms, such as those involved in PDE solution
>for which parallelism is trivial.  There are also algorithms requiring high
>communication cost, much synchronization and unpredictable work per task
>which are nearly intractable.  There appear to be an entire class of
>algorithms for which no parallel solution is more efficient than a
>sequential solution on the same processor.

Indeed, parallelism is trivial in some problems.  That does not make it useless; instead, it makes it easy and efficient.  More to the point, however, for those problems which do not exhibit inherent parallelism, nothing requires the machine (or compiler) to use it where inappropriate.  All that is needed is for the unused processors to be doing something else.  For problems where it *is* appropriate, it can produce significant runtime speedup.  In fact, most problems exhibit "some" parallelism, usually enough to be beneficial.  For large, computationally intensive problems, the gain in execution speed from parallelism will usually justify the amount of effort required to exploit it, especially if the compiler rather than the programmer does most of the work.

> >Debugging is a whole other soapbox, but my experience is that debugging
> >coding errors is not much more difficult than uniprocessors.  What is hard
> >(or "impossible" with current tools) is detecting race conditions and
> >bottlenecks - i.e. CONCEPTUAL errors.

Race conditions are typically a problem only when a compiler has been inappropriately "permitted" to optimize code that it normally would have refused.
The obvious approach, of course, should be to debug in single-processor mode.

>...Throwing a parallel machine at an average workload is like throwing a
>vector machine at it, only more so: The difference between vendor "not to
>be exceeded" speed for the machine and delivered throughput for the
>workload can be two orders of magnitude, dramatically reducing the real
>price / performance.

This was discussed by this group just a month or so ago.  There are really two causes.  The first is the structure of the code itself.  Certainly, real-life programs are often dominated by scalar code, and better-written code will be less likely to thwart the compiler.  More importantly, "peak" speeds quoted are usually the output rate during a vector instruction, totally ignoring the vector startup time.  Startup time can be significant and must be paid every time a vector instruction starts (or repeats, if the loop length is greater than the vector length).  When this is taken into account, the *real* maximum rates can be achieved (well, almost) by codes that are well structured for the machine.  For an "average workload", a multi-processor machine can be beneficial if the various processors are free to do other jobs when parallelism is inappropriate.

>Top end clock rates are down to a few nanoseconds, and it doesn't appear
>likely that subnanosecond rates are going to arrive before...

Light (or electromagnetic pulses) can only travel so fast, so you have to make things smaller to make them faster (obviously).  This is *exactly* why radically different approaches such as parallelism are *appropriate* courses to follow, rather than useless.  It is the single processor that is becoming harder to speed up.

>...it is a most difficult problem, and one which decades of parallel
>processing research has not solved adequately...
>When you couple this with the fact that after twenty years of various
>kinds of software/hardware research, parallel processing is still
>limited to the technology described by Dijkstra in '63, and still has
>the same problems in software development, it is probably fair to say
>that little progress is being made in software either.

My background is science, not computer architecture, so I do not fully appreciate the "decades of parallel processing research" or the "past twenty years of various kinds of software/hardware".  But I am quite familiar with some approaches, and have seen that it does work.  The Alliant FX/8 was based on Kuck (1976), and in fact was designed to address many of the objections you raised about parallel processing's limitations.  It certainly cannot be described as equivalent to anything from 20 years ago.

It does have most of the features I have described above, including an intelligent compiler (you are right, the programmer should not have to deal explicitly with parallelism), parallel I/O capability, slowdown with vectorization for heavily-masked code (most vector machines behave this way), "other jobs" for free (non-busy) processors when parallelism is inappropriate, race-condition failure when the compiler is permitted to optimize inappropriately, good parallelism for some codes and none at all for others, significant vector startup time that reduces the "peak" speed to a more "real" speed, and, finally, a dependence, as with any other machine, on the programmer avoiding any of the pitfalls that will defeat the machine's best intentions.

REFERENCES

Kuck, D., "Parallel Processing of Ordinary Programs," Advances in
Computing, Vol. 15, 119-179, M. Rubinoff and M.L. Yovits, Eds.,
Academic Press, 1976.

-----------------------------------------------------------------------------
This is not an official Alliant response.  I am simply an employee who feels differently than you do.
I only mentioned the FX/8 because it is a good, real-world, functioning example that attempts to address the valid objections you raised, and because I understand it. Alliant had no input into this posting, nor did Alliant require me to write this disclaimer.
fouts@orville.nas.nasa.gov (Marty Fouts) (12/24/87)
In article <1030@alliant.Alliant.COM> muller@alliant.UUCP (Jim Muller) writes:
>>A problem with these more sophisticated mechanisms is that they can
>>lead to parallel execution in which the wall clock time goes up as a
>>function of the number of processors, rather than down...
> With vectorization this is typically due to excessive masking.  In heavily
> IF'ed code, the processor(s) can end up doing much more work than necessary.
> The vector overhead generally is insignificant by comparison.  True parallel
> processing, however, does not suffer this problem, since any one processor
> never has to execute the "not-to-be-executed" portions.  For this reason,
> true parallel processing can reduce runtimes even on code for which vector
> processing is a liability.

Although it is true that parallelization doesn't suffer from vector startup overhead, it does suffer from various degrees of synchronization overhead and communication cost.  There are pathological algorithms which will actually perform worse on a multiprocessor in concurrent mode, as a result of that overhead, than they perform in single-processor mode on the same machine.

> Race conditions are typically a problem only when a compiler has been
> inappropriately "permitted" to optimize code that it normally would have
> refused.  The obvious approach, of course, should be to debug in single-
> processor mode.

This of course assumes a compiler which can recognize the concurrency and generate concurrent code.  While Alliant has demonstrated that there is a wide range of constructs for which concurrency can be detected, there is still a much wider range of constructs for which it has to be explicitly generated (and a wide range of vendors whose compilers do not deal with it).  For this class of problem, it is very difficult to generate correct code and even more difficult to debug it.
> Light (or electromagnetic pulses) can only travel so fast, so you have to
> make things smaller to make them faster (obviously).

This is not obvious, only typical.  You can also make them faster by making them more efficient.  (;-)  Anyway, my problem with parallelism as a way to speed things up is that it speeds up the machine's ability to handle a workload, which doesn't help me, because management will just put more users on it; it doesn't speed up its ability to handle my intractable problems.

> My background is science, not computer architecture, so I do not fully
> appreciate the "decades of parallel processing research" or the "past
> twenty years of various kinds of software/hardware".  But I am quite
> familiar with some approaches, and have seen that it does work.  The
> Alliant FX/8 was based on Kuck (1976), and in fact, was designed to
> address many of the objections you raised about parallel processing's
> limitations.  It certainly cannot be described as equivalent to anything
> from 20 years ago. . .

To quibble: your reference is 12 years old, and is based on work done several years earlier.  I will split the decade with you and say 15 years instead.  And, of course, Kuck's work is based on earlier theory.  I agree that commercially available multiprocessors have only become accessible in the last 5 years, and that the technology Kuck described isn't widely available yet.  I still maintain that if you stray from the path

>This is not an official Alliant response.  I am simply an employee who feels
>differently than you do.  I only mentioned the FX/8 because it is a good,
>real-world, functioning example that attempts to address the valid objections
>you raised, and because I understand it.  Alliant had no input into this
>posting, nor did Alliant require me to write this disclaimer.

I have written code for the Alliant, and it does a good job of recognizing the concurrency that it can find.
In the interest of a balanced point of view, I am trying to point out that there is a long way to go before this approach will be widely usable to speed up delivered performance on general-purpose workloads.  By all means, people with vectorizable code should use vector computers, people with concurrent code should use multiple processors, and multiple processors should be used for competitive multitasking.  But we should not look to these technologies to continue the once-rapid growth in overall computer performance.
cik@l.cc.purdue.edu (Herman Rubin) (12/26/87)
In article <1030@alliant.Alliant.COM>, muller@alliant.Alliant.COM (Jim Muller) writes:

> As for
> parallelism being harder to exploit than vectorization, the reverse is more
> likely to hold.  Parallel execution can be applied to practically any
> vectorizable structure, while the reverse is decidedly not true.

If you are doing even moderately efficient methods of generating non-uniform random numbers, this is likely to be false.  A flexible vector structure like the CYBER 205's makes the acceptance-rejection step trivial.  A rigid vector structure like the CRAY's makes it somewhat harder.  On a MIMD machine, we have to wait for the unluckiest processor to finish.  On a SIMD machine, we have this cost, and also must have both an acceptance and a rejection step for each cycle which has a rejection, as well as testing at each cycle whether a rejection has occurred _on any processor_.

> More to the point,
> however, for those problems which do not exhibit inherent parallelism,
> nothing requires the machine (or compiler) to use it where inappropriate.
> All that is needed is that the unused processors be doing something else.

This eliminates SIMD machines, which are currently the ones with the greatest (by far) parallelism.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet
bzs@bu-cs.BU.EDU (Barry Shein) (12/28/87)
Posting-Front-End: GNU Emacs 18.41.4 of Mon Mar 23 1987 on bu-cs (berkeley-unix)

Marty Fouts continues a tradition in discussions of the relative merits of parallel machines which can be summed up in the simple phrase "the best as enemy of the good".  I don't think any point is made when someone pulls out some algorithm and challenges the community to parallelize it.  What's the point?  Has anyone denied that such problematic algorithms exist?

He also questions the merits of Peter Fay's claim of development transparency on some modern machines.  Here's an example: I compile a system with about 100 C modules on a Vax 750 (4.3, 8MB, RA81s); elapsed time is 1 hour 39 minutes.  I do the following on exactly the same code on an Encore Multimax (6x332, 28MB, CDC drives):

	% setenv PARALLEL 8
	% time make

and it compiles in 2 minutes and 10 seconds, or about a 50x speedup.  Do you like waiting an hour for a compile (assuming your machine is two or three times faster than the 750)?  Is it worth anything to you to simply solve that problem?  It's not like the parallel systems are much more expensive than what you're probably using now (the Encore I describe above is ~$200K).  So the real question is: Is the value of parallel systems enhanced by their achievements or diminished by the hard problems which exist?

I also have a parallelized raytrace program which exhibits roughly N-fold performance increase as it's re-run allowing more CPUs to participate (same machine).  Sure, it's an easy, nearly trivial problem to parallelize (NOT transparent, but all I have to do is break the loop across simple forks, about 20 lines of additional C code), so what?  Does that mean it's not worthwhile and only impossible problems are to be considered?  That would be a strange view.
I am continuously amazed at people who, upon learning that I own several parallel systems and use them in production work, immediately remark that they find it interesting, but that OF COURSE there are all these problems (most of which they themselves don't need solved) which can't be parallelized (or not easily).

Seriously, what's the point?  It's just the 80's version of all the people who used to decry higher-level languages because they could pull out some algorithm that no compiler could do as good a job on as a hand-coded solution.  So what!?

	-Barry Shein, Boston University
reggie@pdn.UUCP (George W. Leach) (12/28/87)
In article <18013@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>I am continuously amazed at people who, upon learning that I own
>several parallel systems and use them in production work immediately
>remark that they find it interesting, but OF COURSE there are all these
>problems (most of which they themselves don't need solved) which can't
>be parallelized (or not easily.)

So nobody is forcing them to "parallelize" these problems!  However, for those areas where we need the help, we can utilize this ability.  And certainly in the future we will get a better handle on how to harness all of this power in other areas.

One thing that for years I have wanted is the ability to assign all database processes to specific CPUs.  On a single-CPU machine, a few DB backends combined with other people compiling and editing can bring a machine to a slow crawl very quickly.
--
George W. Leach				Paradyne Corporation
{gatech,rutgers,attmail}!codas!pdn!reggie	Mail stop LF-207
Phone: (813) 530-2376				P.O. Box 2826
						Largo, FL  34649-2826
eugene@pioneer.arpa (Eugene Miya N.) (12/29/87)
In article <1030@alliant.Alliant.COM> muller@alliant.UUCP (Jim Muller) writes:
>> >Debugging is a whole other soapbox, but my experience is that debugging
> Race conditions are typically a problem only when a compiler has been
> inappropriately "permitted" to optimize code that it normally would have
> refused.  The obvious approach, of course, should be to debug in single-
> processor mode.

Excuse me for chuckling.  This approach was mentioned and rejected several years ago at Cray User Group meetings.  [I note your innocence below.]  There are several good old HEP papers which show that single-machine debugging isn't the solution.  I just hope that few other readers make the same mistake.  Not every problem is a race condition, but we've seen some good ones.

> My background is science, not computer architecture, so I do not fully
> appreciate the "decades of parallel processing research" or the "past

Don't worry, we won't hold that against you.  ;-)  [Nor the comment that you worked for Alliant.]

From the Rock of Ages Home for Retired Hackers:

--eugene miya, NASA Ames Research Center, eugene@ames-aurora.ARPA
"You trust the `reply' command with all those different mailers out there?"
"Send mail, avoid follow-ups.  If enough, I'll summarize."
{uunet,hplabs,hao,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene
brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (12/30/87)
In article <3771@ames.arpa> eugene@pioneer.UUCP (Eugene Miya N.) writes:
>In article <1030@alliant.Alliant.COM> muller@alliant.UUCP (Jim Muller) writes:
>>> >Debugging is a whole other soapbox, but my experience is that debugging
>> Race conditions are typically a problem only when a compiler has been
>> inappropriately "permitted" to optimize code that it normally would have
>> refused.  The obvious approach, of course, should be to debug in single-
>> processor mode.

Although the above comments are a little off the wall, the debugging of a parallel program is best done in two steps.  First you run it in serial mode (parallel code, but with one processor) and find and kill all of the bugs that show up.  If your program dumps core here, the use of a good serial debugger will be of great help.  After doing this, you run the code with more than one processor.  Bad things that are hard to pin down can still happen (gee, Eugene, this program ran fine through 13 cycles and then dumped core out of the blue!), but at least you won't have the serial program bugs biting you when you are running the code in parallel.

Of course, there are some implicit assumptions:

1) The parallel programming model you use uses the same code regardless of the number of processors (the number of processors is a runtime parameter that may not even be known to the user of the code).

2) The code produces the same results (perhaps to machine roundoff accuracy, to satisfy those who have real numerical parallel code experience) regardless of the number of processors you run it with.