zenith@ensmp.fr (Steven Ericsson Zenith) (04/18/91)
Somehow I started getting mail from SERC/DTI. Geez, I thought, more junk mail. Then this morning, as I carelessly discarded more of the stuff, I came across a real gem. Since most on this list probably don't get this stuff, let me share it with you.

It had to be done. I thought of doing it myself, but someone beat me to it. What's more, I know the people involved well and I'd trust these guys to do one heck of a job (and have fun doing it :-). A small company - essentially a bunch of ex-Inmos engineers with more applications experience in this area (distributed-memory parallel) than most - in fact, when I think of it, more than anyone else I know. They've put together a desktop supercomputer out of a standard 386 PC running MSDOS, added UNIX System V and a bunch of transputers, each of which connects to and has shared-memory access to a 40MHz i860. The company was founded by one of the guys who did the core work on Helios - so I'd expect them to live up to their "seamless" claim. The machine is called the GT860 (what else :-) and scales from 1 to 8+ i860's; I'd guess these are standard board-level components designed by them and sold elsewhere. The company is in the UK and called DIVISION Limited. Their phone number (just so you don't all ask me for more data) is +44 454 324527. I haven't seen any of these guys in a year or two and have no connection - and, no, I'm not looking for a job ;-)

But this did get me to thinking. 8 i860's... Mmmm. Standard board-level components. I'd probably do better if they shared a bus, but then I'd lose scalability - or would I? I don't know much about i860's sharing memory. Anyone done it? 8 would make a reasonably sized node. Yeah, let's say 8 i860's per node. Nah, let's say 9 - one extra just to handle internode exchanges and routing (off-the-shelf routing chips with enough bandwidth aren't around just yet - or are they?). Make the whole thing really modular and I too can sell you a 1-to-n processor machine. Mmm, at least one DSP chip per node.
DSP chips to give data compression at every IO port. Data compression on and off all disks. Oh yeah. At least one fast disk per node too. Mmm, what are the cost-engineering constraints? What's the size of the box? Eeek, all that SRAM... :-( Now where am I going to get the software for this thing? Oh dear ... well?

Comp.parallel's been a bit boring lately, so come on guys. Building a parallel supercomputer today out of off-the-shelf parts has got to be real easy (given the time and money). Hasn't it? How about candidates for the comp.parallel OEM machine? OEM (in case you didn't know) is the industrial term for Original Equipment Manufacturer. You know: I'll sell you the bits, you put it together and sell it into application-specific areas. Areas like the front desks of the stock markets, container terminals, air traffic control, chemical factories, and other vertical markets.

In the meantime, good luck to Charlie, Phil and Co.

Steven
--
Steven Ericsson Zenith <zenith@ensmp.fr>
Center for Research in Computer Science
Ecole Nationale Superieure des Mines de Paris, France
--
=========================== MODERATOR ==============================
Steve Stevenson                    {steve,fpst}@hubcap.clemson.edu
Department of Computer Science, comp.parallel
Clemson University, Clemson, SC 29634-1906    (803)656-5880.mabell
tve@sprite.berkeley.edu (Thorsten von Eicken) (04/19/91)
... taking you up on your request for flames... When I see a new DMMP proposed, the first parameter I look at is how many cycles it takes to send & receive a message. If it's greater than 10, the machine is not interesting in my opinion. Now, if you try to glue a few off-the-shelf processors together, you're not likely to produce anything interesting according to my criterion. But then, maybe I should only look at peak MFLOPS?

Thorsten von Eicken - tve@sprite.berkeley.edu
pratt@cs.stanford.edu (Vaughan Pratt) (04/19/91)
From: Steven Ericsson Zenith <zenith@ensmp.fr>
Date: Wed, 17 Apr 91 22:39:08 +0100
Subject: New i860 parallel machine
Newsgroups: comp.parallel

> Building a parallel supercomputer today out of off the shelf parts
> has got to be real easy (given the time and money).

That brings back memories. This was exactly our attitude at Stanford about workstations in 1980. We spent 11/80 to 8/81 building the Sun workstation out of off-the-shelf parts in our "garage", namely Margaret Jacks Hall 428 - then Andy Bechtolsheim's office, previously Forest Baskett's, now mine. Sun the company was formed at the end of 2/82, and the ink turned black in 6/82.

But use of off-the-shelf parts was only one of several key ingredients. Another was Unix, of which C was an integral part. Sun shipped off-the-shelf Unisoft Unix throughout 1982, prior to its in-house Lyon-Shannon port of Berkeley Unix. The GT860 needs the parallel-supercomputer equivalent of Unix and C. This is not an off-the-shelf item today. But fixing this is more than just finding another Dennis Ritchie.

The biggest obstacle is that we don't even know what concurrency is. People think they know now, but in hindsight in 2020 they will be able to see that they really did not know in 1991. In 1891 physicists had no inkling of what physics would look like in 1926. And the great majority had no inkling that they had no inkling. In 1991 computer scientists are in exactly the same situation with regard to concurrency. Understanding the nature of concurrency is a problem as important yet unsolved today as understanding the nature of matter was a century ago. (I predict that ideas will soon start to flow from the former to the latter, which is still far from wrapped up, but that's another story.) Although both sides of the Atlantic are attacking this question, in my view Europe is ahead of the US in this very important area.
The largest electronic cache of US-generated papers on European-style concurrency resides on a disk in, by coincidence, the above-mentioned Sun garage. Its contents are listed in pub/README, available by anonymous ftp from Boole.Stanford.EDU.

Vaughan Pratt
zenith@ensmp.fr (Steven Ericsson Zenith) (04/20/91)
I actually got a bundle of mail, in addition to those posted, on this subject. I'll summarise that in a second note. Here I want to comment on Vaughan Pratt's message of the 18th and the subsequent response from hugo@griggs.dartmouth.edu (Peter Su).

Vaughan Pratt said:

> The biggest obstacle is that we don't even know what concurrency is.
> People think they know now, but in hindsight in 2020 they will be able
> to see that they really did not know in 1991.

Actually I agree fully with this comment, and I hope to explain why. However, Peter Su said in response:

> One of the assumptions behind this statement is that concurrency as it
> is defined by computer science ... is relevant to the construction of
> parallel machines and algorithms for computational problems ... this
> definition of concurrency, and the formal models surrounding it, are
> all motivated in some sense by the design and implementation of
> operating systems. One could argue that they are largely irrelevant
> to the design and implementation of parallel algorithms.
>
> For example, in the theoretical computer science community, most
> parallel algorithms are designed not with concurrent process algebras,
> pomsets, or any of that. Rather, the model of computation is SPMD
> shared memory. In other words, the emphasis is on operating on
> parallel data structures, not on coordinating concurrent processes.

I don't understand the point here - are you just having a dig at Vaughan? If I read you correctly, you are distinguishing between "computer science" and "theoretical computer science"? You seem to suggest that the theories of concurrency have no place in "theoretical computer science". Well, frankly, lift your head up and take a look around you! Nor can I agree that the formal models of concurrency are motivated in any sense by the design and implementation of operating systems alone. As evidence to support my contention I point to the work of Bill Roscoe, Tony Hoare, Robin Milner and many others.
It is true that in the USA a lot of systems research has focused on operating-system support for concurrent models, but that is just one aspect of a multifaceted science. As Vaughan points out, a lot of work has been done in Europe. Another facet of computer science focuses on the design of algorithms ... very often the focus in such work is very narrow. Peter goes on to say:

> The reason for this is simple, the amount of parallelism that you can
> obtain scales naturally with the size of the problem. So, the most
> natural way to get a lot of parallelism is to write one program, and
> replicate it. Is it really realistic to expect that people could
> program a 2K MIMD machine using two thousand separate programs? I
> don't think so.

Well, I think you are just plain wrong, and here's why: you have too narrow a view of problems. Let's consider a problem I raised earlier - control and management of a container terminal. Let us now consider the problem a little (just a little). I have ships arriving at n docks in intervals. Each ship carries x containers. Each container has y goods earmarked for collection to z departures. Ships are expected; we know their point of departure, and we know what they carry in the containers. I also have m trucks arriving at land gates. Each dock has p robots which remove containers from arriving ships and place them directly on trucks (let's assume all customs checks are on ship). This one container terminal communicates and cooperates with other such terminals, points of departure and destination. And so forth ...

Forget control - consider merely monitoring such a system, which on its own is already a >2K MIMD machine! Not convinced? OK, there's a bigger one - consider air traffic control. Look, these are BIG machines. What are you going to do? Continue to program the whole problem in PASCAL? COBOL? ASSEMBLER :-) Further, in the back room of each container terminal (circa 2010 AD), and each air traffic control system, are several >10K SIGMA machines.
Among other things, every day, for ten minutes, it runs one of your wacky algorithms. Oh, it's neat. It's bulk-synchronous, it does in ten minutes what it would take 5 years to compute now, and it is one component in a mighty world built of concurrent systems. And great big ones at that!

Don't misunderstand me. Replication is a very, very important aspect of concurrent systems, and it will be used widely. So in answer to Peter's question, "is it really realistic to expect that people could program a 2K MIMD machine using two thousand separate programs?" - you bet your sweet bippy it is. People do it now. Talk to anyone programming an air traffic control system and ask them how many separate programs (read: processes) they have currently!

But OK, that's in the real world, and I'm reluctant to leave it. You want to consider a single big box of devices? Consider simulation models. Think about architectural design and simulation in the car industry. Consider the design and simulation of space probes. Think about geological and astrophysical simulations. Components of each of these may indeed be SPMD (or SIMD, for that matter), but the whole will be a MIMD concurrent system. In the next century this will be small fry (i.e. 2K separate programs). Expect systems with hundreds of millions of concurrent computing components.

Two years ago I pooh-poohed Ian Barron when he told me people should be thinking in millions of processes. He'd been having his house decorated at the time and I figured the fumes were getting to him. But he's right, in my view, now. The only obstacle is economic. No small obstacle that - ask yourself why there is no exploitation of the moon. It may be that we build one or two 10K-processor machines before we reach the end of the century ... and nothing interesting happens beyond that until the year 2100! Imagination is what we need! We're too short-sighted at the moment. But it's a problem. How is anyone going to manage such systems?
Well, I don't think it will be one person - just as the large systems I've mentioned are not now programmed, or even managed, by a single person. I expect someone to discover the holy grail of concurrent systems in the next ten years or so ... and I think that's what Vaughan meant.

Incidentally, the theories of concurrency don't ignore data distribution. In fact, that's what we're all preoccupied with just now. Oh, and just in case someone comes back and says I can still do all the above using an SPMD model, my reply must be that a) you must have enormous resources on each node, and b) all you've done is provide a scheduling mechanism for a MIMD program. :-)

Mmmm. I still haven't answered Vaughan's comments.

Steven
--
Steven Ericsson Zenith <zenith@ensmp.fr>
Center for Research in Computer Science
Ecole Nationale Superieure des Mines de Paris, France
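The container terminal Zenith describes is, at heart, a collection of independently running programs coordinating through shared resources. A toy sketch of just the robots-unloading-ships fragment, in Python (all names and sizes below are invented for illustration; nothing here comes from any real terminal system):

```python
import queue
import threading

# Toy model of the container-terminal fragment above: one robot per dock,
# each an independent thread, unloading an arriving ship onto a shared
# truck lane.

N_DOCKS = 4
CONTAINERS_PER_SHIP = 25

def robot(dock_id, ship, trucks):
    """Unload containers from this dock's ship until it is empty."""
    while True:
        try:
            container = ship.get_nowait()
        except queue.Empty:
            return                      # ship fully unloaded
        trucks.put((dock_id, container))

def run_terminal():
    trucks = queue.Queue()              # shared outbound truck lane
    robots = []
    for dock in range(N_DOCKS):
        ship = queue.Queue()            # this dock's arriving ship
        for c in range(CONTAINERS_PER_SHIP):
            ship.put("container-%d-%d" % (dock, c))
        t = threading.Thread(target=robot, args=(dock, ship, trucks))
        robots.append(t)
        t.start()
    for t in robots:
        t.join()
    return trucks.qsize()

print(run_terminal())                   # prints 100: every container moved
```

Monitoring alone - Zenith's point - means keeping track of every one of these concurrent actors; replace threads with machines, add trucks, gates, and cooperating terminals, and the count of separate programs grows past 2K quickly.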
berryman-harry@CS.YALE.EDU (Harry Scott Berryman) (04/21/91)
WARNING! I AM NOT A HARDWARE JOCK! TAKE MY HARDWARE COMMENTS WITH A GRAIN OF SALT!

In article <1991Apr18.120233.339@hubcap.clemson.edu> zenith@ensmp.fr (Steven Ericsson Zenith) writes:
>
>They've put together a desktop supercomputer out of a standard 386 PC,
>MSDOS, added UNIX System V, a bunch of transputers each of which connect
>and have shared memory access to a 40MHz i860. The company was founded
>by one of the guys who did the core work on Helios - so I'd expect them
>to live up to their "seamless" claim.

There are two serious problems I see with this kind of design. The first is that the i860s will be far faster than the transputers which link them. The second is that the i860 will need heaps of fast memory or it will be cache-bound. These two problems actually rear their ugly heads on the iPSC/860: the network is not nearly as fast as the processors, and the i860s spend a lot of time waiting for memory. The two problems aggravate each other.

It is an interesting set of parts they've picked. Custom CMOS may help us out of a bind on some of the parts. You can build a custom chip for under $100,000.

For what target audience would we be building such a beast? A database machine has far different constraints from a scientific machine. There is no Cray version of Ingres, last I heard. As the whole world (at least my part of it) seems intent on building the better scientific parallel machine, I suggest we talk about a different set of problems. Two come to mind:

1) Reservation systems (like for airlines and hotel chains)
2) Digital signal processing (in real time)

I know very little about either application. Anyone out there know any of the requirements for such systems, or have suggestions for other problems? I agree with you, Steve, this party's been a bit of a bore lately. Let's get some action out here.
Harry Scott Berryman
Yale Computer Science Department and ICASE/NASA Langley Research Center
berryman@cs.yale.edu
anand@top.cis.syr.edu (R. Anand | School of Computer and Information Science) (04/21/91)
Here at Syracuse University, I am surrounded by parallel computers of all sorts, ranging from a Connection Machine to an Encore Multimax. Even as I type this, I am running a parallel neural-net backpropagation training program - but not on one of these machines. Instead, it is running quite happily on 8 SPARCstation 1+s. I get a speedup of about 7 with 8 workstations, and the resulting throughput is quite acceptable. I have seen a number of other postings in the past few months that indicate that this simple approach has been used successfully by a number of other people. I am now coming to the conclusion that, just as the LISP machines were made obsolete by the rapid increases in speeds of general-purpose microprocessors, the parallel computer as we know it today is going to face some severe challenges from networked workstations.

My program uses ISIS for communication and XDR routines to ensure data compatibility between different machines. Since broadcasting is the only kind of communication available in ISIS, it is fortunate that that's all I need. There is one master processor which broadcasts data to the compute servers and gets back results.

With a conventional parallel computer, you are locked into using whatever type of processor the machine was designed for. In my case, I face no such problems. I will shortly be taking advantage of a set of RS6000s that have become available here, and soon thereafter a set of HP Snakes. In addition, I do not have to use a crummy program-development environment of the sort that usually comes with most parallel computers. Instead, I use standard g++ (my programs are written in C++) and gdb, which I find to be more than adequate.

Load balancing is trivial with my setup. At the end of each iteration, the compute servers report the time taken per unit of work to the overseeing processor. Then, for the next iteration, the overseer apportions out the work in inverse proportion to the time taken.
If someone logs into one of my compute servers, the load balancing simply shifts work elsewhere. If a machine crashes, ISIS can handle this situation quite well: my program simply regroups and continues with fewer compute servers.

OK, so now you tell me: "It may be fine for you, but my program is not so coarse-grained." My answer is: go back and take a second look at your problem. It may well be possible to formulate a coarser version than what you are using now. Here is a tip: many of the programs used on the Connection Machine are based on broadcasting. It is very likely that you may be able to adapt something you find there.

R. Anand | School of Computer and Information Science
anand@top.cis.syr.edu | Syracuse University.
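The load-balancing rule Anand describes - apportion the next iteration's work in inverse proportion to each server's reported time per unit of work - fits in a few lines. A sketch in Python (the function name and the remainder-handling policy are illustrative only; his actual ISIS-based code is not shown here):

```python
# Each compute server reports its time per unit of work; the overseer
# hands out the next iteration's work in inverse proportion to that time.

def apportion(total_work, unit_times):
    """Split total_work among servers in inverse proportion to their
    reported time-per-unit-of-work (a slowed-down server gets less)."""
    speeds = [1.0 / t for t in unit_times]
    total_speed = sum(speeds)
    shares = [int(total_work * s / total_speed) for s in speeds]
    # give any rounding remainder to the fastest server
    shares[speeds.index(max(speeds))] += total_work - sum(shares)
    return shares

print(apportion(1000, [1.0, 1.0, 1.0, 1.0]))  # even split: [250, 250, 250, 250]
print(apportion(1000, [1.0, 1.0, 1.0, 4.0]))  # loaded server gets the smallest share
```

If someone logs into a server, its reported unit time rises and its share shrinks on the next iteration - exactly the behaviour described above.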
pratt@cs.stanford.edu (04/22/91)
In article <9104191644.AA17015@hubcap.clemson.edu> hugo@griggs.dartmouth.edu (Peter Su) writes:

>One of the assumptions behind this statement is that concurrency as it
>is defined by computer science (i.e. the interaction of independent
>processes executing programs asynchronously, posets, partial orders
>among events in time, temporal logic, whatever) is relevant to the
>construction of parallel machines and algorithms for computational
>problems.
>
>As has been noted before (see IEEE Software, Sept. 1990), this
>definition of concurrency, and the formal models surrounding it, are
>all motivated in some sense by the design and implementation of
>operating systems. One could argue that they are largely irrelevant
>to the design and implementation of parallel algorithms.

Denying a connection between operating systems and parallel algorithms is like denying a connection between planetary motion and gyroscope behavior. If your account of planetary motion is in terms of Ptolemaic epicycles, then there is no connection. But if your account is in terms of Newton's laws, then there is a very beautiful connection.

The "epicycle theory" of operating systems is the view of asynchronously communicating sequential processes mediated by buffered channels. This view sheds little light on parallel algorithms other than those that clearly incorporate such channels. In contrast, "Newton's laws" for operating systems view their activity in terms of the dual notions of event occurrences and state trajectories, which are as much the basis of parallel algorithms as of operating systems and can shed much light on both.

>For example, in the theoretical computer science community, most
>parallel algorithms are designed not with concurrent process algebras,
>pomsets, or any of that. Rather, the model of computation is SPMD
>shared memory. In other words, the emphasis is on operating on
>parallel data structures, not on coordinating concurrent processes.
These are only distinguishable if you insist on too narrow a definition of "coordinating concurrent processes." Presumably you are thinking in terms of coordination by channels, semaphores, monitors, etc., which certainly do not feature prominently among the coordination primitives used by many parallel algorithms.

A major shortcoming of parallel programming today is a lack of good primitives to serve as a basis for structured parallel programming. A decent process algebra will supply such primitives. These will be of at least as much help in writing, understanding, and verifying SIMD programs as if-then-else's and while-do's are for sequential programs. Then you will find people starting to design their parallel algorithms with concurrent process algebras (for which pomsets are just one of several plausible models) instead of with sequential control structures. Although a good deal of progress in that direction has been made during the 1980s, we still have a long way to go before that vision is fully realized.

>The reason for this is simple, the amount of parallelism that you can
>obtain scales naturally with the size of the problem. So, the most
>natural way to get a lot of parallelism is to write one program, and
>replicate it.

That approach constitutes the assembly language of parallel programming: Turing-powerful but devoid of structure. In mathematics you could argue with equal logic that the natural way to build a vector space is to start with one point and replicate it. That gets you as far as the underlying set of the space, and indeed to many people a geometric space *is* the set of its points. But that doesn't do justice to vector spaces. Set theory is only the assembly language of mathematics. A vector space adds structure to that set, differentiating it from other set-based structures such as lattices and graphs and grammars. Types are more than just sets, and parallel programs are more than just replicated programs: they have useful structure.
If you don't see that structure, you don't understand your program sufficiently.

Structured programming was a hard sell in 1970, and it doesn't seem to have gotten much easier 20 years later. When it's available and working smoothly, many people buy it and use it, yet do not acknowledge it. And when it's not available, they insist they're doing just fine without it. Some acknowledge it, but it is depressing that so many do not. In between are those that acknowledge the past but not the future: they like what has happened so far but refuse to believe that anything more can happen. They believe that all the good inventions in structured programming have already been invented and it is now a closed subject.

There is some structure available now for structured parallel programming, but nowhere near enough in my opinion. A whole lot more work is needed before the tools are in place to create really insightful representations of concurrent programs. Tools based on the view that a concurrent program is a collection of communicating sequential processes, or that it is the sum of its execution sequences, are not universal enough to be useful to people working in parallel programming. We need much better views in order to understand our parallel programs insightfully. Better views of concurrent computation are among the most important goals of modern concurrency-theory research.

>Is it really realistic to expect that people could
>program a 2K MIMD machine using two thousand separate programs? I
>don't think so.

This has nothing to do with concurrency theory, which neither limits itself to MIMD nor insists that the n programs running on an n-processor MIMD machine all be distinct.

>Thus, the main problem in parallel computing from my point of view
>(programmer and algorithm designer) is how to manage the placement and
>movement of *data* in the machine. So, I wonder what all the theories
>about concurrency have to do with this.

They have everything to do with it.
They tell you how to think about it happening in parallel. If you have a better theory of how data can move and interact concurrently than the present theories of concurrency, then you're a contributor to this field whether you admit it or not. If you don't have a better theory, then one must infer from your objections (to the work of those who do) that you don't believe the parallel programming world needs a theory of data movement and interaction.

Vaughan Pratt
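The structured-parallel-programming primitives Pratt is calling for can be hinted at with a toy model: processes built from atomic events with sequential and parallel composition, interpreted as sets of allowed interleavings. This is a deliberately tiny illustration, not any particular published algebra:

```python
# A toy hint at process-algebra primitives: processes are built from
# atomic events with sequential and parallel composition, and a process
# is interpreted here as the set of its allowed event interleavings.

def atom(e):
    """A process that performs the single event e."""
    return {(e,)}

def seq(p, q):
    """Sequential composition: every run of p followed by every run of q."""
    return {a + b for a in p for b in q}

def par(p, q):
    """Parallel composition: all interleavings of runs of p with runs of q."""
    def interleave(a, b):
        if not a:
            return {b}
        if not b:
            return {a}
        heads = {(a[0],) + r for r in interleave(a[1:], b)}
        tails = {(b[0],) + r for r in interleave(a, b[1:])}
        return heads | tails
    out = set()
    for a in p:
        for b in q:
            out |= interleave(a, b)
    return out

# (a ; b) || c: three interleavings, and a precedes b in every one -
# the ordering structure survives composition rather than being flattened.
print(sorted(par(seq(atom("a"), atom("b")), atom("c"))))
```

The point of such combinators is exactly the point made above about if-then-else and while-do: the parallel program is composed from structured pieces, not assembled from replicated fragments.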
will@uunet.UU.NET (William Pickles) (04/22/91)
zenith@ensmp.fr (Steven Ericsson Zenith) writes:
>Comp.parallel's been a bit boring lately, so come on guys. Building a
>parallel supercomputer today out of off the shelf parts has got to be
>real easy (given the time and money). Hasn't it? How about candidates

... for a real programming model for this type of machine. Any fool can build it, but the article in December's Scientific American shows that most supercomputers end up running two orders of magnitude slower than their rated speed. Programming for consistent speed is not easy.

William Pickles
zenith@ensmp.fr (Steven Ericsson Zenith) (04/23/91)
will@uunet.UU.NET (William Pickles) writes:

> zenith@ensmp.fr (Steven Ericsson Zenith) writes:
> >Comp.parallel's been a bit boring lately, so come on guys. Building a
> >parallel supercomputer today out of off the shelf parts has got to be
> >real easy (given the time and money). Hasn't it? How about candidates
>
> ... for a real programming model for this type of machine. Any fool can
> build it, but the article in December's Scientific American shows that
> most supercomputers end up running two orders of magnitude slower than
> their rated speed. Programming for consistent speed is not easy.

William, you haven't been paying attention. You are, of course, absolutely right. But please, don't tempt me ... ;-)

Steven

PS. Requests for addition to the Ease mailing list can be sent to - ease-request@ensmp.fr :-)
--
Steven Ericsson Zenith <zenith@ensmp.fr>
Center for Research in Computer Science
Ecole Nationale Superieure des Mines de Paris, France