[comp.parallel] New i860 parallel machine

zenith@ensmp.fr (Steven Ericsson Zenith) (04/18/91)

Somehow I started getting mail from SERC/DTI.  Geez, I thought, more junk
mail. Then this morning, as I carelessly discarded more of the stuff, I
came across a real gem.

Since most on this list probably don't get this stuff, let me share this
with you. It had to be done. I thought of doing it myself. But someone
beat me to it. What's more, I know the people involved well and I'd
trust these guys to do one heck of a job (and have fun doing it :-).

A small company - essentially a bunch of ex-Inmos engineers with more
applications experience in this area (distributed memory parallel) than
most - in fact, when I think of it, more than anyone else I know.
They've put together a desktop supercomputer out of a standard 386 PC
running MS-DOS, added UNIX System V, and a bunch of transputers, each of
which connects to and has shared-memory access to a 40MHz i860. The
company was founded
by one of the guys who did the core work on Helios - so I'd expect them
to live up to their "seamless" claim.

The machine is called the GT860 (what else :-), and scales from 1 to 8+
i860's; I'd guess these are standard board-level components designed by
them and sold elsewhere. The company is in the UK and called DIVISION
Limited.  Their phone number (just so you don't all ask me for more data)
is +44 454 324527. I haven't seen any of these guys in a year or two and
have no connection - and, no, I'm not looking for a job ;-)

But this did get me to thinking. 8 i860's... Mmmm. Standard board-level
components. I'd probably do better if they shared a bus, but then I'd
lose scalability - or would I? I don't know much about i860's sharing
memory. Anyone done it? 8 would make a reasonably sized node. Yeah, let's
say 8 i860's per node. Nah, let's say 9, one extra just to handle
internode exchanges and routing (off-the-shelf routing chips with enough
bandwidth aren't around just yet - or are they?). Make the whole thing
really modular and I too can sell you a 1 to n processor machine. Mmm,
at least one DSP chip per node.  DSP chips to give data compression at
every IO port.  Data compression on and off all disks. Oh yeah. At least
one fast disk per node too. Mmm, what are the cost engineering
constraints? What's the size of the box? Eeek, all that SRAM... :-( Now
where am I going to get the software for this thing? Oh dear... well?
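
For amusement, here is the back-of-the-envelope arithmetic behind that
musing, written out as a tiny C program. Every figure in it - 16 nodes,
4 MB of SRAM per CPU, 80 peak MFLOPS per 40MHz i860 - is my own guess
for illustration, not anything from DIVISION.

/* Back-of-envelope sizing for the hypothetical 9-i860-per-node machine
 * sketched above.  All per-part figures are assumptions for illustration,
 * not vendor data.  Compile with: cc -o sizing sizing.c */
#include <stdio.h>

int main(void)
{
    int    nodes            = 16;    /* assumed number of nodes              */
    int    cpus_per_node    = 9;     /* 8 compute i860s + 1 for routing      */
    double peak_mflops_cpu  = 80.0;  /* assumed peak for a 40MHz i860 (SP)   */
    double sram_mb_per_cpu  = 4.0;   /* assumed fast local memory per CPU    */
    double dsp_per_node     = 1.0;   /* one DSP per node for I/O compression */

    double compute_cpus = nodes * (cpus_per_node - 1);   /* exclude routers  */
    double peak_gflops  = compute_cpus * peak_mflops_cpu / 1000.0;
    double total_sram   = nodes * cpus_per_node * sram_mb_per_cpu;

    printf("%d nodes x %d i860s = %.0f compute CPUs (+%d routers, %.0f DSPs)\n",
           nodes, cpus_per_node, compute_cpus, nodes, nodes * dsp_per_node);
    printf("aggregate peak : %.1f GFLOPS (single precision)\n", peak_gflops);
    printf("aggregate SRAM : %.0f MB -- hence the :-( above\n", total_sram);
    return 0;
}

Change the constants at the top and the totals follow; that is about as
far as the cost engineering goes here.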

Comp.parallel's been a bit boring lately, so come on guys. Building a
parallel supercomputer today out of off the shelf parts has got to be
real easy (given the time and money). Hasn't it? How about candidates
for the comp.parallel OEM machine?  OEM (in case you didn't know) is the
industrial term for Original Equipment Manufacturer. You know, I'll sell
you the bits, you put it together and sell it into application-specific
areas. Areas like the front desks of the stock markets, container
terminals, air traffic control, chemical factories, and other vertical
markets.

In the meantime. Good luck to Charlie, Phil and Co.

Steven
--
Steven Ericsson Zenith <zenith@ensmp.fr>
Center for Research in Computer Science
Ecole Nationale Superieure des Mines de Paris. France


-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

tve@sprite.berkeley.edu (Thorsten von Eicken) (04/19/91)

... taking you up on your request for flames...

When I see a new DMMP proposed, the first parameter I look at is how
many cycles it takes to send and receive a message. If it's greater than
10, the machine is not interesting in my opinion. Now, if you try to
glue a few off-the-shelf processors together, you're not likely to
produce anything interesting according to my criterion. But then, maybe
I should only look at peak MFLOPS?
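
If you want to apply that test to a machine of your own, the arithmetic
is one line; the clock rate and the send-plus-receive overhead below are
placeholders, not measurements of anything real.

/* Quick check of the cycles-per-message criterion.  The example numbers
 * are made up for illustration, not measured on any machine. */
#include <stdio.h>

int main(void)
{
    double clock_mhz       = 40.0;  /* assumed CPU clock, e.g. a 40 MHz i860   */
    double msg_overhead_us = 25.0;  /* assumed send+receive overhead, microsec */

    double cycles = msg_overhead_us * clock_mhz;   /* us * (cycles per us)     */

    printf("%.1f us at %.0f MHz = %.0f cycles per message\n",
           msg_overhead_us, clock_mhz, cycles);
    printf("interesting by the 10-cycle criterion? %s\n",
           cycles <= 10.0 ? "yes" : "no");
    return 0;
}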
	Thorsten von Eicken - tve@sprite.berkeley.edu

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

pratt@cs.stanford.edu (Vaughan Pratt) (04/19/91)

	From: Steven Ericsson Zenith <zenith@ensmp.fr>
	Date: Wed, 17 Apr 91 22:39:08 +0100
	Subject: New i860 parallel machine
	Newsgroups: comp.parallel

	Building a parallel supercomputer today out of off the shelf
	parts has got to be real easy (given the time and money).

That brings back memories.  This was exactly our attitude at Stanford
about workstations in 1980.  We spent 11/80 to 8/81 building the Sun
workstation out of off-the-shelf parts in our "garage", namely Margaret
Jacks Hall 428, then Andy Bechtolsheim's office, previously Forest
Baskett's, now mine.  Sun the company was formed at the end of 2/82,
and the ink turned black in 6/82.

But use of off-the-shelf parts was only one of several key
ingredients.  Another was Unix, of which C was an integral part.  Sun
shipped off-the-shelf Unisoft Unix throughout 1982 prior to its
in-house Lyon-Shannon port of Berkeley Unix.  The GT860 needs the
parallel supercomputer equivalent of Unix and C.  This is not an
off-the-shelf item today.

But fixing this is more than just finding another Dennis Ritchie.  The
biggest obstacle is that we don't even know what concurrency is.
People think they know now, but in hindsight in 2020 they will be able
to see that they really did not know in 1991.

In 1891 physicists had no inkling of what physics would look like in
1926.  And the great majority had no inkling that they had no inkling.
In 1991 computer scientists are in exactly the same situation with
regard to concurrency.  Understanding the nature of concurrency is a
problem as important yet unsolved today as understanding the nature of
matter was a century ago.  (I predict that ideas will soon start to
flow from the former to the latter, which is still far from wrapped up,
but that's another story.)

Although both sides of the Atlantic are attacking this question, in my
view Europe is ahead of the US in this very important area.  The
largest electronic cache of US-generated papers on European-style
concurrency resides on a disk in, by coincidence, the above-mentioned
Sun garage.  Its contents are listed in pub/README available by
anonymous ftp from Boole.Stanford.EDU.

	Vaughan Pratt

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

zenith@ensmp.fr (Steven Ericsson Zenith) (04/20/91)

I actually got a bundle of mail, in addition to those posted, on this
subject. I'll summarise that in a second note.

Here I want to comment on Vaughan Pratt's message of the 18th and the
subsequent response from hugo@griggs.dartmouth.edu (Peter Su).

Vaughan Pratt said

> The biggest obstacle is that we don't even know what concurrency is.
> People think they know now, but in hindsight in 2020 they will be able
> to see that they really did not know in 1991.

Actually I agree fully with this comment and I hope to explain why.

However Peter Su said in response:

> One of the assumptions behind this statement is that concurrency as it
> is defined by computer science ... is relevant to the construction of
> parallel machines and algorithms for computational problems ... this
> definition of concurrency, and the formal models surrounding it are
> all motivated in some sense by the design and implementation of
> operating systems.  One could argue that they are largely irrelevant
> to the design and implementation of parallel algorithms.

> For example, in the theoretical computer science community, most
> parallel algorithms are designed not with concurrent process algebras,
> pomsets, or any of that.  Rather, the model of computation is SPMD
> shared memory.  In other words, the emphasis is on operating on
> parallel data structures, not on coordinating concurrent processes.

I don't understand the point here - are you just having a dig at
Vaughan? If I read you correctly, you are distinguishing between
"computer science" and "theoretical computer science"? You seem to
suggest that the theories of concurrency have no place in "theoretical
computer science".  Well, frankly, lift your head up and take a look
around you!

Nor can I agree that the formal models of concurrency are motivated in
any sense by the design and implementation of operating systems alone.
As evidence to support my contention I point to the work of Bill
Roscoe, Tony Hoare, Robin Milner and many others. It is true that in the
USA a lot of systems research has focused on operating system support
for concurrent models, but that is just one aspect of a multifaceted
science. As Vaughan points out, a lot of work has been done in Europe.

Another facet of computer science focuses on the design of algorithms
... very often the focus in such work is very narrow.

Peter goes on to say:

> The reason for this is simple, the amount of parallelism that you can
> obtain scales naturally with the size of the problem.  So, the most
> natural way to get a lot of parallelism is to write one program, and
> replicate it.  Is it really realistic to expect that people could
> program a 2K MIMD machine using two thousand separate programs?  I
> don't think so.  

Well, I think you are just plain wrong, and here's why. You have too
narrow a view of problems. Let's consider a problem I raised earlier:
control and management of a container terminal. Let us now consider the
problem a little (just a little). I have ships arriving at n docks at
intervals.  Each ship carries x containers.  Each container has y goods
earmarked for collection to z departures. Ships are expected, we know
their point of departure, and we know what they carry in the containers.
I also have m trucks arriving at land gates. Each dock has p robots which
remove containers from arriving ships and place them directly on trucks
(let's assume all customs checks are done on ship).  This one container
terminal communicates and cooperates with other such terminals, points
of departure and destination. And so forth ...

Forget control - consider monitoring such a system, which on its own is
already a >2K MIMD machine! Not convinced? OK, there's a bigger one -
consider air traffic control. Look, these are BIG machines. What are you
going to do?  Continue to program the whole problem in PASCAL? COBOL?
ASSEMBLER? :-)

Further, in the back room of each container terminal (circa 2010 AD) and
each air traffic control system are several >10K SIGMA machines.  Among
other things, every day, for ten minutes, each runs one of your wacky
algorithms.  Oh, it's neat. It's bulk synchronous, it does in ten
minutes what it would take 5 years to compute now, and it is one component
in a mighty world built of concurrent systems. And great big ones at
that!

Don't misunderstand me. Replication is a very very important aspect of
concurrent systems, and it will be used widely.

So in answer to Peter's question "is it really realistic to expect that
people could program a 2K MIMD machine using two thousand separate
programs?" - you bet your sweet bippy it is. People do it now. Talk to
anyone programming an air traffic control system and ask them how many
separate programs (read: processes) they have currently!

But ok, that's in the real world, and I'm reluctant to leave it. You
want to consider a single big box of devices? Consider simulation
models. Think about architectural design and simulation in the car
industry. Consider the design and simulation of space probes. Think
about geological and astro-physical simulations. Components of each of
these may indeed be SPMD (or SIMD for that matter) but the whole will be
a MIMD concurrent system.

In the next century this will be small fry (i.e. 2K separate programs).
Expect systems with hundreds of millions of concurrent computing
components. Two years ago I pooh-poohed Ian Barron when he told me
people should be thinking in millions of processes. He'd been having his
house decorated at the time and I figured the fumes were getting to him.
But now, in my view, he's right.  The only obstacle is economic. No small
obstacle that - ask yourself why there is no exploitation of the moon.
It may be that we build one or two 10K-processor machines before we
reach the end of the century ... and nothing interesting happens beyond
that until the year 2100! Imagination is what we need! We're too
short-sighted at the moment.

But it's a problem. How is anyone going to manage such systems? Well, I
don't think it will be one person - just as the large systems I've
mentioned are not now programmed or even managed by a single person. I
expect someone to discover the holy grail of concurrent systems in the
next ten years or so ... and I think that's what Vaughan meant.

Incidentally, the theories of concurrency don't ignore data distribution.
In fact, that's what we're all preoccupied with just now.

Oh, and just in case someone comes back and says I can still do all the
above using an SPMD model, my reply must be that a) you must have enormous
resources on each node and b) all you've done is provide a scheduling
mechanism for a MIMD program. :-)

Mmmm. I still haven't answered Vaughan's comments.

Steven
--
Steven Ericsson Zenith <zenith@ensmp.fr>
Center for Research in Computer Science
Ecole Nationale Superieure des Mines de Paris. France


-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

berryman-harry@CS.YALE.EDU (Harry Scott Berryman) (04/21/91)

WARNING! I AM NOT A HARDWARE JOCK! TAKE MY HARDWARE COMMENTS WITH
                      A GRAIN OF SALT! 


In article <1991Apr18.120233.339@hubcap.clemson.edu> zenith@ensmp.fr (Steven Ericsson Zenith) writes:
>
>They've put together a desktop supercomputer out of a standard 386 PC,
>MSDOS, added UNIX System V, a bunch of transputers each of which connect
>and have shared memory access to a 40MHz i860. The company was founded
>by one of the guys who did the core work on Helios - so I'd expect them
>to live up to their "seamless" claim.

There are two serious problems I see with this kind of design. The first
problem is that the i860s will be far faster than the transputers which link
them. The second is that the i860 will need heaps of fast memory or it will
be cache-bound. These two problems actually rear their ugly heads on the
iPSC/860. The network is not nearly as fast as the processors, and the i860s
spend a lot of time waiting for memory. The two problems aggravate each other.
It is an interesting set of parts they've picked.
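
To put rough numbers on that mismatch, here is a quick order-of-magnitude
check comparing an i860's peak rate against transputer link bandwidth.
The figures (80 MFLOPS peak for a 40MHz i860, 20 Mbit/s per link, 4 links)
are approximate textbook values, not measurements of the GT860.

/* Rough balance check: peak i860 flop rate vs. transputer link bandwidth.
 * Figures are approximate; good for orders of magnitude only. */
#include <stdio.h>

int main(void)
{
    double i860_peak_mflops  = 80.0;  /* 40 MHz i860, single precision (approx.) */
    double link_mbit_per_sec = 20.0;  /* one transputer link, raw rate (approx.) */
    int    links             = 4;     /* links per transputer                    */

    double link_mbytes    = links * link_mbit_per_sec / 8.0;  /* MB/s total      */
    double bytes_per_flop = link_mbytes / i860_peak_mflops;   /* traffic per flop */

    printf("aggregate link bandwidth : %.1f MB/s\n", link_mbytes);
    printf("sustainable traffic      : %.3f bytes per peak flop\n", bytes_per_flop);
    printf("i.e. roughly one 4-byte operand per %d peak flops\n",
           (int)(4.0 / bytes_per_flop + 0.5));
    return 0;
}

At those numbers the links feed roughly one 4-byte operand per thirty-odd
peak flops, which is the kind of imbalance the iPSC/860 comment above is
getting at.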

Custom CMOS may help us out of a bind on some of the parts. You can
build a custom chip for under $100,000. What target audience would we
be building such a beast for? A database machine has far different
constraints from a scientific machine. There is no Cray version of
Ingres, last I heard.

As the whole world (at least my part of it) seems intent on building the
better scientific parallel machine, I suggest we talk about a different
set of problems. Two come to mind:

1) Reservation systems (like for airlines and hotel chains)
2) Digital signal processing (in real time)

I know very little about either application. Anyone out there know any of the
requirements for such systems, or have suggestions for other problems?

I agree with you, Steve, this party's been a bit of a bore lately. Let's
get some action out here.

Harry Scott Berryman 
Yale Computer Science Department and ICASE/NASA Langley Research Center
berryman@cs.yale.edu

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

anand@top.cis.syr.edu (R. Anand | School of Computer and Information Science) (04/21/91)

  Here at Syracuse University, I am surrounded by parallel computers of
all sorts ranging from a Connection Machine to an Encore Multimax.
Even as I type this, I am running a parallel neural-net backpropagation
training program - but not on one of these machines.  Instead, it is
running quite happily on 8 SPARCstation 1+s.  I get a speedup of about
7 with 8  workstations and the resulting throughput is quite
acceptable.

I have seen a number of other postings in the past few months that
indicate that this simple approach has been used successfully by a
number of other people.  I am now coming to the conclusion that, just
as the LISP machines were made obsolete by the rapid increases in
speeds of general-purpose microprocessors, the parallel computer as we
know it today is going to face some severe challenges from networked
workstations.

My program uses ISIS for communication and XDR routines to ensure data
compatibility between different machines. Since broadcasting is the
only kind of communication available in ISIS, it is fortunate that
that's all I need. There is one master processor which broadcasts data
to the compute servers and gets back results.
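
For anyone who has not seen the pattern, here is a minimal single-machine
analogue of that master/worker structure in C, using fork() and pipes in
place of ISIS broadcast and XDR. It shows the shape of the computation
only; it is not Anand's code and does not touch ISIS.

/* Minimal master/worker sketch: the master "broadcasts" the data (here by
 * virtue of fork() copying the address space), each worker computes a
 * partial result over its slice, and the master gathers the results. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

#define NWORKERS 4
#define NDATA    1000

int main(void)
{
    static double data[NDATA];
    int fd[NWORKERS][2];
    int w, i, lo, hi;
    double partial, total = 0.0;

    for (i = 0; i < NDATA; i++)
        data[i] = (double)i;

    for (w = 0; w < NWORKERS; w++) {
        if (pipe(fd[w]) < 0) { perror("pipe"); return 1; }
        switch (fork()) {
        case -1:
            perror("fork"); return 1;
        case 0:                            /* worker: sum my slice of the data */
            lo = w * NDATA / NWORKERS;
            hi = (w + 1) * NDATA / NWORKERS;
            partial = 0.0;
            for (i = lo; i < hi; i++)
                partial += data[i];
            write(fd[w][1], &partial, sizeof partial);
            _exit(0);
        default:                           /* master: carry on forking workers */
            break;
        }
    }

    for (w = 0; w < NWORKERS; w++) {       /* master: gather partial results   */
        read(fd[w][0], &partial, sizeof partial);
        total += partial;
    }
    while (wait(NULL) > 0)
        ;
    printf("sum = %.0f (expected %.0f)\n",
           total, (double)NDATA * (NDATA - 1) / 2.0);
    return 0;
}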

With a conventional parallel computer, you are locked into using
whatever type of processor the machine was designed for. In my case, I
face no such problems. I will be shortly taking advantage of a set of
RS6000s that have become available here and soon thereafter a set of HP
snakes.

In addition, I do not have to use a crummy program development
environment of the sort that usually comes with most parallel
computers. Instead, I use standard g++ (my programs are written in C++)
and gdb which I find to be more than adequate.

Load balancing is trivial with my setup. At the end of each iteration,
the compute servers report the time taken per unit of work to the
overseeing processor. For the next iteration, this overseer then
apportions out the work in inverse proportion to the time taken.  If
someone logs into one of my compute servers, the load balancing simply
shifts work elsewhere. If a machine crashes, ISIS can handle this
situation quite well: my program simply regroups and continues with
fewer compute servers.
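
The apportionment rule is simple enough to write down. The sketch below
hands out a fixed pool of work units in inverse proportion to each
server's reported time per unit; the four timings are invented.

/* Load balancing as described above: each server reports its time per unit
 * of work for the last iteration; the overseer gives it a share of the next
 * iteration inversely proportional to that time.  Example timings made up. */
#include <stdio.h>

#define NSERVERS 4

int main(void)
{
    double time_per_unit[NSERVERS] = { 1.0, 1.2, 3.5, 0.9 }; /* seconds/unit   */
    int    work_units = 1000;               /* total work for next iteration   */
    double speed[NSERVERS], total_speed = 0.0;
    int    share[NSERVERS], assigned = 0, s;

    for (s = 0; s < NSERVERS; s++) {
        speed[s] = 1.0 / time_per_unit[s];  /* faster server => bigger share   */
        total_speed += speed[s];
    }
    for (s = 0; s < NSERVERS; s++) {
        share[s] = (int)(work_units * speed[s] / total_speed);
        assigned += share[s];
    }
    share[0] += work_units - assigned;      /* hand rounding leftovers to one  */

    for (s = 0; s < NSERVERS; s++)
        printf("server %d: %4d units (%.2f s/unit)\n",
               s, share[s], time_per_unit[s]);
    return 0;
}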

OK, so now you tell me: it may be fine for you, but my program is not so
coarse-grained.

My answer is: go back and take a second look at your problem. It may
well be possible to formulate a coarser version than what you are using
now.

Here is a tip: Many of the programs used on the Connection Machine are
based on broadcasting. It is very likely that you will be able to adapt
something you find there.


R. Anand                | School of Computer and Information Science
anand@top.cis.syr.edu   | Syracuse University.

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

pratt@cs.stanford.edu (04/22/91)

In article <9104191644.AA17015@hubcap.clemson.edu>
hugo@griggs.dartmouth.edu (Peter Su) writes:

>One of the assumptions behind this statement is that concurrency as it
>is defined by computer science (i.e. The interaction of independent
>processes executing programs asynchronously, posets, partial orders
>among events in time, temporal logic, whatever) is relevant to the
>construction of parallel machines and algorithms for computational
>problems.

>As has been noted before (see IEEE Software, Sept. 1990), this
>definition of concurrency, and the formal models surrounding it are
>all motivated in some sense by the design and implementation of
>operating systems.  One could argue that they are largely irrelevant
>to the design and implementation of parallel algorithms.

Denying a connection between operating systems and parallel algorithms
is like denying a connection between planetary motion and gyroscope
behavior.  If your account of planetary motion is in terms of Ptolemaic
epicycles then there is no connection.  But if your account is in terms
of Newton's laws then there is a very beautiful connection.

The "epicycle theory" of operating systems is the view of
asynchronously communicating sequential processes mediated by buffered
channels.  This view sheds little light on parallel algorithms other
than those that clearly incorporate such channels.  In contrast
"Newton's laws" for operating systems view their activity in terms of
the dual notions of event occurrences and state trajectories, which is
as much the basis of parallel algorithms as of operating systems and
can shed much light on both.

>For example, in the theoretical computer science community, most
>parallel algorithms are designed not with concurrent process algebras,
>pomsets, or any of that.  Rather, the model of computation is SPMD
>shared memory.  In other words, the emphasis is on operating on
>parallel data structures, not on coordinating concurrent processes.

These are only distinguishable if you insist on too narrow a definition
of "coordinating concurrent processes."  Presumably you are thinking in
terms of coordination by channels, semaphores, monitors, etc., which
certainly do not feature prominently among the coordination primitives
used by many parallel algorithms.

A major shortcoming of parallel programming today is a lack of good
primitives to serve as a basis for structured parallel programming.  A
decent process algebra will supply such primitives.  These will be of
at least as much help in writing, understanding, and verifying SIMD
programs as if-then-else's and while-do's are for sequential programs.
Then you will find people starting to design their parallel algorithms
with concurrent process algebras (for which pomsets are just one of
several plausible models) instead of with sequential control
structures.  Although a good deal of progress in that direction has
been made during the 1980's, we still have a long way to go before that
vision is fully realized.
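
To make the analogy with if-then-else's concrete, here is one toy
candidate for such a primitive: a par() that runs two functions as
separate Unix processes and waits for both. It is only my illustration of
the kind of structuring construct being asked for, not something lifted
from CSP, CCS or any particular process algebra.

/* A toy structuring primitive for parallel composition: par(f, g) runs f
 * and g as separate processes and returns when both have finished. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void par(void (*f)(void), void (*g)(void))
{
    pid_t pid = fork();
    if (pid < 0) {           /* cannot fork: fall back to sequential composition */
        f();
        g();
        return;
    }
    if (pid == 0) {          /* child process runs the right-hand branch */
        g();
        _exit(0);
    }
    f();                     /* parent runs the left-hand branch ...            */
    waitpid(pid, NULL, 0);   /* ... and joins with the child before returning   */
}

static void left(void)  { printf("left branch  (pid %d)\n", (int)getpid()); }
static void right(void) { printf("right branch (pid %d)\n", (int)getpid()); }

int main(void)
{
    par(left, right);        /* the two branches run concurrently */
    printf("both branches done\n");
    return 0;
}

A real set of primitives would of course compose - par of par, par under
iteration, and so on - and carry a semantics you can reason with; that is
exactly the part that is not off the shelf today.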

>The reason for this is simple, the amount of parallelism that you can
>obtain scales naturally with the size of the problem.  So, the most
>natural way to get a lot of parallelism is to write one program, and
>replicate it.

That approach constitutes the assembly language of parallel
programming: Turing powerful but devoid of structure.  In mathematics
you could argue with equal logic that the natural way to build a vector
space is start with one point and replicate it.  That gets you as far
as the underlying set of the space, and indeed to many people a
geometric space *is* the set of its points.  But that doesn't do
justice to vector spaces.  Set theory is only the assembly language of
mathematics.  A vector space adds structure to that set,
differentiating it from other set-based structures such as lattices and
graphs and grammars.  Types are more than just sets, and parallel
programs are more than just replicated programs: they have useful
structure.  If you don't see that structure you don't understand your
program efficiently.

Structured programming was a hard sell in 1970, and it doesn't seem to
have gotten much easier 20 years later.  When it's available and
working smoothly many people buy it and use it, yet do not acknowledge
it.  And when it's not available they insist they're doing just fine
without it.  Some acknowledge it, but it is depressing that so many do
not.

In between are those that acknowledge the past but not the future:
they like what's happened so far but refuse to believe that anything
more can happen.  They believe that all the good inventions in
structured programming have already been invented and it is now a
closed subject.

There is some structure available now for structured parallel
programming, but nowhere near enough in my opinion.  A whole lot more
work is needed before the tools are in place to create really
insightful representations of concurrent programs.  Tools based on the
view that a concurrent program is a collection of communicating
sequential processes, or that it is the sum of its execution sequences,
are not universal enough to be useful to people working in parallel
programming.  We need much better views in order to understand our
parallel programs insightfully.  Better views of concurrent computation
are among the most important goals of modern concurrency theory
research.

>Is it really realistic to expect that people could
>program a 2K MIMD machine using two thousand separate programs?  I
>don't think so.  

This has nothing to do with concurrency theory, which neither limits
itself to MIMD nor insists that the n programs running on an
n-processor MIMD machine all be distinct.

>Thus, the main problem in parallel computing from my point of view
>(programmer and algorithm designer) is how to manage the placement and
>movement of *data* in the machine.  So, I wonder what all the theories
>about concurrency have to do with this.

They have everything to do with it.  They tell you how to think about
it happening in parallel.  If you have a better theory of how data can
move and interact concurrently than the present theories of concurrency
then you're a contributor to this field whether you admit it or not.
If you don't have a better theory then one must infer from your
objections (to the work of those who do) that you don't believe the
parallel programming world needs a theory of data movement and
interaction.

	Vaughan Pratt

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

will@uunet.UU.NET (William Pickles) (04/22/91)

zenith@ensmp.fr (Steven Ericsson Zenith) writes:

>Comp.parallel's been a bit boring lately, so come on guys. Building a
>parallel supercomputer today out of off the shelf parts has got to be
>real easy (given the time and money). Hasn't it? How about candidates

... for a real programming model for this type of machine. Any fool can
build it, but the article in December's Scientific American shows that most
supercomputers end up running two orders of magnitude slower than their
build speed. Programming for consistent speed is not easy.

William Pickles


-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

zenith@ensmp.fr (Steven Ericsson Zenith) (04/23/91)

   zenith@ensmp.fr (Steven Ericsson Zenith) writes:

   >Comp.parallel's been a bit boring lately, so come on guys. Building a
   >parallel supercomputer today out of off the shelf parts has got to be
   >real easy (given the time and money). Hasn't it? How about candidates

   ... for a real programming model for this type of machine. Any fool can
   build it, but the article in December's Scientific American shows that most
   supercomputers end up running two orders of magnitude slower than their
   build speed. Programming for consistent speed is not easy.

   William Pickles

William, you haven't been paying attention. You are, of course,
absolutely right. But please, don't tempt me ... ;-)

Steven
PS. Requests for addition to the Ease mailing list can be sent to
- ease-request@ensmp.fr :-)
--
Steven Ericsson Zenith <zenith@ensmp.fr>
Center for Research in Computer Science
Ecole Nationale Superieure des Mines de Paris. France


-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell