[comp.arch] MYRIAS - yet again

mdr@reed.UUCP (Mike Rutenberg) (12/07/89)

Ok, I hate to harp on it, but I have recently seen lots of boasts
in the non-technical Canadian media about Myrias and their super duper
fast computer.

Does anyone have *any* details of the architecture that they can talk about?
Is Myrias simply a /dev/null for tax dollars (both Canadian and DARPA)
or are there real machines with a real architecture that are being
used for something real?

Mike
-- 
Mike Rutenberg
BITNET: mdr@reed.bitnet      UUCP: uunet!tektronix!reed!mdr

grunwald@foobar.colorado.edu (Dirk Grunwald) (12/08/89)

We have one here. I haven't used it, but I've seen speedup curves for
it.  At best, it does about 10 Mflops, because they're using 68020s
with 68881s.

The Myrias architecture is based on parallel loops. Completely parallel.
It's not a shared memory machine.

Imagine you have a parent process with a page of memory. It spawns a
child.  The parent & child touch the page. The page is copied on write.
We now have a Master (M), Parent (P) and Child (C) copy.

When the child dies, we 'merge' the different pages.
Semantics are:
	+ parent or child touch a bit -> get their new bit
	+ both parent & child touch same bit -> get junk

Do this using M xor P xor C, for the entire page.
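
A minimal sketch of that merge rule, assuming a page is just a byte
string; merge_page and the sample pages are made up for illustration
and are not Myrias code:

```python
def merge_page(master: bytes, parent: bytes, child: bytes) -> bytes:
    """Merge copy-on-write pages bit by bit as M xor P xor C."""
    if not (len(master) == len(parent) == len(child)):
        raise ValueError("pages must be the same size")
    return bytes(m ^ p ^ c for m, p, c in zip(master, parent, child))


# Hypothetical two-byte "pages".
master = bytes([0b0000_0000, 0b1111_0000])
parent = bytes([0b0000_0011, 0b1111_0000])  # parent changed byte 0 only
child = bytes([0b0000_0000, 0b1111_1111])   # child changed byte 1 only

# Each side's private change survives the merge.
merged = merge_page(master, parent, child)

# If both sides flip the same bit, the two changes cancel and the
# master's old bit comes back: the "junk" case.
conflict = merge_page(b"\x00", b"\x01", b"\x01")
```

Where only one side changed a bit, the other side still equals M, so
the two equal terms cancel and the changed value is what is left.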

It's not clear to me that they have any special hardware for this.
Some say aye, some say nay. They do the merging in background. Aside
from the above (which is a simple O/S hack), the box is basically a
lot like an iPSC/2 - circuit switched networks to shuffle 4K pages
around. One could wonder why they didn't just repackage the iPSC/2,
since that would be faster.

If you have a parallel fortran loop with no cross-iteration
dependencies (=), this is a godsend, 'cause it's cheap parallelism. If
you have *any* cross-iteration dependencies (< or >), then it sucks,
because there is *no* (according to the meeting I went to) synchronization
between processors.
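
A toy model of why that matters, assuming each iteration runs against
a private snapshot of memory taken at loop entry and that its writes
are merged back afterwards. run_pardo, run_sequential and the two loop
bodies are hypothetical names for illustration, not the Myrias API:

```python
from typing import Callable, List

Body = Callable[[int, List[int]], None]


def run_sequential(body: Body, n: int, mem: List[int]) -> List[int]:
    """Ordinary loop: every iteration sees the previous one's writes."""
    mem = list(mem)
    for i in range(n):
        body(i, mem)
    return mem


def run_pardo(body: Body, n: int, mem: List[int]) -> List[int]:
    """Each iteration gets its own copy of mem; changed words are merged."""
    merged = list(mem)
    for i in range(n):
        private = list(mem)          # snapshot as of loop entry
        body(i, private)
        for k in range(len(mem)):
            if private[k] != mem[k]:  # keep only what this iteration changed
                merged[k] = private[k]
    return merged


def double(i: int, a: List[int]) -> None:
    a[i] = 2 * a[i]                  # no cross-iteration dependence


def running_sum(i: int, a: List[int]) -> None:
    if i > 0:
        a[i] = a[i - 1] + a[i]       # needs the previous iteration's result


data = [1, 2, 3, 4]
independent_ok = run_pardo(double, 4, data) == run_sequential(double, 4, data)
dependent_ok = run_pardo(running_sum, 4, data) == run_sequential(running_sum, 4, data)
```

In this model the independent loop matches the sequential answer,
while the running sum does not, because every iteration reads the old
a[i-1] from its snapshot.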

For the applications it was designed for, it's a reasonable
design (although not terribly fast).

Dirk Grunwald -- Univ. of Colorado at Boulder	(grunwald@foobar.colorado.edu)
						(grunwald@boulder.colorado.edu)

ingoldsb@ctycal.UUCP (Terry Ingoldsby) (12/13/89)

In article <13683@reed.UUCP>, mdr@reed.UUCP (Mike Rutenberg) writes:
> Does anyone have *any* details of the architecture that they can talk about?
> Is Myrias simply a /dev/null for tax dollars (both Canadian and DARPA)
> or are there real machines with a real architecture that are being
> used for something real?


Myrias is based in Edmonton Alberta.  They are real, and have a real
(as in working) multiprocessor machine.  I believe the machine uses
68030 processors.  It seems to me that they have several US DOD
contracts.


-- 
  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                           or
  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

serafini@amelia.nas.nasa.gov (David B. Serafini) (12/14/89)

In article <515@ctycal.UUCP> ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes:
>In article <13683@reed.UUCP>, mdr@reed.UUCP (Mike Rutenberg) writes:
>> Does anyone have *any* details of the architecture that they can talk about?
>
>Myrias is based in Edmonton Alberta.  They are real, and have a real
>(as in working) multiprocessor machine.  I believe the machine uses
>68030 processors. 
>-- 
>  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
>  Land Information Systems                           or
>  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

The processors are 68020's.  They are going to 88000 and hope to ship mid '90.
The machine is hierarchical.  There are 4 processors on a board, with a full 
interconnect between these.  There are 16 boards in a card cage.  There are 
two busses that interconnect the boards.  Any board can use either bus.  
There are 5 serial com lines coming out of each cage, so they can be inter-
connected.  I think they've built a 512 proc. machine, but I might be wrong.
I believe the com lines are FDDI using the AMD chip set.  The number of lines
is determined by how many chips fit on a board.
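
The arithmetic implied by that hierarchy, as a small sketch. The
constants come from the description above; the function names are
made up, and the 512-processor figure is the unconfirmed one
mentioned above:

```python
PROCESSORS_PER_BOARD = 4
BOARDS_PER_CAGE = 16


def processors_per_cage() -> int:
    """Processors in one fully populated card cage."""
    return PROCESSORS_PER_BOARD * BOARDS_PER_CAGE


def cages_needed(total_processors: int) -> int:
    """Full cages required for a machine of the given size (rounded up)."""
    per_cage = processors_per_cage()
    return (total_processors + per_cage - 1) // per_cage


per_cage = processors_per_cage()
cages_for_512 = cages_needed(512)
```

So a cage holds 64 processors, and a 512-processor machine would be
8 cages joined over the serial lines.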

It's a real machine.  They've sold some.  The software is more important than
the hardware since they're trying to build a programming paradigm that will be
both easy to use and easy to port.  They claim that converting old code takes
hours or days instead of months.  Basically anything that can be vectorized
on a Cray can be parallelized on the Myrias.  They downplay the issues of
interconnect performance (latency and bandwidth) more than they should (IMHO),
but for some applications it has great potential for scalability.  Like the
i860 Intel iPSC, the 88000 Myrias will have performance like a full-up Y-MP,
if you can get at it.

<dbs>

David B. Serafini				serafini@ralph.arc.nasa.gov
Rose Engineering and Research 
@NASA/Ames Research Center
MS 227-6
Moffett Field, CA 94035

grunwald@foobar.colorado.edu (Dirk Grunwald) (12/15/89)

DBS> the hardware since they're trying to build a programming paradigm that will be
DBS> both easy to use and easy to port.  They claim that converting old code takes
DBS> hours or days instead of months.  Basically anything that can be vectorized
DBS> on a Cray can be parallelized on the Myrias.  They downplay the issues of


While it may be possible, I don't think it's practical. According to
the talk Myrias gave here (we have one somewhere, see earlier note)
there is no synchronization possible.

Thus, you can't cheaply parallelize:

   Do I = 2, N
    A(I) = B(I) * C(I)
    D(I) = A(I-1) * C(I)
   end

On the Cray, this would be vectorized:
	A(2:N) = B(2:N) * C(2:N)
	D(2:N) = A(1:N-1) * C(2:N)

On a machine with synchronization, you could say:

   Doall I = 2, N
    A(I) = B(I) * C(I)
    POST(A,I)
    if (I != 2) WAIT(A,I-1)
    D(I) = A(I-1) * C(I)
   end

or
   Doall I = 2,N
    A(I) = B(I) * C(I)
   end
   Doall I = 2,N
    D(I) = A(I-1) * C(I)
   end

The Myrias forces the latter, because it has no synchronization. You could
optimize this a little...

   S = (N-1)/Processors
   Doall IP = 1,Processors
   LO = 2 + (IP-1)*S
   Do I = LO, LO + S - 1
    A(I) = B(I) * C(I)
    if (I != LO)
	D(I) = A(I-1) * C(I)
   end
   end
   Doall IP = 1,Processors
    LO = 2 + (IP-1)*S
    D(LO) = A(LO-1) * C(LO)
   end

(more or less -- you just strip mine the loop based on the number of
  processors, execute all first statements, and only the second statements
  that are local to your strip, merge pages and then assign all
  cross-process iterations; this assumes Processors divides N-1 evenly)
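
A sketch of the strip-mining idea, checked against the plain
sequential loop. It assumes each strip works on a private copy of A
that is merged back when the strip finishes; sequential, strip_mined
and the test data are illustrative names only. Arrays are 1-based
to match the Fortran, so index 0 is unused:

```python
from typing import List, Tuple


def sequential(b: List[int], c: List[int], a1: int, n: int) -> Tuple[List[int], List[int]]:
    """Reference: the original Do I = 2, N loop."""
    a = [0] * (n + 1)
    d = [0] * (n + 1)
    a[1] = a1
    for i in range(2, n + 1):
        a[i] = b[i] * c[i]
        d[i] = a[i - 1] * c[i]
    return a, d


def strip_mined(b: List[int], c: List[int], a1: int, n: int,
                processors: int) -> Tuple[List[int], List[int]]:
    """Two-phase version: local work per strip, then the strip boundaries."""
    a = [0] * (n + 1)
    d = [0] * (n + 1)
    a[1] = a1
    size = -(-(n - 1) // processors)              # ceiling of (N-1)/Processors
    strips = [(lo, min(lo + size - 1, n)) for lo in range(2, n + 1, size)]

    # Phase 1: every strip sees only the A it had at loop entry.
    a_at_entry = list(a)
    for lo, hi in strips:
        a_private = list(a_at_entry)
        for i in range(lo, hi + 1):
            a_private[i] = b[i] * c[i]
            if i != lo:                           # A(I-1) is local to this strip
                d[i] = a_private[i - 1] * c[i]
        a[lo:hi + 1] = a_private[lo:hi + 1]       # stand-in for the page merge

    # Phase 2: after the merge, A(LO-1) from the neighbouring strip is visible.
    for lo, _hi in strips:
        d[lo] = a[lo - 1] * c[lo]
    return a, d


n = 10
b = [i + 1 for i in range(n + 1)]
c = [2 * i + 3 for i in range(n + 1)]
matches = all(strip_mined(b, c, 7, n, p) == sequential(b, c, 7, n)
              for p in range(1, 6))
```

Unlike the pseudocode, the sketch rounds the strip size up, so the
last strip may be short when the processor count does not divide N-1.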

But you'll need to force a page merge between the two doall loops (I
think they call them 'pardo' or something).

It's not clear to me that this is going to be faster than e.g. a
CM-2 or a Cray.

For loops involving no cross-iteration dependence, however, it should
work well. I believe this is what they had intended, by the way, because
the designers (a physicist?) had several problems with no
cross-iteration dependence.

cmt@myrias.com (Chris Thomson) (12/18/89)

In article <13683@reed.UUCP> mdr@reed.UUCP (Mike Rutenberg) writes:
>Ok, I hate to harp on it, ... Does anyone have *any* details of the
>architecture that they can talk about?  Is Myrias simply a /dev/null ...
>or are there real machines with a real architecture that are being
>used for something real?

Yes, we are real, and so is our system.  Architectural info to follow
in subsequent postings.

We have 8 systems installed:
   - Myrias, Edmonton, Canada: 768 PE's (varies)
   - Alberta Research Council, Edmonton, Canada: 64 PE's
   - Department of Defense, Maryland, USA: 128 PE's
   - Department of National Defense, Ottawa, Canada: 128 PE's
   - University of Calgary, Calgary, Canada: 64 PE's
   - Colorado Center for Applied Parallel Processing, Boulder, USA: 64 PE's
   - Air Force Weapons Lab, Albuquerque, USA: 64 PE's

The seven offsite systems have been shipped since April 1989.

It's not really for me to say, but my impression of what is done with
these systems is that it is "real".  Perhaps some of our users will
comment.  Application areas I know of include seismic, chemistry,
biochemistry, physics, ray tracing, marine biology, and others.
-- 
Chris Thomson, Myrias Research Corporation   uunet!myrias!cmt or cmt@myrias.com
900 10611 98 Ave, Edmonton Alberta, Canada   Tel 403-428-1616  Fax 403-421-8979

brb@myrias.com (Brian Baird) (12/19/89)

In article <629968724.4593@myrias.com> cmt@myrias.com (Chris Thomson) writes:
Chris> We have 8 systems installed:
Chris> - Myrias, Edmonton, Canada: 768 PE's (varies)
Chris> - Alberta Research Council, Edmonton, Canada: 64 PE's
Chris> - Department of Defense, Maryland, USA: 128 PE's
Chris> - Department of National Defense, Ottawa, Canada: 128 PE's
Chris> - University of Calgary, Calgary, Canada: 64 PE's
Chris> - Colorado Center for Applied Parallel Processing, Boulder, USA: 64 PE's
Chris> - Air Force Weapons Lab, Albuquerque, USA: 64 PE's
Chris> The seven offsite systems have been shipped since April 1989.

The eighth site (for those of you keeping count) is
       - University of Alberta, Edmonton, Canada: 64 processors
-- 
Brian Baird				brb@myrias.com
Myrias Research, Edmonton		{uunet,alberta}!myrias!brb