[comp.arch] Multiprocessors and UNIX

leo@fulcrum.bt.co.uk (Leo Howe ) (01/17/90)

In a previous article submitted to comp.os.research I (Leo Howe) wrote:

>> As part of an investigation into the performance of UNIX on
>> various hardware configurations, I am currently involved in
>> a study of the 'behaviour' of UNIX on multiprocessor based
>> hardware.
>> The aims and objectives of the study are:

>>         a) To find out which M'processor hardware architectures
>> have been employed for running UNIX.

>>         b) To find out which Operating system styles have been
>> employed, e.g. Master/Slave, Symmetrical etc.

>>         c) To determine the performance implications of porting
>> UNIX to M'processor based hardware.

>>         d) To find out what AT&T are doing in order to develop a new
>> organisation of UNIX for making efficient use of M'processor
>> based hardware.

>>         e) To find out whether or not UNIX retains its portability
>> across M'processor systems.

>>         f) To obtain information on the price/performance ratios
>> of M'processor systems running UNIX.

>>    I would be grateful for any information on the above areas, and
>> also any references.

>>    The main investigation also involves obtaining a price /
>> performance model for computer systems which can be used to determine
>> whether or not the increase in performance is worth the increase in
>> price. Any information on currently used price/ performance models,
>> and references on the said subject are also welcome.

>>    Please e-mail any information to me at:
>>                             leo@fulcrum.bt.co.uk

>>    Thank you.
>>           LEO.

	The following is a summary of the replies that I have
received to date.
If anyone else would like to make a contribution please e-mail
me at the address given above.

------------------------------------------------------------------------

Have a look at 

\bibitem{Rus87} C. H. Russell and P. J. Waterman, ``Variations on UNIX for
Parallel-Processing Computers'', {\em Comm.\ ACM} 30,12, (Dec.\ 1987),
pp.\ 1048--1055.
-Describes a variety of UNIX-based O.S. efforts, with by far the most
space devoted to CMU's MACH. The BBN Butterfly implementation of MACH is
described. Other systems, including U of T's Tunis, are described briefly.

\bibitem{26}P. Ewens, D. Blythe, M. Funkenhauser, and R. Holt,
``Tunis: A Distributed Multiprocessor Operating System'',
{\em Proc.\ Summer USENIX}, June 1985, pp.\ 247--254.

This is symmetric Tunis. There is also a master-slave version
of Tunis, but unfortunately I don't have a specific reference.

Then there's the Mach stuff

\bibitem{Acc86} M. Accetta et al., ``Mach: A New Kernel Foundation for UNIX
Development'', {\em Proc.\ Summer USENIX}, July 1986, pp.\ 93--112.

\bibitem{You87} M. Young et al.,
``The Duality of Memory and Communication in the Implementation of a
Multiprocessor Operating System'', {\em Proc.\ 11th ACM Symposium on Operating
Systems Principles}, Nov.\ 1987, pp.\ 63--76.

and surely several others...

None of these papers is that recent. You should be able to find plenty of
other stuff with '88 or '89 stamped on it.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I saw your posting to usenet concerning Unix and Multiprocessors.

I am the director of the Centre for Multiprocessors (CMP), part of the Computing
Laboratory of the University of Newcastle upon Tyne. Part of the role of
CMP is to provide British industry with access to (Unix-based) shared
memory multiprocessors so they can answer questions such as the ones you
were posing. As well, we can help provide answers, benchmarking facilities,
etc. We have two Unix-based multiprocessors here at Newcastle (Encore
Multimaxes) and we are very familiar with their behaviour.

I've got some information about CMP (and multiprocessors). If you would
like to receive that, please send me your postal address. 

If you have any other questions, please email me.

Pete Lee
Prof. of Computing Science.

----------------------------------------------------------------------------

Had your request on MP UNIX forwarded to me.  Silicon Graphics sells a
system called the "PowerSeries" which features from 1 to 8 MIPS R3000 RISC
processors in a multiprocessor configuration.  The operating system is
a symmetric, fully threaded version of System V.3 with Berkeley extensions
called IRIX.  It has been shipping for 20 months, and there are > 1500 units
currently in the field.  The 4D/280, the largest configuration (8 processors)
runs $179K in the base configuration, with a Dhrystone performance of
160 MIPS (8 copies in parallel), thus being about $1000/MIP.  IRIX has
been ported to an old machine (CDC Cyber 180) and a new supercomputer (as yet
to be announced [ not by us ]).
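
The dollars-per-MIPS figure quoted above is a single division. A quick
check in Python, using only the two numbers given in the message (the
function name is made up for illustration):

```python
def dollars_per_mips(price_dollars, mips):
    """Price/performance expressed as dollars per MIPS (lower is better)."""
    if mips <= 0:
        raise ValueError("mips must be positive")
    return price_dollars / mips

# Figures quoted in the message: $179K base price, 160 Dhrystone MIPS
# (8 copies run in parallel on the 8-processor 4D/280).
sgi_4d280 = dollars_per_mips(179_000, 160)
print(f"4D/280: ${sgi_4d280:,.0f} per MIPS")
```

The result is a little over $1,100 per MIPS, so "about $1000/MIP" is a
fair round figure for the base configuration.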

You can get lots of good information, to satisfy just about any need, by
contacting the UNIX International Multiprocessor Working Group.  The
representative is Tom Bishop, bishop@uiunix.ui.org.  They have a trailerfull
of information on state-of-the-art multiprocessor UNIX.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

	a)
The Monash Multiprocessor is a tightly-coupled shared memory machine in which
all memory is accessible to all processors.
The Monash Multiprocessor runs a capability based operating system kernel.
We are currently implementing a version of Berkeley 4.3 UNIX on top of it.

	b)
Our UNIX port is being implemented in-process. A 'Unix' user process on our
capability-based machine essentially carries around its own copy (actually it
is shared) of the UNIX kernel which is called like library subroutines. In that
sense it is a symmetrical style, in which each process performs its own
kernel services. The capability system provides sufficient protection to do
this.

	c)
Too early to say until the system is complete but we anticipate that it should
run better than conventional ports because we believe our capability-based
system to be more efficient. We can see no reason for performance penalties
compared to conventional systems providing that a proper multithreaded kernel
is implemented. However this is a non-trivial task. The simplest ports are
master/slave arrangements; to gain fully symmetrical operation without
performance bottlenecks requires a complete kernel reimplementation.

	d) 
I also would like to know.

	e) 
Standard UNIX without using machine dependent functions should not present
portability problems. Of course, how do you get people to write portable
programs?

	f)
I also would like to know.

If you are interested in our capability-based multiprocessor kernel and
architecture have a look at 'The Computer Journal', Vol 29, No 1, 1986, pp 1-8.
The UNIX implementation has a fair way to go yet. It could be a year or more
before it is complete, although it is already to the stage where we can run
a Bourne Shell.

-------------------------------------------------------------------------

HP/Apollo has the DN1000 running UNIX on up to 4 processors
and HP just announced the HP9000 8?0.

The DN1000 is a symmetric multi-processor.

You know about CMU's MACH, right?

Solbourne has Sun4 compatibles running in a master/slave multiprocessor
configuration.

It's been done several other times, Encore, universities...

-Jan Hardenbergh - jch@apollo.hp.com - HP / Graphics Technology Division

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Here's a few quick answers to your query.  Send specific questions
if you need more information.

a) UNIX has been ported to the Cray X-MP, Y-MP and Cray-2 systems.
   All use multiple (2 - 8) processors attached to a single central
   memory.  The X/Y-MP has shared register and multiple semaphore
   support, the Cray-2 has a single semaphore.

b) Symmetrical, but largely single threaded.  We multithread our interrupt
   and system calls, but not multiple simultaneous system calls.  Our
   average workload is much more compute bound than most UNIX systems
   (95+% user time) so this is less of a problem than one would expect.
   Larger numbers of cpus are making this issue more important than before.

c) ???

d) Good luck.  All I know about is the multithreaded 3B20 system.  I think
   more/better work has been done since then.

e) If you single thread system calls it is surprisingly portable.

f) Price performance for what?  Syscalls or floating point operations?
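
The point in (b) can be put into rough numbers. The sketch below is a
toy model, not anything measured on a Cray: it assumes every CPU spends
the same fraction of its time in a kernel that only one CPU can execute
at a time, and it ignores queueing delays, so it gives an upper bound
only. The function names are hypothetical; the 0.05 comes from the
"95+% user time" figure above.

```python
def max_useful_cpus(kernel_fraction):
    """CPU count at which a fully serialized kernel is saturated."""
    if not 0 < kernel_fraction <= 1:
        raise ValueError("kernel_fraction must be in (0, 1]")
    return 1.0 / kernel_fraction

def throughput_bound(n_cpus, kernel_fraction):
    """Upper bound on throughput, in units of one CPU's worth of work."""
    return min(n_cpus, max_useful_cpus(kernel_fraction))

# 5% kernel time: the ceiling sits at about 20 CPUs, well above 2 to 8.
for n in (2, 8, 16, 32):
    print(f"{n:2d} CPUs -> at most {throughput_bound(n, 0.05):.0f} CPUs of work")
```

Under these assumptions a 5% kernel share does not limit a 2 to 8 CPU
machine, but the ceiling is reached at around 20 CPUs. A workload
spending a quarter of its time in the kernel (a made-up figure) would
hit the ceiling at 4 CPUs.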

If you want more information about the Cray UNIX port, there is a paper
by Tim Hoel and Bruce Keller in the Denver Usenix proceedings (Jan '86?).
It's a little dated, but might provide a few more details.

-Vic Lee -- vtl@dumpster.cray.com

---------------------------------------------------------------------------

To find out the ratio, you need to specify the environment:
number crunching, i/o, database, tp, etc.
Ed

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Hi Leo,

I have picked this up from USENET and feel that Sequent could probably
give you a lot of information on how we have implemented UNIX on top
of a multiprocessor based hardware platform. If you can send me your
postal address and phone number I'd like to send you some detailed
information on Sequent hardware and our operating system and maybe arrange
a meeting to pass on some additional details on our version of UNIX.

As a starting point here are some very short answers to some of the questions
but I'd really like to send you more details in the post.

I look forward to hearing from you,

Andy Gadsby

gadsby@lonsqnt.co.uk

         a)
Sequent Symmetry and Sequent Balance (2 - 30 processors)
 
         b)
Full Symmetrical.

         c)
We see linear performance improvements as processors are added provided
you balance the system in memory and I/O terms.

---------------------------------------------------------------------------

The cheapest multi-processor Unix system (actually Xenix at the moment) of
which I am aware comes from Corollary.  It lets you put up to five 386
processors in a single 386 cabinet.  Their serial card supports up to 64
ports, so you can get a fairly hefty system.  The problem with it (as I
see it) is that you can't put enough memory into the system.
I know about these only because Unixsys S.A. (our French counterparts) are
involved with them.  I understand that a 386/ix version will be available
at some point, which will of course be much better than Xenix.

This is a Master/Slaves system.

Then there are machines like the Sequent Balance and Symmetry.

These are powerful, but the operating system is a mixture of BSD and System V,
which turn out not to sit well together.  They are more symmetrical.

In either case, a Unix process only ever runs on a single processor (although
Sequent at least has system calls to allow one to state a preference, and
some limited support for light-weight processes).  They offer Mach as an
option, although probably not in the UK because (until very recently) Mach
came under USA export restrictions.

The i80486 + i80860 combination is one that will almost certainly take off
next year.  The first machines are already appearing, and Mach is being
ported, which will give full BSD compatibility plus light-weight processes,
etc., and should be very, very fast.


Acer Counterpoint also have a multi-processor system based on the Motorola
68K series.  You can add processors, I think up to 32 (you are welcome to
telephone me for more info, although I am in Canada (visiting SoftQuad)
until the 21st Dec.)

        c)
If you are planning to do this port yourself... 't ain't easy :-) :-)

        d)
Not clear, but look at the work Intel is doing with Mach.

        e)
No.  But it can be made to do so, with some (not all) of the caveats of
systems like NFS.  The better implementations (e.g. Sequent) are more like
RFS in some ways -- you don't notice that there is more than one CPU.

        f)
This is harder.  Things are changing too fast!

	g) performance model
This depends *entirely* on your applications.
Running a single, large program may well go no faster on a Sequent than on
a fast 386 under 386/ix, because it is running on a single CPU.
The big difference is only felt when you have a lot of users, or are
doing work involving several processes -- e.g. cc or make.
The performance increase is in higher throughput.

Lee      lee@sq.com
Liam R. Quin, Unixsys (UK) Ltd., Knutsford... +44 565 50021
(note: new number -- we have moved.)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

>>> I work for Sequent Computers in Technical Marketing.  I'm
>>> going to try to answer your questions here but I'd suggest
>>> calling our London Office and talking to one of our analysts
>>> or give me a call if you prefer.  I'm in Beaverton, OR USA
>>> The number is (503) 526-4280.  Sorry, I don't know the country
>>> codes.

        a) 
>>> We have a 30 Intel 386 processor symmetric architecture that runs a
>>> derivative of UNIX called DYNIX

        b)
>>> We use a symmetric architecture.  All CPUS do everything.  There
>>> are no terminal processors or file system processor or
>>> interrupt processors.  

        c)
>>> It goes real fast :-).  Seriously, we see near linear speedups
>>> in throughput without modifications to code.  A single program
>>> will run at the speed of a 386 but can be modified by hand to
>>> run across all processors.

        d) 
>>> Don't know.

        e)
>>> It is portable at the user level.  All programs function
>>> properly and all interfaces have been maintained.  The
>>> kernel looks different from "standard" UNIX.  However, our
>>> implementation is still as portable as "standard" UNIX.
>>> We have actually ported it from the NS32032 to the 386 and
>>> are about to port it to the 486.

        f)
>>> You'll have to talk to the sales guys about prices.  I don't
>>> know them.  But, I do know that price/performance is one of
>>> the major reasons for our success.

>>> Bart Kessler
>>> Sequent Technical Marketing
>>> Beaverton OR
>>> ..!uunet!sequent!kessler

---------------------------------------------------------------------------

In a recent article in SIGARCH Barry Wolman and Thomas Olson describe a
benchmark for databases.  They show the relative performance of several
different systems under load.  They do not indicate the price.  

The benchmark is available from them as they have agreed to provide it to
those who ask.  If you do not have access to SIGARCH for October,
please feel free to contact Barry Wolman directly: barry@s66.prime.com.

Ed Feustel
Prime Computer

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I don't have much info, but HP produced (past tense, it's been
discontinued) a multiprocessor Unix starting around 1983 or so.  The
series 500 had one to three CPU's.  You might try contacting your local
HP sales office for information.

Rob Sartin			internet: sartin@hplabs.hp.com
Software Technology Lab 	uucp    : hplabs!sartin
Hewlett-Packard			voice	: (415) 857-7592

---------------------------------------------------------------------------

         a)
I am familiar with two: the Encore Multimax and the Sequent Balance/Symmetry.
The Encore and the Balance use the NS32K family chips and the Symmetry uses
the i386.

         b)
All three (the Balance and Symmetry are roughly identical to the software
and OS) use the symmetrical model.  

         c)
Wonderful for a many user system.  To double your processing power double the
number of processors.  The jobs get done in the same amount of time, but twice
as many jobs can be run at the same time.  If YOU need double the processing
power, you either start to do parallel software work, or you get better
hardware.

         d)
Dunno.  

         e)
Certainly, at least on our Sequent Symmetry, almost every piece of software
I've tried to make work has done so.

Contact Sequent or Encore for sales hype info.  There are also some articles
on these machines that you might try to find (I haven't seen them in a while
and my access to the on-line database for this is down for the count).

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Hi, Leo,

	I'm a software engineer for Encore Computer Corp.  We build a
multiprocessor system called the Multimax.

a.  The current generation of Multimax hardware (the 520 series) is
    configurable for 2 to 20 ns32532 micro processors with up to 128 meg of
    memory.   It is a symmetric shared-memory multiprocessor.

    The next generation of Multimax hardware will be based on the motorola
    88k risc processor, configurable from 4 to 32 (40?) processors and up
    to 512 meg of memory.

b.  Our operating system is symmetrical.  I don't think a multiprocessor box
    with more than 2 or 4 processors can use a master-slave approach.  It's
    too much of a bottleneck.

c.  Performance implications?  Can you be more specific?  I don't know how
    to answer this one.

d.  AT&T is currently looking at multiprocessor issues for the sys V.4
    release.  What they will really do probably isn't decided yet.

e.  Serial unix applications maintain portability to multiprocessor systems.
    Since there is currently no standardization for multiprocessor
    extensions, each vendor has done their own thing.  As such,
    multiprocessor applications tend not to be portable.  As for the OS, a
    lot of time and effort has to be invested to multi-thread the operating
    system. 

f.  Price / performance ratios for multiprocessor systems are generally MUCH
    better than for sequential systems.  Our 520 line provides between 17
    and 170 vax mips (depending on the number of processors).  A fully
    loaded (170 mip) system costs between $500k and $750k.  If you compare
    that price to the $2m price of a large vax or IBM (which has a much
    lower mip rating), our price/performance ratio is fantastic.  I'm sure
    encore's marketing department can give you exact numbers.
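
The comparison in (f) leaves the MIPS rating of the $2m machine
unstated, so the sketch below turns the question around: given the 170
MIPS and $500k to $750k quoted above, how many MIPS would a $2m machine
need just to match? The function name is hypothetical and this is
arithmetic on the quoted figures only, not vendor data.

```python
def breakeven_mips(rival_price, our_price, our_mips):
    """MIPS a rival machine must deliver to match our dollars per MIPS."""
    if our_price <= 0 or our_mips <= 0:
        raise ValueError("our_price and our_mips must be positive")
    return rival_price * our_mips / our_price

RIVAL_PRICE = 2_000_000   # the "$2m" figure quoted above
OUR_MIPS = 170            # fully loaded 520-series system, as quoted

for our_price in (500_000, 750_000):
    per_mips = our_price / OUR_MIPS
    needed = breakeven_mips(RIVAL_PRICE, our_price, OUR_MIPS)
    print(f"${our_price:,}: ${per_mips:,.0f}/MIPS, "
          f"rival needs {needed:.0f} MIPS to match")
```

On the quoted figures the Multimax comes out at roughly $2,900 to $4,400
per MIPS, and a $2m machine would need somewhere between about 450 and
680 MIPS to do as well.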

				       -- Mark Guzzi
					  Encore Computer Corporation
					  guzzi@encore.com
					     (or guzzi@encore.encore.com)

---------------------------------------------------------------------------

        a) 
Encore has 4.2BSD, 4.3BSD, Mach, and System V all ported to
our Multimax product line.  The Multimax is an NS32XXX (32332 and 32532)
based system which supports up to 20CPUs (20 532's gives you about 170MIPS)
and 128MB of memory (512MB in a few months when our new memory card comes out).

        b)
All of our systems are symmetric.  When we first started with 4.2BSD we
parallelized the kernel straight off.  When we started with Mach it was
a master/slave system.  Performance sucked (and that is being generous!)
We spent quite a bit of time parallelizing it.  The results of this can
be seen in two papers which we've published.  One in the proceedings of the
Usenix Workshop on Distributed and Multiprocessor Systems entitled
"The Parallelization of Mach/4.3BSD: Design Philosophy and Performance
Analysis" by J. Boykin and A. Langerman.  The second will be presented
at the Winter Usenix conference.

        c)
Big win.

        d)
Who knows?!?  Unix International has made a shitty proposal to them.
Given that OSF will be coming out with an MP kernel late '90 (based on
the work done here at Encore), AT&T is nervous.  In fact, we're talking
to them about selling them our System V MP technology!

        e)
It depends on the MP system.  If it is a symmetric shared-memory multiprocessor
the stuff that we've done is very portable.  If you start getting into things
like cubes, or what I call "multi-computers" such as the BBN Butterfly,
you have a different set of problems.

        f)
They generally have a much better ratio.  I don't have pricing numbers
handy on our systems, but if I remember right a 32332 based system with
4MIPS lists at about 89K;  each additional 4MIPs is something like another
20K (don't quote me).  Discounts are available (definitely don't quote me!)
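
Taking the from-memory figures in (f) at face value (about 89K for the
first 4 MIPS, about 20K for each further 4 MIPS), the average cost per
MIPS falls as the system grows. The sketch below only works through
that arithmetic; the function name is hypothetical, and the message does
not say how far a 32332-based system can actually be expanded.

```python
def list_price_k(mips, base_mips=4, base_price_k=89, step_price_k=20):
    """Rough list price in $K for a system sold in 4-MIPS steps."""
    if mips < base_mips or mips % base_mips != 0:
        raise ValueError("mips must be a positive multiple of base_mips")
    extra_steps = mips // base_mips - 1
    return base_price_k + step_price_k * extra_steps

for mips in (4, 8, 16):
    price = list_price_k(mips)
    print(f"{mips:2d} MIPS: ${price}K total, ${price / mips:.2f}K per MIPS")
```

In this model the average drops from about $22K per MIPS at 4 MIPS
toward the marginal cost of $5K per MIPS as steps are added.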

----

Joe Boykin
Manager, Mach OS Development
Encore Computer Corp
Treasurer, IEEE Computer Society

Internet: boykin@encore.com
UUCP: encore!boykin