[comp.arch] new Gould NPL

uusgth@sw1e.UUCP (03/27/87)

       On Wednesday, March 25, Gould, Inc. announced via world-wide
       satellite down-link the first of their new
       super-minicomputers, the NP1.  The NP1 is the first of a new
       architecture from Gould, called the NPL series, that brings
       ultra-high performance to minicomputers at price/performance
       ratios well below the current industry norm.

       The NP1 is an ECL, gate-array processor sitting on a very
       fast bus (154 megabytes per second). With 2 cpu's on the
       same bus, coupled with a math accelerator unit, it does 12
       MIPS sustained at a base price of about $400,000. With a
       multiple, coupled bus option, 8 cpu's can co-exist for close
       to 100 MIPS, addressing 2 gigabytes of 54 nanosec memory.

       A more apt description for this "fire-breather", as Gould
       likes to call their computers, is "mini-supercomputer".
       Indeed, the new NP1 has a special link that allows 100
       megabytes/second access to a Cray. All this sitting in a
       fairly-rugged standard air-cooled cabinet not much bigger
       than a large filing cabinet. Of course, Gould being the "old
       battery" company that it is, you can hang as much
       uninterruptible battery power on it as you want; and it has a
       built-in micro-processor for environmental control.

       Unix is THE operating system, complete with Bell, Berkeley,
       and the real-time extensions that Gould has pioneered in the
       real-time world.  A full complement of network connectivity,
       compilers for all the major languages, and a VME bus offer
       users almost anything they could ever want.

       Gould is betting their entire Computer Systems division on
       this architecture, expecting the NPL line to be 20% of their
       1988 revenues, and then the majority of their business in
       the 1990's. It should be interesting indeed to watch this
       development vs. HP's similar full-scale jump into RISC
       architecture (Gould's new line is not RISC), not to forget
       the parallel-processing architectures coming on soon.

       While it's too early to see if it's RISC or no risk, or
       both, one thing is clear: we are seeing 3 major developments
       here. One, Unix is becoming an industry standard; two,
       real-time Unix is here; and three, price per MIP is dropping
       exponentially. Probably the only thing holding back such
       beasts from eating mainframes alive in the commercial market
       is that the secondary hardware markets (e.g. fast drives;
       higher-speed access methods, such as fiber WANs; and
       robotized cartridge handling) and software markets
       (expert systems; improved database file structures; and
       standardized window interfaces) just haven't kept up with
       the fast-moving central processor technology.

           So what's new !

	The opinions above are the author's alone; any similarity to
	management, living or dead, is purely coincidental.

   Tom Helton			Support Group
   ..{ihnp4}!	| | |\ | | \/   Southwestern Bell Telephone Co.	
  sw1e!uusgth	|_| | \| | /\   One Bell Ctr 24W5, StL, MO 63101	

csg@pyramid.UUCP (04/01/87)

In article <501@sw1e.UUCP> uusgth@sw1e.UUCP (uusgth) writes:
>The NP1 is the first of a new architecture from Gould, called the NPL series,
>that brings ultra-high performance to minicomputers at price performance
>factors well below the current industry.
>
>The NP1 is an ECL, gate-array processor sitting on a very fast bus (154 mega-
>bytes per second). With 2 cpu's on the same bus, coupled with a math accelera-
>tor unit, it does 12 MIPS sustained at a base price of about $400,000.

Ummm, I'm sure that Arete, Encore, Pyramid, and Sequent would be happy to tell
you about 12 MIPS superminicomputers with fast busses that sell for a lot less
than $400K for a fully configured system. Elxsi and Convex too, I think.

I can't believe this posting does these machines justice. Could someone post
some technical info?

<csg>

ram@alpha.UUCP (04/01/87)

>same bus, coupled with a math accelerator unit, it does 12
>MIPS sustained at a base price of about $400,000. With a
>multiple, coupled bus option, 8 cpu's can co-exist for close
>to 100 MIPS, addressing 2 gigabytes of 54 nanosec memory.

   2cpu+math = 12 MIPS
   8cpu+  "  =100 MIPS ?
     
   100/12 = 8.3
   8/2    = 4.0

   I need a math accelerator and a good one too :=).
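
   A quick check of those ratios, as a minimal Python sketch.  The MIPS
   and CPU counts are the ones quoted above; the function name is made up
   for illustration.

```python
def scaling_ratios(base_cpus, base_mips, big_cpus, big_mips):
    """Return (performance ratio, CPU-count ratio) between two configurations."""
    return big_mips / base_mips, big_cpus / base_cpus

# 2 CPUs at 12 MIPS versus 8 CPUs at "close to 100" MIPS.
perf_ratio, cpu_ratio = scaling_ratios(2, 12, 8, 100)
print(f"performance x{perf_ratio:.1f} from x{cpu_ratio:.1f} the CPUs")
```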

>Unix is THE operating system, complete with Bell, Berkeley,
>and the real-time extensions that Gould has pioneered in the
>real-time world.  A full complement of network connectivity,

    The wisest thing Gould (ever) did.  Early in their mini manufacturing
    line they opted for THE operating system, UNIX.  To tell the truth,
    porting software to UTX (Gould's name; I wonder why everybody likes
    to change the name UNIX but always sticks an X on it) is at times a
    lot easier than even on a VAX (4.3).  Watch out for their secure
    versions too.  Choosing UNIX early in the game has its advantages.
    The OS problem for a new machine becomes one of porting rather than
    development, which saves a lot of time in the development cycle of
    an OS for your machine.
    To quote somebody from Gould (latest UNIX/WORLD):

	Q. Why did Gould choose UNIX when IBM has neglected it?
   Retort. UNIX is bigger than IBM. 

>development vs. HP's similar full-scale jump into RISC
>architecture (Gould's new line is not RISC); not to forget
>the parallel process architectures coming on soon.

    True.  But don't rule out RISC.  It is becoming an industry standard
    (only the name, not the technology).
    Soon the RISC name should be forgotten, if every processor is a RISC
    processor.

>           So what's new !

    Year of new architectures!  Watch out for Cydrome, Multiflow (VLIW),
    Vitesse,  Chopp(sullivan), ETA, RP3 ........  Architecture design
    is becoming an art.  Maybe we should have arch-art exhibits.
    (One is coming up in Santa Clara during the first week of May).

    Oh! What's new column:  AI has infiltrated compiler technology.
    The C86 C compiler (for PCs) (Ref: UNIX/WORLD) has optimizations based on
    AI techniques.  Guess the consequences.  The compiler learns.
    The more times you compile, the better your optimization, eventually
    resulting in a single line of code.

>   Tom Helton			Support Group
>   ..{ihnp4}!	| | |\ | | \/   Southwestern Bell Telephone Co.	
>  sw1e!uusgth	|_| | \| | /\   One Bell Ctr 24W5, StL, MO 63101	
>----------

   1987- Year of the start-ups  1988 Year of up-starts


					    Renu Raman
					    ...ihnp4!nucsrl!ram

ron@brl-sem.UUCP (04/01/87)

In article <1805@pyramid.UUCP>, csg@pyramid.UUCP (Carl S. Gutekunst) writes:
> Ummm, I'm sure that Arete, Encore, Pyramid, and Sequent would be happy to tell
> you about 12 MIPS superminicomputers with fast busses that sell for a lot less
> than $400K for a fully configured system. Elxsi and Convex too, I think.

Last time I checked (and I checked Pyramid recently) Arete, Pyramid, and
Elxsi were not 12 MIPS.  The bus on the new Gould is much faster than the
bus on every machine you listed.  There are two big advantages of the
Gould.  One, it achieves its speed using a small number of processors,
which makes it more attractive to some of the number crunchers than it
would seem to someone who is just going to dump 100 users on the machine
(which is where the ENCORE really excels).  The other big point is the
ability to use some really fast disk technology.  Fast disks make UNIX
sing.  For example, the C compiler on most machines is limited by the
disk speed.  Find the fastest disk you can, make it /tmp and watch the
performance change.

Still, both this offering from Gould, and the recent offerings from DEC
are very UNDERWHELMING.  Gould's previous top of the line machine, the
PN9000, was pushing 10 MIPS as it was.  This new processor isn't a very
big jump for two or three years elapsed time.  It would seem to me that
we ought to be in the 20 MIPS range per processor now.  DEC is essentially
peddling the same old computer, clustered up in a form which pretends
to be competing with the newer IBM processors.  The effect of this approach
has been seen in DEC processors over the years.  You end up paying close
to N times as much (where N is the number of processors) and receive less
than N times the performance increase.
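
To put that complaint in numbers, here is a minimal Python sketch.  The
4-processor, 3.2x speedup case is a made-up illustration, not a measured
figure for any DEC or Gould system, and the function name is hypothetical.

```python
def relative_cost_per_performance(n_processors, speedup):
    """Cost per unit of performance, relative to a single processor.

    Assumes the price grows as N while performance grows only by `speedup`.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return n_processors / speedup

# Hypothetical cluster: 4 processors, but only 3.2 times the performance.
ratio = relative_cost_per_performance(4, 3.2)
print(f"each unit of performance costs {ratio:.2f}x the single-CPU price")
```

Any speedup below N gives a ratio above 1, which is the point being made.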

If only the big IBM's weren't such a bitch to talk to...

-Ron

sherm@elxsi.UUCP (04/02/87)

In article <706@brl-sem.ARPA> ron@brl-sem.ARPA (Ron Natalie <ron>) writes:

>>Ummm, I'm sure that Arete, Encore, Pyramid, and Sequent would be happy to tell
>>you about 12 MIPS superminicomputers with fast busses that sell for a lot less
>>than $400K for a fully configured system. Elxsi and Convex too, I think.
>
>Last time I checked (and I checked Pyramid recently) Arete, Pyramid, and
>Elxsi were not 12 MIPs.  

I don't know how Gould is measuring their MIPS.  If they're using Whetstone
MIPS (which is likely since the original posting mentioned that a
"math processor" was used to obtain the 12 MIPS), then a single ELXSI 
processor (model 6420) is about 12 MIPS.  The price for a single-CPU 
system is about the same as for the Gould.

>The bus on the new Gould is much faster than
>every CPU you listed.  

ELXSI's main bus is 320MB/sec, over twice as fast as the Gould's.

>There are two big advantages of the Gould.  One,
>it achieves its speed using a small number of processors which makes
>it more attractive to some of the number crunchers than it would seem to
>someone who is just going to dump 100 users on the machine (which is where
>the ENCORE really excels).  

This comment also applies to the ELXSI.  It supports up to 12 64-bit
ECL gate array CPUs.  Works very nicely for number crunchers (applications
can even be parallelized to run on multiple CPUs simultaneously) and
for timesharing.  We have a customer who is supporting 1300 office
and "knowledge" workers on a medium-size ELXSI.  (Only 200-300 users are
ever doing anything at the same time, however.)  We also have people
using the ELXSI for number-crunching realtime activities like flight
simulation.

>The other big point is the ability for some
>really fast disk technology to be used.  Fast disks make UNIX sing.  For
>example, the C compiler on most machines is limited by the disk speed.
>Find the fastest disk you can, make it /tmp and watch the performance
>change.

You must be used to some very slow disks or very fast processors.  I have
never seen a disk bound C compiler on any machine that supports a reasonable
disk drive, unless you are talking about paging.  On the ELXSI, which has 
a very fast CPU and normally runs with the modest Fujitsu Eagles (1.8MB/sec 
transfer, ~25ms seek) the C compiler is almost completely CPU bound.  Speeding 
the disks up by a factor of 100 would have almost no effect on compile time.

Also, in those cases where disk performance is an issue during
compilation (normally heavily page faulting timeshare systems) it is
rarely a problem with transfer rate but much more likely a problem with
seek time.  Having a fast bus does not help in these circumstances,
except to the extent that it allows you to hook MANY disks up to
effectively overlap seeks.
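
A rough way to see the seek-versus-transfer point is the minimal Python
sketch below.  The Eagle figures are the ones given above; the 8 KB request
size is an assumption picked for illustration, rotational latency is
ignored, and the function name is made up.

```python
def io_time_ms(nbytes, seek_ms, transfer_mb_per_s):
    """Time for one disk request: seek plus transfer (no rotational latency)."""
    transfer_ms = nbytes / (transfer_mb_per_s * 1e6) * 1000.0
    return seek_ms + transfer_ms

BLOCK = 8192  # hypothetical request size in bytes

# Fujitsu Eagle as described above: ~25 ms seek, 1.8 MB/sec transfer.
eagle = io_time_ms(BLOCK, seek_ms=25.0, transfer_mb_per_s=1.8)
# Same seek time, transfer rate 100 times higher.
fast_xfer = io_time_ms(BLOCK, seek_ms=25.0, transfer_mb_per_s=180.0)
print(f"Eagle: {eagle:.1f} ms, 100x transfer rate: {fast_xfer:.1f} ms")
```

With these numbers the request takes a little under 30 ms on the Eagle and
still about 25 ms with the much faster transfer rate, because the seek
dominates.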

There are certainly applications which require high disk transfer rates.
One of the reasons people buy ELXSIs is that they support direct mapping
of large files into users' address spaces, yielding disk-to-user transfer
rates at nearly the hardware limit of the disk.  However, these applications
are rare among our customers.  We have only one customer who is using
a higher disk transfer rate than the Eagles (2.4MB/sec).  
-- 
-------------------------------------------------------------------------------
Michael Sherman
...!{sun|styx}!elxsi!sherm

sherm@elxsi.UUCP (04/02/87)

In article <1805@pyramid.UUCP> csg@pyramid.UUCP (Carl S. Gutekunst) writes:
>In article <501@sw1e.UUCP> uusgth@sw1e.UUCP (uusgth) writes:
>>The NP1 is the first of a new architecture from Gould, called the NPL series,
>>that brings ultra-high performance to minicomputers at price performance
>>factors well below the current industry.
>>
>>The NP1 is an ECL, gate-array processor sitting on a very fast bus (154 mega-
>>bytes per second). With 2 cpu's on the same bus, coupled with a math accelera-
>>tor unit, it does 12 MIPS sustained at a base price of about $400,000.
>
>Ummm, I'm sure that Arete, Encore, Pyramid, and Sequent would be happy to tell
>you about 12 MIPS superminicomputers with fast busses that sell for a lot less
>than $400K for a fully configured system. Elxsi and Convex too, I think.
>
>I can't believe this posting does these machines justice. Could someone post
>some technical info?

ELXSI
-----
  The ELXSI is a multiprocessor with an architecture similar to the 
  description of the Gould NP1 given in the original posting.  It has 
  a relatively small number of fast CPUs plugged into a high-speed bus
  over which they access a large shared memory and communicate with one
  another.

  CPUs:  
 	 From 1 to 12 64-bit ECL, gate-array processors on a REALLY fast
         bus (320MB/sec, over twice the Gould's).  There are two models
         of CPUs, the 6410 released in 1984 and the 6420 released in 1986.
         These are completely compatible and can coexist in the same system.
	 CPU cycle time is 50ns.

         The 6410 measures about 7 Whetstone MIPS, while the 6420 does about
         12, which sounds the same as the Gould CPU described above.
	 I don't know how Gould is measuring their MIPS, however.  If you
         measure MIPS by running nroff and calling a VAX/780 1 MIP, then
         an ELXSI 6410 is about a 4 (like a VAX 8600) and a 6420 is about a 
	 6 (like a VAX 8650 or 8700).  Floating point, especially 64-bit 
	 floating point, is much faster.

	 The instruction set is between a RISC and a CISC.  (I hesitate
	 to call it a MISC, however. :-)  Instructions are regular and
	 simple but there are several addressing modes.  The machine is
	 message based, so in addition to the normal instructions, there are 
	 about a dozen instructions implementing a message system.  (E.g.,
	 send and receive instructions, with asynchronous and synchronous
	 variants.)

         The CPUs contain special hardware to support extremely fast 
	 context switches - about 10 microseconds.  There are 16 sets
         of 16 64-bit GPRs and 16 sets of TLB translations in each CPU.
	 A context switch is handled completely in microcode on detection
         of events such as a quantum fault, page fault or receipt of a
         message.   A software scheduler running on each CPU decides
         which processes get to be in register sets at any given time.
         A realtime process can be locked into a register set so that
	 it can begin execution 10 usecs after being notified of an
         external event, even if timesharing is going on at the same
         time.  

	 The 6410 has a 16KB cache, while the 6420 has 64KB.  Cache
	 cycle time is 25ns.  The cache board plugs directly into the
	 320MB/sec bus (called the Gigabus) and accesses the shared 
         global memory (up to 2 GB) over the bus.  Access time is 
	 400ns for a 16-byte read or an 8-byte write.
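
	 As a back-of-the-envelope check on those figures, a minimal
	 Python sketch.  It assumes each access completes before the next
	 one starts (no overlap), which is an assumption for illustration
	 and not a statement about how the hardware schedules accesses;
	 the function name is made up.

```python
GIGABUS_MB_PER_S = 320.0  # bus bandwidth quoted above

def stream_mb_per_s(bytes_per_access, access_ns):
    """Bandwidth of back-to-back, non-overlapped accesses (1 MB = 1e6 bytes)."""
    return bytes_per_access * 1000.0 / access_ns

read_rate = stream_mb_per_s(16, 400)    # 16-byte read in 400 ns
write_rate = stream_mb_per_s(8, 400)    # 8-byte write in 400 ns
streams_to_fill_bus = GIGABUS_MB_PER_S / read_rate
print(read_rate, write_rate, streams_to_fill_bus)
```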

  I/O system:

	 I/O is handled by 1 to 4 I/O processors.  Each of these is
	 actually an ELXSI 6410 CPU with the floating point board replaced
	 by an I/O bus handling board.  This additional board provides two
	 8 MB/sec I/O channels.  I/O capacity through the I/O processors
	 is thus 16-64 MB/sec.  
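
	 The 16-64 MB/sec range follows directly from the counts above, as
	 this minimal Python sketch shows (the names are made up for
	 illustration):

```python
CHANNELS_PER_IOP = 2        # each I/O processor provides two channels
MB_PER_S_PER_CHANNEL = 8    # 8 MB/sec per channel

def io_capacity_mb_per_s(n_iops):
    """Aggregate channel capacity for a system with n_iops I/O processors."""
    if not 1 <= n_iops <= 4:
        raise ValueError("the description above covers 1 to 4 I/O processors")
    return n_iops * CHANNELS_PER_IOP * MB_PER_S_PER_CHANNEL

print([io_capacity_mb_per_s(n) for n in range(1, 5)])  # [16, 32, 48, 64]
```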

         A board providing direct access to the Gigabus exists, but is
	 not commonly used for device I/O.  It is currently being used as
	 a Cray interface.

  Software:

         The ELXSI supports a shareable "virtual machine" kernel which
	 handles the paging system, process scheduling, load balancing,
	 etc.  This kernel is called the ELXSI System Foundation.  Its
         services are currently used to support four operating systems -
         two separate Unix ports (System V and 4.2), a VMS-compatible
         system called "EMS", and a proprietary OS called EMBOS.  The
	 operating systems can all run concurrently; in fact, there can
	 even be multiple copies of each Unix system.  (We use this facility
	 in house for OS development - we have one stable timesharing
	 BSD system, for example, and several "standalone" systems which
	 we constantly crash.)

  	 The System Foundation performs only a coordination role for performance
	 sensitive activities such as disk access.  Once everyone agrees on
	 which operating system owns which part of the disk, each OS speaks
	 directly to the disk controllers.  The normal Unix filesystem 
	 layouts are maintained - System V uses the original Unix FS, while
	 BSD uses the fast file system.  The System Foundation facilities
	 can be used by suitably privileged user applications.  In fact,
	 an application can run directly connected to various pieces of
	 hardware and at a priority higher than any operating system process,
	 without necessarily being able to do any damage to the rest of
	 the system (except for performance).

	 The Unix ports are rather plain vanilla ports. A message interface
	 was stuck into the system call library to replace the normal "trap"
	 interface.  The memory management and CPU scheduling code was 
	 replaced by messages to the System Foundation services.  Most 
	 everything else was left alone.
	
	 OS parallelism is achieved by having numerous separate processes
	 handling separate tasks, rather than having all CPUs trying to
	 perform the same tasks.  There is no time at which an OS process
	 running on one CPU can lock out one running on another.

Price: 
         List price is about $400K for a single CPU system.  Additional CPUs are
         roughly $150K for a 6410 and $200K for a 6420.
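
         For a rough idea of what a larger configuration lists for, a
         minimal Python sketch.  It assumes the $400K base covers the
         first CPU of either model and that prices simply add up, which
         is an assumption and not an official price list; the names are
         made up.

```python
BASE_SYSTEM_K = 400                        # single-CPU system, in $K
EXTRA_CPU_K = {"6410": 150, "6420": 200}   # each additional CPU, in $K
MAX_CPUS = 12                              # 1 to 12 CPUs per system

def list_price_k(extra_cpus):
    """Approximate list price in $K; extra_cpus lists models beyond the first CPU."""
    if len(extra_cpus) + 1 > MAX_CPUS:
        raise ValueError("at most 12 CPUs per system")
    return BASE_SYSTEM_K + sum(EXTRA_CPU_K[model] for model in extra_cpus)

print(list_price_k([]))                # 400
print(list_price_k(["6420"]))          # 600
print(list_price_k(["6410", "6420"]))  # 750
```
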
-- 
-------------------------------------------------------------------------------
Michael Sherman
...!{sun|styx}!elxsi!sherm

ram@alpha.UUCP (04/02/87)

In comp.arch ron@brl-sem.ARPA (Ron Natalie <ron>) wrote: 
>In article <1805@pyramid.UUCP>, csg@pyramid.UUCP (Carl S. Gutekunst) writes:
>>Ummm, I'm sure that Arete, Encore, Pyramid, and Sequent would be happy to tell
>>you about 12 MIPS superminicomputers with fast busses that sell for a lot less
>> than $400K for a fully configured system. Elxsi and Convex too, I think.

>Last time I checked (and I checked Pyramid recently) Arete, Pyramid, and
>Elxsi were not 12 MIPs.  The bus on the new Gould is much faster than
>every CPU you listed.  There are two big advantages of the Gould.  One,
>it achieves its speed using a small number of processors which makes
>it more attractive to some of the number crunchers than it would seem to

    Elxsi has a 320 MB/sec bus and Encore 100 MB/sec (I am pulling these
    numbers off the top of my head, so allow some tolerance).  Sequent had
    a slower bus, I think.  The Sequent 21000 is rated at 21 MIPS
    (21 NS32XXX).  Also, the ELXSI OS is called EMBOS (a message-based OS),
    on which SYS V or BSD is selectable (rather than a hybrid, to which
    some purists object).  Also note that ELXSI is claimed
    to show a linear performance improvement with processor addition
    (John Sanguinetti, in COMPUTER, Aug '86).

>someone who is just going to dump 100 users on the machine (which is where
....
>

>Still, both this offering from Gould, and the recent offerings from DEC
>are very UNDERWHELMING.  Gould's previous top of the line machine, the
>PN9000, was pushing 10 mips as it was.  This new processor isn't a very
>big jump for two or three years elapsed time.  It would seem to me that

     remember 3 yrs + ECL.

>we ought to be in the 20 mips range per processor now.  DEC is essentially
>peddling the same old computer, clustered up in a form which pretends
>to be competing with the newer IBM processors.  The effect of this approach

     I guess when you grow big, marketing is what you indulge in,
     rather than good performance-oriented development.

>has been seen in DEC processors over the years.  You end up paying close
>to N times as much (where N is the number of processors) and receive less
>than N times the performance increase.
>
>If only the big IBM's weren't such a bitch to talk to...
>
>-Ron
>----------



					      renu raman
					      ...ihnp4!nucsrl!ram

P.S Don't trust my mail/note headers. Trust the footers (really)

csg@pyramid.UUCP (04/02/87)

In article <706@brl-sem.ARPA> ron@brl-sem.ARPA (Ron Natalie <ron>) writes:
>Last time I checked (and I checked Pyramid recently) Arete, Pyramid, and
>Elxsi were not 12 MIPs.

I had understood the poster to say that *two* processors gave 12 MIPS, which
would make it about the same as Pyramid's 9820 dual CPU system. It has
since been pointed out to me that it was 12 MIPS per CPU, which is a very
different story; $400K would buy you 24 MIPS or so. Elxsi claims 10 MIPS per
CPU, Pyramid 7, and Arete 2.5, and so on.
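
The two readings give quite different price/performance, as this minimal
Python sketch shows.  The $400K and 12 MIPS figures come from the original
posting; the function name is made up for illustration.

```python
def dollars_per_mips(price_dollars, total_mips):
    """Price divided by total system MIPS."""
    return price_dollars / total_mips

PRICE = 400_000  # base price quoted in the original posting

per_pair = dollars_per_mips(PRICE, 12)      # reading 1: 12 MIPS for both CPUs
per_cpu = dollars_per_mips(PRICE, 2 * 12)   # reading 2: 12 MIPS for each CPU
print(f"${per_pair:,.0f}/MIPS vs ${per_cpu:,.0f}/MIPS")
```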

>The bus on the new Gould is much faster than every CPU you listed.

Elxsi's bus is claimed to be 330Mbytes/second, more than double the Gould NPL.
(Pyramid's is 40MB/sec.) Elxsi uses that huge bandwidth primarily for multiple
processors, and indeed it is the Pyramid's bus bandwidth that limits the
number of CPUs you can throw on it. (Caching helps a lot, but the CPUs do need
to reference main store once in a while. :-)) 

I'd be curious to hear people's thoughts on whether all that bandwidth is
really usable by I/O. The best disks/controllers you can buy today only putter
along at, what, 20Mbytes/sec? And the Unix file system is too slow and grody
to really take advantage of more than about an eighth of that with a 12 MIPS
CPU behind it. Yes, there's raw disk I/O, but how much is that used outside of
dedicated database applications? 
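
To put my guesses next to the bus figure, a minimal Python sketch.  The 20
MB/sec disk rate and the "one eighth" file system fraction are my own
estimates from above, not measurements, and the names are made up.

```python
BUS_MB_PER_S = 154.0      # Gould NP1 bus, from the original posting
DISK_MB_PER_S = 20.0      # guessed rate of the best disks/controllers
FS_FRACTION = 1.0 / 8.0   # guessed share the Unix file system can exploit

def bus_share(device_mb_per_s, bus_mb_per_s=BUS_MB_PER_S):
    """Fraction of the bus bandwidth one device stream would occupy."""
    return device_mb_per_s / bus_mb_per_s

fs_rate = DISK_MB_PER_S * FS_FRACTION   # 2.5 MB/sec through the file system
print(f"raw disk: {bus_share(DISK_MB_PER_S):.1%} of the bus")
print(f"via file system: {bus_share(fs_rate):.1%} of the bus")
```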

(Honest questions, I really don't know. I'm just a comm hacker. :-))

<csg>

ram@alpha.UUCP (04/02/87)

    Oops: I missed stating this in an earlier response.  It is FLOPS
    that are more important than MIPS in machines of this class.  Could
    somebody get the FLOPS numbers?  I doubt the NP1 is RISC, in
    which case MIPS rather understates the machine's capabilities.
    Would "Krazy" care to clarify?



						 Renu Raman