[net.works] 32032 UNIX

STEINER@RUTGERS.ARPA (01/07/85)

From: Caro.PA@XEROX.ARPA

Does anyone manufacture a UNIX box based on the NS320xx??  If so,
what are the specs?  I like the chip, but everyone seems to have
jumped on the 68K bandwagon.  I read an ad that Tektronix (I think)
is coming out with a CAD/CAM workstation based on a 32032/32016
coprocessor design, but I don't think it's UNIX based.

Any info is welcome.

Perry

STEINER@RUTGERS.ARPA (01/07/85)

From: Doug <Faunt%hplabs.csnet@csnet-relay.arpa>

American Information Systems (415)494-3210 is shipping a 32032 based
Unix system, any day now, so I'm told.
        faunt%hplabs@csnet-relay    ....!hplabs!faunt

iau@ukc.UUCP (I.A.Utting) (01/08/85)

An outfit called "Whitechapel Computer Works" in London have a workstation
(called the MG-1) based on the NS 32016 running GENIX (Nat Semi's 4.1 port).
It has a monochrome bit-map display (1024 x 800), 40MByte Winchester,
5.25" floppy, a Swiss 3-button mouse, on-board Ethernet and 1MByte RAM.
It also has an IBM-PC expansion bus (DMA capable) which should keep the price
of peripherals down. All this for (pounds sterling) 8000 or thereabouts.

I have a beta-test machine which is behaving reasonably for such things, but
it's early days yet for software which takes advantage of the screen (window
manager soon) or the Ethernet (no support yet).

You can contact WCW at:
	Whitechapel Computer Works
	75, Whitechapel Road
	London
	E1 1DU
	UK		Phone: +1 377 8680

		Ian.

ed@mtxinu.UUCP (Ed Gould) (01/09/85)

> From: Caro.PA@XEROX.ARPA
> 
> Does anyone manufacture a UNIX box based on the NS320xx??  If so,
> what are the specs?

Sequent Computer Systems in Portland recently announced a multiprocessor
system, containing from two to 12 32000 CPUs.  It runs 4.2BSD.
I understand that they'll be showing it in Dallas.  It's a nice machine.

-- 
Ed Gould		    mt Xinu, 739 Allston Way, Berkeley, CA  94710  USA
{ucbvax,decvax}!mtxinu!ed   +1 415 644 0146
			    (I'd rather not be parochial.)

STEINER@RUTGERS.ARPA (01/09/85)

From: Andrew Klossner <andrew@orca>

	"Does anyone manufacture a UNIX box based on the NS320xx??  If
	so, what are the specs?  I like the chip, but everyone seems to
	have jumped on the 68K bandwagon.  I read an ad that Tektronix
	(I think) is coming out with a CAD/CAM workstation based on a
	32032/32016 coprocessor design, but I don't think it's UNIX
	based."

Our "CAD/CAM workstation" is just a 320xx-based computer running a
4.2BSD port.  I have one on my desk; it's a very nice little box.  We
added mucho bells and whistles, like a distributed file system (my
neighbor's disk looks like part of my file system), the "unimplemented"
memory management calls like mmap(2) and munmap(2), and, since we're
Tektronix, a real Basic language implementation.

  -- Andrew Klossner   (decvax!tektronix!orca!andrew)       [UUCP]
                       (orca!andrew.tektronix@csnet-relay)  [ARPA]

jps@stcvax.UUCP (Jeff Snover) (01/10/85)

> Does anyone manufacture a UNIX box based on the NS320xx??  If so,
> what are the specs?  


Intergraph (Huntsville Alabama) has a nice workstation based on the
32032 called the Interpro-32.

Here is what I remember:
	- 32032 based running Genix (nice Unix [4.2?] port)
	- 32032 support chips (I think the MMU and FPP)
	- 1k x 1k color display with windowing
	- 1.75 Megs expandable to 4 (or 16) Megs
	- Intel Ethernet controller chip coupled with the 80186 (yuk)
	  to handle packets and I/O.
	- *fast* bit-slice engine to handle screen manipulation and graphics
	- 25 Meg Winchester

I don't know if this is appropriate for your application but I was quite
impressed.  

I hope this was useful. (Oh by the way, my brother is a Civil Eng. and a
buddy of his works at Stone & Webster and said of the *many* workstations
that they have [Apollo, Computervision, Camdus(???), and Intergraph] he
has found the Intergraph to be the best).


-------


-- 
Chapter-11/Multiple-Mega-Layoff Survivor (for now...)

Jeffrey P. Snover  -  STC StorageTek (Disk Division)
uucp:	{ ihnp4, decvax}!stcvax!jps
	{ allegra, amd70, ucbvax }!nbires!stcvax!jps
USnail:	Storage Technology Corp  -  MD 3T / Louisville, CO / 80028
DDD:	(303) 673-6750

STEINER@RUTGERS.ARPA (01/15/85)

From: psu-cs!aatpdx!mcg%tektronix.csnet@csnet-relay.arpa


As a former employee of Tek working on the workstation to which you
referred in your note, I can tell you that it is *definitely* UNIX
based.  The system (the 6000 series) is based on the 32016 and 32032
processors and 4.2BSD UNIX.  I don't wish to say anything about
their merit.

Also, a company called LMC Corp sells a Multibus-based 32016 box
which runs UNIX.

There are two commercially available UNIX ports for the 32016:
one from Human Computing Resources in Toronto: originally a 4.1
port, now moving to System V; and one from National Semiconductor,
also originally 4.1, now moving to 4.2.  HCR's is available in
source or binary form for a small number of configurations, and
National's is available in source form, or in binary for their
proprietary workstation.

I run HCR 4.1 UNIX (Unity) on both an old National DB16000 board
(a multibus 16032 prototype) and on the GVC Corp GVC-16 board,
made by a small company in Cambridge.  Since we do not yet have
Rev N (bug-free) chips, all the software work-arounds make it
a little slow, but I expect 11/750 performance or better when
we have Rev. N chips and use GVC's 4 megs of on-board no-wait-state
memory (there are 4 to 6 waits for Multibus memory access).

S. McGeady
Ann Arbor Terminals
Research and Development
Portland, OR

shor@sphinx.UChicago.UUCP (Melinda Shore) (01/17/85)

Sad rumor time -- I hear LMC has gone under, or at least has been dropped
by their parent company.  The Unix they've been running is Genix, which
is based on 4.1.

It really looked like a nifty machine.  I hope that they can find other
sources of capital.

Melinda Shore
University of Chicago Computation Center

STEINER@RUTGERS.ARPA (01/17/85)

From: Tom Blenko  <blenko@rochester.arpa>

The Tektronix box does run on a 16/32K. But the 16K only runs at
4 MHz (which they all somehow forgot to mention in the WORKS digest).
Look before you leap!

	Tom

steveg@hammer.UUCP (Steve Glaser) (01/19/85)

>From: Tom Blenko  <blenko@rochester.arpa>
>
>The Tektronix box does run on a 16/32K. But the 16K only runs at
>4 MHz (which they all somehow forgot to mention in the WORKS digest).
>Look before you leap!
>
>	Tom

Now wait just a minute here.  Currently Tek 6000 workstations run at 8Mhz.
We will be moving to 10Mhz as soon as the chips are available in sufficient
quantity.  The 10 Mhz systems we have in house work just fine.

We never did run at 4 Mhz.  There were some early software development
and demo boxes that ran at 5 Mhz.

	Steve Glaser
	Tektronix, ECS Engineering
	tektronix!steveg		UUCP
	steveg.tektronix@csnet-relay	CSNET/ARPANET

steveh@hammer.UUCP (Stephen Hemminger) (01/20/85)

-----
| The Tektronix box does run on a 16/32K. But the 16K only runs at
| 4 MHz (which they all somehow forgot to mention in the WORKS digest).
| Look before you leap!
-----

Wrong!  Production TEK workstations run at 8 Mhz.  Your information
may have come from software vendors who were loaned pre-production test
units at lower clock rates.

STEINER@RUTGERS.ARPA (01/21/85)

From: Andrew Klossner <andrew@orca>

Tom Blenko <blenko@rochester.arpa> writes:

	"The Tektronix box does run on a 16/32K. But the 16K only runs
	at 4 MHz (which they all somehow forgot to mention in the WORKS
	digest).  Look before you leap!"

The Tektronix 6000 family of workstations (NS16032/32032 based, running
enhanced 4.2BSD Unix) presently run at 8MHz, and will increase to 10MHz
in the next month or so as the rev N 32016 CPUs become available.

These products have never run at 4MHz, even during initial development.
The first engineering prototypes clocked at 5MHz, back when National
couldn't produce parts faster than this.

As Mr. Blenko is a former employee of Tektronix, I'm surprised at his
vociferousness.

  -- Andrew Klossner   (decvax!tektronix!orca!andrew)       [UUCP]
                       (orca!andrew.tektronix@csnet-relay)  [ARPA]

STEINER@RUTGERS.ARPA (01/22/85)

From: psu-cs!aatpdx!mcg%tektronix.csnet@csnet-relay.arpa


Tom Blenko (blenko@rochester) is completely wrong in the statement that
"the Tektronix box .. only runs at 4 MHz".  Early prototype versions of
the box ran with 4 and 6Mhz parts, because of the unavailability of 10Mhz
ones, and bug-free 10Mhz parts (rev N) are still not available in production
quantities, but even though I haven't worked for Tek for over a year, I
know that the first production units will run at least 6 Mhz, and later
units will run at 10 Mhz.

Tom is also a former Tek employee and obviously has some axe to grind.

Since the product has been announced and is almost on the market, and has
been the subject of a certain amount of bashing, let me reverse my
previous stand of silence on the issue, and say this: the Tek 6000 series
will be a very nice UNIX box.  Engineering Computing Systems (ECS, the
division building the system) has expended an amazing amount of time and
effort fixing bugs in the system, rationalizing interfaces, and generally
compiling all the best features of numerous UNIX systems out there for
their version.  They have made some major strides forward in improving
the virtual memory system in 4.2bsd, and have sped up the system by
off-loading major subsystems into other processors - the I/O system
and network are supported by independent processors.

Take the Tektronix box seriously.


S. McGeady
Ann Arbor Terminals
Portland, OR
cbosgd!aat!mcg

tucker@ccvaxa.UUCP (01/27/85)

    Whitechapel in the UK makes a real nice 32K UNIX single user workstation
from the sounds of it (See the Fe issue of BYTE magazine).  Claims to use
some very advanced hardware ideas like: 
    1.) A wide data bus (64bits) with 200Mhz bandwidth (bit bandwidth).
        512K standard, 8Meg max.
    2.) Two coprocessors for the bit mapped display, one for BLT's and
        one for the keyboard/mouse and mouse icon.  I.e. main processor time
        is used only when something happens at the user interface.  Just
        running the mouse around doesn't take up any main processor time.
        Also, the display processor uses virtual memory mapping with its
        own page table handler.  I.e. display memory can be non-contig and
        as many displays as the user wants can be used at once.
    3.) Hard-disk has a direct connection to the memory that doesn't 
        take time away from the main CPU because of high memory bandwidth.
        Hard-disk DMA allows multisector unloading to noncontig locations
        without CPU time being wasted.
        10/22/45 Meg, also one 800K floppy.

    Other nice hardware things:
    a.) Ethernet interface standard.
    b.) Full 8Mhz 32000 processor set standard.
    c.) Nice landscape display with keyboard and hemisphere mouse standard.
    d.) Soft on/off switch like that on the Apple LISA standard.

    They also have some nice software things:
    1.) BSD 4.1 Genix optimized for workstation use. (i.e single user)
    2.) Comes with GKS software and windowing built in.
    3.) Full C, PASCAL, and FORTRAN optimizing compilers available. (??)
    4.) Smalltalk maybe (vaporware) available in the near future.

    The nicest thing of all was the price: 5000 pounds sterling for the base
    configuration.  That's about $6000 at current exchange rates, I think.
    Sounds too good to be true.

    Note that I'm not affiliated with this company, and do not know details
    about the system.  Look at BYTE if you want to see the review.

jml@drutx.UUCP (LeonJM) (01/30/85)

National has recently announced a System V Rel 2 version 2 port.  It is the
first System V port on a micro chip that supports paging.  I played around
with a system at the National booth at UniForum and it seemed reasonably
fast.  By the way, does anyone know who is actually doing the official
System V port for Zilog?

John Leon    ihnp4!drutx!jml
AT&T Information Systems Laboratories, Denver

sohail@terak.UUCP (Sohail M. Hussain) (02/01/85)

> Early prototype versions of
> the box ran with 4 and 6Mhz parts, because of the unavailability of 10Mhz
> ones, and bug-free 10Mhz parts (rev N) are still not available in production
> quantities

We have some of those 10Mhz rev N parts in our workstation, and
what has been puzzling me is that these machines outperform our
VAX 750 (not in compiles, of course, but in execution times).

Can someone out there shed some light on why a 32016 runs faster
than a 750 in programs that access memory (using pointers or matrix-type
operations)?

As background, we have a VAX 750 with 3Mb of memory, running 4.1 BSD,
and the workstations are of our own manufacture, using a 10Mhz 32016,
4Mb memory, and running 4.1 Genix. The timings were done with both systems
running multiuser, but with only one person logged in.

Looking forward to some answers.  I have always thought of our 750 as a
fair-sized machine; it supports our development efforts quite well.  I am
now faced with either having to start respecting the 32016 more
or the 750 less.

sohail
Sohail Hussain

uucp:	 ...{decvax,hao,ihnp4,seismo}!noao!terak!sohail
phone:	 602 998 4800
us mail: Terak Corporation, 14151 N 76th street, Scottsdale, AZ 85260

hammond@petrus.UUCP (02/07/85)

> We have some of those 10Mhz rev N parts in our workstation, and
> what has been puzzling me is that these machines outperform our
> VAX 750 (not in compiles, of course, but in execution times).
> 
> Can someone out there shed some light on why a 32016 runs faster
> than a 750 in programs that access memory (using pointers or matrix-type
> operations)?
> 
> Sohail Hussain
> 
Issues: Does your 32016 based workstation have a 32081?
	Are you using the 32082 MMU?
        Does your 750 have a floating point accelerator?
        Is your benchmark program small enough to fit in memory,
        (i.e. roughly the same number of page faults on both machines?)

Questions:  How much faster, i.e. 5, 10, 20 30 %?

I have a NSC Sys 32 (A 32016 based, 4.1 bsd development system)
It runs about the same as an 11/23, or about 1/3 of a 750.
My boss has been giving me grief about this, so your info is most
encouraging.

Note a 32032 should give roughly 1.25 times the performance of a 32016.
The 32 bit bus doesn't buy you that much more, except in applications
such as copying data memory to memory.

henry@utzoo.UUCP (Henry Spencer) (02/08/85)

Another relevant question is, does your memory have zero wait states?
People I trust tell me that the 32016's performance deteriorates
*SHARPLY* when wait states are introduced -- it's much worse than
you would expect, and in particular it's not linear in the number of
wait states.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

chuqui@nsc.UUCP (Chuq Von Rospach) (02/09/85)

In article <278@petrus.UUCP> hammond@petrus.UUCP writes:
>It runs about the same as an 11/23, or about 1/3 of a 750.
>My boss has been giving me grief about this, so your info is most
>encouraging.

I'll probably get grief for saying this, but there are some quirks in the 
SYS32 hardware that keep it from performing in ways it should. The memory
subsystem tends to require an unreasonable number of wait states in certain
configurations, and it makes the system sludge out. We've been taking a
close look at the SYS32 in the last few months because we realize that the
performance makes our chips look a lot worse than they really are. I don't
have anything I can talk about at this time besides pointing out that it IS
very possible to get 32xxx based systems that run MUCH faster than SYS32.
The SYS32 is more of a workhorse than a benchmark system, and people should
be aware of that fact.

chuq
-- 
From the ministry of silly talks:               Chuq Von Rospach
{allegra,cbosgd,hplabs,ihnp4,seismo}!nsc!chuqui nsc!chuqui@decwrl.ARPA

Life, the Universe, and lots of other stuff  is a trademark of AT&T Bell Labs

srm@nsc.UUCP (Richard Mateosian) (02/11/85)

In article <5040@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Another relevant question is, does your memory have zero wait states?
>People I trust tell me that the 32016's performance deteriorates
>*SHARPLY* when wait states are introduced -- it's much worse than
>you would expect, and in particular it's not linear in the number of
>wait states.

Obviously, behavior under wait states depends a lot on particular 
programs.  Here is one data point taken from one of my Wescon papers.

The program is a small benchmark that uses memory heavily.  Bus use
is 82% on the NS32016, 57% on the NS32032, both at 0 wait states.  Given
below are execution times in seconds at 0, 1 and 2 wait states:

              0 ws        1 ws        2 ws
              ----        ----        ----
NS32032       32.7        36.0        39.3
NS32016       43.7        49.4        55.0

Ratio         1.34        1.37        1.40

Execution speed of the NS32016 is 88.5% at 1 ws, 79.5% at 2 ws.

Execution speed of the NS32032 is 90.8% at 1 ws, 83.2% at 2 ws.

In general, programs with lighter bus use show smaller degradation with wait
states and smaller ratios of NS32032 to NS32016 execution speed.  In fact,
a program doing nothing but register to register division operations might
show no degradation at all under wait states (because of the instruction
pipeline) and no difference in execution speed between the NS32016 and
NS32032.
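
As a cross-check, the percentages and ratios quoted above can be recomputed
from the table.  The following is only an illustrative sketch in Python, not
part of the original benchmark; the names times, relative_speed, ratios and
per_wait_state_cost are made up for the example.

```python
# Execution times in seconds at 0, 1 and 2 wait states, copied from the table.
times = {
    "NS32032": [32.7, 36.0, 39.3],
    "NS32016": [43.7, 49.4, 55.0],
}

def relative_speed(ts):
    """Speed at each wait-state count as a percentage of the 0-ws speed."""
    return [round(100 * ts[0] / t, 1) for t in ts]

def ratios(slow, fast):
    """Ratio of execution times (slow / fast) at each wait-state count."""
    return [round(s / f, 2) for s, f in zip(slow, fast)]

def per_wait_state_cost(ts):
    """Extra seconds added by each successive wait state."""
    return [round(b - a, 1) for a, b in zip(ts, ts[1:])]

if __name__ == "__main__":
    for cpu, ts in times.items():
        print(cpu, relative_speed(ts), per_wait_state_cost(ts))
    print("ratio", ratios(times["NS32016"], times["NS32032"]))
```

For this one benchmark, each added wait state costs a roughly constant
amount (about 3.3 seconds on the NS32032 and 5.6 to 5.7 seconds on the
NS32016), so the slowdown here is close to linear.  Other programs may
well behave differently.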
-- 
Richard Mateosian
{allegra,cbosgd,decwrl,hplabs,ihnp4,seismo}!nsc!srm    nsc!srm@decwrl.ARPA

jchapman@watcgl.UUCP (john chapman) (02/12/85)

> Another relevant question is, does your memory have zero wait states?
> People I trust tell me that the 32016's performance deteriorates
> *SHARPLY* when wait states are introduced -- it's much worse than
> you would expect, and in particular it's not linear in the number of
> wait states.
> 
  On the other hand (at least according to my old 16032 manual), you
  wouldn't need very fast memory to keep up with a 32016; say, about
  400ns access?
 

doug@terak.UUCP (Doug Pardee) (02/12/85)

> >People I trust tell me that the 32016's performance deteriorates
> >*SHARPLY* when wait states are introduced -- it's much worse than
> >you would expect, and in particular it's not linear in the number of
> >wait states.
> 
> In general, programs with lighter bus use show smaller degradation with wait
> states and smaller ratios of NS32032 to NS32016 execution speed.

Wait states are a punch aimed at the 32000's glass jaw -- instruction
prefetch.

For those not completely conversant:  the 32000 series CPU's use
instruction prefetching to try to keep the 8 bytes following the
_current_ instruction already loaded into the CPU.  These bytes
are always the ones located sequentially after the current
instruction.

There are two undesirable side effects which can occur.  The most
obvious occurs when a branch is taken -- the prefetch cycles were
a waste of time, and the new instructions have to be fetched.  But
if the CPU had just started a prefetch cycle when the branch is
recognized, it has to wait for it to complete before the branch
can be executed.  Wait states increase the likelihood of this
happening as well as make the situation more serious.

Remembering that programs spend most of their time in loops, and that
a loop requires at least one branch every time through, this
effect is magnified considerably.  Especially for concocted benchmark
programs, where the contents of the loop tend to be trivial, leaving
the branching as the major time consumer.

A second aspect of the 32000 series enters in here as well -- unlike
the 68000, instructions are not required to start on word boundaries.
If the branch destination is to an "odd" address, the CPU requires
yet another memory cycle, with any wait states.  Compilers for high-
level languages like "C" don't pay any attention to this little detail,
so tight loops can suffer just because the top of the loop is on an
odd-byte boundary.
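
A rough way to see the alignment cost is to count the bus transfers needed
to fetch a run of instruction bytes, assuming each bus cycle moves one
aligned bus-width unit.  This is a simplified model for illustration only:
the function name bus_cycles and the 4-byte instruction are hypothetical,
and the prefetch queue is ignored.

```python
def bus_cycles(addr, nbytes, bus_bytes=2):
    """Aligned bus transfers needed to read nbytes starting at addr.

    Assumes every bus cycle moves one aligned unit of bus_bytes bytes
    (2 for a 16-bit bus, 4 for a 32-bit bus).  Simplified model only.
    """
    first_unit = addr // bus_bytes
    last_unit = (addr + nbytes - 1) // bus_bytes
    return last_unit - first_unit + 1

if __name__ == "__main__":
    # A hypothetical 4-byte instruction at an even and at an odd address.
    print(bus_cycles(0x1000, 4))               # 16-bit bus, even target
    print(bus_cycles(0x1001, 4))               # 16-bit bus, odd target
    print(bus_cycles(0x1001, 4, bus_bytes=4))  # 32-bit bus, unaligned target
```

Under these assumptions the odd-address target needs three transfers on a
16-bit bus where the even-address one needs two, and every extra transfer
also pays whatever wait states the memory imposes.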

The other side effect is less obvious.  The instruction prefetch cycle
can also obstruct access to the operands of the current instruction.
Again, wait states increase the likelihood of this happening, and
make the delay more serious as well.

This process, in turn, is made more likely by the use of high-level
languages like "C".  Unlike the competition's CPUs, the 32000 series
allows essentially all operations to be performed memory-to-memory,
without needing a register as an intermediate.  The compilers use
this feature extensively, with the result that operands require
memory access much more often than the equivalent 32000 assembler
code or (e.g.) 68000 "C" code.

Important note:  this presumes that if the compiler had been forced
to bring the operands into a register, and get the result in a
register, that it could have done some optimization and re-used that
register.  It is obvious, is it not, that a simple "Load A, Add B,
Store B" is necessarily going to be slower than "Add A to B"?

And to compound the problem even further:  the 32000 series is set
up to use "indirect addressing" fairly heavily, and the compilers
really use it a bunch.  Especially the "C" compiler, which uses
indirect addressing to implement pointer variables.

But wait, there's more (this is starting to sound like a TV mail-order
ad!).  Most "C" programmers seem to like to use "external" variables
rather than parameters.  On the 32000 series, parameters are accessed
just as easily as ordinary variables, but externals are a *double-
indirect*!  For a 32016 to get just the *address* of an external
item, it has to do four (4) memory cycles.  And if that item is a
pointer variable, "C" will require yet another two memory cycles
before it even has the *address* of the data.
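
The cycle counts above are what one would expect from pointer width and
bus width: a 32-bit pointer takes two transfers over a 16-bit bus.  The
sketch below tallies that and converts it to time.  The 4-clock basic bus
cycle and the 10 MHz clock are assumptions made for illustration, as are
the function names; alignment and prefetch contention are ignored.

```python
POINTER_BITS = 32  # addresses fetched from memory are 32-bit pointers

def address_fetch_cycles(indirections, bus_bits):
    """Memory cycles spent reading pointers before the data address is known."""
    cycles_per_pointer = -(-POINTER_BITS // bus_bits)  # ceiling division
    return indirections * cycles_per_pointer

def cycles_to_ns(cycles, wait_states, clock_mhz=10.0, clocks_per_cycle=4):
    """Time for the given memory cycles; clocks_per_cycle=4 is an assumption."""
    return cycles * (clocks_per_cycle + wait_states) * 1000.0 / clock_mhz

if __name__ == "__main__":
    external = address_fetch_cycles(2, bus_bits=16)      # double indirect
    external_ptr = address_fetch_cycles(3, bus_bits=16)  # plus the pointer itself
    print(external, external_ptr)
    for ws in (0, 1, 2):
        print(ws, cycles_to_ns(external_ptr, ws))
```

With these assumed figures, the six cycles needed to reach data through an
external pointer take 2400 ns at zero wait states and 3000 ns at one, all
before the operand itself is read.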

All of this indirect address and operand fetching puts quite a load
on the memory system, and prefetching represents serious competition
for memory cycles.  If that prefetching turns out to have been
unnecessary because of a branch, the performance suffers more than
the number of wait states would imply.

So if you want your 32000 system to hum along, don't use wait states,
keep looping and branching to a minimum, program in assembler, and if
you simply *must* program in "C" avoid external variables and use
register variables (especially for pointer variables).  Oh, BTW, the
MMU adds one wait state of its own.
-- 
Doug Pardee -- Terak Corp. -- !{hao,ihnp4,decvax}!noao!terak!doug