[comp.sys.next] NextStation impressions

carlton@aldebaran.berkeley.edu (Mike Carlton) (11/30/90)

Greetings,

Well, as someone else posted recently, Next held a demo on the 
Berkeley campus with several NextStations and a Color NextStation.  I was 
able to spend a while on one of the machines and got some simple benchmark 
results.  Below are the numbers and several other comments I had.

Basically, the NextStation is quick.  Here are some timings (lower is 
better), along with some other machines for comparison.  I measured the 
following:

                power/bc(1)  sqrt/dc(2)    nroff(3)   bamsim(4)
                ----------- ----------- ----------- -----------
RS/6000(5)              3.4         0.3         2.7        46.1
MIPS(6)                 6.0         0.3         1.6        46.1
NextStation(7)          7.8         0.5         3.4        70.1
SparcStation(8)        12.1         0.7         3.9        51.9
030 Cube(9)            19.0         1.3        15.1       322.4
Sun 3/60(10)           30.3         2.5        16.9       376.1
Sun 3/50(11)           44.5         3.9        29.1       462.8
VAX 785(12)            44.3         3.5        36.3       681.2

Benchmarks:
( 1) echo 2^5000/2^5000 | /bin/time bc > /dev/null
( 2) echo 99k2vp8opq | /bin/time dc > /dev/null
( 3) cat /dev/null | awk 'END {for(i=0;i<100;i++){print ".PP";for(j=0;j<100; \
     j++) print j;}}' | /bin/time nroff -ms -i > /dev/null 
( 4) bamsim chat.o: chat parser with writes simulated on the VLSI-BAM chip

Machines:
( 5) RS/6000 Model 320, 32MB? memory, 20MHz?, 8KB i-cache and 64KB d-cache
( 6) M/2000-8, 128MB memory, 25MHz R3000, 64KB i-cache and 64KB d-cache
( 7) NextStation, 8MB memory, 25MHz 68040, on-chip 4KB i-cache and 4KB d-cache
( 8) SparcStation 1+, 24MB memory, 25MHz? SPARC, 64KB? cache
( 9) 030 Cube, 12MB memory, 25MHz 68030, on-chip 256B i-cache and 256B d-cache
(10) Sun 3/60, 20MB memory, ?MHz 68020, ?KB cache, on-chip 256B i-cache
(11) Sun 3/50, 4MB memory, 16MHz 68020, on-chip 256B i-cache
(12) VAX 785, 32MB? memory, 8MHz?, ?KB cache

Disclaimer: these are simple benchmarks -- don't place too much emphasis
on them.  If you really need to know just how fast the machine is then run
your specific programs on it.  SPEC results would be better of course, but 
I don't have them.

The first 2 benchmarks are simple unix one-liners that were recently
posted to comp.benchmarks.  These are nice because you can sit down and 
type them quickly.  They should involve mostly integer and pointer
operations.  The third one is a simple test of nroff, and so should be
mostly character operations.  The fourth benchmark is a register-level
simulator our group developed and uses; it is almost all integer operations. 

I've described the machines as well as I can; question marks indicate
the quantities I don't know or have made a guess at.  The NextStation I 
used was still a pre-release machine -- it was running a beta version 
of the OS and the 040 was supposed to be an early version.

For the one-liners, the NextStation ranges from 1.1 to 1.5 times the
speed of a Sparcstation 1+.  On the larger simulation the Next is only
75% of the speed of the Sparcstation; we believe this could be due
to caching on the Sparc or due to slow bit operations, since the simulation 
performs quite a few, and the 040 should be slower on large shifts (the shift 
instruction can only specify shifts of 1-8 bits on the 030 and presumably 
the same is true of the 040).
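
The ratios quoted above follow from the table if the entries are read as
elapsed times, so that relative speed is the reference time divided by the
measured time.  A minimal C sketch of that arithmetic is below.  The name
relative_speed is made up for illustration, and treating every table entry
as a time in the same unit is an assumption.

```c
#include <assert.h>

/* Relative speed of a machine versus a reference machine, assuming both
 * arguments are elapsed times for the same job (lower time = faster).
 * relative_speed is a hypothetical helper, not part of any benchmark. */
double relative_speed(double reference_time, double machine_time)
{
    assert(machine_time > 0.0);   /* a zero time would divide by zero */
    return reference_time / machine_time;
}
```

With the SparcStation as the reference, relative_speed(12.1, 7.8) is about
1.55 for power/bc, relative_speed(3.9, 3.4) is about 1.15 for nroff, and
relative_speed(51.9, 70.1) is about 0.74 for bamsim.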

Floppy drive: The floppy drive is very well integrated.  I simply popped 
a DOS 1.44MB floppy in the drive and the Next automatically mounted it 
as a new volume, which appeared in the browser.  Subjectively, the floppy 
seemed slow (i.e. copying seemed to be a few times slower than copying 
from a floppy to a hard drive on a Mac), but this is just an impression.  
Also, the Next was supporting a foreign format and so may be slower because 
of that.  

Shipping news: one of the Next reps said that they had begun shipping
and he thought that Berkeley would get its first 5 machines sometime
next week, with 5 more to follow in a couple of weeks.  Of course,
he went to great pains to point out that there were no guarantees, 
nothing was definite, this was what he thought, etc.  The first 10 
machines will all be the 105MB disk, 8MB configuration.  A machine with 
200MB disk has been added to the available configurations, but won't ship 
until after the new year.  He thought the price for a 200MB system would 
be about $700 more than the 105MB system.  030 Cube upgrades to 040's 
probably won't ship until the new year.

Software: I was able to play with Illustrator (and only crashed it
once in about 5 minutes).  It appeared to be just about identical
to the Macintosh version.  They also had a demo version of FrameMaker,
a demo of WordPerfect and of course, Lotus Improv.  In general, the
applications seemed pretty solid.  

Overall, the machines are very impressive.  I'm number 5 on the list 
here at Berkeley and so might actually get one in the next couple of
weeks -- I'm looking forward to it.  I don't think there is a better
price/performance, basic Unix box out there (which is what I was 
looking for) and the user interface is a world above X or Suntools.
Combine that with a bunch of very slick bundled software and I think 
it's a great deal.  We've had an 030 Cube in our office since they first 
came out; it was a good machine, but wasn't worth $6500 of my own money.  
The NextStation is definitely worth the $3200.

That's all for now, 
Mike Carlton	carlton@cs.berkeley.edu

alex@pluto.dss.com (Alex Smith) (11/30/90)

In article <9325@pasteur.Berkeley.EDU>, carlton@aldebaran.berkeley.edu (Mike Carlton) writes:
> Greetings,

[ Benchmark specifics ]

> For the one-liners, the NextStation ranges from 1.1 to 1.5 times the
> speed of a Sparcstation 1+.  On the larger simulation the Next is only
> 75% of the speed of the Sparcstation; we believe this could be due
> to caching on the Sparc or due to slow bit operations, since the simulation 
> performs quite a few, and the 040 should be slower on large shifts (the shift 
> instruction can only specify shifts of 1-8 bits on the 030 and presumably 
              ^^^ ^^^^ ^^^^^^^ ^^^^^^ ^^ ^^^ ^^^^
According to the MC68030 User's Manual (2nd Ed.):

	The shift count for the shifting of a [data] register is specified
	in two different ways:

	1. Immediate -- The shift count (1-8) is specified in the instruction.
	2. Register -- The shift count is the value in the data register
	   specified in the instruction modulo 64.
                                        ^^^^^^ ^^
Perhaps the simulation is using immediate specification, or shifting memory
(which can only be done one bit/byte at a time).
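
To make the quoted rule concrete, here is a rough C model of a 32-bit
logical left shift whose count comes from a register, with the count taken
modulo 64 as the manual says.  The name lsl_register_model is hypothetical,
and the model covers only the result value, not condition codes or timing.

```c
#include <stdint.h>

/* Rough model of a 32-bit logical left shift with the count held in a
 * register: the count is reduced modulo 64, and counts of 32..63 shift
 * every bit out of the register.  lsl_register_model is a made-up name;
 * condition codes and cycle counts are not modelled. */
uint32_t lsl_register_model(uint32_t value, uint32_t count_register)
{
    uint32_t count = count_register % 64u;

    if (count >= 32u)
        return 0u;   /* C leaves shifts of 32 or more undefined, so handle them here */
    return value << count;
}
```

So a single register-count shift can cover any distance the simulation is
likely to need, while an immediate count tops out at 8.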

> the same is true of the 040).

[ etc.]

Alexander Smith		"If that was an opinion, this is a disclaimer."
alex@pluto.dss.com

carlton@aldebaran (Mike Carlton) (12/01/90)

In article <4088@pluto.dss.com> alex@pluto.dss.com (Alex Smith) writes:
+In article <9325@pasteur.Berkeley.EDU>, carlton@aldebaran.berkeley.edu (Mike Carlton) writes:
+> Greetings,
+
+[ Benchmark specifics ]
+
+> instruction can only specify shifts of 1-8 bits on the 030 and presumably 
+              ^^^ ^^^^ ^^^^^^^ ^^^^^^ ^^ ^^^ ^^^^
+According to the MC68030 User's Manual (2nd Ed.):
+
+	The shift count for the shifting of a [data] register is specified
+	in two different ways:
+
+	1. Immediate -- The shift count (1-8) is specified in the instruction.
+	2. Register -- The shift count is the value in the data register
+	   specified in the instruction modulo 64.
+                                        ^^^^^^ ^^
+Perhaps the simulation is using immediate specification, or shifting memory
+(which can only be done one bit/byte at a time).
+
+> the same is true of the 040).
+
+[ etc.]
+
+Alexander Smith		"If that was an opinion, this is a disclaimer."
+alex@pluto.dss.com

I wasn't real clear in my original posting, mainly because I was still
speculating about the possible problems due to an 040.  Yes, the code
uses immediate shifts extensively.  At some point, it becomes faster for
the compiler to load a temp register with a shift amount and do a register
shift rather than multiple immediate shifts.  I don't know what gcc does;
I have not profiled the code or looked at the generated assembly because
it hasn't been a priority.
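
As a rough illustration of that trade-off: if an immediate shift can encode
at most 8 bits, a shift by n needs ceil(n/8) immediate instructions, while
the register form needs one load plus one shift whatever n is.  The sketch
below only counts instructions under that assumption.  It says nothing
about cycle times, and the helper names are hypothetical.

```c
#include <assert.h>

/* Largest count an immediate shift can encode, per the manual quoted above. */
#define MAX_IMMEDIATE_SHIFT 8

/* Immediate shifts needed to move n bits, e.g. 21 = 8 + 8 + 5. */
int immediate_shift_instructions(int n)
{
    assert(n >= 0);
    return (n + MAX_IMMEDIATE_SHIFT - 1) / MAX_IMMEDIATE_SHIFT;
}

/* Register form: one instruction to load the count, one to shift.
 * This counts instructions only; real cost depends on cycle timings. */
int register_shift_instructions(int n)
{
    assert(n >= 0);
    return 2;
}
```

For a 21-bit shift, for example, that is 3 immediate shifts against 2
instructions for the register form; whether the register form is actually
faster depends on timings this count ignores.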

Here are a few lines from one of the header files, showing some of the 
operations that are being performed:
	#define sign_ext_11(data)  (((int)((data)<<21))>>21)
	#define sign_ext_12(data)  (((int)((data)<<20))>>20)
and even:
	#define tagged_imm_11(tage,data) \
        	((-ebit(tage))^((((-ebit(tage))^(tage))<<27)| \
        	(((-ebit(tage))^(data))&0x7ff)))
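
For anyone puzzling over the first two macros: shifting left by 21 and then
arithmetically right by 21 sign-extends the low 11 bits of a 32-bit value.
The sketch below restates that as a function and, for comparison, shows a
well-known shift-free alternative (XOR with the sign bit, then subtract
it).  This is not from the simulator's source.  The function names are
made up, and it assumes 32-bit two's-complement integers with arithmetic
right shift of negative values, which is what gcc provides.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-based sign extension of an 11-bit field, in the style of the
 * macro above.  Assumes right-shifting a negative int32_t is arithmetic. */
int32_t sign_ext_11_shift(uint32_t data)
{
    assert(((int32_t)-1 >> 1) == -1);     /* check the arithmetic-shift assumption */
    return (int32_t)(data << 21) >> 21;
}

/* Shift-free alternative: keep the low 11 bits, flip bit 10 (the field's
 * sign bit), then subtract its weight.  Hypothetical helper for comparison. */
int32_t sign_ext_11_xor(uint32_t data)
{
    int32_t low = (int32_t)(data & 0x7ffu);
    return (low ^ 0x400) - 0x400;
}
```

Both functions give -1 for 0x7ff, -1024 for 0x400, and 1023 for 0x3ff.
Whether the second form would be any faster on an 040 is untested.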

When I get a slab, I'll take a look at what the compiler is generating.

--mike
Mike Carlton	carlton@cs.berkeley.edu