[comp.arch] Why is the RT slow?

rathmann@polya.Stanford.EDU (Peter K. Rathmann) (11/15/88)

I got the impression from looking at some informal benchmarks that the
IBM PC/RT is slower than a good 386 box, and not even in the same
league as leading risc processors like SPARC/MIPS/88000.

Can anyone explain why?  It is billed as a risc and is not saddled
with compatibility with earlier architectures, so why shouldn't it be
a performance leader?

I can make some guesses at possible causes:

 - It's just bad press.  The RT is really pretty fast.
 - The present implementation is slow, but would be competitive if
	built with fast silicon, big caches, etc.
 - The designers got too "advanced" and included features which might
	come in handy 20 years from now, but for the present just slow
	things down.

Can someone with more knowledge fill me in?		Thanks,  Peter

mark@mips.COM (Mark G. Johnson) (11/15/88)

In article <5046@polya.Stanford.EDU> rathmann@polya.Stanford.EDU (Peter K. Rathmann) writes:
 >I got the impression from looking at some informal benchmarks that the
 >IBM PC/RT is slower than a good 386 box, and not even in the same
 >league as leading risc processors like SPARC/MIPS/88000.
 >
 >Can anyone explain why?  It is billed as a risc and is not saddled
 >with compatibility with earlier architectures, so why shouldn't it be
 >a performance leader?
 >

To avoid competing with the soon to be announced, faster-n-stink IBM
"Americas" risc processor?
-- 
 -- Mark Johnson	
 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
	...!decwrl!mips!mark	(408) 991-0208

jk3k+@andrew.cmu.edu (Joe Keane) (11/16/88)

I get the impression that they were paranoid about code space, giving a
too-complicated instruction set.  To wit:

*  Shift paired.  Where else do they pair registers?

*  `Absolute' branches, with 20 bits of target address.  Gross.

*  The `short' instructions.  No rhyme or reason other than that they save 2
bytes and do common things.

*  Non-execute branch instructions.  Can't just leave a NOP, can we?

*  Four and-immediate instructions.  How about better ways to make constants, so
everyone can benefit?

Just my opinion.

--Joe

johnl@ima.ima.isc.com (John R. Levine) (11/16/88)

There are several reasons why the RT is so slow.  The worst is that it is
lashed up to 16-bit PC/AT peripherals, which make I/O a real lose.  I hear
that the new versions use a 32 bit microchannel which should be a major
improvement; the RT's native bus is quite fast.

The CPU is another issue. The original version that we developed AIX on ran at
8MHz (pretty respectable at the time) and could start an instruction every
cycle, and overlapped up to two memory references with execution. Except that
they realized that they hadn't allowed for taking page faults with multiple
transactions in progress, so whenever virtual memory was on, i.e. almost all
the time, they turned off overlap, which meant that every load or store took a
full 5 cycles. The second version of the chip fixed that, but it took forever
to get it released. Also they decided to design the CPU to work without a
cache, basically because they guessed wrong about where memory technology was
going.
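
To put a rough number on the cost of turning overlap off, here is a
back-of-the-envelope model in C.  It is only a sketch: `total_cycles` and
its parameters are made-up names, the 30% load/store mix used in the note
below is an assumed figure rather than a measurement, and the model ignores
everything except the 1-cycle and 5-cycle costs mentioned above.

```c
#include <assert.h>

/* Total cycles for a run of instructions under a deliberately simple model:
 * every non-memory instruction costs 1 cycle, and every load or store
 * costs mem_cycles cycles (5 with overlap disabled, per the text above). */
unsigned long total_cycles(unsigned long instructions,
                           unsigned long loads_stores,
                           unsigned long mem_cycles)
{
    assert(loads_stores <= instructions);
    return (instructions - loads_stores) + loads_stores * mem_cycles;
}
```

With 30 loads/stores per 100 instructions, `total_cycles(100, 30, 5)` gives
220 cycles, against an ideal 100 if every load and store were fully hidden,
so under these assumptions the CPU runs at less than half its nominal rate.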

There's no reason you couldn't reimplement the chip to take advantage of
a cache, but IBM seems to move very slowly, so it's hard to tell if or when
they will do so.

If you're running AIX, you have the VRM virtual machine manager sitting between
Unix and the hardware, which is also a performance hit, but that's a story all
of its own.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
{ bbn | spdcc | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Kids spend more time with their parents than parents spend with their kids.

ok@quintus.uucp (Richard A. O'Keefe) (11/17/88)

In article <cXU-XCy00V48ECDWt=@andrew.cmu.edu> jk3k+@andrew.cmu.edu (Joe Keane) writes:
>I get the impression that they were paranoid about code space, giving a
>too-complicated instruction set.  To wit:

>*  Shift paired.  Where else do they pair registers?

Speaking of which, could someone explain to me what those instructions
are good for, and how best to do double-length shifts?

I doubt whether any of the points Joe Keane listed really explains why
the old RTs were roughly halfway between a Sun-2 and a Sun-3/50 on a
logarithmic scale of speed.  There are supposed to be newer versions
that are rather faster.  My guess is that it was a marketing decision:
IBM decided to scale the thing to what they thought people would buy
and guessed wrong.

johnl@ima.ima.isc.com (John R. Levine) (11/18/88)

In article <691@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In article <cXU-XCy00V48ECDWt=@andrew.cmu.edu> jk3k+@andrew.cmu.edu (Joe Keane) writes:
>>I get the impression that they were paranoid about code space, giving a
>>too-complicated instruction set ...
The ROMP chip was designed to run without a cache, so they were concerned
about minimizing the bandwidth needed for instruction fetching.  The
instruction decode is fairly heavily pipelined, so I doubt that's a major
bottleneck.

>>*  Shift paired.  Where else do they pair registers?
	In the IBM 360 series, to some extent.
>Speaking of which, could someone explain to me what those instructions
>are good for, and how best to do double-length shifts?

When writing the AIX C compiler, I found them extremely handy for building
the shift and add chains for multiplication by a constant.  For double length
shifts, something that we didn't need to do, I would shift each part separately
since that's fast, and then use and's and or's to put the pieces together.
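
Both ideas can be sketched in portable C rather than RT assembler.  This is
an illustration only: `mul10` and `shl64` are hypothetical names, the
constant 10 is an arbitrary example, and the code says nothing about what
the AIX compiler actually emitted.  The double-length shift moves each half
separately and then ORs the bits that fall out of the low word into the
high word.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by the constant 10 using only shifts and adds:
 * 10*x = 8*x + 2*x = (x << 3) + (x << 1). */
uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Shift a 64-bit value, held as two 32-bit halves, left by n bits.
 * Only 0 < n < 32 is handled, so every shift count stays in range. */
void shl64(uint32_t *hi, uint32_t *lo, unsigned n)
{
    assert(n > 0 && n < 32);
    uint32_t carry = *lo >> (32 - n);   /* top n bits of the low word */
    *hi = (*hi << n) | carry;           /* merge them into the high word */
    *lo = *lo << n;
}
```

A shift by 32 or more would need a separate case (move the low word into
the high word first), which is left out here to keep the sketch short.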

The problem with the RT's instruction set is that it was exquisitely
optimized for PL.8, their systems programming language, but the only
real thing written in PL.8 that I'm aware of is the VRM, and even half
of that ended up being rewritten in assembler.  It's not a bad instruction
set for C, but you need a far more sophisticated compiler than PCC to make
good use of it.

>...  My guess is that it was a marketing decision:
>IBM decided to scale the thing to what they thought people would buy
>and guessed wrong.

Of course -- they really didn't understand how much faster the workstation
market moves than their more familiar mainframe and small business market, so
by the time the RT came out it was obsolescent. Now they do understand, but
who knows what they'll do about it.

When the RT came out, they published the "RT Personal Computer Technology"
book which has a bunch of fairly interesting articles giving a somewhat
sanitized version of how the RT came to be.  It's order number SA23-1057.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
{ bbn | spdcc | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Kids spend more time with their parents than parents spend with their kids.

henry@utzoo.uucp (Henry Spencer) (11/18/88)

In article <691@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>... My guess is that it was a marketing decision:
>IBM decided to scale the thing to what they thought people would buy
>and guessed wrong.

In a talk he gave here yesterday, John Mashey guessed that IBM expected
that caches would stay expensive and hence wouldn't be cost-effective.
Static RAM prices dropped and this turned out to be a mistake.
-- 
Sendmail is a bug,             |     Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

baum@Apple.COM (Allen J. Baum) (11/18/88)

[]
I don't think it's very mysterious why the RT performance is so bad. Aside from
the dark rumours that the architecture was deliberately crippled so as not to
impact existing product lines, it (the implementation, NOT the architecture)
was not whizzy because of engineering tradeoffs. They stuck to a microcomputer
bus, they used a process technology that was barely state-of-the-art (or
wasn't by the time they finished, anyway), did not have a cache. It only ran
at 6MHz! Why bother with a cache! They made lots of tradeoffs for backwards
compatibility with existing systems.

However, don't believe that a flawed implementation means the architecture is
flawed. It may be flawed, but I've seen no convincing arguments for that in this
newsgroup. A 33MHz version of the chip, with cache, should scream.
Just like a 33MHz version of an R3xxx, or a 29000, or an 88xxx ..... 
I see very little in the architecture that would make, say, a 10% difference in
performance, which is equivalent to a four month lead time.
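
One way to read the "10% equals four months" remark is as a steady rate of
improvement, compounded.  That reading is an assumption, and `compound` is
a made-up helper; the sketch below just works out what it would imply over
a year (three four-month periods).

```c
#include <assert.h>

/* Compound a per-period speedup over a number of periods,
 * e.g. 1.10 per four-month period, three periods per year. */
double compound(double per_period, unsigned periods)
{
    assert(per_period > 0.0);
    double total = 1.0;
    for (unsigned i = 0; i < periods; i++)
        total *= per_period;
    return total;
}
```

`compound(1.10, 3)` comes to about 1.33, i.e. roughly a one-third speedup
per year if the 10%-per-four-months pace held.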

--
		  baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum

andrewt@basser.oz (Andrew Taylor) (11/18/88)

>From: jk3k+@andrew.cmu.edu (Joe Keane)
> I get the impression that they were paranoid about code space, giving a
> too-complicated instruction set.  To wit:
> [...]
> *  The `short' instructions.  No rhyme or reason other than that they save 2
> bytes and do common things.

Unlike the 801 the RT has no instruction cache (an economic decision).
The designers introduced 2 byte instructions in an attempt
to compensate for the lack of cache. Unfortunately this required reducing
the number of registers from 32 to 16. I vaguely recall hearing that the
no cache decision was later regretted.
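
The register squeeze is easy to see by counting bits.  The sketch below
uses a hypothetical fixed-width format with plain register fields; it is
not the RT's actual encoding, and `opcode_bits` is a made-up name.

```c
#include <assert.h>

/* Bits left for the opcode in a fixed-width instruction that names
 * `operands` registers out of a file of `registers` (a power of two). */
unsigned opcode_bits(unsigned width, unsigned registers, unsigned operands)
{
    unsigned field = 0;
    assert(registers > 0 && (registers & (registers - 1)) == 0);
    while ((1u << field) < registers)
        field++;                       /* field = log2(registers) */
    assert(field * operands <= width);
    return width - field * operands;
}
```

In a 16-bit instruction with two register operands, 16 registers leave 8
bits of opcode, while 32 registers leave only 6; with three operands the
32-register case is down to a single bit.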

Because of a problem with memory management exceptions, the first RT
processor did not allow the execution of load/store instructions to be
overlapped with subsequent instructions. As load/store instructions
take 5-6 cycles, this was a significant handicap.

The 2nd RT processor (sometimes called the RT/APC) allowed load/store
instructions to be overlapped. This plus the clock-speed being almost
doubled made it much faster. The only benchmark I've seen puts it
between a SUN 3/50 and a SUN 3/60.

More RT models came out this year. I know nothing about them.

Andrew

pcg@aber-cs.UUCP (Piercarlo Grandi) (11/18/88)

In article <8234@obiwan.mips.COM> mark@mips.COM (Mark G. Johnson) writes:

    In article <5046@polya.Stanford.EDU> rathmann@polya.Stanford.EDU (Peter K. Rathmann) writes:

     >I got the impression from looking at some informal benchmarks that the
     >IBM PC/RT is slower than a good 386 box, and not even in the same
     >league as leading risc processors like SPARC/MIPS/88000.
     >
     >Can anyone explain why?  It is billed as a risc and is not saddled
     >with compatibility with earlier architectures, so why shouldn't it be
     >a performance leader?
     >
    
    To avoid competing with the soon to be announced, faster-n-stink IBM
    "Americas" risc processor?

Actually I think simply because IBM is notoriously slow in the lab-to-market
cycle; nobody has ever accused IBM of peddling the latest gizmo. They do
it cautiously and play on the safe side; look at the 386 marketplace.

It has never been IBM's policy to win sales on performance alone; of course
smaller fry like MIPS or Motorola or AMD or Acorn think that their edge
is offering the latest and (by consequence) fastest gizmo to a small enough
niche that does not strain their capabilities.

NOTE on niche markets:
    of course MIPS/Acorn cannot (yet) aspire even to think of non-niche
    markets, but Motorola/AMD?  Well, compared to IBM... Their Chairman
    once said that for a recent fiscal year FIFTY PERCENT of all capital
    goods investments by USA corporations went into 3090s, and even IBM's
    resources were being strained a bit by that...
-- 
Piercarlo "Peter" Grandi			INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science	UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BX (UK)

schwartz@shire.cs.psu.edu (Scott Schwartz) (11/19/88)

In article <2921@ima.ima.isc.com>, johnl@ima (John R. Levine) writes:
> Also they decided to design the CPU to work without a
>cache, basically because they guessed wrong about where memory technology was
>going.

I'm told that the new ones (615x) use 3090 technology memory.  I really hope
it's helping :-)
-- 
Scott Schwartz		<schwartz@shire.cs.psu.edu>

njs@scifi.scifi.UUCP (Nicholas J. Simicich) (11/19/88)

In article <20791@apple.Apple.COM> baum@Apple.COM (Allen J. Baum) writes:
   .........
>  I don't think it's very mysterious why the RT performance is so bad. Aside from
>  the dark rumours that the architecture was deliberately crippled so as not to
>  impact existing product lines, it (the implementation, NOT the architecture)
>  was not whizzy because of engineering tradeoffs. They stuck to a microcomputer
>  bus, they used a process technology that was barely state-of-the-art (or
>  wasn't by the time they finished, anyway), did not have a cache. It only ran
>  at 6MHz! Why bother with a cache! They made lots of tradeoffs for backwards
>  compatibility with existing systems.

The bus for adapter cards does, in fact, run at 6 MHz, and is similar
to an AT bus.  But the above quoted article implies that this has
something to do with the speed of the processor.  This is just not
true.  The processor/memory/floating point bus is totally separate,
and it has its own clock.

The 125 RT executes programs at about twice the speed of the original
RT.  The current model, the 135, is 25%-30% faster than that, as far as I
can tell.  The bus is still 6 MHz.

I won't comment on rumors unless you buy me a drink at Usenix :-) and
even then I might not.  

  ......
>  --
>  		  baum@apple.com		(408)974-3385
>  {decwrl,hplabs}!amdahl!apple!baum

My personal comments: Today's RT is a lot faster than yesterday's RT.
If you are interested, there is a salesman near you.
--
Nick Simicich --- uunet!bywater!scifi!njs --- njs@ibm.com (Internet)

schwartz@shire.cs.psu.edu (Scott Schwartz) (11/19/88)

In article <1620@basser.oz>, andrewt@basser (Andrew Taylor) writes:
>More RT models came out this year. I know nothing about them.

For the most part, they are faster than a Sun3/160 but slower
than a Sun4/260.  Dhrystones available upon request.
-- 
Scott Schwartz		<schwartz@shire.cs.psu.edu>

bader+@andrew.cmu.edu (Miles Bader) (11/20/88)

ok@quintus.uucp (Richard A. O'Keefe) writes:
> I doubt whether any of the points Joe Keane listed really explains why
> the old RTs were roughly halfway between a Sun-2 and a Sun-3/50 on a
> logarithmic scale of speed.  There are supposed to be newer versions
> that are rather faster.  My guess is that it was a marketing decision:
> IBM decided to scale the thing to what they thought people would buy
> and guessed wrong.

Among other things, I have been told that the old RTs DISABLED the
instruction pipeline when in "virtual memory mode", due to problems
with recovering from page faults.  Newer models apparently can use
the pipeline.

-Miles

charette@edsews.EDS.COM (Mark A. Charette) (11/20/88)

>    .........
> I don't think it's very mysterious why the RT performance is so bad. Aside from
> the dark rumours that the architecture was deliberately crippled so as not to
> impact existing product lines, it (the implementation, NOT the architecture)
> was not whizzy because of engineering tradeoffs. They stuck to a microcomputer
> bus, they used a process technology that was barely state-of-the-art (or
> wasn't by the time they finished, anyway), did not have a cache. It only ran
> at 6MHz! Why bother with a cache! They made lots of tradeoffs for backwards
> compatibility with existing systems.

The later RTs (125,135) are reasonable, middle of the road performers when
you look at the CPU speed. However, you might just want to look at the disk
I/O performance. I wish our little Suns ran that fast.

Also, IBM won't stand still in this marketplace. Once IBM finds out REAL
money can be made in the workstation market, they'll be here with a
reliable, almost whizbang machine. Considering their quality control, I
think they could come up with a real winner that doesn't come DOA or have
random errors caused by the phase of the moon.


-- 
Mark Charette             "People only like me when I'm dumb!", he said. 
Electronic Data Systems   "I like you a lot." was the reply.
750 Tower Drive           Voice: (313)265-7006        FAX: (313)265-5770
Troy, MI 48007-7019       charette@edsews.eds.com     uunet!edsews!charette 

mac3n@babbage.acc.virginia.edu (Alex Colvin) (11/21/88)

Can old (slow) RTs be cheaply upgraded to new (fast) RTs?
Is it just a CPU/cache redesign or does the bus change too?

mcdonald@uxe.cso.uiuc.edu (11/21/88)

>Also, IBM won't stand still in this marketplace. Once IBM finds out REAL
>money can be made in the workstation market, they'll be here with a
>reliable, almost whizbang machine. Considering their quality control, I
>think they could come up with a real winner that doesn't come DOA or have
>random errors caused by the phase of the moon.

IF they could get some sort of good quality control they MIGHT
be able to get some sort of market. Note that, excluding 3090 class
machines, their market is slipping very badly. Perhaps that means
that they only care about 3090's, which they understand. But perhaps
it reflects the concern we have here about them: every machine
I know about here from them (including RT's) has been DOA or
died within a week of arrival. My machine died once the first day,
once the third day and once the eighth. And, every machine has
design (concept) flaws - especially the original RT!

And, as to long term quality control - every single PC-AT disk we have
has died! 100% failure!! (one lasted until last month).

sauer@auschs.UUCP (Charlie Sauer) (11/22/88)

In article <415@babbage.acc.virginia.edu>, mac3n@babbage.acc.virginia.edu (Alex Colvin) writes:
> Can old (slow) RTs be cheaply upgraded to new (fast) RTs?
> Is it just a CPU/cache redesign or does the bus change too?

There is an upgrade kit for upgrading the original models to 115/125's.  It 
includes the 10 MHz APC processor card w/20 MHz 68881, 4MB of memory and the 
DMA (buffered) ESDI disk controller.  (Note that original models 10 and 20 came
with ST506 drives, so the ESDI disk controller will not work with those drives.
Original models 15 and 25 came with ESDI drives.)  It can be ordered as 
accessory part number 61X6833.  Last I knew, the list price was $2495.
-- 
Charlie Sauer   IBM AES/ESD, D75/802     uucp: cs.utexas.edu!ibmaus!sauer
                11400 Burnet Road         822: @CS.UTEXAS.EDU:sauer@ibmaus.uucp
                Austin, Texas 78758    aesnet: sauer@auschs  
                (512) 823-3692           vnet: SAUER at AUSVM6

csimmons@hqpyr1.oracle.UUCP (Charles Simmons) (11/23/88)

In article <46500032@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
<>Also, IBM won't stand still in this marketplace. Once IBM finds out REAL
<>money can be made in the workstation market, they'll be here with a
<>reliable, almost whizbang machine. Considering their quality control, I
<>think they could come up with a real winner that doesn't come DOA or have
<>random errors caused by the phase of the moon.
<
<IF they could get some sort of good quality control they MIGHT
<be able to get some sort of market. Note that, excluding 3090 class
<machines, their market is slipping very badly. Perhaps that means

I don't think we should exclude 3090 class machines.  Amdahl has
made significant inroads into this marketplace with their 580 series.

-- Chuck

(Correct me if I'm wrong, Mike.)