[comp.arch] 360 vs VAX stuff

johnw@astroatc.UUCP (John F. Wardale) (09/03/87)

A while ago I made the following claims about pipelining with regard to
360s and VAXen, which generated some warm replies.  I should have
answered them sooner, but the best-laid plans of mice and men....

>From: petolino%joe@Sun.COM (Joe Petolino)
me> The 8600 overlaps operand-decode with operand-fetch, and uses
me> multiple functional (execution) units, but **UNLIKE** IBM and any
me> other true pipe-line design, can *NOT* have multiple instructions
me> in the decode phase simultaneously!  
> 
> This is certainly a novel criterion for calling a design 'pipelined'!
> All of the CPU designs I know of (this includes machines by IBM, Amdahl,
> MIPS, and Sun) have at most one instruction in each pipeline stage at any one
Very true, but a phase is not a stage [see below]

> second-guessing deleted...also see below

>From: bcase@apple.UUCP (Brian Case)
> In article .... [I wrote]
me> THE *MAJOR*
me> reason the ancient 360/370 stuff is still alive, while DEC's vaxen
me> are falling by the wayside (despite DEC's best efforts) is that
me> 360's *CAN* be pipelined (tho not necessarily real easily) and
me> VAXen can't!  
> 
> I beg your pardon, but your statement is quite a bit stronger than reality
> will permit.  I, for one, believe that the high-end VAXs are quite pipelined.
Mmmm...Not really.

me> The 1st byte of each 370 instruction tells the length of the instruction!
> 
> You have pin-pointed one of the VAX's problems.  This does not prevent,
> absolutely, pipelining.
Ok, but it ties the designer's hands and one foot behind his back!!!
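
To make the 370 point concrete, here is a minimal sketch (the function
name is made up for this example).  On the 360/370 the two high-order
bits of the opcode byte give the instruction length, so the start of the
next instruction is known before any operand is looked at:

```python
def s370_instruction_length(first_byte):
    """Return the length in bytes of a System/370 instruction.

    The two high-order bits of the opcode byte encode the length:
    00 -> 2 bytes, 01 or 10 -> 4 bytes, 11 -> 6 bytes.
    """
    return (2, 4, 4, 6)[(first_byte >> 6) & 0b11]


# 0x1A is AR (register-register add), 0x5A is A (register-storage add),
# 0xD2 is MVC (storage-to-storage move).
for opcode in (0x1A, 0x5A, 0xD2):
    print(hex(opcode), s370_instruction_length(opcode))
```

A fetch unit can therefore find the next few instruction boundaries with
a couple of gates per instruction, without decoding any operands.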

----------------------------
First I will admit that my phrasing WAS not great...(That's why I'm trying
to clarify.)

By "decode phase" I'm referring to anything *BEFORE* the instruction is
issued.  By *MY* definition (which could be totally off the wall) a
machine that can only work on cracking (opcode-decode, operand-decode(VM?),
operand-fetch, hazard-checking, etc.) one instruction at a time is NOT
pipelined.
  [if you declare one being "cracked" and two being "executed" as "pipelined"
   that's OK, but I'm talking about *REAL* assembly-line pipelining ]

----------------------------

>From: guy%gorodish@Sun.COM (Guy Harris) asks:
> OK, so if you implement a VAX using the same technology as a top-of-the line
> IBM mainframe, how fast would it be?
> [ top of the line speeds ]  do not *in and of itself* indicate that
> this is due solely to architectural problems with the VAX.

Theoretically true, but in this case, I think it is the architecture.

I believe the encoding of VAX instructions prevents one from
making it go fast, while still being affordable.  (It's a
point-of-diminishing-returns question.)  Comments...Anyone think it'd be
(economically) worth building a VAX 3X or 10X the speed of the current
top VAX?
Or has it [as I feel it has] reached the limit for current technology?
Have 360-type machine speeds been improving with, or faster than,
technology speeds? (sorry if that's too vague)
What is the rate (or rates) of top-360 speed increase, and how does this
track technology speeds?
s  |
e  |       ==/
e  |    ==/    ==/
d  | ==/    ==/				upper line is technology limit
   |      =/				lower line is 360 limit
   |     /
   |    /			question:  is the 360 in the /, the =/ or
   |   /				the ==/ part of its line.  (i.e.
   |  /					I don't know where dates are on
    -------------------			the time line.)
           time

crt grafix suck....lets all buy Mac-II's   (semi :-)

-- 
					John Wardale
... {seismo | harvard | ihnp4} ! {uwvax | cs.wisc.edu} ! astroatc!johnw

To err is human, to really foul up world news requires the net!

johnw@astroatc.UUCP (John F. Wardale) (09/03/87)

Correction...The vertical axis is "speed" not "seed" ... Sorry

rw@beatnix.UUCP (Russell Williams) (09/03/87)

In article <430@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
>I believe the encoding of VAX instructions prevents one from 
>making it go fast, while still being affordable.  (It's a
>point-of-diminishing returns question.)  Comments...Anyone think it'd be
>(economically) worth building a VAX 3X or 10X the current top-vax?
>Or has it [as I feel it has] reached the limit for current technology.

   DEC doesn't think so -- they're building one, or so I hear.  What the cost 
will be is open to question, but I understand their margins on existing machines
are quite high and going up all the time.  Maybe they won't be so high on the
next one, but when you sell that many it doesn't matter as much.  People in 
DEC's or IBM's position don't have to have the most cost-effective architecture 
by a long shot.  Once you're firmly established, the disadvantages of your 
architecture become less important because you have the advantages of:
1. More R&D$ to optimize your implementations -- how many companies can afford
   to invent TCMs?
2. Volume manufacturing -- you can buy in big quantities or mfr. subsystems
   and/or chips in-house, and you can buy expensive mfg. equipment to drive
   down costs.
3. The "nobody ever got fired for buying IBM (DEC)" effect.  In most market
   segments, you must have a substantial price/performance advantage over
   the leader to get customers, and even then there are many who will stick
   with the industry leaders at almost any price. The best example of this I 
   can think of is Amdahl.  Even though their machines are compatible, highly 
   reliable, etc., IBM still sells many times more machines with inferior
   price/performance (of course I realize there are other reasons too, but
   that's a major one).  

   I won't deny that the VAX architecture is a handicap in building fast
machines, but DEC has the market lead and resources to pursue any of several 
options over the next few years.  I don't know if they can build complex VAXen 
to compete with more easily scalable architectures indefinitely, but they can 
afford to eat some loss in margin until they figure out what to do.  

Russell Williams
..{ucbvax!sun,lll-lcc!lll-tis,altos86}!elxsi!rw

bjj@psueclb.BITNET (09/12/87)

This is in response to the claim that a VAX is not readily pipelined,
or at least that there are limits to the pipelining.  The same problem
didn't seem to stop the designers of the Harris HCX7.

From: johnw@astroatc.UUCP (John F. Wardale)
>>> The 1st byte of each 370 instruction tells the length of the instruction!

>> You have pin-pointed one of the VAX's problems.  This does not prevent,
>> absolutely, pipelining.

> Ok, but it ties the designers hands and one foot behind his back!!!

The Harris HCX7 is roughly similar to the VAX 8700.  You have to look real
close at the instruction set to see the differences between the VAX and
HCX instruction sets.  Like the VAX, the HCX opcode does not indicate
the instruction length, rather each operand encodes its own length.
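
As a rough illustration of why this is harder than the 370 case, here is
a simplified sketch of finding the length of a VAX instruction.  The
opcode table is a tiny illustrative subset, the function names are made
up for this example, and branch displacements and the longer data types
are left out:

```python
# (number of operands, operand size in bytes) for a few VAX opcodes.
# Illustrative subset only; a real decoder needs the full opcode table.
OPCODES = {
    0x90: (2, 1),  # MOVB
    0xD0: (2, 4),  # MOVL
    0xC0: (2, 4),  # ADDL2
    0xC1: (3, 4),  # ADDL3
}

PC = 15  # register 15 is the program counter


def specifier_length(code, pos, operand_size):
    """Return the byte length of the operand specifier at code[pos]."""
    mode, reg = code[pos] >> 4, code[pos] & 0x0F
    if mode <= 3:                 # short literal
        return 1
    if mode == 4:                 # index prefix, followed by a base specifier
        return 1 + specifier_length(code, pos + 1, operand_size)
    if mode == 8 and reg == PC:   # immediate: the operand follows in-line
        return 1 + operand_size
    if mode == 9 and reg == PC:   # absolute: a 32-bit address follows
        return 1 + 4
    if mode in (0xA, 0xB):        # byte displacement
        return 2
    if mode in (0xC, 0xD):        # word displacement
        return 3
    if mode in (0xE, 0xF):        # longword displacement
        return 5
    return 1                      # register, deferred, autoinc/autodec


def vax_instruction_length(code, pos=0):
    """Walk the specifiers in order; each one's start depends on the last."""
    n_operands, operand_size = OPCODES[code[pos]]
    length = 1
    for _ in range(n_operands):
        length += specifier_length(code, pos + length, operand_size)
    return length


# MOVL #imm32, R0: opcode, immediate specifier + 4 bytes, register specifier
print(vax_instruction_length(bytes([0xD0, 0x8F, 1, 0, 0, 0, 0x50])))
```

The loop is the point: the position of specifier N is not known until
specifiers 1..N-1 have been examined, so the length cannot be read off
the first byte.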

The HCX7 seems to have solved the instruction decode problem with
a separate instruction cache.  The cache contains DECODED instructions
stored as fixed length 73 bit words.

Assuming a reasonable hit ratio in the 4K word instruction cache,
the extra complexity of instruction decode won't affect speed.
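
A toy model of the idea, for illustration only.  The direct-mapped
organization, the indexing by PC, and all names below are assumptions;
the Harris literature quoted here says only that the cache holds 4K
decoded 73-bit words:

```python
class DecodedInstructionCache:
    """Toy direct-mapped cache of already-decoded instructions."""

    def __init__(self, decode, entries=4096):
        self.decode = decode            # slow path: full variable-length decode
        self.entries = entries
        self.lines = [None] * entries   # each line is (pc, decoded) or None
        self.hits = 0
        self.misses = 0

    def fetch(self, pc):
        index = pc % self.entries
        line = self.lines[index]
        if line is not None and line[0] == pc:
            self.hits += 1              # decode work is skipped entirely
            return line[1]
        self.misses += 1                # pay the decode cost once, then keep it
        decoded = self.decode(pc)
        self.lines[index] = (pc, decoded)
        return decoded


def slow_decode(pc):
    # Stand-in for the real decoder; returns a placeholder "fixed-width word".
    return ("decoded", pc)


cache = DecodedInstructionCache(slow_decode)
for _ in range(50):                          # a loop body executed 50 times
    for pc in range(0x1000, 0x1000 + 300, 3):
        cache.fetch(pc)
print(cache.hits, cache.misses)
```

In this model the decoder runs only on misses, which is why the claim
depends on the hit ratio.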

The HCX7 has a 100 ns clock cycle.  Most instructions execute in 1 cycle.
Harris describes a 3 level pipeline which simultaneously processes:
        1)  Instruction Fetch
        2)  Address Calculation
        3)  Instruction Execution
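
A sketch of what that overlap looks like, assuming an ideal pipeline with
no stalls and one cycle per stage (the stage names are shortened and the
function is made up for this example):

```python
CLOCK_NS = 100
STAGES = ("fetch", "address", "execute")


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name to the index of
    the instruction occupying it (None if the stage is empty)."""
    total_cycles = n_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, name in enumerate(stages):
            i = cycle - depth
            row[name] = i if 0 <= i < n_instructions else None
        schedule.append(row)
    return schedule


schedule = pipeline_schedule(4)
for cycle, row in enumerate(schedule):
    print(cycle, row)
print("total time:", len(schedule) * CLOCK_NS, "ns")
```

With N instructions and 3 stages the ideal total is N + 2 cycles, so once
the pipe is full one instruction completes every 100 ns, a peak of
10 million instructions per second.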

> I believe the encoding of VAX instructions prevents one from
> making it go fast, while still being affordable.  (It's a
> point-of-diminishing returns question.)  Comments...Anyone think it'd be
> (economically) worth building a VAX 3X or 10X the current top-vax?
> Or has it [as I feel it has] reached the limit for current technology.

I shouldn't think a cache of decoded instructions would be unaffordable;
you have to cache them somewhere anyway.

Some of us have long been amazed at what others will pay to buy IBM.
When DEC starts charging for a VAX CPU what IBM charges for a 3090,
then it will be easier to compare affordability.

guy@sun.uucp (Guy Harris) (09/13/87)

> This is in response to the claim that a VAX is not readily pipelined,
> or at least that there are limits to the pipelining.  The same problem
> didn't seem to stop the designers of the Harris HCX7.

Credit where credit is due department:  the HCX-7 either is an OEM'ed CCI
Power 6/32, or is derived from it; the decoded instruction cache, etc. came
from the 6/32.  (The 6/32 instruction set is, indeed, similar to the VAX's,
although the addressing modes are different.  The 6/32 is big-endian (for
compatibility with CCI's smaller 68K-family machines); some 6/32 instructions
that are just like the VAX ones have the VAX instructions' opcodes, except
that the nibbles are swapped....)
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

ehj@mordor.s1.gov (Eric H Jensen) (09/14/87)

In article <818@PSUECLB> bjj@psueclb.BITNET writes:
>The HCX7 seems to have solved the instruction decode problem with
>a separate instruction cache.  The cache contains DECODED instructions
>stored as fixed length 73 bit words.
>
>Assuming a reasonable hit ratio in the 4K word instruction cache,
>the extra complexity of instruction decode won't affect speed.

This last statement is not strictly true - refill time can be
adversely affected in the following ways:

1) In most cases there is at least one additional pipe stage to do the
decode.  If there isn't it would seem to me that there are other
problems/limitations with the arch/design.

2) Games that can be played with loading > 1 32-bit (pick your
favorite RISC) instruction simultaneously into an icache line are
2-4(*) times as 'expensive' to apply to a pre-decoded icache - 2-4x
the amount of decode logic, 2-4x the number of fast rams, 2-4x the
number of muxes (multi-set icache), etc etc.  64 bits (2*32) is a lot
more attractive than 146 bits (2*73).

3) From discussions with other designers it appears evident that
program loops cannot be relied on to hide non-aggressive icache
design.  I would consider the above approach to be complex but not
aggressive.  The complexity is applied to overcome instruction set
limitations and uses up much of the design space that could be used
for an aggressive icache design.

(*) simultaneously loading 2-4 32-bit instructions seems within reason
with the ECL technologies I design with.
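
The width arithmetic in point 2 can be tabulated with a few lines.  This
is only a sketch of the bit counts; the names are made up, and it says
nothing about the decode logic or mux counts themselves:

```python
RISC_BITS = 32      # one undecoded 32-bit instruction
DECODED_BITS = 73   # one pre-decoded instruction word, per the HCX7 figure


def refill_width(bits_per_instruction, instructions_per_load):
    """Bits that must be written into the icache in one refill step."""
    return bits_per_instruction * instructions_per_load


for n in (2, 3, 4):
    print(n, refill_width(RISC_BITS, n), refill_width(DECODED_BITS, n))
```

For two instructions per load this gives the 64 versus 146 bits quoted
above; at four it is 128 versus 292.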